This was an exciting week. We were finally able to remove Load.cpp
. There were some header
related issues that we had to handle but nothing tough apart from that just remove it from
CMakelists.txt
and we are good to go.
Talking about the failing cases, it seems that our new fucntion signature where we replaced
eT
with MatType
was causing this issue. The main issue was that compiler was not able to
understand the difference between the signature of Load
for dense matrix and sparse matrix.
For now we commented the code related to sparse matrix and decided that we will handle that
later.
We also worked on adding some string-related algorithms to mlpack. We created a new file
string_algorithms.hpp
and added the same to CMakeList.txt
. For start the file contains
the string related fucntions which mlpack was utilizing from boost
. List of functions
- trim
- trim_if
Replacing these two fucntionalities marks the end of boost::spirit
in mlpack. We were able
to remove boost
from the csv parse
and we will remove other fucntion soon.
Discussion is still goinging on about how to perfectly parse the categorical data. Specially
data with embedded delim inside the string, ex 1, 2, "sting, with comma", 4
.
Implementing the categorical parsing part is fun, there are soo many things we can do but currently we are sticking with implementing only the features that we had earlier and currate a list for future improvements.
I know it feels a bit bad to see that there is scope for improvement but not exploring it but Omar made me realize that this is part of software development process. The project is time constrained and first we need to finish and merge the parse and then we can think about improvements, which really seems logical and systematic.
Also in open-source maybe community maybe someone can come-up with a better solution so
we should always give some time and discussion on what we are planning to accomplish.
Mlpack is great, IRC
is soo active and it’s alway some discussion that you can learn
from.