This marks the end of first month on GSoC 2021. As planned we were able to adapt armadillo’s parser. Most of the work on the parser is done. There is still scope for some design improvements which we will surely address later.
Earlier we created new file for the parser but later we decided it makes more sense to keep everything
in load_csv.hpp
. So now load_csv.hpp
contains the parser and we removed csv_parser.hpp
.
It was a great a month of learning, although I believe that I could have done soo much more than what I was able to do. I will try to be more regular and dedicated towards the project from now on. Once again I would like to thank my mentor for being very patient and supportive.
Now that we have our numeric parser ready I think it’s time to start working on adapting it for categorical data.
In mlpack we have a class called DatasetMapper
which holds information about a dataset. This is useful
when the dataset contains categorical non-numeric features that needs to be mapped to categorical numeric
features.
Our goal for the coming days is to integrate the DatasetMapper
with the new parser, so that we can load
categorical non-numeric data as well.