Experience: Week Four

This marks the end of the first month of GSoC 2021. As planned, we were able to adapt Armadillo’s parser. Most of the work on the parser is done. There is still scope for some design improvements, which we will address later.

Earlier we created a new file for the parser, but later we decided it makes more sense to keep everything in load_csv.hpp. So load_csv.hpp now contains the parser, and we removed csv_parser.hpp.

It was a great month of learning, although I believe I could have done so much more than I did. I will try to be more regular and dedicated towards the project from now on. Once again, I would like to thank my mentor for being very patient and supportive.

Now that we have our numeric parser ready, I think it’s time to start working on adapting it for categorical data.

In mlpack we have a class called DatasetMapper, which holds information about a dataset. This is useful when the dataset contains categorical non-numeric features that need to be mapped to categorical numeric features.
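To make the idea of mapping categorical strings to numbers concrete, here is a minimal sketch. It is not mlpack's DatasetMapper and does not use any mlpack code: the class CategoryMapper and the function EncodeColumn are hypothetical names made up for illustration, and the policy of assigning codes in order of first appearance is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, NOT mlpack's DatasetMapper: it gives each distinct
// category string of one feature a numeric code, in order of first appearance.
class CategoryMapper
{
 public:
  // Return the numeric code for `label`, creating a new code if it is unseen.
  std::size_t MapString(const std::string& label)
  {
    const auto it = codes.find(label);
    if (it != codes.end())
      return it->second;

    const std::size_t code = labels.size();
    codes.emplace(label, code);
    labels.push_back(label);
    return code;
  }

  // Return the original string for a previously assigned code.
  const std::string& UnmapString(const std::size_t code) const
  {
    assert(code < labels.size());
    return labels[code];
  }

  // Number of distinct categories seen so far.
  std::size_t NumMappings() const { return labels.size(); }

 private:
  std::unordered_map<std::string, std::size_t> codes;  // string -> code
  std::vector<std::string> labels;                     // code -> string
};

// Convert one column of raw CSV tokens into numeric codes (assumed helper).
std::vector<double> EncodeColumn(const std::vector<std::string>& tokens,
                                 CategoryMapper& mapper)
{
  std::vector<double> encoded;
  encoded.reserve(tokens.size());
  for (const std::string& token : tokens)
    encoded.push_back(static_cast<double>(mapper.MapString(token)));
  return encoded;
}
```

With this sketch, a column containing red, green, red, blue would be encoded as 0, 1, 0, 2, and the mapper keeps enough information to turn each code back into its original string.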

Our goal for the coming days is to integrate the DatasetMapper with the new parser, so that we can load categorical non-numeric data as well.

Published Jul 4, 2021

Gopi Tatiraju
Computer Engineer | Open Source Developer | Machine Learning | Finance