Last week I created a bag-of-words model as a feasibility exploration. This week I wanted to train a word2vec model, but before I started I realized I had a bug in my bag-of-words code, and that was why the results looked so good. After fixing the bug, the results weren't that great.
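For context, here is a minimal sketch of the kind of bag-of-words baseline I'm talking about. The texts, labels, and category names are toy placeholders, not my real data; the point is just counting word occurrences and training a simple classifier on top, keeping the vocabulary fitted on the training texts only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Toy project descriptions and categories standing in for the real data
train_texts = ["an ml library for text classification", "expense tracking mobile app",
               "scraper for housing listings", "neural network training toolkit"]
train_labels = ["ml", "mobile", "scraping", "ml"]
test_texts = ["deep learning toolkit", "app for budgeting"]
test_labels = ["ml", "mobile"]

# Fit the vocabulary on the training texts only, then reuse it for the test texts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print(classification_report(test_labels, clf.predict(X_test), zero_division=0))
```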
To better understand why my results were poor, I decided to try another experiment: I created 50-dimensional sentence embeddings. The results weren't good either, so I tried 180-dimensional sentence embeddings. Still not great. I plotted some of the results to understand the problem better.
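One simple way to build sentence embeddings of a chosen dimensionality, and roughly what I mean above, is to train word vectors and average them per sentence. The corpus below is a toy stand-in, not my actual data:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for the real project descriptions
sentences = [
    ["open", "source", "machine", "learning", "library"],
    ["mobile", "app", "for", "tracking", "expenses"],
    ["web", "scraper", "for", "real", "estate", "listings"],
]

# vector_size controls the embedding dimensionality (50 here, 180 for the larger run)
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=50)

def sentence_embedding(tokens, model):
    """Average the word vectors of the tokens that are in the vocabulary."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

emb = sentence_embedding(["machine", "learning", "app"], model)
print(emb.shape)  # (50,)
```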
The graph on the left is the validation set precision per epoch. The graph on the right is the validation set recall per epoch.
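For reference, this is roughly how such per-epoch curves get tracked: evaluate precision and recall on the validation set after every epoch and plot both. The data and the incremental classifier below are toy stand-ins, not my actual model or training loop:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy data standing in for the 180-dimensional sentence embeddings and their labels
X, y = make_classification(n_samples=2000, n_features=180, n_informative=30, n_classes=3)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SGDClassifier(loss="log_loss", random_state=0)
precisions, recalls = [], []
for epoch in range(20):
    clf.partial_fit(X_train, y_train, classes=np.unique(y))  # one pass over the data
    pred = clf.predict(X_val)
    precisions.append(precision_score(y_val, pred, average="macro", zero_division=0))
    recalls.append(recall_score(y_val, pred, average="macro", zero_division=0))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(precisions); ax1.set_title("Validation precision"); ax1.set_xlabel("epoch")
ax2.plot(recalls); ax2.set_title("Validation recall"); ax2.set_xlabel("epoch")
plt.show()
```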
After reading several articles, I realized that my project-category labels were not useful. Before planning how to proceed, I decided to do another experiment, this time on picture comparison. This will tell me how feasible it is to extract the information I need from the users' pictures.
Using inception_v3, I created an embedding per picture, concatenated the embeddings into a single vector, and used that to compare pictures.
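A minimal sketch of that idea, assuming a torchvision inception_v3 with its classification head swapped for an identity layer so it outputs the 2048-dimensional pooled features. The file names are hypothetical placeholders:

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

# Pretrained inception_v3 with the final classifier removed, so the forward pass
# returns the 2048-d pooled features instead of class logits
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.fc = nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0)

def user_vector(paths):
    # Concatenate the per-picture embeddings into one vector per user
    return torch.cat([embed(p) for p in paths])

# Hypothetical file names, just to show the comparison step
a = user_vector(["user_a_1.jpg", "user_a_2.jpg"])
b = user_vector(["user_b_1.jpg", "user_b_2.jpg"])
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```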
I need better data, so I decided to do some scraping and get it. While my scripts are running, I'm doing more data analysis to figure out the best way to classify the data, given the information I can extract and the labels I can create.