New data & infrastructure: the Infinite Pipeline

4/5/2019

This week I used my new data, 70,000 articles (out of the 300,0000) from Instructables, to experiment with ELMo and TF-IDF and to compute cosine similarity between articles. The results were much better! While making my data "trainable" I read a lot of articles explaining that cleaning and massaging the data can be 80% of the work in ML. This was a great lesson.
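To make the TF-IDF and cosine similarity step concrete, here is a minimal sketch in plain Python. It is not the code I ran on the Instructables data: the three sample sentences, the whitespace tokenizer and the smoothed IDF formula are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for article titles.
docs = [
    "build a wooden bookshelf",
    "build a wooden table",
    "bake chocolate chip cookies",
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # two woodworking titles
print(cosine_similarity(vectors[0], vectors[2]))  # woodworking vs. baking
```

The two woodworking titles share most of their terms, so they score higher than the woodworking and baking pair, which share none.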

Overall process:

Understand the problem (see previous post)
Decide what data is needed (see previous post)
Research what data is actually available (see previous post)
Get the data (see previous post)
Understand the data
Select labels
Clean the data (separate or remove URLs, remove symbols that carry no useful information, handle NaNs, etc.)
Preprocess it (convert it to the right format for this case)
Design a dataset for it
Load it and store it
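The cleaning step in the list above might look roughly like the sketch below. The regular expressions, the set of characters treated as "non-useful" and the function names are assumptions for illustration, not the exact rules I applied.

```python
import math
import re

# Assumption: anything starting with http:// or https:// up to the next space is a URL.
URL_PATTERN = re.compile(r"https?://\S+")
# Assumption: keep word characters, whitespace and basic punctuation; drop the rest.
NOISE_PATTERN = re.compile(r"[^\w\s.,!?'-]")


def is_missing(value):
    """True for None and for float NaN, the usual markers of a missing cell."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_text(value):
    """Return (cleaned_text, urls): URLs are separated out, noise symbols removed."""
    if is_missing(value):
        return "", []
    value = str(value)
    urls = URL_PATTERN.findall(value)
    text = URL_PATTERN.sub(" ", value)
    text = NOISE_PATTERN.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace
    return text, urls


text, urls = clean_text("See https://example.com/x for ★ details!")
print(text)
print(urls)
print(clean_text(float("nan")))
```

Returning the URLs separately instead of discarding them keeps the option of using them later as a feature.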

I used tf.Transform, TFRecord and tf.Example for the last three steps (preprocessing, designing a dataset, and loading and storing it). I picked tf.Transform and TFRecord (tf.Example is the message format stored inside TFRecord files) because I wanted to learn something that was easily scalable, could integrate with my Colab notebook, and allowed for parallel processing and monitoring.
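As a rough illustration of the "design a dataset, load it and store it" part, the sketch below writes a few rows as tf.Example messages into a TFRecord file and reads them back with tf.data. It assumes TensorFlow 2.x with eager execution; the feature names (`title`, `text`, `label`), the sample rows and the file name are hypothetical, and the tf.Transform preprocessing step is not shown.

```python
import os
import tempfile

import tensorflow as tf


def _bytes_feature(value):
    """Wrap a Python string as a bytes feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode("utf-8")]))


def make_example(title, text, label):
    """Build one tf.Example from a (title, text, label) row."""
    feature = {
        "title": _bytes_feature(title),
        "text": _bytes_feature(text),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def write_tfrecord(path, rows):
    """Serialize each row and append it to a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for title, text, label in rows:
            writer.write(make_example(title, text, label).SerializeToString())


# The parsing spec must mirror the features written above.
FEATURE_SPEC = {
    "title": tf.io.FixedLenFeature([], tf.string),
    "text": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def read_tfrecord(path):
    """Return a tf.data.Dataset of parsed feature dicts."""
    dataset = tf.data.TFRecordDataset(path)
    return dataset.map(lambda record: tf.io.parse_single_example(record, FEATURE_SPEC))


# Hypothetical rows standing in for cleaned articles.
rows = [
    ("Wooden bookshelf", "Cut the boards and sand them.", 0),
    ("Chocolate cookies", "Mix the dough and bake.", 1),
]
path = os.path.join(tempfile.mkdtemp(), "articles.tfrecord")
write_tfrecord(path, rows)
for parsed in read_tfrecord(path):
    print(parsed["title"].numpy().decode("utf-8"), int(parsed["label"].numpy()))
```

Because the result is a regular tf.data.Dataset, batching, shuffling and parallel reads can be layered on top without changing the file format.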
