To the memory of Paul Doherty, a sea of inquiry and inspiration.
Thanks a lot to OpenAI for this opportunity, and especially to my mentor Kai Arulkumaran for his guidance.
Project Overview
CREATURE is a Machine Learning (ML) project to help find online activities that support active learning. This post explains the origins of CREATURE, how the model was developed and tested on samples of online projects, and how it might be extended to support educators who want to share excellent projects with students.
Project-Based Learning (PBL) is an effective and enjoyable way to learn, but teachers often struggle to find appropriate projects for their students. Thousands of project descriptions exist online, but most are poorly labeled and poorly curated, and thus difficult for teachers to find. Accurately labeling thousands of online projects case by case would be daunting and expensive, so I trained a Machine Learning model called CREATURE that labels online projects with 75-90% accuracy. The CREATURE model finds projects that are useful for learning a particular topic (e.g. Biology, Electricity and Magnetism, Anatomy and Physiology). If the model were put into wide use, teachers could use it to find good learning-by-doing projects anywhere on the internet. Students interested in exploring a particular topic could use it too.
CREATURE uses Natural Language Processing (NLP) to “read” project descriptions for evidence that they would be good PBL lessons. To train the model I used projects from two fantastic websites: the Exploratorium and Instructables. The Exploratorium is a website made by educators and multidisciplinary professionals who created more than 3,700 activities to help children explore STEM concepts. Instructables is a website for sharing DIY projects from many categories (robotics, food, crafts, etc.). With more than 280,000 user-posted projects, Instructables is the largest website of DIY projects in English. However, the goal of the Instructables website is not learning, so many good potential classroom projects will never be found by educators.
In this project I trained the CREATURE model to predict learning topics using Exploratorium projects and then applied these learning topics to the Instructables projects. The CREATURE model improved some of the Instructables learning-topic searches by as much as ten times. Depending on the label, accuracy on the Instructables data reaches 0.75-0.90. CREATURE also found that 30%-50% of 200 projects (again depending on the learning topic) already include explanations or were designed for teaching. These projects would remain hidden from educators and students without CREATURE. This suggests the model would be extremely useful for teachers and students, because they can find ready-to-use projects that already include explicit learning components. CREATURE uses transfer learning on BERT and has been improved with Active Learning. The idea can be developed further to help educators, children and parents find more and better learning materials in the biggest learning resource available, the web. The code is available here and more details about the process of making it can be found here.
Rationale
Adaptive expertise is the ability of problem-solvers to efficiently solve previously encountered tasks (regular learning) and to generate new procedures for new tasks (innovation). In a world that is constantly changing, helping children become adaptive experts is extremely important. However, it is extremely hard, for two main reasons. The first is that adaptive experts need to apply what they know to both new and previously encountered situations, so teachers need to constantly create experiences that allow that exploration to happen. This is hard because creating those experiences normally involves multidisciplinary knowledge and scaffolding real-world situations into classroom activities. The second reason is that learning itself is hard and requires many hours of focus and engagement. To reach that state, children need to be doing something they find really interesting, and each child finds different things interesting.
Thanks to the 'Maker Movement', the web is full of Do It Yourself (DIY) projects that employ cheap and relatively easy-to-use technology, and more and more schools are adopting some type of 'project-based learning' that uses 'making' technology (3D printers, microcontrollers, etc.). Still, 10 years after the 'Maker Movement' became popular, educational institutions struggle to support their students and teachers in creating real-world projects that apply knowledge from all the different subjects. 'Maker-spaces' in schools aren't integrated with the more formal subjects (math, physics, biology, etc.), and teachers don't have the time or the proper training to scaffold and integrate the subject content with real-world projects.
Machine Learning (ML) has been used in education mostly as a tracking and optimization system (e.g. ML is employed to teach the long multiplication algorithm as early as possible by measuring whether children are using the algorithm accurately and fast), not as an empowerment tool for students or as a supporter of agency. CREATURE explores how ML can help implement 'project-based learning' in schools, giving more agency to children and helping teachers select relevant classroom projects. Additionally, CREATURE makes the connection between the project and teachers' content explicit, helping teachers find, use and adapt content from the web, which is a huge, time-consuming process for them.
Examples
When we search for a learning topic such as Biology on the Instructables website, the results show 12 projects (picture below on the left), even though the website has many more projects related to Biology. For some of the projects it's easy to understand how they relate to Biology (Low Cost and Accurate Incubator), but for others (Arduino) it's harder. Using CREATURE we can find the top 200 projects that relate to Biology. CREATURE also shows which part of the project matches the learning topic, in this case Biology. This information is very useful because: i) even if the user doesn't understand all the information in the project, the relationship between making that artifact and what the model matched with Biology is explicit, and ii) it helps the user browse through the projects faster without having to open all the URLs.
Top results for Biology in Instructables
Top 10 out of 200 results for Biology using CREATURE
As you can see, CREATURE provides many more projects, and they are different from Instructables' own search results. It also finds projects related to living beings, human biology, and applications of robotics in Biology. On Instructables, users label their own projects, and they might not have educational intentions when posting them. CREATURE helps to find the educational use in the projects.
Let's take a closer look at two of the projects matched by CREATURE (see images below): Motion Sensor: Teach Vestibular and Heart Dissection. Both are great projects for learning Biology even though they are very different and one of them (Motion Sensor: Teach Vestibular) is categorized as a Sensors activity. Both were also written with teaching in mind.
According to its description, the Motion Sensor: Teach Vestibular "instructable is designed to help a science teacher build a sensor that can be used to teach the principles of the vestibular, which is located inside the cochlea. It includes step-by-step instructions on the construction of the sensor as well as information on the principles it can be used to teach." Heart Dissection not only describes how to do the heart dissection but also explains what to pay attention to while doing it.
Example of matched text related to the project Motion Sensor: Teach Vestibular
[CLS] with information from the joints, tendons, and skin, leading to the sense of posture and movement.
Unexpected inputs from the vestibular system and other sensory systems can induce vertigo (Human Physiology 7th edition, Vander, Sherman, Luciano , pg 253 - 255). [SEP] |
Example of matched text related to the project Heart Dissection
[CLS]What does your heart sound like ? How fast is it beating ? What's the pattern? A few students may be able to detect the wump - wump , wump - wump part of the beat (which they've probably heard in movies, too).
This has to do with the fact that our heart has four chambers, which we'll get too soon. This comes from the two pairs of valves in the chambers of the heart. Step 2 : Check for blood Sure we've been told we're full of blood, but how do we know? maybe you've got a cut before, but you can see evidence even without an injury. Just look around. Starting with looking at an eye , you can see both arteries and veins. The retinal arteries are thinner and brighter red than the veins. If you have a flashlight, you can also see blood in any narrow part of our skin like in our fingers and in our ears. For simple viewing, what do you think gives our tongues and gums that color? Step 3 : Have a heart Starting with a pig heart, there is a lot to notice before we start cutting. What do you see? To start, let's orient our heart so that we're looking at the front. This is how it would sit inside the animal with them facing on us. We're going to refer to "left" as the animal's left, and right as the animal's "right," which is how things are described medically. That will be opposite the way we view them. Looking at the front, you might notice a groove down the middle of the heart. This is the interventricular sulcus. This divides the ventricles, which we'll get into in a bit . If you flip it around, you might find a little bit of tissue, hanging like an ear flap. Those are the auricles, which help protect the atria inside. Often hearts from the butcher will have a cut along one of the sides, which they do so the blood can drain out . You may see some coagulated blood (dark dark brown). What else can you find? Step 4 : Veins and arteries How many tubes go into and out of the heart? Try to find all four. With a heart from the butcher, the first two you'll probably notice are the superior vena cava and the aorta. The superior vena cava is a vein that brings all [SEP] |
Data
A crucial component was finding the right data, from the Exploratorium and Instructables (two fantastic websites!), and using it to train a model that could carry the learning-relevant structure of the Exploratorium over to the great projects of Instructables. The Exploratorium is a website made by educators and multidisciplinary professionals, who created ~3,700 activities that help children explore STEM concepts. Instructables is a website dedicated to sharing DIY projects from many categories (robotics, food, crafts, etc.), with more than 280,000 user-posted projects; the goal of the projects is not learning but making. Instructables is the largest website of DIY projects in English. In my project I trained my model on Exploratorium data to predict the learning topics of a project, and then applied it to the Instructables data.
Instructables Example
Above you can see an example of a project from Instructables describing how to make an Electric Eel Knex Roller Coaster. Some projects have long descriptions, some have a list of tools and materials, and they may or may not have step titles. The articles on MakeMagazine and Hackster are very similar to those on Instructables. Even though I did some experiments using the pictures, I ended up not using that data to predict which projects match specific learning topics. I used only the text of each project.
Exploratorium Data
The Exploratorium's educational goal is that "by creating inquiry-based experiences and tools that spark wonder; offer hands-on experiences; and encourage questions, explorations, and individual discovery, we're transforming the way that people learn. Learning this way empowers people to figure things out for themselves—about science, but also about any topic, claim, or idea." They have a spectacular set of activities that achieve this and are designed by professional 'makers' and educators. Their projects' goal are not functionality but exploration and learning. |
(Top right) This is an example of an activity that describes how to create a motor and explains the mechanism that makes it work. All the Exploratorium activities have a list of the topics they explore, organized in hierarchical levels (e.g. Physics -> Electricity and Magnetism). Most of the topics are STEM related. I'll be using their list of topics as labels.
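As a rough illustration of how hierarchical topics like these could be turned into training labels, here is a minimal sketch. The `->` separator follows the example above. The activity names, their topic assignments, and the helper functions (`split_topic`, `build_label_index`, `multi_hot`) are hypothetical and are not taken from the CREATURE code.

```python
from typing import Dict, List, Set


def split_topic(topic: str) -> List[str]:
    """Split 'Physics -> Electricity and Magnetism' into its levels."""
    return [level.strip() for level in topic.split("->") if level.strip()]


def build_label_index(activities: Dict[str, List[str]]) -> Dict[str, int]:
    """Give every parent and child topic a stable integer index."""
    labels: Set[str] = set()
    for topics in activities.values():
        for topic in topics:
            labels.update(split_topic(topic))
    return {label: i for i, label in enumerate(sorted(labels))}


def multi_hot(topics: List[str], label_index: Dict[str, int]) -> List[int]:
    """Encode an activity's topics as a 0/1 vector (one slot per label)."""
    vector = [0] * len(label_index)
    for topic in topics:
        for level in split_topic(topic):
            vector[label_index[level]] = 1
    return vector


# Hypothetical activities; the topic hierarchy shown here is an assumption.
activities = {
    "simple_motor": ["Physics -> Electricity and Magnetism"],
    "heart_model": ["Biology -> Anatomy and Physiology", "Physics -> Mechanics"],
}

label_index = build_label_index(activities)
motor_vector = multi_hot(activities["simple_motor"], label_index)
print(label_index)
print(motor_vector)
```

Because an activity can carry several topics at once, a multi-hot vector (rather than a single class id) keeps both the parent and the child level available as labels.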
Methodology
BERT is an NLP model that contextualizes the embeddings of words. This means that in the sentences I'm going to the bank for money and That's a bank of snow, the word bank gets a different embedding vector. BERT stands for Bidirectional Encoder Representations from Transformers; the bidirectionality refers to the fact that BERT learns to predict a word from the context on both its left and its right. It was pre-trained on large amounts of unlabeled text (BooksCorpus and English Wikipedia) and can be fine-tuned for specific tasks. BERT builds its input embeddings from: i) WordPiece token embeddings, which divide words into smaller subunits and can handle rare or unknown words more effectively, ii) sentence (segment) embeddings, which mark which sentence each token belongs to, with sentences separated by a special token, and iii) position embeddings, which are learned. For a great post on NLP models check here.
At the top right you can see the diagram of the Transformer Encoder architecture from the Transformer paper. The image below it is a diagram of the training objectives in a slightly modified BERT model from the Pre-Training paper for classification with multiple sentences. You can check the code here.
I chose BERT because its bidirectional representations can help with comparison tasks. I'll be interested in trying GPT2 soon and seeing the differences. After I modified BERT for multi-label classification and shaped my data to keep track of the matching text, I trained BERT on the Exploratorium data using the 20 labels with the most examples. I trained for 2 epochs and used 20% of the data for validation. For the classification task I took the final hidden state of the special first token ([CLS]), multiplied it by a small weight matrix, and applied a softmax. I tokenized the text with a maximum length of 512 tokens, which was enough for the length of the Exploratorium activities. The results were good enough for me to keep developing the project in that direction.
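To make the classification step more concrete, here is a minimal pure-Python sketch of a classification head: a hidden vector for the first token is multiplied by a small weight matrix and passed through a softmax. The vector, the weights, and the sizes (4 hidden units, 3 labels) are toy values chosen for illustration; in the real model the hidden state comes from BERT and the weights are learned during fine-tuning.

```python
import math
from typing import List


def linear(hidden: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One logit per label: dot product of the hidden state with each weight row."""
    return [sum(h * w for h, w in zip(hidden, row)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-in for the final hidden state of the first token.
cls_hidden = [0.2, -0.4, 1.0, 0.5]

# Toy weight matrix: one row per label, one column per hidden unit.
weights = [
    [0.1, 0.0, 0.3, 0.0],
    [0.0, 0.2, 0.0, 0.1],
    [0.5, 0.0, 1.0, 0.2],
]
bias = [0.0, 0.0, 0.0]

probs = softmax(linear(cls_hidden, weights, bias))
best_label = probs.index(max(probs))
print(probs, best_label)
```

A softmax makes the label scores compete and sum to 1. A common alternative for multi-label setups is an independent sigmoid per label; the sketch follows the softmax described above.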
To be able to use this model on Instructables data I had to chunk the text of each project because many projects were longer than 512 tokens. I kept track of which chunk belongs to which project so I could evaluate the results.
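The chunking step can be sketched roughly as follows. This is a simplified illustration, not the project's code: it splits on whitespace instead of using BERT's WordPiece tokenizer and ignores the special tokens, and the `Chunk` type, the `chunk_projects` function, and the project ids are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Chunk(NamedTuple):
    project_id: str    # which project the chunk came from
    chunk_index: int   # position of the chunk inside that project
    tokens: List[str]  # at most max_tokens tokens


def chunk_projects(projects: Dict[str, str], max_tokens: int = 512) -> List[Chunk]:
    """Split each project's text into consecutive chunks of at most max_tokens."""
    chunks: List[Chunk] = []
    for project_id, text in projects.items():
        tokens = text.split()  # assumption: whitespace tokens, not WordPiece
        for start in range(0, len(tokens), max_tokens):
            piece = tokens[start:start + max_tokens]
            chunks.append(Chunk(project_id, start // max_tokens, piece))
    return chunks


# Hypothetical projects: one long, one short.
projects = {
    "roller_coaster": " ".join(["word"] * 1200),
    "short_note": "a few words only",
}

chunks = chunk_projects(projects)
for chunk in chunks:
    print(chunk.project_id, chunk.chunk_index, len(chunk.tokens))
```

Keeping `project_id` and `chunk_index` on every chunk is what makes it possible to map a prediction back to the project, and to the exact passage that produced it.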
Then I predicted the top 200 projects per label. The results were promising, with some mistakes that I thought could be fixed with more data. So I used the chunk tracking to perform active learning on the top 200 predictions for each of the 20 labels (image to the right). I then fine-tuned the model again, which improved the results.
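One way to go from chunk-level predictions to a ranked list of projects for a label is sketched below. It assumes each project is scored by its best-scoring chunk; that aggregation rule, the `top_projects` function, and the example scores are assumptions made for illustration, not a description of how CREATURE actually ranks projects.

```python
from typing import Dict, List, Tuple

# (project_id, chunk_index, score for one label)
ChunkScore = Tuple[str, int, float]


def top_projects(chunk_scores: List[ChunkScore], k: int = 200) -> List[ChunkScore]:
    """Keep each project's best chunk, then return the k highest-scoring projects."""
    best: Dict[str, Tuple[int, float]] = {}
    for project_id, chunk_index, score in chunk_scores:
        if project_id not in best or score > best[project_id][1]:
            best[project_id] = (chunk_index, score)
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(pid, idx, score) for pid, (idx, score) in ranked[:k]]


# Hypothetical scores for a single label such as Biology.
scores = [
    ("project_a", 0, 0.2),
    ("project_a", 1, 0.9),
    ("project_b", 0, 0.5),
    ("project_c", 0, 0.7),
]

ranking = top_projects(scores, k=2)
print(ranking)
```

Returning the winning `chunk_index` alongside the project id means the matched passage can be shown to a reviewer, which is useful when checking the top predictions by hand before fine-tuning again.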
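I used Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) to visualize the embeddings from the Instructables data. UMAP is based on the following assumptions: i) the data is uniformly distributed on a Riemannian manifold, ii) the Riemannian metric is locally constant (or can be approximated as such), and iii) the manifold is locally connected.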
The Exploratorium learning topics have two levels of abstraction, parent and child (image to the right). I started by visualizing almost all labels (Figure 1). I removed the labels Physics, Light and Design & Tinkering because most projects have a relationship with them and they made the visualization hard to understand. I found some interesting clusters that I decided to investigate.
Figure 2 shows the classification of the embeddings for the Physics learning topics: Electricity & Magnetism, Light, Mechanics, Sound and Waves. It's interesting to see how Waves intersects the Light cluster, but not the Sound cluster as much as I expected.
Mechanics and Electricity & Magnetism are diametrically opposed, which makes sense, with Sound in between.
Other Experiments
Even though I didn't explore other directions further because of time constraints, they are worth mentioning in case anyone is interested. Besides Instructables and the Exploratorium, I used data from Hackster and MakeMagazine (THANKS A LOT DALE!!). You can see more about the experiments I did here.
Next Steps and Future Directions
I hope this project demonstrates how we can use AI for human learning and empowerment and not only for tracking 'learning'. My next step is to try different data sets and train CREATURE to find learning content on the web beyond Instructables. I'd also like to create a web app where children and teachers can input their tools and materials, time, and learning goal, and CREATURE returns several projects with those characteristics. Another direction is to further develop a model that can classify 'expertise' in the projects. The experiment done with the Hackster data was promising but needs further development. I'm now looking for grants or resources to keep feeding CREATURE ;) Let me know if you have any ideas that can help me.