The paper is really interesting: it shows a method that helps deep neural networks make analogies. The method is not based on a sophisticated architecture, but on what data is presented to the model. The authors presented contrasting cases that made abstract relational structures more salient and forced the model to learn them. The ability to make analogies is a cornerstone of human learning, intelligence, and creativity. I found it extremely interesting that even simple deep neural network architectures can learn to perform analogies. Humans learn in a similar way; in the learning sciences this way of learning is known as:

The easiest possible case for coding a paper is finding one that comes with the code, the data, and some results that help you check whether you implemented it correctly. Mine didn't have the code, but it had the data, and I found code from their earlier paper: Measuring Abstract Reasoning in Neural Networks. Even when you can see the code, you might not understand what's happening with the transformations and reshaping of the data, nor with the main ideas of the paper, until you code it yourself.

To code it, depending on how old the paper is, you might need to go to the referenced papers and gather more information. So I went down the rabbit hole, starting with the previous paper: Measuring Abstract Reasoning in Neural Networks. That paper explains how the data was generated and describes important details about the architecture of the model I wanted to implement.

| I like to take snapshots of the important parts of the papers so I can keep them on my desktop and look at them while I'm coding. This way I don't have to go back and hunt for the information in the papers again and again. The figure on the top is the architecture, the figure on the left is information about the data, and the other two figures are details about the architecture. |

At this point it's a good idea to hit the whiteboard (or a piece of paper, like me) and draw the architectures in the finest possible detail. This will help you figure out what you understand and what you don't. After some time, I realized I needed to go back down the rabbit hole and find out more about Relational Networks. I browsed three more very interesting papers and then decided to start coding.

I used Colab for this part; you can see my work-in-progress architecture code here. I recommend starting with your own self-generated data until most details are hashed out. That way you can test things faster. Try not to code the easiest part of the model first, but the parts you need to understand better. Once you have those, the rest just flows.

After some effort and many coding cells I thought I had most of the architecture clarified. Now the real job begins: Plumbing! Coding complex models is a lot about piping data:

- You always need to know what shape the input is and what shape it becomes after being processed by a layer.
- You need to know how to transform the result so it fits the next layer: a lot of reshaping, flattening, and more. I spent a lot of time in this piping process!
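To make the piping concrete, here is a small numpy sketch of the kind of shape bookkeeping involved. The batch size, panel count, image size, and embedding width below are illustrative guesses, not the paper's actual dimensions:

```python
import numpy as np

# Hypothetical batch: 32 puzzle items, each a stack of 9 panel
# images of 80x80 pixels (all numbers are illustrative).
batch = np.zeros((32, 9, 80, 80), dtype=np.float32)

# A per-panel CNN usually wants (N, H, W, C): fold the 9 panels
# into the batch dimension so each image is processed independently.
as_images = batch.reshape(32 * 9, 80, 80, 1)
print(as_images.shape)   # (288, 80, 80, 1)

# Suppose the CNN emits a 256-dim embedding per image; to feed a
# relational module, restore the per-item panel grouping.
embeddings = np.zeros((32 * 9, 256), dtype=np.float32)
per_panel = embeddings.reshape(32, 9, 256)
print(per_panel.shape)   # (32, 9, 256)
```

Writing the expected shape next to every reshape like this is the cheapest way I know to catch plumbing bugs early.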

My final architecture sketch.

Eventually I got the right architecture with the correct size of pipes. The next challenge was getting the data and feeding it to the model. DeepMind has made the data public, so I downloaded some of it to GCP and spun up an instance of a pretty fast machine :) I got my Jupyter notebooks ready and started testing. Through a very easy and not super sophisticated test I found out that the way I was loading the data was too slow. I had two options: load the data in parallel or preprocess it... which would it be?? Loading the data in parallel is faster to implement and run, and can be great for experimentation, but if I'm going to keep playing with this data it's better to preprocess it and store it on disk, even if it takes longer to implement and run.

At this point I'm still deciding what to do. As you can see, implementing a paper with a complicated architecture can be a long process. But I'm learning a lot along the way: I'm learning to set up my workflow, and how to plan, implement, and test an architecture. And most importantly, I'm getting a feeling for how good a plumber I am, and where I need to improve :)

Later in life I became very interested in the concept of proportion, and I started seeing it everywhere: architecture, design, math, art, biology, music, astronomy, monetary transactions, and poetry. I came to understand the obsession that the Greeks and other ancient cultures had with proportions. Learning more about living systems such as animals and plants, I found cases where those systems can detect the size of other things relative to their own. I wondered: does understanding the idea of proportion depend on having a body?

Proportions and ratios are very powerful ideas, but they can be hard to learn, especially because we mostly teach them in terms of math, which is more formal and less experiential. Children can understand that a smaller cat has the same body proportions as a bigger cat. But it's harder to understand that we can create a Roman temple based on the golden ratio, and that the proportions of that temple are the same as those of a shell. Between the cats and the temple, there seems to be an understanding gap.

Proportion is also a concept we can study from the perspective of AI: Can AI learn the concept of proportion, and if it can, does it make the same kinds of generalizations and mistakes as humans?

To start answering this question, I tried two very simple ways to train models to recognize proportions.

The first experiment trains a model to recognize a 1:1 proportion between the height and width of a rectangle, by simply distinguishing squares from non-square rectangles. The second experiment attempts to train the model to recognize proportions between two distinct rectangles.

I wanted to know if rectangles that are closer to squares, because their width and height are not that different, e.g. 3:4, were harder to classify as rectangles. And does rotation matter? Will "standing" rectangles be harder to classify than "lying" rectangles, even if the difference between their width and height is the same (7:2 vs 2:7)?

I decided that the easiest way to get data for this experiment was to create it. The data-generating function returns a matrix containing either a square or a rectangle, of random size and at a random position within the matrix. Example (see image below): a rectangle formed by the number 1, of width 5 and height 8.
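A generating function along those lines might look like the sketch below. The function name, the grid size, and the size ranges are my own choices for illustration, not the original code:

```python
import numpy as np

def make_shape_grid(size=28, square=False, rng=None):
    """Return a size x size matrix of zeros containing one axis-aligned
    shape drawn with 1s: a square if `square` is True, otherwise a
    non-square rectangle, placed at a random position."""
    if rng is None:
        rng = np.random.default_rng()
    grid = np.zeros((size, size), dtype=np.float32)
    h = int(rng.integers(2, size // 2))
    w = h if square else int(rng.integers(2, size // 2))
    while not square and w == h:   # re-draw until it is a true rectangle
        w = int(rng.integers(2, size // 2))
    top = int(rng.integers(0, size - h))
    left = int(rng.integers(0, size - w))
    grid[top:top + h, left:left + w] = 1.0
    return grid
```

Calling `make_shape_grid(square=True)` repeatedly gives the squares half of the dataset; `square=False` gives the rectangles half.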

The training and validation set was half squares and half rectangles. Because I used supervised learning I had to label the data I created: I labeled all the squares as 1 and all the rectangles as 0.

Creating data this way is easy and convenient, because all the information you give a neural network to train on eventually becomes numbers (pictures into RGB values, sensor data, etc.). By creating the raw input I was able to stop worrying about the size, color, number, and balance of the data. If I had had to collect this dataset from images on the Web, it would have been much harder to find an equal number of squares and rectangles of specific sizes, colors, image dimensions, etc.

Using Keras I constructed a neural network and then I trained it with the data I created. Then I created another random set of squares and rectangles and asked the model to classify those images as 1 or 0.
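The post doesn't specify the network, so the sketch below is just a minimal Keras binary classifier of the kind that would fit this task; the layer sizes and the 28x28 input are guesses:

```python
from tensorflow import keras

# A minimal square-vs-rectangle classifier over 28x28 grids
# (architecture is illustrative; the post does not give the real one).
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # 1 = square, 0 = rectangle
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Training and evaluation would then look like:
# model.fit(x_train, y_train, epochs=10, validation_data=(x_val, y_val))
# preds = (model.predict(x_test) > 0.5).astype(int)
```

The sigmoid output matches the 1/0 labeling above: values over 0.5 are read as "square".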

The image on the right shows all the misclassified rectangles, organized by height and width. The darker the color, the larger the number of misclassifications for that height/width combination. The x axis represents the height and the y axis the width of the rectangles being counted. As we can see, rectangles with a very different width and height are easier to classify as rectangles, and most misclassifications occur with rectangles whose height and width are similar. It also seems that smaller rectangles are slightly harder to classify. And even though the shape of the graph is fairly symmetrical, the number of misclassified cases varies with rotation; for this model, tall and skinny rectangles are slightly harder to identify correctly than shorter and stouter ones.
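The analysis behind a heatmap like that can be sketched in a few lines: bin the misclassified examples by their (height, width) pair and count. This is my reconstruction of the idea, not the original analysis code:

```python
import numpy as np

def misclassification_grid(dims, y_true, y_pred, size=28):
    """Count misclassifications per (height, width) combination.
    `dims` is a sequence of (h, w) pairs aligned with the two label
    arrays; the returned grid can be fed straight to plt.imshow."""
    grid = np.zeros((size, size), dtype=int)
    for (h, w), t, p in zip(dims, y_true, y_pred):
        if t != p:
            grid[h, w] += 1
    return grid
```

Plotting the grid (e.g. with `matplotlib.pyplot.imshow`) gives exactly the kind of darkness-by-frequency picture described above.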

I wondered how much this model could generalize. I thought it would be interesting to see whether the model could be trained more directly on proportion and less on the specific differences between rectangles and squares. So I created data containing two rectangular shapes. In one half of the data the shapes have the same proportion between their width and height, i.e. equal ratios (figure below on the left), and in the other half the proportions are different (figure below on the right). Compared to the first model, we are not trying to recognize any 'special' proportion, but rather to distinguish proportional relations between two objects from non-proportional ones.
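Generating such pairs could look like the sketch below: when the pair should be proportional, the second rectangle's sides are an integer multiple of the first's, so the two width:height ratios match exactly. The placement scheme and size ranges are my own simplifications:

```python
import numpy as np

def make_pair(size=28, proportional=True, rng=None):
    """Return a grid with two rectangles drawn as 1s. If `proportional`,
    the second rectangle is a scaled copy of the first (equal ratios);
    otherwise its sides are perturbed so the ratios differ."""
    if rng is None:
        rng = np.random.default_rng()
    grid = np.zeros((size, size), dtype=np.float32)
    h1, w1 = int(rng.integers(2, 6)), int(rng.integers(2, 6))
    if proportional:
        k = int(rng.integers(2, 4))
        h2, w2 = h1 * k, w1 * k            # same h:w ratio, scaled by k
    else:
        h2, w2 = h1 + 1, w1 * 2 + 1        # guaranteed to break the ratio
    # Place the first shape top-left and the second bottom-right,
    # so the two never overlap at these size ranges.
    grid[0:h1, 0:w1] = 1.0
    grid[size - h2:size, size - w2:size] = 1.0
    return grid
```

Labeling then mirrors the first experiment: 1 for equal-ratio pairs, 0 for the rest.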

The accuracy of the model was very inconsistent: in ~65% of the cases the classification was correct, meaning just a smidge above chance. This is consistent with the human learning process: even though children have no problem distinguishing squares from rectangles, correctly separating sets of shapes with equal proportions from those with different proportions is much harder.

After getting the confusion matrix, which tells me the number and type of images that were incorrectly classified, it seemed that the non-proportional rectangles were more difficult to classify correctly. Interesting... but much more experimentation is necessary to understand what's going on in this model. Maybe next week I will have some insights :)
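For a binary task like this, the confusion matrix is a 2x2 count table. A minimal version (my own helper, equivalent to what libraries like scikit-learn provide):

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for binary labels
    (here 1 = equal proportions, 0 = different).
    Rows are true labels, columns are predicted labels."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[int(t), int(p)] += 1
    return m
```

Under this labeling, `m[0, 1]` counts the error described above: non-proportional pairs mistakenly classified as proportional.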

This week I read Analogues of mental simulation and imagination in deep learning. There is so much to talk about in that paper, but let's start with a core concept of RL: the POMDP (partially observable Markov decision process). The POMDP is exciting to me because I believe it represents human decision-making well enough to show how far AI can go toward meeting our own decision processes. I'll keep talking about it in the next posts.
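A POMDP is usually written as a tuple (S, A, O, T, Z, R): states, actions, observations, a transition model, an observation model, and rewards. The agent never sees the state directly; it maintains a belief (a probability distribution over states) and updates it after each action and observation. A minimal Bayes-filter sketch of that update, with array layouts of my own choosing:

```python
import numpy as np

def belief_update(belief, T, Z, action, obs):
    """One POMDP belief update: b'(s') is proportional to
    Z[a, s', o] * sum_s T[a, s, s'] * b(s).
    T[a] is the (S x S) transition matrix for action a;
    Z[a] is the (S x O) observation matrix for action a."""
    predicted = T[action].T @ belief          # predict step over states
    updated = Z[action][:, obs] * predicted   # correct with the observation
    return updated / updated.sum()            # renormalize to a distribution
```

Each of the human-simulation examples the paper discusses can be phrased in these terms: what counts as a state, what the agent can observe, and how beliefs get updated.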

The paper's background is that we can think of two types of computational approaches to intelligence: i) statistical pattern recognition, which focuses mostly on prediction (classification, regression, task control), and ii) model-building, which focuses on creating models of the world (to explain what we see, to imagine what could have happened that didn't, or what could be true that isn't, and then to plan actions to make it so). The paper is about the second approach: deep learning methods for constructing models from data and learning to use them via reinforcement learning (RL), and it compares such approaches to human mental simulation.

RL is about

These are graphs from the paper. In the

Some examples of human mental simulation (physical prediction, mental rotation, theory of mind, human reinforcement learning) can be seen in the right-hand graph. Even though they are different, all of them can be represented by a POMDP. I'll explain (c), theory of mind. Theory of mind is the ability to attribute mental states (beliefs, intents, desires, emotions, knowledge, etc.) to oneself and to others, and to understand that others have beliefs, desires, intentions, and perspectives that are different from one's own. This example represents the thought process of an agent