Growing up in a Catholic family, I visited a lot of churches wherever we traveled. All of them had a similar pattern: a very tall door and a huge ceiling. As a child I often wondered, why all that empty space? Why is the ceiling so far away and the door so tall? Eventually I came to the conclusion that it was for the visitors to experience our proportion in relation to a huge idea such as God.
Later in life I became very interested in the concept of proportion, and I started seeing it everywhere: architecture, design, math, art, biology, music, astronomy, monetary transactions, and poetry. I understood the obsession that the Greeks and other ancient cultures had with it. Learning more about living systems such as animals and plants, I found cases where those systems can detect the size of other things with respect to their own. I wondered: does understanding the idea of proportion depend on having a body?
Proportions and ratios are very powerful ideas, but they can be hard to learn, especially because we mostly teach them in terms of math, which is more formal and less experiential. Children can understand that a smaller cat has the same body proportions as a bigger cat. But it's harder to understand that we can create a Roman temple based on the golden ratio, and that the proportions of that temple are the same as those of a shell. Somewhere between the cats and the temple, there seems to be an understanding gap.
Proportion is also a concept we can study from the perspective of AI: Can AI learn the concept of proportion, and if it can, does it make the same kinds of generalizations and mistakes as humans?
To start answering this question, I tried two very simple ways to train models to recognize proportions.
The first experiment trains a model to recognize a 1:1 proportion between the height and width of a rectangle, by simply distinguishing between squares and non-square rectangles. The second experiment attempts to train the model to recognize proportions between two distinct rectangles.
I wanted to know if rectangles that are closer to squares, because their width and height are not that different (e.g., 3:4), were harder to classify as rectangles. And does rotation matter? Will "standing" rectangles be harder to classify than "lying" ones, even if the difference between their width and height is the same (7:2 vs. 2:7)?
I decided that the easiest way to get data for this experiment was to create it. The data-generating function returns a matrix containing either a square or a rectangle of random size, placed at a random position in the matrix. Example (see image below): a rectangle formed by the number 1, with width 5 and height 8.
The training and validation sets were half squares and half rectangles. Because I used supervised learning, I had to label the data I created: I labeled all the squares as 1 and all the rectangles as 0.
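A minimal sketch of what such a generator could look like. The 28×28 grid size, the size ranges, and the function names (`make_shape`, `make_dataset`) are my own assumptions, not the original code:

```python
import numpy as np

GRID = 28  # assumed matrix size

def make_shape(is_square, rng):
    """Return a GRID x GRID matrix of 0s with a block of 1s drawn in it."""
    if is_square:
        w = h = rng.integers(2, GRID // 2)
    else:
        w = rng.integers(2, GRID // 2)
        h = rng.integers(2, GRID // 2)
        while h == w:                       # reject accidental squares
            h = rng.integers(2, GRID // 2)
    row = rng.integers(0, GRID - h + 1)     # random position in the matrix
    col = rng.integers(0, GRID - w + 1)
    m = np.zeros((GRID, GRID), dtype=np.float32)
    m[row:row + h, col:col + w] = 1.0
    return m

def make_dataset(n, rng=np.random.default_rng(0)):
    """Half squares (label 1), half rectangles (label 0), shuffled."""
    X = np.stack([make_shape(i < n // 2, rng) for i in range(n)])
    y = np.array([1] * (n // 2) + [0] * (n - n // 2), dtype=np.float32)
    idx = rng.permutation(n)
    return X[idx], y[idx]
```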
Creating data this way is easy and convenient because all the information you give a neural network to train on eventually becomes a number (pictures into RGB values, sensor data, etc.). By creating the raw input I was able to stop worrying about the size, color, number, and balance of the data. If I had had to collect this dataset from images on the Web, it would have been much harder to find an equal number of squares and rectangles of specific sizes, colors, image dimensions, etc.
Using Keras, I constructed a neural network and trained it with the data I created. Then I generated another random set of squares and rectangles and asked the model to classify those images as 1 or 0.
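Here is a sketch of what that could look like. The exact architecture (a single small hidden layer) is my guess, not the original one; it reuses `GRID` and `make_dataset` from the sketch above:

```python
import numpy as np
from tensorflow import keras

# Small binary classifier: flatten the matrix, one hidden layer,
# sigmoid output (1 = square, 0 = rectangle).
model = keras.Sequential([
    keras.layers.Input(shape=(GRID, GRID)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X_train, y_train = make_dataset(10_000)
model.fit(X_train, y_train, epochs=10, validation_split=0.2)

# Classify a freshly generated random set, as described in the post.
X_test, y_test = make_dataset(2_000, rng=np.random.default_rng(1))
model.evaluate(X_test, y_test)
```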
I wondered how much this model could generalize. I thought it would be interesting to see if the model could be trained more directly on proportion and less on the specific differences between rectangles and squares. So I created data that has two rectangular shapes. In one half of the data, the shapes have the same proportion between their width and height, i.e. equal ratios (figure below on the left); in the other half of the data, the proportions are different (figure below on the right). Compared to the first model, we are not trying to recognize any 'special' proportion, but rather to distinguish proportional relations between two objects from non-proportional ones.
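A hypothetical sketch of how such pairs could be generated, following the same conventions as before; the size ranges, placement (one shape near the top-left, one near the bottom-right so they never overlap), and labeling (1 = equal ratios) are my own choices:

```python
def make_pair(equal_ratio, rng):
    """Two rectangles in one matrix; equal width:height ratios or not."""
    m = np.zeros((GRID, GRID), dtype=np.float32)
    w1, h1 = rng.integers(2, 7), rng.integers(2, 7)
    if equal_ratio:
        k = rng.integers(2, 4)              # second shape is a scaled copy
        w2, h2 = w1 * k, h1 * k
    else:
        w2, h2 = rng.integers(2, 13), rng.integers(2, 13)
        while w2 * h1 == w1 * h2:           # reject accidental equal ratios
            w2, h2 = rng.integers(2, 13), rng.integers(2, 13)
    m[1:1 + h1, 1:1 + w1] = 1.0                              # top-left
    m[GRID - 1 - h2:GRID - 1, GRID - 1 - w2:GRID - 1] = 1.0  # bottom-right
    return m

def make_pair_dataset(n, rng=np.random.default_rng(0)):
    """Half equal-ratio pairs (label 1), half unequal (label 0), shuffled."""
    X = np.stack([make_pair(i < n // 2, rng) for i in range(n)])
    y = np.array([1] * (n // 2) + [0] * (n - n // 2), dtype=np.float32)
    idx = rng.permutation(n)
    return X[idx], y[idx]
```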
The accuracy of the model was very inconsistent: in ~65% of the cases the classification was correct, meaning just a smidge above chance. This is consistent with the human learning process: even though children have no problem telling squares from rectangles, correctly separating sets of shapes with equal proportions from those with different proportions is much harder.
After computing the confusion matrix, which tells me the number and type of images that were incorrectly classified, it seemed like the non-proportional rectangles were more difficult to classify correctly. Interesting... but much more experimentation is necessary to understand what's going on in this model. Maybe next week I will have some insights :)
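For reference, a minimal sketch of how such a confusion matrix could be computed with scikit-learn, assuming a model trained on `make_pair_dataset` as in the earlier sketches:

```python
from sklearn.metrics import confusion_matrix

# Tabulate misclassifications on a fresh test set for the second experiment.
X_test, y_test = make_pair_dataset(2_000, rng=np.random.default_rng(2))
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted
```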