Almost a year ago, I created a 116 sq ft mosaic of the Taj Mahal out of anime girls. It was built in pieces: 15x12 pages, where each page had 14x12 tiles. In other words, I had to match 30,240 tiles to the best anime girl out of a database of 2,498 images (about 12.11 occurrences per girl on average).
Sadly, when I built it, I only programmed the part that downloaded the images and let someone else’s program do the mosaic building for me. While it turned out great, I missed out on the truly fun part of the puzzle.
You see, mosaics are actually quite a complex array of optimization problems. Given a large input matrix (the image) and a large database of features (your image database), the following problems must be solved to create an optimal output image:
- Find the closest image in your database to match a submatrix (tile) of your input matrix. You want the output image to resemble your input image.
- Minimize the number of repeats for a given classification. If half of your output photo is made of the same photo, it will ruin the aesthetic.
- If repeats must be made, maximize the distance at which they occur from one another. Multiple repeats are bad, but if they are far enough away from each other, they don’t look as bad.
Thus, with these challenges in mind, I have taken it upon myself to find the best way to create a mosaic: one that looks good, isn’t too computationally expensive to create, and makes full use of its resources (our image database).
I figured I’d do this in 3 parts and write about the journey of completing each one. Part I will be a simplistic approach to the problem, primarily focused on solving problem 1 without worrying too much about 2 and 3.
Part II will solve 2 and 3 using more complex (yet still classical) statistical methods. I am currently considering genetic algorithms, but I may find something better. I might also improve on 1 by adding texture dimensions through grayscale.
Finally, Part III will attempt to optimize all 3 tasks at once with a neural network. I figured Part II would be a nice stepping stone toward this, as I would have to create loss functions for the genetic algorithm anyway.
Before I get into how I did Part I, a little background…
In December of 2016, instead of studying for finals, I started an art project. My roommate and I had surplus printing credits, and we thought it would be neat to print out something big and put it on our ceiling. After brainstorming for a while, we decided we wanted to make a mosaic, printed out on 180 different sheets of printer paper. We weren’t sure what we wanted the picture to be, but we thought it would be funny to make it out of thousands of photos of anime girls.
The first 13 iterations were Whoopi Goldberg peering over a mountain. Here are some of them.
This one is different in that it focuses on matching texture, and then re-colors the image to match.
After testing out photos of Whoopi Goldberg, I decided that a better subject would be the Taj Mahal. It had to be tweaked, though, as it didn’t offer a wide enough range of colors.
One of my first attempts was to fit a NearestNeighbors classifier on the average RGB value of each tile and each database image. In this plot, the blue points are DB images and the orange points are tiles. Needless to say (yet I shall), this approach suffered not from the curse of dimensionality but from the lack of it: with only three dimensions, I couldn’t differentiate the gradual transitions of color and texture across tiles.
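To see why three dimensions weren’t enough, here is a small illustration of my own construction (not taken from the original experiment): a flat gray tile and a black-and-white checkerboard tile have exactly the same average RGB value, so a mean-RGB nearest-neighbor search cannot tell them apart, while the per-pixel values used later in this post can. The helper name `mean_rgb` is hypothetical.

```python
import numpy as np

def mean_rgb(tile):
    """Collapse an (h, w, 3) tile down to its 3 average channel values."""
    return tile.reshape(-1, 3).mean(axis=0)

# Two 2x2 tiles: one flat mid-gray, one black/white checkerboard.
gray = np.full((2, 2, 3), 127.5)
checker = np.array([[[0, 0, 0], [255, 255, 255]],
                    [[255, 255, 255], [0, 0, 0]]], dtype=float)

print(mean_rgb(gray))     # [127.5 127.5 127.5]
print(mean_rgb(checker))  # [127.5 127.5 127.5]

# Identical in mean-RGB space, but far apart once every pixel is kept as a feature.
print(np.linalg.norm(gray.ravel() - checker.ravel()))
```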
We first want to load all of the images we downloaded. Here, I put them in a folder named p2, where the files are named 1.jpg, 2.jpg, and so on, all the way to 1312.jpg. As I open each one, I resize it to a 5x5 image and store its array as a tuple inside my images list. Once we have all of the images, we sort them by filename and store them in a pandas DataFrame.
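A minimal sketch of the loading step, assuming Pillow for image I/O and that every filename is a bare integer like `17.jpg`. The function name `load_database` is hypothetical, and this version returns a NumPy array rather than the pandas DataFrame used in the post. Sorting on the integer value matters: a plain string sort would put `10.jpg` before `2.jpg`.

```python
import os
import numpy as np
from PIL import Image

def load_database(folder="p2", size=(5, 5)):
    """Load every .jpg in `folder`, shrink it to `size`, and sort by numeric filename.

    Returns (ids, arrays) where arrays has shape (n_images, 5, 5, 3) for the default size.
    """
    images = []
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".jpg":
            continue  # skip anything that isn't one of the downloaded images
        with Image.open(os.path.join(folder, name)) as img:
            small = img.convert("RGB").resize(size)  # Pillow sizes are (width, height)
        images.append((int(stem), np.asarray(small)))
    # Numeric sort, so "10.jpg" lands after "9.jpg" rather than after "1.jpg".
    images.sort(key=lambda item: item[0])
    ids = [image_id for image_id, _ in images]
    arrays = np.stack([arr for _, arr in images])
    return ids, arrays
```

For the folder described above, `ids, db_images = load_database("p2")` would give a (1312, 5, 5, 3) array.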
Now, we take our list of (5,5,3)-shaped color matrices (a 5x5 grid of pixels, each with 3 color channels) and flatten each of them down to a (75,)-shaped vector, since 5 × 5 × 3 = 75.
In NumPy, this is a single reshape per image, and that’s that.
(Example, credit to Erik Bernhardsson)
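The flattening step can be sketched with NumPy as follows (the function name `flatten_features` is hypothetical). Casting to float first is a deliberate choice: images load as `uint8`, and subtracting `uint8` arrays, as any distance computation does, wraps around instead of going negative.

```python
import numpy as np

def flatten_features(images):
    """Turn an (n, 5, 5, 3) stack of tiny RGB images into an (n, 75) feature matrix."""
    images = np.asarray(images, dtype=np.float32)  # avoid uint8 wrap-around later
    return images.reshape(len(images), -1)         # C order: pixel by pixel, R, G, B

# One 5x5 image whose values are just 0..74, so the ordering is visible.
demo = np.arange(75).reshape(1, 5, 5, 3)
features = flatten_features(demo)
print(features.shape)   # (1, 75)
print(features[0, :6])  # [0. 1. 2. 3. 4. 5.] -> RGB of pixel (0, 0), then pixel (0, 1)
```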
From here, the classification is fairly simple. We have 1312 data points in 75-dimensional space, so we merely need to fit them to our NearestNeighbors model. We set k to 5, so for each tile in our input photo, we get 5 candidate matches.
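The post uses scikit-learn’s `NearestNeighbors` (fit on the database vectors, then queried with `kneighbors`). To make the search itself visible, here is a brute-force NumPy equivalent of what that model computes with Euclidean distance and k = 5. It is a swapped-in sketch rather than the post’s code, and `k_nearest` is a hypothetical name. Like `kneighbors`, it returns distances and indices, closest first.

```python
import numpy as np

def k_nearest(db_features, query_features, k=5):
    """Brute-force k-nearest-neighbor search with Euclidean distance.

    Returns (distances, indices), each of shape (n_queries, k), closest first.
    """
    db = np.asarray(db_features, dtype=np.float64)
    q = np.asarray(query_features, dtype=np.float64)
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (q ** 2).sum(axis=1)[:, None] - 2.0 * q @ db.T + (db ** 2).sum(axis=1)[None, :]
    d2 = np.maximum(d2, 0.0)                          # clip tiny negatives from round-off
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]   # k smallest per row, unordered
    rows = np.arange(len(q))[:, None]
    order = np.argsort(d2[rows, idx], axis=1)         # sort those k by distance
    idx = idx[rows, order]
    return np.sqrt(d2[rows, idx]), idx
```

The distance matrix has one row per tile and one column per database image, so a very large input may need to be queried in batches.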
Before I load the input image, I want to make sure it can be split into 5x5 tiles that I can match to my database images. For a 500x313 image, for example, 500 is divisible by 5 so it can stay, but 313 needs to be resized to 315.
Now we can load and resize the photo.
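A minimal sketch of the sizing and resizing steps, assuming Pillow and that each side is stretched (not cropped or padded) up to the next multiple of 5, as in the 313 to 315 example. Both function names are hypothetical. Note that Pillow reports and accepts sizes as (width, height).

```python
import math
from PIL import Image

def round_up_to_multiple(n, m=5):
    """Smallest multiple of m that is >= n, e.g. 313 -> 315 and 500 -> 500."""
    return math.ceil(n / m) * m

def load_resized(path, tile=5):
    """Open an image and stretch it so both sides are divisible by `tile`."""
    with Image.open(path) as src:
        img = src.convert("RGB")
    width, height = img.size
    return img.resize((round_up_to_multiple(width, tile),
                       round_up_to_multiple(height, tile)))
```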
In order to classify each tile, I need it in the same 75-dimensional form that my DB images are in.
(Example, credit to Erik Bernhardsson)
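One way to do the split, sketched in NumPy under the assumption that the resized photo has been converted to an (H, W, 3) array (for example with `np.asarray`): reshape so each 5x5 block gets its own axes, then flatten each block in the same pixel-by-pixel order as the database vectors. The function name is hypothetical. A 500x315 photo yields 100 × 63 = 6300 tiles.

```python
import numpy as np

def image_to_tile_features(image, tile=5):
    """Split an (H, W, 3) array into tile x tile blocks and flatten each block.

    H and W must already be multiples of `tile`. Returns shape
    (H // tile * W // tile, tile * tile * 3), tiles in row-major order.
    """
    h, w, c = image.shape
    rows, cols = h // tile, w // tile
    # (rows, tile, cols, tile, c) -> (rows, cols, tile, tile, c): one block per (row, col)
    blocks = image.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return blocks.reshape(rows * cols, tile * tile * c).astype(np.float32)
```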
Now I can classify my tile features. Each tile gets 5 potential candidates, along with how far each candidate is from matching it exactly. From this information, I can form a weighted distribution to pick an image from.
Say I have 5 candidates that are all equally far from my point. Then each one has a 1.0/5 = 0.2 chance of being picked. If I generate a random float between 0 and 1, I can say that 0 to 0.2 results in candidate 1, 0.2 to 0.4 results in candidate 2, and so on.
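The post doesn’t say exactly how unequal distances become probabilities, so the sketch below assumes inverse-distance weighting (closer candidates get proportionally more weight); with five equal distances it reduces to the 0.2 each described above. The interval lookup mirrors the 0 to 0.2, 0.2 to 0.4 scheme via a cumulative sum. Function names are hypothetical.

```python
import numpy as np

def candidate_probabilities(distances, eps=1e-9):
    """ASSUMPTION: weight each candidate by 1 / distance, then normalize to sum to 1.

    eps keeps an exact match (distance 0) from dividing by zero.
    """
    inverse = 1.0 / (np.asarray(distances, dtype=float) + eps)
    return inverse / inverse.sum()

def pick_candidate(candidates, distances, r):
    """Map a uniform random float r in [0, 1) onto the cumulative probability intervals."""
    cumulative = np.cumsum(candidate_probabilities(distances))  # e.g. [0.2, 0.4, ..., 1.0]
    index = int(np.searchsorted(cumulative, r, side="right"))
    # Guard against float round-off leaving cumulative[-1] a hair below 1.
    return candidates[min(index, len(candidates) - 1)]

rng = np.random.default_rng()
choice = pick_candidate([11, 42, 7, 300, 5], [3.0, 3.0, 3.0, 3.0, 3.0], rng.random())
```

NumPy can also do the draw in one call with `rng.choice(candidates, p=candidate_probabilities(distances))`; the explicit version just makes the interval idea visible.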
Now that we have a list of what we want to change each tile to, we can re-assemble the photo with our new tiles in place.
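A minimal reassembly sketch in NumPy, assuming each chosen database image has already been loaded at one common output tile size (the 5x5 thumbnails are only for matching, and the post doesn’t state the output tile size) and that the choices are listed in row-major order. The function name is hypothetical; it is simply the inverse of splitting an image into tiles.

```python
import numpy as np

def assemble_mosaic(tile_images, rows, cols):
    """Stitch a row-major list of equally sized (th, tw, 3) images into one picture."""
    tiles = np.asarray(tile_images)                 # (rows * cols, th, tw, 3)
    _, th, tw, c = tiles.shape
    grid = tiles.reshape(rows, cols, th, tw, c)     # index by (tile row, tile col)
    # (rows, cols, th, tw, c) -> (rows, th, cols, tw, c) -> full image
    return grid.swapaxes(1, 2).reshape(rows * th, cols * tw, c)
```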
Thus, our input photo:
As you can see, there are a number of repeats, and they are quite close to each other. Out of the 1312 photos I started with, I believe only around 600 were used in the final image. So there is definitely work to be done in Part II on problems 2 and 3.
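To put numbers on problems 2 and 3 for a finished mosaic, one could count distinct images and measure how close the nearest pair of repeats sits. A minimal sketch, assuming the chosen database indices are kept in a (rows, cols) grid; `repeat_stats` is a hypothetical helper, and distance is measured in tile units.

```python
import numpy as np

def repeat_stats(choice_grid):
    """Summarize repeats in a (rows, cols) grid of chosen database indices.

    Returns (number of distinct images used, smallest distance in tiles between
    two tiles showing the same image, or None if nothing repeats).
    """
    grid = np.asarray(choice_grid)
    ids = np.unique(grid)
    closest = None
    for image_id in ids:
        pos = np.argwhere(grid == image_id).astype(float)  # (count, 2) row/col positions
        if len(pos) < 2:
            continue  # used only once, so no repeat to measure
        diff = pos[:, None, :] - pos[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # ignore each tile's distance to itself
        nearest = float(dist.min())
        if closest is None or nearest < closest:
            closest = nearest
    return len(ids), closest
```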
With that being said, I’m quite proud of how it turned out. I hadn’t done much with matrices before this, and I’m eager to learn more about NumPy’s useful features.
Thanks for reading. I’ll see you in Part II.