Mosaic

13-01-2012

As is my way, I decided to embark on some intelligent vandalism, only this time, in the digital realm. We have several machines in our lab at Central St Martins and access to these is completely open. There are no student passwords and as such, students here have become lazy and left all their data on the desktop or trash. I decided to write a script to pull off all the data from these machines, find the images and mash them all together in a mosaic.

Final Attempt

At first, I had to grab the images, rename them, resize them and put them in the right sort of place. This is actually a tricky problem because files have spaces, many directory levels and all the rest.

!/bin/bash
c=0
find . -type f -name "*.jpg" -o -name "*.jpeg" 
-o -name "*.JPG" -o -name "*.png" -o -name "*.PNG"
-o -name "*.tif" -o -name "*.tiff" 
-o -name "*.bmp" -o -name "*.BMP" 
-o -name "*.TIFF" -o -name "*.TIF" 
-o -name "*.JPEG" | while read FILE
do
  fn="$(basename "$FILE")"
  echo "$FILE"
  en=${fn*.}
  dn="$(dirname "$FILE")"
  cp -- "$FILE" $1/$c.$en
  convert "$1/$c.$en" -resize 64x64 "$1/$c.jpg"
  let c=c+1
done

This worked more or less. There are probably improvements we can make to this but for now, I had around 300,000 images already! Quite a lot to pick from.

Originally, I wanted to use an off-the-shelf solution. I found MacOSaiX, Mazaika and Mozodojo. The image below is created with Mazaika.

Mazaika first attempt The problem with all of these is they are limited to certain sizes, despite giving good results. I wanted this to be A0 at 300DPI when printed. The problem of course, is that the image sizes are huge. Most programs just can't cope with it. Mazaika is limited in how many you can use unless you buy it (and its too expensive for what it is) and the others can't cope. So, in the end, I decided to write my own.

Using OpenCV I came up with two programs that you can get from my github page. The first goes through the set of images and creates a text file of average RGB values. The second performs the matching based on this database. OpenCV is used throughout to process the images. The result was a program that took a while to run, but wouldn't crash.

first attempt

You can see that although it has worked, it's not brilliant. It matches well but I noticed that some areas were just too similar. I adjusted with a small amount of random jitter in the source colour.

After looking around the web, I finally found OpenMP which is a lovely concept, baked right into GCC. It allows you to take control of many cores on your machine, and create threads easily.

pragma omp parallel for

This one line basically sped up one of the steps by a factor of 7. Essentially, this task, without local neighbourhood refinement, is all parallel and so, OpenMP was a wonderful find and a free win. Great!

There is more to be done for sure - this sort of large scale processing could be useful for video, or maybe transformed into Scala or similar for parallel processing over many machines?