Lucene searching my collections

Interior of standard brewery Kanowna, showing beer kegs, bottles of beer, men at work bottling beer. Man seated holding dog.

Throughout my time at the WA Museum, one thing that I have always loved is the raw results of Apache Solr when unleashed on a new dataset.

It’s not that there is anything special (or more accurately novel) about a clever / voodoo powered search index that crawls over data and suggests relationships. Far from it.

The power of predictive algorithms, big data, customised user interfaces etc. has made this type of thing pretty standard fare. Also, these tools are a hell of a lot more accessible than they were in the not too distant past. In short, they’re ubiquitous, so we don’t really take note.

However… when you work with new, historic data (I’m aware of the oxymoron there), I think you get an additional appreciation for both the new technology and the importance of these historic resources and collections.

One of the joys we are fortunate enough to get in the museum game is seeing the results when these powerful search tools index effectively lost data. It’s the results that we get when we port hundred-year-old images with their metadata, raw, into Apache Solr. All of a sudden, we are automatically shown relationships between data that perhaps no-one has seen before. Or even thought of before. It’s, for lack of a better word, cool.

So when we* recently imported the Dwyer and Mackay Collection (http://museum.wa.gov.au/online-collections/dwyer-and-mackay), we immediately indexed the collection in Apache Solr. I think the results speak for themselves.

We designed the site so that related images automatically populate as thumbnails underneath the main image you are viewing. These images are selected by the Solr index. Solr has been trained a little bit since the initial release, but roughly you get what the index deems relevant.
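For the curious, a "related images" lookup of this kind could be sketched roughly as below. This is only an illustration, not our actual setup: it assumes Solr's MoreLikeThis handler has been enabled at /mlt, and the core name, field names and image id are all made up for the example.

```python
from urllib.parse import urlencode


def related_images_url(base_url, core, image_id,
                       fields=("title", "description"), rows=6):
    """Build a Solr MoreLikeThis request URL for one image record.

    Assumptions (hypothetical, for illustration only):
    - a MoreLikeThis request handler is registered at /mlt
    - records have a unique `id` field plus the text fields in `fields`
    """
    params = {
        "q": f'id:"{image_id}"',        # the image currently being viewed
        "mlt.fl": ",".join(fields),     # fields to compare for similarity
        "mlt.mintf": 1,                 # minimum term frequency in the source record
        "mlt.mindf": 1,                 # minimum number of records a term must appear in
        "rows": rows,                   # how many related thumbnails to ask for
        "fl": "id,title,score",         # fields to return for each related record
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/mlt?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical server, core and id; prints the URL without contacting Solr.
    print(related_images_url("http://localhost:8983/solr",
                             "collections", "example-image-id"))
```

The function only builds the request URL, so it can be tried without a running Solr instance. Fetching that URL from a suitably configured server would return the records Solr scores as most similar, which is roughly what ends up as the row of thumbnails.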

By starting with an image, particularly of an individual, you can find a related photo. It might be other family members, or perhaps the same person at a different time, or perhaps a similar location. Over a short period of browsing through related images, you can visually start to understand that period of time and start to get a feel for life in the Western Australian Goldfields at the turn of the last century.

So go have a look, and when you click or tap through the various related images, do take a moment to appreciate the value of these historic photographic collections, as well as the coolness of the Apache Lucene engine (and my team).

* "We" is quite a generous term considering my involvement. Full credit goes to Andrew Rowe and Danny Murphy in the digital services team, who actually did the hard work of getting the data from the source and into a portable format for use in our website.