As we continue our series on the Art Gallery of South Australia’s (AGSA’s) new website and digital transformation process, we turn our focus in this post to our work on integrating AGSA’s collection data from EMu, one of the world's leading collection management systems.
AGSA stores its collection data in EMu. This includes information about works of art, artists, images, physical locations, taxonomies, cultural and linguistic backgrounds, and much more.
What were we aiming to do?
We were tasked with finding the best way to display data and images from the AGSA collection to the public, which resulted in the recently launched AGSA Collection Search. Through the Collection Search you can browse and search more than 9,800 works the gallery has published, out of a total collection almost five times that size.
Here are some of the principles that guided our design of this system:
- A search interface should allow the public to investigate the collection and discover both works and artists.
- Content editors should be able to link to collection records and reuse collection images when editing pages.
- Changes to collection data should be displayed to the public as soon as possible.
- Gallery staff should be able to use EMu to control whether records are displayed by the website, as well as how much of their content will be displayed.
A note on terminology
EMu is a collection management system, which is often abbreviated to CMS. The software that runs AGSA’s website is a content management system, inconveniently also abbreviated to CMS.
In this article, when we say CMS, we always mean the website content management system.
EMu integration: high-level overview
Integrating a system as complex as EMu into a website CMS involves a large number of moving parts, with multiple systems communicating. As an example, let's step through the process for retrieving an image from EMu and adding it to AGSA’s CMS.
First, an automated process connects to EMu through its API and fetches any new or modified records, including their associated images. The process passes these records through a pipeline made up of multiple steps, one of which processes high-resolution images. At the final stage, the process uploads the newly processed records and images to the CMS.
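The fetch, process and upload flow can be pictured as a chain of steps applied to each record. The sketch below is a minimal illustration, not AGSA’s actual pipeline. It assumes records are plain dicts keyed by EMu column names. The step functions (`strip_empty_fields`, `queue_images`, `make_uploader`) and the media field name `MulMultiMediaRef_tab` are hypothetical stand-ins.

```python
from typing import Callable, Iterable

# A record is treated here as a plain dict of EMu column names to values.
Record = dict
Step = Callable[[Record], Record]


def run_pipeline(records: Iterable[Record], steps: list[Step]) -> list[Record]:
    """Pass every fetched record through each step, in order."""
    results = []
    for record in records:
        for step in steps:
            record = step(record)
        results.append(record)
    return results


def strip_empty_fields(record: Record) -> Record:
    """Hypothetical clean-up step: drop fields that came back empty."""
    return {k: v for k, v in record.items() if v not in ("", None, [])}


def queue_images(record: Record) -> Record:
    """Hypothetical stand-in for the image step: list media IRNs to convert.

    The field name MulMultiMediaRef_tab is an assumption for illustration.
    """
    media = record.get("MulMultiMediaRef_tab", [])
    return {**record, "pending_images": [m["irn"] for m in media]}


def make_uploader(sink: list) -> Step:
    """Hypothetical final step: 'upload' to the CMS by appending to a list."""
    def upload(record: Record) -> Record:
        sink.append(record)
        return record
    return upload
```

Keeping each stage as a separate function makes it easy to add, reorder or test steps (such as the image conversion) in isolation.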
After the CMS has received the uploads, it looks for records that are flagged as ready for public display. The CMS then uses these “public” records to create display-ready data for the CMS’s front end, search engine and admin user interface.
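Here is a minimal sketch of that CMS-side filtering. It assumes each uploaded record carries a yes/no publish flag. The flag name `AdmPublishWebNoPassword`, the catalogue fields `TitMainTitle` and `CreDateCreated`, and the shape of the display dict are all assumptions chosen for illustration.

```python
from typing import Iterable

# Assumed publish flag; which EMu field AGSA actually uses is not stated.
PUBLISH_FIELD = "AdmPublishWebNoPassword"


def is_public(record: dict) -> bool:
    """Treat a record as public only when its flag reads 'yes' (case-insensitive)."""
    return str(record.get(PUBLISH_FIELD, "")).strip().lower() == "yes"


def to_display(record: dict) -> dict:
    """Map raw EMu-style fields to a hypothetical front-end/search shape."""
    return {
        "id": record["irn"],
        "title": record.get("TitMainTitle") or "Untitled",
        "date": record.get("CreDateCreated", ""),
    }


def build_display_data(records: Iterable[dict]) -> list[dict]:
    """Keep only public records and convert each to display-ready form."""
    return [to_display(r) for r in records if is_public(r)]
```

Because the flag is read from the record itself, gallery staff can hide or show a work simply by changing it in EMu.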
Interesting problems we encountered
As with any bespoke implementation, the integration of EMu threw up some challenges. Our team rarely wavers in the face of gnarly technical implementations and integrating systems that don’t normally work together. We like to call them “interesting problems”. Here are some we encountered while working on this integration.
Reading data from EMu, via IMu
EMu provides the IMu module, which exposes a basic, socket-based API that allows EMu’s data to be queried and read. The vendor also provides IMu libraries in some programming languages, as well as technical documentation that covers limited use cases.
The rest of the AGSA web system is written in Python, which isn’t one of the languages IMu provides a library for, so we needed to implement our own library to connect to IMu, run queries and fetch results. IMu’s documentation doesn’t describe its transport mechanisms or messaging protocols, so we “reverse-engineered” them using IMu’s Perl library and Wireshark.
The Perl library let us send messages to IMu, which we captured with Wireshark and examined at the network level. By inspecting the payloads, we could reverse-engineer the API and build a Python library that let us pull data from EMu.
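One small part of any socket-based client is turning a byte stream back into messages. The sketch below assumes that each message is a JSON object written back to back with no length prefix. That framing is our assumption for illustration, not something taken from IMu’s documentation, and the helper names are hypothetical. `json.JSONDecoder.raw_decode` does the parsing, and an incremental UTF-8 decoder handles characters split across reads.

```python
import codecs
import json
import socket
from typing import Iterable, Iterator


def recv_chunks(sock: socket.socket, size: int = 4096) -> Iterator[bytes]:
    """Yield raw byte chunks from a connected socket until the peer closes it."""
    while True:
        data = sock.recv(size)
        if not data:
            return
        yield data


def iter_json_messages(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield complete JSON objects from a stream of byte chunks.

    Assumes (hypothetically) that messages are JSON objects written back to
    back with no length prefix; a message may be split across chunks.
    """
    decoder = json.JSONDecoder()
    utf8 = codecs.getincrementaldecoder("utf-8")()  # copes with split multibyte characters
    buffer = ""
    for chunk in chunks:
        buffer += utf8.decode(chunk)
        while True:
            buffer = buffer.lstrip()  # raw_decode rejects leading whitespace
            if not buffer:
                break
            try:
                message, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete message: wait for more bytes
            yield message
            buffer = buffer[end:]
```

Against a live server you would first send a request, then iterate over `iter_json_messages(recv_chunks(sock))`. A production client would also need a timeout or size cap, since malformed data would otherwise be buffered indefinitely.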
We open sourced this work to assist any other developers encountering similar problems.
Preparing AGSA’s collection data for display
AGSA’s EMu system is maintained behind a virtual private network (VPN) that does not allow direct connections from the outside world. Unfortunately the cloud-hosted CMS is outside the VPN.
To reach EMu, we run the automated process that fetches EMu records on an internal “sync server” inside the gallery’s VPN, which then pushes the records out to the CMS.
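Because only outbound connections can leave the VPN, the sync server starts every transfer. Below is a sketch of building that outbound push with Python’s standard library. The endpoint URL, the token-based `Authorization` header and the `{"records": [...]}` payload shape are hypothetical, since the CMS’s import API isn’t described here.

```python
import json
import urllib.request

# Hypothetical endpoint; the real CMS import API is not described in this post.
CMS_IMPORT_URL = "https://cms.example.org/api/emu-import/"


def build_push_request(records: list, token: str, url: str = CMS_IMPORT_URL) -> urllib.request.Request:
    """Build an outbound HTTPS POST carrying a batch of records to the CMS.

    The connection is opened from inside the VPN, so the CMS never needs to
    reach into the gallery's network.
    """
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Token {token}"},
    )


def push(records: list, token: str, url: str = CMS_IMPORT_URL, timeout: float = 30) -> int:
    """Send the batch and return the HTTP status code."""
    with urllib.request.urlopen(build_push_request(records, token, url), timeout=timeout) as resp:
        return resp.status
```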
Our initial implementation involved some data preparation on the sync server, so that the CMS would receive display-ready data. However, we quickly realised that any change to the publishing or display behaviour would require changes to both the sync server and the CMS. We therefore refactored both systems to maintain an intermediary JSON representation that closely resembles EMu’s structure.
This new representation enabled us to rapidly push data to the CMS, which now acts as the “single source of truth” for our display logic. This simplified architecture allows us to quickly respond to the changing needs of the gallery.
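One way to picture the intermediary representation is that the sync server wraps each raw record in a small envelope and leaves its fields untouched, so all display logic stays in the CMS. This is an assumed shape for illustration. The envelope keys (`module`, `irn`, `synced_at`, `fields`) are hypothetical, and `ecatalogue` is used as an example EMu module name.

```python
import json
from datetime import datetime, timezone


def to_intermediate(module: str, record: dict) -> str:
    """Serialise a raw EMu record into JSON that mirrors EMu's own shape.

    No display decisions are made here: column names and nested tables are
    kept as returned, so publishing rules can change in the CMS alone.
    Record values must already be JSON-serialisable.
    """
    envelope = {
        "module": module,        # e.g. "ecatalogue" (example module name)
        "irn": record["irn"],    # EMu's internal record number
        "synced_at": datetime.now(timezone.utc).isoformat(),
        "fields": record,        # untouched EMu fields
    }
    return json.dumps(envelope, sort_keys=True, ensure_ascii=False)
```

Keeping the sync side this thin is what lets a change to display rules ship as a CMS-only deployment.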
Transforming large multimedia assets
The gallery uses high-resolution TIFF images for works of art, events, artists and more. These images are often measured in the hundreds of megabytes and hence are not directly viable for use on the web.
To display these images without visible degradation, we built an asset pipeline optimised for file size. The pipeline consumes the source TIFFs and outputs full-sized, colour-corrected JPEGs that are typically 1-5% of the size of the source.
Despite the heavy JPEG compression, these files still contain the entire source image at full resolution, and they are immediately ready for use on the web. We will be writing in more detail about this in an upcoming post.
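Here is a rough sketch of the TIFF-to-JPEG step using the Pillow library. It assumes 8-bit RGB or CMYK source files that may carry an embedded ICC colour profile. Converting to sRGB is one common reading of “colour-corrected”, not necessarily what AGSA’s pipeline does. The quality setting of 85 is an arbitrary placeholder.

```python
import io

from PIL import Image, ImageCms

# Placeholder quality; the right value depends on the collection's images.
JPEG_QUALITY = 85


def tiff_to_web_jpeg(tiff_bytes: bytes, quality: int = JPEG_QUALITY) -> bytes:
    """Convert a source TIFF to a full-size sRGB JPEG (sketch)."""
    with Image.open(io.BytesIO(tiff_bytes)) as im:
        icc = im.info.get("icc_profile")
        if icc:
            # Convert from the embedded profile to sRGB for consistent browser colour.
            src_profile = ImageCms.ImageCmsProfile(io.BytesIO(icc))
            srgb = ImageCms.createProfile("sRGB")
            rgb = ImageCms.profileToProfile(im, src_profile, srgb, outputMode="RGB")
        else:
            rgb = im.convert("RGB")
        out = io.BytesIO()
        # Pixel dimensions are unchanged; only the encoding differs.
        rgb.save(out, format="JPEG", quality=quality, optimize=True, progressive=True)
        return out.getvalue()
```

Very large scans can exceed Pillow’s `Image.MAX_IMAGE_PIXELS` safety limit, which triggers a decompression-bomb warning or error. For trusted in-house sources, that limit can be raised. The resulting file size depends heavily on image content and the quality setting, so no particular parameters are guaranteed to land in the 1-5% range.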
More to follow: image editing, zoom, CRM integration
Look out for our upcoming posts on image editing and the other integrations we undertook – including a bespoke CRM integration – as part of AGSA’s new site launch. In the meantime, check out AGSA's Online Collection Search, and stay tuned for our next instalments by signing up to the IC’s newsletter or following us on Twitter.