At The IC we spend a lot of time thinking about museum data, because when we build a museum website it has to work seamlessly with the museum's collections and events data. As we've built GLAMkit, we've realised that some widespread issues need general solutions.
One of the particular challenges faced by almost every museum is how to deal consistently with dates. Most of the time, computers are used to dealing with complete, precise, unambiguous dates, such as "August 28, 1979". If everything is unambiguous, it's easy to say whether one date is the same as another, or comes before, or after.
As with many things, life isn't so simple when it comes to museum data…
If your collection is geological, a date like "August 28, 3 million BC" isn't quite right.
If your collection is at all historical, then you're very quickly going to run into uncertainties and approximations.
For example, if you know a vase is from the Edo period, then it was probably made between 1603 and 1868, give or take. You may even be able to narrow it down to a decade, say the 1770s. But getting a complete and unambiguous date is unrealistic at best, misleading at worst.
For database designers, it makes more sense to treat museum dates less like a single point in time, and more like a region of time.
In the collections databases we've worked with, like Axiell EMu and Vernon CMS, it's most common to store a 'display date' string, which can be any text, plus two precise dates that indicate the start and finish of a region of time.
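The display-plus-two-dates pattern can be sketched in a few lines of Python. This is an illustration only, not the schema of any particular collections system: the `CollectionDate` class, its field names, and the sample vase record are all made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CollectionDate:
    """Hypothetical record: free-text display date plus a machine-readable range."""
    display: str
    earliest: Optional[date]  # start of the region of time, if recorded
    latest: Optional[date]    # end of the region of time, if recorded

    def overlaps(self, start: date, end: date) -> bool:
        """True if this region of time intersects the queried range."""
        if self.earliest is None or self.latest is None:
            return False  # no computer-readable dates, so the query can't match
        return self.earliest <= end and self.latest >= start


# Made-up example record: a vase dated to the 1770s.
vase = CollectionDate("Edo period, c. 1770s", date(1770, 1, 1), date(1779, 12, 31))
print(vase.overlaps(date(1700, 1, 1), date(1799, 12, 31)))  # True
```

Note that the display string plays no part in the query: only the two precise dates are usable by the computer.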
Occasionally, there's no computer-readable date information, which leaves a museum with no way to query its collection by date!
But even when there is, this approach still presents several problems –
- The 'display date' string isn't consistent. In theory one institution might be consistent in how it writes dates, but there are few standards from one institution to another. As a result, there's not much computers can do with it.
- One institution may prefer "c. August 1969",
- another may prefer "Aug 1969 (approx)";
- let alone something like "Printed c. 1865; reprinted 1869-70".
Extended Date/Time Format
This is where EDTF (Extended Date/Time Format) comes in. It's a format specified by the Library of Congress that gives us a way to distinguish between these nuances, in a way that computers can make sense of.
For example, EDTF lets us specify things like:
- "approximately August 1984" (In EDTF: "1984-08~"), or
- "a day in August 1984" ("1984-08-uu").
All normal ISO 8601 dates are valid EDTF dates too (e.g. "1984-08-28").
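To show how little machinery the simple cases need, here is a rough sketch that picks apart the kinds of strings shown above. It is not the EDTF library's parser: the `describe` function and its regular expression are assumptions made for illustration, and they only cover plain year, year-month and year-month-day strings with optional `u`, `?` and `~` markers (seasons, intervals and sets are not handled).

```python
import re

# Year, optional month, optional day; 'u' marks an unspecified digit.
# An optional trailing '?' (uncertain) and/or '~' (approximate) may follow.
PATTERN = re.compile(
    r"^(\d{2}[\du]{2})(?:-([\du]{2}))?(?:-([\du]{2}))?([?~]{0,2})$"
)


def describe(edtf: str) -> dict:
    """Report the parts and qualifiers of a simple EDTF date string."""
    match = PATTERN.match(edtf)
    if match is None:
        raise ValueError(f"not handled by this sketch: {edtf!r}")
    year, month, day, flags = match.groups()
    return {
        "year": year,
        "month": month,
        "day": day,
        "approximate": "~" in flags,
        "uncertain": "?" in flags,
        "unspecified": "u" in (year + (month or "") + (day or "")),
    }


print(describe("1984-08~")["approximate"])    # True
print(describe("1984-08-uu")["unspecified"])  # True
```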
There are several levels of EDTF. At the deep end, it gets complex. The EDTF text
2004-06-(01)~/2004-06-(20)~
… means "An interval in June 2004 beginning approximately the first and ending approximately the 20th".
[1760-01, 1760-02, 1760-12..]
… means "January or February of 1760 or December 1760, or some later month".
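As a rough illustration of the "one of a set" notation, the sketch below splits such a string into its members and notes whether it is open-ended. The `parse_one_of` helper is hypothetical and deliberately minimal: it ignores other things the spec allows inside a set, such as ranges between members or a leading `..`.

```python
def parse_one_of(text: str):
    """Split an EDTF 'one of' set into its members.

    Returns (members, open_ended), where open_ended is True when the last
    member is followed by '..', meaning "or some later date".
    """
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a set wrapped in square brackets")
    members = [part.strip() for part in text[1:-1].split(",")]
    open_ended = members[-1].endswith("..")
    if open_ended:
        members[-1] = members[-1][:-2]  # drop the trailing '..'
    return members, open_ended


print(parse_one_of("[1760-01, 1760-02, 1760-12..]"))
# (['1760-01', '1760-02', '1760-12'], True)
```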
We've implemented the EDTF specification in Python, and have just released a new 2.0 version that for the first time covers the entire spec (you can get it here), and includes a Django model field for storing EDTF values in the database.
The EDTF library means we can take a date like "1969-12" (December 1969) and store it in a database, and derive a range of time from it (e.g. 1st-31st December). That means it becomes easy to sort and filter collections that have imprecise or approximate dates. It gives us an incredibly powerful way to deal with time in the way museums think about it.
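The idea of deriving a range from an imprecise date can be sketched with the standard library alone. This is a simplified stand-in, not the EDTF library's implementation: the `date_range` function is hypothetical and only understands plain `YYYY`, `YYYY-MM` and `YYYY-MM-DD` strings.

```python
import calendar
from datetime import date


def date_range(edtf: str):
    """Return (earliest, latest) for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in edtf.split("-")]
    year = parts[0]
    if len(parts) == 1:
        return date(year, 1, 1), date(year, 12, 31)
    month = parts[1]
    if len(parts) == 2:
        # monthrange returns (weekday of first day, number of days in month)
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, parts[2])
    return day, day


print(date_range("1969-12"))
# (datetime.date(1969, 12, 1), datetime.date(1969, 12, 31))

works = ["1969-12", "1969", "1969-07-20"]
# Sort by earliest possible date, then by latest possible date.
print(sorted(works, key=date_range))
# ['1969', '1969-07-20', '1969-12']
```

Once every record has an earliest and latest date, sorting and filtering reduce to ordinary comparisons on those two values.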
That leaves us with one problem: that no-one knows how to write EDTF, or if they do, they almost certainly aren't using it in their collections data. Instead, they're probably using the display-plus-2-dates approach in their collections management system. We need a way to derive the EDTF text from a plain English description of a date.
So we also made an EDTF natural language parser. It converts real-world display dates that we found in collection data into EDTF form.
Here are some examples – natural language on the left, EDTF on the right after our parser has done its work. First some basic examples, then some more complex cases:
'January 12, 1940' => '1940-01-12'
'90' => '1990' #implied century
'January 2008' => '2008-01'
Uncertain and Approximate Dates
'1860?' => '1860?'
'1862 (uncertain)' => '1862?'
'circa Feb 1812' => '1812-02~'
'ca.1860' => '1860~'
'approx 1860' => '1860~'
'1860s' => '186x'
'Summer 1872' => '1872-22'
'earlier than 1928' => 'unknown/1928'
'later than 1928' => '1928/unknown'
'before January 1928' => 'unknown/1928-01'
'after about the 1920s' => '192x~/unknown'

'year in the 1860s' => '186u'
'month in 1872' => '1872-uu'
'day in January 1872' => '1872-01-uu'
'day in 1872' => '1872-uu-uu'

'1st century' => '00xx'
'10c' => '09xx'
'19th century?' => '18xx?'
Just showing off now...
'a day in about Spring 1849?' => '1849-21-uu?~'
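To give a feel for how rules like these can work, here is a toy converter that handles a handful of the simpler patterns above with regular expressions. It is emphatically not the library's parser, which covers far more cases: the `toy_text_to_edtf` function and its rules are assumptions made for this sketch, and it returns `None` for anything it doesn't recognise.

```python
import re
from typing import Optional

MONTHS = {name: f"{i:02d}" for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def toy_text_to_edtf(text: str) -> Optional[str]:
    """Convert a few simple display-date patterns to EDTF, else return None."""
    s = text.strip().lower()
    suffix = ""
    # Leading approximation markers become a trailing '~'.
    approx = re.match(r"^(circa|ca\.?|c\.|approx\.?|about)(?=[\s\d])\s*", s)
    if approx:
        s, suffix = s[approx.end():], "~"
    # Trailing uncertainty markers become a '?'.
    uncertain = re.search(r"\s*(\?|\(uncertain\))$", s)
    if uncertain:
        s, suffix = s[:uncertain.start()], "?" + suffix
    # Decade, e.g. '1860s' -> '186x'.
    m = re.fullmatch(r"(\d{3})0s", s)
    if m:
        return m.group(1) + "x" + suffix
    # Month and year, e.g. 'feb 1812' -> '1812-02'.
    m = re.fullmatch(r"([a-z]{3})[a-z]*\.?\s+(\d{4})", s)
    if m and m.group(1) in MONTHS:
        return f"{m.group(2)}-{MONTHS[m.group(1)]}" + suffix
    # Bare four-digit year.
    if re.fullmatch(r"\d{4}", s):
        return s + suffix
    return None


for text in ["circa Feb 1812", "ca.1860", "1862 (uncertain)", "1860s"]:
    print(repr(text), "=>", repr(toy_text_to_edtf(text)))
```

A real parser also has to cope with seasons, centuries, open-ended ranges and combinations of all of these, which is where a dedicated library earns its keep.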
See it in action
The SFMOMA collection uses our EDTF library behind the scenes to query and sort works by date.
You can see EDTF 2.0 in GLAMkit's collections models (install GLAMkit from here). GLAMkit uses the EDTF library to handle its museum collections, but the EDTF library isn't GLAMkit- or even Django-specific. Any Python project can use it.
If you're a Python coder, you can install it with –
pip install edtf
If you want to start using EDTF in your collections data, we can work with you to automatically create EDTF dates from your existing date information.
Get in touch to start a conversation.