Gavia Libraria

The Law of Conservation of Complexity

When digital-library projects in academic libraries were young, each one was sui generis, tailored to the local staff’s expertise and perception of value. Image collections were doubtless commonest, but here and there libraries specialized in newspapers or maps or TEI or finding aids, and built technology stacks around those specializations.

Along came the institutional repository in 2004 or so. The IR as a technology stack is a touch deceptive: it claims to accept any old thing so long as it’s bits and pixels, but in practice, its ingest and end-user interaction models are heavily optimized for the single-article-PDF use case. (Go look at an image collection in a DSpace repository to see what the Loon means. It’s like putting shoes on a goose.)

Nonetheless, the IR troubled the old digital-library paradigm just by pretending to be a catchall. Libraries realized that they were responsible for looking after all kinds of digital stuff, from institutional records to ETDs, and they weren’t necessarily doing the world’s best job of it.

This is roughly where libraries sit today, and there isn’t a one not looking for a “simple” way to solve the problem. The Loon doesn’t think there is a simple way. She believes in the Law of Conservation of Complexity: when one suppresses complexity in one area, it will only burst out in another.

Quite a few libraries find themselves in the awkward position of having several digital-stuff silos, often differentiated by content type (as with Omeka for images) or by workflow (as with many IRs; few digital-library software stacks make accommodations for ingest from folk outside the library). Complexity here resides in multiple software stacks solving identical (or at least analogous) problems in parallel, as well as in cross-silo discovery (can your library search across all its digital collections?).

For most of the 2000s, this was the best libraries could do. New kind of stuff? New use for stuff? New silo. Developers are embarking upon a new paradigm, however: a single, highly customizable technology stack.

Sounds wonderful, right? Systems administrators are dancing with glee at the thought of only having to install, upgrade, and performance-tune a single stack. Unfortunately, the Law of Conservation of Complexity still holds.

One locus of complexity with these stacks is that they are the result of software bricolage. Whether it’s Fedora Commons or curation microservices underneath (and those are the two commonest choices), single-stack solutions currently pull in all sorts of code (in all sorts of programming languages, even) by the ears. This isn’t much less of a maintenance problem than silos, to be perfectly honest. It would be nice if the stack developers did all the bricolage-wrangling, but realistically, local systems administrators won’t be off the hook.

“Highly customizable,” of course, means “configuration hell” in practice. It also means content and data modeling, skills with which most librarians are not terribly familiar. A good content-model exchange would help, but once again, it won’t eliminate the need to get to grips with local content needs and sort out how to express them in software-actionable forms.

With all that, does the Loon prefer multiple silos? No. No, she doesn’t. Where she goes aground with silos is the solving-problems-in-parallel question. No library should have to sort out how to do file checksumming more than once. No library should have to sort out how to do full backups for more than one software stack. That’s ridiculous—and as unwieldy as bricolage solutions look, they do allow any given problem to be solved once and once only.
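To make "solved once and once only" concrete, here is a minimal sketch of file checksumming written as one shared routine that any collection, whatever its content type, could call. It is an illustration only, not how Fedora Commons or any microservices stack actually does it; the function names (`file_checksum`, `build_manifest`, `changed_files`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of one file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map every file under root (by relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest, algorithm="sha256"):
    """List files recorded in the manifest that are now missing or altered."""
    current = build_manifest(root, algorithm)
    return sorted(
        name for name, recorded in manifest.items()
        if current.get(name) != recorded
    )
```

A library would store the manifest at ingest and rerun `changed_files` on a schedule. The point is not the dozen lines of hashing; it is that images, ETDs, and institutional records all pass through the same dozen lines instead of each silo growing its own.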

Do not, however, forget the Law of Conservation of Complexity. Nothing about any of this is simple.