On ebook formats and preservation

When the Loon teaches research-data management to future data-stewardship staff, she leans hard on the notion that stewardship starts well before the data even exist. It’s not something that can be haphazardly tacked on at the end.

An inescapable corollary of this point of view is that not all formats or versions of data are meant to last. An inescapable corollary of that is that the most durable product of a given process may not be the targeted end-product. And this is a notion that all too many librarians and archivists struggle with, unsurprisingly given their typical position at the very tail end of data- and information-production processes.

It’s for this reason that the Loon thinks trying to preserve or can-open old-school end-user ebook formats is the wrong way to go about the preservation of ebook content (which is, she must presume, the aim in view; one doesn’t generally reverse-engineer formats just for the fun of it).

Most of the end-user formats referenced in the above post were distilled out of what is now the .epub standard. Half-hysterical objections thereto aside (the Loon was not especially impressed with that report), most preservationists will be fairly pleased at what they see in even old-school pre-.epub ebook packages: XHTML, CSS, JPEG and PNG, odd bits of XML. Eminently preservable materials, in other words—if they’re got at before they’re DRMed and distilled into binary worthlessness.

The notion that the end-user formats were essentially throwaway but the production format from which end-user formats were to be distilled should be both archivable and revisable/migratable was, in fact, exactly what drove the committee that originally designed what is now .epub. (The Loon can say this with some authority because she was there at the time.) There would always be new end-user formats, the committee conceded. To keep ebook production from bankrupting publishers, said the committee, those new formats should be trivially distillable from the same old archival files! Oh, and revisions for a new edition? No problem; the same old archival files are easily updated and re-distillable.

So it’s the archival files digital preservationists should be targeting, those .html and .css and .jpg and .png and .opf files. The end-user formats are transitory. They were meant to be! Back in the old days, there was occasional talk of archival escrow for ebooks, much as national libraries achieve print escrow now. The Loon wishes it had happened; that’s the correct approach. Perhaps the DPLA could take up that drumbeat?
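
For the practically minded, here is a minimal sketch of what checking a package for those archival file types might look like. It assumes the package is a ZIP container (as .epub is) and simply tallies members by extension. The names inventory_package, build_demo_package, and ARCHIVAL_EXTENSIONS are hypothetical, invented for illustration, and the extension list is an assumption built from the file types named in this post, not an authoritative registry.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Extensions named in the post as archival material. .xhtml, .jpeg and .xml
# are added as an assumption, since the post also mentions XHTML, JPEG and XML.
ARCHIVAL_EXTENSIONS = {
    ".html", ".xhtml", ".css", ".jpg", ".jpeg", ".png", ".opf", ".xml",
}


def inventory_package(source):
    """Tally the members of a ZIP-based ebook package by file extension.

    `source` may be a path or a file-like object.
    Returns (archival, other): two Counters keyed by lowercased extension.
    """
    archival, other = Counter(), Counter()
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            ext = PurePosixPath(name).suffix.lower()
            bucket = archival if ext in ARCHIVAL_EXTENSIONS else other
            bucket[ext] += 1
    return archival, other


def build_demo_package():
    """Build a tiny in-memory package so the sketch runs without a real ebook."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/epub+zip")
        package.writestr("OEBPS/content.opf", "<package/>")
        package.writestr("OEBPS/chapter1.html", "<html/>")
        package.writestr("OEBPS/style.css", "body {}")
        package.writestr("OEBPS/cover.jpg", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    archival, other = inventory_package(build_demo_package())
    print("archival:", dict(archival))
    print("other:", dict(other))
```

A package whose members fall almost entirely into the archival tally is the kind worth escrowing as-is; a DRMed binary end-user file will not open as a ZIP at all, which is rather the point.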

Can-opening RocketBook files and .lits and whathaveyou isn’t wholly useless; publishers were and are rubbish at understanding and implementing digital preservation, so for a few ebooks these end-user formats are all we have. (Historical reasons obtain as well: the death of some early ebook conversion firms left a litter of IP- and bankruptcy-encumbered archival files in a limbo from which most never emerged.)

As the only ebook-preservation strategy, however, it’s shortsighted, techno-messianic, and rather absurdist. The parable of the man seeking his lost wallet under the streetlamp despite having lost it in the dark applies. The archival-quality files aren’t the ones that were released to the public’s ebook readers!

On ebook formats and preservation by Library Loon, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 3.0 United States License.
