Gavia Libraria

Stoning Goliath

As soon as the Loon heard about Clarivate/ExLibris’s “MetaDoor” initiative, she knew OCLC would not be pleased. She placed a good many mental quatloos on a lawsuit, which has now materialized.

The Loon’s read of the lawsuit materials so far is “fishing expedition.” If OCLC had proof that WorldCat records had ended up in MetaDoor, it would gleefully have exhibited that proof to the court. It doesn’t. It’s fishing for the slightest shreds of evidence that Clarivate/ExLibris might have wink-wink-nudge-nudge hinted to its MetaDoor beta participants that copying over WorldCat records would be acceptable, or that MetaDoor does not make any effort to keep WorldCat records out. Lacking those, OCLC wants to make extraordinarily clear that WorldCat records just better not end up in MetaDoor.

What an ignominious development in the history of the MARC standard, which was explicitly designed for record sharing. If the Loon thought Skippy the Unconscionably Overpaid Pritch had any scruples whatever… but there, it’s clear he doesn’t after his video response to the cogent, clear ICOLC report on OCLC’s pencil-mustachioed villainy. (The said video has vanished, or the Loon would link thereto.)

The Loon has no particular love for Clarivate or ExLibris, it must be said, no more than for OCLC. A pox on both their houses; if this were merely a matter of Godzilla and Kong slugging it out for filthy lucre, she wouldn’t care who won. It’s not, though. There’s a David eyeing up the OCLC Goliath: libraries themselves, whom OCLC has extorted and hung out to dry for years, if not decades.

The Loon can’t say David doesn’t in some small part deserve Goliath’s ill-treatment. David could have seen the monopoly coming and worked against it, or built a properly communal alternative to it. It is a true pity (and embarrassment) that libraries are so bad at collaborating on matters involving technology; HathiTrust is very nearly the field’s only success in that space. David could have forced Goliath into open licenses for WorldCat data. David could have yelled (librarians are quite good at yelling, actually) for Goliath to stop playing dog in the manger. David sat around herding sheep catalog records and whistling, instead, while Goliath busily enclosed his pastures, started charging him usurious rent for their use, and stole his sheep records without recompense.

Parallels to scholarly communication are rather stark. Publishing is a set of services. So are catalog-record enhancement, correction, and aggregation. Yet both big-pig publishers and OCLC turned a service set into egregious rent-seeking through squatting on the results. (The analogy falls apart a bit when it comes to copyright, admittedly: the copyright status of catalog records is unclear and disputed, and outside Europe OCLC would have no hope of making intellectual-property arguments about WorldCat as a whole or its record enhancements and corrections—which are indisputably real!—in particular.)

All that said, it is likely in David’s best interest for Clarivate/ExLibris to win this specific lawsuit. MetaDoor has been designed, as best the Loon can tell, to be open enough for forkability: CC0 licensing, peer-to-peer record sharing, and so forth. That doesn’t completely remove the possibility of Clarivate/ExLibris pulling a 1990s-Microsoft “embrace, extend, extinguish,” of course—but the point is, OCLC already has a de facto monopoly desperately in need of breaking. If MetaDoor can accomplish that, more power to MetaDoor. Can it? The Loon cannot read judges’ minds, ergo dares not opine.

The Loon also thinks there is a third path here. It is a steep, rocky, difficult path, but a better one than either MetaDoor or WorldCat: ditching MARC altogether for a linked-data cataloging infrastructure. Smartly built, this could supplant and ultimately destroy current-generation catalog recordstores and make the embrace-extend-extinguish strategy far more difficult to re-implement. Let OCLC and Clarivate/ExLibris sue each other into extinction over MARC-based assets inexorably shrinking in value. A pox on both their houses.

Such a replacement infrastructure obviously cannot spring forth like Minerva from the head of Jove. Worse still, it faces significant headwinds, some technical but most organizational. Technical headwinds include:

  • the utter badness and unimplementability of linked-data standards, from RDF through OWL to SPARQL and pray do not mention the excrescence that is RDF/XML in the Loon’s presence—linked data standards were just badly designed and the Loon is entirely done pretending otherwise
  • the sparseness, gappiness, and badness of linked-data tooling, both inside and outside librarianship
  • the genial badness of BIBFRAME’s design, and the grotesque worseness (if that is a word) of its toolsets; competitor RiC-O achieves the signal distinction of an even worse design than BIBFRAME
  • the global multiplicity of linked-data models for bibliographic description—catalogers are, for good or ill, accustomed to One Standard to Rule Them All
  • the near-nonexistence of crosswalking tools from MARC to any of the available linked-data models and back (which is, in fairness, a hard problem; MARC is a disaster, computationally)
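
To make the last headwind concrete, here is a minimal sketch of a one-way crosswalk from a MARC-like record to a linked-data-style description. Everything in it is an assumption for illustration: the record is a toy list of tuples rather than real parsed MARC, the values are made up for the example, the output keys are borrowed from schema.org rather than BIBFRAME, and the function names are hypothetical. Even this tiny case has to strip ISBD punctuation out of the data and pick an ISBN out of a free-text subfield, which hints at why a faithful round trip is hard.

```python
import re

# Toy stand-in for a parsed MARC record: (tag, [(subfield code, value), ...]).
# Illustrative values only; a real record carries indicators, a leader, and more.
record = [
    ("020", [("a", "9780142437247 (pbk.)")]),
    ("100", [("a", "Melville, Herman,"), ("d", "1819-1891.")]),
    ("245", [("a", "Moby-Dick, or, The whale /"), ("c", "Herman Melville.")]),
]


def subfield(rec, tag, code):
    """Return the first value of subfield `code` in the first field `tag`, or None."""
    for field_tag, subfields in rec:
        if field_tag == tag:
            for sub_code, value in subfields:
                if sub_code == code:
                    return value
    return None


def strip_isbd(value):
    """Drop one trailing ISBD punctuation mark. Naive: it would also eat
    the final period of a legitimate abbreviation."""
    return re.sub(r"\s*[/:;,.]\s*$", "", value)


def crosswalk(rec):
    """Map a few MARC fields onto schema.org-style keys (an assumed target model)."""
    out = {"@type": "Book"}
    title = subfield(rec, "245", "a")
    if title:
        out["name"] = strip_isbd(title)
    author = subfield(rec, "100", "a")
    if author:
        out["author"] = {"@type": "Person", "name": strip_isbd(author)}
    isbn = subfield(rec, "020", "a")
    if isbn:
        # 020 $a often carries a qualifier such as "(pbk.)" after the number.
        match = re.match(r"[0-9Xx-]+", isbn)
        if match:
            out["isbn"] = match.group(0)
    return out
```

Going back the other way would mean reinventing the punctuation, the qualifiers, and the indicator values that were thrown away here, which is where most of the difficulty lives.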

The organizational headwinds—beyond the staringly obvious “most libraries and librarians have been all-but-impossible to move off MARC”—are even more complicated. One blockage is NISO, run by a wannabe Skippy the Pritch named Todd Carpenter, which nominally owns the USMARC standard and is adamantly (if quietly) opposed to any move away from it, as that would diminish NISO’s importance (such as it is) to libraries. (The Loon and her Boring Alter Ego are adamantly opposed to NISO’s continued existence for many other reasons not germane to this post, she feels she should say. The BAE is not at all quiet about this, so the Loon need not yodel much on the topic.) The Library of Congress should be a natural leader in the move away from MARC, but largely is not; the Loon cannot prove that NISO is partly or wholly responsible, but based on the BAE’s prior direct experience with NISO, she suspects it to be.

Another headwind lies in the concentrated near-monopoly integrated library system (ILS) market. ExLibris absolutely has the technical and UX chops to architect a linked-data catalog, and the market muscle to impose such an ILS organizationally, but to do so would mean severe depreciation in its investment in MetaDoor, so the Loon can fairly safely say it’s unlikely to happen. Of the few other survivors (there is no other word) in this market, the Loon thinks none of them can pull this off. Even the open-source Evergreen and Koha, which have no reason the Loon can fathom to oppose the notion, don’t have the money or developers to accomplish it.

A steep, rocky, difficult path, without question. What is the way to climb it? Is there a way?

Technically, the Loon sees little to be done presently about the rotten and rotting standards edifice underlying linked data. The Loon would cheerfully serve on a committee to replace the whole putrefying stack with something that actually works: designed with tables rather than triples in mind (this ship has sailed: the real world runs on tables, not directed graphs), in a syntax as essentially human-reader-friendly as Turtle (which is quite good), keeping the URL/URI identifier system (which was a genuine stroke of brilliance), integrating a validation system (including a requirements/constraints language and a spec for data-validation tools based on that language), building a query system that (unlike SPARQL) doesn’t knock servers over dead at the least typo and is not an invitation to DDoS attacks, and so on. (The Loon’s service would likely be quite similar to that of the I’m Bored Girl at the Ig Nobels: repeating “this is unimplementable; fix it” over and over until it’s not.) But that’ll be the day. We shall have to make do with what we have.

Technical possibilities do exist for smoothing the path, however. MarcEdit, were Terry Reese so inclined, could build even more linked-data tooling than it already has. The most useful innovation would be a linked-data retrieval service configurable to pull bibliographic linked data from any and all peer catalogs or other sources, with shareable configuration/translation/ETL files to spread the load of adding new queryable sources. Presently, the Loon believes, MarcEdit only works with the Library of Congress, and only for specific kinds of author and subject URI reconciliation. More is emphatically possible, even now, and what the Loon just proposed would lessen centralized datastores’ stranglehold on cataloging by making them provably replaceable.
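
A rough sketch of what such a configurable retrieval service might look like follows. It is not a description of MarcEdit or any existing tool: the source name, the example.org URL template, the field map, and the function names are all hypothetical. The idea it illustrates is that adding a new queryable source means sharing a small block of configuration, not writing new code.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical shareable source definitions. In practice these could live in
# files that libraries pass around; the URL and field names are invented.
SOURCES = {
    "example-peer": {
        "url_template": "https://catalog.example.org/record/{id}.jsonld",
        # remote key -> local field name
        "field_map": {"name": "title", "author": "creators", "isbn": "isbn"},
    },
}


def http_fetch(url):
    """Fetch a URL and parse the body as JSON, asking for JSON-LD."""
    request = Request(url, headers={"Accept": "application/ld+json"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def retrieve(source_name, record_id, sources=SOURCES, fetch=http_fetch):
    """Pull one record from a configured source and translate its keys
    into local field names, ignoring anything the field map does not name."""
    config = sources[source_name]
    url = config["url_template"].format(id=quote(record_id, safe=""))
    remote = fetch(url)
    return {
        local: remote[key]
        for key, local in config["field_map"].items()
        if key in remote
    }
```

Because `fetch` is a parameter, the translation step can be exercised without touching the network, and a peer catalog that speaks a different protocol only needs its own fetch function plus a field map.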

With NISO likely stalemating the Library of Congress over BIBFRAME, the Loon would look to European national libraries (those few that aren’t abandoning their linked-data experiments, anyway, which is another severe organizational wound to linked bibliographic data) for sources of bibliographic and authority linked data. Evergreen and Koha could get into the game also, by making linked-data bibliographic-record representations (most likely in JSON-LD, which the Loon dislikes but accepts as the best of available serialization options) available on all catalog pages. This embraces and extends MetaDoor’s notion of peer-to-peer record sharing, more so if they also build configuration/translation/ETL files for the Loon’s posited MarcEdit retrieval system.
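
As a minimal sketch of the catalog-page idea, the snippet below turns a simple record into JSON-LD and wraps it in the script element that web pages conventionally use to carry it. The choice of schema.org’s Book type as the vocabulary is an assumption (BIBFRAME or another model would slot in the same way), and the input record shape and function names are hypothetical, not anything Evergreen or Koha actually exposes.

```python
import json


def record_to_jsonld(record):
    """Serialize a simple record dict (hypothetical shape: uri, title,
    optional authors and isbn) as a schema.org JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": record["uri"],
        "name": record["title"],
        "author": [
            {"@type": "Person", "name": name}
            for name in record.get("authors", [])
        ],
    }
    if "isbn" in record:
        doc["isbn"] = record["isbn"]
    return json.dumps(doc, indent=2, ensure_ascii=False)


def embed_in_page(jsonld):
    """Wrap JSON-LD in a script element for inclusion in a catalog page.
    Escaping "</" keeps record text from closing the element early;
    "\\/" is a valid JSON escape for "/", so the data is unchanged."""
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'
```

A peer catalog, or the retrieval tooling posited above, could then harvest the record straight from the page without any separate export step.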

The above enhancements, if successful, would likely gel into a pragmatic de facto cooperative-cataloging standard set. Once they have, pushing the said enhancements through a lightweight standards group like IETF would be a good idea. Or perhaps the Library of Congress could be prodded to adopt them. (Under no circumstances should NISO be allowed anywhere near them. NISO cannot be trusted.)

On the organizational side, the Loon sees some possibilities in the so-called “collective collection” in academic librarianship. (Yes, she is sensible of the irony in mentioning a construct named and popularized by OCLC Research.) The ripples if, for example, the Big Ten Academic Alliance were to swap collective-collection records among their ILSes in linked data instead of MARC would be substantial. That being a tall ask, a smaller but still-useful one would be a BTAA-specific linked-data aggregation of its collective-collection records, something like a HathiTrust for bibliographic and authority linked data. A working model for consortium-level linked-data exposure and use is important, in the grand scheme of things. LibraryThing is another natural ally here; if their TinyCat were to both consume and produce linked data, that would be a game-changer.

Book and ebook vendors fear and hate MARC (and for good reason). The Loon thinks they could be inveigled into a less-persnickety system of making granular publication data web-available for their customers to snarf up via the Loon’s posited toolset. Indie and self-publishers would also cheerfully climb on board, were their platforms and tools capable of it—a WordPress plugin would go a long way here, as would a browser plugin that massages Amazon listings into a suitable catalog import. Google doesn’t care about books these days, but it seems just barely possible to talk them into turning their developer feed into a suitable catalog input.

In other words—and the Loon is grinning a beaky grin just now, because her BAE said exactly this to a roomful of library linked-data people about a decade ago—make it easy to rely on linked data, easier than it is to rely on MARC, and the library world will shift, from the smallest and poorest libraries upward… and David will at last stone Goliath to death with his linked-data slingshot.