(Apologies for the post title, which is a groaner even for the Loon. The Loon simply could not resist it.)
The Loon, as she mentioned, has quite a few reasons to want to see OCLC staked and turned to dust, but let us get one out of the way as quickly as possible. Any organization, for-profit or non-, with the naked audacity to think hiring female furniture is a dandy livener for a professional conference reception needs to be buried and the earth over its grave salted. There is no excuse for that. It was and is sexist, dehumanizing, and wrong.
Enough of that repellent subject. (It’s not as though anything will be done about it at this late date. OCLC ignored social-media outcry at the time.) A little more technobabble-inflected history for you. Now, the Loon and her Boring Alter Ego are not and have never been catalogers (digital-collections metadata is more the BAE’s métier), so it is likely she will miss or mistake some details; she apologizes in advance for such solecisms. In the main, though, she thinks she can explain this pile of guano. Oh, and the Loon should perhaps mention, just for the most transparency possible, that her Boring Alter Ego was paid once to present at an OCLC event (at which, naturally, she bit the hand that was feeding her good and hard), and has been headhunted for positions at OCLC more than once (to the horrified amusement of some of the BAE’s work colleagues). The jobs would have meant immensely more money, to be sure, but the Loon’s conscience straitly forbids, as does her unshakable need to eschew workplaces that hire female furniture.
Shared cataloging did not begin with computers; thanks to early standardization of catalog and catalog-card size (and, it must be said, the early near-monopoly of one small-l loon named Melvil Dui), the Library of Congress began printing and selling catalog cards for US libraries sometime in the earlyish 1900s (the Loon isn’t sure exactly when; 1910 or 1920 maybe?). The efficiency of centralized record creation and card production should be obvious—every single library Library Handing or typing up a card for every single commodity-published book is a fairly ridiculous notion if there’s any other way to do it.
Libraries began the process of computerizing catalog records in the 1960s—likely even earlier, considering. (There exists a recently-revised book about corporate data management that contains the unaccountable sentence “An organization without metadata is like a library without a card catalog.” The Loon just… hasn’t the fight in her to tell its publisher what steaming guano the library side of that analogy is.) The data structure called MARC (for “machine readable cataloging”) designed by the peerless Henriette Avram shaped that transition, and is still (pace encoding changes, loosening record-length restrictions, a few other smallish tweaks, and the scourge that is “local [cataloging] practice”) in use in most US, UK, and Australian library systems today. Variants on MARC and MARC-based cataloging practice exist elsewhere in the world as well.
The Loon cannot bugle loudly enough that MARC was not designed for efficient information retrieval, neither searching nor browsing nor querying nor filtering nor faceting. It’s really quite bad at all that; ask any ILS developer, if you can abide the ensuing swearing. MARC was designed as a source format for mass-producing catalog cards. (The Loon does wonder sometimes what Avram knew about SGML, if anything. She might well have eschewed it for storage inefficiency. Will some enterprising Ph.D candidate in library history kindly get off their tail feathers and write a biography of Avram as their dissertation? It’s long past due.)
Mass production of catalog cards, easier record correction and updating, and sharing the cataloging load were the whole point of MARC. Remember, at this time photocopiers were not a thing, or at least not a thing within reach of most libraries. Unfortunately, libraries were already accustomed to mostly-centralized cataloging, and computers were not common enough or well-enough networked at the time to build a viable peer-to-peer system, so OCLC swept right in to take the Library of Congress’s place as central record provider, swiftly becoming a de facto monopoly across most of the English-language-cataloging world.
It is no mystery how monopolies behave; economists are rather tiresomely repetitive on the subject. We find ourselves today in a situation where purported “non-profit” OCLC pays Skippy the Audacious Pritch over a million and a half a year (per the ICOLC report, which the Loon strongly recommends that you read), and sues everyone in sight, from a hotel with the temerity to paint Dewey numbers on its wall (OCLC also controls DDC) to a potential competitor (SkyRiver; long story the Loon won’t tell except to point out the anti-competitive behavior) to Clarivate/ExLibris. Charming people, OCLC. Just amazingly gracious and collaboration-minded folks, not threatening or self-dealing or absurdly entitled witches at all.
In whatever limited fairness the Loon can muster about all this, she will say that by and large she admires the work of OCLC Research. OCLC didn’t build the unit from scratch, though; it bought Research Libraries Group sometime or other in the late oughts. Even so, a lot of good and worthwhile work has come out of that particular think tank, though the Loon will never understand why they employed the rather awful Jackie Dooley for a time. In fairness to the fairness, OCLC also has something of a track record of abandoning useful projects it can’t work out how to make money from, sometimes ones originating in OCLC Research; the way it dumped the PURL(.org) permalink scheme without a word beforehand to anyone relying on it was simply slipshod.
So OCLC’s main reason for existing is the tooling (“OCLC Connexion”), aggregation, correction, enhancement, sale (via subscription/membership model), and presentation (via WorldCat) of bibliographic and holdings records created by its member libraries. Yes, you read that right—unlike the Library of Congress back in the day, OCLC doesn’t itself catalog anything. Enjoy the parallels with scholarly communication! And yes, this also means that WorldCat’s name is a lie; it is not a truly global union catalog because many libraries, especially those where English is not a first or common national language, are not OCLC members. This is not to say that OCLC does not do real work; like even the laziest, most entitled of the big-pig publishers, it does. It is absolutely to say that OCLC, like the big pigs, holds libraries (collectively) to grossly elevated ransom hugely disproportionate to the actual work it does. The surplus seems to go to Skippy the Overpaid Pritch and lawsuits, mostly… and remember that the said surplus is extracted from libraries.
(There is a rant the Loon may yet rant on how often libraries blunder into this type of exploitation, partly due to inability to initiate, much less maintain, a proper commons. OCLC, big-pig publishers, the ILS, the institutional repository, many kinds of proprietary software, more… but not today.)
The key to this whole system, computationally, is an OCLC-specific identifier for bibliographic records called the “OCLC number.” (If any OCLC numbers turn up in MetaDoor, the Loon thinks Clarivate/ExLibris will be wholly unable to make any case in court that the records containing them didn’t come from OCLC, one way or another.) The OCLC number ties together record creation, record merging (i.e. of records from different catalogers/libraries/vendors describing the same “thing,” and please do not ask the Loon to define “thing” here because she will only weep copious linked identified FRBR-shaped res-and-nomen-flavored tears from her beady red eyes and no one wants that), record corrections and updates, and connections to other library systems and processes such as interlibrary loan. For probably-obvious reasons, this number, though it properly identifies only a bibliographic record, is often used as shorthand for whatever thing (see above about defining “thing”) the record describes.
Rather like—and if the Loon had hands she would be jazz-handsing—a linked-data URI. Like URIs for RDF, the OCLC number is the lynchpin of OCLC’s bibliographic enterprise. Indeed, if not for OCLC playing dragon-on-the-hoard, the OCLC number might have near-seamlessly evolved into the linked-data identifier for the biblioverse, for the objects (not to say “things”) within its purview. As it is, any competing system, especially one with an eye to linked-data friendliness, will have to whomp up a whole new record identifier. The Library of Congress can’t easily step in here; its holdings and recordset aren’t nearly as extensive as OCLC’s. Wikidata, for all its curious and generally delightful boldness, is not a bibliographic-record database, and (from what little the Loon understands about Wikibase) might well not scale to one without the servers falling over dead.
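For the concretely minded, here is a minimal Python sketch of how a shared record identifier can tie together contributions from several libraries. Everything in it is an assumption made for illustration: the identifiers are made up (they are not OCLC numbers), the field names are not any real schema, and the "keep the longest value" merge policy is a deliberately naive stand-in for whatever a real aggregator does. It also skips the genuinely hard part, which is deciding that two records describe the same "thing" in the first place.

```python
from collections import defaultdict

# Hypothetical, heavily simplified records. Real MARC carries far more;
# "record_id" stands in for a shared identifier and is invented here.
contributed = [
    {"record_id": "rec-0001", "source": "Library A",
     "title": "Moby Dick", "author": None},
    {"record_id": "rec-0001", "source": "Library B",
     "title": "Moby-Dick; or, The Whale", "author": "Melville, Herman"},
    {"record_id": "rec-0002", "source": "Library A",
     "title": "Walden", "author": "Thoreau, Henry David"},
]


def merge_by_identifier(records):
    """Group records by shared identifier and build one merged record each."""
    merged = {}
    holdings = defaultdict(list)
    for rec in records:
        rid = rec["record_id"]
        holdings[rid].append(rec["source"])
        master = merged.setdefault(rid, {"record_id": rid})
        for field, value in rec.items():
            if field in ("record_id", "source"):
                continue
            # Naive policy (an assumption): keep the longest non-empty value.
            if value and len(value) > len(master.get(field) or ""):
                master[field] = value
    # The same identifier also keys the holdings list for each record.
    for rid, master in merged.items():
        master["held_by"] = holdings[rid]
    return merged


catalog = merge_by_identifier(contributed)
print(catalog["rec-0001"])
```

The point of the sketch is only that merging, enhancement, and holdings all hang off one key; take the key away and every one of those processes needs a replacement.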
With that for background, what is it that MetaDoor is supposed to be, and how is it supposed to compete? Huge caveat for the ensuing discussion: the Loon has no insider knowledge, and Clarivate/ExLibris is playing its cards close to its chest at present. The Loon is of necessity making some educated guesses here.
Like OCLC, MetaDoor is intended to be a database of bibliographic records contributed by library catalogers. Unlike OCLC, MetaDoor is (to start, anyway) not playing dragon-on-hoard; the records will be open-licensed such that any given record up to the entire database is takeable and forkable. A later enclosure play is quite possible—“records contributed until now are CC0; henceforth, we are pulling an OCLC, such that we own whatever you put in, and you buy it back from us.” There would be outcry, but Clarivate/ExLibris need simply bet that libraries are too foresightless, cheap, and fighty to work out how to fork the database and collectively maintain and add to it—and the Loon must say, that is a very smart bet on C/ExL’s part.
Conspicuously missing from what the Loon has seen about MetaDoor is any mention at all of the sort of record deduplication, correction, and enhancement processes that OCLC routinely performs on contributed records. Bluntly: MetaDoor will be a wild abyss of near-total chaos. The Loon doesn’t think Clarivate/ExLibris (which, after all, builds a major ILS) harbors any delusions that the quality of contributed records will be high, or even uniform. Instead, she suspects that libraries will be gently encouraged toward a sort of peer-to-peer copy-cataloging system, in which catalogers look for libraries that do good work and set up their systems to adopt those libraries’ records. If MetaDoor is thinking toward linked data, another way to approach this would be to start breaking down MARC records into granular datapoints that could be queried to suit, or built up into decent-enough MARC records. (If the FRBRoids had actually known anything about real-world relational database design, which they did not, this breaking-down and reconstitution could have begun two decades ago, but once again, here we are. Some days the Loon just despairs of librarians, or at least librarian standardistas.) The other consequence of this free-for-all is that MetaDoor will not easily, or perhaps at all, be able to build an analogue to WorldCat.
Other developers might, however, if they are willing and able to take on MetaDoor’s chaos and withstand the probably-inevitable lawsuit from OCLC. Other developers might do a lot of things, possibly quite useful and attractive things, with the data in the MetaDoor database. At least to start, Clarivate/ExLibris will be happy to let them! Any win for MetaDoor chips away at OCLC’s de facto monopoly. Beware the day OCLC folds, however; the logical business thing for Clarivate/ExLibris to do then is pull a Twitter, destroying useful APIs or charging through several available orifices for access to them.
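One of those things might be the granular breakdown mentioned above: splitting MARC records into individually queryable datapoints and building them back up on demand. What follows is a rough Python sketch, not anything MetaDoor has announced. The record is a toy structure that omits the leader, indicators, and control fields; the identifier is invented; and `decompose`, `query`, and `rebuild` are hypothetical helper names. The tags themselves (100 for main-entry personal name, 245 for title statement, 650 for topical subject) are ordinary MARC 21.

```python
# A toy MARC-like record: a list of (tag, [(subfield_code, value), ...]).
# Leader, indicators, and control fields are omitted for brevity.
record = {
    "id": "rec-0001",  # made-up identifier
    "fields": [
        ("100", [("a", "Melville, Herman")]),
        ("245", [("a", "Moby-Dick"), ("c", "Herman Melville")]),
        ("650", [("a", "Whaling")]),
        ("650", [("a", "Sea stories")]),
    ],
}


def decompose(rec):
    """Flatten a record into (record_id, field_no, tag, sub_no, code, value) rows."""
    rows = []
    for field_no, (tag, subfields) in enumerate(rec["fields"]):
        for sub_no, (code, value) in enumerate(subfields):
            rows.append((rec["id"], field_no, tag, sub_no, code, value))
    return rows


def query(rows, tag, code):
    """Return every value stored under the given tag and subfield code."""
    return [row[5] for row in rows if row[2] == tag and row[4] == code]


def rebuild(rows):
    """Reassemble the rows of ONE record into the original record shape."""
    fields = {}
    # Sort by field position, then subfield position, to restore order.
    for _rid, field_no, tag, _sub_no, code, value in sorted(
        rows, key=lambda row: (row[1], row[3])
    ):
        fields.setdefault(field_no, (tag, []))[1].append((code, value))
    return {"id": rows[0][0], "fields": [fields[n] for n in sorted(fields)]}


rows = decompose(record)
subjects = query(rows, "650", "a")
roundtrip = rebuild(rows)
print(subjects)
```

Keeping the field and subfield positions in each row is what makes the round trip lossless for repeated fields such as 650; drop them and the rebuilt record can no longer tell which subfields belonged together.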
In the Loon’s trawl through the abovementioned corporate data management book, she learned that there is a business term for the kind of chaotic mess MetaDoor is likely to be: “data lake.” Throw all the data in, forget about quality, just dump it in and see what falls out. Over time, if the data in the lake is at all useful, busy IT beavers will start cleaning up and organizing the data they’re interested in, prodding data creators into better data-quality practices, separating out coherent chunks of data into data marts, building data marts up into data warehouses, and so on. Is Clarivate/ExLibris cynically hoping that library developers will cheerfully fix up MetaDoor’s data lake at no cost to it? Seems likely. Also a decent bet, the Loon thinks—cheerful librarian fixers are a big part of how OCLC became what it is, and ExLibris constantly dumps a ton of uncompensated quality-control, usability-testing, accessibility, assessment, development-strategy, and other work on systems librarians as it is.
If both OCLC and MetaDoor sound like grossly exploitative and unfair systems, well, this is why the Loon wishes a pox on both their houses. Is there a path out?
If the Loon ruled libraryland, she would pull together a bunch of library CIOs (they’re not often called that, but they do exist) and knock their heads together until they agreed to cough up the collective funding and development/systems effort to mirror (well, mirror plus delta) MetaDoor data on a regular basis, and provide value-add services such as APIs at low (including sweat-equity) or no cost. She would then pull together a bunch of library cataloging luminaries and knock their heads together until they agreed to build record pipelines into the CIOs’ record commons alongside whatever other pipelines they have—transparently-documented pipelines that do not invoke the wrath of Skippy the Litigious Pritch and his horde of slavering lawyers.
That way, if MetaDoor tries enclosure, a replacement record commons will already exist (and not have to be built from scratch in a tearing hurry) and the cutover for libraries and their catalogers will be a lot less painful than it would otherwise be. As that commons becomes cleaner and more sophisticated, even if MetaDoor doesn’t try enclosure, it will become an increasingly viable, likely less-expensive alternative to both MetaDoor and OCLC. Virtuous circle!
But the Loon does not rule libraryland, so we shall all have to wait and see.