Gavia Libraria

Can linked library data disrupt OCLC? Part one.


Conflict has been simmering for a couple of years now over OCLC’s attitude toward the bibliographic records contributed to it by libraries and their catalogers. Certain elements within OCLC consider WorldCat and certain of its associated services to be a cash cow, and they seem to believe that the best way to protect that cash cow is to (insofar as possible) disallow use of records contributed to and taken from OCLC beyond the confines of OCLC and its paying customers’ own ILSes.

This dog-in-manger behavior has not gone unchallenged. The Open Library got a bit of a boost from OCLC’s lawyers’ shenanigans, and OCLC member the University of Michigan released its locally-created MARC recordset under a CC0 license, the subtext being a dare to OCLC to do anything about it. OCLC didn’t.

The Loon is not a lawyer, but in her best non-lawyerly estimation, OCLC couldn’t. The copyright status of bibliographic records is unclear and untested in court, but considerable portions of such records (certainly anything directly transcribed from an item in hand) are probably “facts” that in the US cannot be copyrighted, originality of expression being a necessary precondition for copyright protection. The US also disallows sweat-of-the-brow copyright, so OCLC likely cannot protest that the considerable work it does normalizing, cleaning up, and deduping records earns it copyright in those records. Subject-heading assignment and much that comes under the “local practice” rubric might meet the originality standard, but OCLC does not do that work (and often obliterates local practice on import into WorldCat, for that matter), so OCLC can’t very well claim copyright in the resulting records.

OCLC might be able to claim a tenuous compilation copyright, but that wouldn’t much help it police individual record reuse. The situation in Europe is different, owing to sui generis database rights, but OCLC doesn’t seem to be raising any fusses in Europe based on this segment of law.

It just so happens, though, that WorldCat recently lost a potential European customer, the National Library of Sweden, because the terms OCLC offered would have laid unacceptable limitations on reuse of the records by the Library’s partners in and outside Sweden. Did enforceable database rights impact this negotiation? The Loon daren’t opine, but she wonders.

Lacking a clear copyright claim in the US, OCLC has been trying to jam reuse-restricting contracts down the throats of its members. The said members, as well as certain competitors, raised Cain over this, but OCLC has mostly prevailed thus far. (The SkyRiver lawsuit is ongoing, but seems stalled on a motion to dismiss. The Loon can’t guess whether this means settlement talks are happening in smoke-filled rooms.)


To what extent does a linked-data bibliographic universe, as currently posited by the US Library of Congress and others (since other national libraries and library consortia have been busily endorsing the LoC’s approach on the BIBFRAME mailing list), disrupt OCLC’s dog-in-manger business model surrounding WorldCat records?

What linked-data–based technological structures and processes would need to exist to supplant the MARC-related jobs OCLC currently does for libraries? What technologies become possible in a linked-data universe that actually improve upon what OCLC currently provides?

What will happen to OCLC and WorldCat in a linked-data bibliographic universe? What should?

Incipit discursus

Recall that linked data is based on individual statements (“triples” in RDF parlance), not anything that catalogers would know as a record. This unbinding (so to speak) means that bits and pieces of, say, a web page displayed to a patron about a book in a given library could come from several disparate sources: the publisher, library catalogers inside or outside that library, a third-party triplestore, whatever.
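A minimal sketch of that unbinding, in plain Python rather than a real RDF library. The book identifier, the source names, and every value below are hypothetical, invented for illustration; only the Dublin Core Terms namespace is real. Each source contributes loose statements, and the patron-facing page takes each piece from the first preferred source that supplies it.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


DCT = "http://purl.org/dc/terms/"   # Dublin Core Terms namespace
BOOK = "http://example.org/book/1"  # hypothetical identifier for one book

# Hypothetical sources, each holding a few statements about the same book.
SOURCES = {
    "publisher": [
        Triple(BOOK, DCT + "title", "An Example Title"),
    ],
    "local_catalogers": [
        Triple(BOOK, DCT + "subject", "Example subject heading"),
    ],
    "third_party_triplestore": [
        Triple(BOOK, DCT + "title", "An example title"),
        Triple(BOOK, DCT + "creator", "http://example.org/person/42"),
    ],
}


def assemble(subject, sources, preference):
    """Collect one value per predicate for `subject`.

    Sources are consulted in `preference` order; the first source that
    supplies a predicate wins, and its name is kept alongside the value.
    (Simplification: real data may carry several values per predicate.)
    """
    display = {}
    for name in preference:
        for triple in sources.get(name, []):
            if triple.subject == subject and triple.predicate not in display:
                display[triple.predicate] = (triple.obj, name)
    return display


page = assemble(
    BOOK,
    SOURCES,
    ["publisher", "local_catalogers", "third_party_triplestore"],
)
for predicate, (value, source) in page.items():
    print(f"{predicate}: {value}  [from {source}]")
```

Dropping one source from the preference list simply lets another fill the gap: without `"publisher"`, the title would come from the third-party triplestore instead. No single supplier is indispensable to the page.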

In theory, this is a let-a-thousand-flowers-bloom environment. Can’t get the title of a book from WorldCat? Query it from the Library of Congress, or (heaven forfend) the book’s publisher. (Please don’t hassle the Loon about capitalization. AACR2’s capitalization rules accomplish nothing whatever in an online environment; they may even be actively harmful to catalog-display usability, as they make titles look less like titles to patrons. Patrons, after all, are used to looking at book covers, where titles are in title case!)

Need a URL for an author? As OCLC itself points out, it is no stranger to linked open data; VIAF lives firmly in that world and bids fair to be an invaluable resource, WorldCat or no WorldCat, as the bibliographic linked-data universe assembles itself. (The Loon, incidentally, believes there is a significant disconnect between OCLC’s lawyers’ opinions and behavior and that of its researchers. She has therefore been careful to attribute the dog-in-mangering around WorldCat to the lawyers, not to OCLC as a whole.)

In theory, though, WorldCat could expect to find itself somewhat decentered, both as data supplier and data aggregator, once MARC gives way to RDF as a data carrier. In practice, the Loon doesn’t think it’s that simple. And as this post has grown to considerable length and involution, she’ll save the why-not for another. Do feel free to speculate in the comments.

5 thoughts on “Can linked library data disrupt OCLC? Part one.”

  1. Peter Murray

    I think there is a part of OCLC that realizes the strategy of holding the WorldCat bibliographic database close to its corporate body is not a sustainable strategy. That is why we are seeing them move rapidly up-market to leverage the bibliographic database with acquisition and circulation services on top. From the perspective of the health of the cooperative, I think this is a good thing and I hope we do see a loosening of restrictions on the bibliographic data itself. OCLC Research is gaining quite a bit of experience running linked data systems at scale, so there is hope that somewhere there is a serious effort to turn the WorldCat bibliographic database into a huge linked data graph that is published freely to the world. That would be good for libraries in general and well in tune with its public purpose statement.

    Can’t wait to read the next part of your series.

    1. LibraryLoon Post author

      Yes, but do OCLC’s lawyers realize the strategy is unsustainable?

      If they do and they’re still writing those contracts, why do they still have jobs?

      If they don’t realize it at all… why do they still have jobs?

      (The Loon realizes that she may be traducing lawyers who are under strict orders from OCLC management. The same questions hold for whoever is ultimately responsible for this nonsense.)

  2. Ed Summers

    What a thought-provoking post to start the year! Like Peter, I’m also looking forward to the next installment. As you point out, the barriers to OCLC continuing to fulfill its mission as a de-centered hub among hubs for bibliographic data seem to be legal and not technical. I think WorldCat’s re-use of OCLC identifiers on the Web (e.g. http://www.worldcat.org/oclc/41238513) has set the stage nicely for a deployment of library Linked Data. If only we could have a bit of clearly licensed Microdata, RDFa or externally referenced XML/JSON data sprinkled into those pages we would be well on our way. OCLC needs to be thinking about how to get more people to use these URLs, to link to those pages as an authoritative source for what a book is, much as people link to Wikipedia for topical information.

    A few technical challenges I think are ahead include: 1) using VIAF identifiers to reference authors, subjects, etc. in book metadata; 2) establishing web identifiers for Works that bundle together the various editions, and relating them to works elsewhere on the Web (LibraryThing, OpenLibrary, Freebase, etc.); 3) establishing data update streams so that interested parties can keep their local copies of the data synchronized.

    But much of this is predicated on a cultural shift in how WorldCat data is licensed, so that it is clear it can be reused. The use of the Open Data Commons Attribution license in the recent release of FAST as Linked Data is a really good sign. I’m no lawyer either, but I half wonder what relationship http://www.oclc.org/research/activities/fast/odcby.htm has to http://opendatacommons.org/licenses/by/1.0/ . The licenses have the same name, but are they the same, since the OCLC license requires you to use their URI?

  3. Adrian Pohl

    Yes, a thought-provoking post.

    I have a minor correction which is quite important in this context:
    The University of Michigan library hasn’t published its “entire MARC record set” but only the “bibliographic records that were originally created by University of Michigan Libraries”. Obviously, they didn’t want to test OCLC’s reaction to publishing the whole UMich dataset. (See John Wilkin’s post at the OKFN blog.)

    Regarding linked data and openness, I don’t think that linked data somehow naturally brings openness with it. You are free to publish Linked Data and, once your service is successful and heavily linked to by others, put it behind a paywall later, so that others have to pay to get to your triples. (Think about Google’s move to charge for heavy usage of the Maps API.) That’s why Linked Open Data is so important for a sustainable data ecology. “Open” according to the Open Knowledge Definition does not only mean open licensing of the whole dataset as well as of its parts. It also entails making the whole of your data openly available (via dumps and updates) and not only providing APIs or a SPARQL endpoint. In this way, long-term open access to and re-usability of the data is guaranteed, because anybody is – legally and technically – able to fork a data service if it is taken off the web or put behind a paywall.

    As Ed says, publishing FAST under an open license is a good sign but there is still no dump of the data available…

    1. LibraryLoon Post author

      You have anticipated some of the Loon’s arguments. Thank you for that!

      The Loon will amend her post to reflect your correction; thank you for that as well.