I attended Putting the Link in Linked Data, the second webinar in the From MARC to BIBFRAME series, offered by the Association for Library Collections & Technical Services (ALCTS). Each webinar in the series builds on the one before it, so there will be several more articles about the series, which runs through November 2016.
The presenter, Nancy Lorimer (Stanford University), began by talking about where we are right now with our bibliographic data. You may have heard the statement that moving our library data towards linked data will make it more accessible on the web. This can be a confusing statement, because a Google search for the novel All the Light We Cannot See already returns results from the Stanford library catalog. However, this is one webpage (the Google results page) linking us to another webpage (a record from the Stanford discovery layer). Google’s algorithm simply found a match for the text string typed into the search box. In this example, documents are linked together through text strings, but linked data is about linking data elements through identifiers in the form of URIs (Uniform Resource Identifiers). If you type a text string into a Google search box, a machine doesn’t necessarily know that it is searching for a title. With linked data, that information is encoded in the data itself, so a machine would know it was searching on a title.
Next we looked at the MARC record for the book All the Light We Cannot See. Many of the values in the MARC data use a controlled vocabulary, with a subfield 2 (ǂ2) to show which vocabulary is used, as in this geographic subject field, where ǂ2 fast shows that the term is a FAST subject heading:
651 7 France ǂz Saint-Malo. ǂ2 fast ǂ0 (OCoLC)fst01213320
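To make the structure of that field a little more concrete, here is a minimal Python sketch that splits the displayed field into its tag, indicator, and subfields. The function name `parse_marc_field` is my own, and the sketch assumes the display convention shown above (ǂ as the subfield delimiter, an unlabeled leading ǂa, and the indicators shown as a single token). It is not a general MARC parser.

```python
def parse_marc_field(line, delimiter="ǂ"):
    """Split a displayed MARC field into tag, indicators, and subfields.

    Assumes the text before the first delimiter is "TAG INDICATORS VALUE",
    where VALUE is the implicit subfield a.
    """
    head, *chunks = line.split(delimiter)
    tag, indicators, first_value = head.split(None, 2)
    subfields = [("a", first_value.strip())]
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        # The first character is the subfield code, the rest is its value.
        subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


field = parse_marc_field("651 7 France ǂz Saint-Malo. ǂ2 fast ǂ0 (OCoLC)fst01213320")
codes = dict(field["subfields"])
print(field["tag"], "uses vocabulary:", codes["2"])
print("control number:", codes["0"])
```

Even after parsing, the value in ǂ0 is still just a text string. Nothing in it tells a machine where to go to look it up, which is the gap that URIs are meant to fill.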
The MARC record we looked at also contains many identifiers, such as multiple ISBNs, the LCCN, and control numbers that link to authority records for terms used in the MARC subject fields. There is also some information that could use controlled vocabularies, but it is hidden in notes within the MARC record. A human could read through the MARC record and see the relationships between all the data and the resource All the Light We Cannot See. Despite all the identifiers in this MARC record, however, those relationships are not transparent to a computer.
So what does linked data offer that MARC does not? In MARC, data elements are only linked together within the context of the record in a larger database, identifiers are text strings, and relationships are expressed in free-text notes or in controlled text strings that require a human to interpret them. With linked data, elements are linked together as independent statements that are not tied to a record format, identifiers are URIs and are machine-actionable, and relationships are also machine-actionable, expressed as triples using URIs.
So how do URIs and linked data fit together? The presenter covered why URIs are important, starting with the four rules that govern the basic structure of linked data:
- Use URIs as names for things.
For example, following this rule, an author, Doerr, Anthony, 1973- would be expressed not as a text string, but as a URI such as this: http://viaf.org/viaf/79550093
- Use HTTP URIs so that people can look up those names.
For example, a URI for a web document is a URL, such as the following URI that takes a user to the Wikipedia page for the novel All the Light We Cannot See: https://en.wikipedia.org/wiki/All_the_Light_We_Cannot_See
A URI for a thing (like an author) could look something like this: http://viaf.org/viaf/79550093. This takes a user to the entry for the author Anthony Doerr in the Virtual International Authority File.
- When someone looks up a URI, provide useful information, using the standards (SPARQL, RDF).
Your URI should link to something a person could read, like a webpage. For example, following a URI to the music artist Leonard Cohen brings me to the MusicBrainz page for Leonard Cohen, and there’s a lot of other useful information on that page for the user to browse like a list of albums produced and biographical information about the artist.
- Include links to other URIs, so that they can discover more things.
For example, Leonard Cohen’s authority record in the Virtual International Authority File (VIAF) has additional URIs that take a user to IMDB, Spotify, AllMusic, etc.
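As a small illustration of the first two rules, the following Python sketch checks whether an identifier is an HTTP URI that could be looked up, as opposed to a plain text string. The helper name `is_http_uri` is my own, and the check is deliberately simple: it only looks at the scheme and host, and does not confirm that anything useful is actually served at the address.

```python
from urllib.parse import urlparse


def is_http_uri(identifier):
    """Return True if the identifier has an http(s) scheme and a host."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Identifiers taken from the examples in this article.
identifiers = [
    "http://viaf.org/viaf/79550093",
    "https://en.wikipedia.org/wiki/All_the_Light_We_Cannot_See",
    "(OCoLC)fst01213320",
    "Doerr, Anthony, 1973-",
]

for identifier in identifiers:
    label = "HTTP URI" if is_http_uri(identifier) else "text string"
    print(f"{label}: {identifier}")
```

The control number and the name heading both fail the check. They identify something to a cataloger, but a machine has no address to follow.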
Where do we find URIs to use in our bibliographic data? There are many hubs in many subject domains on the web. Some of the well-known ones for library names include ISNI (International Standard Name Identifier), the Union List of Artists’ Names (ULAN), and the Virtual International Authority File (VIAF). Other sources of URIs for entities include the Getty Art & Architecture Thesaurus (AAT), GeoNames, Wikidata, and Library of Congress Subject Headings, to name a few.
Authority Creation vs. URI-based Entity Management
How does this relate to our own authority work and the authority records we create in libraries? Using URIs instead of text strings to uniquely identify entities opens up new possibilities as well as new challenges. Using URIs moves the emphasis from time-intensive authority record creation to the task of disambiguation. At its most basic, this means assigning a URI to an entity (like a name/author) to help disambiguate it from another entity. URIs can be taken from a national source like ISNI or VIAF, or minted locally. Entity management could also mean linking one URI for the thing to a different URI for the same thing (e.g. a locally minted URI is the “same as” an ISNI URI). VIAF already includes many same-as relationships in its data. As mentioned earlier, the VIAF URI that describes Leonard Cohen has links to other minted URIs for the same Leonard Cohen, like the ISNI URI, the DBpedia URI for Leonard Cohen, etc. This means that if I assign the VIAF URI for Leonard Cohen, I’m already connecting a user to several other resources.
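Here is a rough Python sketch of what following “same as” links could look like. Every URI in it is a made-up placeholder under example.org, not a real ISNI, VIAF, or local identifier, and the function name `same_as_cluster` is my own. The idea is simply that once one URI is assigned, all the URIs connected to it through same-as statements come along with it.

```python
from collections import defaultdict


def same_as_cluster(start, same_as_pairs):
    """Collect every URI reachable from `start` through same-as statements."""
    neighbours = defaultdict(set)
    for left, right in same_as_pairs:
        # Same-as is symmetric, so record the link in both directions.
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen, stack = {start}, [start]
    while stack:
        for uri in neighbours[stack.pop()]:
            if uri not in seen:
                seen.add(uri)
                stack.append(uri)
    return seen


# Hypothetical same-as statements; all of these URIs are placeholders.
pairs = [
    ("http://example.org/local/person/1", "http://example.org/isni/person-A"),
    ("http://example.org/isni/person-A", "http://example.org/viaf/person-A"),
    ("http://example.org/local/person/2", "http://example.org/viaf/person-B"),
]

cluster = same_as_cluster("http://example.org/local/person/1", pairs)
for uri in sorted(cluster):
    print(uri)
```

The locally minted URI for the first person ends up connected to both the ISNI-style and the VIAF-style placeholder, while the second person stays in a separate cluster.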
URIs do not sit on their own; they are connected through relationships with other entities. For example, a URI for a person would generally be related to a URI for a particular publication. Often, intervention by humans (catalogers) is still needed to create the correct relationships between entities.
URIs and BIBFRAME
Where does BIBFRAME fit in when it comes to URIs? BIBFRAME is based on RDF, the Resource Description Framework, which is the basic specification used for modeling data as linked data. RDF structures data into triples that include a Subject, a Predicate, and an Object. Let's model the statement "This book has the title All the Light We Cannot See" as an RDF triple.
Subject: This book
Predicate: has the title
Object: All the Light We Cannot See
This is just one triple. We can go a step further and use URIs in the triple like so:
Subject: http://www.worldcat.org/oclc/852226410 (This book)
Predicate: has the title
Object: http://sul.stanford.edu/101010 (All the Light We Cannot See)
In BIBFRAME, we can assign more URIs that reflect the ontology.
This thing is also a text, so we can assign this BIBFRAME URI too: http://id.loc.gov/ontologies/bibframe/Text
Predicate: has the title
We can assign the URI for the BIBFRAME title property: http://id.loc.gov/ontologies/bibframe/title
We can assign this BIBFRAME URI too: http://id.loc.gov/ontologies/bibframe/WorkTitle
I didn't include the full example that the presenter shared, but hopefully you can see how the BIBFRAME ontology is used in RDF, and how URIs are used as well.
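To show roughly how these pieces could sit together, here is a short Python sketch that writes the statements above as triples and prints them one per line in the N-Triples style. The grouping is my own reading of the example, not the presenter's full model: I am assuming that "is a Text" and "is a WorkTitle" are expressed with the standard rdf:type predicate, and that WorkTitle describes the title URI. The helper name `to_ntriples` is also mine.

```python
# rdf:type is the standard RDF predicate for saying a thing belongs to a class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BF = "http://id.loc.gov/ontologies/bibframe/"

book = "http://www.worldcat.org/oclc/852226410"
title = "http://sul.stanford.edu/101010"

# Each triple is (subject, predicate, object), all expressed as URIs.
triples = [
    (book, RDF_TYPE, BF + "Text"),        # this thing is a text
    (book, BF + "title", title),          # this book has the title ...
    (title, RDF_TYPE, BF + "WorkTitle"),  # assumption: the title URI is a WorkTitle
]


def to_ntriples(triples):
    """Serialize URI-only triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Each printed line is an independent statement, which is the point made earlier: the data is no longer tied to a record format.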
Potential Workflows for Inserting URIs in MARC
When we convert MARC into BIBFRAME (or any type of linked data), URIs don’t just magically appear on their own. The presenter shared a few possible workflows for getting URIs into MARC, in preparation for converting to BIBFRAME. There are four basic points at which you might add URIs to MARC records to ready them for conversion to BIBFRAME.
- Receive copies from a source that already supplies URIs (Casalini Libri is one vendor that is ramping up to provide this service). Or for copy cataloging, if you were getting your records through OCLC, there could perhaps be an option to download the records with URIs into your local catalog.
- Get URIs from an enhanced version of your own local authority file. Especially if you don’t want to wait for vendors to do this work, you could enhance your own local authority records with URIs. Stanford has been experimenting with this method in their work with BIBFRAME.
- Add identifiers while cataloging through various local lookups or by minting local URIs. Using this method, ideally you would use a tool or application that could go out and look up the URIs for the entities in your bibliographic records. If you go the route of minting your own local identifiers, you still might want to reconcile these with national identifiers at some future point.
- Bring them in after you convert to BIBFRAME, through an outside service like an authority vendor.
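The lookup-based workflows in this list could be sketched along these lines in Python. This is only an illustration under several assumptions of my own: the field is held as a simple dictionary, the lookup table is a hypothetical local one keyed on the heading string, the heading is built from subfields a and d only, and the URI is written to subfield 0. A real tool would work on actual MARC records and would need much more careful matching.

```python
def add_uri(field, lookup, uri_code="0"):
    """Return a copy of the field with a URI subfield added when the heading matches."""
    subfields = list(field["subfields"])
    # Build the heading from the name (a) and dates (d) subfields.
    heading = " ".join(value for code, value in subfields if code in ("a", "d"))
    uri = lookup.get(heading)
    if uri and (uri_code, uri) not in subfields:
        subfields.append((uri_code, uri))
    return {**field, "subfields": subfields}


# Hypothetical local lookup table; the heading and URI pair comes from this article.
local_lookup = {"Doerr, Anthony, 1973-": "http://viaf.org/viaf/79550093"}

author_field = {
    "tag": "100",
    "subfields": [("a", "Doerr, Anthony,"), ("d", "1973-")],
}

enhanced = add_uri(author_field, local_lookup)
print(enhanced["subfields"])
```

Running the function a second time on the enhanced field leaves it unchanged, so the same batch could be processed again without piling up duplicate URIs.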
The next article will include highlights from the third webinar in the series, "Embedded URI in MARC: an Essential for Linked Data."