I hosted the NISO virtual conference Open Data Projects on June 13, 2018. A wide variety of open data projects were discussed during the packed agenda. What follows is a summary of only the first two presentations. If you are interested in open data projects, I encourage you to view the recording of the conference through Minitex My Library when it becomes available.
Shout it Out: LOUD (Linked Open Usable Data)
Robert Sanderson from the J. Paul Getty Trust kicked off the conference with Shout it Out: LOUD (Linked Open Usable Data). Showing a slide with the 5 stars of linked open data, he remarked that these steps are all about publishing standards and not about consuming (using) linked data. If we really want a return on our investment in publishing data openly, we need to think about how the data is going to be used. We should be thinking not just about Linked Open Data (LOD), but about Linked Open Usable Data (LOUD). Libraries, archives, and other organizations working on linked data projects should think about how their data will be consumed and build that into their projects from the start.
Who is the audience for linked open data? The end user is likely the researcher, but the direct audience is the developer, and developers are the ones who determine whether the data is usable. The five-star principles of Linked Open Usable Data are:
- The right abstraction for the audience. Think of a car: the steering wheel lets the driver operate the car, but the mechanic needs access under the hood to do their job.
- Few barriers to entry.
- Comprehensible by introspection. We expect developers to read documentation about how to get their API to work with our data, but the data should be understandable directly, without developers having to read the entire ontology created by the publishers of the linked open data.
- Documentation with working examples. The ontology needs to be up to date and easy to access.
- Few exceptions, many consistent patterns. Sanderson illustrated this point with a work by M.C. Escher.
Open Data in Special Collections Libraries; or, How Can We Be Better Than Data Brokers?
The next presentation, by Scott Ziegler of Louisiana State University, was titled Open Data in Special Collections Libraries; or, How Can We Be Better Than Data Brokers? As digitization becomes more common in archives and special collections, new interactions with the materials become possible (mapping, text analysis, visualizations), but there are also risks, such as harmful misrepresentation through the open data we provide.
Louisiana State University recently published a suite of applications for viewing historical prison data from the 1800s. They did not open up everything for this project: some of the data contains culturally sensitive information that represents groups in racist, sexist, and otherwise harmful ways, along with personally identifiable information.
This project was under development while a number of news stories were bringing to light how open data can be misused: the Equifax data breach, the Cambridge Analytica misuse of Facebook users’ personal data, and the European Union’s new privacy law, the General Data Protection Regulation. Algorithms of Oppression (Safiya Noble), Automating Inequality (Virginia Eubanks), and Weapons of Math Destruction (Cathy O’Neil) are all recently published books about the misuse of data.
Data brokers collect information about individuals from a wide variety of sources, package it into a profile of each person, and then sell that profile to advertisers, credit agencies, and government entities. Representing people in this way is often harmful, and Ziegler remarked that what bothers him about the practice is that data brokers represent people without their consent. Ziegler asked: are we better than data brokers? Our intentions in libraries, archives, and special collections are usually better, since cultural heritage organizations are not working on open data projects for money. Our subjects are also different, as the individuals portrayed are often historical and no longer living. But, as Safiya Noble notes in Algorithms of Oppression, intent is not particularly important. What matters are the outcomes: whether harm results.
How could we do better? Ziegler admitted he does not have all the answers, but he offered some suggestions.
- We can take advantage of the help already out there and benefit from the expertise of others. For example, we can bring humanities and social science research to the project implementation team.
- We can standardize the practice of asking for help and recruit representation officers for projects. A representation officer is responsible for investigating who is represented in a digital project and for researching possible partners from the group or community the data describes. We should act as though the people being described will be looking closely at the descriptions.
- We can also explain why we made the choices we did. Be transparent in the project description, and name the scholars or community groups you worked with during the planning phase.
It takes a lot of work. It’s work to read books and do research in this area, apply these ideas to our jobs, listen to criticism, and try to get people to participate. But it’s also work for others to help us and to explain things to us in a way that we’ll understand.