Panel: Metadata! Metadata! Metadata! | Library Publishing Coalition

Friday, May 10, 9:45-10:45am
Room: Barrick Gold Lecture Room (1520)

Title: Collaboration Through Richer Metadata

Presenter: Shayn Smulyan, Crossref

Description: Metadata is crucial to discovery, access, citation, linking, and metrics. Crossref, as a member organization, represents the collaborative efforts of over 10,000 publishers to build and maintain an infrastructure which collects and distributes that metadata to the broader scholarly community.

This presentation will illustrate ways that library publishers can participate in the Crossref community and benefit from the corpus of enriched metadata they collectively create. I’ll highlight recent improvements to metadata deposition and tracking tools; outline metadata practices that promote connections between published content items and other scholarly objects; and preview some upcoming partnerships between Crossref and allied organizations like ROR, Metadata 2020, ORCID, and DataCite. These collaborative efforts are working to build an interconnected network of scholarly metadata, which makes published content easier to find, cite, link, and assess.

Title: Lemons into Lemon-aid: An Update on Turning PKP’s Metadata Problems into Actionable Challenges [slides]

Presenters: Mike Nason, University of New Brunswick Libraries and PKP; James MacGregor, PKP

Description: PKP’s applications have been around for 20 years, and can be found distributed all over the web. This distributed nature, combined with the one-size-fits-most approach of designing the software, has resulted in a fascinating ecosystem of metadata abuse enablement, where publishers seek to fit their metadata into OJS and OMP in a way that makes sense from a display perspective, but is often incorrect from a metadata perspective. This metadata abuse significantly impacts journal authors and publishers; has implications for citation accuracy; and impacts the work of any researcher who uses this data corpus as a means to evaluate global scholarly publishing trends.

As part of Coalition Publi.ca (a national scholarly publishing research and infrastructure project that requires high-quality metadata to support its aggregation, preservation and data corpus research components), as a sponsoring organization for publishers in Crossref (which harvests metadata and provides additional resources such as reference linking, plagiarism checking, and Crossref Event Data), as a provider of a long-term preservation system (PKP’s LOCKSS-based Preservation Network), and as a direct participant in research on scholarly publishing (which demands the availability of high-quality metadata to ensure accuracy of any data evaluation), PKP has a duty to ensure accurate metadata across this distributed ecosystem, regardless of whether we personally host the content.

This presentation provides an update on where we are with this problem. We will discuss how we have worked with Coalition Publi.ca systems staff and metadata experts, PKP researchers, Crossref support staff, an iSchool practicum student from the University of Toronto, and our own support staff to establish the nature and extent of the metadata accuracy problem. We will also discuss the tools we are developing, and the changes we are proposing to the PKP development team, that will a) help to identify and correct legacy metadata issues for longstanding publishers using PKP tools, and b) ensure better metadata hygiene going forward.

Title: Wikidata: Open Linked Data for Library Publishing

Presenters: Jere Odell, IUPUI University Library; Ted Polley, IUPUI University Library; Mairelys Lemus-Rojas, IUPUI University Library

Description: Wikidata, a collaboratively edited, open, linked data knowledge base hosted by the Wikimedia Foundation, includes a growing collection of open citation data. As of November 2018, more than 20 million publications and 160 million citations have been contributed to Wikidata (http://wikicite.org/statistics.html). Many of these data items have been added by bots that contribute data from open bibliographic databases, including PubMed Central, and from data made available by Crossref and the Initiative for Open Citations (I4OC). Although this approach may be the most efficient way to build a large corpus of open citation data, many scholarly journals will be missed. Journals that cannot meet the requirements of a Crossref contract (for financial or technical reasons) will be invisible in the growing open citation network. The journals that are likely to be missed are also those that have not been well-served by for-profit publishers and large university presses, including print journals that flipped to open access and journals in fields that are unfamiliar with or unconvinced of the value of a Crossref DOI (e.g., law reviews and some arts and humanities journals). In this presentation we demonstrate how a library publisher can contribute bibliographic data to Wikidata. By using both manual and batch-processing methods, we contributed complete runs for selected journals hosted on our library’s instance of Open Journal Systems. We share our methods for contributing data for journals that mint DOIs and for journals that do not. We also provide a demonstration of the short-term benefits of building this collection in Wikidata and reflect on the challenges of including Wikidata in a library-publishing program.