Encoding Standards: ALA Annual Report 2017

Report of Meetings of Interest Pertaining to Encoding Standards at the Annual Meeting of the American Library Association, Chicago, 2017

Prepared by Jim Soe Nyun for the Music Library Association Cataloging and Metadata Committee

OLAC Cataloging and Policy Committee

Friday June 23 7:30-9:30
Kelly McGrath reported that because of the 3R Project, RDA development is frozen. Pay attention to IFLA LRM. Meeting Monday with the RSC to hear about lacunae.
Cate Gerhart reported on what MAC will be doing. Mentioned that ISO 3166 (drawn from the UN list) has a countries list, some entries without official standing, and that it carries a disclaimer. Mentioned the proposal for accessibility metadata coauthored by OLAC and CCM (the Canadian Committee on Metadata Exchange). Mentioned an 007 proposal for maps, another attempt to tie up remnants from format integration, and raised the issue of why maps are different from the multi-007 situation for other materials.

Janis Young report. LC cleaned up some 60,000 musical score records from pre-MARC days that were miscoded as books, part of a project to bring out materials that had been siloed. Quoted the language in Congressional omnibus budget legislation addressing subject headings, which mandates how terms are to be decided and how open the term-assignment process should be. The art form/genre terms project continues and is close to done, with new terms expected by the end of this year or soon afterwards. The LCMPT manual, 18 instruction sheets, has been published in draft form. Suggestions will need to be opened up soon, not just through SACO. LCDGT is still a pilot, and SACO methods can be used to propose additions to it.

Jay Weitz: the OCLC report is available for distribution. OCLC has a new job opening for a consulting database specialist. OCLC will be implementing the MARC update and is looking for guidance on implementing format updates. Subfield $6 will be added with the update; it is among the things that are in the format but not yet in OCLC.

Karen Peters, MOUG Liaison report. Available online.

SMACR update, looking at how CAPC resources are updated.

DVD best practices by July, streaming by October, aligned with changes to RDA.

Questions about Playaways cataloging…

Julie Moore, quick note on Objects Task force.

LC PSD has stated they don’t have the bandwidth to take on adding video terms to LCGFT.

OLAC has about 75 terms that could go into a vocabulary. Hopefully LCGFT, maybe published on Open Metadata Registry.


Discussion of best practices (BP): a single source combining all existing best practices documents to feed into the Toolkit.

Intro to 3XX project for new vocabulary values that have just been published. OLAC and MLA will collaborate on this.

OCLC Linked Data Roundtable: Stories from the Front

Saturday June 24 10:30-11:30

John Chapman, moderator

Karen Smith-Yoshimura

Representing translations as linked data
Discussed a project that looked at 5 philosophical works. The works had 4,073 WorldCat records for their various manifestations.
The study found metadata of varying quality; one doesn’t always need to worry, since there are good records that can make sense of the ugly ones. Wikidata was used to make sense of some of the mess. The project pushed into Schema.org extensions for “translator,” “translationOfWork,” and whether Chinese script is represented using traditional or simplified characters. Wikidata has ca. 180,000 titles and is limited (and limiting).
BIBFRAME at the Library of Congress, Jodi Williamschen
BIBFRAME (BF) 2.0 pilot: 42 editors from the first test are now working on 2.0. Additional training, with more staff, comes in July. Catalogers describe each item twice, as before: BF, then MARC. Distribution via MARC is driving some of this. May include copy cataloging and CIP authentication. Looked at conversions of titles and name/titles. Includes the “bflc” namespace for things like LC demographics. Conversion includes labels; “Moonlight sonata” added as a “related heading.” LC training runs through the summer; the BF 2.0 test is scheduled for 6 months. The editor will be available soon for download and experimentation.
Linked Data in Action, Amber Billey (Columbia’s Hyacinth metadata editor)
After stretching Omeka’s limits, Columbia decided to develop Hyacinth, a Ruby Hydra/Hyrax tool that pushes to “Publish Targets” of different sorts. It makes derivatives of resources and permits custom metadata fields to support different content standards. Metadata export via spreadsheet. Autocomplete points to controlled vocabularies. It mints local URIs and can harvest community ones. Columbia’s publish target supports their digital collections site. (They’re using MODS XML right now; serialization to RDF is next.)
Re-use is a big goal. Autocomplete helps with this, as well as with data stability and correctness, and it reduces duplication of effort. Karen Smith-Yoshimura noted that unique resources will still be found, however. We start in a MARC environment; to get to more linked-data possibilities, start with good descriptions. Linked data can help, well, link: different labels can be pulled from identifiers attached to resources.
Show people more stuff, they’ll find more errors. Opportunity to correct errors.
ARTFRAME: descriptions of art objects in RDF linked data. How to display them? Visualization is an opportunity to develop new ways to showcase data. (No more linked data bubbles!)
Are we moving towards more “constrained” linked data? Re-use of terms allows for more limited vocabularies. (RDA is “overwhelming,” per Amber Billey.) BIBFRAME worked so well because of its flexibility. RDA is rigid, but there’s an unconstrained element set that can be much more useful. Need to achieve a balance of restriction versus flexibility.

“If we remain so specific we end up talking to ourselves.”–Karen Smith-Yoshimura.
Jean Godby will be talking at the BF discussion tomorrow. OCLC isn’t asserting FRBR work-ness in BF.
Amber: “Car without an engine” said 2 years ago, things are better now, but still a ways to go.
Look at bio taxonomies as a model for linked data possibilities for folks working in cultural heritage. We still need the tools. E.g., Connexion has needs, like piping in URIs without typing in everything.

Catalog Management Interest Group: Preparing for the Transition to the Linked Data Environment, June 24, 1-2:30 p.m.

ALA Task Group on URIs in MARC Group, Jackie Shieh

Looked at: RDF and MARC
Preferred and/or allowed URI syntax, and what it means to use it in catalog
Semantic issues
Workflow, usability without disruption
Community outreach
Developed subgroups on:
Pilot testing, scanning landscape for how inserting URIs would impact systems, led to some of:
Real world object
Work ID
MARC Objects/reconciliation

Formulating and Obtaining URI document: Stephen Folsom has a paper out later this year

MARC 2016-DP18: deleted wrapping parentheses from some URIs
MARC 2017-01: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats: passed; defines $4 for all relationships (subsuming the relator and relationship usages of $4), either as strings or URIs. When using $4, use the canonical URI, not necessarily the string that appears in your browser address window, since they may not be the same.

URIs for authorities ($0) versus real world object ($1). 2017-08 passed this morning at Marc Advisory Committee.
The above two MARC changes will be announced in the MARC updates.
Generally, $0 for OBJECTS in RDF statements, $4 for PREDICATES.
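As a sketch of that mapping, here is how a MARC access point might translate into an RDF-style triple, with $4 supplying the predicate and $0 the object. The subfield values and URIs below are illustrative examples, not drawn from an actual record:

```python
# Sketch: turning a MARC-ish access point into an RDF-style triple.
# $4 supplies the PREDICATE (the relationship); $0 supplies the OBJECT
# (the authority/identifier). All URIs below are illustrative.

def field_to_triple(resource_uri, subfields):
    """Map a dict of MARC subfield codes to an (s, p, o) triple."""
    predicate = subfields["4"]  # relationship URI, e.g. a relator term
    obj = subfields["0"]        # authority URI for the agent
    return (resource_uri, predicate, obj)

field = {
    "a": "Beethoven, Ludwig van, 1770-1827.",
    "4": "http://id.loc.gov/vocabulary/relators/cmp",  # "composer" relator
    "0": "http://example.org/authorities/n79107741",   # hypothetical $0
}

triple = field_to_triple("http://example.org/works/moonlight-sonata", field)
print(triple)
```

The point of the sketch is only the division of labor: the $4 value lands in the predicate position of the statement, the $0 value in the object position.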
Forthcoming survey: Enhancing MARC records with RDF URIs in $0.

Preparing for the Transition to the Linked Data Environment: Working with the Library.Link Network, Jack Ammerman
Boston University goals for initial linked data efforts:
Build internal LD capacity
Explore automated metadata enrichment with content beyond MARC data
Move library resources into user workflow, how to construct a search
“Make Google better”
Current projects include publishing bibliographic records to the web with Zepheira’s help, and consuming their own linked data on their own platform. BU sends out 3.4 million bib records, enhanced with $0 for records with controlled authorities, and looked to enhance minimal MARC output with external links via scripted lookup against various sites: VIAF, LC, IMDb, WorldCat, etc. The scripted process takes 12-16 hours for the entire dataset.

Library.Link work includes subject terms expanded by the script above.
Wants next to look at user experience, improve metrics for assessment, do more work with Ex Libris and Zepheira around linked data discovery, BIBFRAME readiness.

Newton’s Third Law, Nate Cothran, vendor (Backstage)
Looking at vendor-library relationship.
“Frankenstein-MARC” was used as an example of marking up a MARC record in the areas where vocabularies exist. There is an issue with partial matches: PCC recommends not adding URIs in that case. For a collaboration with Stanford, Backstage got identifiers from LC, VIAF, ISNI, and ORCID, supplying the identifiers in 92X fields: 100% coverage from LC, less from VIAF, still less from ISNI. ORCID has grown from under 1 million identifiers to over 3 million in just a couple of years. A light abstraction layer intermediates between the enriched MARC and further output. Series were collapsed into 490 0s, with a URI added; that was okay for Stanford, though it could be problematic for others.
Wellcome Library project, working with a home-grown XML format for images. Inserted MeSH and LC IDs into the XML. Generated a near-match report based on looser match rules. An updated-headings report was also generated.
Resources: PCC URI Task Group webpage
Open Community Registry of LOD Tools page
Library of Congress BIBFRAME Compare Tool: compare MARC input and BF output

Questions: none, really.

What Happens to the Library Catalog in the Age of  Linked Data?
3-4 p.m., arrived ca. 3:30
Phil Schreur on LD4P
Intro to project
Tracer bullets project to facilitate pathways for metadata management, eventually leading to Blacklight.
The copy cataloging component has been completed; copy cataloging is done in the acquisitions department, and the project started with this. Stanford’s Blacklight instance, SearchWorks, is used for this first phase; they are developing Blacklight to do something with BIBFRAME linked-data resources. Still working on mapping BF elements into repository elements, and will share the plugin with the Blacklight community. Working on an integrated discovery experience, integrating BF with other content. Casalini Libri is developing BF-enriched records. Casalini’s SHARE-VDE tool was shown, allowing handy display of BF content.

David Pimentel on the Library.Link Network. Denver Public Library is a participant, starting in August 2015: MARC XML export, converted to BIBFRAME elements via Zepheira, while in house they work in MARC. They output 45,000 records last year. It is not efficient to switch to BF for cataloging at this point; DPL considers this a way to present their cataloging in an outward-facing way. Discovery is the focus. He would like library-validated metadata to be easily distinguishable on the web. Near term, we should improve clustering of resources, FRBR-like, and we should experiment and try out possibilities.

Linked Data and the Catalog: What Discovery Systems Can Do Today, Steve Meyer, UW-Madison
search-ld.library.wisc.edu: the production catalog plus some linked data. It can pull in things about authors as part of what the user gets back. Embrace a pluralistic attitude towards metadata. Linked data can help show other relationships, including exposing the influence of a work or creator. He talked about the extensibility of simple 3-part RDF statements: complex assertions are possible. He also talked about how MARC comes from a descriptive world at odds with the knowledge-acquisition model in RDF; the goals are different.
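The extensibility of 3-part statements can be illustrated with a minimal Python sketch: complex assertions (such as exposing influence) fall out of chaining simple triples. The names and relationships here are invented for illustration, not drawn from the Wisconsin data:

```python
# A graph is just a set of (subject, predicate, object) statements.
# Illustrative data only; "ex:" marks made-up identifiers.
triples = {
    ("ex:Walden", "ex:creator", "ex:Thoreau"),
    ("ex:Thoreau", "ex:influencedBy", "ex:Emerson"),
    ("ex:Emerson", "ex:name", "Ralph Waldo Emerson"),
}

def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is asserted."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# Follow two hops: who influenced the creator of Walden?
creators = objects(triples, "ex:Walden", "ex:creator")
influences = {i for c in creators for i in objects(triples, c, "ex:influencedBy")}
print(influences)  # {'ex:Emerson'}
```

New kinds of assertion just mean new predicates; no record format has to change to accommodate them.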

MARC Format Transition Interest Group, What happens to the library catalog in the age of linked data.
David Pimentel (Denver Public Library), Philip Schreur (Stanford), John Chapman (OCLC), Stephen Meyer (UW-Madison Libraries) Eric Miller and Gloria Gonzalez (Zepheira)
Talking about different perspectives on current linked data work. Link. Share. The panel will look at the intentions of their work and what they want linked data to do for them. Mentioned one site that integrates emotional musical characteristics of music harvested from an external site.

Responses to questions from previous session:
How do you see how libraries can participate in curating information on the web?
Stephen Meyer talked about integrating datasets that include attributions of their content. Philip Schreur would like to focus on provenance, and look at tools that can help curate what information is used. At OCLC, websites are weighted by trust level, but he feels that revealing the provenance could be important. Maybe fake news could be revealed.

How do you see the evolution of linked data impacting catalogs and cataloging workflows?
PS: MARC isn’t dead as predicted; it remains a looming presence and permeates how we create metadata. Once we get past MARC we can get on to new ways of thinking about description. Let us work collaboratively, use the best of what’s available, and help develop it. Help users discover resources using languages to their liking, facilitated by linked data. For smaller libraries, linked data workflows need to be improvements in order for the transition to make sense. (Aside: one rejected expansion of RDF was “Relationship Description Framework.”)

With a small library that would only move when their service provider does, when do we move?
Talk to your service provider, at every opportunity. Wisconsin published their work on GitHub, hoping their vendors and others would see what’s being done and get ideas for services and products. OCLC wants to make APIs and apps that are easy to use and that make good use of good cataloging. There’s a lot of inertia, so a change may be 5-10 years out. Even the big implementors don’t quite know what to ask their ILSs to provide.

When do we get a unified commitment from communities so that libraries can commit? What is OCLC’s commitment?
There are serious problems with the data models that are needed (OCLC). They have to serve both the bleeding and the trailing edge, and will need to support MARC “for a long time.” No industry has a 10-year plan; the ILS industry needs to remain responsive. The ILS is dead, maybe? How can the library community be broken up in ways that make sense?
Library.Link is about a data solution and not a systems one. Working in distributed ways will need to be the future model; there’s too much we want to work with. We don’t want to go down a federated search model. Even MARC is a distributed model. If libraries develop a system of trust then things like search cards could possibly display that curated information.

What will $0 do?
It will make a big difference. The BnF is staying with Intermarc, expanding 3-digit fields to 4, rejecting production within RDF but converting to RDF at the back end. Something like OCLC’s Entity JS project may still be needed, but $0 will help make sense of descriptions.

June 25, 10:30-11:30 a.m.
Sally McCallum intro

LC pilot 2.0, Beacher Wiggins
Pilot 2.0 began June 1 with training sessions on modeling and vocabulary. 40+ staff from Pilot 1.0 are going forward to 2.0 after retraining (complete), plus 35 or so new recruits. The project covers many formats, languages, and scripts, and includes rare materials for the first time. Training materials were developed for all participants; all materials will be mounted and shared on their website. Two general intro sessions are followed by format-specific training. The first phase of training for new people covers an intro to the semantic web; module 2 is hands-on editor training, including comparisons of MARC and BF descriptions. We will have a MARC universe for the foreseeable future because of LC’s need to distribute MARC records. Progress report at Midwinter.

Sally McCallum: They have a goal to create a realistic cataloging environment. They’ve converted the whole of the LC catalog to BF: 17M MARC bib records, with 1.2M uniform titles converted to BF Works, merged and matched, and they are continuing to refine the machine algorithms. Input of new name authorities is enabled in the BF system (“token” capability right now). They already have the NAF, LCSH, and TGM via the Linked Data Service. 3K new names daily; 6K new bibs daily.
BF Editor:
Has profiles, customized and pre-populated, with popups, type-ahead, dropdowns, and RDA hotlinked from within the editor; development of tools continues. “Cross system flows” include 60+ Pilot catalogers and 200+ non-Pilot catalogers, MARC-based unit records, and RDF-based BF descriptions.
Workflow is to create a work in BF, then in MARC.

Sources online at LC:
BF 2.0
MARC-to-BF conversion specs
MARC-to-BF conversion programs
MARC-to-BF comparison viewer
Soon: editor software and profiles

Explore machine creation of BF from MARC, including splitting of electronic from print records when two manifestations are on the same record.
Experiment with using vendor BF RDF, working with Casalini to load both MARC (as before) and BF “records.” Talking about inserting PMO work into BF and editor. LC needs to work out internal problems with identifiers. They want to look at converting from BF to MARC.

BIBFRAME and OCLC Works: Defining Models and Discovering Evidence, Jean Godby and Diane Vizine-Goetz, OCLC
Work to align OCLC Works with BIBFRAME works.
OCLC Works involves discovering work information in MARC records; the work information comes from bib and authority records. The authority work with VIAF involves VIAF clusters. It has work and expression records (for translations). The Work entity mined from bibs consists of a work cluster, the “same” as VIAF clustering. Data mined from WorldCat is being used to evolve toward more uniform title records.

Demo project in WorldCat: Cookbook Finder. Work-level description based on the most prevalent associated subjects, with a work ID. WorldCat can also be displayed in parallel viewer form, and will give you the RDF output, not filtered or massaged.
Did a BF conversion to compare OCLC Work identifiers with BF works. Output is split between work-level and instance-level information.
WorldCat Works
As of about 2 weeks ago there were 178,375,018 singleton works out of 394,838,538 records. Non-singleton work clusters average 4.18 records. This results in 230,113,951 works. They don’t have the confidence to cluster further. A data-driven perspective.
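Those figures hang together arithmetically; a quick sanity check (using the rounded 4.18 average, so the result is approximate):

```python
# Sanity check of the reported WorldCat Works clustering numbers.
# Figures as reported in the session; 4.18 is a rounded average.
total_records = 394_838_538
singleton_works = 178_375_018
avg_cluster_size = 4.18

clustered_records = total_records - singleton_works
non_singleton_works = clustered_records / avg_cluster_size
estimated_total_works = singleton_works + non_singleton_works
print(round(estimated_total_works))  # close to the reported 230,113,951
```

The estimate lands within a small fraction of a percent of the reported 230,113,951 works, the gap being attributable to rounding of the average.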

See PCC Work paper for more detail. Comparisons of FRBR, BF, and OCLC Works.
OCLC Works has a light dividing line between Work and Expression, maybe not significant enough to distinguish. It maps into Schema.org, which also flattens down to CreativeWork.
OCLC has subtle distinctions, mainly translations, for the expression-level distinction between Work and Expression. BF has a bf:translation property that might be enough to relate “work” and “expression.” Humans are much better at distinguishing, supplying the more intricate RDA-based relationships that BF has. They may look at different resource types at a different level in the future.

LD4P Tracer Bullet 1: an RDF copy-cataloging pipeline, Philip Schreur
A briefer reprise of the earlier session. We need to make the transition to BF soon. Presentation on vendor-sourced copy cataloging. Other elements are original cataloging, original single-item contribution into the repository, and collection contribution into the repository. They always need operational records to support functions like acquisitions, end-holds, and bookplating, retained as MARC data in the ILS. The revised workflow is to the benefit of the discovery layer: ISNI and VIAF numbers supplied by Backstage in MARC XML ==> BF conversion ==> reconciler ==> post-processing ==> triple store ==> SOLR query ==> SOLR document ==> discovery. A short-term discovery environment, using Blacklight, is integrated into the existing discovery mechanism.

Transformation, BIBFRAME, and the Library.Link Network, Eric Miller, Zepheira
No notes on this segment—had to leave early.

ALCTS Metadata Interest Group Meeting

June 25, 2017, 8:30-10 a.m.
Evaluating Metadata Standards –Principles into Practice
Jenn Riley, Lauren Corbett, Erik Mitchell
History of Metadata Standards Committee
Started ca. 5 years ago, in part as a replacement for MARBI, but still finessing its raison d’être. The main redefinition is a move towards helping evaluate metadata standards instead of developing or endorsing a monolithic standard. The hope is that these evaluations can help local implementers decide which standards to use, and give standards developers things to keep in mind.
Looked at the NISO Standards Tag Suite (STS) to see how the committee’s principles could be used to evaluate a standard, and also how those principles could be applied to markup languages. Gave feedback on documentation, clarifying the relationship of the NISO standard to its ISO predecessor. Recommended providing a lightweight tag suite. Other comments included pointing out that the abbreviation used, “trans” for translation, could be viewed as insensitive considering there’s a trans community using the same term.

Opened up the session to ask whether this review process would be useful. E.g., DPLA will be updating their metadata application profile this year; there was discussion about how DPLA was getting guidance for development. One questioner suggested that it would be useful for standards developers asking for input to provide something like a call or webinar introducing the document to be commented on, to facilitate input. An ISO person asked for feedback about how they manage the comment period for standards; possibly give a longer lead-up warning, instead of dropping a 30-day review period without notice.

NISO has released for comment a standard for vocabulary management.
Mentioned the upcoming MSC meeting this afternoon.


MLA report
CC:DA report, much information on 3R
Officer report
Discussed role of officers, positions opening up, recruiting new talent for Vice-Chair/Chair-Elect; Program Co-Chair (2017-19); Secretary (2017-19)
Interested in new members for the offices. Have 5 names for 3 positions.
Vote here:
The winners:
Vice-Chair/Chair-Elect: Ann Neatrour
Program Co-Chair: Anne Washington
Secretary: Wendy Robertson

Metadata Standards Committee

1-2:30 p.m.
Erik Mitchell, co-chair
Jenn Riley, co-chair, founding chair, rotating off this year
Small group; the room was moved.
New charge is fully approved, is up on Metaware.buzz
Announcements: Lauren, Jenn, Mike, and Erik presented this morning at the Metadata Interest Group; well received. DPLA mentioned being interested in this group acting as a review service. NISO ended up revising contextual documentation, and appreciated the detailed input. About the NISO standards feedback process: looking at how to develop this into the future. Should the group be involved in similar efforts? Discussion that it would be advantageous to combine people who know the comment process with those who are less experienced. OUTCOME: will continue the work, with cross-seeded groups, small groups, and bounded timeframes.
Question about how frequently MSC members should meet. Monthly so far; continue for now.
Question: Does this group have a relationship with the ALCTS Standards Committee? No, but MSC has a charge to maintain relationships.
Metaware.buzz: frustrations with ALA’s web support in the past led to the site’s origin.
Posting hasn’t been happening. ALA Connect does a reasonable job of storing documents, but is there a reason to do more than that? The committee serves two divisions, so a single division’s information platform may weight things inappropriately. So…continue as is for now until more is known about the impending web migration.
Question about $0: how involved should MSC be in discussions about MARC or other things more marginal to the charge? The group has tried to stay out of MARC for the most part. They will be available as a resource if there is a need, and will keep the question under consideration. MSC will also think about monitoring work in other parts of ALA.
Things to look at for next year? Possibilities:
PCC Standards
Had to leave early…

RSC Focus Group

Monday 9-11:30 a.m.
Vocabularies may be published through the RDA Registry, though would need to be managed by others
Comments from communities:
Rare materials

In the audience:

RDA Linked Data Forum, introduced and moderated by James Hennelly

Two presentations, Gordon Dunsire and Diane Hillmann

RDA Linked Data Vocabularies Data Management and Use Workflow, Gordon Dunsire
Explanation of RDA expressed in linked data
Open Metadata Registry houses the RDF serialization of RDA, has a record of versions, nothing is deleted
Periodically pushed to the GitHub platform RDA Vocabularies
The above pushes out to RDA Toolkit, RIMMF3 data editor, RDA Vocabulary Server, RDA Registry (contains maps between RDA and other standards)
Demonstrated how metadata moves through the food chain, using “audio disc” as an example.
Life begins in a spreadsheet that is imported into the OMR (treated there as a SKOS:Concept).
Next it is pushed to GitHub. It then goes to the RDA Toolkit, where the Glossary is the key destination and from which it populates associated places in the text via the CMS. It also populates RIMMF, where it generates the terms in a pick list, behind which lurks a URI.
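A minimal sketch of what a carrier term like “audio disc” might look like as a skos:Concept along this chain (the URI, number, and scheme here are invented for illustration; the actual registry entries differ):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://example.org/termList/RDACarrierType/1234>
    a skos:Concept ;
    skos:prefLabel "audio disc"@en ;
    skos:inScheme <http://example.org/termList/RDACarrierType> .
```

The human-readable label travels to the Toolkit glossary and the RIMMF pick lists, while the URI is what the systems actually exchange.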

RDA and Linked Data: Where’s the Beef?
Diane Hillmann, Metadata Management Associates
Look at rdaregistry.info for further information.
Presented Tim Berners-Lee’s 4 stars of linked data plus “RDF Semantics Versioning” (?), but as an account of what linked data is, rather than as a way to evaluate how wonderful one’s data is. Issues with going to linked data from current representations.
Thoughts on RDA being referred to as “guidance instructions”: this perception does not encompass some of the more linked-data aspects of what’s being tried. She calls RDA “the whole package.” MARC ==> RDA is the least lossy stream.
Talked about infrastructure: Infrastructure matters; versioning and stability critical; “differences in approach are downplayed and cast as ‘political’ in nature”; politics in libraries is getting in the way.
[Some of the discussion about the web and linked data was not quite correct….]
Managing change is an issue in going from MARC to linked data. She mentioned the NISO library vocabulary work; we’re still in the comment period. “RDA Registry is optimized to take library data to the next level”; she mentioned the multilingual nature of RDA as a feature that could be mined for non-English-speaking users.

Maintenance issues:
OCLC has a master record concept with clusters of holding libraries; emphasizes trade publications; research arm doesn’t lead to services and tools.
Need more flexibility to develop other models of maintenance.
“Reality and possibility”:
Grant projects often don’t end up with long-term usable products. Maybe there are ways to better fund research.

From the audience (Thurstan Young): lauded the publication of the RDA vocabularies. How to change mindsets: stop thinking about “records,” move to conceptions from an RDF graph; start to trust and use information from multiple sources; think about data integrity.
Question about semantic versioning: changes can break earlier understandings. How do we manage vocabularies available in the public space? Use dataset vocabularies? E.g., how do we keep meaning intact when the meaning of triple components might change? Gordon Dunsire mentioned that there’s going to be a 3.0.0 RDA suite because of LRM’s changes to entities; not everything will map over cleanly, and there will be a break in semantics.
When will we be doing RDA in linked data on a large scale? Even now much of what’s needed is already out there, so it could happen soon. Gordon Dunsire maintained that RDA is ready enough to get going. The system handling the metadata is the limiting factor: cataloging could potentially be done as linked data; only the underlying systems aren’t there yet.
RDA will be the major implementation of LRM.
Gordon Dunsire spoke of all this being a big cultural shift, with ideas like LRM’s view that no entities are more privileged than others. Libraries are responsible for 10-15% of the metadata that the public consumes.
RSC has reached out to many stakeholders but have they reached out to OCLC? Gordon: “We’ve thought about it.”