ALA Midwinter, January 25-28, 2019
Midwinter Liaison Report, and Notes on some meetings of interest for Encoding Standards
Submitted by Jim Soe Nyun, Chair, Encoding Standards Subcommittee
MARC Advisory Committee, Liaison Report
Session 1, January 26, 2019, 8:30-10:30 a.m.
Session 2, January 27, 2019, 2:30-5:00 p.m.
Meeting agenda with links to papers: http://www.loc.gov/marc/mac/mw2019_age.html
(Summaries in proposal/paper number order, not necessarily in the order in which they were
Proposal No. 2019-01: Designating Open Access and License Information for Remote Online Resources in the MARC 21 Formats
ACTION: Generally considered an important set of changes. Passed with amendments:
- 506 $g definition change: Date for the end of an embargo, when the resource becomes freely available.
- 540 $g definition change: Date for the end of an embargo, when the resource changes its use and reproduction rights.
- 856 $7: to be made Not Repeatable, and defined to refer only to URIs in $u
- 856 $7 Position/0: Labels and definitions reversed from proposal (now 0=Open Access,
- 856 $7 Position/0, Value 1 definition (now Value 0) modified to remove the second sentence
which refers to the overly-restrictive Budapest Open Access Initiative definition (MLA
- 856 new $e definition changed: …It may contain a free-text term, a standardized term, [or]
a URI, or a mixture of them.
- Similar changes will be populated in the MARC Holdings format where appropriate
Proposal No. 2019-02: Defining Source for Names and Titles in the MARC 21 Bibliographic Format
NOTES/ACTION: This proposal changed fairly significantly from Discussion Paper No. 2018-DP07, which preceded it. The original DP sought only to use $2 for names in these access points, even when the access point was for a work which used a conventional string including the name of a creator. Among other things this would have caused confusion with the meanings of relationship designators and indicators. The proposal now requires that $2 cite the source of the entire string. This is an imperfect solution, with the 100/240 combination being a main point of potential confusion in the future.
The final proposal was not universally loved, but grudgingly accepted and passed (2 no votes and several abstentions), with a requested rewording to the definition in 3.3:
MARC code that identifies the source list from which the name or name-title heading was assigned….
Proposal No. 2019-03: Defining Subfields $0 and $1 to Capture URIs in Field 024 of the MARC 21 Authority Format
ACTION: Passed unanimously with the understanding that the MARC Authority Format Appendix A be amended to align with this proposal’s $0 definition.
Discussion Paper No. 2019-DP01: Coding Externally Hosted Online Publications in the MARC 21 Holdings Format
NOTES/ACTION: Some discussion about whether 008/06 was the best location for recording that a resource is hosted remotely, but general sense this would be appropriate. The paper definitely addresses a major real-world use case. It is likely to return as a full proposal.
Discussion Paper No. 2019-DP02: Subfield Coding in Field 041 for Intertitles and Transcripts in the MARC 21 Bibliographic Format
NOTES/ACTION: General support for the changes outlined to introduce subfields to record language of intertitles and transcripts. In defining what a transcript is, the possible $t was defined with wording that was meant to distinguish it from music’s use of $e. However some confusion remains, and OLAC will work with MLA to finesse an accurate definition for a final MARC proposal. This may be possible without rewriting $e’s definition to better reflect that we use it for more than librettos. The paper will be developed into a final proposal, probably for ALA Annual.
Discussion Paper No. 2019-DP03: Defining a Field for a Subject Added Entry of Unknown Entity Type in the MARC 21 Bibliographic Format
NOTES: The Bavarian Library Network’s Gnomon Thesaurus (http://www.englisch.gnomon-online.de/) includes a variety of entities that cross the 600/700 membrane of MARC. There was general support for developing a new MARC field dedicated to recording these and other entities of an “unknown” entity type. There was a suggestion to call these “unspecified” rather than “unknown” entities. Where to place the field? Three fields were set out in the paper as possibilities: 620, 652 and 670. Two were rejected: 652 (because it has been used in MARC for another purpose, albeit many years ago) and 670 (for its similarities in name with the 670 in the Authority Format but also for the dissimilarities with how it would be used). The 680 region was also mentioned as a possibility, even though there had been some historical local use of the range by Canadian libraries. No definitive final recommendations or preferences, but it is possible that the paper will return in proposal form.
ALCTS Metadata Interest Group, Liaison Report
The IG’s blog: https://www.alcts.ala.org/metadatablog/
I provided the following very brief MLA metadata update in advance to add to the notes of their business meeting:
- Earlier reporting has mentioned MLA’s participation in the development and review of the Performed Music Ontology, a project of the recent Linked Data for Production grant effort. The ontology is now published at: http://performedmusicontology.org/ontologies/PerformedMusicOntology.html Elements from the ontology have already been made part of music metadata profiles in the Library of Congress’s test of BIBFRAME 2.0.
- Several changes to the MARC format requested by MLA are now live with OCLC’s most recent (September 2018) implementation of MARC updates. These include changes to Field 382, Medium of Performance, and Field 384, Key. MLA is watching with interest and has commented on papers and proposals being presented at the MARC Advisory Committee meetings during Midwinter weekend, including an OLAC-sponsored discussion paper looking at recording language of intertitles and transcripts.
There was also a program on crowdsourcing metadata. Some general notes:
Samuel T. Barber
Motivations for crowdsourcing lean heavily on the lack of resources to process workloads; hidden collections can suffer as a result. Crowdsourcing should be considered as augmenting professional staff. Quality control is a big fear.
Use case: Operation War Diary
Tool used: Zooniverse, a cross-discipline, international citizen science effort. Zooniverse Project Builder: a tool for developing content via crowd input.
Their project requires 5 volunteers to look at a page before the work is considered complete. An odd number is useful for reaching a majority-view consensus on transcriptions. Their input system potentially could capture variants, but doesn’t right now. Their volunteers can drop a pin on a page to mark text of interest. Geonames can be linked to in order to go from the name to a map. A place query can look at the context of a project to limit results (e.g., Fullerton, England, not other Fullertons).
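The majority-view rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Operation War Diary or Zooniverse implementation: the function name `consensus`, the whitespace and case normalization, and the strict-majority rule are all hypothetical choices made for the example.

```python
from collections import Counter

REQUIRED_REVIEWS = 5  # threshold mentioned in the notes; everything else is assumed


def consensus(transcriptions, required=REQUIRED_REVIEWS):
    """Return the majority transcription, or None if no decision can be made yet."""
    if len(transcriptions) < required:
        return None  # not enough volunteers have looked at the page
    # Assumed normalization: collapse whitespace and ignore case before voting.
    normalized = [" ".join(t.split()).casefold() for t in transcriptions]
    value, votes = Counter(normalized).most_common(1)[0]
    # Strict majority: with 5 reviewers, at least 3 must agree.
    return value if votes > len(normalized) // 2 else None
```

With five reviewers a two-way tie is impossible, which is the practical appeal of an odd threshold; a page with five different readings still yields no consensus and would need another pass.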
Wisdom of the Crowd: Successful ways to engage the public in metadata creation
At Utah State, all crowdsource information is curated within the metadata service department
Tools: Outsourcing / Coordinating efforts with volunteers or students as part of a class (Includes class time discussing metadata standards)
They use ContentDM and have a webform for gathering comments, reached by a link in a digital object. The form generates an email with a digital object #.
Folklore collections: Hal Cannon Folk Collection. Started the project with an interview of donor, and the donor provided content.
Other projects of theirs that involved oral histories: Jackson Hole Dude Ranching Tradition
Compton Studio Photographs
One form had DC elements that were difficult to implement.
Webform had problems with accuracy of information.
They have moved away from a DC-centric form
1 liaison community member
1 family member
2 cataloging and metadata reps (one to interact, another to take notes)
Packets of printouts to discuss with interviewees
Community Events to draw information from the public: Plan the event, talk to knowledgeable community members who know the community, do PR. Often ended up creating print booklets that non-computer-literate contributors could mark up.
- There was a Zooniverse project to look at published 19th C Bodleian music scores: Was not
really successful, not enough enthusiasts to make the project a success, the public didn’t provide
the needed information
- OCLC has now put up on GitHub one of their crowdsourcing tools to help people fill in the blanks
- A curator can filter terms from interviews to add tags for objects
- With Samuel Barber’s project, their process won’t process materials until the threshold of 5
metadata reviewers have looked at an item
- Have controlled vocabularies been expanded from user input: the Utah projects have not since
they are using things like LC vocabs
- Volume of web comments? Low. 10 a month? For the Utah project. Low, but useful, with
occasional flashes of information that they weren’t expecting.
(Program continued, but I had to leave for the LC BIBFRAME update.)
Other meetings at Midwinter with some impact on encoding standards
Ask OCLC 10:30-noon, notes from final few minutes, after I arrived in town
$0 will export if you select so in Record Manager, but non-LCAF identifiers, including $1 real-world objects, will get wiped out in the process.
End of life of Connexion: “Within 43 years…” I.e., still TBD. It will happen, someday. Connexion is a frozen product. Record Manager is a live product and will have updates. Connexion will not go away until a product with generally equivalent features is ready.
Email Authfile@oclc.com to have OCLC create NACO records for you (Laura Ramsey is the person at OCLC who works on these).
Next OCLC Office Hours presentation features Robert Bremer on provider-neutral cataloging.
OCLC Linked Data Round Table
Nathan Putnam, moderator
Xiaoli Li, UC Davis
LD4P2 from 2 perspectives, from PCC’s perspective and UC Davis’
- Background on LD4P2: Pathway to Implementation, nucleus of 4 institutions plus LC and PCC. Elements of project: Cloud-based metadata transformation; metadata reuse and transformation; linking to external authorities and web context; discovery; production workflows for native linked data descriptions; community collaborations
- 17 participants sent bib records for Casalini to transform into BIBFRAME; creating SINOPIA, a cloud-based metadata reuse and creation tool with integrated lookups to things like NAF and VIAF; various projects to look at workflows, BF structure, discovery implications, cost analysis
- UC Davis developed out BIBFLOW. Linked data discovery system to enhance external content that could enhance discovery. SINOPIA-to-local triple store
- Links to Folio for circulation and acquisitions functions
Kevin Ford, LC
Update on LC’s work with BIBFRAME and streamlining LC’s BF dataset. What LC has done since ALA Annual: continued pilot work, refined conversion (on GitHub); collaborations with SINOPIA group, and authorities group to extract metadata from id.loc.gov; BF Editor updates (cloning works and instances, better interaction with database and editor); trying to reduce verbosity in RDF and trying to reduce blank nodes (anonymous resources in RDF)
Re blank nodes, resources identified with blank nodes lack URIs that can be shared easily. They’re unavoidable in RDF, are written into the spec for RDF. Part of the processing. Should everything have URIs (“URIs are commitments”)? Kevin Ford’s current bugaboo. Results in a lot of duplication; less efficient scaling.
Example from providers in BF: Blank nodes for “United States” and “Columbia Pictures Home Entertainment” strings. They worked with an experimental Provider file. A data analysis showed that ca. 15 million records contained only 1.2M unique strings. From those 1.2M provider strings they came up with ca. 800K providers after parsing agents in ID.LOC; these were loaded into ID.LOC as a file larger than many others there. The test file can be accessed at: http://id.loc.gov/search/?q=memberOf:http://id.loc.gov/bfentities/providers/collection_Providers
(For an example of clustering and reducing blank nodes: http://id.loc.gov/bfentities/providers/4599ff4baa77b72ddd0b65a9972c8b15.html)
These are NOT MEANT TO BE AUTHORITY RECORDS.
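To make the clustering idea concrete, here is a minimal Python sketch of how repeated provider strings could be collapsed onto one shareable identifier instead of many blank nodes. It is an assumption-laden illustration, not LC's actual process: the `http://example.org/providers/` namespace, the normalization rules, and the use of an MD5 digest as the identifier are all hypothetical.

```python
import hashlib

BASE = "http://example.org/providers/"  # hypothetical namespace, not id.loc.gov


def normalize(label):
    """Assumed normalization: collapse whitespace, trim edge punctuation, ignore case."""
    return " ".join(label.split()).strip(" .,;:").casefold()


def provider_uri(label):
    """Derive a stable identifier from the normalized string (illustrative only)."""
    digest = hashlib.md5(normalize(label).encode("utf-8")).hexdigest()
    return BASE + digest


def cluster(labels):
    """Group raw provider strings under the identifier they share."""
    clusters = {}
    for label in labels:
        clusters.setdefault(provider_uri(label), []).append(label)
    return clusters
```

Every record that carries a variant of the same string then points at one URI rather than minting its own anonymous resource, which is the duplication problem the presentation described. The clusters are groupings of strings only and make no claim to be authority records.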
BF-to-MARC conversion tool by summer; BF update later in Midwinter
Karen Smith-Yoshimura, OCLC
Linked data survey repeated in 2018
What has changed in linked data implementation during the last 3 years? A survey instrument developed to follow up on an earlier survey.
Code4Lib last November has results from 2018 survey update following up on 2014 and 2015 respondents.
- Mostly research libraries and national libraries; latest survey includes responses from service providers
- How long have projects been in production? Shows longer length of production —Survey responses need to be taken with a grain of salt
- Most implementations both publish and consume linked data
- Successful or mostly-successful results
- Publishing linked data: reasons include publishing to the web
- Types of published data: bib stuff, personal information
- Similar barriers to adoption
- Consumption reasons: SEO optimization less of a focus than before
- ID.loc and VIAF still heavily used, but with a 4x increase for Wikidata
- Barriers to consuming include matching and disambiguation
- Advice: learn from others, focus on use cases, collaborate, integrate linked data into workflows, analyze what to convert, never underestimate the amount of data cleanup required, use existing identifiers and ontologies, listen to user feedback, expect benefits only at scale
- Service providers are emerging
- Diversity of LD implementations, many outside of library domains
- Most implementations are educational or experimental
- Oslo Public Library only one to do original metadata description in production mode
Would every new topic string require its own URI? Kevin Ford: Maybe. TBD.
SINOPIA / Record Manager, competing? Complementary? Nathan: apples and oranges at this point. Record Manager will develop out in some directions that may move towards SINOPIA.
For the publishers, would there be an interest in linking to established NAF forms? Kevin: many would not have entries, and issues with different forms. Things like “s.n.” were removed.
Will there be work on a different editor? Not really, but the LC BF editor can actually be simple to extend and to look at what’s under the hood.
Faceted Subject Access Interest Group
Update from Judy Jeng, co-chair of the FAST Policy and Outreach Committee (FPOC)
Background on FPOC and its requests of members
Various uses for FAST: Used for some minimal cataloging, for knowledge cards, etc.
FAST service infrastructure to be implemented in March 2019, including 24/7 support
The main program/activity was for the room to break up into several discussion tables:
Faceted vocabularies, do they matter to serials? (led by Sophie Dong and Les Hawkins)
Implementing faceted vocabularies in digital repositories (led by Sai Deng)
Evaluating the use of faceted subject terminology in a cataloging environment that lacks a discovery layer (led by Joshua Hutchinson)
Practical and scalable approaches to implement new facets in discovery systems (led by Erin Grant)
Genre form terms in cataloging (led by Nicole Smeltekop and Angela Yon)
Everything about FAST (led by Judy Jeng)
Detailed notes on the table discussions will be available in the future. Some brief notes on one of two tables discussing “Implementing faceted vocabularies in digital repositories”:
Compared to the MARC world the archives side employed a really wide range of platforms for their digital repositories, including Content DM, Islandora, home-grown systems and Hyrax. Some were full institutional repositories, others more digital asset management systems. Many at the table were catalogers and not so involved on the non-MARC side. Several of those at the table were active FAST users. There were some issues of how the terms were supplied: sometimes by catalogers upstream, some by the digital library program, some by conversion of MARC with reconciliation, some from exploding apart LCSH strings when there were no FAST terms.
Library of Congress BIBFRAME Update
They want 100 people on the project test by the end of the Fiscal Year in October, all formats, all languages
They want to test BF-to-MARC conversion tool
Expanding the Pilot
BF to MARC conversion to stop the double-keying that is going on now.
Main problems are the differences in the data models.
What happens to BF works? Map to bibliographic work.
Need to have single MARC record with topics.
Where do AAPs go? BF:Work.
Records for every component? (Yes, in BF model, may not be needed in MARC.)
Non-Latin script models are different. Transcription Model A used in bib records, Model B in authority. (See MARC appendix D: http://www.loc.gov/marc/bibliographic/ecbdmulti.html) Probably no 880s in resulting MARC output.
URI issue: in BF URIs may be in place instead of labels. E.g. for topics, they have URIs for the string and need something for URIs at the subfield level
They have AAPs and Relator in URI form, may not work for many fields where it’s not permitted.
Punctuation: No punctuation at ends of elements in BF. Internal punctuation will be kept.
Conversion results so far: Records are sound; no punctuation at subfield boundaries; information maps into 264 not 260; where do the MARC works go? URIs?
They need to test that these records will work for users. To be done by October 1 if possible.
LC still needs to supply MARC to the community, so their BF to MARC tool is important to have so that they won’t have to catalog everything twice.
Anonymous Resources, Blank Nodes and Providers, Oh My! [More detail in OCLC Linked Data Roundtable presentation reported on above]
Working towards a slimmer BF resource with fewer blank nodes. They can use an rdf:about to identify the resource. The URIs are shareable and usable beyond and within a system.
URI problems: these are commitments
Some things require URIs, like names and topics (See Sally McCallum’s earlier discussion).
Lots of duplicates of anonymous resources.
Look at BF/Entities Providers in id.loc.gov [How does this impact authorities versus transcription?]
Label service at id.loc.gov can try to privilege URIs.
Place, unauthorized names, many other issues.
LD4P Status Update
Focus on LD4P2, building on first phase and implementing, partnering with PCC, other libraries
30 applications, 17 selected
Stanford is developing Sinopia editor tool that integrates LC BF editor, including looking at metadata creation and reuse.
Working with SHARE-VDE to convert MARC to BF, 2 working groups will issue recommendations by end of January, and conversion can begin.
Sinopia use: Training for Cohort, but open to community.
SINOPIA’s results will be part of a data cloud for others to use.
SINOPIA will need to hook out to external resources, e.g. Wikidata.
Discovery developments: Working on developing Blacklight to include linked data. Add data panel, semantic search, incorporate schema.org and other information into “cages.”
LD4: 1 conference a year, 2/8/2019 application deadline.
Project will produce a data pool that will be up for 5 years at Stanford. Cohort members to lead transition to LC; MARC enhancement policies with how to make conversions from MARC cleaner; PCC policies on how to enhance member metadata; partnership with Wikimedia; enhanced discovery with Blacklight; LD4P for international collaboration; end by June 2020
European BIBFRAME Workshop (https://www.slideshare.net/sollbruchstelle/european-bibframe-workshop)
The above-named group includes many European implementers, with Sally McCallum and Philip Schreur included in core panel
Working with mainstreaming the move from MARC to BF.
Hashtags for recent meetings: #eubfws2017 #eubfws2018
Two workshops so far, with participants from 20 countries, including US, Canada and Qatar
Intro of BF data models and editors
Programs include SHARE-VDE: How the project meets the BIBFRAME model (Tiziana Possemato) / BIBFRAME in production: Libris XL, the Swedish Union Catalogue (Sweden is already creating original cataloging in BF) / The Hungarian Common Catalogue in BIBFRAME: using FOLIO for cataloging work / Michalis Sfakakis, MARC to BIBFRAME: Evaluating the extraction of bibliographic families / Osma Suominen, Converting BIBFRAME to Schema.org / Richard Wallis, Three Linked Data Choices for Libraries, Beyond MARC
Breakout sessions, including RDA with BF, Handling a BF dataset, training, workflows around BF data, work-to-work relationships: practice and plans
Included vendors: Ex Libris (Alma), Index Data (FOLIO), OCLC (2018)
SUMMARY: “Nothing fits everything”; it’s all about expanding the community; different stages of development, from experiments to production; different approaches; “critical mass” has been reached
European BF workshop 2019 in Sweden (Stockholm)
Nathan Putnam: OCLC Update
They have mocked up a linked data editor project
Looking at the BF converter: Work IDs are important, URIs are important, OCLC doesn’t have Instance data, working with BF works.
OCLC Hash URIs replacing blank nodes.
Removed duplicate entities if already VIAF or FAST.
Reviewed BF Administrative data.
Tested modifications on WorldCat records.
Now what? Results shared with Global Product Management
What are workflows? Use cases (e.g. circle, provenance, etc). What are desired outcomes? They will work with member libraries, PCC, and advisory group to look at needs
Framing BIBFRAME @ OCLC / OCLC.lc/BIBFRAME-interest / email@example.com
By ALA Annual: more concrete plans about next directions
BF Works only in bib records?
(To Kevin Ford:) Are providers being reconciled to authorities? No. They’re different kinds of data. What is a BIBFRAME Work record? It’s a new invention. Corresponds to name/title records. Conversion to BF eats up features from the current authority record.
How will general catalogers’ works fit into a model that tries to get rid of blank nodes? It’s definitely an issue. Wikidata, vendors, reconciliations may help with this problem.
“Seymour Lubetzky must be turning in his grave from the good news about BIBFRAME work records.”
Metadata Standards Committee
Some issues with both co-Chairs being on the same rotation cycle, issues with people’s appointments not being staggered.
Some talk about LITA and ALCTS merging (more later).
They have used webpage lists for distributions, but now need to morph to email lists, using the Connect email services.
Status/future of Metaware.buzz (http://metaware.buzz/): 4 1/2 to 5 years old, goal was to share resources and analyze/recommend metadata sources and share topics. A LOT OF WORK to maintain. Has been pulled back to posting just some committee publications or meeting minutes. Posting onto Connect is limited behind a wall, so that may not be a good place to park the content for a site originally planned to be a public resource. Some opportunities for sending out content through other means, e.g. LITA blog. Some questions, too, about the purpose of the group, and should it still be around.
Mike Bolam will talk to previous Chairs and gather some thoughts towards another or more focused direction for this group.
Presentation by Jennifer Bowen, ALCTS President-Elect: Proposal to join LITA and ALCTS and LLAMA to form a new division within ALA. There are some timetable issues for an immediate election. There are enough things to resolve before going further ahead. There are big differences in corporate styles, ALCTS is higher overhead, LITA is not. There will be changes at ALA that might encourage joining of some of their 11 divisions. Currently there are some silos within divisions and merging could help. “Something will change.” But the merger might not necessarily go ahead in its current form. From the LLAMA side, there was a thought that LLAMA’s branding might be lost if subsumed within a tech services bent. Some thoughts the opposite could be true. One idea that the LLAMA side could offer a growth potential for people who begin their careers on the technical side. Look at the ALCTS merger page.
Metadata quality assessment framework should be out from this committee, hopefully before ALA. First news on LITA blog.
RDA Linked Data Forum
Gordon Dunsire (presenter), James Hennelly (moderator)
Impact of the 3R Project on the RDA infrastructure
Slides above just posted; the notes below repeat some of the content but might provide more context or explanation.
GitHub has been used to distribute RDA Vocabularies (overview of features: provides full version control; ability to access old versions or roll back to previous if needed)
- Explanation of RDA release #s major.important.minor (e.g. 3.0.15)
- 2.7.3 last official release, supports current RDA Toolkit
- GitHub has a “pre-release” flag, 16 of these so far
- LRM has made big changes that have impacted release numbering, these begin with v 3.0.0 and currently up to 3.0.15
- In GitHub you can get zip or tar.gz files, includes all the terms, maps (including partial mapping to Dublin Core)
- Next full release is imminent, Version 3.1.0, element sets are “stable” with no big changes envisaged, getting it out soon will make it available to translators
- Further release will take care of fixes, plus responses to feedback
- Value vocabularies have been available since May 2018
- Registry data in 3R: parts are machine-generated from the RDA Registry (standard sections and headings)
- DITA standard for XML now replaces former extinct standard used for CMS underlying the Toolkit
- Breakup of Agent into Collective Agent, Corporate Body, Family, Person updated in GitHub then extracted
- Workflow: GitHub to Toolkit data extraction (XML) into RDA content management system, then into RDA Toolkit with CMS stylesheeting
- Element page I for: definition and scope; element reference; related elements
- Resources entirely from GitHub: Glossary, Vocabulary encoding schemes; relationship matrix
- This has resulted in a significant saving of time needed to update the Toolkit
- Identifiers are generated in the extraction process and are used to link parts of the toolkit
- Prerecording and Recording sections are not entirely automatic generations from GitHub
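The major.important.minor release numbering in the list above sorts correctly only when each part is compared as a number. The following Python sketch illustrates that point; the helper name `parse_release` and the optional leading "v" handling are assumptions for the example, not part of the RDA Registry or GitHub tooling.

```python
def parse_release(tag):
    """Split a release tag such as '3.0.15' or 'v3.0.0' into a tuple of integers."""
    major, important, minor = (int(part) for part in tag.lstrip("v").split("."))
    return (major, important, minor)


# Release numbers mentioned in the notes, deliberately out of order.
releases = ["3.0.15", "2.7.3", "3.1.0", "3.0.0"]

# Sorting on the parsed tuples gives the true release order.
ordered = sorted(releases, key=parse_release)
```

Comparing the raw strings would place "3.0.15" before "3.0.2", which is why the parts are converted to integers first.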
SKOS:label (Toolkit label) – canonical property wraps the term
SKOS:definition (Toolkit definition) – canonical property
Owl:inverse-object property<== used for Inverse properties
Skos:altLabel – canonical property
Breakout of Agent roles
Triple: Work – has author – Agent
Explained that the structure of Agent is currently somewhat problematic, with subclass Collective Agent having subclasses Corporate Body and Family
Person subclass directly off of Agent
Work – has author agent – Agent
Work – has author collective agent – CollectiveAgent
Work – has author corporate body – Corporate Body
Etc. for each type of Agent subclass
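The breakout above amounts to one "has author" element per Agent class, each with a narrower range. A small Python sketch of that pattern follows; the dictionary layout and the function names `ancestors`, `author_elements` and `range_accepts` are hypothetical and do not reflect how the RDA Registry actually stores its element sets.

```python
# Class hierarchy as described in the notes: Person sits directly under Agent,
# while Corporate Body and Family sit under Collective Agent.
AGENT_HIERARCHY = {
    "Agent": None,
    "Person": "Agent",
    "Collective Agent": "Agent",
    "Corporate Body": "Collective Agent",
    "Family": "Collective Agent",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    parent = AGENT_HIERARCHY[cls]
    while parent is not None:
        chain.append(parent)
        parent = AGENT_HIERARCHY[parent]
    return chain


def author_elements():
    """One 'has author ...' element label per Agent class (illustrative labels)."""
    return {cls: f"has author {cls.lower()}" for cls in AGENT_HIERARCHY}


def range_accepts(element_range, value_class):
    """True if a value of value_class fits an element whose range is element_range."""
    return value_class == element_range or element_range in ancestors(value_class)
```

Under this sketch a Family is a valid value for "has author collective agent" but a Person is not, which is the kind of explicit constraint the breakout is meant to give application profiles.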
Context of instructions
Application profiles, selection
Consistency, many agent elements already explicit
Gender-inflected roles in RDA translations (e.g., autor/autora) could be managed via further subclassing
Single metadata statement or set is treated as a Work, (a metadata work)
Metadata provenance described using RDA: author? When created? Where did the value come from? Plus other factors as needed from all of RDA
Metadata reification, e.g.:
Work>has identifier for work>”2049-3630” <== this can be wrapped up:
Work [statement above]>has source consulted > ISSN International Register
BF has values that may not be scalable (“has ISBN” etc.)
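The reification example above, where a whole statement becomes the subject of a further statement about its source, can be sketched in Python as follows. The `Statement` class and `render` helper are hypothetical names for illustration; this is not an RDF library and not RDA's own serialization.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Statement:
    """A subject/predicate/object triple whose subject may itself be a Statement."""
    subject: Union[str, "Statement"]
    predicate: str
    obj: str


def render(statement):
    """Print a statement in the arrow notation of the notes, bracketing nested ones."""
    if isinstance(statement.subject, Statement):
        subject = f"[{render(statement.subject)}]"
    else:
        subject = statement.subject
    return f"{subject} > {statement.predicate} > {statement.obj}"


# The identifier statement from the notes, then a provenance statement about it.
base = Statement("Work", "has identifier for work", "2049-3630")
provenance = Statement(base, "has source consulted", "ISSN International Register")
```

Treating the inner statement as a metadata Work in its own right is what lets the same RDA elements (author, date, source consulted) describe provenance.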
13 entities, 2900+ elements
Modeling question: Are ships and spacecraft or performing groups corporate bodies (versus Collective Agent)?
RSC Technical WG, one of only 2 permanent WGs (the other is Translations WG)
Gordon still leading the standing group
Entire WG structure will be reset at the end of April, 2019
Call for participation in WGs in April/May 2019
Context of instructions: all Person relationships in Person entity chapter. The MARC relator list and RDA’s relationship designators came from similar sources, but they are diverging more and more. There’s no liaison between the two; RDA maintains a mapping; may not be a huge issue, with lots of likely future subproperty relationships
Hebrew is a gender-inflected language and was the push to insert the gender typing; the library networks in Israel were unhappy with using just the male form, and were a force in pushing RDA to internationalize with their relationship designators.
Practically, RDA might not sell so well in such countries if it lacks gendered terms that fit their languages.