A Symposium for Publishers and Librarians

Report on OCLC's Symposium for Publishers and Libraries

March 18-19, 2009

On March 18th and 19th representatives from libraries, the publisher supply chain and organizations supporting these communities met at OCLC's Conference Center in Dublin, Ohio to discuss metadata needs, practices, lifecycle and economics across the communities and to explore opportunities for change.

This report is a high-level summary of the proceedings, outcomes and proposed next steps. Participant biographies, agenda, presentations, related reading and upcoming events can also be accessed from this Symposium website.

Purpose of the Symposium:

  • Explore current models for creation, distribution and maintenance of publisher supply chain and library metadata
    • Are they sustainable?
    • What are the common needs?
    • Are they subject to duplication of effort across communities?
    • To what extent are they shared and interoperable?
  • Explore new paradigms for metadata creation, distribution and maintenance that:
    • Are more easily shared and interoperable
    • Start upstream and allow metadata to evolve over time
    • Engage multiple communities in the metadata lifecycle

Karen Calhoun welcomed the group and Jay Jordan then addressed the symposium.
View Jay's slide presentation [PDF]

Renee Register introduced the symposium and outlined the agenda.
View Renee's slide presentation [PDF]

Panel Discussion:  Upstream metadata

Reference: Renee Register's slide presentation [PDF]

Philip Madans, Hachette, Director of Publishing Standards and Practices

  • Phil has been with Hachette for 18 years and in the industry for 25 years. He talked about the evolution of publisher metadata and systems and the need to create and distribute more robust metadata in the web environment.
  • Ten years ago metadata was considered data for internal use and not for external, with a definite separation between IT and publishing.
  • Most metadata was contained within transaction systems.
  • Exposure of this legacy metadata from varied internal systems within new systems exposed problems with consistency and quality of metadata.
  • Relationships with online retailers and direct end-user relationships have driven the move to more standardized electronic metadata and the growth of ONIX over the past ten years.
  • Amazon lobbied for the importance of metadata for sales and the exposure of metadata on Amazon made errors and skimpy metadata more visible.
  • Publishers continue to look for new ways to connect to readers through enhanced metadata.
  • Moving to XML workflows allows metadata to live with content.
  • Control of metadata is also important to publishers, and it is sometimes difficult to find a balance between publisher control of metadata and others in the supply chain that may add value or change metadata.

Question: Is the main publisher motivation to improve metadata to sell books to end users?

Response: The goal is to sell books but also to get our books to people in any form and help them to discover our materials. A recent example is Twilight by Stephenie Meyer: sales benefited from end-user exposure to information about the title. The online community motivates people to buy the books. We want to control the metadata. There are lots of third-party metadata providers, but the publisher's data will be more accurate and reflect current information.

Question: Are most North American publishers depending exclusively on BISAC for subject headings?

Response: Yes, there really isn't another source but Hachette has explored other options too.

Brenna McLaughlin, Electronic and Strategic Initiatives Director, AAUP

  • The Association of American University Presses has 134 non-profit members.
    • Members are humanities and social sciences publishers
    • Average staff size of members: 40 FTE but many are much smaller
    • Most members are mid- to small-size and struggle with ONIX to serve distribution channels
    • Members want the association to explore a service for them that will help with ONIX conversion and distribution
    • Johns Hopkins has a single proprietary database that creates ONIX data and they license database to other presses
    • ONIX standard is just not standard enough
    • ONIX is great but not inexpensive or trouble free
    • Cornell University Press distributes data through templated spreadsheets
    • Bibliographic data is exported into sales system
    • Sales assistants spend 20-25% of their time on metadata
    • Trying off-the-shelf data conversion programs
    • University of Georgia press also exports
    • Concerns: authority of data, that third party metadata will override press metadata
    • University presses believe some areas of ONIX do not conform well to their needs for description/identifications. Many university presses have their own standards.
    • University Presses can be disconnected from the institution and may use outdated IT systems.
    • Even when the university library catalogs all the publications as part of the CIP program and/or for university OPACs, there is no feed back to publishers' systems

Diane Boehr, Head of Cataloging, National Library of Medicine

View Diane's slide presentation [PDF]

  • The role of cataloger is changing.
  • Manually creating data is inefficient and redundant and we would like to stop doing that.
  • Libraries will not be able to catalog everything—we need to be more flexible.
  • Reviewed recommendations from the LC report on the future of bibliographic control—eliminate redundancies, make more use of upstream metadata
  • NLM does original cataloging for CIP.
  • Literally highlighting and pasting records—need to find a way to stop doing that. (potential CIP data workflow)
  • Fully automate CIP process. CIP workflows are still costly. Let catalogers focus on authority work and subject analysis and making unique and hidden treasure available.
  • Only 10% of publishers are using ONIX, but they account for 80% of titles cataloged at NLM.
  • Where are these conversion programs going to come from? OCLC? Split up amongst groups?
  • The three U.S. national libraries catalog in their own ILS (Voyager)
  • Doesn't have to be OCLC—NLM has created a form of its own conversion program and LC could do it too.
  • Specify mapping between major vocabularies
  • Author identifiers—many people in communities looking at this but we still need one standard or we will not be any better off

Patricia Payton, Senior Director, Publisher Relations & Content Development, R.R. Bowker

  • Bowker supports four areas of business: identifiers (ISBN Agency); discovery products such as Books in Print and Syndetics; transactional services such as Pub-Easy and Pub-Net; and business intelligence products and services
  • Data coverage: Bowker accepts all kinds of files to get a base record and build on it. Concerned about timeliness of updates. Has a web portal where updates can be loaded.
  • Adds value to data and sends it downstream to make it more discoverable; takes both MARC and ONIX
  • Format requirements: Prefers ONIX but will take other formats and will reformat the data.
  • Agree that there are many flavors of ONIX but we need to acknowledge why this is.
  • Bowker is looking at what constitutes quality data and what it can add to improve it
  • Bad data in an old system can just be converted with ONIX.
  • Bowker services include:
    • Authenticate ISBNs.
    • Monitor use of ISBNs in the market.
    • Will do cross market comparisons across different countries.
    • Hold educational webinars on value of ISBN for publishers. Do a lot of marketing to publishers to help them understand importance of these items (focused on size of publisher).
  • Comment: Nielsen offers similar services in the UK.
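
One small, well-defined piece of "authenticating" an ISBN, as listed among the Bowker services above, is checking that the number is structurally valid. The following is a minimal sketch of the standard ISBN-13 check-digit calculation only; it is not a description of Bowker's actual process, and the function names are illustrative assumptions. It does not confirm that an ISBN was ever assigned to a publisher or title.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits.

    Digits are weighted alternately 1 and 3 from the left; the check
    digit brings the weighted sum up to a multiple of 10.
    """
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the string is a structurally valid ISBN-13.

    Hyphens and spaces are ignored. This checks form only, not whether
    the number has been registered with an ISBN agency.
    """
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])
```

A check like this catches keying errors such as a mistyped final digit, but quality problems like a valid ISBN attached to the wrong title still need record-level comparison.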

Suzanne Kemperman, Publisher Relations at NetLibrary

  • Metadata is required for production--in order to convert files NetLibrary must work with the metadata
  • Most publishers do not have data in ONIX: of 500+, 20 have data in ONIX
  • Most publishers they work with do not have data in a content management system
  • Also use data for cataloging materials to provide to libraries—using metadata from the publishers to start the process but cataloger labor is used as well
  • Would all like to see more metadata that brings value to the end user when searching for the data.
  • Need to arrive at a place where data standards are clear and agreed upon.
  • Lots of international publishers are not using ONIX data.
  • Electronic materials may not be going through same system as print publication—that is why ONIX is missing
  • Trade publishers have embraced ONIX more

Question: What percentage of metadata is coming in ONIX?

Response: 10% of publishers, 50% of content.

Breakout session 1:  Metadata formats, standards and best practices

Host: Maureen Huss

Participants: Mark Bide, Editeur
Ruth Fischer, R2
Judy Luther, Informed Strategies
Michael Healy, BISG
Ken Chad, Ken Chad Consulting
Bob Pearson, OCLC
Richard Roberts, OCLC
Maureen Huss, OCLC
Moriana Garcia, Kent State
Jean Godby, OCLC
Brenna McLaughlin, AAUP
Phil Madans, Hachette
Athena Salaba, Kent State
Karen Calhoun, OCLC

Summary of group discussion

  • 300 publishers are using ONIX, including Hachette and Simon & Schuster
  • The recent best practices explain and humanize the standards. Many believe ONIX is not easy to understand. Information on best practices distills why it matters and what really matters.
  • BISG has added a certification layer—look at records and assign a grade of quality.
  • ONIX is deliberately permissive and not designed to be restrictive.
  • Allows ONIX to be appropriate to local markets where things differ.
  • Hachette went through certification process. Surprised by what they found.
  • Does there have to be an intermediary like Bowker?
  • Good metadata is not anything that a publisher can do on its own. They have to achieve it with others like Amazon. Ex. Firebrand. Publishers don't become publishers to produce metadata.
  • Curious about question of motivation; publisher to sell books vs. libraries to provide information.
  • Common motivation is discovery.
  • The library is somewhere that people are still going to search. Users still want a common experience whether it's Amazon or the library.
  • An intersection here is the internet—convergence of info between libraries and publishers.
  • Libraries are paying for services to link to TOC but what if you could link to that data from an ONIX record? It would save a lot of time and money.
  • Administrative information—there is a disconnect between library and publisher. Commonalities in bibliographic data.
  • How serious is it that the goals of publisher and librarian are different? Never been clearer than when I was managing Nielsen for 6 years. Refined changes were in price and availability
  • Is our mission fatally doomed because we're 2 communities with 2 different missions?
  • It is fatal if we think we can solve it in a systematic way. We don't want to homogenize the data, but we have to homogenize the way we share the data.
  • Issue of discovery is an important concept.
  • As a retailer I am interested in getting someone something on beekeeping—I don't care which one is better, I want to give the customer what they want. Libraries have a different approach.
  • We do have some common ground.
  • I don't know why we have to map it—we just need an identifier that links it.
  • There is a brittle technology that could be a barrier to the seamless linking. When you link something you have to have the bandwidth to do that.
  • Just because you want the data doesn't mean Worldcat will give it to you.
  • What are the impediments of having a standard?
  • Legacy systems—moving from print to electronic--trying to bolt electronic to the side of print just doesn't work. Management of e products just can't be done on the side of a system designed for print.
  • Some publishers have enacted their own restrictions on that standard.
  • ONIX philosophy of the format to be more flexible—will see this even more with ONIX 3.
  • Distinction between publisher/library--issue with series treatment--issues can be immense for a library. Series titles changes or even changes in abbreviations can cause immense work further down.
  • There are rules, but can you make people obey them?
  • We're a creative business [publishing] and it's a marketing issue. We pay people to be creative and try things differently.
  • Some of that goes back to the different motivations.
  • If we are going to look at our backlist and try to sell that stuff—we don't even have metadata for that. Libraries have better data on that than we do.
  • Sometimes my titles change by accident. My system should reflect my business rules. Does the system reflect my business rules—error or done on purpose? It's not a standards fault.
  • Am I wrong in thinking there are only half a dozen fields that would be different within ONIX? Is the CIP data in a form that could be fed back to the publisher?
  • You can put it in an ONIX record but you can't put it in a publisher system.
  • Everything starts with a common map that links up to an agreement between individual organizations.
  • That's what best practices rules should be driving us towards. That solves the issue of too many publisher agreements.
  • What is the barrier? Cost benefit. Who is going to bear the cost?
  • Creating efficiency in the supply chain creates an imbalance—someone benefits more and another bears the cost unequally.
  • Is there something we can do to mitigate the costs?
  • Are libraries a minority market for publishers and does it have any effect on the ability to get to common ground? So it doesn't rise to the top of the agenda.
  • It's a cost of service.
  • Anecdotally it's dropping for university presses too.
  • Because of the economy we [libraries] are switching budget focus, buying fewer books (hard copies) and trying to get as much for free as we can.
  • With all of the different trends within libraries today (budgets, staffing, professionals), is this the right time?
  • We [libraries] are really switching to a consortium model. We don't really have the staff; the only thing we have is each other. Cooperative collection development and cataloging.
  • Another idea is patron-driven acquisition. This changes the dynamic quite a bit. Given the kind of budget pressure we're going to see over the next few years, this is going to catch on. It strikes me that it is going to be more important.
  • The library hasn't leveraged its niche as an intermediary helping users connect to a book regardless of where it comes from. Example: secondhand booksellers on Amazon.
  • Issue viewpoint on patron as a taxpayer.
  • Different approach as academic vs. public library and how people want to access info—online versus going to the place.
  • Libraries can be more open and they can add to the ONIX record.
  • Libraries are inflexible in a way because of standards and because of their systems. It's not free, it becomes a problem. The systems don't support.

Breakout Session 2: Subject Analysis and Terminologies

Host: Joan Mitchell, DDC

Participants: Diane Boehr, NLM
Diane Vizine-Goetz, OCLC
Michael Panzer, OCLC
Tschera Connell, OSU
Kevin Clair, Penn State
James Yanchak, Taylor & Francis
Renee Register, OCLC
Libbie Crawford, OCLC
Marcia Zeng, Kent State

Summary of Group Discussion

  • Discussion of terminologies as used in approval plans.
  • Discussion about terminologies as used in production.
  • Use data about data to create tools.
  • Library representatives should serve on the BISAC Subject Heading Committee
  • Definition of subject is squishy across user groups.
  • Use of social (user-supplied) metadata
  • Libraries have little access to user generated info
  • Why is CIP data not in ONIX?
  • Not all libraries are the same -- children's vs. research.
  • Need to explore subject info in other contexts—ex. Geographic in map form
  • BISAC as first piece of evolving subject metadata
  • Library considers longevity of subject
  • N. American focus of BISAC
  • International?
  • Need to open up multilingual
  • Why BISAC
  • No library data is available at the time most BISACs are assigned; there is a learning curve to applying library metadata. The downside to a standard is "how do I use it". How to change silos? How to change tools to be more intuitive?
  • BISACs may be assigned very loosely by editors or others without subject knowledge
  • Some pieces in a record may even be in silos.
  • Power of mapping.
  • Important to understand that DDC is no longer about shelving.
  • Different types of libraries need and use different vocabularies.
  • How to structure terminologies so they are of most use?
  • Break down the record as a whole; think about each element.
  • Consider encoding (and source) at the field level.
  • Promote flexibility when taking in different data.

Breakout Session 3: Identifiers and Authorities

Host: Janifer Gatenby

Participants: Patricia Payton, R.R. Bowker
David Martin, Editeur
Brian Greene, International ISBN Agency
Jane Burke, Serials Solutions
Suzanne Kemperman, NetLibrary
Thom Hickey, OCLC
Bob Van Volkenburg, OCLC
Phil Schreur, Stanford
Eric Childress, OCLC
Lorcan Dempsey, OCLC
Todd Carpenter, NISO
Timothy Dickey, OCLC
Cindy Cunningham, OCLC
Diane Boehr, NLM
Don Hamparian, OCLC
Kay Covert, OCLC

The session consisted of a presentation by Janifer Gatenby, followed by group discussion.
View Janifer's slide presentation [PDF]

Summary of Group discussion

  • ISNI and how it'll work; joining library and trade data to create synergy
  • Resources in WorldCat without identification
  • How does the author discover their ISNI number?
  • How do publishers discover numbers that have been assigned to their future authors?
  • Closely linked to ISTC
  • Discrepancies between FRBR and ISTC
  • Concern over not processing enough unpublished material
  • Including institutional repository and museum data
  • Look at ClaimID and ZoomInfo to see if there is a potential link
  • Need for a wiki for authors to be able to update things
  • Advantages and disadvantages of adding ISBNs to long tail in WorldCat
  • Even if ISBN is assigned it wouldn't be in the physical item; would be useful for linking digital to original resource
  • We need a way of linking identifier—reproduction to original and mechanisms for broadcasting those
  • Administration—how do we do it?
  • Scoping out all of the audiences that would be affected before a decision is made to centralize or de-centralize

Panel discussion:  Uses of Metadata and Interoperability

Reference: Renee Register's slide presentation [PDF]

Moderator: Todd Carpenter, NISO

Panelists:
Mark Bide, Editeur
Ken Chad, Ken Chad Consulting
Rick Lugg, R2 Consulting
Karen Calhoun, OCLC

Rick Lugg, R2 Consulting

  • Spent time working for an approval plan vendor and that really hasn't changed much since he left 10 years ago.
  • In most cases metadata has multiple purposes in the library workflow. Subject bibliographers use records for business and for acquiring materials—usually a brief record. Cataloging records usually don't have a field for price. Point: metadata has many different purposes.
  • After materials are purchased and received the library is looking for a full record—might get from cataloging partners but still not intended to be descriptive data. Data might be used to create an item record or holdings record.
  • Title status information is very important when record information is harvested. Libraries hate to encumber funds before an item is actually available.
  • Title status is important to publishers and the supply chain as well.
  • Classification is interesting too—LCSH, NLM, Dewey, etc. Various classifications are based on different concepts and different audiences.
  • Some things that people do—verify every field, authority checking, parse call numbers, might rekey a TOC for a record, adding fields in 856 for electronic version of the title.
  • Value add—who adds value at what point—this is labor intensive
  • Not strictly bibliographic metadata—a lot of transactional uses of the data too. People are adding a lot of layers and value on the local level too.
  • Post cataloging work at libraries is largely verification
  • 856 maintenance is time consuming for e-resources.
  • Question: Any guess of proportion of these processes?
  • Response: Depends on libraries. More time is spent at smaller libraries because they still follow an older model of value add. Could be up to 40% of cataloger time spent on other things.
  • Question: What can libraries do to be more efficient in this process?
  • Response: An enriched OPAC with evaluative content is attractive to folks; people are falling under the weight of cataloging CJK, media, etc. There is an opportunity for savings in streamlining.

Mark Bide, Editeur

  • What can we do with standards like ONIX and MARC? We can do some things but we can't do it all.
  • It's not enough to say we have to have communication standards, we have to have systems that can use them. It's not enough for me to send you data -- do you have a system that can use that data?
  • Cost benefit analysis—where can we find the efficiencies. If I'm going to invest money but you get the benefit why should I do it? Is improving somebody's bottom line to my advantage?
  • Silos exist on both sides.
  • There is a great tendency to talk about ONIX as a design of systems but that's not what it's about. It's about a communication into ONIX and out of ONIX.
  • MARC was a communication standard and it became a data system. It's not a very good one.
  • I would not build an ONIX based system; I would build a system that can ingest data.
  • We don't want systems that are all the same because we want competition.
  • It doesn't make sense to do everything in MARC if it doesn't fit your business needs.

Ken Chad, Ken Chad Consulting

  • Libraries are a diminishing niche
  • In many ways, Google is a library company
  • ILS systems are part of the problem. Now they are obstructing the interoperability and flow of records.
  • It's all about investment.
  • I don't think inefficiencies in the library are the motivator
  • Impetus for change will be to remain competitive. This will be more of a motivation.
  • E-Books—librarians don't know what is available. The diversity of platforms and arrangements makes it difficult. And quality of data received is very poor.
  • Institutional repositories: don't look at how to ingest; need to consider the motivation for putting it in there.
  • Why isn't all of this data free? Allow others to innovate by using this data?
  • We need to build new business models to get out of current hole
  • Working on UK report about metadata workflows
  • Libraries need to be more competitive. Recommender systems?
  • The quality of data available for e-books is perceived to be poor
  • Who are the stakeholders?
  • Is there a business model?
  • What are the implications of open data for new business models?
  • Why does the British Library charge for data?

Karen Calhoun, OCLC

  • Challenges the notion of a disconnect between libraries and booksellers.
    1. Jay's remarks this morning
    2. Metadata quality report—what end users want and what libraries want are very different
    3. More product, less process
  • Web scale: WorldCat now has an extremely diverse source of metadata, with many agreements, including with publishers.
  • WorldCat is a very large aggregation of [library] metadata that also distributes and manages data for many other sources.
  • I see collaboration with publishers as being a key to creating a compelling user environment for end users.
  • The recent OCLC metadata quality report:
    • A lot of the difference has to do with workflows in the library and with end users
    • End users want more evaluative content, more links to online content (not just text but media too). Librarians bring expectations for data around what they do but classical expectations too.
  • ONIX has a great deal to offer in the marriage of these expectations.
  • In order to be effective on the web you have to have scale.
  • More product, less process: report from Greene and Meissner (archivists)
    • Conclusion was that archivist tradition does not scale and must change.
    • Will our traditions scale? And how might we change to make it better.
  • Comment: Libraries spend almost as much time trying to keep people out as they do trying to get people to use it.
  • Karen: Yes, but we need to make sure we're getting the right people to use it.
  • Some important uses of metadata:
    • Search engine optimization
    • Content creation
    • Author royalties
    • Approval plans
    • Production driven by subject
    • Collection analysis
    • Link publisher and library data

Panel Discussion:  Collaboration and Innovation

Moderator: Cindy Cunningham, OCLC

Reference: Renee Register's slide presentation [PDF]

Ruth Fischer, R2 Consulting

  • R2 is working on a report commissioned by LC regarding the distribution of MARC records in North America. This list of distributors is much larger than originally anticipated and consists of a very diverse group of entities.
  • LC wants to understand how these entities are distributing, what is their purpose for distributing, do they make money distributing records or would they rather not distribute records.
  • The report will attempt to identify redundancies.
  • Libraries will be categorized by type and level of expertise in cataloging
  • Reporting will not be on a granular level so as not to identify any one entity
  • The report is due back to LC by the end of June
  • Report will not recommend what LC should do but will describe the current circumstances in regards to distribution of bibliographic records
  • Examples of survey questions:
    • Where are backlogs?
    • Who should approach authority control?
    • Upgrade in local catalog only or also in bibliographic utilities?
    • Identify who makes Z39.50 database available.
  • Publishers sell books but similar goals for libraries -- they want collections to be used
  • More product less process mix up metadata
  • Webscale -- not two dimensional
  • Growing importance of user contributed data
  • Mass digitization
  • Point of concentration of metadata
  • ONIX has a great deal to offer
  • Move from value in creating to value of transmitting
  • Q: Why is LC doing this?
  • A: They are not sure they can sustain the current level of bibliographic work.

Judy Luther, Informed Strategies

  • Judy is working on a report commissioned by NISO and OCLC examining current workflows and practices for metadata creation, distribution and maintenance across the life cycle of titles for the publisher supply chain and libraries.
  • Global Data Synchronization Network approved by GS1. Real-time synchronization and exchange of data, as with Wal-Mart. What are the implications for metadata?
  • Only about one-quarter of book data is sent to LC for CIP record creation.
  • Publisher incentive to use CIP is that it gets distributed into the supply chain and allows libraries and booksellers to know there are books in the pipeline.
  • But the number of records vendors are processing is 2-3 times the number of books actually created because of the different formats (books, CDs, ebooks, etc.).
  • MARC record stability is a myth.
  • It makes more sense to have the metadata created closer to the source—much of which can be pulled from the publisher's XML files.
  • There is financial argument to be made for pay-per-use which can be driven by quality of metadata provided.
  • There is no overarching organization that can handle management of MARC records and other metadata on a global level—including libraries, publishers, booksellers, etc.

David Martin, Editeur

  • David provided the group with an update on ONIX 3.0—was to be published week of March 30
  • Not backwards compatible with 2.n series—changes with new series are significant enough that they will break with previous versions.
  • New version takes a more thorough approach to description of digital materials.
  • New ways to handle packaged materials or multiple items
  • New ways to describe publication dates and how things are supplied in different areas of the market.
  • New ways to link to various related products or works
  • Great flexibility in how they update records—delta files instead of complete replacement
  • Probably six months before we see much market adoption
  • Allows for more structured data including facilities for describing chapters as separate entities

Brief presentations, OCLC Programs & Research

Reference: Renee Register's slide presentation [PDF]