Research project idea
OCLC Research will prototype a service which:
- Resolves ISBN prefixes to publisher name
- Resolves variant publisher names to a preferred form
- Captures and makes available for use various attributes of individual publishers. Specifics are to be determined, but the following are anticipated:
  - Location of publisher
  - Language(s) of materials published
  - Genre(s)/format(s) of materials published
  - Dominant subject domain(s) of the publisher's output
  - Parent company and subsidiaries
The Publisher Name Server prototype maps variant publisher names to a preferred form and resolves ISBN prefixes to publisher name. It also compiles all known information regarding relationships among publishers: acquisitions, imprints, subsidiaries, mergers, joint ventures, etc.
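The two lookups described above can be pictured with a minimal sketch. It is not the prototype's implementation: the tables, the publisher names, and the function names `publisher_for_isbn` and `preferred_name` are all hypothetical, and the prefix values are invented rather than real ISBN assignments. The sketch assumes a longest-prefix match for ISBNs (ISBN-10, or ISBN-13 beginning with 978) and a simple normalized-string lookup for variant names.

```python
import re

# Hypothetical prefix table (registration group + registrant digits).
# These entries are invented for illustration, not real ISBN assignments.
PREFIX_TO_PUBLISHER = {
    "99901": "Example Press",
    "999012": "Example Press Academic",
}

# Hypothetical table of normalized variant names and their preferred form.
VARIANT_TO_PREFERRED = {
    "example pr": "Example Press",
    "example press inc": "Example Press",
}


def normalize_isbn(isbn):
    """Drop hyphens and spaces, and remove the 978 prefix of an ISBN-13."""
    digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
    if len(digits) == 13 and digits.startswith("978"):
        digits = digits[3:]
    return digits


def publisher_for_isbn(isbn):
    """Return the publisher for the longest matching prefix, or None."""
    digits = normalize_isbn(isbn)
    # Try the longest candidate prefix first so that a more specific
    # entry wins over a shorter one.
    for end in range(len(digits) - 1, 0, -1):
        name = PREFIX_TO_PUBLISHER.get(digits[:end])
        if name is not None:
            return name
    return None


def preferred_name(raw):
    """Map a variant publisher name to its preferred form if it is known."""
    key = re.sub(r"[^\w\s]", "", raw).lower()
    key = " ".join(key.split())
    # Unknown names are returned unchanged (apart from surrounding spaces).
    return VARIANT_TO_PREFERRED.get(key, raw.strip())
```

With these invented tables, `publisher_for_isbn("978-99901-3-111-1")` resolves through the shorter prefix, while `preferred_name("Example Press, Inc.")` collapses to the preferred form. A real service would need far richer matching than exact lookup on a normalized string.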
The project's purpose is to facilitate collection intelligence investigations and services, both internally and externally.
This project will benefit acquisitions, collection analysis, and data mining projects.
The primary deliverable of the project is a service that will support advanced collection intelligence by:
- facilitating the reliable clustering of collected objects based on their issuing entity (as determined from metadata about the objects), and
- gathering intelligence about the nature of individual publishers, which can in turn be used alone or in tandem with other data sources (e.g., usage logs, holdings) to reveal critical collection intelligence, acquisition patterns, and user behavior.
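As a rough illustration of clustering objects by issuing entity, the sketch below groups bibliographic records under a preferred publisher name. The record layout, the `PREFERRED` table, and the function `cluster_by_publisher` are hypothetical and invented for this example; they are not taken from the prototype.

```python
from collections import defaultdict

# Hypothetical variant-to-preferred mapping, invented for illustration.
PREFERRED = {
    "example pr": "Example Press",
    "example press": "Example Press",
    "sample house pub": "Sample House",
}


def cluster_by_publisher(records):
    """Group record ids under the preferred publisher name."""
    clusters = defaultdict(list)
    for record in records:
        # Light normalization: trim spaces and trailing punctuation, lowercase.
        key = record["publisher"].strip().rstrip(".,").lower()
        clusters[PREFERRED.get(key, "(unresolved)")].append(record["id"])
    return dict(clusters)


records = [
    {"id": 1, "publisher": "Example Pr."},
    {"id": 2, "publisher": "Example Press"},
    {"id": 3, "publisher": "Sample House Pub."},
    {"id": 4, "publisher": "Unknown Imprint"},
]
clusters = cluster_by_publisher(records)
```

Records 1 and 2 land in the same cluster despite their different imprint statements, and names that cannot be resolved are kept in a separate bucket so they can be reviewed rather than silently dropped.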
The primary high-level requirements are for the service to achieve acceptable reliability in resolving:
- ISBN prefixes to publisher name
- Variant publisher names to a preferred form
  - Primary emphasis: addressing names in Latin script
  - As time and resources allow: addressing names in other scripts
Although the impetus for the project is chiefly to facilitate collection intelligence investigations and services, the prototype service may have value to a wide range of parties inside OCLC, including units engaged in activities such as:
- Content acquisition/licensing: the service may be useful for revealing publishers with desirable output/consumption patterns.
- Metadata processing: "publisher" can be a valuable match point for duplicate record resolution and other activities.
This project will likely have synergies with OCLC Research's FRBR-related activities, such as xISBN, and may itself prove instrumental as a tool in other current or future OCLC Research activities.
Success will be measured in two ways:
- Mechanical: The delivery of a working prototype service
- Data reliability: The various associations made in the database must be complete and reliable in accordance with specified standards
The project will be considered complete when:
- The prototype service has been built and delivered
- The data delivered are complete and reliable in accordance with specified standards
- Alternatively, if the proposed service cannot be built satisfactorily within the time allowed and with the resources available, the project shall be considered concluded when a formal determination to that effect has been made and the project is discontinued.
The project will adopt two primary research modes:
- Consultation: Experts within OCLC will be consulted as specifications are written to ensure the best possible results. Additionally, in anticipation that the service might prove useful beyond the bounds of the research project, input will be sought about non-research requirements and the relative value of various data that might be included in the database.
- Prototyping/trial-and-error: Interested OCLC staff will be invited to test and provide feedback on the prototype
This will be a twelve-month project, divided into three phases:
- Phase 1: Resolve ISBN prefixes to publisher name
- Phase 2: Resolve variant publisher names to a preferred form
- Phase 3: Capture and make available for use various attributes of individual publishers. The specifics of this phase will be determined as work progresses, but the following are anticipated:
  - Location of publisher
  - Language(s) of materials published
  - Genre(s)/format(s) of materials published
  - Dominant subject domain(s) of the publisher's output
  - Parent company and subsidiaries
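The anticipated attributes could be held in a simple record per publisher. The sketch below is only one possible shape, under the assumption that each attribute is stored as plain text or a list of strings; the class `PublisherProfile`, its field names, and the helper `corporate_root` are hypothetical and not part of the project specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PublisherProfile:
    """Hypothetical record holding the attributes anticipated for Phase 3."""
    preferred_name: str
    location: Optional[str] = None
    languages: List[str] = field(default_factory=list)
    genres_formats: List[str] = field(default_factory=list)
    subject_domains: List[str] = field(default_factory=list)
    parent: Optional[str] = None
    subsidiaries: List[str] = field(default_factory=list)


def corporate_root(name: str, profiles: Dict[str, PublisherProfile]) -> str:
    """Follow parent links upward until no further parent is recorded."""
    seen = set()
    # The `seen` set guards against accidental cycles in the parent links.
    while name in profiles and profiles[name].parent and name not in seen:
        seen.add(name)
        name = profiles[name].parent
    return name


# Invented example data: an imprint owned by a press owned by a group.
profiles = {
    "Example Imprint": PublisherProfile("Example Imprint", parent="Example Press"),
    "Example Press": PublisherProfile(
        "Example Press", parent="Example Group", subsidiaries=["Example Imprint"]
    ),
}
```

Storing the parent as a name rather than a nested object keeps each record flat, and a helper such as `corporate_root` can then roll imprints up to their top-level owner when holdings are aggregated.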
The current Publisher Name database (as of September 2009) contains information on more than 1,850 publishers and imprints, including:
- the top 25 publishers (by ISBN prefix) in WorldCat from the United States;
- the top 20 publishers from the United Kingdom;
- the top 10 publishers from Canada, Australia, Germany, France, the Netherlands, Japan, Italy, China, the Russian Federation, Spain, Finland, Australia, Taiwan, and New Zealand;
- the top 10 university presses;
- any publisher involved in a merger or acquisition since 2001.
The imprints from this set of top publishers represent roughly 9 million WorldCat records and 440 million holdings.
Of the single preferred publisher name forms identified through programmatic data mining and research, 93% correspond to the established form in the LC/NACO Name Authority File, Books in Print, or the International ISBN Registry. The variant and former names were mined from 57,000 imprint statements in WorldCat records.
In addition to the internal use of the database, aggregate data from the OCLC Publisher Name Authority File will be available in the prototype WorldCat Publisher Pages.
- Connaway, Lynn Silipigni, and Timothy J. Dickey. 2008. "Data Mining, Advanced Collection Analysis, and Publisher Profiles: An Update on the OCLC Publisher Name Authority File." Presentation given at the XXVIII Annual Charleston Conference, 7 November 2008, Charleston, South Carolina (USA). Available online at: http://www.oclc.org/research/presentations/connaway/charleston2008.ppt (.ppt: 761K/33 slides).
- Connaway, Lynn Silipigni, and Timothy J. Dickey. 2008. "Beyond data mining: Delivering the next generation of service from library data." Presented on the panel "Transforming Data into Services: Delivering the Next Generation of User-Oriented Collections and Services" at the American Society for Information Science & Technology 2008 Annual Meeting, 28 October 2008, Columbus, Ohio (USA).
- Connaway, Lynn Silipigni, and Akeisha Heard. 2005. "Publisher Name Authority Project: An Attempt to Enhance Data Mining for Collection Analysis & Comparison." Presentation given at the XXV Annual Charleston Conference, 4 November 2005, Charleston, South Carolina (USA). Available online at: http://www.oclc.org/research/presentations/connaway/charleston2005.ppt (.ppt: 183K/39 slides).
- Connaway, Lynn Silipigni, and Akeisha Heard. 2005. "Publisher Name Authority Project: An Attempt to Enhance Data Mining for Collection Analysis & Comparison, A Selective Bibliography." Available online at: http://www.oclc.org/research/projects/publisherns/bibliography.pdf (.pdf: 22K/3 pp.).
Most recent updates: Page content: 2009-08-11