Please note: This experimental research project has concluded.
The research prototype application is no longer supported or maintained by OCLC services, and information on this page is provided for historical purposes only. Some portion of this content may be out-of-date and include broken links. Please visit the OCLC Research website to learn more about our current research.

Audience Level

Library materials can be characterized in a variety of ways. One is the type of reader believed to be interested in a particular item. Such an indicator, generally known as the audience level, is potentially useful for a variety of activities, including the development of new ways to improve information relevance for retrieval, reference services (including readers' advisory), and collection development. Audience-level filters could be implemented in existing retrieval systems to assist users in finding content based on their information needs.

The Audience Level prototype and its related research project were part of a broader data mining activity at OCLC Research, which sought to explore various ways to leverage intelligence from system files, and to "make data work harder." Determining a monograph's audience level is a challenge because cataloging rules generally do not require inclusion of this information. Thus, many bibliographic records have no explicit indicator of target audience. OCLC researchers hypothesized that audience level could be inferred from the types of library (such as Association of Research Libraries (ARL), non-ARL academic, public, and school) holding the material.

 

Background

Determining a monograph's audience level is difficult because no bibliographic practice or standard requires this information in the bibliographic record. The exceptions are the fixed field in the Machine-Readable Cataloging (MARC) record and the Library of Congress Subject Headings (LCSH) subdivision often used to identify juvenile literature and fiction. Thus, many bibliographic records have no direct indication of the target audience for the item represented.

Impact

The findings from this research were expected to support new ways to improve information relevance for retrieval, reference services (including readers' advisory), and collection development.

Audience-level filters could be implemented in existing retrieval systems to assist users in finding content based on their information needs.

This effort was one of several data mining projects through which OCLC Research sought to extract intelligence from existing data and use it in ways that provide value to libraries.

About the Audience Level Prototype

This prototype system, developed in conjunction with the Audience Level research project, used library holdings data in WorldCat to calculate audience levels for books represented in the WorldCat database.

The audience level was then expressed as a decimal between 0.01 (juvenile books) and 1.00 (scholarly research works).

The Audience Level prototype was accessible in two ways:

  • a user interface
  • web services that accepted an OCLC number, ISBN, or ISSN

An initial experiment with Greasemonkey scripts for Firefox proved to be exciting but too high-maintenance for long-term support.

The user interface

Users could access the Audience Level prototype and input an OCLC WorldCat number, an ISBN (International Standard Book Number), or an ISSN (International Standard Serial Number) for a periodical. (See sidebar for additional information on how to find one of these numbers.)

The system would return an assessment of the likely audience level of the item based on the holding patterns and bibliographic characteristics of the item, as described in the WorldCat record.

This assessment was represented numerically, along with title, author, and a summary of the WorldCat holdings used to calculate the audience level of the item.

The audience-level assessment also was represented graphically by a bar chart.

More information about the audience-level calculation was available by clicking the "Manifestations" link that appeared on the chart. This displayed a list of all the different physical realizations of the work used to calculate its audience level.

Manifestation-level data displayed included OCLC number for each manifestation, language and date of the manifestation, and number of libraries holding the manifestation.

In addition to the stand-alone Audience Level prototype, aggregate data from the OCLC Audience Level was made available in the prototype from the WorldCat Publisher Pages activity.

In addition to the user interface described above, the Audience Level prototype was available as a web service:

Web service

Audience Level web services returned XML in response to either an OCLC number or an ISBN.

Methodology

Recognizing that different types of libraries typically serve different populations, OCLC researchers considered whether library types could be related to audience levels. They decided to explore whether the pattern of holdings of materials in WorldCat might be leveraged to provide an audience-level indicator.

OCLC researchers hypothesized that audience level could be inferred from the types of library holding the material, if the holdings symbols were weighted by a numeric code for library type.

OCLC's WorldCat database provided an excellent data source for this project because it contained more than 50 million bibliographic records and a billion holding locations at the time this work was conducted.

The fixed field in the Machine-Readable Cataloging (MARC) record included a "Target Audience" indicator (008/22), described as: "The intellectual level of the audience for which the item is intended." The following table lists these codes and the audiences they represent, along with the weight we assigned to each code.

If the Target Audience indicator existed in a title's MARC record, the title was assigned the Audience Level as indicated in this table.

MARC code   Description                                   Audience Level
a           preschool                                     0.0
b           primary (K - 3)                               0.1
c           elementary and junior high (grades 4 - 8)     0.15
j           juvenile (through age 15 or grade 9)          0.15
d           secondary (grades 9 - 12)                     0.25
e           adult                                         N/A
f           specialized                                   N/A
g           general                                       N/A
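
The table above amounts to a simple lookup, sketched here in Python. The names TARGET_AUDIENCE_LEVELS and level_from_target_audience are hypothetical and not part of the prototype. The table does not say how codes marked N/A were handled, so this sketch returns None for them and leaves that decision to the caller.

```python
from typing import Optional

# Audience levels for MARC 008/22 Target Audience codes, copied from the table above.
# Codes e, f, and g are listed as N/A in the table, so they map to None here.
TARGET_AUDIENCE_LEVELS = {
    "a": 0.0,   # preschool
    "b": 0.1,   # primary (K - 3)
    "c": 0.15,  # elementary and junior high (grades 4 - 8)
    "j": 0.15,  # juvenile (through age 15 or grade 9)
    "d": 0.25,  # secondary (grades 9 - 12)
    "e": None,  # adult
    "f": None,  # specialized
    "g": None,  # general
}


def level_from_target_audience(code: Optional[str]) -> Optional[float]:
    """Return the table's audience level for a code, or None if no level applies.

    None is returned for a missing or blank code, for an unknown code,
    and for the codes the table marks N/A (assumption: the caller decides
    what to do in those cases).
    """
    if code is None:
        return None
    return TARGET_AUDIENCE_LEVELS.get(code.strip().lower())
```

For instance, level_from_target_audience("j") yields 0.15, while a blank code or the code "e" yields None.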

If the Target Audience indicator did not exist, an audience level was calculated for the title based on the library holdings data attached to the bibliographic record.

Each bibliographic record in OCLC has some number of holdings symbols attached to it. These symbols represent the individual libraries that are said to "hold" the item represented by the record.

Researchers determined the type of library for each holdings symbol in the database. They used four main categories: Association of Research Libraries (ARL) members, academic (non-ARL), public, and school. Any library symbols that did not fit into one of these groups were discarded.

After the library type of each holdings symbol was determined, researchers assigned a weight to each library type:

Library type Weight
ARL 1.0
Academic 0.67
Public 0.33
School 0.0

Once the weights were assigned, researchers constructed an indication of audience level by averaging the weights of the holdings symbols on the record. The formula for this averaging is:

[ (Number of ARL holdings symbols on the record * 1.0)
+ (Number of academic-library holdings symbols on the record * 0.67)
+ (Number of public-library holdings symbols on the record * 0.33)
+ (Number of school-library holdings symbols on the record * 0.0) ]
/ (Total number of holdings symbols on the record)
= The average library-type weight of libraries holding the item.

For example, say we have a record with the following holdings symbols:

1 ABC DEF GHI JKL MNO

where 1 is the OCLC number for the item, and ABC, DEF, etc. are the holdings symbols. Suppose ABC, DEF, and GHI are ARL libraries, JKL is an academic (non-ARL) library, and MNO is a public library. The formula used to determine audience level for this item would be:

[(3 * 1.0) + (1 * 0.67) + (1 * 0.33)] / 5 = 0.8.
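
The averaging step can be sketched in Python as follows. The function and variable names are hypothetical. The sketch assumes that each holdings symbol has already been mapped to one of the four library types, and that the denominator counts only the symbols that remain after unclassified ones are discarded. The example reuses the counts and weights from the formula above: three holdings weighted 1.0, one weighted 0.67, and one weighted 0.33.

```python
# Library-type weights from the table above.
LIBRARY_TYPE_WEIGHTS = {
    "ARL": 1.0,
    "Academic": 0.67,
    "Public": 0.33,
    "School": 0.0,
}


def holdings_audience_level(library_types):
    """Average the library-type weights over the holdings symbols on one record.

    library_types holds one entry per holdings symbol. Entries that are not
    one of the four known types are dropped, mirroring the discard step
    described earlier. Returns None when no usable holdings remain.
    """
    weights = [
        LIBRARY_TYPE_WEIGHTS[library_type]
        for library_type in library_types
        if library_type in LIBRARY_TYPE_WEIGHTS
    ]
    if not weights:
        return None
    return sum(weights) / len(weights)


example = ["ARL", "ARL", "ARL", "Academic", "Public"]
print(round(holdings_audience_level(example), 2))  # 0.8
```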

Furthermore, we used this method to determine the audience level of a FRBR work by finding all of the items in that work and computing the average (weighted by holdings) of each of their respective audience levels. For example, consider a workset containing the following items:

1 5 .8
2 10 .76
3 7 .94

Where {1,2,3} are the OCLC numbers, {5,10,7} are the holdings counts that were used to compute the audience level, and {.8,.76,.94} are the respective audience levels of each item. The average audience level for the work would then be computed by:

[(5 * 0.8) + (10 * 0.76) + (7 * 0.94)] / (5 + 10 + 7) = 0.826
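
The same holdings-weighted average can be written as a short Python function. This is a sketch with hypothetical names, assuming each item in the work is supplied as a (holdings count, audience level) pair.

```python
def work_audience_level(manifestations):
    """Return the holdings-weighted mean audience level for a work.

    manifestations is an iterable of (holdings_count, audience_level) pairs,
    one per item in the work. Returns None if the total holdings count is zero.
    """
    items = list(manifestations)  # allow generators to be passed in
    total_holdings = sum(count for count, _ in items)
    if total_holdings == 0:
        return None
    weighted_sum = sum(count * level for count, level in items)
    return weighted_sum / total_holdings


# The workset from the example above: (holdings count, audience level).
workset = [(5, 0.8), (10, 0.76), (7, 0.94)]
print(round(work_audience_level(workset), 3))  # 0.826
```

Passing the records of a whole collection instead of a single work gives the collection-level measure in the same way.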

This approach can be used to calculate overall audience-level measures for collections or other groups of records.

The overall audience-level assessment for the WorldCat database itself was 0.63, as of January 2008.

A wrinkle

We believe this approach produced interesting and usable results. For example:

Title                                                       Author           ISBN        Audience Level
Operations Research for Libraries and Information Agencies  Kraft & Boyce    012424520X  0.78
The Kite Runner                                             Khaled Hosseini  1573222453  0.43
The Da Vinci Code                                           Dan Brown        0385504209  0.43
Harry Potter and the Sorcerer's Stone                       J.K. Rowling     0590353403  0.15

These values, which are for the FRBR work, are approximately what one would expect.

Of course, one must remember what this approach measures. For example, if one were to assign a 'reading level' to Nietzsche's Thus Spake Zarathustra (ISBN 0394608089), one might expect it to be high, perhaps 0.8 or higher. However, we return a score of 0.61.

As a classic of philosophy this title has a wide potential audience, and is widely represented in public, academic and ARL collections. The manifestation-level records display audience-level measures ranging from 0.33 to 1.0.

OCLC researchers explored various ways to account for and manage such distributional effects.

Feedback

This approach provided an indication of audience level. Was it useful? How could it be used? We are interested in your ideas! Please let us know what you think.

 

Outputs

  • Lynn Silipigni Connaway, and Timothy J. Dickey. 2008. "Beyond Data Mining: Delivering the Next Generation of Service from Library Data." Presented on panel, "Transforming Data into Services: Delivering the Next Generation of User-Oriented Collections and Services" at the American Society for Information Science & Technology 2008 Annual Meeting, Columbus, OH, October 28, 2008.
  • Edward T. O'Neill, Lynn Silipigni Connaway, and Timothy J. Dickey. 2008. "Estimating the Audience Level for Library Resources." Journal of the American Society for Information Science & Technology, 59(13), 2042-2050.
  • Lynn Silipigni Connaway. 2004. "Estimating Audience Level of Monographs Using Holding Patterns in WorldCat." Presented at Library Research Seminar III: Learning and Growing; Inquiry into librarianship, October 14-16, 2004, Kansas City, Missouri. Available online at: http://www.oclc.org/research/presentations/connaway/lrsIII_audience.ppt (PowerPoint:32MB/29slides)
  • OCLC Research Data Mining activities

 

Most recent updates: Page content: 2012-06-19

Lead

Lynn Silipigni Connaway

Team Members

Ed O'Neill

Jeremy Browning

Timothy J. Dickey

Where can I find an ISBN, ISSN, or OCLC WorldCat number?

  • ISBNs and ISSNs appear on records in many library catalogs.
  • ISBNs and ISSNs also are displayed on many WorldCat records, which can be located through WorldCat.org or WorldCat partner sites.
  • Online bookstores frequently display ISBNs for specific titles.
  • OCLC numbers appear on records in WorldCat and other FirstSearch databases.
  • Some library catalogs may display OCLC numbers for individual titles.