Audience Level

This activity is now closed. The information on this page is provided for historical purposes only.

There are a variety of ways to characterize library materials. The type of reader believed to be interested in a particular item is one. Such an indicator, generally known as the audience level, is potentially useful for a variety of activities, including the development of new ways to improve information relevance for retrieval, reference services (including readers' advisory) and collection development. Audience-level filters could be implemented in existing retrieval systems to assist users in finding content based on their information needs.

The Audience Level prototype and its related research project are part of a broader data mining activity at OCLC Research, which seeks to explore various ways to leverage intelligence from system files, and "make data work harder."

Determining a monograph's audience level is a challenge because cataloging rules generally do not require inclusion of this information. Thus, many bibliographic records have no explicit indicator of target audience. OCLC researchers hypothesized that audience level could be inferred from the types of library (such as Association of Research Libraries (ARL), non-ARL academic, public, and school) holding the material.

The data presented in the Audience Level prototype are current as of January 2008.

Background

Determining a monograph's audience level is difficult because there is no bibliographic practice or standard requiring the inclusion of this information in the bibliographic record, except for the fixed field in the Machine-Readable Cataloging (MARC) record and the Library of Congress Subject Headings (LCSH) subdivision often used to identify juvenile literature and fiction. Thus, many bibliographic records have no direct indication of the target audience for the item represented.

Impact

The findings from this research will benefit the development of new ways to improve information relevance for retrieval, reference services (including readers' advisory) and collection development. Audience-level filters could be implemented in existing retrieval systems to assist users in finding content based on their information needs. This effort is one of several data mining projects whereby OCLC Research seeks to extract intelligence from the data we have, and use it in different ways that provide value to libraries.

About the Audience Level Prototype

This prototype system, developed in conjunction with the Audience Level research project, uses library holdings data in WorldCat to calculate audience levels for books represented in the WorldCat database. The audience level is then expressed as a decimal between 0.01 (juvenile books) and 1.00 (scholarly research works).

The Audience Level prototype is accessible in two ways: through a user interface and as a web service.
An initial experiment with Greasemonkey scripts for Firefox proved to be exciting but high-maintenance, so it is no longer supported.

Try out the user interface

Access the Audience Level prototype and input an OCLC WorldCat number, an ISBN (International Standard Book Number), or an ISSN (International Standard Serial Number) for a periodical. (See sidebar for additional information on how to find one of these numbers.) The system will return an assessment of the likely audience level of the item based on the holding patterns and bibliographic characteristics of the item, as described in the WorldCat record. This assessment is represented numerically, along with title, author, and a summary of the WorldCat holdings used to calculate the audience level of the item.

The audience-level assessment also is represented graphically by a bar chart. More information about the audience-level calculation is available by clicking on the "Manifestations" link that appears on the chart. This will display a list of all the different physical realizations of the work used to calculate its audience level. (Be aware! Some works, such as those near the top of the OCLC Top 1000 list, have thousands of manifestations. Worksets such as these can take several moments to load into your browser.) Manifestation-level data displayed include the OCLC number for each manifestation, the language and date of the manifestation, and the number of libraries holding the manifestation.

In addition to the stand-alone Audience Level prototype, aggregate data from the OCLC Audience Level will be available in the prototype WorldCat Publisher Pages. In addition to the user interface described above, the Audience Level prototype is available as a web service.

Web service

The Audience Level web service is available from: http://audiencelevel.oclc.org/AudienceLevel/webServ/ for returning XML from OCLC database number inputs, and from: http://audiencelevel.oclc.org/AudienceLevel/webISBNServ/ for returning XML from ISBNs.
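The two URL patterns above could be assembled along the following lines. This is only a sketch under stated assumptions: the helper names are hypothetical, the idea that an OCLC number is appended to the webServ path in the same way an ISBN is appended to the webISBNServ path is not confirmed by this page, and stripping hyphens from the ISBN is likewise assumed. Because the activity is closed, the code only builds strings and makes no network request.

```python
# Base path taken from the two web service URLs given on this page.
WEB_SERVICE_BASE = "http://audiencelevel.oclc.org/AudienceLevel"


def oclc_number_url(oclc_number):
    """Build a lookup URL for an OCLC number.

    Assumption: the number is appended directly to the webServ path.
    """
    return f"{WEB_SERVICE_BASE}/webServ/{oclc_number}"


def isbn_url(isbn):
    """Build a lookup URL for an ISBN.

    Assumption: hyphens and spaces are removed before the ISBN is appended.
    """
    digits = isbn.replace("-", "").replace(" ", "")
    return f"{WEB_SERVICE_BASE}/webISBNServ/{digits}"


if __name__ == "__main__":
    print(isbn_url("0716601036"))
```

Keeping the base path in one constant means both helpers change together if the service were ever rehosted.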

Example: The URL string: http://audiencelevel.oclc.org/AudienceLevel/webISBNServ/0716601036 will produce the audience-level assessment for The World Book Encyclopedia workset (0.08), corresponding to the input ISBN 0-7166-0103-6.

Methodology

Recognizing that different types of libraries typically serve different populations, OCLC researchers considered whether library types could be related to audience levels. They decided to explore whether the pattern of holdings of materials in WorldCat might be leveraged to provide an audience-level indicator. OCLC researchers hypothesized that audience level could be inferred from the types of library holding the material, if the holdings symbols were weighted by a numeric code for library type. OCLC's WorldCat database provides an excellent data source for this project because it contains more than 50 million bibliographic records and a billion holding locations.

The fixed field in the Machine-Readable Cataloging (MARC) record includes a "Target Audience" indicator (008/22), described as: "The intellectual level of the audience for which the item is intended." The following table lists these codes and the audiences they represent, along with the weight we assigned to each code. If the Target Audience indicator exists in a title's MARC record, the title is assigned the Audience Level as indicated in this table.
If the Target Audience indicator does not exist, an audience level is calculated for the title based on the library holdings data attached to the bibliographic record. Each bibliographic record in OCLC has some number of holdings symbols attached to it. These symbols represent the individual libraries that are said to "hold" the item represented by the record.

Researchers determined the type of library for each holdings symbol in the database. They used four main categories: Association of Research Libraries (ARL) members, academic (non-ARL), public, and school. Any library symbols that did not fit into one of these groups were discarded. After the library type of each holdings symbol was determined, researchers assigned a weight to each library type:

ARL: 1.0
Academic (non-ARL): 0.67
Public: 0.33
School: 0.0
Once the weights were assigned, researchers constructed an indication of audience level by averaging the weights of the holdings symbols on the record. The formula for this averaging is:

[(Number of ARL holdings symbols on the record * 1.0) + (Number of academic-library holdings symbols on the record * 0.67) + (Number of public-library holdings symbols on the record * 0.33) + (Number of school-library holdings symbols on the record * 0.0)] / (Total number of holdings symbols on the record) = the average library-type weight of libraries holding the item.

For example, say we have a record with the following holdings symbols:
where 1 is the OCLC number for the item, and ABC, DEF, etc. are the holdings symbols. Suppose ABC, DEF, and GHI are ARL libraries, JKL is an academic (non-ARL) library, and MNO is a public library. The formula used to determine audience level for this item would be: [(3 * 1.0) + (1 * 0.67) + (1 * 0.33)] / 5 = 0.8.

Furthermore, we can use this method to determine the audience level of a FRBR work by finding all of the items in that work and computing the average (weighted by holdings) of their respective audience levels. For example, consider a workset containing the following items:
Where {1,2,3} are the OCLC numbers, {5,10,7} are the holdings counts that were used to compute the audience level, and {.8,.76,.94} are the respective audience levels of each item. The average audience level for the work would then be computed by: [(5 * 0.8) + (10 * 0.76) + (7 * 0.94)] / (5 + 10 + 7) = 0.826
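
The two averaging steps described above can be sketched in Python as follows. This is an illustrative sketch, not the prototype's actual code: the function names and the dictionary keys are hypothetical, while the weights (1.0, 0.67, 0.33, 0.0) and the example figures come from the formulas on this page. The record example uses the same terms as the worked formula (three holdings weighted 1.0, one weighted 0.67, one weighted 0.33).

```python
# Library-type weights as given in the averaging formula on this page.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {"arl": 1.0, "academic": 0.67, "public": 0.33, "school": 0.0}


def record_audience_level(counts):
    """Average library-type weight over the holdings on one record.

    counts maps a library type to its number of holdings symbols.
    Returns None when the record has no holdings of a recognized type.
    """
    total = sum(counts.get(kind, 0) for kind in WEIGHTS)
    if total == 0:
        return None
    weighted = sum(WEIGHTS[kind] * counts.get(kind, 0) for kind in WEIGHTS)
    return weighted / total


def work_audience_level(items):
    """Holdings-weighted average of item-level audience levels.

    items is a list of (holdings_count, audience_level) pairs.
    Returns None when the total holdings count is zero.
    """
    total = sum(holdings for holdings, _ in items)
    if total == 0:
        return None
    return sum(holdings * level for holdings, level in items) / total


if __name__ == "__main__":
    record = record_audience_level({"arl": 3, "academic": 1, "public": 1})
    work = work_audience_level([(5, 0.8), (10, 0.76), (7, 0.94)])
    print(f"{record:.3f}")  # 0.800
    print(f"{work:.3f}")    # 0.826
```

Returning None for records with no usable holdings is a design choice of this sketch; the page does not say how the prototype treats such records.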
This approach can be used to calculate overall audience-level measures for collections or other groups of records. The overall audience-level assessment for the WorldCat database itself is 0.63.

A wrinkle

We believe this approach produces interesting and usable results. For example:
These values, which are for the FRBR work, are approximately what one would expect. Of course, we need to remember what this approach measures. For example, if one were to assign a 'reading level' to Nietzsche's Thus Spake Zarathustra (ISBN 0394608089), one might expect it to be high, maybe 0.8 or higher. However, we return a score of 0.61. As a classic of philosophy this title has a wide potential audience, and is widely represented in public, academic and ARL collections. The manifestation-level records display audience-level measures ranging from 0.33 to 1.0. OCLC researchers continue to explore ways to account for and manage such distributional effects.

Feedback

This approach gives an indication of audience level. Is it useful? How could it be used? We are interested in your ideas! Please let us know what you think.

Outputs

Team Members

Most recent updates: page content 25 January 2010, prototype 11 February 2008.