Data Mining Research Area
Overview: Making Data Work Harder
Libraries have made huge investments in creating and maintaining rich, structured information describing the resources in their collections.
This data embodies considerable value by supporting access and inventory control. It also represents potential value in terms of:
- knowing more about the characteristics of library collections
- generating interesting and innovative data displays
- providing intelligence to support a range of library decision-making needs, including collection development
There is untold value in bibliographic information, but it is largely untapped. If libraries are to realize the full value of their bibliographic data—or, put another way, if libraries are to maximize the return on the investments they make to create this data—steps must be taken to release this value in innovative and useful ways.
Internet giants such as Amazon and Google provide valuable lessons on the importance of squeezing the full value from available data. Whether in the form of book recommendations (if you like this book, you'll also like . . .), search result rankings, targeted advertising, or collection views (e.g., Google Scholar), the "Amazoogle" companies make a concerted effort to release as much value as possible from the data at hand.
Libraries possess rich reservoirs of data. However, this data needs to be made to work harder in order to create value for librarians and users. To this end, the OCLC Research Data-Mining Research Area will focus on projects aimed at creating value from the bibliographic information in WorldCat and other library data sources.
OCLC Research has a number of projects currently underway in the Data-Mining Research Area, with plans for several future projects as well.
- Books as Expressions of Global Cultural Diversity: WorldCat data reveal transnational patterns in literary publishing, the preservation of individual countries’ literary heritage, and the cultural diversity present in the books.
- The Systemwide Print Book Collection: analyzes the size and characteristics of aggregate print book holdings, with an emphasis on implications for digitization and preservation decision-making.
(A version of this presentation was given to the May 2005 meeting of the OCLC Members Council Digital Libraries Research Interest Group.)
- Anatomy of Aggregate Collections: The Example of Google Print for Libraries: This D-Lib Magazine article offers some perspectives on the Google Print Library Project in light of what is known about library print book collections in general, and those of the Google 5 in particular, from information in OCLC's WorldCat bibliographic database and holdings file.
- Audience Levels: infer materials' target audience, or audience level, using holdings information.
- "Last Copy": identify rare or unique materials in individual library collections. This activity was reported in:
Connaway, Lynn Silipigni, Edward T. O'Neill, and Chandra Prabha. 2006. "Last Copies: What's at Risk?" College and Research Libraries, 67,4 (July): 370-379. Pre-print available online at: http://www.oclc.org/research/publications/archive/2006/connaway-crl07.pdf (.pdf: 151K/24 pp.).
- WorldMap: visualize geographic distribution of selected library data. Currently available data include holdings and titles, each by place of publication (from OCLC WorldCat) and number of libraries, librarians, users, volumes, and annual expenditures (from other sources).
- Mining for Digital Resources: identification and characterization of digital resources cataloged in WorldCat.
- Comparative Collection Assessment: looks at collection development, assessment, and resource sharing for print- and e-book collections.
- Publisher name server: prototype service that resolves ISBN prefixes to publisher names; resolves variant publisher names to a preferred form; and captures and makes available various attributes of the publisher's output (e.g., location, language, genre/format, and dominant subject domain).
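To make the "Last Copy" idea in the project list above more concrete, here is a minimal sketch of how rare or unique items might be flagged from holdings information. It is an illustration only, not the method used by OCLC Research: it assumes holdings are available as simple (item identifier, library symbol) pairs, and the function name `last_copies`, the `max_holders` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict


def last_copies(holdings, library, max_holders=1):
    """Return the items held by `library` that at most `max_holders`
    distinct libraries hold in total (hypothetical helper)."""
    # Map each item to the set of distinct libraries holding it,
    # so duplicate holdings records are counted only once.
    holders = defaultdict(set)
    for item_id, lib in holdings:
        holders[item_id].add(lib)

    return sorted(
        item_id
        for item_id, libs in holders.items()
        if library in libs and len(libs) <= max_holders
    )


# Hypothetical sample data: (item identifier, holding library symbol).
SAMPLE_HOLDINGS = [
    ("item-001", "LIB_A"),
    ("item-001", "LIB_B"),
    ("item-002", "LIB_A"),
    ("item-003", "LIB_B"),
    ("item-003", "LIB_B"),  # duplicate record for the same library
    ("item-004", "LIB_A"),
    ("item-004", "LIB_C"),
]

if __name__ == "__main__":
    print(last_copies(SAMPLE_HOLDINGS, "LIB_A"))
```

With the sample data, `last_copies(SAMPLE_HOLDINGS, "LIB_A")` returns `['item-002']`, the only item that LIB_A alone holds. Raising `max_holders` loosens the definition from "unique" to "rare"; where a real analysis would set that threshold is a policy choice this sketch does not address.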
- Connaway, Lynn Silipigni, and Timothy J. Dickey. 2008. Beyond Data Mining: Delivering the Next Generation of Services from Library Data (.ppt: 2.4MB/50 slides). ASIS&T 2008 Annual Meeting, 28 October, Columbus, Ohio (USA).
- Lavoie, Brian. 2008. Mining for Copyright Evidence (.ppt: 841K/16 slides). ASIS&T 2008 Annual Meeting, 28 October, Columbus, Ohio (USA).
- O'Neill, Edward T. 2008. OhioLINK Collection Analysis Project: Preliminary Analysis (.ppt: 3.5MB/26 slides). ASIS&T 2008 Annual Meeting, 28 October, Columbus, Ohio (USA).
- Connaway, Lynn Silipigni, and Larry Olszewski. 2006. A Geographical Representation of WorldCat Resources: A Decision-Making Tool for Acquisitions and Collection Management (.pdf: 1.1MB/26 pp.). XXVI Annual Charleston Conference, 10 November, Charleston, South Carolina (USA).
- Connaway, Lynn Silipigni. 2006. Capturing Untapped Descriptive Data: Creating Value for Librarians and Users (.ppt: 1.9MB/40 slides). ASIS&T 2006 Annual Conference, 8 November, Austin, Texas (USA).
- Lavoie, Brian. 2005. On Lemons and Bibliographic Data...Creating Value through WorldCat Data-mining (.ppt: 177K/14 slides). OCLC Members Council Digital Libraries Research Interest Group meeting, 16 May, Dublin, Ohio (USA).
- Lavoie, Brian, Lynn Silipigni Connaway, and Ed O'Neill. 2005. Mining for Digital Resources: Identifying and Characterizing Digital Materials in WorldCat (.ppt: 112K/15 slides). ACRL 12th National Conference: Currents and Convergence: Navigating the Rivers of Change, 7-10 April, Minneapolis, Minnesota (USA).
- Lavoie, Brian, and Roger C. Schonfeld (Ithaka). 2005. A Systemwide View of Library Collections (.ppt: 300K/35 slides). CNI Spring 2005 Task Force Meeting, 4-5 April 2005, Washington, DC (USA).
- Lavoie, Brian F., Lynn Silipigni Connaway, and Edward T. O'Neill. 2007. "Mapping WorldCat's Digital Landscape." Library Resources and Technical Services (LRTS), 51,2 (April): 106-115. Available online at: http://www.ala.org/ala/alcts/alctspubs/librestechsvc/LRTS_51n2Lavoie.pdf (.pdf: 522K/10 pp.).
- Jackson, Mary E., Lynn Silipigni Connaway, Edward T. O'Neill, and Eudora Loh. 2007. "Changing Global Book Collection Patterns in ARL Libraries." Report prepared for the Global Resources Network. Available online at: http://www.arl.org/resources/pubs/grn_global_book.shtml or http://www.crl.edu/sites/default/files/attachments/pages/grn_global_book.pdf.