OCLC Developer Network

Tips for Specific Indexes

This page describes indexes that have unusual features or normalization rules. Also check out the general notes on searching indexes here and see the indexes offered at each Service Level. Additionally, information about all the MARC subfields used for all WorldCat indexes, including the ones offered in the API, is found at http://www.oclc.org/us/en/support/documentation/worldcat/searching/searchworldcatindexes/default.htm#search_worldcat_intro.fm

The Keyword index is described first; the remaining indexes are listed in alphabetical order.

Keyword

The keyword index for the WorldCat Search API is not the same index used in the WorldCat.org service. It matches the index used in the cataloging and FirstSearch versions of the WorldCat database. The main difference is that standard numbers other than the ISBN, which are included in the WorldCat.org index, are not included in this index. So, for example, the OCLC number and ISSN are not included in this index.

The Keyword search finds information in the author, title, subject, notes, publisher, publisher location, ISBN, year, year 2 and a few fields specific to the keyword search (034/a,b,d,e,f,g, 052/a,b, 255/a,b,c,d,e).

The Year and Year2 data from the 008 field is indexed following the same rules used in the Year index, which are given below.

The ISBN is indexed without hyphens. However, any keyword search term that meets the characteristics of an ISBN and is entered with hyphens will be automatically concatenated by the search processor.
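
The hyphen handling described above can be pictured with a small sketch. The page does not say which "characteristics of an ISBN" the search processor checks, so the test used here (10 characters ending in a digit or X, or 13 digits, once hyphens are removed) is an assumption, and concatenate_if_isbn is a hypothetical helper name, not part of the API.

```python
import re

def concatenate_if_isbn(term):
    """Strip hyphens from a term that looks like a hyphenated ISBN.

    Assumption: "looks like an ISBN" means 9 digits plus a digit or X,
    or 13 digits, after the hyphens are removed.
    """
    compact = term.replace("-", "")
    if "-" in term and re.fullmatch(r"\d{9}[\dXx]|\d{13}", compact):
        return compact
    return term  # ordinary hyphenated words are left alone

print(concatenate_if_isbn("0-306-40615-2"))
print(concatenate_if_isbn("well-known"))
```

Under these assumptions the first call prints the ISBN without hyphens and the second returns the term unchanged.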

A geographic field found only in the keyword index is 052/a,b. This field is useful to map catalogers. The 052/a subfield is indexed alone. In any record containing 052/b, each 052/b is also indexed concatenated with 052/a, without spaces, as a single word. So, a record with 052 #a 1234 #b P4 #b C2 is searchable as 1234, 1234p4, and 1234c2.
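
As an illustration only, the 052 rule can be sketched in a few lines of Python. The function name index_terms_052 is hypothetical, and lowercasing is done here simply to mirror the way the example terms are written above.

```python
def index_terms_052(subfield_a, subfields_b):
    """Return the search terms for one 052 field.

    The #a value is a term on its own; each #b value is appended to #a
    with no space to form an additional single-word term.
    """
    a = subfield_a.strip().lower()
    terms = [a]
    for b in subfields_b:
        terms.append(a + b.strip().lower())
    return terms

print(index_terms_052("1234", ["P4", "C2"]))  # ['1234', '1234p4', '1234c2']
```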

Access Method

The most direct search of internet resources is the Access Method search, which searches the URLs found in some WorldCat records. The characters between the punctuation in the URL are the "words" that are searched for. For example, http://www.oclc.org is searched by the "words" www, oclc, and org. The most distinctive term in the string is the most useful one to search on. All the stopwords apply to this index, plus two additional stopwords, "http" and "https".

This index can be used to pull up sets of internet resources that share a common URL. For example, adding the access method word “hathitrust” to a search retrieves records that include a URL to the HathiTrust Digital Library. Or include the proximity search “books google” to limit the results to records that have a link to Google Books.
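
To see which "words" a URL contributes, the splitting can be sketched as follows. This is a rough approximation: it treats every non-alphanumeric character as punctuation and filters only the two extra stopwords named above, since the general stopword list is not given on this page. access_method_words is a hypothetical name.

```python
import re

# Only the two URL-specific stopwords named on this page; the general
# stopword list is not reproduced here.
URL_STOPWORDS = {"http", "https"}

def access_method_words(url):
    """Split a URL on punctuation and drop the URL-specific stopwords."""
    words = re.split(r"[^0-9A-Za-z]+", url.lower())
    return [w for w in words if w and w not in URL_STOPWORDS]

print(access_method_words("http://www.oclc.org"))  # ['www', 'oclc', 'org']
```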

Author indexes

The author index actually includes all people and corporations that have participated in the creation of the item, including sponsors, editors, directors, actors, illustrators and many other roles. The Personal Name index includes only people. The Corporate/Conference Name index includes only organizations such as corporations as well as meetings and conferences.

There are no stopwords in the Author, Corporate and Conference Name, and Personal Name indexes.

Dewey Class Number

This is available as an index when using the full service level. At the default service level it is available only as a limit.

The data has been normalized so that all spaces and punctuation were removed except periods. The Dewey Decimal index also indexes classification numbers up to each slash or prime mark. So a Dewey number of 123.4/56/789 will be indexed as 123.4 and 123.456 and 123.456789.
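
The segmentation rule can be sketched like this. It is a minimal illustration, assuming the slash and the apostrophe-style prime mark are the only segment separators; dewey_index_terms is a hypothetical helper name.

```python
import re

def dewey_index_terms(number):
    """Return the cumulative index terms for a segmented Dewey number."""
    terms, current = [], ""
    # Assumption: "/" and "'" are the segment marks.
    for segment in re.split(r"[/']", number):
        # Drop spaces and punctuation other than periods.
        current += re.sub(r"[^0-9A-Za-z.]", "", segment)
        if current and (not terms or terms[-1] != current):
            terms.append(current)
    return terms

print(dewey_index_terms("123.4/56/789"))  # ['123.4', '123.456', '123.456789']
```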

DLC Limit

The DLC limit identifies records that were either cataloged by the United States Library of Congress or cataloged under the auspices of a United States national program such as CONSER or PCC. The only value to search is “y” to limit the result to include only these records.

Government document number

This data was normalized by removing all punctuation including periods and by removing all spaces.
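
A client that wants to pre-normalize a search term the same way could use something like the sketch below. The helper name strip_punctuation and the sample numbers are illustrative only; the keep_periods option reflects the variant described for the Dewey and Library of Congress class number indexes, where periods are retained.

```python
def strip_punctuation(value, keep_periods=False):
    """Remove spaces and punctuation, optionally keeping periods."""
    return "".join(
        ch for ch in value
        if ch.isalnum() or (keep_periods and ch == ".")
    )

# Government document number: everything but letters and digits goes.
print(strip_punctuation("Y 4.F 76/2:S.hrg.98-10"))  # Y4F762Shrg9810
# Class number style: periods survive.
print(strip_punctuation("QA 76.73 .P98", keep_periods=True))  # QA76.73.P98
```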

ISBN

A search can be entered either with all the hyphens or with the hyphens removed and the ISBN concatenated. Both the 10-digit and 13-digit forms are available for all older ISBNs.

ISSN

This data is searchable only by including the hyphen in the number.

LCCN

The Library of Congress Control Numbers can be searched in a variety of ways. The numbers can be searched with the hyphen added or with the zero fill characters that are also used to store the number. So for example, sn92-1234 is searchable as 92-1234 or 92001234 or sn92-1234 or sn92001234.
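
The four forms in the example can be derived mechanically. The sketch below assumes the serial part is zero-filled to six digits (which matches 92001234 above) and also accepts four-digit years; both the helper name lccn_search_forms and the accepted input pattern are assumptions, not a description of the actual index.

```python
import re

def lccn_search_forms(lccn):
    """List hyphenated and zero-filled forms of an LCCN such as sn92-1234."""
    m = re.fullmatch(r"([a-z]*)(\d{2}|\d{4})-(\d{1,6})", lccn.strip().lower())
    if m is None:
        raise ValueError(f"unrecognized LCCN form: {lccn!r}")
    prefix, year, serial = m.groups()
    hyphenated = f"{year}-{serial}"
    zero_filled = f"{year}{serial.zfill(6)}"  # assumption: six-digit serial
    forms = [hyphenated, zero_filled]
    if prefix:
        forms += [prefix + hyphenated, prefix + zero_filled]
    return forms

print(lccn_search_forms("sn92-1234"))
# ['92-1234', '92001234', 'sn92-1234', 'sn92001234']
```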

Language limit

The language index not only includes the primary language value found in the MARC 008, but it also includes all the languages when an item includes multiple languages. It also includes the language of summaries and additional textual material.

The Language index includes both the three-letter codes for languages found in the OCLC Marc Code Lists (ISBN 1-55653-169-9) and the expanded value for that language code in English. The language index can also be searched by the two-character ISO codes for languages. Where an ISO code corresponds to a MARC language, records with either the ISO code or the MARC code are retrieved.

While the English values for these codes are also searchable, the only precise search term is the three-letter code. The English search terms may group language codes together. For example, English as a word search brings together Modern English, Old English, and Creole or Pidgin English.

Please also see the Primary Language limit described below.

Library Holdings limit

This limit identifies records to which a given library has attached its holdings within the OCLC WorldCat database. The search term to use is the OCLC symbol. To find the OCLC symbol for an institution, please see Find an OCLC Library.

Library Holdings Group limit

This index limits results to items held by a set number of libraries. A higher number of holding libraries indicates that more libraries thought the item important enough to purchase. It also indicates that the item is more likely to be available by Interlibrary Loan. Similarly, limiting results to items held by only one or very few libraries can indicate what may be unique or rare in a library’s collection.

The index has numeric codes that can be searched to limit the results to records with that number of OCLC libraries holding the item. Only one value can be searched, and only by using the SRU relation “=”. These numbers cannot be ranged. For example, it isn’t possible to search “>” 17, but the equivalent predetermined range (100 or more holdings) is available by searching the code 08.

The following codes can be searched:

Number of Library Holdings    Search code
5 or more holdings            05
10 or more holdings           06
50 or more holdings           07
100 or more holdings          08
500 or more holdings          09
No holdings                   10
1 holding only                11
2 - 4 holdings                12
5 - 9 holdings                13
10 - 24 holdings              14
25 - 49 holdings              15
50 - 74 holdings              16
75 - 99 holdings              17
100 - 149 holdings            18
150 - 199 holdings            19
200 - 299 holdings            20
300 - 399 holdings            21
400 - 499 holdings            22
500 - 599 holdings            23
600 - 699 holdings            24
700 - 799 holdings            25
800 - 899 holdings            26
900 - 999 holdings            27
1,000 - 1,499 holdings        28
1,500 - 1,999 holdings        29
2,000 - 2,499 holdings        30
2,500 or more holdings        31
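
For client code that starts from a raw holdings count, the table can be turned into a small lookup. The sketch below only encodes the codes listed above; the names EXACT_BANDS, AT_LEAST, and band_code are hypothetical.

```python
import bisect

# (lowest count in band, search code) for the non-overlapping bands 10-31.
EXACT_BANDS = [
    (0, "10"), (1, "11"), (2, "12"), (5, "13"), (10, "14"), (25, "15"),
    (50, "16"), (75, "17"), (100, "18"), (150, "19"), (200, "20"),
    (300, "21"), (400, "22"), (500, "23"), (600, "24"), (700, "25"),
    (800, "26"), (900, "27"), (1000, "28"), (1500, "29"), (2000, "30"),
    (2500, "31"),
]

# The predefined "N or more" codes 05-09, keyed by threshold.
AT_LEAST = {5: "05", 10: "06", 50: "07", 100: "08", 500: "09"}

def band_code(count):
    """Return the code of the band that contains a given holdings count."""
    if count < 0:
        raise ValueError("holdings count cannot be negative")
    lows = [low for low, _ in EXACT_BANDS]
    return EXACT_BANDS[bisect.bisect_right(lows, count) - 1][1]

print(band_code(80))   # 17
print(AT_LEAST[100])   # 08
```

Because only the “=” relation is supported, a threshold that is not in AT_LEAST cannot be expressed as a single code.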

Library of Congress Class number

The data has been normalized so that all spaces and punctuation were removed except periods.

Material type limit

The Material type index searches the record to identify different kinds of items. The complete list of codes can be found here.

While many of these codes are also searchable in the Primary document type index, a given code does not always retrieve the same type of results in both indexes. The codes that have different meanings between these two indexes are listed here.

Articles and Artifacts

  • art is for Article, chapters, papers, etc. as the primary document type in document type index

  • acp is for the same type of material, with additional article items, in material type index

  • art is for 3-d items or artifacts in the material type index

Books

  • bks is for Books that are primarily books and not articles or internet resources in the document type index

  • bks is for Books or Text of any kind in the material type index, including articles and internet resources cataloged as text

  • bnu is for Books that are not internet resources, including additional items, in the material type index

Maps

  • map is for Cartographic Material including maps in the document type index

  • cmt is for Cartographic material including records with additional cartographic information in the material type index

  • map is only for items whose record has an 007 indicating maps, in the material type index

Open Digital limit

The limit returns records describing digital content contributed to WorldCat from open access digital repositories. Records contain URLs that point to the digital file.

The searchable values are:

  • cntnt = digital records from CONTENTdm repositories added to WorldCat via the Digital Collection Gateway
  • cntnt OR dgcnt = all digital records added to WorldCat via the Digital Collection Gateway
  • cntcoll OR dgcoll = all digital collection records added to WorldCat via the Digital Collection Gateway

Primary document type limit

The Primary document type assigns a single document type to the record. It first determines whether the record qualifies as an Internet Resource. If it is not an Internet Resource, then it is assigned the document type based on the value in the Leader field of the MARC record.

The searchable values are:

  • art Articles
  • bks Books
  • com Computer files
  • int Continually updated resources
  • map Maps
  • mix Mixed material (Archival Materials)
  • sco Musical scores
  • ser Serials (Journals and Magazines)
  • rec Sound recordings
  • url Internet Resource
  • vis Visual Materials

While many of these codes are also searchable in the Material Type index to retrieve any record that is that type of document, not all the codes retrieve the same results. See the information on the Material Type index above.

Primary language limit

The primary language index searches the three-letter codes for languages found in the OCLC Marc Code Lists (ISBN 1-55653-169-9). There is only one primary language code per record, determined by the cataloger and stored as the three-character language code in the 008 of the MARC record.

See information on the Full Service level Language index above.

Publisher and music number

This index is of a field that was originally used for Music numbers and later was expanded to include other Publisher numbers. The data in this field has been normalized so that the different ways in which the data could be entered can now be searched more easily and consistently. Rules for normalizing the data are given below:

  • The data is concatenated, removing punctuation up to parentheses, commas, and dashes (double hyphens). Numbers in parentheses follow the same rules. So "ab 123" and "ab.123" and "ab-123" are all searched as "ab123". However, two spaces make it a new number. So "ab 123" is "ab123," but when two spaces are used, it's "ab" and "123".
  • Each number up to a comma space is indexed alone. So [028 #a ab123, ab124, ab125] has three numbers that are searchable: "ab123," "ab124," and "ab125."
  • Numbers up to dashes are indexed, plus possibly the series between the dashes. The series is indexed only if the beginning of the second number after the dash matches the beginning of the first number. So [262 #c ab123--ab125] has three searchable terms: "ab123," "ab124," and "ab125." However, if the ab does not start the next number, the range is not searchable. So [262 #c ab123--ac125] has two searchable terms, "ab123" and "ac125." However, [265 #c ab123-ab125] has three searchable terms, "ab123", "ab124", and "ab125". While the start and end of a series are always indexed, the range is only indexed up to the first 20 values.
  • Information within parentheses behaves as if it is a new field, with all the rules applied. So [028 #a 123--125 (cd345--cd347, dd123) ] has searchable "123," "124," "125," "cd345," "cd346," "cd347," and "dd123."
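
The dash-range rule is the least obvious of these, so here is a rough sketch of it in isolation. It covers only a single number or a single "start--end" pair and does not handle commas, parentheses, or the two-space split. Two details are assumptions: "the beginning matches" is read as "the non-digit prefix is identical", and "the first 20 values" is read as the first 20 numbers counting from the start. The names expand_range and MAX_RANGE_VALUES are hypothetical.

```python
import re

MAX_RANGE_VALUES = 20  # assumption: cap counted from the start of the range

def normalize(number):
    """Lowercase and drop spaces and punctuation within one number."""
    return re.sub(r"[^0-9a-z]", "", number.lower())

def expand_range(field):
    """Return the searchable terms for 'x' or 'x--y'."""
    start, sep, end = field.partition("--")
    start = normalize(start)
    if not sep:
        return [start]
    end = normalize(end)
    s = re.fullmatch(r"(\D*)(\d+)", start)
    e = re.fullmatch(r"(\D*)(\d+)", end)
    # Without a shared prefix (or with a reversed range) only the two
    # endpoints are searchable.
    if not (s and e) or s.group(1) != e.group(1) or int(e.group(2)) < int(s.group(2)):
        return [start, end]
    prefix, width = s.group(1), len(s.group(2))
    first, last = int(s.group(2)), int(e.group(2))
    stop = min(last, first + MAX_RANGE_VALUES - 1)
    terms = [f"{prefix}{n:0{width}d}" for n in range(first, stop + 1)]
    if terms[-1] != end:
        terms.append(end)  # the end of the series is always indexed
    return terms

print(expand_range("ab123--ab125"))  # ['ab123', 'ab124', 'ab125']
print(expand_range("ab123--ac125"))  # ['ab123', 'ac125']
```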

Standard Number

The standard number index includes ISBNs, ISSNs, LCCNs, Universal Product Code, National Bibliographic Agency Control Number, International Standard Recording Code, International Standard Music Number, International Article Number, Serial Item and Contribution Identifier, Standard Technical Report Number, Publisher Number, CODEN, Source of Acquisition, Report Number and Other Standard Identifiers. However, the OCLC number is not part of this index. For all of these identifiers, punctuation is removed, with the exception that the ISSN and ISBN can be searched either with or without the hyphens.

However, the default-level Library of Congress Control Number (LCCN) index has many more ways of handling LCCNs than the standard number index. If at all possible, use the LCCN index for this type of data.

Title

The Title phrase search has the 245/a and 245/b subfields combined into a single phrase search with structure attribute 1. These subfields (245/a, 245/b) can be searched as separate subfields also.

The Title index doesn’t include the initial “a”, “an”, “the” or other leading articles, because words marked as non-filing terms were not included in the title index. To reduce the impact of this, the title index currently has a set of stopwords. Other terms were added to improve the index and to match changes made to the way the WorldCat.org title index works.

The title stopwords are: a, als, am, an, are, as, at, auf, aus, be, but, by, das, dass, de, der, des, dich, dir, du, er, es, for, from, had, have, he, her, his, how, ihr, ihre, ihres, im, in, is, ist, it, kein, la, le, les, mein, mich, mir, mit, of, on, sein, sie, that, the, this, to, un, une, von, was, wer, which, wie, wird, with, you.

Year limit

The data indexed is the 008 Date1 data. By convention, unknown digits in this data are stored as “u”. For the index, all “u”s were indexed as zeros, so for example 199u is indexed as 1990. Years that are shorter than four digits have leading zeros added. To search year 999, enter 0999. While a bounded range covers all the years indexed, any search with an unbounded range searches only the range 1000-2030. So when limiting a search to records published before the year 1900, the limit includes only those items published between the years 1000 and 1899. In addition, the year 1000 includes items of unknown century, entered as 1uuu, which may include the 1900s.
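
The Date1 rules can be sketched as two small helpers. The names normalize_year and effective_range are hypothetical, and the 1000-2030 bounds are simply the ones quoted above.

```python
UNBOUNDED_MIN, UNBOUNDED_MAX = 1000, 2030  # bounds quoted on this page

def normalize_year(date1):
    """Index form of an 008 Date1 value: 'u' becomes 0, then zero-pad to 4."""
    return date1.strip().lower().replace("u", "0").zfill(4)

def effective_range(start=None, end=None):
    """Years actually searched when one side of a range is left open."""
    low = UNBOUNDED_MIN if start is None else start
    high = UNBOUNDED_MAX if end is None else end
    return low, high

print(normalize_year("199u"))     # 1990
print(normalize_year("999"))      # 0999
print(normalize_year("1uuu"))     # 1000
print(effective_range(end=1899))  # (1000, 1899)
```

A search for items earlier than the year 1000 therefore seems to need an explicit lower bound, such as 0999, rather than an open-ended range.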


© 2010 OCLC Domestic and international trademarks and/or service marks of OCLC Online Computer Library Center, Inc. and its affiliates

