This page gives information about indexes that have unusual features or normalization rules. See also the general notes on searching indexes and the list of indexes offered at each Service Level. In addition, information about all the MARC subfields used for all WorldCat indexes, including the ones offered in the API, is found at http://www.oclc.org/us/en/support/documentation/worldcat/searching/searchworldcatindexes/default.htm#search_worldcat_intro.fm
The indexes below are listed in alphabetical order after listing the Keyword index information.
The keyword index for the WorldCat Search API is not the same index used in the WorldCat.org service. It matches the index used in the cataloging and FirstSearch WorldCat versions of the database. In general, the main difference is that the WorldCat.org index includes standard numbers other than the ISBN and this index does not. So, for example, the OCLC number and ISSN are not included in this index.
The Keyword search finds information in the author, title, subject, notes, publisher, publisher location, ISBN, year, year 2 and a few fields specific to the keyword search (034/a,b,d,e,f,g, 052/a,b, 255/a,b,c,d,e).
The Year and Year 2 data in the 008 field is indexed following the same rules used in the Year index, which are given below.
The ISBN is indexed as the data is indexed, without hyphens. However, any keyword search term that meets the characteristics of an ISBN and is entered with hyphens will be automatically concatenated by the search processor.
A geographic field found only in the keyword indexes is 052/a,b. This field is useful to map catalogers. The 052/a subfield is indexed alone. For any record containing 052/b, each 052/b is also indexed as 052/a concatenated with that 052/b, without spaces, as a single word. So, a record with 052 #a 1234 #b P4 #b C2 is searchable as 1234, 1234p4, and 1234c2.
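The 052 rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the actual indexing code; the function name `index_terms_052` is hypothetical, and the lowercasing is an assumption taken from the example terms (1234p4, 1234c2).

```python
def index_terms_052(subfield_a, subfields_b):
    """Return the keyword terms for one 052 field, per the rule described above."""
    # Assumption: terms are lowercased, as in the example 1234p4 / 1234c2.
    a = subfield_a.strip().lower()
    terms = [a]  # 052/a is indexed alone
    for b in subfields_b:
        # Each 052/b is indexed as 052/a + 052/b with no space between them.
        terms.append(a + b.strip().lower())
    return terms


print(index_terms_052("1234", ["P4", "C2"]))
```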
The most direct search of internet resources is the Access Method search which searches the URLs found in some WorldCat records. The characters between the punctuation in the URL are the "words" that are searched for. For example, http://www.oclc.org is searched by the "words" www, oclc, and org. The most distinct term in the string is the most useful on which to search. All the stopwords apply to this index, plus two additional stopwords, "http" and "https".
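As a rough illustration of how a URL breaks into searchable "words", the following Python sketch splits on punctuation and drops the two URL-specific stopwords. The function name `access_method_words` is hypothetical, and the general stopword list that also applies to this index is deliberately not modeled here.

```python
import re

# Only the two URL-specific stopwords named above; the general
# stopword list is not reproduced in this sketch.
URL_STOPWORDS = {"http", "https"}


def access_method_words(url):
    """Split a URL into the words between its punctuation characters."""
    words = re.split(r"[^0-9A-Za-z]+", url.lower())
    return [w for w in words if w and w not in URL_STOPWORDS]


print(access_method_words("http://www.oclc.org"))
```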
This index can be used to pull up sets of internet resources that share a common URL. So for example, adding to a search the access method word “hathitrust” will retrieve records that include a URL to the Hathi Trust Digital library. Or include the proximity search of “books google” to limit the results to records that have a link to Google books.
The author index actually includes all people and corporations that have participated in the creation of the item, including sponsors, editors, directors, actors, illustrators and many other roles. The Personal Name index includes only people. The Corporate/Conference Name index includes only organizations such as corporations as well as meetings and conferences.
There are no stopwords in the Author, Corporate and Conference Name, and Personal Name indexes.
This is available as an index when using the full service level. It is available as a limit only if using the default service level.
The data has been normalized so that all spaces and punctuation were removed except periods. The Dewey Decimal index also indexes classification numbers up to each slash or prime mark. So a Dewey number of 123.4/56/789 will be indexed as 123.4 and 123.456 and 123.456789.
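The segment-by-segment indexing of Dewey numbers can be sketched as follows. This is a minimal illustration, assuming the prime mark is entered as an apostrophe; the function name `dewey_index_terms` is hypothetical and the code is not the actual indexing routine.

```python
import re


def dewey_index_terms(number):
    """Return the cumulative terms indexed for a segmented Dewey number."""
    # Keep digits, letters, periods and the segment marks; drop everything else.
    cleaned = "".join(ch for ch in number if ch.isalnum() or ch in "./'")
    terms = []
    current = ""
    # Assumption: segments are separated by "/" or an apostrophe (prime mark).
    for segment in re.split(r"[/']", cleaned):
        if segment:
            current += segment
            terms.append(current)
    return terms


print(dewey_index_terms("123.4/56/789"))
```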
The DLC limit identifies records that were either cataloged by the United States Library of Congress or cataloged under the auspices of a United States national program such as CONSER or PCC. The only value to search is “y” to limit the result to include only these records.
This data was normalized by removing all punctuation including periods and by removing all spaces.
A search can be entered either with all the hyphens included or with the hyphens removed and the ISBN concatenated. Both the 10-digit and 13-digit ISBNs are available for all older ISBNs.
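A client that wants to send one consistent form can strip the hyphens itself, and can derive the 13-digit form of an older 10-digit ISBN with the standard 978-prefix conversion. The sketch below uses hypothetical function names and shows the general ISBN arithmetic only; it makes no claim about how the index itself stores the two forms.

```python
def normalize_isbn(isbn):
    """Remove hyphens and spaces so the ISBN is a single concatenated term."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_to_isbn13(isbn10):
    """Convert a 10-digit ISBN to its 13-digit form (978 prefix, new check digit)."""
    digits = normalize_isbn(isbn10)
    if len(digits) != 10:
        raise ValueError("expected a 10-digit ISBN")
    body = "978" + digits[:9]  # drop the old check digit
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(normalize_isbn("1-55653-169-9"))
print(isbn10_to_isbn13("1-55653-169-9"))
```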
This data is searchable only by including the hyphen in the number.
Library of Congress Control Numbers can be searched in a variety of ways. The numbers can be searched with the hyphen added or with the zero-fill characters that are also used to store the number. So for example, sn92-1234 is searchable as 92-1234 or 92001234 or sn92-1234 or sn92001234.
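The equivalent search forms in the example can be generated with a short Python sketch. It assumes the hyphenated input has an optional alphabetic prefix, a two- or four-digit year, and a serial number that is zero-filled to six digits; the function name `lccn_search_forms` is hypothetical.

```python
import re


def lccn_search_forms(lccn):
    """List the hyphenated and zero-filled forms of a hyphenated LCCN."""
    match = re.fullmatch(r"([a-z]*)(\d{2}|\d{4})-(\d{1,6})", lccn.strip().lower())
    if match is None:
        raise ValueError("expected a form such as sn92-1234")
    prefix, year, serial = match.groups()
    hyphenated = f"{year}-{serial}"
    # Assumption: the stored form pads the serial number to six digits.
    zero_filled = f"{year}{serial.zfill(6)}"
    forms = [hyphenated, zero_filled]
    if prefix:
        forms += [prefix + hyphenated, prefix + zero_filled]
    return forms


print(lccn_search_forms("sn92-1234"))
```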
The language index not only includes the primary language value found in the MARC 008, but it also includes all the languages when an item includes multiple languages. It also includes the language of summaries and additional textual material.
The Language index includes both the three-letter codes for languages found in the OCLC Marc Code Lists (ISBN 1-55653-169-9) and the expanded value for that language code in English. The Language index can also be searched by the two-character ISO codes for languages. Where an ISO code corresponds to a MARC language code, records with either the ISO code or the MARC code are retrieved.
And while the English values for these codes are also searchable, the only precise search term is the three-letter code. The English search terms may group language codes together. For example, English as a word search brings together Modern English, Old English, and Creole or Pidgin English.
Please also see the Primary Language limit described below.
This limit will determine if a library has attached their holdings to a record within the OCLC WorldCat database. The search term to use is the OCLC symbol. To find the OCLC symbol for an institution, please see Find an OCLC Library.
This index limits results to records for items held by a set number of libraries. A high number of holding libraries suggests that many libraries thought the item important enough to purchase, and that the item is more likely to be available through Interlibrary Loan. Similarly, limiting results to only those items held by one or very few libraries can indicate what may be unique or rare in a library’s collection.
The index has number codes that can be searched to limit the results to only records that have that number of OCLC libraries holding the item. Only one value can be searched, and only by using the SRU relation of “=”. These codes cannot be used in a range search, so for example it isn’t possible to search “>” 17, but it is possible to get that predetermined range by searching the code 08.
The search terms that can be searched include the following codes:
| Number of Library Holdings | Search code |
|---|---|
| 5 or more holdings | 05 |
| 10 or more holdings | 06 |
| 50 or more holdings | 07 |
| 100 or more holdings | 08 |
| 500 or more holdings | 09 |
| No holdings | 10 |
| 1 holding only | 11 |
| 2 – 4 holdings | 12 |
| 5 – 9 holdings | 13 |
| 10 – 24 holdings | 14 |
| 25 – 49 holdings | 15 |
| 50 - 74 holdings | 16 |
| 75 – 99 holdings | 17 |
| 100 - 149 holdings | 18 |
| 150 - 199 holdings | 19 |
| 200 - 299 holdings | 20 |
| 300 - 399 holdings | 21 |
| 400 - 499 holdings | 22 |
| 500 - 599 holdings | 23 |
| 600 - 699 holdings | 24 |
| 700 - 799 holdings | 25 |
| 800 - 899 holdings | 26 |
| 900 - 999 holdings | 27 |
| 1,000 - 1,499 holdings | 28 |
| 1,500 - 1,999 holdings | 29 |
| 2,000 - 2,499 holdings | 30 |
| 2,500 or more holdings | 31 |
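
Because only a single code can be searched, a client has to translate a holdings count into the matching code before building the query. The following Python sketch encodes the table above; the names `RANGE_CODES`, `AT_LEAST_CODES`, `range_code`, and `at_least_code` are hypothetical helpers, not part of the API.

```python
# (low, high, code) rows copied from the table above; high=None means open-ended.
RANGE_CODES = [
    (0, 0, "10"), (1, 1, "11"), (2, 4, "12"), (5, 9, "13"),
    (10, 24, "14"), (25, 49, "15"), (50, 74, "16"), (75, 99, "17"),
    (100, 149, "18"), (150, 199, "19"), (200, 299, "20"), (300, 399, "21"),
    (400, 499, "22"), (500, 599, "23"), (600, 699, "24"), (700, 799, "25"),
    (800, 899, "26"), (900, 999, "27"), (1000, 1499, "28"),
    (1500, 1999, "29"), (2000, 2499, "30"), (2500, None, "31"),
]

# "N or more holdings" codes from the first five rows of the table.
AT_LEAST_CODES = {5: "05", 10: "06", 50: "07", 100: "08", 500: "09"}


def range_code(count):
    """Return the code of the bounded range that contains a holdings count."""
    if count < 0:
        raise ValueError("holdings count cannot be negative")
    for low, high, code in RANGE_CODES:
        if count >= low and (high is None or count <= high):
            return code
    raise ValueError("no range found")  # unreachable for non-negative counts


def at_least_code(minimum):
    """Return the code for 'minimum or more holdings', if the table offers one."""
    if minimum not in AT_LEAST_CODES:
        raise ValueError("only 5, 10, 50, 100 and 500 have an 'or more' code")
    return AT_LEAST_CODES[minimum]


print(range_code(80), at_least_code(100))
```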
The data has been normalized so that all spaces and punctuation were removed except periods.
The Material type index searches the record to identify different kinds of items. The complete list of codes can be found here.
While many of these codes are also searchable in the Primary Document Type index, not every code retrieves the same type of results in both indexes. The codes that have different meanings between these two indexes are listed here.
art is for Article, chapters, papers, etc. as the primary document type in document type index
acp is for the same type of material, with additional article items, in material type index
art is for 3-d items or artifacts in the material type index
bks is for Books that are primarily books and not articles or internet resources in the document type index
bks is for Books or Text of any kind in the material type index, including articles and internet resources cataloged as text
bnu is for Books that are not internet resources, including additional items, in the material type index
map is for Cartographic Material including maps in the document type index
cmt is for Cartographic material including records with additional cartographic information in the material type index
map is only items cataloged as having an 007 to indicate maps in the record in the material type index
The limit returns records describing digital content contributed to WorldCat from open access digital repositories. Records contain URLs that point to the digital file.
The searchable values are:
The Primary Document Type index assigns a single document type to each record. It first determines whether the record qualifies as an Internet Resource. If it is not an Internet Resource, the record is assigned a document type based on the value in the Leader field of the MARC record.
The searchable values are:
While many of these codes are also searchable in the Material Type index to retrieve any record that is that type of document, not every code retrieves the same results in both indexes. See information on the Material Type index above.
The primary language index searches the three-letter codes for languages found in the OCLC Marc Code Lists (ISBN 1-55653-169-9). There is only one primary language code per record, determined by the cataloger and recorded as the three-character language code in the 008 field of the MARC record.
See information on the Full Service level Language index above.
This index covers a field that was originally used for Music numbers and was later expanded to include other Publisher numbers. The data in this field has been normalized so that the different ways in which the data could be entered can now be searched in a consistent manner. Rules for normalizing the data are given below:
The standard number index includes ISBNs, ISSNs, LCCNs, Universal Product Code, National Bibliographic Agency Control Number, International Standard Recording Code, International Standard Music Number, International Article Number, Serial Item and Contribution Identifier, Standard Technical Report Number, Publisher Number, CODEN, Source of Acquisition, Report Number and Other Standard Identifiers. However, the OCLC number is not part of this index. For all of these identifiers, punctuation is removed, with the exception that the ISSN and ISBN can be searched either with or without the hyphens.
However, the default-level Library of Congress Control Number (LCCN) index handles LCCNs in many more ways than the standard number index does. If at all possible, it is best to use the LCCN index for this type of data.
The Title phrase search has the 245/a and 245/b subfields combined into a single phrase search with structure attribute 1. These subfields (245/a, 245/b) can be searched as separate subfields also.
The Title index doesn’t include the initial “a”, “an”, “the” or other leading articles. This is because the words that have been marked as non-filing terms were not included in the title index. Therefore, at this time there is a set of stopwords in the title index to reduce the impact of this. Other terms were added to improve the index and to match the changes made to the way the WorldCat.org title index works.
The title stopwords are: a, als, am, an, are, as, at, auf, aus, be, but, by, das, dass, de, der, des, dich, dir, du, er, es, for, from, had, have, he, her, his, how, ihr, ihre, ihres, im, in, is, ist, it, kein, la, le, les, mein, mich, mir, mit, of, on, sein, sie, that, the, this, to, un, une, von, was, wer, which, wie, wird, with, you.
The data indexed is the 008 Date1 data. This data follows the practice of storing unknown digits as “u”. For the index, all “u”s were indexed as zeros, so for example 199u is indexed as 1990. Years that are shorter than four digits have leading zeros added. To search year 999, enter 0999. While a bounded range will cover all the years indexed, any search that uses an unbounded range searches only the years between 1000 and 2030. So when limiting a search to only records published before the year 1900, the limit includes only those items published between the year 1000 and 1899. In addition, the year 1000 includes items that have an unknown century, entered as 1uuu, which may include the 1900s.
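The year normalization described above amounts to two steps: replace each “u” with a zero, then pad to four digits. A minimal Python sketch, with the hypothetical function name `normalize_year`, might look like this:

```python
def normalize_year(date1):
    """Normalize an 008 Date1 value the way the Year index is described above."""
    value = date1.strip().lower().replace("u", "0")  # unknown digits become zeros
    if not value.isdigit() or len(value) > 4:
        raise ValueError("expected up to four digits or 'u' characters")
    return value.zfill(4)  # add leading zeros to short years


for raw in ("199u", "999", "1uuu"):
    print(raw, "->", normalize_year(raw))
```

Note that 1uuu normalizes to 1000, which is why a search on the year 1000 can also return items whose century is unknown.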
© 2010 OCLC Domestic and international trademarks and/or service marks of OCLC Online Computer Library Center, Inc. and its affiliates