Pears Indexing Classes
An index routine is declared as one of the parameter settings for an index definition within a Pears database description configuration file. It determines how the OCLC SiteSearch Pears software extracts index terms from input data and how the software acts on that data (e.g., handling punctuation, extracting codes). Index routines create index terms in one of two basic formats: keyword, where each word is its own index entry, or phrase, where the entire contents of a field form a single index entry. Pears provides a wide range of index routines for both keyword and phrase indexes. The Words and Phrase routines are the most commonly used.
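The difference between the two formats can be illustrated with a minimal sketch. It is not Pears code: the function names `keyword_terms` and `phrase_terms` are hypothetical, and splitting on whitespace is an assumption made only to keep the illustration short.

```python
def keyword_terms(field: str) -> list[str]:
    """Keyword format: every word in the field becomes its own index entry.

    Assumption: words are separated by whitespace only.
    """
    return field.split()


def phrase_terms(field: str) -> list[str]:
    """Phrase format: the whole field becomes a single index entry.

    Assumption: runs of whitespace are collapsed to one space.
    """
    return [" ".join(field.split())]


title = "The Old Man and the Sea"
print(keyword_terms(title))  # one entry per word
print(phrase_terms(title))   # one entry for the whole field
```

A keyword index built this way answers searches on any single word in the title, while the phrase index answers searches on the field as a whole.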
Routines
The following table lists the current index routines used by Pears to extract terms from input data to build indexes:
Routine | Description |
---|---|
Words Routines | |
ORG.oclc.pears.Words | Extends Phrase and extracts and stores individual terms from fields in a record. The following table lists parameters that you can use with the Words routine to more specifically define how it extracts terms to build an index: |
ORG.oclc.pears.PluralWords | Extends Words to stem plural endings from terms as they are extracted so that only the singular form of the term is stored in the index. |
ORG.oclc.pears. | Ensures that stop words are not stored as terms in an index. |
ORG.oclc.pears.SmartWords | Extends PluralWords to ensure that terms are greater than two characters in length. |
ORG.oclc.pears. | Allows you to declare open and closed boundaries (such as quotation marks) to identify data within a phrase that is to be ignored during the extraction process. Note: When using this indexing routine, you must also use the |
Phrase Routines | |
ORG.oclc.pears.Phrase | Creates simple bound phrases by extracting the contents of a field as a single index term. The following table lists parameters that you can use with the Phrase routine to more specifically define how it extracts terms to build an index: |
MARC Routines | |
ORG.oclc.pears. | Extends Words to find the bibliographic byte in the leader string in a MARC record and generates an index term based on the code that it finds there. |
ORG.oclc.pears. | Extends Words to find the record type and bibliographic bytes in the leader string in a MARC record and generates an index term based on the codes that it finds in those two places. The following is a parameter that you can use with the MarcFormat routine to more specifically define how it functions: |
ORG.oclc.pears. | Extends Words to find the type of material byte in the leader string in a MARC record and generates an index term based on what it finds there. The following is a parameter that you can use with this routine to more specifically define how it functions: |
Number Routines | |
ORG.oclc.pears. | Extends the Words routine and extracts only digit strings. |
ORG.oclc.pears. | Extends Phrase to convert the LCCard number field in a MARC record into a searchable term. |
Date Routines | |
ORG.oclc.pears. | Extends the Words routine to extract and normalize the publication date field in a MARC record. |
Language Routines | |
ORG.oclc.pears. | Works with the HandleChinaMarc record handling routine to change Chinese two-character language codes into their English equivalent search terms. |
ORG.oclc.pears. | Extends Words to convert the MARC three-letter language codes into English equivalent search terms. The following is a parameter that you can use with this routine to more specifically define how it functions: |
Miscellaneous Routines | |
ORG.oclc.pears. | Abstract class that contains base methods for extracting index terms. Note: The Phrase routine implements IndexRoutines and all other Pears indexing routines extend Phrase. |
Parameters | |
---|---|
delimiters=\t\n\r+-=<>(){}[]:;/\\\"!? | |
extraDelimiters= | |
removeDelimiters= | |
minWordLength= | |
maxWordLength= | |
maxWords= | |
Parameters | |
---|---|
bounds= | |
Parameter | Description |
---|---|
Collapse= | Removes any of the characters in the list from the field. |
ExtraTrimChars= | Adds the list of characters to the default list of TrimChars for the current index only. |
TrimChars= | Removes any of the characters in the list from the beginning or end of the field (default set: ' & . , : *). |
MaxLength=<number> | Shortens the field to the specified number of characters. |
StartOffset=<number> | Ignores the first specified number of characters in the field. Note: The offset is performed before any other trim or collapse rules are applied. |
ExtraIndex=<index ID> | Any terms extracted for this index are also sent to the specified index ID. |
indicator1= | Requires that indicator1 for this field have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
indicator2= | Requires that indicator2 for this field have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
indicators= | Requires that the two indicators have a value from the specified list of character pairs. Note: This can be used only with MARC-like records. |
notIndicator1= | Indicator1 for this field must not have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
notIndicator2= | Indicator2 for this field must not have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
notIndicators= | The two indicators must not have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
NonFilingIndicator1= | The value of the first indicator determines the number of characters to remove from the beginning of the field. |
NonFilingIndicator2= | The value of the second indicator determines the number of characters to remove from the beginning of the field. |
Example: | Since titles often have a trailing slash that needs to be removed... [title] |
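The effect of StartOffset, Collapse, TrimChars, ExtraTrimChars, and MaxLength on a field can be sketched as follows. This is an illustration, not the Pears implementation: the function `apply_phrase_params` and its argument names are hypothetical. The table states only that StartOffset runs before the trim and collapse rules, so the order of the remaining steps and the stripping of surrounding whitespace are assumptions.

```python
# Default trim set as listed in the parameter table: ' & . , : *
DEFAULT_TRIM_CHARS = "'&.,:*"


def apply_phrase_params(field, start_offset=0, collapse="",
                        extra_trim_chars="", max_length=None):
    """Return the phrase term for a field after applying the parameters."""
    # StartOffset: skip leading characters before any other rule runs
    # (this ordering is stated in the parameter table).
    text = field[start_offset:]

    # Collapse: remove the listed characters wherever they occur.
    # Assumption: collapse runs before trimming.
    text = "".join(ch for ch in text if ch not in collapse)

    # TrimChars plus ExtraTrimChars: strip listed characters from both ends.
    # Assumption: surrounding whitespace is stripped as well.
    text = text.strip(DEFAULT_TRIM_CHARS + extra_trim_chars + " \t")

    # MaxLength: shorten the result to the given number of characters.
    # Assumption: truncation is the last step.
    if max_length is not None:
        text = text[:max_length]
    return text


# A title with a trailing slash, cleaned by adding "/" to the trim set.
print(apply_phrase_params("The sun also rises /", extra_trim_chars="/"))
```

Adding the slash through the extra trim list rather than replacing the whole trim set keeps the default characters in effect for the index.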
Bib Level Code | Type of Material | Index Term Returned |
---|---|---|
a | analytic monograph | analytic |
b | analytic serial | analytic |
m | monograph | monograph |
s | serial | serial |
c | collection | collection |
d | subunit | subunit |
Example: | [BibLevel] |
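The code-to-term mapping in the table above amounts to a simple lookup, sketched below. The function name `bib_level_term` is hypothetical. Reading the code from leader position 07 follows the MARC 21 leader layout rather than anything stated in this document, and returning `None` for an unlisted code is an assumption.

```python
# Bibliographic level code -> index term, copied from the table above.
BIB_LEVEL_TERMS = {
    "a": "analytic",
    "b": "analytic",
    "m": "monograph",
    "s": "serial",
    "c": "collection",
    "d": "subunit",
}


def bib_level_term(leader: str):
    """Return the index term for the bibliographic level in a MARC leader.

    Assumption: the code sits at leader position 07 (zero-based), as in
    MARC 21; codes not in the table yield None.
    """
    return BIB_LEVEL_TERMS.get(leader[7])


sample_leader = "00714cam a2200205 a 4500"  # made-up leader for illustration
print(bib_level_term(sample_leader))
```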
Record Type | Bibliographic Level Code | Abbreviation | Type of Material |
---|---|---|---|
a, t | m, c, a, d | bks | Books |
e, f | any | map | Maps |
p, b | any | mix | Mixed Materials |
m | any | com | Computer Files |
c, d | any | sco | Scores |
any | s, b | ser | Serials |
i, j | any | rec | Sound Recordings |
g, k, o, r | any | vis | Visual Material |
Parameter | Description |
---|---|
DebugMarcFormat= | Turns on internal debugging |
Example: | To extract material type from a MARC leader . . . |
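The record type and bibliographic level table can be read as an ordered rule list, as in this sketch. It is not the MarcFormat source: the names `FORMAT_RULES` and `marc_format` are hypothetical. The table does not say which row wins when a record matches more than one (for example, record type `e` with bibliographic level `s`), so evaluating the rows in table order is an assumption, as are the MARC 21 leader positions 06 and 07.

```python
# (record types, bibliographic levels, abbreviation); None stands for "any".
# Rows appear in the same order as the table above.
FORMAT_RULES = [
    ("at", "mcad", "bks"),
    ("ef", None, "map"),
    ("pb", None, "mix"),
    ("m", None, "com"),
    ("cd", None, "sco"),
    (None, "sb", "ser"),
    ("ij", None, "rec"),
    ("gkor", None, "vis"),
]


def marc_format(leader: str):
    """Return the format abbreviation for a MARC leader, or None.

    Assumptions: record type is leader position 06 and bibliographic
    level is position 07 (MARC 21 layout); the first matching row wins.
    """
    rec_type, bib_level = leader[6], leader[7]
    for types, levels, abbreviation in FORMAT_RULES:
        type_ok = types is None or rec_type in types
        level_ok = levels is None or bib_level in levels
        if type_ok and level_ok:
            return abbreviation
    return None


print(marc_format("00714cam a2200205 a 4500"))  # record type a, level m
print(marc_format("00000nas a2200000 a 4500"))  # record type a, level s
```

Keeping the rules as data makes the precedence question explicit: reordering the list is the only change needed if the real routine resolves overlaps differently.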
Type Code | Abbreviation | Type of Material |
---|---|---|
a, t | bks | Books |
e, f | map | Maps |
p | mix | Mixed Materials |
m | com | Computer Files |
c, d | sco | Scores |
s | ser | Serials |
i, j | rec | Sound Recordings |
g, k, o, r | vis | Visual Materials |
Parameter | Description |
---|---|
DebugMarcTypeOfMaterial= | Turns on internal debugging |
Example: | To extract material type from MARC 006 . . . |
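Because several type codes share one abbreviation, the grouped rows of the type code table can be expanded into a per-code lookup, sketched here. The names `TYPE_GROUPS` and `material_term` are hypothetical. The function takes the one-character code as input and leaves out how that code is read from the record; returning `None` for an unlisted code is an assumption.

```python
# (type codes, abbreviation), grouped exactly as in the table above.
TYPE_GROUPS = [
    ("at", "bks"),
    ("ef", "map"),
    ("p", "mix"),
    ("m", "com"),
    ("cd", "sco"),
    ("s", "ser"),
    ("ij", "rec"),
    ("gkor", "vis"),
]

# Expand each group so that every single code maps to its abbreviation.
TYPE_OF_MATERIAL = {
    code: abbreviation
    for codes, abbreviation in TYPE_GROUPS
    for code in codes
}


def material_term(type_code: str):
    """Return the abbreviation for a type code, or None if it is unlisted."""
    return TYPE_OF_MATERIAL.get(type_code)


print(material_term("k"))
```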
Parameter | Description |
---|---|
DebugMarcLanguage= | Turns on internal debugging |
Example: | To extract language from the MARC 008 field . . . |