International Cataloging: Use Non-Latin Scripts
Judy Barnes
Last revised January 2008
Connexion client international cataloging
Scripts and languages supported
Connexion client supports the following non-Latin scripts for cataloging items in languages that use the scripts:
| Script | Examples of supported languages |
| --- | --- |
| Arabic | Arabic, Persian, Urdu, Azerbaijani |
| Bengali | Bangla, Assamese |
| Chinese | Chinese |
| Cyrillic | Russian, Bulgarian, Serbian, Ukrainian |
| Devanagari | Hindi, Marathi, Sanskrit, Nepali, Sherpa |
| Greek | Greek |
| Hebrew | Hebrew |
| Japanese | Japanese |
| Korean | Korean |
| Tamil | Tamil |
| Thai | Thai |
Note: You can include more than one non-Latin script anywhere in a record, including within the same field.
Valid character sets for supported scripts
Arabic, CJK, Cyrillic, Greek, and Hebrew
Character sets for these scripts given in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media on the Library of Congress Web site at: http://www.loc.gov/marc/specifications/spechome.html define the scope of valid characters in Connexion client. The MARC-8 character set is the subset of Unicode characters approved for use in MARC 21 cataloging.
The following list defines the scope of valid characters in the Connexion client for Arabic (including Persian), CJK, Cyrillic, Greek, and Hebrew scripts:
- 33(hex) [ASCII graphic: 3] = Basic Arabic
- 34(hex) [ASCII graphic: 4] = Extended Arabic
- 31(hex) [ASCII graphic: 1] = Chinese, Japanese, Korean (EACC)
- 4E(hex) [ASCII graphic: N] = Basic Cyrillic
- 51(hex) [ASCII graphic: Q] = Extended Cyrillic
- 53(hex) [ASCII graphic: S] = Basic Greek
- 32(hex) [ASCII graphic: 2] = Basic Hebrew
Note: The client inserts the notation (3, (4, $1, (N, (Q, (S, or (2, respectively, into field 066 to indicate which script(s) are used in a record. If multiple scripts are used, the notations are inserted individually, each in a separate subfield c.
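The notations in the note above can be read straight off the list: each is an intermediate character followed by the ASCII graphic for the set's hex code. The following Python sketch illustrates this, assuming (from the notations shown in this guide) that the intermediate is ( for the single-byte sets and $ for the multibyte CJK set. The names MARC8_SETS and notation_066 are hypothetical and are not part of the client.

```python
# Hex codes from the list above; the flag marks the multibyte CJK (EACC) set.
MARC8_SETS = {
    "Basic Arabic": (0x33, False),
    "Extended Arabic": (0x34, False),
    "Chinese, Japanese, Korean (EACC)": (0x31, True),
    "Basic Cyrillic": (0x4E, False),
    "Extended Cyrillic": (0x51, False),
    "Basic Greek": (0x53, False),
    "Basic Hebrew": (0x32, False),
}


def notation_066(set_name: str) -> str:
    """Return the field 066 subfield c notation for a character set in the list."""
    code, multibyte = MARC8_SETS[set_name]
    intermediate = "$" if multibyte else "("  # assumption based on the notations in this guide
    return intermediate + chr(code)  # chr() gives the ASCII graphic for the hex code


for name in MARC8_SETS:
    print(notation_066(name), name)
```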
Bengali, Devanagari, Tamil, and Thai
There are no MARC-8 character sets for Bengali, Devanagari, Tamil, or Thai. OCLC implemented the following script identification codes for these scripts based on ISO 15924 Code Lists (http://www.unicode.org/iso15924/codelists.html).
The following list shows the ranges of UTF-8 Unicode characters that define valid characters for these scripts in the Connexion client:
- Beng = Bengali (character range U+0980 to U+09FF)
- Deva = Devanagari (character range U+0900 to U+097F)
- Taml = Tamil (character range U+0B80 to U+0BFF)
- Thai = Thai (character range U+0E00 to U+0E7F)
Note: The client inserts Beng, Deva, Taml, or Thai, respectively, in field 066 of a record to indicate that the script is used. If multiple scripts are used, the notations are inserted individually, each in a separate subfield c.
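As an illustration of how the character ranges above identify a script, the following Python sketch reports which of the four script codes apply to a piece of text. It is only a sketch of the range test, not the client's own logic; the names SCRIPT_RANGES and scripts_present are hypothetical.

```python
# Script codes and Unicode ranges from the list above.
SCRIPT_RANGES = {
    "Beng": (0x0980, 0x09FF),
    "Deva": (0x0900, 0x097F),
    "Taml": (0x0B80, 0x0BFF),
    "Thai": (0x0E00, 0x0E7F),
}


def scripts_present(text: str) -> list:
    """Return the codes of the listed scripts that occur in text, sorted by code."""
    found = set()
    for ch in text:
        code_point = ord(ch)
        for code, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                found.add(code)
    return sorted(found)


# U+0939 is a Devanagari letter and U+0B95 is a Tamil letter.
print(scripts_present("\u0939 and \u0b95"))  # ['Deva', 'Taml']
```

Each code returned would correspond to one subfield c in field 066, as described in the note above.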
Invalid characters in Connexion client
Any characters that are not included in the above lists of defined characters or that cannot be inserted via Edit > Enter Diacritics (or <Ctrl><E>) are invalid in the client.
To include non-Latin characters that you need but that are invalid in Connexion client, you can:
- Enter the character in the record, export the record to your local system using Unicode export format, and then remove the character before processing the record in WorldCat.
Or
- Enter the name of the character within square brackets, using the Unicode standard if available, (for example, enter [schwa]), or for CJK characters, enter the reading of the character (for example, enter [yin]).
For reference, see, for example, the Unicode charts Web page at http://www.unicode.org/charts/, which has a character name index.
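If Python is available on your workstation, its standard unicodedata module offers another way to look up a character's name in the Unicode standard. The sketch below is an illustration only; the helper name bracketed_name is hypothetical. Note that it returns the full Unicode name, so shortening it to a form such as [schwa] remains a cataloger's decision.

```python
import unicodedata


def bracketed_name(ch: str) -> str:
    """Return the Unicode name of a character, lowercased, in square brackets."""
    # For a character with no name, fall back to its code point (for example, U+E000).
    name = unicodedata.name(ch, f"U+{ord(ch):04X}")
    return f"[{name.lower()}]"


print(bracketed_name("\u0259"))  # [latin small letter schwa]
```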
Limitations on using non-Latin scripts
- You must export and import the records in Unicode format. Change export format from MARC-8 to UTF-8 Unicode in Tools > Options > Export, Record Characteristics button. Select import format in File > Import Records, Record Characteristics button.
- Bengali, Devanagari, Tamil, and Thai scripts are not supported for the following:
- Connexion client MARC-8 export
- MARC Subscription
- Local Data Creation
Note: Z39.50 access to WorldCat records now supports MARC-8 and Unicode UTF-8 character sets. See information on non-Latin scripts support in Z39.50 in the documentation on the OCLC Web site at: http://www.oclc.org/support/documentation/z3950/searchtips#5.
Multiscripts in a single record are valid
Use as many supported non-Latin scripts as you need anywhere in a record, including within the same field.
Guidelines for contributing non-Latin script records to WorldCat
Records added to WorldCat must meet MARC standards, no matter what type of scripts you use to enter the data. You must catalog according to AACR2 practices. For more information, see OCLC Bibliographic Formats and Standards on the OCLC Web site at: < http://www.oclc.org/bibformats/en/about/ >.
For quick and easy reference, open a detailed Bibliographic Formats description of any field directly from Connexion client:
| Action |
Place the cursor in a field and click Help > MARC Field Help (or press <Shift><F1>). Or right-click in the field, and on the pop-up menu, click MARC Field Help. |
Romanized data. If you provide romanized (Latin-script-equivalent) data, romanization should follow guidelines in the ALA - LC Romanization Tables on the Library of Congress Web site at: < http://www.loc.gov/catdir/cpso/roman.html >.
How the client manages non-Latin script data in records
For cataloging items in languages that use non-Latin scripts, create records that contain:
- Non-Latin script data only (or multi-non-Latin scripts if needed, one script per field)
- Latin-script equivalent data only
Or
- Both non-Latin and Latin scripts
If you include both, the client provides (or you create) paired fields that have the same tag number. The top field of the pair is for non-Latin-script data followed by a field for the corresponding romanized data.
For machine-processing purposes, non-Latin script fields are stored internally in MARC format 880 fields.
How paired fields work:
- Automatic linking. The client automatically links two fields when a non-Latin script field is followed by a Latin-script field (romanized, or Latin-script-equivalent, data) that displays the same tag number. The links are added when you reformat, save, or take a final OCLC action on the record.
- Link and Unlink commands. Whether paired fields are present or absent in a record, you can link or unlink fields using Edit > Linking Fields > Link Fields or Unlink Fields (or <Alt><E><K><L> or <Alt><E><K><U>, respectively). The client displays linked fields with a connecting bracket in the left margin.
Caution: If Latin script and non-Latin script parallel fields are not linked, display of the non-Latin script in records downloaded to your local system may be affected. You can set an option to get a warning before the client exports records with parallel unlinked non-Latin script fields in Tools > Options > Export.
- Field 245. If no romanized data appears in field 245, the client automatically adds:
- A 245 field containing the following three filler characters: < > . (less than bracket, greater than bracket, and a period)
- A 500 field with the text Non-Latin script record
- ISBN. If you enter an ISBN into a non-Latin script 020 field without adding it to the paired Latin script 020 field, the client automatically copies the data to the Latin-script 020 field.
- Automatic 066. When you validate, reformat, save, or take a final OCLC action on a non-Latin script record (that is, interact with the OCLC system), the client automatically adds the 066 field with the following data in ‡ c to indicate which character set(s) the record contains:
- (3 for basic Arabic
- (4 for extended Arabic
- Beng for Bengali
- $1 for CJK
- (N for basic Cyrillic
- (Q for extended Cyrillic
- Deva for Devanagari
- (S for Greek
- (2 for Hebrew
- Taml for Tamil
- Thai for Thai
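The automatic linking rule described above (a non-Latin script field followed by a Latin-script field with the same tag) can be sketched in Python as follows. This is a simplified illustration, not the client's implementation: the Field class, the linked_to attribute, and the heuristic that treats any letter whose Unicode name does not begin with LATIN as non-Latin are all assumptions made for the example.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    tag: str
    text: str
    linked_to: Optional[int] = None  # index of the partner field, if linked


def has_non_latin(text: str) -> bool:
    """Heuristic (assumption): a letter whose Unicode name does not start with LATIN."""
    return any(
        ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
    )


def link_pairs(fields: List[Field]) -> None:
    """Link each non-Latin field to an immediately following Latin field with the same tag."""
    for i in range(len(fields) - 1):
        first, second = fields[i], fields[i + 1]
        if (
            first.tag == second.tag
            and first.linked_to is None
            and second.linked_to is None
            and has_non_latin(first.text)
            and not has_non_latin(second.text)
        ):
            first.linked_to, second.linked_to = i + 1, i


record = [
    Field("245", "Война и мир"),
    Field("245", "Voĭna i mir"),
    Field("260", "Moskva"),
]
link_pairs(record)
print([(f.tag, f.linked_to) for f in record])
```

In this example, only the two 245 fields are linked; the 260 field has no non-Latin partner and stays unlinked.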
Input methods for languages that use non-Latin scripts
If the default language of your workstation is not the language you need for non-Latin script cataloging, or if you do not already have an input method for the language set up on your workstation, you can install input languages and methods in Windows or install an RLIN21 keyboard for some scripts. Windows provides input methods for the languages that use the scripts supported in Connexion client.
- The procedure for installing a language and the corresponding input keyboard or Input Method Editor (IME) is slightly different, depending on which version of Windows you use.
- In Windows 2000, open the Windows Start menu and click Settings > Control Panel > Regional Options or Regional and Language Options. See more information and instructions on the Microsoft Web site: Enabling International Support in Windows 2000 at: http://www.microsoft.com/globaldev/handson/user/2kintlsupp.mspx.
- In Windows XP, on the Start menu, click Control Panel > Date, Time, Language, and Regional Options. See more information and instructions: Enabling International Support in Windows XP.... at: http://www.microsoft.com/globaldev/handson/user/xpintlsupp.mspx.
Caution: Administrator-level privileges for workstation. Generally, you need administrator-level privileges, and you need to either insert your Windows operating system CD-ROM or access the system files from your network, to complete the installation. Contact your system administrator if you need help.
- When English is the default language on your workstation and you install another language, Windows automatically provides the associated input keyboard or IME. You use the keyboard or IME to enter the non-Latin script.
- After installing languages and input keyboards or IMEs, a language indicator appears in the Windows system tray (bottom right corner of the desktop) containing all languages made available on your workstation.
Use the language indicator to select an input language:
- Click the language indicator to expand a list, and then click to select a language.
Or
- Press the Windows default keystroke shortcut <Left Alt><Shift> to toggle through the input languages without opening the list.
Note: The language indicator represents languages with two letter codes: EN (English), AR (Arabic), JP (Japanese), etc.
Or
Assign keystrokes in Windows to toggle among languages or switch to a specific language. For example, in Windows 2000, go to Start > Settings > Control Panel. Click the Input Locales tab, click Change, then click Key Settings.
If you install Microsoft Windows Regional options for Chinese, Japanese, and/or Korean language support on your workstation, Windows itself provides appropriate Input Method Editors (IMEs) for entering CJK characters.
- For general information on Windows non-Latin script entry tools, consult Help in your version of Windows (Start > Help), or see pages on the Microsoft Web site such as:
- What is an IME (Input Method Editor) and how do I use it? at: http://www.microsoft.com/globaldev/handson/user/IME_Paper.mspx
- Input Language: Keyboards and IMEs at: http://www.microsoft.com/globaldev/getWR/steps/WRG_kybrd.mspx
- Alternative RLIN21 keyboards. For information on alternative keyboards available for some scripts, see "Install RLIN21 Arabic, Cyrillic, Hebrew or Latin keyboard" in this guide.
Change the language of the client interface
The client prompts you to select an interface language when you open the client for the first time after installing or when you create a new user profile.
Or
You can change the language of the interface anytime by changing an option:
| |
Action |
| 1 |
On the Tools menu, click Options (or press <Alt><T><O>), and then click the International tab. |
| 2 |
In the Interface Language list, select one of the following available languages:
- Chinese (Simplified)
- Chinese (Traditional)
- English (default)
- German
- Japanese
- Korean
- Spanish
Note: To display the Chinese, Japanese, or Korean interface, you must have an input method for the language installed on your workstation or a Chinese, Japanese, or Korean language version of Windows. |
| 3 |
Click OK (closes the Options window) or Apply (keeps the window open) to change to the language you selected. Or Click Cancel to close the Options window without changing the interface language. Result when you change the language:
- Text in the client interface displays immediately in the selected language.
- The online client Help is provided in English only.
|
Summary of general international features
- Access to records with non-Latin scripts. Catalogers who use the Connexion client can view, create, edit, and take actions on records with supported non-Latin script data. Non-Latin scripts are not visible in WorldCat records in the Connexion browser interface unless they are saved to the online save file using the Connexion client. Save file records containing non-Latin scripts are read-only in the browser.
- Details for using non-Latin scripts:
- See details in "Use non-Latin scripts for cataloging."
- See separate topics in this Guide about using each specific non-Latin script.
- Interface language of the client. Select one of the following languages for the client interface besides English, which is the default:
- Chinese (Simplified)
- Chinese (Traditional)
- German
- Japanese
- Korean
- Spanish
Select the interface language when you open the client for the first time after installing, when you create a new user profile, or anytime in Tools > Options > International.
Note: If you select a language other than English, only the text of the client interface is in the selected language. The online client Help is provided in English only.
- Chinese name authority file. Search the Chinese name authority file (Authorities > Search > Chinese Name Authority File).
- The Joint University Librarians Advisory Committee (JULAC) of Hong Kong creates and maintains the Chinese name authority file.
- Anyone using Connexion client with an existing authorization/password can access Chinese name authority records in this file.
- Use keyword or numeric searches in either the command line (enter full search syntax) or the guided keyword/numeric search area (enter or select search components).
- Access to Chinese name authority records is read-only. You may copy or print only.
See topics on searching the Chinese name authority file in Authorities, Search Authority Files for details.
Tools specifically for using non-Latin scripts
The client provides the following tools to help you catalog using non-Latin scripts:
- Export and import character set option - choose MARC-8 (default) or UTF-8 Unicode (Unicode is required for Bengali, Devanagari, Tamil, and Thai).
- Export options for data fields - determine types, position, and sort order of Latin and non-Latin script data for exported records.
- MARC-8 character verification - verify MARC-8 characters separately from record validation (this function is inappropriate for checking Bengali, Devanagari, Tamil, and Thai characters, since they are not included in MARC-8 character sets; Bengali, Devanagari, Tamil, and Thai characters are verified during record validation).
- Field linking/unlinking - visually link or unlink non-Latin script data fields with equivalent romanized data fields.
- Arabic and Persian transliteration - two ways to automatically transliterate existing romanized data (Latin script representation of non-Latin script) into Arabic script data for Arabic and Persian records.
- Data alignment for displaying and printing Arabic and Hebrew script data.
- Unicode formatting control characters to support correct display of bidirectional data in Arabic and Hebrew script records (use right-click menu).
- CJK E-Dictionary - helps with character selection by providing comprehensive information about CJK characters supported in the client.
- Conversion of invalid CJK to MARC-8 - automatically convert invalid CJK characters to equivalent MARC-8-compliant characters.
Non-Latin script records in Connexion browser
- WorldCat records. Non-Latin script data in a WorldCat record does not display when you open the record using the Connexion browser. The message Non-Latin script suppressed displays in the upper right of the record. You cannot lock or replace these records.
- Online save file records. If you save a record containing non-Latin scripts to the online save file using the Connexion client and then open the record in the Connexion browser:
- The record opens in display mode only with a warning at the top: This record contains non-Latin script data and cannot be edited using this interface. You cannot edit or take final actions on the record.
- All non-Latin script data is displayed in 880 fields at the end of the record.
- You can view the record, print it, or copy and paste data from it.
- You cannot flag records containing non-Latin scripts.
Use non-Latin scripts for cataloging
Basic cataloging functions and non-Latin scripts
This topic describes how the Connexion client supports Arabic, Bengali, CJK, Cyrillic, Devanagari, Greek, Hebrew, Tamil, and Thai scripts by specific cataloging function.
This topic contains the following kinds of information:
- Procedures for using client tools and options specifically developed for non-Latin script records
- Any special parameters for using non-Latin script data with existing client functionality
Search WorldCat
Entering searches
- Search for records containing non-Latin script data using either script search terms or romanized (Latin-script equivalent) search terms.
- Both interactive and batch searching support non-Latin script search terms (Cataloging > Search > WorldCat and Batch > Enter Bibliographic Search Keys).
- Alternatively, copy and paste non-Latin script data into client searches from sources external to the client. Non-Latin script search terms must be based on Unicode. However, only Unicode characters that can be converted to MARC-8-equivalent characters are valid in WorldCat. If Unicode characters that are not convertible are in the search term, you may find no matching records.
- About using search indexes for non-Latin script search terms:
- If you want to retrieve all records or see sample records containing a particular script, use the "character sets present" WorldCat search index (label vp:) with the assigned code for a script:
| Script | Code for script | Enter search as ... |
| --- | --- | --- |
| Arabic | ara | vp:ara |
| Bengali | ben | vp:ben |
| Chinese, Japanese, and Korean | cjk | vp:cjk |
| Cyrillic | cyr | vp:cyr |
| Devanagari | dev | vp:dev |
| Greek | gre | vp:gre |
| Hebrew | hbr | vp:hbr |
| Tamil | tam | vp:tam |
| Thai | tha | vp:tha |
To enter one of the searches above to retrieve all records that contain a specified script, use the command line in the Search WorldCat window (Cataloging > Search > WorldCat).
Note: If a search for a particular script alone retrieves too many WorldCat records (limit 1,500 records), you must limit the search and try again. (See more about how the client displays WorldCat search results in Cataloging, Search WorldCat, "Use WorldCat search results.")
Examples:
vp:ara/1991-2 (search for Arabic script records limited to those published in 1991 and 1992)
vp:ara and la:per (search for Arabic script records limited to those describing Persian language items)
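If you prepare many such searches (for example, for batch search keys), a small helper can assemble them from the script codes. The Python sketch below is an illustration only; the names SCRIPT_CODES and script_search are hypothetical, and any limit is passed through exactly as you would type it on the command line.

```python
# Codes for the "character sets present" index (label vp:), as listed in this guide.
SCRIPT_CODES = {
    "Arabic": "ara",
    "Bengali": "ben",
    "CJK": "cjk",
    "Cyrillic": "cyr",
    "Devanagari": "dev",
    "Greek": "gre",
    "Hebrew": "hbr",
    "Tamil": "tam",
    "Thai": "tha",
}


def script_search(script: str, limit: str = "") -> str:
    """Build a vp: search for a script; the optional limit is appended verbatim."""
    return f"vp:{SCRIPT_CODES[script]}{limit}"


print(script_search("Arabic", "/1991-2"))      # vp:ara/1991-2
print(script_search("Arabic", " and la:per"))  # vp:ara and la:per
```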
See more about word and phrase searching and search methods in general in Cataloging/Search WorldCat:
- "Search WorldCat interactively"
- "Keyword, numeric, and derived search syntax" (derived search is unavailable for non-Latin script data)
- "Browse WorldCat"
- "Customize WorldCat search and browse interfaces"
- "Enter WorldCat searches for batch processing"
Sort order of search results
You can select how the results of non-Latin script WorldCat searches are sorted:
- Alphabetically by the Latin script data
Or
- In Unicode order by the non-Latin script data
To check or change the option for sort order for WorldCat search results:
| |
Action |
| 1 |
On the Tools menu, click Options (or press <Alt><T><O>), and then click the International tab. |
| 2 |
Click the Primary Sort by Latin Script check box to select or clear the option to sort search results in alphabetical order by the Latin script data. Default: Check box selected. Search results sort alphabetically by Latin script data. Result:
- If you clear the check box, search results are sorted in Unicode order by the non-Latin-script data.
- The sort order selected also determines the sort order of local bibliographic save file and local constant data search results.
- Tamil Unicode 4.0 codes are not in collating order. The default, alphabetical sorting by Latin script, is recommended if romanized (Latin-equivalent) data is included in the record with Tamil script data.
|
| 3 |
When finished, click Close, or press <Enter> to apply the settings and close the Options window. Or Click Apply to apply the settings without closing the window. |
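The difference between the two sort orders can be seen in a short Python sketch. It assumes that "Unicode order" means comparison by code point, which is how Python compares strings by default; the client's exact collation is not documented here. The names Result and sort_results are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    latin: str   # romanized (Latin-script) title
    script: str  # non-Latin script title


def sort_results(results: List[Result], primary_sort_by_latin: bool = True) -> List[Result]:
    """Sort alphabetically by Latin data, or by code point of the non-Latin data."""
    if primary_sort_by_latin:
        return sorted(results, key=lambda r: r.latin.casefold())
    return sorted(results, key=lambda r: r.script)  # code point order (assumption)


results = [
    Result("Voina i mir", "Война и мир"),
    Result("Idiot", "Идиот"),
    Result("Anna Karenina", "Анна Каренина"),
]

print([r.latin for r in sort_results(results)])
print([r.latin for r in sort_results(results, primary_sort_by_latin=False)])
```

With the option selected, the titles sort as Anna Karenina, Idiot, Voina i mir. With the option cleared, the Cyrillic titles sort by code point, which places Voina i mir before Idiot.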
Display non-Latin scripts in records and lists
In records that have paired non-Latin and Latin-equivalent script fields, the non-Latin script field appears first in the pair.
Arabic and Hebrew script data, by default, displays (and prints) right to left (View > Align Right). See "Catalog using Arabic script" or "Catalog using Hebrew script" for information about using Unicode formatting control characters to ensure correct display of bidirectional data in Arabic and Hebrew script records (use the right-click menu).
Create records
Use as many supported non-Latin scripts as you need anywhere in a record, including within the same field.
Workforms
For creating bibliographic records and/or constant data with non-Latin scripts using workforms (Cataloging > Create > Single Record > [format] or Constant Data), you can set an option to display the workforms with paired fields:
| |
Action |
| 1 |
On the Tools menu, click Options (or press <Alt><T><O>), and then click the International tab. |
| 2 |
Click the Include paired fields in workforms for multi-script data check box. Default: Check box cleared. Workforms contain single fields. Results:
- The workform opens with paired fields 1XX, 245, 246, 250, 260, 300, 4XX, 5XX, 6XX, 8XX for non-Latin script entry (X = any valid tag number).
- Each pair has identical tags.
- The first of a paired field is for non-Latin script data.
|
| 3 |
When finished, click Close, or press <Enter> to apply the settings and close the Options window. Or Click Apply to apply the settings without closing the window. |
Setting the option for paired fields is not required. You can enter non-Latin script data only or romanized (Latin-script equivalent) data only. Or create your own paired fields and enter both:
| |
Action |
| 1 |
In a record, add a field and enter the same tag as the corresponding existing field. |
| 2 |
Enter the non-Latin script data in the first field of the pair, and optionally, enter the romanized data in the second field. |
| 3 |
To link the fields, on the Edit menu, click Linking Fields > Link Fields, or press <Alt><E><K><L>. Or let the client link them automatically when you validate, reformat, save, or take a final OCLC action on the record. |
Caution: If Latin script and non-Latin script parallel fields are not linked, display of records downloaded to your local system may be affected. You can set an option to get a warning before the client exports records with parallel unlinked non-Latin script fields in Tools > Options > Export.
Derive records
When you use Edit > Derive > New Master Record, New Institution Record, or New Constant Data to create records from existing records that have linked fields for non-Latin scripts, the client transfers the linked fields as pairs for each field selected in Tools > Options > Derive Record.
Note: Although the 066 field cannot be transferred, the client adds the field automatically to indicate the presence and type of non-Latin script when you validate, reformat, save, or take a final OCLC action on the record.
See also general information about how to create bibliographic records in Cataloging/Create Bibliographic Records, "About creating bibliographic records."
Edit records
Editing functions supported
- Find and Replace (Edit menu). Enter non-Latin script in both the Find What and Replace With boxes of the Find/Replace window.
- Cut, copy, and paste (Edit menu). Cut, copy, and paste non-Latin script data.
- When you move one of a paired field, the other field moves automatically (Edit > Move Field > Up or Down).
- Validate records and characters (Edit menu)
- Validates non-Latin script characters against supported MARC-8 character sets, as well as validating MARC structure and tags.
Also validates Bengali, Devanagari, Thai, and Tamil characters coded in Unicode 4.0 (not covered in MARC-8 character sets).
If the client finds an invalid character, an error message lists the tag and position of the character to help you find it, along with other errors found in the record.
The error message may give up to three positions per field for invalid characters.
After correcting characters, you may want to validate the record again.
- Automatically adds field 066 indicating the presence and type of non-Latin script.
- Validate characters only. Validate characters separately from record validation (available for supported MARC-8 character sets only; unavailable for Bengali, Devanagari, Tamil, and Thai characters). Use Edit > MARC-8 Characters > Verify.
Note for CJK: You can automatically convert invalid CJK characters to equivalent MARC-8 characters. Use Edit > MARC-8 Characters > Convert to MARC-8 CJK.
- Reformat (Edit menu). Rearranges fields in MARC tag order, including:
- Displaying paired fields together, with the non-Latin script field on top.
- Automatically adding field 066, if not already added, with text that indicates the presence and type of script.
- Text strings (Tools menu). Use non-Latin scripts to create or edit text strings. See more about text strings in Basics, Set Options and Customize, "Create custom text strings."
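The way validation reports invalid characters (tag plus up to three positions per field) can be sketched in Python as follows. This is an illustration under stated assumptions, not the client's validation: the allowed ranges are limited to printable ASCII plus the Bengali, Devanagari, Tamil, and Thai ranges, positions are counted from 1, and the names ALLOWED_RANGES, invalid_positions, and validation_errors are hypothetical.

```python
# Illustrative allowed ranges only (assumption): printable ASCII plus four Unicode-range scripts.
ALLOWED_RANGES = [
    (0x0020, 0x007E),  # printable ASCII
    (0x0980, 0x09FF),  # Bengali
    (0x0900, 0x097F),  # Devanagari
    (0x0B80, 0x0BFF),  # Tamil
    (0x0E00, 0x0E7F),  # Thai
]
MAX_POSITIONS_PER_FIELD = 3  # the error message gives up to three positions per field


def invalid_positions(text: str) -> list:
    """Return the positions of up to three invalid characters in text."""
    positions = []
    for position, ch in enumerate(text, start=1):  # 1-based positions (assumption)
        if not any(low <= ord(ch) <= high for low, high in ALLOWED_RANGES):
            positions.append(position)
            if len(positions) == MAX_POSITIONS_PER_FIELD:
                break
    return positions


def validation_errors(fields: list) -> list:
    """fields is a list of (tag, text) pairs; return one message per field with errors."""
    errors = []
    for tag, text in fields:
        positions = invalid_positions(text)
        if positions:
            listed = ", ".join(str(p) for p in positions)
            errors.append(f"Field {tag}: invalid character at position(s) {listed}")
    return errors


print(validation_errors([("245", "Title"), ("500", "Note \u2603 here")]))
```

Because only the first three positions in a field are reported, validating again after corrections can reveal further invalid characters in the same field.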
Controlling headings unsupported
- Controlled headings in bibliographic records are linked directly to the controlling authority record so that if the authority record changes, the heading is automatically updated in the bibliographic record also.
- For non-Latin script records, however, only a heading in the Latin script field can be controlled (and subsequently updated if the authority record changes). If the controlled heading is updated, you may need to update the corresponding non-Latin script field manually to match the update.
Verify MARC-8 characters
To check the validity of characters separately from the validate records function (Edit > Validate or <Shift><F5>):
| |
Action |
| 1 |
On the Edit menu, click MARC-8 Characters > Verify, or press <Alt><E><8><V>.
Results:
- The client changes the color of invalid characters to red by default (or a color you specify in Tools > Options > Record Display).
- If no invalid characters are present, you get a message that verification is completed. Click OK or press <Enter> to close.
Note: MARC-8 verification is inappropriate for Bengali, Devanagari, Tamil, and Thai characters, which are not covered in MARC-8 character sets. Bengali, Devanagari, Tamil, and Thai characters are verified when you validate records.
Tip: If the client identifies invalid CJK characters, you can use an automatic converter to convert them to equivalent MARC-8-compliant characters. (See "Catalog using Chinese, Japanese, and Korean (CJK) scripts.") |
| 2 |
To remove invalid character display (display all text in the default text color or the color you selected in Tools > Options > Record Display), click MARC-8 Characters > Clear, or press <Alt><E><8><C>.
See Tools > Options > Record Display for color options. Default color for invalid characters: red. |
Tip: If non-Latin characters that you need (other than Bengali, Devanagari, Tamil, and Thai) are not in any MARC-8 approved character sets for MARC 21 cataloging:
- Enter the character in the record, export the record to your local system using Unicode export format, and then remove the character before processing the record in WorldCat.
Or
- Enter the name of the character within square brackets, using the Unicode standard if available, (for example, enter [schwa]), or for CJK characters, enter the reading of the character (for example, enter [yin]).
For reference, see, for example, the Unicode charts Web page, which has a name index, at < http://www.unicode.org/charts/ >.
Link/unlink paired non-Latin/Latin script fields
The client automatically links two non-Latin script/Latin script fields that have the same tag number when you validate, reformat, save, or take an action on the record. The client always treats a non-Latin script field as the first of a corresponding pair.
You can link or unlink two non-Latin/Latin script fields with the same tag number:
| Action |
To link fields:
Place the cursor in the first field of a set of paired fields (the non-Latin data field), and on the Edit menu, click Linking Fields > Link Fields, or press <Alt><E><K><L> to link all paired fields. Or Right-click, and on the popup shortcut menu, click Link Fields to link the two fields where the cursor is located.
To unlink fields:
Click Linking Fields > Unlink Fields, or press <Alt><E><K><U> (unlinks all linked fields). Or Right-click, and on the popup shortcut menu, click Unlink Fields (unlinks the pair of fields where the cursor is located). |
When you link fields:
- The client uses a bracket in the left margin to display linked fields.
- Printouts of records retain the brackets to indicate linked fields.
- If you modify the tag of one of the linked fields, the tag for the other field changes, too.
- If the cursor is in a linked field when you add a new field, the new field is added above or below the set of linked fields. Linked fields cannot be separated.
- Moving a linked field moves the set of linked fields.
- If you delete one field in a linked field set, the client keeps the other field and removes the link indicator (bracket).
Caution: If Latin script and non-Latin script parallel fields are not linked, display of the non-Latin script in records downloaded to your local system may be affected. You can set an option to get a warning before the client exports records with unlinked non-Latin script fields in Tools > Options > Export.
Align Arabic and Hebrew data
By default, the client displays (and prints) Arabic or Hebrew script data in Arabic, Persian, and Hebrew records aligned to the right. Toggle alignment for these scripts using View > Align Right. See more about aligning Arabic and Hebrew script data in these topics: "Arabic cataloging" or "Hebrew cataloging."
Use Unicode formatting characters for bidirectional Arabic and Hebrew data
Valid left-to-right character strings (multiple digit numbers and punctuation) appear mixed in with right-to-left script data in Arabic, Persian, and Hebrew records. To ensure that this bidirectional data displays correctly, use Unicode formatting control characters.
The formatting control characters distinguish how to display mixed left-to-right and right-to-left data in an Arabic or Hebrew field. To insert a control character, right-click in a field, and on the pop-up menu click Insert Unicode Control Character. Then click a character.
For details, see the Bidirectional Algorithm report on the Unicode Web site at: http://unicode.org/reports/tr9.
See also "Cataloging using Arabic script" or "Cataloging using Hebrew script" in this booklet.
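To see why mixed data needs formatting characters, it can help to inspect the bidirectional class that Unicode assigns to each character. The Python sketch below uses the standard unicodedata module for this. It assumes that the left-to-right mark (U+200E) and right-to-left mark (U+200F) are among the control characters you might insert; this section does not list the characters on the client's menu, so treat that as an assumption.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Hebrew letter alef, two digits, then a left-to-right mark.
sample = "\u05d0" + "12" + LRM
print(bidi_classes(sample))  # ['R', 'EN', 'EN', 'L']
```

The Hebrew letter is strongly right-to-left (R), the digits are European numbers (EN) whose placement depends on their neighbors, and the inserted mark is strongly left-to-right (L), which is what lets it influence how adjacent numbers and punctuation are displayed.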
Use CJK E-Dictionary
Use the CJK E-Dictionary (electronic dictionary) on the Tools menu to search or browse for information about:
- A single CJK character
- A group of related characters
- Homophones matching a phonetic input code
- A large set of characters in sequence by East Asian Character Code (EACC) or by Unicode value
The CJK E-Dictionary provides details on all CJK characters supported in Connexion client (see separate section, "Use CJK E-Dictionary").
Transliterate Arabic
For existing Arabic records that contain only romanized data (Latin-script-equivalent representation of the Arabic script), the client provides two ways to automatically transliterate the romanized data and add the equivalent Arabic script data (see "Catalog using Arabic scripts").
Use constant data
- Use non-Latin scripts in workforms to create bibliographic constant data records (Cataloging > Create > Constant Data) or derive new constant data records from existing records (Edit > Derive > New Constant Data) in the same way as described above for creating and deriving bibliographic records.
- For constant data in the local file only, you can use non-Latin scripts to name constant data records and search for them by non-Latin script name.
- Also for constant data records in the local file only, you can use non-Latin scripts to specify My Status for records (Action > Set Status) and search for non-Latin script My Statuses. (My Status is an optional free-text status that you add to records to help distinguish them.)
- You can change the sort order for results of searching the local constant data file to Unicode sorting by non-Latin script (default sort order: alphabetical sorting by Latin script). The same setting also determines sort order for WorldCat search results and local save file search results.
- When you apply constant data containing non-Latin script, the client:
- Replaces a non-repeatable field in the bibliographic record with the paired non-Latin script 880 and equivalent romanized field from the constant data.
Or
- Adds the paired fields from the constant data below existing repeatable fields.
See general procedures for creating, applying, finding, and using bibliographic constant data in Cataloging/Use Bibliographic Constant Data.
Save records and search save files
- Using the Connexion client only:
- You can use non-Latin scripts to specify My Status for records in the online or local save file (Action > Set Status) and search for non-Latin script My Statuses. (My Status is an optional free-text status that you add to records to help distinguish them.)
- Local file indexes for non-Latin script data. Use non-Latin scripts to search the following indexes for records in the local save file:
- Name
Index includes non-Latin script fields associated with fields 100, 111, 130, 700, 710, 711, 730 (all subfields).
- Title
Index includes non-Latin script fields associated with fields/subfields:
- 245 a b f g k n p
- 246 a b f g n p
- My Status
- Change the sort order for results of searching the local save file to Unicode sorting by non-Latin script (default sort order: alphabetical by Latin script). The same setting also determines sort order for WorldCat search results and local constant data search results.
- When you save a record online or locally, the client automatically adds field 066 indicating the presence and type of non-Latin script.
See general procedures for saving, finding, and using bibliographic save file records in Cataloging/Save Bibliographic Records.
Report errors in non-Latin script records
You can use non-Latin script text in the message box of the Report Error window to report errors in non-Latin script records (Action > Report Error).
Export records
Select options for exported data
Select the type and location of Latin script versus non-Latin script data in exported records:
| |
Action |
| 1 |
On the Tools menu, click Options (or press <Alt><T><O>), and then click the International tab. |
| 2 |
Under Export, click one of the following check boxes:
- Include all data, with other scripts in 880 fields (default)
- Include all data, with Latin script in 880 fields
- Include Latin script only (deletes field 066 in record)
- Include other scripts only
|
| 3 |
When finished, click Close, or press <Enter> to apply the settings and close the Options window. Or click Apply to apply the settings without closing the window. |
Select character set
- Select the MARC-8 (default) set to export Arabic, CJK, Cyrillic, Greek, or Hebrew script records.
Note: If non-MARC-8 scripts are exported in MARC-8 data format, the non-MARC-8 characters are saved in Numeric Character Reference (NCR) format.
- Required. You must select UTF-8 Unicode to export Bengali, Devanagari, Tamil, and Thai script records. These scripts are not covered in MARC-8 character sets.
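As a rough illustration of the NCR fallback mentioned in the note above, the following sketch replaces characters that cannot be represented in the target character set with numeric character references. It is not the client's implementation. The `is_representable` predicate is a hypothetical stand-in that accepts only ASCII (the real MARC-8 repertoire is much larger), and the hexadecimal `&#xXXXX;` form is an assumption about how the references are written.

```python
def to_ncr(text, is_representable=lambda ch: ord(ch) < 0x80):
    """Replace characters that fail is_representable with hex NCRs.

    is_representable is a placeholder test (ASCII only by default);
    it does not model the actual MARC-8 character sets.
    """
    out = []
    for ch in text:
        if is_representable(ch):
            out.append(ch)
        else:
            # Assumed NCR form: &#x followed by the code point in hex.
            out.append("&#x{:04X};".format(ord(ch)))
    return "".join(out)


# Bengali letter KA (U+0995) is outside the MARC-8 character sets.
print(to_ncr("ka: \u0995"))
```

A round trip back to Unicode would need the reverse substitution, which is why exporting Bengali, Devanagari, Tamil, and Thai records as UTF-8 Unicode avoids the issue altogether.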
To select a character set for exporting records:
| |
Action |
| 1 |
On the Tools menu, click Options (or press <Alt><T><O>), and then click the Export tab. |
| 2 |
Click Record Characteristics. |
| 3 |
Under Bibliographic Records, select one of the following from the Character Set list:
- MARC-8 (default)
- UTF-8 Unicode
Note: You can also select a record standard for bibliographic records. Select MARC 21 or Dublin Core Qualified (XML). |
| 4 |
Click OK, or press <Enter> to save your settings and close the window, or click Cancel to cancel changes.
You are returned to the Export page in the Options window.
When finished, click Close to close the Options window. |
Select fields to delete from exported records
| |
Action |
| 1 |
In the Export Options tab (Tools > Options > Export), click Field Export Options. |
| 2 |
Under Fields to Delete, in the Bibliographic Records text box, enter tag numbers for fields you want to delete in exported bibliographic records.
Separate numbers by a comma and a space, or use a hyphen to show a range.
Example: 920, 938-999 |
| 3 |
Repeat step 2 in the Authority Records text box if needed for exporting authority records. |
| 4 |
When finished, click OK, or press <Enter> to save your settings and close the window, or click Cancel to cancel changes.
You are returned to the Export page in the Options window.
When ready, click Close to close the Options window. |
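The tag list syntax used in step 2 (tags separated by a comma and a space, with a hyphen for a range) can be expanded as in the sketch below. This is only an illustration of the notation, not the client's own parser; the function name `parse_tag_list` is hypothetical.

```python
def parse_tag_list(spec):
    """Expand a spec such as '920, 938-999' into a sorted list of 3-digit tags."""
    tags = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A hyphen marks an inclusive range of tag numbers.
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"bad range: {part}")
            tags.update(range(start, end + 1))
        else:
            tags.add(int(part))
    return [f"{t:03d}" for t in sorted(tags)]


tags = parse_tag_list("920, 938-999")
print(len(tags), tags[0], tags[1], tags[-1])
```

For the example entry, the expansion covers field 920 plus every tag from 938 through 999.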
Set warning before exporting records with parallel unlinked non-Latin script fields
If Latin script and non-Latin script parallel fields are not linked, display of the non-Latin script in records downloaded to your local system may be affected.
Set an option to get a warning before the client exports records with unlinked non-Latin script fields:
| Action |
| In the Export Options tab (Tools > Options > Export), click to select the check box next to Warn before exporting bibliographic records that include unlinked non-Latin script fields. |
See general instructions for exporting records in Cataloging/Export Bibliographic Records.
Import records
- Select the MARC-8 (default) or UTF-8 Unicode character set to import Arabic, CJK, Cyrillic, Greek, or Hebrew script records, based on the format of your local records.
- Required. You must select UTF-8 Unicode to import Bengali, Devanagari, Tamil, and Thai script records. These scripts are not covered in MARC-8 character sets.
To select a character set for importing records:
| |
Action |
| 1 |
On the File menu, click Import Records (or press <Alt><F><I>). |
| 2 |
In the Import Records window, specify the import file and destination, and then click Record Characteristics. |
| 3 |
Under Bibliographic Records, select one of the following from the Character Sets list:
- MARC-8 (default)
- UTF-8 Unicode
Note: Select UTF-8 Unicode for Bengali, Devanagari, Tamil, and Thai scripts.
|
| 4 |
Click OK or press <Enter> to save your settings and close the window, or click Cancel to cancel changes. You are returned to the Import Records window. |
See general procedures in Cataloging/Import Bibliographic Records.
Using non-Latin script data in macros
You can incorporate non-Latin script data in Connexion client macros using SetFieldUnicode, SetFieldLineUnicode, and GetListCellDataUnicode. Data is converted to Numeric Character Reference (NCR) format.
See detailed descriptions of these macros in Basics/Use Macros, "Connexion client macro commands: Edit records."
Add non-Latin script variant name headings in authority records
About using non-Latin scripts for variant name headings in LC authority file records
The Library of Congress and other major authority record exchange partners—British Library, National Library of Medicine, and OCLC, in consultation with the Library and Archives Canada—are implementing the use of non-Latin scripts in records for Name Authority Cooperative Program (NACO) contribution/distribution processes.
When completed (not before April 2008), you will be able to use non-Latin script variant forms of name headings in:
- Fields 4XX and 7XX
- Various note fields (for example, 67X)
NACO participants can add non-Latin script data to master LC name authority records. Non-NACO catalogers can add non-Latin scripts and export records for local use. Only MARC-8 character sets for the following scripts will be supported:
- Arabic (including the Persian language)
- Chinese
- Cyrillic
- Greek
- Hebrew (including Yiddish)
- Japanese
- Korean
In addition to creating, editing, and saving non-Latin script data in authority records, searching, browsing, displaying, saving, applying constant data, and exporting are available for authority records containing non-Latin script data.
Versions 2.10 and higher of the client will support using non-Latin scripts for variant heading and notes fields as soon as this functionality becomes available. The actual date of availability will be no earlier than April 2008. OCLC will announce more details beforehand.
Details
- The Latin script or romanized form of a heading in field 1XX will continue to be the authorized heading.
- The LC authority file will not have paired records for Latin script and non-Latin script forms of name headings for the same entity.
- NACO contributors will follow MARC 21's "Model B" for multiscript records. Model B provides for unlinked non-Latin script fields, such as authority record 4XX fields, that have the same MARC tags used for Latin script data.
- Using Model B for authorities is a departure from the current bibliographic record practice of many Anglo-American Cataloging libraries, where non-Latin characters are exported as linked 880 fields (Alternate Graphic Representation) using MARC 21's "Model A" for multiscript records.
- Although Connexion client supports Bengali, Devanagari, Tamil, and Thai for use in bibliographic records, character sets for these scripts will not be supported for authority records.
See more information on the Library of Congress Web site at: http://www.loc.gov/catdir/cpso/nonroman_announce.pdf.
Character sets supported
MARC-8 character sets for non-Latin scripts to be available for references in authority records are listed in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Code Tables (see http://www.loc.gov/marc/specifications/specchartables.html).
The following supported character sets are subsets of UTF-8 Unicode that are approved for use in MARC 21 cataloging:
- Basic Arabic = 33 (hex)
- Extended Arabic = 34 (hex)
- Chinese, Japanese, Korean (EACC) = 31 (hex)
- Basic Cyrillic = 4E (hex)
- Extended Cyrillic = 51 (hex)
- Basic Greek = 53 (hex)
- Basic Hebrew = 32 (hex)
Existing non-Latin script support in Connexion client for bibliographic records: What applies to authority records?
Much of the client Help information about using non-Latin scripts for bibliographic records also applies to using supported scripts for authority records; for example, see more about input methods for languages that use non-Latin scripts in "Connexion client international cataloging."
Functions that are the same for authority records
Functions that apply to both bibliographic and authority records include:
- A single record and a single field in a record can have multiple non-Latin scripts, as in bibliographic records.
- Data alignment for displaying and printing Arabic and Hebrew script data (View > Align Right).
- MARC-8 character verification (verification separate from record validation) (Edit > MARC-8 Characters > Verify [or Clear]).
- Conversion of invalid CJK to MARC-8 (Edit > MARC-8 Characters > Convert to MARC-8 CJK).
- Manual transliteration of romanized data to Arabic or Persian using Edit > Transliterate > Arabic [or Persian].
Note: Automatic transliteration (set by an option) is not available for name authority records.
- Unicode formatting control characters to support correct display of bidirectional data in Arabic and Hebrew script records (right-click in a field and click Insert Unicode Control Character to choose a control character).
- CJK E-Dictionary, which helps with character selection by providing comprehensive information about CJK characters supported in the client (Tools > CJK E-Dictionary).
- Use non-Latin script search terms in all LC authority file search indexes to retrieve a particular record or set of records.
- If you want to retrieve all records or see sample records containing a particular script, use the "character sets present" search index (label vp:) with the assigned code for a script:
| Script |
Code for script |
Enter search as ... |
| Arabic |
ara |
vp:ara |
| Chinese, Japanese, and Korean |
cjk |
vp:cjk |
| Cyrillic |
cyr |
vp:cyr |
| Greek |
gre |
vp:gre |
| Hebrew |
hbr |
vp:hbr |
To enter one of the searches above to retrieve all records that contain a specified script, use the command line in the LC Names and Subjects Search window (Authorities > Search > LC Names and Subjects). See more about how the client displays LC authority file search results in Authorities, Search Authority Files, "Use LC authority file search and browse results."
Functions that are different for authority records
The following features function differently for non-Latin script data in authority records:
- Character set identifier. No script identifier will appear in authority master records (unlike bibliographic records which have identifiers in field 066 ‡c).
- Validation. Validation of authority records is, of course, different from validation of bibliographic records. Specifically, validation of authority records that contain non-Latin script data:
- Is limited to name authority records (no sh/sj)
- Is limited to the following heading fields: 400, 410, 411, 430, 451, 700, 710, 711, 730, 751
- Is limited to some 6XX notes fields (to be determined)
- Checks for display from left to right or right to left based on the Unicode range of the first character after the first subfield code
- Allows adding or replacing authority records only for MARC-8 character sets.
- Allows exporting authority records with non-MARC-8 data added locally only if validation is set to None in Tools > Options > General (click Validation Level Options).
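The directionality check described in the list above (based on the first character after the first subfield code) can be approximated as follows. This is a sketch only: the guide says the client looks at the Unicode range of that character, whereas this example substitutes the character's Unicode bidirectional class from Python's `unicodedata` module. The function name and the use of the double dagger as the subfield delimiter are assumptions for illustration.

```python
import unicodedata

SUBFIELD_DELIMITER = "\u2021"  # double dagger, as displayed in this guide


def field_direction(field_text):
    """Return 'rtl' or 'ltr' from the first character after the first subfield code."""
    pos = field_text.find(SUBFIELD_DELIMITER)
    # Skip the delimiter and the one-character subfield code, if present.
    rest = field_text if pos == -1 else field_text[pos + 2:]
    for ch in rest:
        if ch.isspace():
            continue
        # 'R' (Hebrew and similar) and 'AL' (Arabic letters) are right-to-left classes.
        return "rtl" if unicodedata.bidirectional(ch) in ("R", "AL") else "ltr"
    return "ltr"


print(field_direction("\u2021a \u0645\u062D\u0645\u062F"))  # Arabic data
print(field_direction("\u2021a Smith"))                      # Latin data
```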
Functions that are not available for authority records
The following existing functionality for bibliographic records is not available for authority records:
- For bibliographic records, the client also supports character sets for Bengali, Devanagari, Tamil, and Thai. However, these scripts are not in the MARC-8 repertoire of UTF-8 character sets, putting them out of the scope of this implementation of non-Latin scripts for authority records. They will not be supported for authority records.
- Export and import character set option - (always use the default MARC-8 character set for authority records).
- Export options for data fields and options for transliteration in Tools > Options > International (unavailable).
- Automatic Arabic and Persian transliteration (note that manual transliteration is available).
- Field linking/unlinking (non-Latin script fields in authority records are always unlinked).
Catalog using Arabic scripts
About using the client with Arabic scripts
Use Arabic script data for cataloging items in languages that use the Arabic script (for example, Arabic, Persian, Urdu, and Azerbaijani). Use Arabic script data the same way you use other non-Latin script data in the client. See "Use non-Latin scripts for cataloging" for details on using non-Latin scripts and records. See client Help or other Connexion Client System Guides on the OCLC Web site for general procedures describing how to:
- Search WorldCat
- Create records
- Edit records
- Use constant data
- Save records
- Take OCLC actions
- Export records
- Import records
Tools for using non-Latin scripts
Specific tools to help with Arabic script cataloging
- Automatically transliterate existing romanized data (Latin script equivalent) into Arabic script data for Arabic and Persian records:
- Use Edit > Transliterate > Arabic [or Persian] to transliterate selected data in a record.
- Set an option in Tools > Options > International to auto-transliterate Arabic and/or Persian WorldCat records when you download them interactively.
- Toggle alignment for Arabic or Hebrew script data right-to-left or left-to-right using View > Align Right (default: right-to-left)
- Use Unicode formatting characters to control correct display of bidirectional data in Arabic and Hebrew records.
- See procedures below for using these Arabic-specific tools.
Other tools to help with non-Latin script cataloging in general
- MARC-8 character verification (Edit > MARC-8 Characters > Verify) - verify characters separately from record validation.
- Link/unlink fields (Edit > Linking Fields > Link [or Unlink]) - visually link non-Latin script data fields with equivalent Latin script (romanized) data fields.
- Export options for data fields (Tools > Options > International) - determine:
- Whether to export both Latin-script-equivalent (romanized) data and non-Latin script data or only one or the other
- Position of data if you export both Latin and non-Latin script data
- Sort order
- Export and import data using UTF-8 Unicode or MARC-8 character sets. The UTF-8 Unicode option allows you to work with non-MARC-8 characters in the client for your local records (settings for export are in Tools > Options > Export, click Record Characteristics, and settings for import are in File > Import Records, click Record Characteristics).
- See "Use non-Latin scripts for cataloging" for more specific procedures for working with these tools.
Arabic script entry and character sets
Script entry methods
- If your system default language is not Arabic, you can install the Arabic language (various forms) in Windows. When you install Arabic, Windows provides an input keyboard for entering Arabic script. See "Connexion client international cataloging," "Input methods for languages that use non-Latin scripts."
- OCLC provides an alternative Arabic script keyboard developed for RLIN21 cataloging software. You can download the Arabic keyboard from the OCLC Web site. For instructions, see Getting Started at http://www.oclc.org/support/documentation/connexion/client/gettingstarted/gettingstarted.
See also RLIN21 Keyboards for graphic illustrations of all keyboards, including Arabic, at: http://www.oclc.org/support/documentation/connexion/client/gettingstarted/gettingstarted/rlin21keyboards.pdf
RLIN21 keyboards include characters specific to each script (covering multiple languages that use Arabic script), whereas Microsoft keyboards include script characters specific to a single language.
Character sets supported
The client supports the basic and extended Arabic character sets defined in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. These character sets are available on the Library of Congress Web site at:
< http://lcweb2.loc.gov/cocoon/codetables/33.html > (Basic)
< http://lcweb2.loc.gov/cocoon/codetables/34.html > (Extended)
- 33(hex) [ASCII graphic: 3] = Basic Arabic
- 34(hex) [ASCII graphic: 4] = Extended Arabic
Script identifier in records
The client adds the following data to ‡c of field 066 in Arabic records to indicate the presence of Arabic characters:
- (3 (Basic Arabic)
- (4 (Extended Arabic)
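To make the 066 notation concrete, the sketch below maps character set names to the identifiers listed in this guide and emits one subfield c per character set, as the guide describes for records with multiple scripts. The function name and the tuple representation of subfields are hypothetical; the client builds field 066 itself and you do not enter it by hand.

```python
# Identifiers as listed in this guide for the MARC-8 character sets.
CHARSET_CODES = {
    "Basic Arabic": "(3",
    "Extended Arabic": "(4",
    "CJK (EACC)": "$1",
    "Basic Cyrillic": "(N",
    "Extended Cyrillic": "(Q",
    "Basic Greek": "(S",
    "Basic Hebrew": "(2",
}


def build_066_subfields(charsets):
    """Return one ('c', code) pair per character set, in the order given."""
    return [("c", CHARSET_CODES[name]) for name in charsets]


# A record that uses both basic and extended Arabic characters.
print(build_066_subfields(["Basic Arabic", "Extended Arabic"]))
```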
Romanized data
See the ALA-LC Romanization Tables for Arabic and for Persian on the Library of Congress Web site at:
http://www.loc.gov/catdir/cpso/romanization/arabic.pdf (Arabic)
http://www.loc.gov/catdir/cpso/romanization/persian.pdf (Persian)
Transliterate romanized data in Arabic or Persian records into Arabic script
The client provides two ways to automatically transliterate existing romanized data into Arabic script data:
- Use Edit > Transliterate > Arabic [or Persian] to transliterate romanized data in selected fields of a displayed record.
- Set an option in Tools > Options > International tab to auto-transliterate romanized data in all Arabic records retrieved interactively from WorldCat (records with language code ara or per and no field 066). Also select the fields to auto-transliterate.
Transliterate selected fields in a record
| |
Action |
| 1 |
Display a bibliographic record or constant data record containing romanized data that describes Arabic language materials. |
| 2 |
Place the cursor in the field containing romanized data that you want to transliterate. Or Select multiple fields containing romanized data. If you select parts of fields, the client transliterates the entire field(s). |
| 3 |
Click Edit > Transliterate > Arabic [or Persian], or press <Alt><E><T><A> or <Alt><E><T><P>, or right-click and on the pop-up menu, click Transliterate > Arabic [or Persian].
See "Results of transliteration" below. |
Note: Although you can transliterate into Arabic while working offline (you do not need to be logged on to the OCLC system), your workstation must have an Internet connection.
Auto-transliterate WorldCat records retrieved interactively
Alternatively, select an option to auto-transliterate romanized data in all WorldCat records you retrieve interactively when the records have the language code ara or per but no field 066:
| |
Action |
| 1 |
Click Tools > Options (or press <Alt><T><O>), and then click International. |
| 2 |
Click to select the Auto-transliterate Arabic records check box and/or the Auto-transliterate Persian records check box.
Note: This option works for records that have the language code ara or per but no field 066. |
| 3 |
Optional. Select fields to auto-transliterate:
- In the International window, select the Auto-transliterate Arabic fields check box and/or the Auto-transliterate Persian fields check box.
- Click Choose Fields to open the Choose Fields to Auto-Transliterate window.
- Click to select or clear check boxes next to fields 1XX through 8XX (X = any valid tag number).
Default: Fields 1XX through 8XX are selected.
- Click OK to save your settings or Cancel to cancel changes. You are returned to the International window.
|
| 4 |
When finished, click Close, or press <Enter> to apply the settings and close the Options window. Or click Apply to apply the settings without closing the window. Or click Cancel to cancel changes.
See "Results of transliteration" below.
Note: When you retrieve and display WorldCat records, the client marks any auto-transliterated fields with the symbol  |
Results of transliteration and auto-transliteration
The client:
- Transliterates the romanized data word by word, independently of context.
Note: If context other than that of letters within a word is a factor in the appearance of the Arabic text, you may need to edit the Arabic transliteration. See also the caution below.
- Creates an identical field with the same tag number (for example, two 245 tags) to contain the transliterated Arabic script.
- Places the Arabic script field above the associated romanized data field.
- Links the pair of associated fields with a bracket.
- If auto-transliterated (option selected in Tools > Options > International),
marks transliterated fields with the symbol 
Caution: Transliteration handles the following characters incorrectly. Revise the characters manually.
- The final character taa' marbuta preceded by hamza transliterates incorrectly as haa'.
- When 'alif maksura is followed by a period, the transliteration omits 'alif maksura.
- 'Alif laam followed by 'alif madda transliterates incorrectly as 'alif laam 'alif.
- Hyphens are incorrectly deleted in transliterated text.
- When laam kasra is followed by siin or jiim, the transliteration omits siin or jiim.
- Laam kasra followed by Haa' transliterates incorrectly as haa'.
- When two laams are followed by capital A (where the first laam is a preposition), the transliteration omits 'alif hamza. However, laam followed by lowercase "a" transliterates correctly as laam 'alif.
- When laam hyphen is followed by damma, the transliteration omits 'alif hamza.
Basis of transliteration
The client transliterates romanized data based on the rules for Arabic given in ALA-LC Romanization Tables on the Library of Congress Web site at: < http://www.loc.gov/catdir/cpso/romanization/arabic.pdf >.
Align Arabic or Hebrew script data for display and print
By default, the client displays (and prints) Arabic or Hebrew scripts with data aligned to the right. To toggle between displaying these scripts right-to-left or left-to-right:
| Action |
Toggle alignment for all Arabic or Hebrew script data in the current record: Click View > Align Right, or press <Alt><V><I>. Default: Data aligns to the right for display and printing.
Result: The Align Right icon next to the command on the View menu is active (highlighted) if Align Right is selected. The icon is inactive (grayed out) if Align Right is cleared.
Or
Toggle data alignment in the current field: Right-click a field, and on the pop-up menu, click Right-to-Left Reading Order.
Result: The client changes alignment of the Arabic or Hebrew script data only in the current field. |
Use Unicode formatting characters to control bidirectional data
Enter Unicode formatting characters in Arabic, Persian, and Hebrew records to correctly display left-to-right multiple-digit numbers and punctuation, including brackets, hyphens, internal spaces, etc., within a field of right-to-left script data.
- Export/import using UTF-8 Unicode character set. Unicode formatting control characters are retained as is in Arabic, Persian, and Hebrew records exported or imported using the UTF-8 Unicode character set, along with other non-MARC-8 Unicode characters.
- Export/import using MARC-8 character set. The Unicode formatting characters are retained in Numeric Character Reference (NCR) format in records exported or imported using the MARC-8 character set, along with other non-MARC-8 characters.
| |
Action |
| 1 |
Click to locate the cursor in the position where you want to insert a formatting control character. |
| 2 |
Right-click in the field, and on the pop-up menu click Insert Unicode Control Character. Or Right-click to open the pop-up menu and then press the keystrokes shown in step 3. |
| 3 |
Click one of the following characters (or press the keystroke shortcuts, shown in parentheses):
|
Example: To control the display of the data 742[1981 or 1982] that you enter in field 260 ‡c, and that is preceded and to be followed by Arabic script data:
- Click to locate the cursor in field 260 ‡c.
- Right-click in the field, and in the pop-up menu click Insert Unicode Control Character. Then click LRE Start of Left-to-Right Embedding.
- Enter the data string, 742[1981 or 1982], immediately following the character.
- Without moving the cursor, right-click in the field again. In the pop-up menu click Insert Unicode Control Character. Then click PDF Pop Directional Formatting.
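In terms of the underlying data, the example above amounts to wrapping the left-to-right string in a pair of invisible Unicode characters. The short sketch below shows the resulting code point sequence; the helper name `embed_ltr` is hypothetical, while the code points U+202A (Left-to-Right Embedding) and U+202C (Pop Directional Formatting) are the standard Unicode values for LRE and PDF.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_ltr(text):
    """Wrap text so it is displayed left-to-right inside right-to-left data."""
    return LRE + text + PDF


wrapped = embed_ltr("742[1981 or 1982]")
# Show the names of the two invisible characters that bracket the string.
print(unicodedata.name(wrapped[0]), "/", unicodedata.name(wrapped[-1]))
```

The visible text is unchanged; only the two control characters are added, which is why they must be retained (as is in UTF-8 Unicode, or as NCRs in MARC-8) when the record is exported.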
More information:
- For details, see the Bidirectional Algorithm report on the Unicode Web site at:
http://unicode.org/reports/tr9/
- See more about selecting a character set for exporting and importing bibliographic records in the client in Cataloging, Export or Import Bibliographic Records.
Use Arabic definite article in Arabic script searches
Always include the Arabic definite article ('alif laam) in all words in a keyword search.
Indexing for Arabic script searches
Notes on searching:
- Use word or phrase search indexes and word or phrase browse indexes.
- Word searches find the data string you enter anywhere in the indexed field. Phrase searches find the data string starting with the first character in a field or subfield and including each character in exact order. Browsing scans an index for the closest match to the character string followed by any other data.
- If you use qualifiers to limit searches, enter them using Latin script.
- Do not use derived searching.
- Do not use truncation (asterisk (*) at the end of a search term). You can use browsing for automatic truncation (enter only as many characters as needed for a match without using an asterisk at the end).
- If you want to retrieve all Arabic script records or see sample records, use the "character sets present" WorldCat search index (label vp:) with the assigned code ara.
To find all Arabic script records:
Enter vp:ara in the command line search of the Search WorldCat window (Cataloging > Search > WorldCat).
Note: If a search for all Arabic script records alone retrieves too many WorldCat records (limit 1,500 records), you must limit the search and try again.
Examples:
vp:ara/1991-2
vp:ara and la:per
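The two example searches follow a simple pattern: the vp: index label with the script code, optionally followed by a slash qualifier or combined with another index using "and". The sketch below only assembles strings in that pattern; the function name and parameters are hypothetical, and combining a year qualifier with a language term in a single search is not shown in this guide, so treat that combination as untested.

```python
def script_search(code, years=None, language=None):
    """Build a 'character sets present' search string such as 'vp:ara/1991-2'."""
    query = f"vp:{code}"
    if years:
        # Slash qualifier, as in the first example above.
        query += f"/{years}"
    if language:
        # Boolean combination with the language index, as in the second example.
        query += f" and la:{language}"
    return query


print(script_search("ara", years="1991-2"))
print(script_search("ara", language="per"))
```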
See general procedures for searching WorldCat in the Cataloging/Search WorldCat booklet or client Help.
Arabic character indexing specifics:
The following table shows Arabic characters that are grouped together and indexed as if they were the same character (the characters are "normalized").
Type any character of a group of normalized characters in a search and retrieve results for all characters in the group.
Images and names of characters indexed the same are in columns 3 and 4, opposite the character with which they are indexed.
| Character |
Character name |
Other characters indexed the same |
| Character |
Character name |
|
'alif |
|
double 'alif with hamza above |
|
'alif with madda above |
|
'alif with hamza above |
|
'alif with hamza below |
|
'alif wasla |
|
'alif with wavy hamza above |
|
'alif with wavy hamza below |
|
taa' |
|
taa' marbuta |
|
taa' with ring |
|
taa' with three dots above |
|
Haa' |
|
Haa' with hamza above |
|
Haa' with two dots vertical above |
|
Haa' with three dots above |
|
daal |
|
daal with ring |
|
daal with dot below |
|
daal with dot below and small Taa' |
|
daal with three dots above downwards |
|
daal with four dots above |
|
raa' |
|
raa' with small v |
|
raa' with ring |
|
raa' with dot below |
|
raa' with small v below |
|
raa' with dot above and below |
|
raa' with two dots above |
|
raa' with four dots above |
|
siin |
|
siin with dot below and dot above |
|
siin with three dots below |
|
shiin |
|
shiin with three dots below and three dots above |
|
shiin with dot below |
|
Saad |
|
Saad with two dots below |
|
Saad with three dots above |
|
Daad |
|
Daad with dot below |
|
ghayn |
|
ghayn with dot below |
|
Taa' |
|
Taa' with three dots above |
|
ayn |
|
ayn with three dots above |
|
faa' |
|
dotless faa' |
|
faa' with dot moved below |
|
faa' with dot below |
|
faa' with three dots below |
|
qaaf |
|
qaaf with dot above |
|
qaaf with three dots above |
|
kaaf |
|
swash kaaf |
|
kaaf with ring |
|
kaaf with dot above |
|
kaaf with three dots below |
|
gaf |
|
gaf with ring |
|
gaf with two dots below |
|
gaf with three dots above |
|
laam |
|
laam with small v |
|
laam with dot above |
|
laam with three dots above |
|
laam with three dots below |
|
nuun |
|
nuun with ring |
|
nuun with three dots above |
|
nuun with dot below |
|
nuun ghunna |
|
haa' |
|
haa' doachashmee |
|
haa' with hamza above |
|
waaw |
|
waaw with hamza above |
|
waaw with two dots above |
|
waaw with ring |
|
yaa' |
|
yaa' with hamza above |
|
'alif maksura |
|
yaa' with tail |
|
yaa' with small v |
|
yaa' barree |
|
yaa' barree with hamza above |
|
oe |
|
kirgiz oe |
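The effect of this normalization on a search term can be sketched as a character-by-character substitution. The example below covers only a few of the groups in the table, and the mapping of the table's character names to Unicode code points is an assumption made for illustration; it is not the actual WorldCat indexing table.

```python
# Partial mapping: variant character -> character it is indexed with.
# Code points are assumed equivalents of the names used in the table.
NORMALIZE = {
    "\u0622": "\u0627",  # 'alif with madda above -> 'alif
    "\u0623": "\u0627",  # 'alif with hamza above -> 'alif
    "\u0625": "\u0627",  # 'alif with hamza below -> 'alif
    "\u0671": "\u0627",  # 'alif wasla            -> 'alif
    "\u0629": "\u062A",  # taa' marbuta           -> taa'
    "\u0624": "\u0648",  # waaw with hamza above  -> waaw
    "\u0626": "\u064A",  # yaa' with hamza above  -> yaa'
    "\u0649": "\u064A",  # 'alif maksura          -> yaa'
}


def normalize_for_index(term):
    """Replace each variant character with the base character of its group."""
    return "".join(NORMALIZE.get(ch, ch) for ch in term)


# A word beginning with 'alif with hamza above and one beginning with
# plain 'alif normalize to the same index key.
a = normalize_for_index("\u0623\u062D\u0645\u062F")
b = normalize_for_index("\u0627\u062D\u0645\u062F")
print(a == b)
```

Because both the indexed data and the search term pass through the same substitution, typing any member of a group retrieves records containing any other member.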
Catalog using Bengali script
About using the client with Bengali script
Use Bengali script data for cataloging items in languages that use the Bengali script (for example, Bangla and Assamese). Use Bengali script data the same way you use other non-Latin script data in the client. See "Use non-Latin scripts for cataloging" for details on using Bengali script and Bengali records. See client Help or other Connexion Client System Guides on the OCLC Web site for general procedures describing how to:
- Search WorldCat
- Create records
- Edit records
- Use constant data
- Save records
- Report errors in records
- Export records
- Import records
Tools for using non-Latin scripts
The client provides the following general tools to help you catalog using non-Latin scripts:
- Link/unlink fields (Edit > Linking Fields > Link [or Unlink]) - visually link non-Latin script data fields with equivalent romanized data fields.
- Export options for data fields (Tools > Options > International) - determine:
- Whether to export both Latin-script-equivalent (romanized) data and non-Latin script data or only one or the other
- Position of data if both
- Sort order
- Caution: MARC-8 character verification (Edit > MARC-8 Characters > Verify) is not appropriate for verifying Bengali characters. There is no MARC-8 character set for Bengali. Using this command for Bengali results in marking all Bengali characters as invalid. The OCLC system validates Bengali characters when you validate a record.
- See "Use non-Latin scripts for cataloging" for more specific procedures for working with these tools.
Unicode export and import required for Bengali records
Because Bengali script is not included in MARC-8 character sets, you must export and import records in Unicode format (settings are in Tools > Options > Export and File > Import Records/Options button).
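As background for the Unicode requirement: in MARC 21, position 09 of the 24-byte record leader names the character coding scheme (blank for MARC-8, a for UCS/Unicode). The Python sketch below reads that position from a raw record to confirm which format an exported file uses. The function name and the sample leaders are illustrative assumptions, not part of the Connexion client.

```python
def leader_encoding(raw_record: bytes) -> str:
    """Return the character coding scheme named by MARC 21 leader position 09."""
    if len(raw_record) < 24:
        raise ValueError("a MARC 21 leader is 24 bytes long")
    flag = raw_record[9:10]
    if flag == b"a":
        return "UCS/Unicode"
    if flag == b" ":
        return "MARC-8"
    return "unknown"


# Hypothetical 24-byte leaders, made up for illustration only
unicode_leader = b"00714cam a2200205 a 4500"
marc8_leader = b"00714cam  2200205 a 4500"

print(leader_encoding(unicode_leader))  # UCS/Unicode
print(leader_encoding(marc8_leader))    # MARC-8
```

A Bengali record exported from the client should report UCS/Unicode under this check, since the script has no MARC-8 character set.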
About Unicode
Definition. Unicode is the universal character encoding scheme for written characters and text. It defines a consistent way of encoding multi-script text that enables the exchange of text data internationally.
Unicode provides for three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8, designed for use with ASCII-based systems).
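To make the three encoding forms concrete, the following Python sketch encodes one Bengali letter (U+0995, BENGALI LETTER KA, chosen only as a sample) in each form and prints the resulting bytes. Big-endian byte order is an assumption made so that the output has no byte order mark.

```python
ka = "\u0995"  # BENGALI LETTER KA, a sample character

utf8 = ka.encode("utf-8")       # 8-bit code units
utf16 = ka.encode("utf-16-be")  # 16-bit code units, big-endian
utf32 = ka.encode("utf-32-be")  # 32-bit code units, big-endian

print(utf8.hex(), len(utf8))    # e0a695 3
print(utf16.hex(), len(utf16))  # 0995 2
print(utf32.hex(), len(utf32))  # 00000995 4
```

The same character occupies three bytes in UTF-8, two in UTF-16, and four in UTF-32; only the byte representation differs, not the character itself.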
Unicode Standard, Version 4.0
- Contains 96,382 characters from scripts used for languages worldwide.
- Identical to International Standard ISO/IEC 10646:2003, Information Technology Universal Multiple-Octet Coded Character Set (UCS) Architecture and Basic Multilingual Plane, Supplementary Planes, known as the Universal Character Set (UCS)
- The Unicode Standard, Version 4.0. The Unicode Consortium. Addison-Wesley Developers Press, 2003.
Bengali script entry and character set
Script entry method
If your system default language is not Bengali, you can install Bengali; Windows provides an input keyboard for entering Bengali script. See "Connexion client international cataloging/Input methods for languages that use non-Latin scripts."
Character set supported
Bengali characters are defined in Unicode 4.0 (coded in the range U+0981 to U+09FA).
Caution: In Windows XP, the font size used to display Bengali script is too small. You can increase the font size for viewing and editing these records in Tools > Options > Fonts. Font size for Bengali script is not a problem in Windows Vista.
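The code range given for Bengali characters (U+0981 to U+09FA) can be used to check whether a text string contains Bengali script before you decide how to export it. The sketch below is a minimal illustration in Python; the function names are hypothetical, and the check is only a range test, not the validation the OCLC system performs.

```python
BENGALI_FIRST = 0x0981  # start of the range cited in this guide
BENGALI_LAST = 0x09FA   # end of the range cited in this guide


def is_bengali(ch: str) -> bool:
    """True if a single character falls in the cited Bengali range."""
    return BENGALI_FIRST <= ord(ch) <= BENGALI_LAST


def contains_bengali(text: str) -> bool:
    """True if any character in the text falls in the cited Bengali range."""
    return any(is_bengali(ch) for ch in text)


title = "\u09AC\u09BE\u0982\u09B2\u09BE"  # the word "Bangla" in Bengali script
print(contains_bengali(title))     # True
print(contains_bengali("Bangla"))  # False
```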
Script identifier in records
The client adds the following data to field 066 ‡c in Bengali records to indicate the presence of Bengali characters:
Romanized data
See the ALA-LC Romanization Table for Bengali on the Library of Congress Web site at http://www.loc.gov/catdir/cpso/romanization/bengali.pdf.
Indexing for Bengali script searches
Notes on searching:
- Use word or phrase search indexes and browse indexes.
- Word searches find the data string you enter anywhere in the indexed field. Phrase searches find the data string starting with the first character in a field or subfield and including each character in exact order. Browsing scans an index for the closest match to the character string followed by any other data.
- If you use qualifiers to limit a search, type them in Latin script.
- Do not use derived searching.
- You can truncate searches (asterisk (*) at the end of a search term) or use browsing for automatic truncation (enter only as many characters as needed for a match, without using an asterisk).
- If you want to retrieve all Bengali script records or see sample records, use the "character sets present" WorldCat search index (label vp:) with the assigned code ben.
To find all Bengali script records:
Enter vp:ben in the command line search of the Search WorldCat window (Cataloging > Search > WorldCat).
Note: If a search for all Bengali script records alone retrieves too many WorldCat records (limit 1,500 records), you must limit the search and try again.
Examples:
vp:ben/1991-
vp:ben and mt:bks
See general procedures and search techniques for searching WorldCat in the Cataloging, Search WorldCat booklet or client Help.
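The example searches above follow a simple pattern: the vp: index label with a script code, optionally followed by a slash and a year range or by "and" plus a second indexed term. The Python sketch below assembles strings in that pattern. The function and parameter names are hypothetical; the guide shows each limit on its own, so using both limits in one search is an assumption rather than documented syntax.

```python
def script_search(code: str, years: str = "", material_type: str = "") -> str:
    """Build a 'character sets present' search string such as vp:ben/1991-."""
    query = f"vp:{code}"
    if years:
        # Year qualifier appended after a slash, as in vp:ben/1991-
        query += f"/{years}"
    if material_type:
        # Second term combined with 'and', as in vp:ben and mt:bks
        query += f" and mt:{material_type}"
    return query


print(script_search("ben"))                       # vp:ben
print(script_search("ben", years="1991-"))        # vp:ben/1991-
print(script_search("ben", material_type="bks"))  # vp:ben and mt:bks
```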
Bengali character indexing specifics:
- Bengali signs are indexed as is (Candrabindu, Anusvara, Visarga, Nukta, and Avagraha).
- Independent vowels, dependent vowels, two-part dependent vowels, and generic or Bengali-specific character additions are all indexed as is.
- Consonants are indexed as is, if attached with Virama (Hasant); otherwise, they are indexed with the dependent vowel or consonant.
- Both Bengali and Latin numbers are indexed (either may appear in Bengali text).
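Because either Bengali or Latin digits may appear in Bengali text, it can help to see that both forms carry the same numeric value. The Python sketch below converts Bengali digits (U+09E6 to U+09EF) to Latin digits with the standard unicodedata module. It illustrates the equivalence only and is not a description of how the OCLC system indexes numbers; the function name is hypothetical.

```python
import unicodedata


def to_latin_digits(text: str) -> str:
    """Replace Bengali digits (U+09E6 to U+09EF) with Latin digits 0-9."""
    out = []
    for ch in text:
        if 0x09E6 <= ord(ch) <= 0x09EF:
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)


bengali_year = "\u09E7\u09EF\u09EF\u09E7"  # Bengali digits 1, 9, 9, 1
print(to_latin_digits(bengali_year))  # 1991
print(int(bengali_year))              # 1991 (int() accepts Unicode decimal digits)
```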
Notes on sorting search results
- Bengali syllables with candrabindu or anusvara (nasalization signs) precede terms without those syllables.
- Non-conjunct forms of a consonant precede conjunct forms.
- The default sort order for search results, alphabetical sorting by Latin script, is recommended if romanized (Latin-equivalent) data is included in the record. The sort order option is in Tools > Options > International.
Catalog using Chinese, Japanese, and Korean (CJK) scripts
About using the client with CJK scripts
Use CJK script data to catalog items in Chinese, Japanese, and Korean. Use CJK script data the same way you use other non-Latin script data in the client. See "Use non-Latin scripts for cataloging" for details on using non-Latin scripts and records. See client Help or other Connexion Client System Guides on the OCLC Web site for client procedures describing how to:
- Search WorldCat
- Create records
- Edit records
- Use constant data
- Save records
- Report errors in records
- Export records
- Import records
Tools for using non-Latin scripts
The client provides the following general tools to help you catalog using non-Latin scripts:
- MARC-8 character verification (Edit > MARC-8 Characters > Verify) - verify characters separately from record validation.
- Link/unlink fields (Edit > Linking Fields > Link [or Unlink]) - visually link non-Latin script data fields with equivalent romanized data fields.
- Export options for data fields (Tools > Options > International) - determine:
- Whether to export both Latin-script-equivalent (romanized) data and non-Latin script data or only one or the other
- Position of data if both
- Sort order
- Export and import data in Unicode or MARC-8 format - Unicode option allows you to work with non-MARC-8 characters that are unsupported in the client and in the WorldCat database for your local records (Tools > Options > Export and File > Import Records/Options button).
- See "Use non-Latin scripts for cataloging" for more specific procedures for working with these tools.
- Specifically for CJK, the client provides:
- CJK E-Dictionary to help with entering CJK characters: (Tools > CJK E-Dictionary).
- Automatic converter for converting invalid characters to equivalent MARC-8 characters (Edit > MARC-8 Characters > Convert to MARC-8 CJK).
- Chinese name authority file (Authorities > Search > Chinese Name Authority File). Access to records is read-only, but you can copy and paste from records or print them. See Search the Chinese name authority file and Chinese name authority file indexes for details.
CJK entry and character set
Script entry method
If your system default language is not the one you want to use for cataloging Chinese, Japanese, or Korean materials, you can install the languages you need. Windows provides the Input Method Editors (IMEs) appropriate for CJK character entry. See more about script entry methods in "Connexion client international cataloging/Input methods for languages that use non-Latin scripts."
Character sets supported
The client supports the following Chinese, Japanese, and Korean character set defined in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. The character set is available on the Library of Congress Web site at: http://www.loc.gov/marc/specifications/specchareacc.html.
- 31(hex) [ASCII graphic: 1] = Chinese, Japanese, Korean (East Asian Coded Character set, or EACC)
EACC is the code used for storing CJK characters and linking them to related variants for indexing in the OCLC system.
Script identifier in records
The client adds the following data to field 066 ‡c in CJK records to indicate the presence of CJK characters: $1
Entering punctuation
- OCLC suggests using the English keyboard to enter punctuation in CJK data fields.
- You may enter CJK punctuation marks using one of the CJK Input Method Editors only if the marks are in the MARC-8 character set (EACC) (see a list of input codes in "Use CJK E-Dictionary/What is Tsang chieh input code?").
- For searching purposes, CJK punctuation and Latin punctuation are normalized; that is, you can enter punctuation either way and find the same records.
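The idea of normalizing CJK and Latin punctuation for searching can be illustrated with a short Python sketch that folds a few CJK punctuation marks to Latin equivalents. The mapping table and function name are assumptions chosen for illustration; the actual normalization rules applied by the OCLC system are not given in this guide.

```python
import unicodedata

# Small illustrative mapping; NOT the OCLC normalization table
CJK_TO_LATIN = {
    "\u3001": ",",  # IDEOGRAPHIC COMMA
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\u300C": '"',  # LEFT CORNER BRACKET
    "\u300D": '"',  # RIGHT CORNER BRACKET
}


def normalize_punctuation(text: str) -> str:
    """Fold selected CJK punctuation to Latin; leave other characters alone."""
    out = []
    for ch in text:
        if ch in CJK_TO_LATIN:
            out.append(CJK_TO_LATIN[ch])
        elif unicodedata.category(ch).startswith("P"):
            # NFKC folds fullwidth punctuation such as U+FF0C to ASCII
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


with_cjk_comma = "\u6771\u4EAC\u3001\u5927\u962A"        # ideographic comma
with_fullwidth_comma = "\u6771\u4EAC\uFF0C\u5927\u962A"  # fullwidth comma
with_latin_comma = "\u6771\u4EAC,\u5927\u962A"           # Latin comma

print(normalize_punctuation(with_cjk_comma) == with_latin_comma)        # True
print(normalize_punctuation(with_fullwidth_comma) == with_latin_comma)  # True
```

After folding, all three forms of the string compare equal, which is the effect the searching note describes: either style of punctuation finds the same records.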
Romanized data
See the ALA-LC Romanization Tables for Chinese, Japanese, and Korean on the Library of Congress Web site:
- Chinese - http://www.loc.gov/catdir/cpso/romanization/chinese.pdf
- Japanese - http://www.loc.gov/catdir/cpso/romanization/japanese.pdf
- Korean - http://www.loc.gov/catdir/cpso/romanization/korean.pdf
CJK E-Dictionary
See separate topic "Use the CJK E-Dictionary." Open this electronic dictionary from the Tools menu. The CJK dictionary:
- Lets you search or browse to retrieve information about a single character, a group of related characters, homophones matching a phonetic input code, or a large set of characters in sequence by East Asian Character Code (EACC) or by Unicode value.
- Shows comprehensive types of representation for each CJK character supported in Connexion client. Open an entry in the E-Dictionary to copy a value and paste it into a record, workform, constant data, or text string.
Convert invalid CJK characters to equivalent MARC-8 characters
When you verify CJK characters as MARC-8-compliant (Edit > MARC-8 Characters > Verify), and the client identifies invalid character(s), you can automatically convert the character(s) in the record to MARC-8-equivalent CJK characters:
Action: Click Edit > MARC-8 Characters > Convert to MARC-8 CJK, or press <Alt><E><8><J>.
Result: The client converts the characters and changes the color of converted characters to green (by default) or to a color you specify in Tools > Options > Record Display.
Tip: If you already know that a record contains invalid CJK characters, you can use the Edit > MARC-8 Characters > Convert to MARC-8 CJK command without first using the Edit > MARC-8 Characters > Verify command.
Note: The Library of Congress also has a CJK Compatibility Database on the Cataloging Policy and Support Office (CPSO) home page at http://www.loc.gov/ils/cjk_search/cjk_cpso.html to help with MARC-8 compliant or missing characters.
Use the Chinese, Japanese, or Korean client interface
Change the interface language from English to Chinese (simplified or traditional), Japanese, or Korean. Select the interface language when you:
- Install the Connexion client for the first time or upgrade the client to version 1.30 or higher.
- Open a new user profile you created based on a profile for which you have not already set the interface language.
- Change the interface option at any time in Tools > Options > International.
Note: To display the Chinese, Japanese, or Korean interface, you must have an input method for the language installed on your workstation (see "Connexion client international cataloging/Input methods for languages that use non-Latin scripts"), or you must have a Chinese, Japanese, or Korean language version of Windows.
Indexing for CJK script searches
For CJK script searches, the system indexes both single characters and immediately adjacent characters in a field. Use the following search strategies:
- Word search - Enter an index label and a colon (for example ti:) followed by a character string with no spaces to find a single word, or followed by more than one character string separated by a space to find multiple words, anywhere in an indexed field.
- Phrase search - Enter an index label and an equal sign (for example, ti=) followed by a character string to find exact occurrences, starting with the first character in an indexed field and including each succeeding character. Truncate the character string to find the string followed by any other data without having to enter the entire data string as it appears in a field or subfield.
To truncate, enter an asterisk (*) at the end of the search string. Enter a minimum of three CJK characters before truncating.
- Phrase browse - Enter the Scan command, an index label, and an equal sign (for example, sca ti=) followed by a character string. Phrase browsing scans an index for occurrences of the browse string at the beginning of indexed fields, followed by any other data (automatic truncation).
Note: Since all MARC-8 CJK characters are indexed singly, if you browsed for a word, the system would scan for the first character only, and results would not be significant.
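The three search strategies above differ only in the operator placed between the index label and the search string. The Python sketch below assembles each form and enforces the three-character minimum before truncation. The function names are hypothetical, the ti label and sample strings are illustrative, and the length check counts characters without verifying that they are CJK.

```python
def word_search(label: str, *terms: str) -> str:
    """Word search: label, colon, then one or more space-separated strings."""
    return f"{label}:" + " ".join(terms)


def phrase_search(label: str, text: str, truncate: bool = False) -> str:
    """Phrase search: label, equal sign, then the string; optional asterisk."""
    if truncate:
        if len(text) < 3:
            raise ValueError("enter at least three CJK characters before truncating")
        return f"{label}={text}*"
    return f"{label}={text}"


def phrase_browse(label: str, text: str) -> str:
    """Phrase browse: the Scan command (sca) before a phrase-style string."""
    return f"sca {label}={text}"


print(word_search("ti", "\u4E2D\u56FD", "\u5386\u53F2"))
print(phrase_search("ti", "\u4E2D\u56FD\u5386\u53F2", truncate=True))
print(phrase_browse("ti", "\u4E2D\u56FD"))
```

Browsing needs no asterisk because truncation is automatic, so only phrase_search takes a truncate option.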
Notes on searching
- If you use qualifiers to limit a search, type them using Latin script.
- Do not use derived searching.
- For searching purposes, CJK punctuation and Latin punctuation are normalized; that is, you can enter punctuation either way and find the same records. If you enter CJK punctuation, the characters must be in the MARC-8 character set (EACC) (see a list of input codes in "Use CJK E-Dictionary/What is Tsang chieh input code?").
- If you want to retrieve all CJK script records or see sample records, use the "character sets present" WorldCat search index (label vp:) with the assigned code cjk.
To find all CJK script records:
Enter vp:cjk in the command line of the Search WorldCat window (Cataloging > Search > WorldCat).
Note: If a search for all CJK script records alone retrieves too many WorldCat records (limit 1,500 records), you must limit the search and try again.
Examples:
vp:cjk/1991-2
vp:cjk and mt:bks
See general procedures and techniques for searching WorldCat in the Cataloging/Search WorldCat booklet or client Help.
Use CJK E-Dictionary
What is the CJK E-Dictionary?
The CJK E-Dictionary (electronic dictionary for Chinese, Japanese, and Korean characters):
- Allows you to search or browse to retrieve information about a CJK character, group of related characters, homophones matching a phonetic input code, or a large set of characters in sequence by EACC or Unicode value.
- Includes all CJK characters represented in the East Asian Character Code (EACC) and supported in the Connexion client.
- For each character, provides the following types of character representation (if applicable):
- EACC bitmap
- EACC 3-byte code
- Unicode font representation
- Unicode
- Tsang-chieh input code
- Wade-Giles input code (if applicable)
- Pinyin input code (if applicable)
- McCune-Reischauer input code (if applicable)
- Modified Hepburn input code (if applicable)
See "Use the search results list" section below for a des