|
Subfield ‡c contains a code identifying the alternative character set used in the record. The subfield is repeated for each additional character set present. The following codes display:
$1 Chinese, Japanese, Korean vernacular present
(3 Basic Arabic present
(4 Extended Arabic present
(N Basic Cyrillic present
(Q Extended Cyrillic present
(S Extended Greek present
(2 Basic Hebrew present
Note: These character sets encode language data in the script of the language. They do not encode romanized data in Latin script. The dollar sign ("$") means the character set uses multiple bytes per character. The left parenthesis ("(") means the character set uses one byte per character.
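The code list and the note above can be sketched as a small lookup. This is only an illustration: the names MARC8_SET_CODES and describe_066c are hypothetical and not part of any MARC library; the codes and labels are copied from the list above.

```python
# Codes and labels copied from the list above; the names used here are
# illustrative and do not come from any MARC library.
MARC8_SET_CODES = {
    "$1": "Chinese, Japanese, Korean",
    "(3": "Basic Arabic",
    "(4": "Extended Arabic",
    "(N": "Basic Cyrillic",
    "(Q": "Extended Cyrillic",
    "(S": "Extended Greek",
    "(2": "Basic Hebrew",
}


def describe_066c(code):
    """Return (script name, byte-width class) for one 066 subfield c value."""
    name = MARC8_SET_CODES.get(code)
    if name is None:
        raise ValueError(f"unrecognized 066 subfield c code: {code!r}")
    # "$" marks a multibyte set; "(" marks a single-byte set.
    width = "multibyte" if code.startswith("$") else "single-byte"
    return name, width


print(describe_066c("$1"))
print(describe_066c("(N"))
```

Because the subfield repeats once per character set, a record with several sets would call such a lookup once for each ‡c occurrence.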
Character sets for Bengali, Devanagari, Tamil, and Thai. There are no MARC-8 character sets for Bengali, Devanagari, Tamil, and Thai. OCLC implemented the following script identification codes for these scripts based on the ISO 15924 code lists (http://www.unicode.org/iso15924/codelists.html) and supports Unicode (http://www.unicode.org/versions/Unicode4.0.0/) UTF-8 characters for these scripts.
Beng Bengali present.
Deva Devanagari present.
Taml Tamil present.
Thai Thai present.
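As a rough illustration of how these four codes relate to Unicode data, the sketch below reports which codes apply to a string by checking each character against the main Unicode block for each script. How OCLC actually assigns the codes is not stated here, so this approach is an assumption. The names SCRIPT_BLOCKS and script_codes_present are hypothetical, and extension blocks (for example Devanagari Extended) are deliberately left out.

```python
# Main Unicode block for each script, as inclusive code point ranges.
# Extension blocks are omitted; this mapping is an illustrative assumption.
SCRIPT_BLOCKS = {
    "Deva": (0x0900, 0x097F),
    "Beng": (0x0980, 0x09FF),
    "Taml": (0x0B80, 0x0BFF),
    "Thai": (0x0E00, 0x0E7F),
}


def script_codes_present(text):
    """Return the sorted script codes whose block contains a character of text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        for code, (low, high) in SCRIPT_BLOCKS.items():
            if low <= cp <= high:
                found.add(code)
    return sorted(found)


# U+091A is a Devanagari letter; U+0E01 is a Thai letter.
print(script_codes_present("abc \u091A \u0E01"))
```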
Note: Records containing non-MARC-8 characters are expected to be output in the UTF-8 (Unicode) data format. If multiple non-Latin scripts exist in a single field or a single record and the MARC-8 data format is used, all non-MARC-8 characters are expressed as numeric character references (NCRs) of the form &#x091A;, where the x is lowercase and 091A is the hexadecimal Unicode code point of the character being represented. Non-MARC-8 script codes do not appear in subfield ‡6 of the 880 linkage field. |
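The numeric character reference form described in the note can be sketched as follows. The function names are hypothetical. The is_marc8 predicate is a stand-in supplied by the caller: the example treats only ASCII as MARC-8 encodable, which is an assumption made for brevity and understates the real MARC-8 repertoire.

```python
def to_ncr(ch):
    """Format one character as an NCR: lowercase x, uppercase hex, at least 4 digits."""
    return "&#x{:04X};".format(ord(ch))


def encode_non_marc8(text, is_marc8):
    """Replace every character rejected by is_marc8 with its NCR."""
    return "".join(ch if is_marc8(ch) else to_ncr(ch) for ch in text)


def ascii_only(ch):
    # Stand-in predicate (assumption): real MARC-8 covers far more than ASCII.
    return ord(ch) < 0x80


print(encode_non_marc8("ca: \u091A", ascii_only))  # ca: &#x091A;
```

A production converter would replace ascii_only with a check against the full MARC-8 character tables.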