This version: Draft 2010-03-11
Author: Andrew Houghton, OCLC Research
Date issued: 2010-03-11
Document Status: Draft
License: Creative Commons Attribution-Share Alike United States license.
Abstract
- MARC-JSON is a library community data interchange format based on the MARC (ISO 2709) and JavaScript Object Notation (JSON) specifications.
Introduction
- The MARC-JSON format specification is the product of discussions on the code4lib list about the growing need to serialize MARC as JSON for use in AJAX applications and in schema-free, document-oriented NoSQL databases such as CouchDB and MongoDB. The MARC-JSON format retains the semantics of MARC, so MARC-JSON can be reserialized into MARC, MARC-XML, or MarcXchange without loss of semantics or content. This specification describes how to serialize MARC as JSON and presumes familiarity with MARC.
Example
- A MARC record is a binary format that contains ASCII control characters and no line breaks, which makes it awkward to show in an example. The following sample MARC authority record has been formatted for readability, and its ASCII control characters have been replaced by printable representations:
- The MARC field terminator (FT), ASCII codepoint 1E (hex), has been replaced with Unicode codepoint U+241E (␞).
- The MARC record terminator (RT), ASCII codepoint 1D (hex), has been replaced with Unicode codepoint U+241D (␝).
- The MARC subfield delimiter, ASCII codepoint 1F (hex), has been replaced with Unicode codepoint U+241F (␟).
00474cz  a2200157n  4500
001 0012 00000
003 0006 00012
005 0017 00018
008 0041 00035
040 0028 00076
016 0023 00104
043 0012 00127
151 0047 00139
688 0032 00186
688 0032 00218
751 0066 00250
␞fst01312614
␞OCoLC
␞20100213044034.6
␞060620nn anznnbabn          || ana     d
␞  ␟aOCoLC␟beng␟cOCoLC␟ffast
␞7 ␟afst01312614␟2OCoLC
␞  ␟an-cn-qu
␞  ␟aQuébec␟zSaint-Laurent (Île-de-Montréal)
␞  ␟aLC (2008) Subject Usage: 13
␞  ␟aWC (2008) Subject Usage: 59
␞ 0␟aSaint-Laurent (Île-de-Montréal, Québec)␟0(DLC)n  80080336 
␞␝
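The substitution described above is mechanical. A minimal Python sketch, assuming the raw record is UTF-8 encoded (leader position 09 is "a", as in the sample); the names `CONTROL_PICTURES` and `to_printable` are illustrative and not part of this specification. In the raw record the field terminator ends each field, whereas the sample places it at the start of each line for readability.

```python
# Map the ISO 2709 structural control characters to the Unicode
# "control picture" symbols used in the sample record.
CONTROL_PICTURES = {
    0x1D: "\u241D",  # record terminator (RT)  -> ␝
    0x1E: "\u241E",  # field terminator (FT)   -> ␞
    0x1F: "\u241F",  # subfield delimiter      -> ␟
}


def to_printable(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw MARC bytes and replace the control characters with visible symbols."""
    # Assumption: the record is UTF-8 (leader/09 = "a").
    return raw.decode(encoding).translate(CONTROL_PICTURES)


print(to_printable(b"7 \x1fafst01312614\x1f2OCoLC\x1e"))  # 7 ␟afst01312614␟2OCoLC␞
```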
- For comparison, the sample MARC authority record is shown serialized as MARC-XML:
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00474cz  a2200157n  4500</leader>
<controlfield tag="001">fst01312614</controlfield>
<controlfield tag="003">OCoLC</controlfield>
<controlfield tag="005">20100213044034.6</controlfield>
<controlfield tag="008">060620nn anznnbabn          || ana     d</controlfield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">OCoLC</subfield>
<subfield code="b">eng</subfield>
<subfield code="c">OCoLC</subfield>
<subfield code="f">fast</subfield>
</datafield>
<datafield tag="016" ind1="7" ind2=" ">
<subfield code="a">fst01312614</subfield>
<subfield code="2">OCoLC</subfield>
</datafield>
<datafield tag="043" ind1=" " ind2=" ">
<subfield code="a">n-cn-qu</subfield>
</datafield>
<datafield tag="151" ind1=" " ind2=" ">
<subfield code="a">Québec</subfield>
<subfield code="z">Saint-Laurent (Île-de-Montréal)</subfield>
</datafield>
<datafield tag="688" ind1=" " ind2=" ">
<subfield code="a">LC (2008) Subject Usage: 13</subfield>
</datafield>
<datafield tag="688" ind1=" " ind2=" ">
<subfield code="a">WC (2008) Subject Usage: 59</subfield>
</datafield>
<datafield tag="751" ind1=" " ind2="0">
<subfield code="a">Saint-Laurent (Île-de-Montréal, Québec)</subfield>
<subfield code="0">(DLC)n  80080336 </subfield>
</datafield>
</record>
- Serialized according to this specification, the sample MARC authority record would appear as the following JSON. It has been formatted for readability with optional whitespace, which the JSON specification allows:
{
  "leader" : "00474cz  a2200157n  4500",
  "controlfield" :
  [
    { "tag" : "001", "data" : "fst01312614" },
    { "tag" : "003", "data" : "OCoLC" },
    { "tag" : "005", "data" : "20100213044034.6" },
    { "tag" : "008", "data" : "060620nn anznnbabn          || ana     d" }
  ],
  "datafield" :
  [
    {
      "tag" : "040", "ind" : "  ",
      "subfield" :
      [
        { "code" : "a", "data" : "OCoLC" },
        { "code" : "b", "data" : "eng" },
        { "code" : "c", "data" : "OCoLC" },
        { "code" : "f", "data" : "fast" }
      ]
    },
    {
      "tag" : "016", "ind" : "7 ",
      "subfield" :
      [
        { "code" : "a", "data" : "fst01312614" },
        { "code" : "2", "data" : "OCoLC" }
      ]
    },
    {
      "tag" : "043", "ind" : "  ",
      "subfield" :
      [
        { "code" : "a", "data" : "n-cn-qu" }
      ]
    },
    {
      "tag" : "151", "ind" : "  ",
      "subfield" :
      [
        { "code" : "a", "data" : "Québec" },
        { "code" : "z", "data" : "Saint-Laurent (Île-de-Montréal)" }
      ]
    },
    {
      "tag" : "688", "ind" : "  ",
      "subfield" :
      [
        { "code" : "a", "data" : "LC (2008) Subject Usage: 13" }
      ]
    },
    {
      "tag" : "688", "ind" : "  ",
      "subfield" :
      [
        { "code" : "a", "data" : "WC (2008) Subject Usage: 59" }
      ]
    },
    {
      "tag" : "751", "ind" : " 0",
      "subfield" :
      [
        { "code" : "a", "data" : "Saint-Laurent (Île-de-Montréal, Québec)" },
        { "code" : "0", "data" : "(DLC)n  80080336 " }
      ]
    }
  ]
}
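Because MARC-JSON is plain JSON, a consuming application can navigate it with ordinary data-structure access. The following Python sketch uses a trimmed copy of the sample record; the helper names `control_value` and `subfield_values` are hypothetical. The record text also shows that non-ASCII characters may be written with JSON's `\u` escapes instead of literal Unicode.

```python
import json

# A trimmed MARC-JSON record; \u00e9 and \u00ce are JSON escapes for é and Î.
RECORD_TEXT = """
{
  "leader": "00474cz  a2200157n  4500",
  "controlfield": [ {"tag": "001", "data": "fst01312614"} ],
  "datafield": [
    {"tag": "151", "ind": "  ",
     "subfield": [
       {"code": "a", "data": "Qu\\u00e9bec"},
       {"code": "z", "data": "Saint-Laurent (\\u00cele-de-Montr\\u00e9al)"}
     ]}
  ]
}
"""


def control_value(record: dict, tag: str):
    """Return the data of the first control field with `tag`, or None."""
    for field in record.get("controlfield", []):
        if field["tag"] == tag:
            return field["data"]
    return None


def subfield_values(record: dict, tag: str, code: str) -> list:
    """Collect the data of every subfield `code` in every data field `tag`."""
    return [
        sub["data"]
        for field in record.get("datafield", [])
        if field["tag"] == tag
        for sub in field["subfield"]
        if sub["code"] == code
    ]


record = json.loads(RECORD_TEXT)
print(control_value(record, "001"))         # fst01312614
print(subfield_values(record, "151", "a"))  # ['Québec']
```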
Definitions
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119.
- conforming implementation
- A conforming implementation is an implementation that produces a serialization defined by this specification.
- consuming application
- A consuming application is an application that consumes a serialization defined by this specification.
Objects
- The MARC-JSON format defines four objects: record, controlfield, datafield, and subfield.
record Object
- A record object represents a MARC record and contains three properties: leader, controlfield, and datafield.
leader Property
- The leader property contains the content from the MARC leader.
- Constraints:
- The property's name MUST be "leader".
- The property's value MUST be a string of exactly 24 characters.
- The property's value SHOULD contain all the characters from the MARC leader.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
- A consuming application SHOULD ignore positions 00-04 (record length) of the property's value.
- A consuming application SHOULD ignore positions 12-16 (base address of data) of the property's value.
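Positions 00-04 and 12-16 only have meaning inside an ISO 2709 byte stream, which is why a consumer ignores them. The Python sketch below shows how a consumer that writes ISO 2709 might check the leader and recompute those positions. It assumes the MARC 21 entry map "4500" (12-byte directory entries); `validate_leader`, `base_address`, and `rebuild_leader` are illustrative names.

```python
def validate_leader(leader) -> None:
    """Enforce the leader constraint: a string of exactly 24 characters."""
    if not isinstance(leader, str) or len(leader) != 24:
        raise ValueError("leader must be a string of exactly 24 characters")


def base_address(field_count: int) -> int:
    """Leader (24) + one 12-byte directory entry per field + directory terminator (1)."""
    return 24 + 12 * field_count + 1


def rebuild_leader(leader: str, record_length: int, base: int) -> str:
    """Overwrite positions 00-04 and 12-16 with values computed for ISO 2709 output."""
    validate_leader(leader)
    return f"{record_length:05d}" + leader[5:12] + f"{base:05d}" + leader[17:]


# Whatever was stored at 00-04 and 12-16 is ignored and replaced.
print(rebuild_leader("00000cz  a2200000n  4500", 474, base_address(11)))
# 00474cz  a2200157n  4500
```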
controlfield Property
- The controlfield property represents the collection of MARC control fields.
- Constraints:
datafield Property
- The datafield property represents the collection of MARC data fields.
- Constraints:
controlfield Object
- A control field object represents a MARC control field and contains two properties:
- The tag property contains the identifier for the MARC control field.
- The data property contains the content for the MARC control field.
tag Property
- The tag property contains the identifier for the MARC control field.
- Constraints:
- The property's name MUST be "tag".
- The property's value MUST be a string of exactly 3 characters.
- The property's value SHOULD contain the characters from positions 00-02 of the control field's MARC directory entry.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
data Property
- The data property contains the content for the MARC control field.
- Constraints:
- The property's name MUST be "data".
- The property's value MUST be a string whose length SHOULD be determined from positions 03-06 of the control field's MARC directory entry; that length includes the field terminator, which is not part of the value.
- The property's value SHOULD contain the characters that start at the sum of the record's start position, the base address of data (leader positions 12-16), and the field's starting character position (positions 07-11 of the control field's directory entry), and that end before the next MARC field terminator.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
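The offset arithmetic in these constraints can be sketched in Python. This is a sketch under assumptions: the record uses the MARC 21 entry map "4500" and is UTF-8 encoded, and `directory_entries` and `control_field_data` are hypothetical helper names. Directory lengths and offsets count bytes, so the field is sliced before it is decoded; otherwise multibyte characters such as é would shift the positions.

```python
def directory_entries(raw: bytes, record_start: int = 0):
    """Yield (tag, length, start) for each 12-byte directory entry of one record."""
    base = int(raw[record_start + 12:record_start + 17])  # leader 12-16: base address
    # The directory runs from the end of the leader up to the field
    # terminator that sits just before the base address.
    directory = raw[record_start + 24:record_start + base - 1]
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        # Entry map "4500": 3-byte tag, 4-digit length, 5-digit start position.
        yield entry[0:3].decode("ascii"), int(entry[3:7]), int(entry[7:12])


def control_field_data(raw: bytes, tag: str, record_start: int = 0) -> str:
    """Return the data property for control field `tag`, without the terminator."""
    base = int(raw[record_start + 12:record_start + 17])
    for entry_tag, length, start in directory_entries(raw, record_start):
        if entry_tag == tag:
            # record start + base address (leader 12-16) + field start (directory 07-11)
            offset = record_start + base + start
            # The directory length (03-06) counts the field terminator; drop it.
            return raw[offset:offset + length - 1].decode("utf-8")
    raise KeyError(tag)
```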
datafield Object
- A data field object represents a MARC data field and contains three properties:
- The tag property contains the identifier for the MARC data field.
- The ind property contains the indicators for the MARC data field.
- The subfield property represents the collection of MARC subfields for the data field.
tag Property
- The tag property contains the identifier for the MARC data field.
- Constraints:
- The property's name MUST be "tag".
- The property's value MUST be a string of exactly 3 characters.
- The property's value SHOULD contain the characters from positions 00-02 of the data field's MARC directory entry.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
ind Property
- The ind property contains the indicators for the MARC data field.
- Constraints:
- The property's name MUST be "ind".
- The property's value MUST be a string whose length is defined by position 10 of the MARC leader (the indicator count).
- The property's value SHOULD contain the characters that start at the sum of the record's start position, the base address of data (leader positions 12-16), and the field's starting character position (positions 07-11 of the data field's directory entry), continuing for the length defined by position 10 of the MARC leader.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
subfield Property
- The subfield property represents the collection of MARC subfields for a data field.
- Constraints:
subfield Object
- A subfield object represents a MARC subfield and contains two properties: code and data.
code Property
- The code property contains the identifier for the MARC subfield.
- Constraints:
- The property's name MUST be "code".
- The property's value MUST be a string whose length is the value of position 11 of the MARC leader (the subfield code count, which includes the delimiter) minus one.
- The property's value SHOULD contain the characters that follow the subfield's delimiter, for a length equal to the value of position 11 of the MARC leader minus one.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
data Property
- The data property contains the content for the MARC subfield.
- Constraints:
- The property's name MUST be "data".
- The property's value MUST be a string whose length is determined by the content after the subfield delimiter and code, and before the next MARC subfield delimiter or field terminator.
- The property's value SHOULD contain the characters after the subfield delimiter and code, and before the next MARC subfield delimiter or field terminator.
- The property's value MUST be specified in Unicode or escaped using JSON's character escape mechanisms.
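Taken together, the datafield, ind, and subfield constraints describe how one data field is split. A Python sketch that turns a data field's decoded content (indicators, subfields, field terminator) into a datafield object; the leader supplies the indicator count (position 10) and the subfield code length (position 11 minus one). The function name `datafield_object` is illustrative, and the sketch assumes the field has already been decoded to text.

```python
SUBFIELD_DELIMITER = "\x1f"
FIELD_TERMINATOR = "\x1e"


def datafield_object(tag: str, content: str, leader: str) -> dict:
    """Build a MARC-JSON datafield object from one data field's decoded content."""
    indicator_count = int(leader[10])   # leader position 10: indicator count
    code_length = int(leader[11]) - 1   # leader position 11 includes the delimiter
    ind = content[:indicator_count]
    body = content[indicator_count:]
    if body.endswith(FIELD_TERMINATOR):
        body = body[:-1]
    subfields = []
    # Each chunk after a delimiter is a code followed by data; anything before
    # the first delimiter (normally nothing) is ignored in this sketch.
    for chunk in body.split(SUBFIELD_DELIMITER)[1:]:
        subfields.append({"code": chunk[:code_length], "data": chunk[code_length:]})
    return {"tag": tag, "ind": ind, "subfield": subfields}


leader = "00474cz  a2200157n  4500"
print(datafield_object("016", "7 \x1fafst01312614\x1f2OCoLC\x1e", leader))
# {'tag': '016', 'ind': '7 ', 'subfield': [{'code': 'a', 'data': 'fst01312614'}, {'code': '2', 'data': 'OCoLC'}]}
```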
Collections
- The MARC-JSON format defines four collections: record, controlfield, datafield, and subfield.
record Collection
- The record collection represents zero or more MARC records.
- Constraints:
- The collection MUST be a JSON array containing zero or more record objects.
controlfield Collection
- The controlfield collection represents zero or more MARC control fields.
- Constraints:
datafield Collection
- The datafield collection represents zero or more MARC data fields.
- Constraints:
subfield Collection
- The subfield collection represents zero or more MARC subfields.
- Constraints:
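A consuming application may want to check incoming data against the object and collection model before using it. A Python sketch follows. It assumes, following the example record, that the controlfield, datafield, and subfield collections are JSON arrays, and it treats a missing one as empty; `validate_record` and `validate_collection` are illustrative names.

```python
def _check(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def _is_string(value, length=None) -> bool:
    return isinstance(value, str) and (length is None or len(value) == length)


def _array(obj: dict, name: str) -> list:
    # Assumption: an absent collection is treated as an empty array.
    value = obj.get(name, [])
    _check(isinstance(value, list), f"{name} must be a JSON array")
    return value


def validate_record(record) -> None:
    """Raise ValueError if `record` does not fit the MARC-JSON object model."""
    _check(isinstance(record, dict), "record must be a JSON object")
    leader = record.get("leader")
    _check(_is_string(leader, 24), "leader must be a string of exactly 24 characters")
    ind_length = int(leader[10])       # indicator count
    code_length = int(leader[11]) - 1  # subfield code count minus the delimiter
    for field in _array(record, "controlfield"):
        _check(isinstance(field, dict), "control field must be a JSON object")
        _check(_is_string(field.get("tag"), 3), "control field tag must be 3 characters")
        _check(_is_string(field.get("data")), "control field data must be a string")
    for field in _array(record, "datafield"):
        _check(isinstance(field, dict), "data field must be a JSON object")
        _check(_is_string(field.get("tag"), 3), "data field tag must be 3 characters")
        _check(_is_string(field.get("ind"), ind_length), "ind has the wrong length")
        for sub in _array(field, "subfield"):
            _check(isinstance(sub, dict), "subfield must be a JSON object")
            _check(_is_string(sub.get("code"), code_length), "code has the wrong length")
            _check(_is_string(sub.get("data")), "subfield data must be a string")


def validate_collection(collection) -> None:
    """A record collection is a JSON array of zero or more record objects."""
    _check(isinstance(collection, list), "collection must be a JSON array")
    for record in collection:
        validate_record(record)
```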
Serialization Formats
- The MARC-JSON format has two serializations, record and collection; a conforming implementation chooses between them according to its use case.
- Content encoding considerations:
- The serialized JSON MUST be encoded in Unicode.
- The serialized JSON SHOULD be encoded in Unicode NFC (normal form C).
- The serialized JSON MUST be encoded in UTF-8, UTF-16 or UTF-32.
- The serialized JSON MAY include a BOM (Byte Order Mark) when encoded in UTF-8.
- The serialized JSON MUST include a BOM when encoded in UTF-16, UTF-16BE or UTF-16LE.
- The serialized JSON MUST include a BOM when encoded in UTF-32, UTF-32BE or UTF-32LE.
- The content-transfer-encoding for an HTTP response MUST be binary when JSON is encoded in UTF-16 or UTF-32.
- Media type and document extension considerations:
- The media type for an HTTP request SHOULD be "application/json".
- The media type for an HTTP response MUST be "application/json".
- The document extension SHOULD be ".json".
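The content-encoding constraints can be met with the Python standard library, as in the sketch below, which is an illustration rather than a reference implementation. It assumes the value is already a MARC-JSON record or collection held as Python dicts and lists; `normalize_nfc` and `encode_marc_json` are hypothetical names. Python's "utf-16" and "utf-32" codecs write a BOM themselves, while the explicit-endian codecs (such as "utf-16-be") do not, so the sketch prepends U+FEFF for those.

```python
import codecs
import json
import unicodedata


def normalize_nfc(value):
    """Recursively apply Unicode Normalization Form C to every string in `value`."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [normalize_nfc(item) for item in value]
    if isinstance(value, dict):
        return {normalize_nfc(k): normalize_nfc(v) for k, v in value.items()}
    return value


def encode_marc_json(value, encoding: str = "utf-8", utf8_bom: bool = False) -> bytes:
    """Serialize `value` as NFC-normalized JSON in UTF-8, UTF-16 or UTF-32."""
    name = codecs.lookup(encoding).name
    text = json.dumps(normalize_nfc(value), ensure_ascii=False)
    if name == "utf-8":
        # A BOM is optional for UTF-8; the "utf-8-sig" codec writes one.
        return text.encode("utf-8-sig" if utf8_bom else "utf-8")
    if name in ("utf-16", "utf-32"):
        # These codecs emit a BOM in the platform's native byte order.
        return text.encode(name)
    if name in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        # Explicit-endian codecs write no BOM, so add U+FEFF ourselves.
        return ("\ufeff" + text).encode(name)
    raise ValueError(f"MARC-JSON must be UTF-8, UTF-16 or UTF-32, not {encoding!r}")
```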
record Serialization
- The record serialization allows a conforming implementation to serialize a single MARC record as a JSON object for data interchange.
- Constraints:
collection Serialization
- The collection serialization allows a conforming implementation to serialize zero or more MARC records as a JSON array for data interchange.
- Constraints:
- The serialization MUST be a record collection.
- A conforming implementation SHOULD be cognizant of a consuming application's limitations and not serialize more data than a consuming application can handle.
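The two serializations differ only in whether the top-level JSON value is an object or an array. A Python sketch follows. Splitting a large set of records into several collections of at most `max_records` records is one possible way to respect a consuming application's limits; `max_records`, `serialize_record`, and `serialize_collections` are illustrative names, not something this specification defines.

```python
import json
from itertools import islice


def serialize_record(record: dict) -> str:
    """record serialization: one MARC record as a single JSON object."""
    return json.dumps(record, ensure_ascii=False)


def serialize_collections(records, max_records: int):
    """collection serialization, yielding JSON arrays of at most `max_records` records."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    iterator = iter(records)
    emitted = False
    while True:
        batch = list(islice(iterator, max_records))
        if not batch:
            break
        emitted = True
        yield json.dumps(batch, ensure_ascii=False)
    if not emitted:
        yield "[]"  # a record collection may contain zero records
```

Because `serialize_collections` is a generator, records are read lazily, so a producer does not need to hold the whole result in memory.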
References
- JSON format and standards
- Introducing JSON provides a description of the JSON format and available implementations.
- RFC 4627 describes the JSON media type.
- ECMA 262 contains the specification for the JSON subset of the ECMAScript programming language.
- MARC format and standards
- MARC Standards provides information about the various MARC formats and standards.
- RFC 2220 describes the MARC media type.
- MARC Record Structure describes the MARC (ISO 2709) record structure, character sets, and exchange media.
- MARC XML Schema provides information on serializing MARC (ISO 2709) into XML using an XML schema.
- MarcXchange, also known as ISO 25577, provides information on serializing MARC (ISO 2709) into XML using an XML schema.
- IETF Draft describes various media types for library formats including MARC-XML.
- ISO standards
- ISO 2709 provides an abstract of the standard and purchasing options.
- ISO 25577 provides an abstract of the standard and purchasing options.
Appendix A (Informative)
- An XSLT 2.0 transform is provided to demonstrate a conforming implementation of this specification. The transform is informative rather than normative. It converts MARC (ISO 2709) that has been serialized as MARC-XML or MarcXchange (ISO 25577) XML into MARC-JSON, and its input document may contain either a single MARC record or a collection of MARC records.
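For readers without an XSLT 2.0 processor, the same mapping can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is not the transform the appendix refers to; it is an illustrative sketch that handles only the MARC 21 slim namespace (MarcXchange uses a different namespace), and `record_to_json_object` and `marcxml_to_marcjson` are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def record_to_json_object(record: ET.Element) -> dict:
    """Map one MARC-XML <record> element onto a MARC-JSON record object."""
    result = {
        "leader": record.findtext(MARC_NS + "leader", default=""),
        "controlfield": [],
        "datafield": [],
    }
    for cf in record.findall(MARC_NS + "controlfield"):
        result["controlfield"].append({"tag": cf.get("tag"), "data": cf.text or ""})
    for df in record.findall(MARC_NS + "datafield"):
        result["datafield"].append({
            "tag": df.get("tag"),
            # MARC-XML carries indicators as two attributes; MARC-JSON concatenates them.
            "ind": df.get("ind1", " ") + df.get("ind2", " "),
            "subfield": [
                {"code": sf.get("code"), "data": sf.text or ""}
                for sf in df.findall(MARC_NS + "subfield")
            ],
        })
    return result


def marcxml_to_marcjson(xml_text: str) -> str:
    """Convert a MARC-XML <record> (record serialization) or <collection> into MARC-JSON."""
    root = ET.fromstring(xml_text)
    if root.tag == MARC_NS + "record":
        return json.dumps(record_to_json_object(root), ensure_ascii=False)
    records = root.findall(MARC_NS + "record")
    return json.dumps([record_to_json_object(r) for r in records], ensure_ascii=False)
```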
Comments
Example doesn't validate
According to jsonlint.com, the example provided on this page doesn't validate. Properties must be enclosed in double quotes.
BOM and RFC 4627
The use of BOMs for a payload advertised as `application/json` contradicts the Unicode encoding detection defined in Section 3 of RFC 4627.