An Archival Submission Information Package for E-Journals

E-Journal Archive

Mellon E-journal archive project
http://www.diglib.org/preserve/ejp.htm
“Preserve significant intellectual content … independent of the form originally delivered”
Publisher, not subject based
Multiple external content suppliers
Initially dark content


Design Principles

Issue-centric
Capture content at highest resolution, finest granularity, most abstract representation
Archiving work, not manifestation
Architecture based on OAIS
Acceptance by stakeholders is cost-sensitive
Automation
Standards
Homogeneity


Find Efficiencies Wherever Possible!

What Is OAIS?

Open Archival Information System
http://ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
“A common framework of terms and concepts … to provide long-term preservation of digital information”
A functional and information reference model


OAIS Functional Model

OAIS Information Model

Archival SIP

Unit of submission is the e-journal issue
Modeled at two levels: issue and item
Explicit separation of content and metadata
METS used for metadata
XML DTD for item content under development by Harvard and NLM, based on PMC2 DTD


Normative Data Formats

Necessary for internal data homogeneity
Single format for each content category
Text is XML, raster still image is TIFF, ...
Selection criteria: standardization, maturity, viability, robust tools, creation upstream in the production process
Lower level specification than MIME type
Bi-tonal TIFF with Group IV compression
Non-normative formats transformed on ingest
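
The ingest rule above can be sketched as a lookup from content category to its single normative format. This is only an illustration, not the archive's implementation: the `NORMATIVE` table and the `needs_transformation` helper are hypothetical names, and the only pairings taken from the slide are text as XML and raster still image as TIFF.

```python
# Hypothetical table: exactly one normative format per content category.
# Only these two pairings are stated in the slides.
NORMATIVE = {
    "text": "XML",
    "raster still image": "TIFF",
}


def needs_transformation(category: str, submitted_format: str) -> bool:
    """Return True when a submitted file must be converted on ingest."""
    try:
        normative = NORMATIVE[category]
    except KeyError:
        raise ValueError(f"no normative format defined for {category!r}") from None
    # Compare case-insensitively so "xml" and "XML" are treated alike.
    return submitted_format.upper() != normative
```

A real check would go below the format name, since the slide notes that the normative specification is at a lower level than MIME type (for example, bi-tonal TIFF with Group IV compression).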


Format Registry

Version history
Authoritative specification / maintenance org.
Identity characterization
MIME type, magic number, internal syntax
Application specific profile
Technical metadata schema
Compliant tools
Community-wide resource and responsibility
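
As an illustration of identity characterization by magic number, the sketch below matches the leading bytes of a file against a small table. The TIFF and PDF signatures are the well-known ones; the table layout, the `identify` function, and the choice of formats are assumptions made for this example and do not describe an actual registry.

```python
from typing import Optional

# (magic number, MIME type) pairs. The registry structure is hypothetical.
MAGIC_NUMBERS = [
    (b"II*\x00", "image/tiff"),     # TIFF, little-endian byte order
    (b"MM\x00*", "image/tiff"),     # TIFF, big-endian byte order
    (b"%PDF-", "application/pdf"),
    (b"<?xml", "text/xml"),         # XML declaration, when present and unprefixed
]


def identify(data: bytes) -> Optional[str]:
    """Return the MIME type whose magic number prefixes the data, else None."""
    for magic, mime_type in MAGIC_NUMBERS:
        if data.startswith(magic):
            return mime_type
    return None
```

A magic number alone identifies only the broad format; confirming an application-specific profile would still require inspecting the internal syntax.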


SIP Directory Structure

What Is METS?

Metadata Encoding & Transmission Standard
http://www.loc.gov/standards/mets/
“A standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library”
DLF funded; maintained at LC MARC Standards Office


METS Schema

Namespace-qualified XML schema
http://www.loc.gov/standards/mets/mets.xsd
Explicit structural metadata; containers for externally-defined descriptive and administrative metadata (“extension schemas”)
External pointers using XLink; internal links via ID/IDREF
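
To make the two linking mechanisms concrete, the following sketch builds a very small METS document with Python's standard library. The element and attribute names are METS's own, but the ID values, the file path, and the `build_mets` helper are hypothetical, and the result is not validated against the schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace: str, name: str) -> str:
    """Return a namespace-qualified name in ElementTree's {uri}name form."""
    return f"{{{namespace}}}{name}"


def build_mets() -> ET.Element:
    mets = ET.Element(q(METS_NS, "mets"))

    # Container for externally defined descriptive metadata (left empty here).
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "OTHER"})
    ET.SubElement(wrap, q(METS_NS, "xmlData"))

    # External pointer: FLocat refers to the content file through XLink.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    file_el = ET.SubElement(
        group, q(METS_NS, "file"), {"ID": "FILE1", "MIMETYPE": "text/xml"}
    )
    ET.SubElement(
        file_el,
        q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): "item1/article.xml"},
    )

    # Internal links: DMDID and FILEID are IDREFs to the IDs declared above.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(
        struct_map, q(METS_NS, "div"), {"TYPE": "item", "DMDID": "DMD1"}
    )
    ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": "FILE1"})
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_mets(), encoding="unicode"))
```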


Why METS?

Why not use RDF, Topic Map (ISO/IEC 13250), or a custom schema?
METS is designed specifically for library-like digital objects
Appropriate technology with pre-defined semantics
Community support


METS Structure

SIP Hierarchical Structure

SIP Metadata

DMD based on MODS (or MODS-like)
Where does technical metadata come from?
Item-level rights metadata overrides issue-level metadata
Provenance and source metadata not used
Structural metadata explicitly defined by SIP specification
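
The rights override rule can be sketched in a few lines. This assumes the override is wholesale (an item that carries its own rights statement ignores the issue-level one entirely); the slides do not say whether the override is wholesale or field by field, and the `effective_rights` name and dictionary shape are hypothetical.

```python
from typing import Dict, Optional

Rights = Dict[str, str]


def effective_rights(issue_rights: Rights, item_rights: Optional[Rights] = None) -> Rights:
    """Return the item's own rights when present, else inherit the issue's."""
    # Assumption: override is all-or-nothing, not merged per field.
    return item_rights if item_rights is not None else issue_rights
```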


SIP Details

File and internal ID naming conventions designed to make SIP self-documenting
Non-ASCII Unicode entered as XML numeric entities; non-Unicode as character entities
SIP is aggregated and compressed into a JAR file for submission
Full SIP specification (Version 1.0 DRAFT)
http://www.diglib.org/preserve/harvardsip10.pdf
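
Two of the details above can be sketched briefly: writing non-ASCII Unicode as XML numeric entities, and aggregating the SIP into a compressed archive. The function names and the file name used in the tests are hypothetical. A JAR file is a ZIP archive that may carry an optional manifest; this sketch omits the manifest, which the full SIP specification may or may not require.

```python
import io
import zipfile
from typing import Dict


def to_ascii_xml(text: str) -> str:
    """Replace each non-ASCII character with an XML numeric character reference.

    Markup characters such as & and < are left untouched.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def package_sip(files: Dict[str, bytes]) -> bytes:
    """Aggregate and compress SIP files into ZIP (JAR-compatible) bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        # Sort names so the archive layout is reproducible.
        for name in sorted(files):
            jar.writestr(name, files[name])
    return buffer.getvalue()
```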


METS Java Toolkit

Procedural construction and parsing of METS files
http://hul.harvard.edu/mets/
Java API
Local and global validation
Marshal/unmarshal
Serialize in-memory representation to file
De-serialize from file to in-memory representation
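
The toolkit's Java API is not reproduced here. As a language-neutral illustration of what parsing plus validation involves, the Python sketch below de-serializes a METS document into memory and reports internal references that do not resolve. Treating ID/IDREF resolution as part of validation is an assumption for this example, since the slides do not define local and global validation, and `unresolved_references` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

# METS attributes that point at an ID elsewhere in the same file.
# FILEID holds a single reference; DMDID and ADMID may hold several,
# separated by whitespace.
IDREF_ATTRIBUTES = ("FILEID", "DMDID", "ADMID")


def unresolved_references(mets_xml: str) -> List[str]:
    """Return every referenced ID that no element in the document declares."""
    root = ET.fromstring(mets_xml)  # de-serialize to an in-memory tree
    known_ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attribute in IDREF_ATTRIBUTES:
            for reference in el.get(attribute, "").split():
                if reference not in known_ids:
                    missing.append(reference)
    return missing


SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec><fileGrp><file ID="FILE1"/></fileGrp></fileSec>
  <structMap><div DMDID="DMD1"><fptr FILEID="FILE1"/></div></structMap>
</mets>"""

if __name__ == "__main__":
    # DMD1 is referenced by the div but never declared in SAMPLE.
    print(unresolved_references(SAMPLE))
```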


Toolkit Implementation

Generic API
Can be sub-classed for application-specific behavior
Based on Sun’s JAXB specification
http://java.sun.com/xml/jaxb/
Uses Jim Clark’s XP parser
http://jclark.com/xml/xp/


Procedural Construction

METS Automation Tools

For depositors:
Construction of partial METS files
Still need to add specific metadata
Pre-deposit validation
API integration into existing content management and production systems
For the archive:
Parsing and validation during ingest
Ingest conversion of SIP to AIP
Access conversion of AIP to DIP


Are These Standards Helpful?

OAIS
Common vocabulary for inter-project and inter-disciplinary conversation
Conceptual mapping between heterogeneous systems
METS
Appropriate technology
Not a universal panacea, but sufficient for library-like resources and processes
Community support


Questions?

Mellon E-Journal Archiving Project
http://www.diglib.org/preserve/ejp.htm
http://www.diglib.org/preserve/harvardsip10.pdf
OAIS
http://ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
METS
http://www.loc.gov/standards/mets/
http://hul.harvard.edu/mets/
stephen_abrams@harvard.edu