An Archival Submission Information Package for E-Journals

E-Journal Archive

Mellon E-journal archive project
“Preserve significant intellectual content … independent of the form originally delivered”
Publisher, not subject based
Multiple external content suppliers
Initially dark content

Design Principles

Capture content at highest resolution, finest granularity, most abstract representation
Archiving work, not manifestation
Architecture based on OAIS
Acceptance by stakeholders is cost-sensitive

Find Efficiencies Wherever Possible!

What Is OAIS?

Open Archival Information System
“A common framework of terms and concepts … to provide long-term preservation of digital information”
A functional and information reference model

OAIS Functional Model

OAIS Information Model

Archival SIP

Unit of submission is the e-journal issue
Modeled at two levels: issue and item
Explicit separation of content and metadata
METS used for metadata
XML DTD for item content under development by Harvard and NLM, based on PMC2 DTD

Normative Data Formats

Necessary for internal data homogeneity
Single format for each content category
Text is XML, raster still image is TIFF, ...
Chosen for standardization, maturity, viability, robust tool support, and creation upstream in the production process
Lower level specification than MIME type
Bi-tonal TIFF with Group IV compression
Non-normative formats transformed on ingest
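
The "single format for each content category" rule can be sketched as a lookup followed by a transform-or-accept decision. This is a minimal illustration only: the `ContentCategory` and `NormativeFormats` names are hypothetical, and only the two categories named above (text as XML, raster still image as TIFF) are filled in.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical content categories; the slides name only these two explicitly.
enum ContentCategory { TEXT, RASTER_STILL_IMAGE }

class NormativeFormats {
    // One normative format per content category.
    private static final Map<ContentCategory, String> NORMATIVE =
            new EnumMap<>(ContentCategory.class);
    static {
        NORMATIVE.put(ContentCategory.TEXT, "XML");
        NORMATIVE.put(ContentCategory.RASTER_STILL_IMAGE, "TIFF");
    }

    static String normativeFor(ContentCategory category) {
        return NORMATIVE.get(category);
    }

    // A submitted file needs transformation on ingest when its format
    // differs from the normative format of its category.
    static boolean needsTransformation(ContentCategory category, String submittedFormat) {
        return !normativeFor(category).equalsIgnoreCase(submittedFormat);
    }
}
```

A real implementation would compare at a finer level than a format name (for example, bi-tonal TIFF with Group IV compression rather than just "TIFF"), since the normative specification is lower level than a MIME type.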

Format Registry

Version history
Authoritative specification / maintenance org.
Identity characterization
MIME type, magic number, internal syntax
Application specific profile
Technical metadata schema
Compliant tools
Community-wide resource and responsibility
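
As an illustration of identity characterization by magic number, the sketch below tests the four-byte TIFF signature: "II" followed by 42 in little-endian order, or "MM" followed by 42 in big-endian order. The `FormatSniffer` class name is an assumption made for this example and is not part of any registry described here.

```java
class FormatSniffer {
    // Returns true when the first four bytes carry a TIFF signature:
    // 0x49 0x49 0x2A 0x00 (little-endian) or 0x4D 0x4D 0x00 0x2A (big-endian).
    static boolean looksLikeTiff(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        boolean littleEndian = header[0] == 0x49 && header[1] == 0x49
                && header[2] == 0x2A && header[3] == 0x00;
        boolean bigEndian = header[0] == 0x4D && header[1] == 0x4D
                && header[2] == 0x00 && header[3] == 0x2A;
        return littleEndian || bigEndian;
    }
}
```

A magic number only establishes the broad format. Confirming an application-specific profile, such as bi-tonal with Group IV compression, would require inspecting the internal syntax as well.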

SIP Directory Structure

What Is METS?

Metadata Encoding & Transmission Standard
“A standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library”
DLF funded; maintained at LC MARC Standards Office

METS Schema

Namespace-qualified XML schema
Explicit structural metadata; containers for externally-defined descriptive and administrative metadata (“extension schemas”)
External pointers using XLink; internal links via ID/IDREF
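
To make the two linking mechanisms concrete, the following sketch builds a skeletal METS document with the standard Java DOM API: a file entry whose FLocat points outward via xlink:href, and a structMap fptr that points inward to the file's ID via FILEID. The `MetsSkeleton` class, the `item` div type, and any file ID or path passed in are assumptions for illustration. The result is not a complete or schema-validated METS instance.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MetsSkeleton {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Builds mets > fileSec > fileGrp > file > FLocat, and
    // mets > structMap > div > fptr referring back to the file.
    static Document build(String fileId, String href) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element mets = doc.createElementNS(METS_NS, "mets:mets");
            doc.appendChild(mets);

            Element file = child(doc, child(doc, child(doc, mets, "fileSec"), "fileGrp"), "file");
            file.setAttribute("ID", fileId);
            Element flocat = child(doc, file, "FLocat");
            flocat.setAttribute("LOCTYPE", "URL");
            // External pointer: XLink href to the content file.
            flocat.setAttributeNS(XLINK_NS, "xlink:href", href);

            Element div = child(doc, child(doc, mets, "structMap"), "div");
            div.setAttribute("TYPE", "item"); // hypothetical div type
            // Internal link: IDREF-style reference to the file's ID.
            child(doc, div, "fptr").setAttribute("FILEID", fileId);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element child(Document doc, Element parent, String localName) {
        Element element = doc.createElementNS(METS_NS, "mets:" + localName);
        parent.appendChild(element);
        return element;
    }

    // Serializes the in-memory document to an indented XML string.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `MetsSkeleton.serialize(MetsSkeleton.build("FID-item001", "item001/content.xml"))` would yield the XML text; both argument values are made-up examples.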


Why not use RDF, Topic Map (ISO/IEC 13250), or a custom schema?
METS is designed specifically for library-like digital objects
Appropriate technology with pre-defined semantics
Community support

METS Structure

SIP Hierarchical Structure

SIP Metadata

DMD based on MODS (or MODS-like)
Where does technical metadata come from?
Item-level rights metadata overrides issue-level metadata
Provenance and source metadata not used
Structural metadata explicitly defined by SIP specification
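
The override rule for rights metadata amounts to a simple precedence check, sketched below. The `RightsResolver` name and the use of plain strings for rights statements are assumptions; the SIP specification may represent rights quite differently.

```java
class RightsResolver {
    // Item-level rights, when present, take precedence over issue-level rights;
    // otherwise the item inherits the issue-level statement.
    static String effectiveRights(String itemRights, String issueRights) {
        boolean hasItemRights = itemRights != null && !itemRights.isEmpty();
        return hasItemRights ? itemRights : issueRights;
    }
}
```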

SIP Details

File and internal ID naming conventions designed to make SIP self-documenting
Non-ASCII Unicode entered as XML numeric entities; non-Unicode as character entities
SIP is aggregated and compressed into a JAR file for submission
Full SIP specification (Version 1.0 DRAFT)
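
Two of the details above lend themselves to a short sketch: writing non-ASCII Unicode as XML numeric character references, and aggregating SIP files into a JAR with the standard `java.util.jar` classes. The `SipPackager` class and any file names are hypothetical, the hexadecimal form of the reference is one of two equivalent choices, and the character-entity handling for non-Unicode characters is not covered.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

class SipPackager {
    // Replaces every non-ASCII code point with a hexadecimal numeric
    // character reference. Markup characters such as & and < are left alone.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return out.toString();
    }

    // Aggregates and compresses the given files (path -> ASCII content)
    // into an in-memory JAR, in the map's iteration order.
    static byte[] toJar(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                jar.putNextEntry(new JarEntry(file.getKey()));
                jar.write(file.getValue().getBytes(StandardCharsets.US_ASCII));
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Lists entry names in a JAR, e.g. for a pre-deposit sanity check.
    static List<String> entryNames(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (JarEntry entry = in.getNextJarEntry(); entry != null; entry = in.getNextJarEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Because entity-escaped content is pure ASCII, the packaged files are insensitive to character-encoding mismatches between depositor and archive.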

METS Java Toolkit

Procedural construction and parsing of METS files
Java API
Local and global validation
Serialize in-memory representation to file
De-serialize from file to in-memory representation
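
The toolkit's own API is not shown in these slides, so the sketch below uses only the standard Java DOM API to illustrate the general idea: de-serialize METS XML into an in-memory representation, then run one kind of validation check, namely that every fptr FILEID resolves to a declared file ID. The `MetsLinkCheck` class and its method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MetsLinkCheck {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // De-serializes METS XML text into an in-memory DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse METS XML", e);
        }
    }

    // Returns the FILEID values of fptr elements that do not match
    // the ID of any file element in the document.
    static List<String> danglingFileIds(Document doc) {
        Set<String> declared = new HashSet<>();
        NodeList files = doc.getElementsByTagNameNS(METS_NS, "file");
        for (int i = 0; i < files.getLength(); i++) {
            declared.add(((Element) files.item(i)).getAttribute("ID"));
        }
        List<String> dangling = new ArrayList<>();
        NodeList pointers = doc.getElementsByTagNameNS(METS_NS, "fptr");
        for (int i = 0; i < pointers.getLength(); i++) {
            String ref = ((Element) pointers.item(i)).getAttribute("FILEID");
            if (!declared.contains(ref)) {
                dangling.add(ref);
            }
        }
        return dangling;
    }
}
```

An empty result from `danglingFileIds` means all internal file pointers resolve. Full schema validation would be a separate step.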

Toolkit Implementation

Generic API
Can be sub-classed for application-specific behavior
Based on Sun’s JAXB specification
Uses Jim Clark’s XP parser

Procedural Construction

METS Automation Tools

For depositors:
Construction of partial METS files
Still need to add specific metadata
Pre-deposit validation
API integration into existing content management and production systems
For the archive:
Parsing and validation during ingest
Ingest conversion of SIP to AIP
Access conversion of AIP to DIP

Are These Standards Helpful?

Common vocabulary for inter-project and inter-disciplinary conversation
Conceptual mapping between heterogeneous systems
Appropriate technology
Not a universal panacea; but sufficient for library-like resources and processes
Community support


Mellon E-Journal Archiving Project