Core Elements Subgroup
Preservation Rights Metadata Use Cases
The Core Elements Subgroup is responsible for the following elements of the charge:
An implementable set of "core" preservation metadata elements, with broad applicability within the digital preservation community.
Come to consensus on what "core" means.
Compare preservation metadata element sets of various institutions, particularly in terms of whether there are "core" elements. (July-Dec.) This will be done by mapping metadata elements currently in use in implementations to those detailed in the OCLC/RLG Framework document (PDF:696K/54pp.).
While determining a core set of elements, consider also the aspect of what is implementable. This will involve establishing levels of elements (perhaps in terms of mandatory, recommended and optional) to support different levels of functionality. (Dec., Jan.)
A data dictionary to support the core preservation metadata element set (Jan.-Mar. 2004)
As part of the mapping exercise some of the data needed for a data dictionary is being included, such as mandatory vs. optional and data constraints.
The Core Elements Subgroup holds conference calls most weeks and has accomplished the following:
September 2004: The group spent time discussing the differences between files and bitstreams and how the semantic units applied to them. It was proposed that there was a need for a new level called "filestreams." This also related to previous discussions about embedded files. The group continued its discussion of environment elements and whether this information is dependent on file format information. It continued to define what information is needed about the environment in order to render objects for the long term. Two new participants joined the group, one from DSpace and another from the Walt Disney Company. A workplan was developed to finish the data dictionary by December in anticipation of a final PREMIS report by the end of 2004.
August 2004: The group had a face-to-face meeting in Cambridge, Massachusetts, during the first week of August and continued to work through the data dictionary. Participants revised the data model, particularly in terms of how the various entities related to each other. There was also discussion about preservation policy and business rules and how these relate to the data model. The group made considerable progress on the list of semantic units applying to all file formats. Much discussion centered around multiple layers of file formats, i.e. embedded content objects with multiple wrappers and what metadata is needed. Discussion throughout the month centered around many of the issues that arose in the meeting.
July 2004: The group revised and further discussed the data model showing entities and their relationships. There was some discussion about the rights and responsibilities inherent in preservation functions. The data model was sent out to the PREMIS Advisory Committee for comments, preferably before the August meeting.
Work continued on file format information elements and whether they apply both to files and bitstreams (the answer was yes). A few format experts have participated in the calls. One is Steve Abrams at Harvard, who is working on the Global Digital Format Registry, and another is Priscilla's colleague Andrea at FCLA.
Further discussion on hardware and software environments led to the consensus that information about these is necessary to render the object later. The group should look at the work of the PRONOM registry and its concept of viewpaths. A paper was drafted to provide a model that maps environment components to each level of object that has been defined (bitstream, file, representation) and relates each level to a preservation objective. The environment information does not necessarily need to be stored as metadata with each object, but may be associated with a class of objects. Additional discussion on this will take place when the committee member working with PRONOM is able to attend the meetings.
June 2004: The group decided to move to a weekly schedule (except for one week a month) in order to speed up the work. Also, to move things forward faster, the group agreed to meet in the Boston area in conjunction with the early August meeting of the Society of American Archivists.
Work on the elements for the agents entity was completed. Only agentIdentifier (with scheme and value) and agentName are included, since more detailed agent descriptions are being defined elsewhere and PREMIS does not wish to reinvent these.
The group continued to work on file format information and how it relates to profiles. The data dictionary needs to allow pointing to file format information in a registry. Information needed to render a format may be different from format properties; elements are needed for information about the environment from which a file came. Elements were added for FileFormatName (includes value and version) and fileFormatRegistry (includes registry name, key and role). File format registry information is repeatable, because there is a need to be able to indicate an entry in more than one registry. Profile information is not necessary as a separate element because it is part of file format information.
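The file format elements described above can be pictured as a small data structure. The following is only an illustrative sketch in Python, not part of the PREMIS data dictionary: the class names, field names, registry names and keys are all hypothetical, chosen to mirror the components listed in the text (format name with value and version; a repeatable registry entry with registry name, key and role).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FormatRegistryEntry:
    """One registry pointer: registry name, key and role (hypothetical field names)."""
    registry_name: str
    key: str
    role: Optional[str] = None


@dataclass
class FileFormat:
    """File format information: name (value and version) plus registry pointers."""
    name: str
    version: Optional[str] = None
    # Repeatable: the same format may be listed in more than one registry.
    registries: List[FormatRegistryEntry] = field(default_factory=list)

    def registry_keys(self) -> Dict[str, str]:
        """Return a mapping of registry name to the key used in that registry."""
        return {entry.registry_name: entry.key for entry in self.registries}


# Hypothetical example values; neither registry nor key is real.
tiff = FileFormat(
    name="TIFF",
    version="6.0",
    registries=[
        FormatRegistryEntry("ExampleRegistryA", "fmt-001", "specification"),
        FormatRegistryEntry("ExampleRegistryB", "tiff-6", "validation profile"),
    ],
)
```

Modelling the registry information as a list reflects the decision that it is repeatable, so a single format description can point to entries in several registries at once.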
May 2004: Work during the month included developing the agent entity and determining the minimal amount of agent information needed for preservation purposes. How agents relate to other entities (rights, events, objects) needs to be considered, and enough information must be recorded to document who did what and who authorized what. The level of detail in agent information may depend on a particular implementation, so the agent information defined in the data dictionary should be minimal and general. Agent role in events is important to consider. The group is working on developing the data model more fully; the resulting document will be a deliverable of the working group and will detail relationships between entities. Work on technical metadata regardless of file format continued, particularly concerning format identification.
April 2004: The group considered some example objects and how the current list of core data elements may be applied. In particular, it considered the Los Angeles Times text archive and Harvard's complex audio files. Conclusions were that those elements dealing with the objects themselves work fairly well, but conveying enough information about relationships between entities is problematic and needs further work. Work continued on the data dictionary, particularly on the objects and events entities. There was discussion about permanence levels mainly in terms of significant properties. Further discussion will be held after the group looks at technical metadata. The group considered strategies for working on technical metadata. Scope should be limited only to technical metadata that applies regardless of file format. Work on the Global Digital Format Registry is useful here, since we may be able to use an identifier to point to information about a file format, but we cannot assume the registry's existence because it is still in development.
March 2004: The group worked on templates to use for the data dictionary largely based on what was developed at the National Library of Australia. We will use slightly different templates for different entity types (i.e. objects, agents, events, relationships). There was a lot of discussion about which elements are recorded at which level of granularity—i.e., at the representation, file, or bitstream level. This is important information to convey for guidance in applying the element set. In addition the group would like institutions engaged in preservation activities to submit use cases for the kind of information needed in the preservation context. Nancy Hoebelheinrich at Harvard is developing a template and sample use case for this purpose. This will help the group to determine what information is needed to determine terms and conditions for objects in a preservation context.
February 2004: There was considerable discussion about the entity relationship diagram that the group was developing. It includes entities for objects, events, agents and rights statements. The group is further considering how to show relationships between the entities. As a result, the data dictionary is being reorganized to approach the data elements from an entity point of view. Small groups are revising each section and adding examples, clarifying definitions, etc. Participants are continuing to submit example objects to help clarify use of the element set. In addition the group has discussed the work at NLM on permanence ratings, and a small group was formed to analyze its impact on the core elements.
January 2004: The core elements group met in San Diego on Jan. 8 and made substantial progress on preparing a data dictionary for core metadata elements for Preservation Description Information. In defining core, the group agreed that the elements to be included had to be essential for a working archive to know because they satisfy certain functions (e.g. viability, renderability, understandability, authenticity, identity). The data dictionary will include the name of the semantic unit and its components, definition, obligation (required or not), data constraints, level of entity, repeatability, examples and notes. Members of the group submitted examples of element usage with different kinds of digital objects. After the meeting it was decided to reorganize the data dictionary according to an object type model, and work began on the data model. In addition, the group continued to discuss elements for rights statements related to preservation and the need to understand use cases.
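The fields planned for each data dictionary entry can be sketched as a record type. This is a hedged illustration in Python, not the group's actual template: the class and attribute names are hypothetical, and the sample entry's definition, obligation, constraints and repeatability are invented placeholders rather than decisions recorded in this report.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Obligation(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"


class Level(Enum):
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"


@dataclass
class DictionaryEntry:
    """One semantic unit, with the fields listed for the data dictionary."""
    name: str
    components: List[str]
    definition: str
    obligation: Obligation
    data_constraints: str
    levels: List[Level]          # level(s) of entity the unit applies to
    repeatable: bool
    examples: List[str] = field(default_factory=list)
    notes: str = ""

    def applies_to(self, level: Level) -> bool:
        """True if this semantic unit is recorded at the given level."""
        return level in self.levels


# Placeholder values for illustration only.
entry = DictionaryEntry(
    name="fileFormatName",
    components=["value", "version"],
    definition="Placeholder definition: the name of the file format.",
    obligation=Obligation.OPTIONAL,   # assumption, not stated in the report
    data_constraints="none",          # assumption
    levels=[Level.FILE, Level.BITSTREAM],
    repeatable=False,                 # assumption
    examples=["TIFF 6.0"],
)
```

Recording the applicable levels explicitly in each entry is one way to capture the guidance on granularity, that is, whether a unit is recorded at the representation, file, or bitstream level.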
December 2003: The group came to consensus on core elements for events and fixity information. The group will use a narrower definition of fixity information than OAIS, covering only validation of a document's integrity and whether it has been changed; OAIS treats fixity and authentication together. As a result of this discussion, the group decided to consider adding to its deliverables a document recording any departures from OAIS and a paper about broad guiding principles. Some discussion occurred about rights related to preservation and some of the work underway on developing use cases for rights statements and the draft rights extension schema for METS.
November 2003: The group came to consensus on core elements for relationships and continued its discussion of events. As a byproduct, the group again discussed a typology of entities for digital objects so that it will be clear at which level a given metadata element would apply. The group decided to have a face-to-face meeting in conjunction with ALA to make progress on core elements and the data dictionary on Jan. 9, 2004 in San Diego.
October 2003: The group had several discussions about relationships between digital objects in order to determine which core elements were needed to describe these relationships. In particular, discussions centered around types of relationships and how they apply to different levels of objects. Two very useful documents were produced as part of this work by members of the group. Other discussions centered around how events are treated in various implementations compared to their treatment in the OCLC/RLG framework document from the previous working group. By the end of the month, the group had begun discussion on most of the elements relating to preservation description information, although more discussion was needed in some areas.
September 2003: The group decided that using the spreadsheet for the element comparison would be useful. OCLC provided a second spreadsheet for the elements from Content Information. Element by element discussions began. Much attention was given to how implementations use identifiers and discussion began on relationships between objects.
August 2003: Experimentation began with a spreadsheet provided by OCLC mapping their metadata elements to the OCLC/RLG framework. The spreadsheet was revised, and others with implementations added their elements to it. Initially this was done only for Preservation Description Information.
July 2003: Discussions covered what "core" means, the need for a glossary so that we all use the same terminology, and the methodology for comparing element sets. The group began a glossary that was then given to the full PREMIS group for comment and further discussion.