Proposed Guidelines for Extending the Use of Dublin Core

[The following guidelines were prepared by a working group that assembled following the RLG metadata summit. They were intended to build on and supplement the implementation and guidelines developed through the series of Dublin Core discussions and meetings, and were submitted to the larger Dublin Core community at the DC5 meeting in Helsinki in early October 1997.]

Introduction

Many institutions and organizations provide access to a variety of information resources, including:

  • Prints and photographs
  • Built environment
  • Art and museum objects
  • Books, journals, and other publications
  • Images of any of the above
  • Journal citation indexes
  • OPACs or other library catalogs
  • Descriptions of art and museum objects
  • Full texts of documents
  • Finding aids for archival collections
  • High-level descriptions of databases
  • Descriptions of collections or resources
  • Web pages presenting items from collections
  • Web pages about institutions or organizations

Some of this information is Web-based (embodied in HTML pages) and some is not. Some is accessible via the Web, but does not itself reside in static Web pages. Although there have been repeated attempts to ensure that the Dublin Core does not exclude use for any of these types of resources, especially nonelectronic resources, previous Dublin Core work addresses, for the most part, Web-based information.

This application is intended for library, archive, and museum communities concerned with making accessible all kinds of information. A vast amount of the information in their care is in analog form and, while the original may be reformatted to create a digital surrogate, in many cases, there is no networkable form.

These communities have in place effective standards and proven practices for providing description of and access to various types of resources. It is not likely that they will stop using these proven approaches. It is also not likely that those employing mature practices developed for specific researchers or specific resources will be content to simplify their data for a lowest common denominator approach.

Because the Dublin Core elements are the product of the consensus of a vast array of representatives from many disciplines, nations, and communities and because they are accepted as a good approach to helping researchers find information, they should be useful in many ways. Two minor changes in element definitions (DATE and PUBLISHER) were proposed to the larger Dublin Core group and were accepted. Those changes make it possible to describe original materials from which surrogates have been derived.

The issue at hand is: Can the Dublin Core elements be used to integrate access to a broad range of resources? Can a simple set of descriptive elements help researchers discover resources that they then can investigate using the appropriate indigenous tools and methods? Can the Dublin Core be used to bring together the separate worlds of finding things on the Web and finding things in traditional ways? The guidelines for generic application of the Dublin Core address these questions.

The Research Libraries Group (RLG) hosted a "metadata summit" to assemble individuals from a variety of organizations involved in describing information resources to begin answering those questions. A subset of the summit attendees volunteered to serve on a working group to draft recommendations for extending the use of the Dublin Core elements. The group attempted to accommodate the needs of libraries, archives, museums, and information providers, such as RLG, in creating general principles that open doors to other applications. This document emerged from that process and was endorsed by most of the members. These individual communities will of course wish to investigate further, demonstrate implementation, and make more detailed guidelines to suit the special requirements of the indigenous resources and users.

Guidelines for extending the use of Dublin Core elements to create a generic application integrating all kinds of information resources

I. Extending the Dublin Core beyond Web-based resources

For those in libraries, museums, and archives hoping to use the Dublin Core elements to describe a variety of information resources—Internet-accessible and not—the implementation will be one of two main approaches:

A) Use the elements only to create indexes that may be accessed via the Web, though they are not "on the Web." This is the lingua franca use, wherein syntax is largely irrelevant. A brief discussion of this use follows.

B) Create Web pages that describe non-Web resources, either with a one-to-one relationship, or perhaps more likely, a one-to-many relationship. The bulk of this proposal focuses on the second approach.

I.A. Lingua franca use of the Dublin Core Elements

It is likely that non-Web-based resources will not be described "using" the Dublin Core elements, but rather that their native descriptions would be mapped conceptually to the Dublin Core. These mappings from one approach to another are often referred to as "crosswalks."

One could use a variety of crosswalks to develop a system that provides seemingly seamless, iterative searching of related indexes of different types of data (OPACs, HTML pages, SGML-encoded full texts, etc.).

Alternatively, one could use the crosswalks to create common indexes of disparate information, e.g., for the Dublin Core index called IDENTIFIER, include the contents of the USMARC fields 010 (LC Control Number), 020 (ISBN), 022 (ISSN), 024 (Other Standard Identifier), and 856$u (Uniform Resource Locator), together with the corresponding EAD and CIMI tags, and so on.
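As a rough illustration of such a common index, the following Python sketch maps the USMARC fields named above to the IDENTIFIER element. The record layout (a list of tag, subfield, value triples), the choice of subfield $a for fields 010 through 024, the function name, and the sample values are all assumptions made for this example, not part of any existing crosswalk.

```python
# Hypothetical crosswalk table: (USMARC tag, subfield) -> Dublin Core element.
# The field list follows the text; subfield "a" for 010-024 is an assumption.
MARC_TO_DC = {
    ("010", "a"): "IDENTIFIER",  # LC Control Number
    ("020", "a"): "IDENTIFIER",  # ISBN
    ("022", "a"): "IDENTIFIER",  # ISSN
    ("024", "a"): "IDENTIFIER",  # Other Standard Identifier
    ("856", "u"): "IDENTIFIER",  # Uniform Resource Locator
}


def marc_to_dc_index(fields):
    """Collect Dublin Core index entries from (tag, subfield, value) triples."""
    index = {}
    for tag, subfield, value in fields:
        element = MARC_TO_DC.get((tag, subfield))
        if element is not None:  # unmapped fields stay in the native record only
            index.setdefault(element, []).append(value)
    return index


# Made-up sample record, for illustration only.
sample_record = [
    ("020", "a", "0000000000"),
    ("245", "a", "An example title"),  # not mapped in this sketch
    ("856", "u", "http://example.org/item"),
]
dc_index = marc_to_dc_index(sample_record)
```

Fields without a mapping are simply skipped, which reflects the point that the native record remains the fuller description.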

Another approach would be to use the crosswalks to convert all the different data formats to uniform records using just the Dublin Core elements. But in many cases, the native records will be more extensive and one would not want to lose that information, so the Dublin Core record might "point to" the fuller record, just as embedded Dublin Core data in HTML META tags points to the fuller information in the body of the Web page itself.

Proposal: There should be one or more registries of crosswalks to and from other metadata approaches and the Dublin Core. The Library of Congress, OCLC, UKOLN, and perhaps others are beginning to assemble such registries.

I.B. Bringing non-Web resources to the Web via Dublin Core metadata

To go beyond the crosswalk approach, one might create Web pages for non-Web resources and assign Dublin Core HTML META tags (or use the RDF implementation), thereby virtually bringing all the non-Web resources into the world of the Web, within the reach of Web search indexes, and to the attention of those who search the Web. This may be done at the item-level, but more likely would be done at a group- or collection-level. These Web pages can be mounted by the owner or provider of the information and made available in a similar manner to that of their Web-based resources.
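A minimal sketch of how such a page header might be generated is shown here in Python. It assumes the common convention of naming META tags "DC." followed by the element name; the function name and the sample collection are hypothetical.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML META tags named DC.<element>."""
    lines = []
    for element, value in record:
        name = "DC." + element.lower()
        # Escape quotes and angle brackets so values cannot break the tag.
        lines.append('<meta name="%s" content="%s">'
                     % (escape(name, quote=True), escape(value, quote=True)))
    return "\n".join(lines)


# Made-up collection-level description of an off-line resource.
collection = [
    ("TITLE", "Sample collection of letters"),
    ("TYPE", "collection"),
    ("IDENTIFIER", "http://example.org/collections/letters"),
]
tags = dc_meta_tags(collection)
```

The resulting lines would be placed in the HEAD of the Web page that stands in for the collection.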

The remainder of these guidelines focuses on this approach and has as goals:

1. Taking advantage of the good work and consensus the Dublin Core represents to propose a new application.

2. Not overburdening the Dublin Core with metadata beyond that necessary to help researchers discover resources.

3. Ensuring the interoperability of the strategic Web application and this more generic application of the Dublin Core.

II. Basic principle and related definitions

In the strategic Web application of the Dublin Core elements, for the most part, it is assumed the documents are Internet-accessible original documents. The basic distinction between the strategic application and the generic application under discussion is that in the generic application it is important to indicate:

1. If a document is the original or not.
2. If a document is Internet-accessible or not.

The nature of the document is defined here as either:

"Original," which is used to mean the first manifestation, e.g., a journal published only on the Web or a painting in a museum.

"Surrogate," which is used to mean a version that stands for an original—not necessarily an exact reproduction. A copy on microfilm is an example of a surrogate. This use of the word includes lesser versions (e.g., thumbnail images) and reformatted versions (e.g., a digital audio version of an analog recording), but not a part of the whole (e.g., a detail from a photograph).

The means of accessing the document are defined from the Internet researcher point of view:

"On-line" is used to mean that the document (in original or surrogate form) is Internet-accessible (whether or not it's restricted or fee-based), e.g., a Web page, an image in a Web page, or a document on a gopher or FTP server.

"Off-line" is used to mean not Internet-accessible; neither the original nor a surrogate can be accessed over the Internet, e.g., a bound volume, a statue, or a document on a CD-ROM or in a non-networked local system.

There are four basic categories of resources:

1. On-line original—"published" in electronic form and available via the Internet, e.g., an ejournal, a Web page.

2. Off-line original—first manifestation that is not Internet-accessible, e.g., a bound volume, a painting, an artifact.

3. On-line surrogate—a version of the original that is Internet-accessible, e.g., a digital image, a digital version of a print publication.

4. Off-line surrogate—a version of the original that is not Internet-accessible, e.g., on a CD-ROM or a non-networked local system, or not in electronic form, like a photograph or a microfilm.

Categories #1 and #3 are addressed to some extent by the mainstream DC efforts. Categories #2 and #4 are the primary focus of these guidelines.
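The four categories follow from two yes-or-no questions (original or surrogate, on-line or off-line). The Python sketch below makes that explicit; the enum and function names are illustrative assumptions, and only the category numbering comes from the list above.

```python
from enum import Enum


class Category(Enum):
    """The four resource categories, numbered as in the list above."""
    ONLINE_ORIGINAL = 1
    OFFLINE_ORIGINAL = 2
    ONLINE_SURROGATE = 3
    OFFLINE_SURROGATE = 4


def categorize(is_original, is_online):
    """Map the two basic distinctions onto one of the four categories."""
    if is_original:
        return Category.ONLINE_ORIGINAL if is_online else Category.OFFLINE_ORIGINAL
    return Category.ONLINE_SURROGATE if is_online else Category.OFFLINE_SURROGATE


painting = categorize(is_original=True, is_online=False)    # a painting in a museum
microfilm = categorize(is_original=False, is_online=False)  # a copy on microfilm
```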

III. Operating guidelines and related proposals

1. Populate each of the Dublin Core elements with values that describe the intellectual content of the original resource.

Proposal: Accept the proposed element definition changes to loosen the definitions of the DATE and PUBLISHER elements, necessary to accommodate this generic use.

2. Additionally and optionally, provide information about a surrogate, if one is available. To indicate that a surrogate is available to the researcher, supply an additional set of elements with values that relate to the surrogate and provide information necessary to access it (CREATOR, PUBLISHER, DATE, TYPE, IDENTIFIER, etc., as appropriate).

Proposal: Devise a way to start and end separate sets of DC elements. While some systems may make use of the distinction between the separate sets, it is not a requirement for search engines to distinguish between them. The separation will be helpful to the creators and users of the metadata.

An alternative approach frequently brought up during discussions is to describe the surrogate and repeat elements describing the original as qualifiers to the source element (e.g., DC.source.title, DC.source.creator, etc.).

Proposal: Add appropriate terms to the registry of Resource Type values to describe surrogates and off-line resources either as an alternate list or by adding to an existing list. Crosswalks may accommodate most of these needs. [see also #8 below]
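The two ways of pairing an original with its surrogate discussed under guideline 2 can be compared in a short Python sketch. The data structures, function names, and sample values are assumptions for illustration; neither form is a settled syntax.

```python
# Made-up descriptions of an off-line original and its on-line surrogate.
original = [("TITLE", "Harbor at dusk"), ("CREATOR", "Example, Artist"), ("DATE", "1890")]
surrogate = [("TYPE", "image"), ("DATE", "1997"),
             ("IDENTIFIER", "http://example.org/harbor.jpg")]


def as_separate_sets(original, surrogate):
    """First approach: keep two clearly delimited sets of DC elements."""
    return {"original": list(original), "surrogate": list(surrogate)}


def as_source_qualifiers(original, surrogate):
    """Alternative: describe the surrogate and repeat the original's
    elements as qualifiers of the source element (DC.source.*)."""
    pairs = [("DC." + element.lower(), value) for element, value in surrogate]
    pairs += [("DC.source." + element.lower(), value) for element, value in original]
    return pairs


sets = as_separate_sets(original, surrogate)
qualified = as_source_qualifiers(original, surrogate)
```

In the first form a search engine may ignore the boundary and index every value; in the second, the qualifier itself carries the distinction between original and surrogate.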

3. Provide element values in a form users are likely to use in their queries. For instance, while it is possible to provide Dewey Decimal Classification values for the subject element value, most researchers will not use DDC numbers in their queries, so the use of DDC numbers assumes an unlikely feature of the indexing or the search engine.

4. Ensure that the values are meaningful without requiring interpretation of any qualifiers. Though it is imaginable that schemes and thesauri could be automated to expand, normalize, or refine queries, it is not immediately likely.

If there is a more detailed description of the document, refer to it rather than attempting to represent it in the Dublin Core metadata.

5. Some result listings will take the user to the pertinent Web page, while others might describe and point to environments that, although promising, might require more specialized knowledge on the part of the user to reveal their full potential. Provide information about non-Web resources in the SOURCE element. For instance, the SOURCE element of Dublin Core metadata describing an OPAC should point to the Z39.50 gateway, and the SOURCE element of Dublin Core metadata describing a journal not available on the Web should point to the publisher's Web page or supply the contact information for getting access to the journal.

Proposal: Ensure that the strategic application use of the Source element allows for this use.

Proposal: Encourage clear guidelines for the inclusion of URLs and other "pointers" in the appropriate elements, so that they can be taken advantage of in software and by users.

6. There are multiple levels of information that might be described using the Dublin Core elements:

- A description of a collection.
- A general description of the contents of the collection.
- A finding aid for the collection.
- A database of item-level descriptions.
- A description of an item in the collection (where one might extract Dublin Core values, but point to the fuller record).
- An item in the collection.
- An institutional Web site.
- A Web page that describes the institution's collections.
- A Web page describing a collection.
- A Web "publication" that includes items or surrogates.
- A Web page for an item from the collection.

If descriptions of information at various levels are queried, one receives a result set that might contain listings for specific Web pages, high-level Web sites, databases, and free-text "hits". In order to allow the results to be weighted accordingly, assign an appropriate Resource Type value.

Proposal: A number of Resource Type values should be suggested to cover levels of resources other than item-level (e.g., database, collection, group). These can be added to an existing registry or suggested as an alternate list. Crosswalks may accommodate most of these needs.
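One way a retrieval system might weight a mixed result set by Resource Type is sketched below in Python. The type values "database", "collection", and "group" come from the proposal above; the value "item", the numeric weights, and the choice to list item-level hits first are assumptions.

```python
# Hypothetical weights: lower numbers are listed first.
TYPE_WEIGHT = {"item": 0, "group": 1, "collection": 2, "database": 3}


def rank(results):
    """Order results by Resource Type; unknown types sort last.
    sorted() is stable, so ties keep their original order."""
    return sorted(results, key=lambda r: TYPE_WEIGHT.get(r["TYPE"], len(TYPE_WEIGHT)))


# Made-up result set mixing several levels of description.
results = [
    {"TITLE": "Catalog database", "TYPE": "database"},
    {"TITLE": "Letter, 1902", "TYPE": "item"},
    {"TITLE": "Sample family papers", "TYPE": "collection"},
]
ranked = rank(results)
```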

7. To indicate when one resource being described is contained by another (a description of a collection versus a description of an item from that collection), use the RELATION element.

Proposal: Ensure the RELATION use can accommodate this need.
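A minimal sketch of containment expressed through RELATION follows, in Python. It assumes that an item's RELATION value simply repeats the IDENTIFIER of the containing collection; the identifiers, titles, and function name are made up.

```python
# Made-up records: one collection and two descriptions that may belong to it.
records = [
    {"IDENTIFIER": "coll-1", "TYPE": "collection", "TITLE": "Sample family papers"},
    {"IDENTIFIER": "item-7", "TYPE": "item", "TITLE": "Letter, 1902", "RELATION": "coll-1"},
    {"IDENTIFIER": "item-9", "TYPE": "item", "TITLE": "Unrelated photograph"},
]


def members_of(collection_id, records):
    """Return the records whose RELATION points at the given collection."""
    return [r for r in records if r.get("RELATION") == collection_id]


contained = members_of("coll-1", records)
```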

8. To be useful for retrieval in the context of mainstream use of the DC (e.g., so searchers can easily filter out results that are not Internet-accessible), we may wish only to indicate exceptions, as follows:

a. Indicate when neither an original nor a surrogate is Internet-accessible.
b. Indicate when only a surrogate is being described.

Doing so will allow for a specific search for something that is:

a. On-line—the user can omit "records" with an "off-line" value.
b. Original—the user can omit "records" with a "surrogate" value.

Proposal: Provide guidance for these two instances and encourage retrieval systems to act upon them.
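The two filters described in guideline 8 could be implemented along the following lines in Python. The sketch assumes the exception markers "off-line" and "surrogate" are recorded as additional TYPE values; where they are actually recorded is left open by these guidelines, and the sample records are made up.

```python
def online_only(records):
    """Omit records flagged as having no Internet-accessible form."""
    return [r for r in records if "off-line" not in r.get("TYPE", [])]


def originals_only(records):
    """Omit records that describe only a surrogate."""
    return [r for r in records if "surrogate" not in r.get("TYPE", [])]


# Records without an exception marker are treated as on-line originals.
records = [
    {"TITLE": "Electronic journal issue", "TYPE": ["text"]},
    {"TITLE": "Thumbnail of a print", "TYPE": ["image", "surrogate"]},
    {"TITLE": "Bronze statue", "TYPE": ["physical object", "off-line"]},
]
online = online_only(records)
originals = originals_only(records)
```

Because only exceptions are marked, existing metadata for on-line originals needs no change to pass both filters.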

IV. Summary

These guidelines, as should be obvious to the reader, are conceptual and not implementable in their current form. The proposed actions, developments in the strategic Web application, the RDF implementation, and more detailed recommendations and demonstrations in each of the cultural heritage communities will move new applications forward. These guidelines and related proposals are intended to support additional applications of the Dublin Core with no detrimental impact on the strategic Web application. This generic application could be pursued in isolation from those discussing the strategic Web application, but if the two are coordinated, efforts will be maximized, tools developed for one can be used for the other, and (most importantly) we can help researchers find information regardless of its location or means of access.
