Proposed Guidelines for Extending the Use of Dublin Core
[The following guidelines were prepared by a working
group that assembled following the RLG metadata summit. They were
intended to build on and supplement the implementation and guidelines
developed through the series of Dublin Core discussions and meetings,
and were submitted to the larger Dublin Core community at the DC5
meeting in Helsinki in early October 1997.]
Introduction
Many institutions and organizations provide access to a
variety of information resources, including:
- Prints and photographs
- Built environment
- Art and museum objects
- Books, journals, and other publications
- Images of any of the above
- Journal citation indexes
- OPACs or other library catalogs
- Descriptions of art and museum objects
- Full texts of documents
- Finding aids for archival collections
- High-level descriptions of databases
- Descriptions of collections or resources
- Web pages presenting items from collections
- Web pages about institutions or organizations
Some of this information is Web-based (embodied in HTML
pages) and some is not. Some is accessible via the Web, but does not
itself reside in static Web pages. Although there have been repeated
attempts to ensure that the Dublin Core can be used for all of
these types of resources, especially nonelectronic resources, previous
Dublin Core work addresses, for the most part, Web-based information.
This application is intended for library, archive, and
museum communities concerned with making accessible all kinds of
information. A vast amount of the information in their care is in
analog form and, while the original may be reformatted to create a
digital surrogate, in many cases, there is no networkable form.
These communities have in place effective standards and
proven practices for providing description of and access to various
types of resources. It is not likely that they will stop using these
proven approaches. It is also not likely that those employing mature
practices developed for specific researchers or specific resources will
be content to simplify their data for a lowest common denominator
approach.
Because the Dublin Core elements are the product of the
consensus of a vast array of representatives from many disciplines,
nations, and communities and because they are accepted as a good
approach to helping researchers find information, they should be useful
in many ways. Two minor changes in element definitions (DATE and
PUBLISHER) were proposed to the larger Dublin Core group and were
accepted. Those changes make it possible to describe original materials
from which surrogates have been derived.
The issue at hand is: Can the Dublin Core elements be
used to integrate access to a broad range of resources? Can a simple
set of descriptive elements help researchers discover resources that
they then can investigate using the appropriate indigenous tools and
methods? Can the Dublin Core be used to bring together the separate
worlds of finding things on the Web and finding things in traditional
ways? The guidelines for generic application of the Dublin Core address
these questions.
The Research Libraries Group (RLG) hosted a "metadata
summit" to assemble individuals from a variety of organizations
involved in describing information resources to begin answering those
questions. A subset of the summit attendees volunteered to serve on a
working group to draft recommendations for extending the use of the
Dublin Core elements. The group attempted to accommodate the needs of
libraries, archives, museums, and information providers such as RLG, in
creating general principles that open doors to other applications. This
document emerged from that process and was endorsed by most of the
members. These individual communities will of course wish to
investigate further, demonstrate implementation, and make more detailed
guidelines to suit the special requirements of the indigenous resources
and users.
Guidelines for extending the use of Dublin Core elements to create a generic application integrating all kinds of information resources
I. Extending the Dublin Core beyond Web-based resources
For those in libraries, museums, and archives hoping to
use the Dublin Core elements to describe a variety of information
resources—Internet-accessible and not—the
implementation will be one of two main approaches:
A) Use the elements only to create indexes that may be
accessed via the Web, though they are not "on the Web." This is the
lingua franca use, wherein syntax is largely irrelevant. A brief
discussion of this use follows.
B) Create Web pages that describe non-Web resources,
either with a one-to-one relationship, or perhaps more likely, a
one-to-many relationship. The bulk of this proposal focuses on the
second approach.
I.A. Lingua franca use of the Dublin Core Elements
It is likely that non-Web-based resources will not be
described "using" the Dublin Core elements, but rather that their
native descriptions would be mapped conceptually to the Dublin Core.
These mappings from one approach to another are often referred to as
"crosswalks."
One could use a variety of crosswalks to develop a
system that provides seemingly seamless, iterative searching of related
indexes of different types of data (OPACs, HTML pages, SGML-encoded
full texts, etc.).
Alternatively, one could use the crosswalks to create
common indexes of disparate information, e.g., for the Dublin Core
index called IDENTIFIER, include the contents of the USMARC fields 010
(LC Control Number), 020 (ISBN), 022 (ISSN), 024 (Other Standard
Identifier), 856$u (Uniform Resource Locator), and the EAD tag <EADID>,
and the CIMI tag .... and so on.
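The common-index idea can be sketched in a few lines. This is only an illustration under stated assumptions: the USMARC and EAD tags are the ones named above, while the record layout (a dict mapping a field tag to a list of values), the function names, and the sample records are hypothetical.

```python
# Hypothetical crosswalk: native field tags whose contents feed the
# Dublin Core IDENTIFIER index. The USMARC and EAD tags are those named
# in the text; everything else here is assumed for illustration.
IDENTIFIER_CROSSWALK = {
    "usmarc": ["010", "020", "022", "024", "856$u"],
    "ead": ["EADID"],
}


def identifier_values(record_format, record):
    """Collect every value that belongs in the IDENTIFIER index.

    `record` is assumed to be a dict mapping a field tag to a list of values.
    """
    values = []
    for tag in IDENTIFIER_CROSSWALK.get(record_format, []):
        values.extend(record.get(tag, []))
    return values


def build_identifier_index(records):
    """Map each identifier value to the ids of the records that carry it."""
    index = {}
    for record_id, record_format, record in records:
        for value in identifier_values(record_format, record):
            index.setdefault(value, []).append(record_id)
    return index


# Made-up sample records; field 245 (title) is not an identifier and is ignored.
records = [
    ("book-1", "usmarc", {"020": ["0123456789"], "245": ["A title"]}),
    ("aid-1", "ead", {"EADID": ["example-finding-aid"]}),
]
index = build_identifier_index(records)
```

A search against `index` then finds the book and the finding aid through one IDENTIFIER lookup, regardless of their native formats.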
Another approach would be to use the crosswalks to
convert all the different data formats to uniform records using just
the Dublin Core elements. But in many cases, the native records will be
more extensive and one would not want to lose that information, so the
Dublin Core record might "point to" the fuller record, just as embedded
Dublin Core data in HTML META tags points to the fuller information in
the body of the Web page itself.
Proposal: There should be
one or more registries of crosswalks to and from other metadata
approaches and the Dublin Core. The Library of Congress, OCLC, UKOLN,
and perhaps others are beginning to assemble such registries.
I.B. Bringing non-Web resources to the Web via Dublin Core metadata
To go beyond the crosswalk approach, one might create
Web pages for non-Web resources and assign Dublin Core HTML META tags
(or use the RDF implementation), thereby virtually bringing all the
non-Web resources into the world of the Web, within the reach of Web
search indexes, and to the attention of those who search the Web. This
may be done at the item-level, but more likely would be done at a
group- or collection-level. These Web pages can be mounted by the owner
or provider of the information and made available in a similar manner
to that of their Web-based resources.
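As a rough sketch of what such a page might carry, the following renders a set of element values as HTML META tags. The "DC." name prefix follows the DC.source.title style used later in these guidelines; the exact tag syntax, the function name, and the sample collection are assumptions, not a prescribed encoding.

```python
from html import escape


def dc_meta_tags(elements):
    """Render (element, value) pairs as HTML META tags.

    The tag syntax shown here is an assumption for illustration.
    """
    lines = []
    for name, value in elements:
        lines.append(
            '<META NAME="DC.{}" CONTENT="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


# Made-up collection-level description of an off-line photograph collection.
collection = [
    ("title", "Example Family Photograph Collection"),
    ("type", "collection"),
    ("subject", "Portraits & landscapes"),
]
print(dc_meta_tags(collection))
```

The values are escaped so that characters such as "&" or quotation marks in a title do not break the surrounding HTML.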
The remainder of these guidelines focuses on this
approach and has as goals:
1. Taking advantage of the good work and consensus the
Dublin Core represents to propose a new application.
2. Not overburdening the Dublin Core with metadata
beyond that necessary to help researchers discover resources.
3. Ensuring the interoperability of the strategic Web
application and this more generic application of the Dublin Core.
II. Basic principle and related definitions
In the strategic Web application of the Dublin Core
elements, for the most part, it is assumed the documents are
Internet-accessible original documents. The basic distinction between
the strategic application and the generic application under discussion
is that in the generic application it is important to indicate:
1. If a document is the original or not.
2. If a document is Internet-accessible or not.
The nature of the document is defined here as either:
"Original," which is used to mean the first
manifestation, e.g., a journal published only on the Web or a painting
in a museum.
"Surrogate," which is used to mean a version that stands
for an original—not necessarily an exact reproduction. A copy
on microfilm is an example of a surrogate. This use of the word
includes lesser versions (e.g., thumbnail images) and reformatted
versions (e.g., a digital audio version of an analog recording), but
not a part of the whole (e.g., a detail from a photograph).
The means of accessing the document are defined from the
Internet researcher's point of view:
"On-line" is used to mean that the document (in original
or surrogate form) is Internet-accessible (whether or not it's
restricted or fee-based), e.g., a Web page, an image in a Web page, or
a document on a gopher or FTP server.
"Off-line" is used to mean not Internet-accessible;
neither the original nor a surrogate can be accessed over the Internet,
e.g., a bound volume, a statue, or a document on a CD-ROM or in a
non-networked local system.
There are four basic categories of resources:
1. On-line original—"published" in electronic
form and available via the Internet, e.g., an ejournal, a Web page.
2. Off-line original—first manifestation that
is not Internet-accessible, e.g., a bound volume, a painting, an
artifact.
3. On-line surrogate—a version of the original
that is Internet-accessible, e.g., a digital image, a digital version
of a print publication.
4. Off-line surrogate—a version of the
original that is not Internet-accessible, e.g., on a CD-ROM or in a
non-networked local system, or not in electronic form,
like a photograph or a microfilm.
Categories #1 and #3 are addressed to some extent by the
mainstream DC efforts. Categories #2 and #4 are the primary focus of
these guidelines.
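The two distinctions combine into the four categories as follows. This is a minimal sketch; the function names and the boolean parameters are illustrative assumptions.

```python
def resource_category(is_original, is_online):
    """Return the category number (1-4) from the list above."""
    if is_original:
        return 1 if is_online else 2
    return 3 if is_online else 4


def in_primary_focus(is_original, is_online):
    """Categories 2 and 4 (off-line resources) are the primary focus."""
    return resource_category(is_original, is_online) in (2, 4)


# A painting in a museum: an original that is not Internet-accessible.
painting = resource_category(is_original=True, is_online=False)
# A copy on microfilm: a surrogate that is not Internet-accessible.
microfilm = resource_category(is_original=False, is_online=False)
```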
III. Operating guidelines and related proposals
1. Populate each of the Dublin Core elements with
values that describe the intellectual content of the original resource.
Proposal: Accept the
proposed element definition changes to loosen the definitions of the
DATE and PUBLISHER elements, necessary to accommodate this generic use.
2. Additionally and optionally, provide information
about a surrogate, if one is available. To indicate that a surrogate is
available to the researcher, supply an additional set of elements with
values that relate to the surrogate and provide information necessary
to access it (CREATOR, PUBLISHER, DATE, TYPE, IDENTIFIER, etc., as
appropriate).
Proposal: Devise a way to mark the
start and end of separate sets of DC elements. While some systems may make
use of the distinction between the separate sets, it is not a
requirement for search engines to distinguish between them. The
separation will be helpful to the creators and users of the metadata.
An alternative approach frequently brought up during
discussions is to describe the surrogate and repeat elements describing
the original as qualifiers to the source element (e.g.,
DC.source.title, DC.source.creator, etc.).
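The two options can be compared side by side. In this sketch the "describes" key, the function names, and the sample values are hypothetical; only the DC.source.* naming comes from the text.

```python
def as_separate_sets(original, surrogate=None):
    """Keep the original and surrogate descriptions as distinct element sets."""
    sets = [{"describes": "original", "elements": dict(original)}]
    if surrogate:
        sets.append({"describes": "surrogate", "elements": dict(surrogate)})
    return sets


def as_source_qualifiers(original, surrogate):
    """Describe the surrogate and repeat the original under DC.source.*."""
    flat = {"DC.{}".format(name): value for name, value in surrogate.items()}
    for name, value in original.items():
        flat["DC.source.{}".format(name)] = value
    return flat


# Made-up example: a newspaper and a microfilm copy of it.
original = {"title": "Example Daily Gazette", "date": "1897"}
surrogate = {"type": "microfilm", "date": "1985"}

two_sets = as_separate_sets(original, surrogate)
qualified = as_source_qualifiers(original, surrogate)
```

With separate sets the description of the original stays primary and the surrogate is optional; with source qualifiers the surrogate becomes the primary description, and a system that ignores qualifiers would not see the original's values as such.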
Proposal: Add appropriate
terms to the registry of Resource Type values to describe surrogates
and off-line resources either as an alternate list or by adding to an
existing list. Crosswalks may accommodate most of these needs. [see
also #8 below]
3. Provide element values in a form users are likely to
use in their queries. For instance, while it is possible to provide
Dewey Decimal Classification values for the subject element value, most
researchers will not use DDC numbers in their queries, so the use of
DDC numbers assumes an unlikely feature of the indexing or the search
engine.
4. Ensure that the values are meaningful without
requiring interpretation of any qualifiers. Though it is imaginable
that schemes and thesauri could be automated to expand, normalize, or
refine queries, it is not immediately likely.
If there is a more detailed description of the document,
refer to it rather than attempting to represent it in the Dublin Core
metadata.
5. Some result listings will take the user to the
pertinent Web page, while others might describe and point to
environments which, although promising, might require more specialized
knowledge on the part of the user in order to reveal their full
potential. Provide information about non-Web resources in the SOURCE
element. For instance, the SOURCE element of Dublin Core metadata
describing an OPAC should point to the Z39.50 gateway, and the SOURCE
element of Dublin Core metadata describing a journal not available on
the Web should point to the publisher's Web page or supply the
contact information for getting access to the journal.
Proposal: Ensure that the
strategic application use of the Source element allows for this use.
Proposal: Encourage clear
guidelines for the inclusion of URLs and other "pointers" in the
appropriate elements, so that they can be taken advantage of in
software and by users.
6. There are multiple levels of information that might
be described using the Dublin Core elements:
- A description of a collection.
- A general description of the contents of the collection.
- A finding aid for the collection.
- A database of item-level descriptions.
- A description of an item in the collection (where one might extract
Dublin Core values, but point to the fuller record).
- An item in the collection.
- An institutional Web site.
- A Web page that describes the institution's collections.
- A Web page describing a collection.
- A Web "publication" that includes items or surrogates.
- A Web page for an item from the collection.
If descriptions of information at various levels are
queried, one receives a result set that might contain listings for
specific Web pages, high-level Web sites, databases, and free-text
"hits". In order to allow the results to be weighted accordingly,
assign an appropriate Resource Type value.
Proposal: A number of
Resource Type values should be suggested to cover levels of resources
other than item-level (e.g., database, collection, group). These can be
added to an existing registry or suggested as an alternate list.
Crosswalks may accommodate most of these needs.
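One way a retrieval system might act on Resource Type values is to order a mixed result set by level. The sketch below assumes each result is a dict with a "type" key; the weights and the "item" value are arbitrary illustrations, not proposed registry values.

```python
# Assumed weights: higher means listed first. Unknown types sort last.
TYPE_WEIGHTS = {"item": 3, "group": 2, "collection": 2, "database": 1}


def rank_results(results):
    """Order results by the weight of their Resource Type value.

    sorted() is stable, so results of equal weight keep their input order.
    """
    return sorted(
        results,
        key=lambda result: TYPE_WEIGHTS.get(result.get("type"), 0),
        reverse=True,
    )


# Made-up result set mixing levels of description.
results = [
    {"id": "a", "type": "database"},
    {"id": "b", "type": "item"},
    {"id": "c", "type": "collection"},
]
ranked_ids = [result["id"] for result in rank_results(results)]
```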
7. To indicate when one resource being described is
contained by another (a description of a collection versus a
description of an item from that collection), use the RELATION element.
Proposal: Ensure the
RELATION use can accommodate this need.
8. To be useful for retrieval in the context of
mainstream use of the DC (e.g., so searchers can easily filter out
results that are not Internet-accessible), we may wish only to indicate
exceptions, as follows:
a. Indicate when neither an original nor a surrogate is
Internet-accessible.
b. Indicate when only a surrogate is being described.
Doing so will allow for a specific search for something
that is:
a. On-line—the user can omit "records" with an
"off-line" value.
b. Original—the user can omit "records" with a "surrogate"
value.
Proposal: Provide guidance
for these two instances and encourage retrieval systems to act upon
them.
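A retrieval system could act on these two exceptions with simple filters. This sketch assumes each record carries a "flags" list holding "off-line" and/or "surrogate" only when the exception applies; where such values would actually be recorded is left open by these guidelines.

```python
def online_only(records):
    """Omit records flagged as off-line."""
    return [r for r in records if "off-line" not in r.get("flags", ())]


def originals_only(records):
    """Omit records flagged as describing only a surrogate."""
    return [r for r in records if "surrogate" not in r.get("flags", ())]


# Made-up records; an empty flag list means on-line and original.
records = [
    {"id": "web-journal", "flags": []},
    {"id": "bound-volume", "flags": ["off-line"]},
    {"id": "thumbnail", "flags": ["surrogate"]},
    {"id": "microfilm", "flags": ["off-line", "surrogate"]},
]
online_ids = [r["id"] for r in online_only(records)]
original_ids = [r["id"] for r in originals_only(records)]
```

Because only exceptions are flagged, records created for the mainstream Web application need no change: having no flag, they pass both filters.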
IV. Summary
These guidelines, as should be obvious to the reader,
are conceptual and not implementable in their current form. The
proposed actions, developments in the strategic Web application, the
RDF implementation, and more detailed recommendations and
demonstrations in each of the cultural heritage communities will move
new applications forward. These guidelines and related proposals are
intended to support additional applications of the Dublin Core with no
detrimental impact on the strategic Web application. This generic
application could be pursued in isolation from those discussing the
strategic Web application, but if the two are coordinated, efforts will
be maximized, tools developed for one can be used for the other, and
(most importantly) we can help researchers find information regardless
of its location or means of access.