OCLC RESEARCH
Get a grip...on identifiers
As the Web has become ubiquitous, the need for persistent identifiers has
exploded
BY STUART L. WEIBEL, Senior Research Scientist, OCLC Research
The topic of persistent identifiers is at once familiar and perplexing, seemingly
simple and yet, on close examination, confounding and contentious in many respects.
Libraries have dealt with a variety of such identifiers for many years: ISBNs,
ISSNs and OCLC numbers are all commonly understood and widely used. The Internet
brings us many more, and just as with the familiar ones, the new ones have their
own special characteristics.
Identifiers are becoming a fundamental component of the digital infrastructure
that pervades our lives. Think of an identifier as a handle: a convenient
handhold for every information asset, person, invoice, object and even concept.
Our computers have them, we have them, our cars have them, and increasingly
our dogs, cats and even cows have them.
One of the challenges confronting us in the digital information services realm
is to understand our market better, learn how people want it to change, and deliver
the new services they ask for. Identifiers will be one of the key components of the infrastructure
necessary to accomplish this.
Important characteristics of identifiers
In the context of the Internet, the term "identifier" is often found in
association with the phrase "globally unique, persistent identifiers."
The Web is itself built on globally unique identifiers: URLs (Uniform Resource
Locators). The "globally unique" part is obvious and straightforward. It is the
great virtue of the Web that files can be flexibly located in what amounts to
a global file system whose naming elements are unique by virtue of the hierarchical
structure of the Domain Name System at its upper levels, and the natural prohibition
against duplicate file names at the local file system level.
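To make the two sources of uniqueness concrete, here is a minimal sketch in Python that splits a URL into its Domain Name System labels and its local path segments. The function name `uniqueness_parts` and the example.org URL are illustrative assumptions, not part of any OCLC system.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def uniqueness_parts(url: str) -> Tuple[List[str], List[str]]:
    """Return (DNS labels, most general first; path segments, in order).

    The DNS labels are unique because the Domain Name System is a
    hierarchy; the path segments are unique because a file system does
    not allow two entries with the same name in the same directory.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    dns_labels = list(reversed(host.split("."))) if host else []
    path_segments = [segment for segment in parts.path.split("/") if segment]
    return dns_labels, path_segments


# Hypothetical URL, used only for illustration.
labels, segments = uniqueness_parts(
    "http://www.example.org/research/reports/identifiers.html"
)
print(labels)    # ['org', 'example', 'www']
print(segments)  # ['research', 'reports', 'identifiers.html']
```

Read left to right, the two lists form a single path through one global naming tree, which is why no two files on the Web need share a URL.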
Persistence, however, is another matter. The pace of change of technology makes
persistence challenging to achieve. The number 404 has attained a prominent
place in the jargon of popular technology because of its frequency as the error
number for "page not found." The "locator" part of a URL is rather too fragile
when we want our resources to be accessible for years and decades, not days
and weeks.
Does persistence mean forever? Not necessarily. A FedEx delivery identifier
needs a lifetime measured only in days. An identifier for a managed information
asset, however, is likely to have a useful life measured in centuries. Governments,
libraries and museums in particular are expected to preserve and manage information
assets that are part of the fabric of the societies we live in, and are the
natural homes for efforts to organize, preserve and provide access to our cultural
artifacts and memories.
As the Web has become ubiquitous, the need for persistent identifiers has exploded,
even to the extent of entering the public consciousness. Keeping track of the
burgeoning bumper crop of identifiers is not easy. In the public information
arena, OCLC and libraries will play a central role in this challenge. For this,
we will increasingly rely on registries and directories of various types to
assign, track, maintain and resolve the identifiers that are embedded in our
systems.
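The four registry functions named above (assign, track, maintain and resolve) can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the class `IdentifierRegistry`, its method names and the example.org addresses are all hypothetical, and a real registry would add persistence, access control and governance.

```python
class IdentifierRegistry:
    """A toy registry mapping persistent identifiers to current locations."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix    # hypothetical naming authority
        self._locations = {}     # identifier -> current location
        self._next_number = 1

    def assign(self, location: str) -> str:
        """Mint a new identifier and record where the resource lives now."""
        identifier = f"{self._prefix}/{self._next_number}"
        self._next_number += 1
        self._locations[identifier] = location
        return identifier

    def track(self) -> list:
        """List every identifier the registry is responsible for."""
        return sorted(self._locations)

    def maintain(self, identifier: str, new_location: str) -> None:
        """Update the location; the identifier itself never changes."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        """Return the current location for an identifier."""
        return self._locations[identifier]


registry = IdentifierRegistry("id.example.org")
pid = registry.assign("http://server-a.example.org/old/report.html")
registry.maintain(pid, "http://server-b.example.org/new/report.html")
print(pid)                    # id.example.org/1
print(registry.resolve(pid))  # http://server-b.example.org/new/report.html
```

The point of the indirection is that citations carry only the identifier, so moving the resource requires one update in the registry rather than a fix to every link that points at it.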
Staff in Research and elsewhere in OCLC are involved in a variety of activities
in the identifier arena:
-
OCLC has a proposal pending with ISO to develop, market and maintain
the registry for the International Standard Text Code (ISTC), a globally unique,
permanent identifier to assist in the management of text assets, both digital
and print. The ISTC will facilitate exchange of information
between collecting societies and rights administrators, authors, agents,
publishers, retailers, librarians and other interested parties.
-
OCLC's PURL service arose as a result of our involvement in
the long-laboring Uniform Resource Name (URN) working group in the Internet
Engineering Task Force. PURLs were developed as a demonstration that simple,
off-the-shelf technology could be brought to bear on the problem of maintaining
identifier alignment as actual file system locations changed. PURLs continue
to be a popular low-barrier technology for organizations that want to manage
namespaces without incurring significant cost or overhead.
-
OpenURLs were developed to address the so-called "appropriate
copy" problem. In a diverse, heterogeneous information environment, a
user needs to be directed to a copy of an information asset for which he
or she has authorized access. OpenURLs depend on a consistent identification
architecture that is independent of resolution. OCLC Research staff have
been involved in prototyping registration and management infrastructure
for OpenURLs.
-
The "info" URI Internet draft provides a missing bit of
Internet infrastructure that supports the separation of identity from resolution.
The development of "info" URIs was motivated by the need for an
identifier architecture in OpenURLs, but has broader applicability as well.
-
ERRoLs are constructed, dynamic URLs that resolve to metadata,
content and services related to items stored in a community of OAI repositories.
ERRoLs are constructed by concatenating the ERRoL prefix (e.g., http://errol.oclc.org/),
an OAI identifier and a metadata prefix or service extension. ERRoL resolution
is generally accomplished by dynamically performing OAI-PMH requests to
the home repository and then either transforming the responses with XSLT
style sheets or issuing HTTP redirects.
-
The Virtual International Authority File (VIAF) activity is a joint
project of Die Deutsche Bibliothek in Frankfurt, the Library of Congress
and OCLC to improve interoperability across national authority files. Language
variants, collation conventions and character set issues all contribute
to what librarians understand as the fog of naming. Persistent identifiers
will be an important aspect of reducing this problem.
-
Registration of Dublin Core terminology namespaces (and URI schemes
associated with them) is an essential part of making metadata modular, and
in supporting the need to reference legacy terminologies from within new
Internet standards such as the Dublin Core. It has become evident that the
need is more general, and that the topic of terminology identifiers is salient
to many related technologies and standards. Effective use of terminologies
on the Web is a fundamental requirement for realizing the potential of
the Semantic Web, and identifiers are a foundational component of this technology.
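The ERRoL construction described in the list above (prefix, then OAI identifier, then metadata prefix or service extension) can be sketched as simple string concatenation. The article does not say how the last two parts are joined, so the "." separator below is an assumption, as are the function names, the sample identifier and the repository base URL. The `GetRecord` request uses the standard OAI-PMH query parameters and shows the kind of call a resolver might make to the home repository.

```python
from urllib.parse import urlencode

ERROL_PREFIX = "http://errol.oclc.org/"  # prefix given in the article


def build_errol(oai_identifier: str, extension: str, separator: str = ".") -> str:
    """Concatenate prefix + OAI identifier + metadata prefix or extension.

    ASSUMPTION: the separator between identifier and extension is not
    stated in the article; "." is used here purely for illustration.
    """
    return f"{ERROL_PREFIX}{oai_identifier}{separator}{extension}"


def get_record_request(base_url: str, oai_identifier: str,
                       metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for the home repository."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": oai_identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical identifier and repository, for illustration only.
item = "oai:example.org:item-1"
print(build_errol(item, "oai_dc"))
print(get_record_request("http://repository.example.org/oai", item))
```

Because the ERRoL is built entirely from the OAI identifier, no per-item registration is needed; any item already exposed by a participating repository gets a resolvable URL.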
Persistent identifiers represent an important strategic interest for OCLC and
its constituents, both as infrastructural elements that require thoughtful design
and management, and as part of the changing business environment of libraries.
As such, they play a key role in demonstrating one of the fundamental value
propositions of libraries: commitment to long-term access to the information
assets of society.