To identify the focus of our research project, it is useful to think of metadata translation at various levels of remove from the day-to-day activity of converting records. At the front line are the complex production systems that create or search large databases of bibliographic records. These systems can always be streamlined, enhanced, modernized or further automated. At the next level, the concern shifts to the abstract model or theory that can be developed as part of the effort to design the next-generation system. At a still further level of remove are the activities of the standards organizations that maintain metadata formats such as MARC, Dublin Core and the Learning Object Metadata Standard, among others, and are the ultimate source of the templates for the records in OCLC's databases. Our effort is concentrated at the second level.
The goal of our research project is to develop a data model for metadata translation that would encourage standardization and reuse. In current production systems, too many record streams still require special handling. All too often, problems are diagnosed through human effort and software processes are discarded after a single use. To improve on the status quo, we have established a set of guidelines.
A data model for metadata translation must keep track of semantic as well as structural differences between standards. For example, the MARC 260 field and the Dublin Core Publisher element differ not only in how they are encoded but also in what they describe. The MARC field is more detailed because it shows an explicit relation between the date and place of publication that is missing in the Dublin Core element. But for many uses, this approximate equivalence is good enough as long as we are aware of its limitation.
If possible, use languages and software applications designed to manipulate structured text. This principle encourages a focus on the problem at hand and not the supporting machinery of a large application. XML is the current language of choice.
Make sure the translation engine is modular. Our studies of production software at OCLC have shown that metadata translation is used in the initial conversion of third-party records, in record export, in database searching and in the formatting of special requests, such as interlibrary loans. A modular translation engine could be plugged into many processes.
Wherever possible, use standards. It is tempting to develop ad-hoc variants of standards for local and immediate needs, such as the ubiquitous and ever-changing MARC-ish, but these efforts are ultimately self-defeating.
Participate in the discussion of errors, validation and enhancement. Methods of validation are well-developed for MARC but are only in the early stages for other metadata formats.
As we develop research prototypes that realize these principles, the outcomes are both immediate and long-term. We have shared scripts, utility programs and machine-processible encodings of crosswalks with colleagues at OCLC and in the library community. We have made recommendations about the strengths and limitations of XML for processes that encode and track human-supplied intelligence. We have developed a data model for describing crosswalks and the associated files that make them machine-processible. These materials are available from a repository built entirely from open-source software and allow users to identify crosswalks that meet the needs of their data and automatically execute translations. When more crosswalks become available in this form, perhaps the most useful ones will achieve the status of standards. But to make significant progress, we need to engage communities of experts in the challenging task of defining fully formalized metadata schemas that can interoperate with existing standards.
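
The idea of a machine-processible crosswalk that records semantic as well as structural differences, as in the MARC 260 and Dublin Core Publisher example above, can be made concrete with a small sketch. It is not the project's data model or an actual OCLC crosswalk. The names Rule, CROSSWALK_260 and translate_260, the sample record, the choice of Dublin Core targets and the crude punctuation stripping are all assumptions made for illustration. Python is used only for brevity; the project itself favors XML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One crosswalk rule for a MARC 260 subfield (hypothetical structure)."""
    subfield: str          # MARC 260 subfield code: a, b or c
    target: Optional[str]  # Dublin Core element, or None if nothing fits
    note: str              # semantic caveat recorded alongside the mapping


# Illustrative rules only; the targets are assumptions, not a published crosswalk.
CROSSWALK_260 = (
    Rule("a", None, "place of publication: no slot in this simple target"),
    Rule("b", "publisher", "publisher name: approximate equivalence"),
    Rule("c", "date", "date: explicit link to place and publisher is lost"),
)


def translate_260(subfields, rules=CROSSWALK_260):
    """Apply the rules to (code, value) pairs.

    Returns the translated record and the list of values that could
    not be carried over, so the information loss stays visible.
    """
    by_code = {rule.subfield: rule for rule in rules}
    record = {}
    dropped = []
    for code, value in subfields:
        # Crude removal of trailing MARC punctuation (an assumption).
        cleaned = value.strip().rstrip(" :;,.")
        rule = by_code.get(code)
        if rule is None or rule.target is None:
            dropped.append((code, cleaned))
        else:
            record.setdefault(rule.target, []).append(cleaned)
    return record, dropped


# Made-up sample field, written as (subfield code, value) pairs.
field_260 = [("a", "Dublin, Ohio :"), ("b", "Example Press,"), ("c", "2004.")]
dc, lost = translate_260(field_260)
print(dc)
print(lost)
```

Keeping the rules as data rather than as program logic is what would let a crosswalk of this kind be stored in a repository, inspected and reused by a modular translation engine. Returning the dropped values separately keeps the limitation of the approximate equivalence in view.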
During the past six months, we have worked with colleagues from the Gateway to Educational Materials (GEM) Project to define a crosswalk between the GEM standard and MARC. We have made recommendations for standardizing and enhancing the controlled vocabulary of GEM records. The immediate goal of this collaboration is a record exchange. The GEM project, funded by the United States Department of Education, has a large collection of records that describe learning objects for primary and secondary schools. Once these records are mapped to MARC, they can be imported into WorldCat. In return, once MARC is mapped to the GEM standard, OCLC will be able to populate the GEM database with pointers to learning objects exported from WorldCat.
In addition to the GEM project, we are assessing usage patterns of the Learning Object Metadata standard, which was proposed by the Institute of Electrical and Electronics Engineers (IEEE) and is the standard of choice for educational institutions throughout the world that manage e-learning resources. Our involvement with e-learning metadata is a case study for the next phase of Extended WorldCat. Metadata translation is a key component in the architecture of a system that will improve the capacity of WorldCat to accept non-MARC metadata and make OCLC's services more responsive to the needs of a broad range of cultural heritage institutions.
Progress on the Metadata Schema Transformations project can be monitored at: www.oclc.org/research/projects/mswitch/1_schematrans.htm