Stuart Weibel Interviews Tim Berners-Lee
July 29, 2003

This interview with Tim Berners-Lee, Director of the World Wide Web Consortium (W3C), was conducted by OCLC Researcher Stuart Weibel. Tim agreed to discuss his perspectives on major trends in the information landscape and their impact on the use of and access to public information. This interview was conducted in support of the OCLC environmental scan of the Library and Information communities, developed for strategic planning purposes for OCLC and its member libraries.

The 2003 OCLC Environmental Scan will be available to the OCLC membership by the end of this year.

Thanks for agreeing to this interview, Tim. The Web is a strategic technology for libraries, and your views on its development are important to us as we try to understand how librarianship changes in the digital era, and how the public need for information access and use evolves.

Can you comment on some of the trends in the information landscape that are important for people and their relationship to public information access, educational institutions, and libraries?

Two of the major issues that concern me are the evolution of technical standards and intellectual property rights.

Technical standards are challenging and difficult, especially when the competing commercial interests of companies color their evolution. Entrepreneurial creativity can be overshadowed by patent claims, and the threat of patent claims around foundational technical standards threatens the future of that infrastructure, and all the new markets which depend on it.

And just as patents make it harder to share a common infrastructure, over-zealous protection of intellectual property rights (IPR) can make it hard to share and preserve the ideas of our culture. Copyright and IPR systems protect the rights of creators, but they can also threaten public access and dissemination of important ideas. Plays cannot be performed, images cannot be seen, music cannot be played, and the archiving of important cultural artifacts may be at risk. Copyright law should recognize such risks and confer special rights of archiving and preservation on libraries and archives, covering both content and the assignment and maintenance of persistent identifiers.

Much of your recent effort focuses on the Semantic Web activity. Can you explain the importance of this work and where you think it will lead?

The Semantic Web is a key to realizing the potential of the Web. The Web is a place for machine-processable data as well as human-readable documents. The Semantic Web is about promoting languages for exchanging data and describing its meaning. Interoperability is enabled within a community which uses a shared ontology of terms. Ontologies grow and evolve. The general standards for data are the XML-based Resource Description Framework, RDF, and the upcoming Web Ontology Language, OWL. (OWL provides the standard for the exchange of ontologies. Future standards will address query, rules and trust languages.)

The library community has historically seen the Semantic Web as primarily about metadata. While that is important, it is only one aspect of the larger picture. There is financial data, chemical data, biotechnology data, experimental data, geographic data and more. All of these domains have their own vocabularies, with few explicit points of connection. The Semantic Web is aimed at bridging those gaps, and allowing links across fields. Libraries have long understood the importance of established vocabularies, and have led in their development. To the extent that data can be encoded in common syntaxes like RDF and described with public vocabularies, they can be more accessible and more useful. People and applications can draw better correlations, better connections, better inferencing, and these can lead us to more effective use of information.
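As a minimal sketch of the linking described above (an illustration only, not something taken from the interview), the Python below models data as subject-predicate-object triples, the basic RDF data model. Every URI, vocabulary term, and function name here is a hypothetical placeholder under example.org. It assumes a library catalog and a chemistry data set that happen to use the same identifier for a compound, so that merging them lets an application follow a link from one field into the other.

```python
from collections import defaultdict

# Hypothetical vocabulary terms and identifiers (placeholders, not real vocabularies).
CREATOR = "http://example.org/vocab/creator"
SUBJECT = "http://example.org/vocab/subject"
FORMULA = "http://example.org/vocab/formula"
PAPER = "http://example.org/paper/42"
CAFFEINE = "http://example.org/compound/caffeine"

# Each statement is a (subject, predicate, object) triple.
LIBRARY_DATA = {
    (PAPER, CREATOR, "http://example.org/person/ada"),
    (PAPER, SUBJECT, CAFFEINE),
}
CHEMISTRY_DATA = {
    (CAFFEINE, FORMULA, "C8H10N4O2"),
}


def merge(*graphs):
    """Merging triple sets is a plain set union; shared URIs become the links."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def index_by_subject(graph):
    """Group (predicate, object) pairs under each subject for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in graph:
        index[subject].append((predicate, obj))
    return index


def formulas_for_paper(graph, paper):
    """Follow a paper's subject links into whatever formula data the graph holds."""
    index = index_by_subject(graph)
    formulas = []
    for predicate, obj in index[paper]:
        if predicate == SUBJECT:
            for predicate2, obj2 in index[obj]:
                if predicate2 == FORMULA:
                    formulas.append(obj2)
    return sorted(formulas)


if __name__ == "__main__":
    combined = merge(LIBRARY_DATA, CHEMISTRY_DATA)
    print(formulas_for_paper(combined, PAPER))
```

Queried on its own, the library data cannot answer the question. The connection appears only once both sets share an identifier and are combined, which is the kind of bridge between fields the answer above describes.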

People and organizations need to adopt these common vocabularies and encoding standards for this sort of integration to occur. What incentives are necessary to encourage wider adoption of such standards?

The benefit of the Web is proportional to the number of connections - links - to related information. Just as the Web evolved rapidly as people recognized this and acted independently to launch Web servers, this same network effect will bring people together around common semantic standards as they come to realize the enhanced value of their data and information in the context of a truly Semantic Web.

There is no business case for individual conformance in this early phase. The benefit comes from a community of conformance. Some organizations get on board because they have the longer vision, some because the languages and tools meet a short-term need. When there is a large amount of data out there on the Web, then the benefit will be immediately apparent. Understanding the benefits of the Semantic Web way of working often requires time and experience to learn the lessons of isolation, but when the realization of the benefits of Semantic Web languages arrives, it is very convincing.

Governments have been aware from first-hand experience of the need to handle large quantities of data which have varying levels of structure. Government research funds both in the US and in Europe have helped advance the development and use of ontology systems such as DAML+OIL. The W3C is a fundamental part of the technology transfer strategy for these projects. The technology is moving progressively from the research area through standardization into wide commercial deployment.

Can you share your impressions of the developments in the area of Institutional Repositories such as the MIT-Hewlett Packard collaboration on DSpace? Are there impediments that must be overcome for such systems to play an important role in sustaining and promoting scholarly communication?

Projects such as SIMILE, which will leverage and extend DSpace by enhancing its support for arbitrary schemas and metadata through the application of Semantic Web technologies, are particularly important in facilitating scholarly communication. As I mentioned earlier, to the extent that data can be encoded in common syntaxes like RDF and described with public vocabularies, they can be more accessible and more useful. People and applications can draw better correlations, better connections, better inferencing, and these can lead us to more effective use of information. Perhaps nowhere in the academic environment is this more important than the area of scholarly communication.

The impediments to success are much the same as impediments in the larger Web. We currently lack an ethos for reliable web publication. We need a closer connection between the technology and the institutional commitments necessary to maintain persistent identifiers and namespaces. We need a realignment of legal constraints and recognition of fair use within the context of the new digital infrastructure. We need to avoid as far as possible the constraints of patents or monopoly at any of the layers of the infrastructure.

We also need to sustain the open connectivity - the linking among people, organizations, data, and ideas - that drives the growth and diversity of the Web. We need to build all of this on a foundation of solid, clean Web standards that will be of universal benefit - for scholarship, for commerce, and for public and private information spaces.

Thank you very much, Tim, for sharing these ideas. They will be of great help to us as we endeavor to understand the relationship of OCLC, libraries, and the larger information landscape.

