Much of the scholarly material on the Web is missed by search-engine harvesters. This includes the metadata held in OAI-PMH repositories, the protocol through which DSpace exposes its metadata. Google has several problems harvesting OAI repositories, because they differ from standard Web pages.
A standard DSpace installation identifies items with the Handle System (www.handle.net), which purposely masks the identity of the host and so makes harvesting difficult to schedule. The OAI protocol also links pages of metadata with URLs that may not be persistent, which interferes with standard methods of harvesting.
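The paging behaviour described above can be sketched in a few lines. This is only an illustration, assuming OAI-PMH 2.0 `ListRecords` responses, in which the link to the next page is a `resumptionToken` that the repository may expire. The repository address, identifiers, and token in `SAMPLE_PAGE` are hypothetical, and the function names are introduced here for illustration, not taken from the project.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means there are no further pages.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


def next_request_url(base_url, token=None, metadata_prefix="oai_dc"):
    """Build the URL for the first page, or for the page named by a token."""
    if token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical response page, for illustration only.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:1234.5/1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:1234.5/2</identifier></header></record>
    <resumptionToken expirationDate="2030-01-01T00:00:00Z">page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    ids, token = parse_list_records(SAMPLE_PAGE)
    print(ids)
    print(next_request_url("http://repo.example.org/oai", token))
```

Because the second-page URL exists only while the token is valid, a general-purpose crawler that stores the URL and revisits it later may find that it no longer works.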
OCLC Research is working with Google and MIT to periodically harvest metadata from interested DSpace users, transform it into a harvest-friendly format, resolve the handles so that institutions can be identified, and make the resulting URLs harvestable by search services such as Google.
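The handle-resolution step might look roughly like the sketch below. It is not the project's implementation. It assumes item identifiers arrive as handle URIs on the public `hdl.handle.net` proxy and that each handle has already been resolved to its target URL by some earlier step (not shown). The handles and host names in the example data are hypothetical, as are the function names.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumption: identifiers use the public handle proxy host.
HANDLE_PROXY = "hdl.handle.net"


def handle_from_identifier(identifier):
    """Extract 'prefix/suffix' from a handle URI, or None if it is not one."""
    parsed = urlparse(identifier)
    if parsed.netloc.lower() != HANDLE_PROXY:
        return None
    handle = parsed.path.lstrip("/")
    # A handle always has a naming-authority prefix and a local suffix.
    return handle if "/" in handle else None


def group_by_host(resolved):
    """Map each hosting institution's host name to the handles it serves.

    `resolved` maps a handle to the URL it resolved to.
    """
    hosts = defaultdict(list)
    for handle, url in resolved.items():
        hosts[urlparse(url).hostname].append(handle)
    return {host: sorted(handles) for host, handles in hosts.items()}


if __name__ == "__main__":
    # Hypothetical resolution results, for illustration only.
    resolved = {
        "1234.5/1": "http://repo-a.example.edu/item/1",
        "1234.5/2": "http://repo-a.example.edu/item/2",
        "6789.0/7": "http://repo-b.example.org/item/7",
    }
    print(handle_from_identifier("http://hdl.handle.net/1234.5/1"))
    print(group_by_host(resolved))
```

Grouping by resolved host recovers the institution that the handle hides, which is the information a harvester needs in order to schedule its requests per site.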
- DSpace harvesting project
- Thom Hickey
- Jeff Young