Metasearch Survey Among RLG Members
Objective: This survey had
three goals: to determine how RLG members are using metasearching, to
learn more about their expectations for these searches, and to use this
information to help make RLG databases good metasearch targets.
Overview: In May-June 2005
we surveyed a representative cross-section of RLG member institutions.
Most respondents were enthusiastic about metasearch. Although their
definitions of it varied, most tended to focus on undergraduate
students, use of a simple search box, and full-text resources in the
results. (Metasearches allow users to search across multiple catalogs,
search engines, and commercial databases. Frequently, these searches
merge and de-duplicate results and unify access to a variety of
information resources.)
The survey report addresses these questions:
- What do institutions and users expect from
metasearch?
- How are these members implementing metasearch now?
- Do current implementations meet expectations?
- Can we quantify the likely spread of various tools
for types of information?
- What might make existing metasearch implementations
moot?
Survey report
In May and June 2005, RLG conducted an informal survey
to learn more about our members' expectations of and experiences
with metasearch. RLG staff conducted guided discussions with ten
institutions drawn from RLG's members. The selected respondents
represented a mix of institutional types, metasearch tools, and
services targeted. At the time, five described their metasearch
implementations as in production, and five described them as tests;
since then, one has moved from test to production.
This sample provides only a limited basis for
generalization. (MetaLib may be underrepresented relative to its
adoption by our members and customers. In addition, we did not talk with
faculty, graduate students, or undergraduate students, the intended
beneficiaries of metasearch.) Nevertheless, these discussions did give
us a valuable view of members' concerns, and will help guide our
choices about how to design and develop RLG services. We hope they will
be of interest to others.
We set out to answer these questions:
- What are institutions and users expecting from a
metasearch capability?
- How are they implementing it now?
- Are the current implementations fulfilling their
expectations?
- Can we quantify the likely spread of various tools
within our user base by users of different types of
service—bibliographic, citation, digital?
- Are there other ways of providing the same
functionality or new ways of researching that might make existing
metasearch implementations moot?
Findings
Most of our respondents were very enthusiastic about
metasearch, although they had various ideas about what it is. Most of
their definitions had in common a focus on undergraduate students, use
of a simple search box, and full-text resources in the
results. At the same time, the present level of satisfaction is low,
success measures are not yet clear, and implementers are glad to look
to vendors and others to define strategies and goals—few felt
able to be very active participants in that process. Interest on the
part of librarians, administrators, or system vendors could dissipate
if results do not become more satisfactory. It may be difficult to
attract the interest of undergraduate students if they have
more familiar alternatives they perceive as adequate. We think the
future of these metasearch efforts is still uncertain.
From this survey, we concluded:
- Undergraduates using metasearch tools will get value
from RLG citation resources. RLG should work with leading vendors of
metasearch tools to make them convenient targets. What is critical for
target services is the ability to provide sufficiently fast keyword searching.
- Customers who are implementing metasearch tools need
support from target systems such as RLG.
- The RLG Union Catalog is a lower priority. We didn't
hear it cited as a desirable target, which surprised us. Institutions
would rather target their own OPAC (online public access catalog).
- Making RLG Archival Resources a good target is a low
priority because its audience—advanced researchers,
genealogists—isn't a good fit with the current audience for
metasearch efforts. This resource is better exposed through search
engines and genealogy sites.
- Interoperability with other image collections, rather
than metasearching per se, will be most useful
for image aggregations. This, in addition to exposing images to Web
search engines (through services like Trove.net™), is what
will serve image searchers now.
- Deduplication and ranking were not among the most
important considerations for respondents.
Some surprises
- A simplified interface was mentioned more often as a
goal than merged results or a single search against multiple resources.
- The most important criterion in tool selection was an
established relationship with the vendor.
- Despite their enthusiasm, most respondents indicated
they had limited time and attention to invest in either shaping or
appraising metasearch efforts at their institutions; expediency
mattered more than standards.
- It's about full text. Citations are a step along the
way, not the destination.
- No respondents regarded services like Google as the
way of reaching their metasearch goal, despite the fact that most said
their own users see them that way. Students are moving to search
engines to get metasearching done. Can librarians change that behavior?
Can they offer an environment students regard as better?
What are institutions and users expecting from a metasearch capability?
Undergraduates were identified as the principal audience
for metasearch by most respondents (8/10). "We really see this as a
tool to help students get started finding stuff—it's not a
tool for advanced research." Two mentioned graduate students. A few
mentioned that faculty or librarians need metasearch in order to
discover relevant resources. Some respondents suggested
that more advanced users need more advanced interfaces. Our
other studies suggest to us that faculty too may need ways to discover
what licensed resources are available to them; however, that wasn't
ordinarily viewed as a purpose of metasearch.
A simplified user interface was the goal most often
identified by respondents (7/10). Half mentioned promotion of
lesser-used resources. Somewhat surprisingly, fewer than half (4/10)
named merged results, or a single search against multiple resources,
as their goal ("all scholarly resources" or "image and other
databases" or "local and licensed resources").
Nearly all respondents (9/10) mentioned keyword as the
preferred form of search. ("That's what metasearch is.") The tenth
respondent said that the preferred form of search depends on the
target. No respondents identified parametric searching by title,
author, or subject as important, although one lamented, "we're missing
an opportunity to teach students about using advanced interfaces."
More than half of respondents (6/10) expect users to
stay in the metasearch tool environment rather than move on to the target
system. The question of what they meant by "target interface" needs
further investigation. Is it an interface (like RLG's Eureka®)
that a metadata vendor provides for searching specific sets of
metadata? Our respondents might have thought so. But is that what a
student thinks of as the "target"? The only target worth going to might
be the interface for data (that is, full text) rather
than for metadata.
Few respondents (2/10) mentioned ranking as a factor in
their selection of a metasearch product. None mentioned the ranking
algorithm of the target system as a factor in selecting targets. Few
respondents (3/10) mentioned deduplication as an important feature in a
metasearch product. Of these, one felt successful deduplication was
unlikely in the near term. Another felt deduplication would depend on
local loading of data. Among our respondents, at least, poor handling of
ranking or deduplication by metasearch products does not appear to be an
important failing from a customer point of view.
How are institutions implementing metasearch now?
MetaLib from Ex Libris is the dominant tool among
respondents belonging to the Association of Research Libraries, but
there was some level of dissatisfaction with all metasearch vendors.
Still, most respondents (6/10) said they'd work with their current
vendor if the current implementation doesn't meet expectations. Two
said they'd select a new tool, two said they'd try a new approach, and
two said they'd develop a tool on their own, though one of those
described this as the "worst-case scenario." A few of our respondents
see themselves as partners in development efforts with vendors. Most,
however, didn't seem to feel they could devote many resources to such
efforts.
Some respondents (4/10) preferred Z39.50 among search
protocols. A larger number (5/10) stated that it doesn't matter to them
because the tool vendor "takes care of everything." Another said that
aggregators ought to supply gateways/connectors. In addition to Z39.50,
two other protocols were mentioned: one respondent mentioned SRU/SRW,
and one mentioned HTTP. None identified NISO working group
recommendations as something that would affect their decisions
regarding implementation.
More than half of respondents (6/10) felt that the
record format doesn't matter to them, and that mapping is the tool
vendor's responsibility. Two mentioned Dublin Core. One mentioned MARC.
Among other ways of reaching the metasearch goal, four
mentioned leaving it to vendors to innovate. An equal number mentioned
aggregating metadata locally. (Another characterized this approach as
impossible.) Only two mentioned looking beyond the library sector for
solutions. Asked if their users saw other ways of reaching the
metasearch goal, however, six mentioned Google Scholar, three mentioned
Google, and two mentioned Google Print.
The process for tool selection involved systems staff or
committees without library representation (6/10) slightly more often
than library staff (5/10). Most respondents (8/10) understood
metasearch as a natural extension of their access objective. The most
important criterion in tool selection was an established relationship
with the vendor (3/10). Adherence to standards was an
important criterion in vendor selection for only one respondent. Only
one respondent mentioned student demand as a driver for metasearch
initiatives as a whole, and only one mentioned student demand as a
factor in target selection.
Can we assess the spread of various tools for different types of service?
Can we quantify the likely spread of various tools
within RLG's user base by users of different types of
service—bibliographic, citation, digital?
Nearly all respondents (9/10) are targeting citation
databases. The one that isn't is focusing on full-text e-journals. Half
are targeting bibliographic databases. One respondent that isn't had initially
targeted both the RLG Union Catalog and OCLC's WorldCat, but found
students were confused by the results: books mixed with
articles, and virtually no full text. Half are targeting image
databases. Most of those (3/5) target local rather than third-party
image databases. Since ARTstor content can't be federated, federation
of other image resources may become a lower priority for them. Lack of
support for thumbnail display—critical for image resource
discovery—was also mentioned as a limitation of MetaLib and
other federated search tools.
Some respondents (3/10) resisted our
citation/bibliographic/image classification, and specifically mentioned
full text (e-books and full text electronic journals).
Most respondents had organized their metasearch efforts
around disciplines, rather than making them cross-disciplinary or
comprehensive. They reported that this was not because they saw this as
preferable, but because they saw it as expedient: the relevant sets of
resources were already identified. Some (generally smaller
institutions) had not yet been selective at all.
Most felt selection would change, though they felt it
was too soon to say how. One said that if they had it to do all over
again, they'd include fewer targets and focus more on the needs of
undergraduates. All respondents identified subject specialists or
selectors as the people responsible for selecting targets for
metasearch. Three equated user demand with what librarians demand. None
mentioned students or faculty.
Half of the respondents mentioned site licenses or
unlimited searching as important factors in selecting targets. One
asserted that limits based on license restrictions (simultaneous users,
for example) are already having a serious impact on those wishing to
search directly in a particular target system interface.
Are the current implementations fulfilling expectations?
Only three respondents were satisfied with their current
implementation. Four said they were dissatisfied (though one of those
was hopeful) and four thought it was too early to say.
The success measure most often mentioned was a shift in
traffic away from direct access to target systems (4/10). Half the
respondents said the vendor-supplied statistics that matter to them are
a report of searches and a report of sessions in the target interface.
Half were not yet looking at statistics. One noted that they'll need to
weight searches run on preconfigured profiles differently from searches
based on users' intentional database selection.
Everyone is still struggling with the definition of
success and how to measure it. It may be that the important measure of
success will not be a shift in how citation metadata is used, but an
increase in discovery and use of full-text resources.
Other ways of providing the same functionality?
Are there other ways of providing the same functionality
or new ways of researching that might make existing metasearch
implementations moot?
If metasearch efforts seek to provide a comprehensive,
general-purpose tool for undergraduate or beginning research, such
local efforts may be in a contest with emerging services such as Google
Scholar and Google Print, especially if these services can provide access
to local licensed resources (which is already happening through
Google's use of OpenURL) and if more abstracting and indexing
data is made available to Google. This would be a difficult contest
for library metasearch systems to win, either in terms of inclusiveness
or in terms of visibility.
This difficulty did not seem to be very much on the
minds of librarians we spoke to: they planned further steps down the
road they're on. But the unevenness of this contest may lead to a
change in direction from the administrators who drive many metasearch
efforts, or from the tool vendors who shape them. Alternatively, if
metasearch efforts end up differentiating themselves from Google
Scholar by focusing on various specific disciplines or specific
audiences, then these efforts would be a good vehicle for exposing
resources like RLG's to the researchers who need them.
All of these conversations were instructive for us. We
thank all our respondents for their time and candor.
Member experiences
How some RLG members have applied metasearch:
[link to "/en/page.php?Page_ID=7721#metasearch"] California Digital
Library (RLG TopShelf, March 2005)
[link to "/en/page.php?Page_ID=20740#article1"] National Library of
Australia (RLG Focus, August 2005)
What about Google?
James Michalko, [link to "/en/page.php?Page_ID=366"] "Welcome the
elephant to the room" (RLG TopShelf, December 2004)
Péter Jacsó, "Google Scholar (Redux)"
(Péter's Digital Reference Shelf, June 2005)
Gary Price, "Google Scholar" (ResourceShelf,
May 2005)