OCLC research project measures scope of the Web
DUBLIN, Ohio, Sept. 8, 1999--Researchers at OCLC have determined that the World Wide Web has about 3.6 million sites, of which 2.2 million are publicly accessible. They also found that the largest 25,000 sites represent about 50 percent of the Web's content, and that both the number of sites and their average size are growing.
The project, conducted by the OCLC Office of Research, indicates that the World Wide Web has approximately 2.2 million Web sites that offer publicly accessible content. These sites contain nearly 300 million Web pages.
These results, obtained in June 1999 through OCLC's Web Characterization Project, also show that significant portions of the Web are not publicly accessible or do not offer meaningful content. About 400,000 Web sites can be considered "private," in that they do not offer content that is accessible without fee or prior authorization. In addition, about 1 million sites are "provisional"--they are either in a transitory or unfinished state (e.g., the ubiquitous "Under Construction" site) or offer only content that, from a general perspective, is meaningless or trivial.
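The three categories can be checked against the overall total with a short calculation. The Python sketch below is illustrative only: it uses the rounded figures quoted in this release, and the variable names and the share() helper are hypothetical, not part of the project's methodology.

```python
# Site counts as reported in the release (rounded figures).
public_sites = 2_200_000
private_sites = 400_000
provisional_sites = 1_000_000

# The three categories together should give the overall estimate.
total_sites = public_sites + private_sites + provisional_sites


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {
    "public": share(public_sites, total_sites),
    "private": share(private_sites, total_sites),
    "provisional": share(provisional_sites, total_sites),
}

print(f"Total sites: {total_sites:,}")
for category, pct in shares.items():
    print(f"{category}: {pct} percent")
```

On these rounded inputs the categories sum to the 3.6 million sites cited above, with public sites making up roughly three-fifths of the total.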
Project findings indicate that adult content claims a small proportion of the Web. About two percent of the public sites--42,000 of the 2.2 million--contain sexually explicit material.
The mean size of a public Web site is about 129 pages, a 13 percent increase over last year's estimate of 114 pages. The Web is dominated by a relatively small collection of "megasites"--the largest 25,000 sites contain about 50 percent of all pages on public sites.
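The page and size figures can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: it uses the rounded numbers from this release, assumes that "last year's estimate" refers to 1998, and uses hypothetical variable names.

```python
# Rounded figures from the release.
public_sites = 2_200_000
mean_pages_1999 = 129
mean_pages_1998 = 114  # assumption: "last year's estimate" is the 1998 figure
megasites = 25_000

# Sites times mean pages per site approximates the total page count.
estimated_pages = public_sites * mean_pages_1999

# Year-over-year growth in mean site size, as a percentage.
increase_pct = 100 * (mean_pages_1999 - mean_pages_1998) / mean_pages_1998

# Fraction of public sites that the megasites represent, as a percentage.
megasite_pct_of_sites = 100 * megasites / public_sites

print(f"Estimated pages: {estimated_pages:,}")
print(f"Increase in mean size: {increase_pct:.1f} percent")
print(f"Megasites as share of public sites: {megasite_pct_of_sites:.2f} percent")
```

The product comes to about 284 million pages, consistent with the "nearly 300 million" figure, and the 25,000 megasites amount to a little over one percent of public sites while holding about half of the pages.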
The number of public Web sites has approximately tripled in the two-year period from June 1997 to June 1999, increasing from 800,000 to 2.2 million.
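The growth figure can be expressed as a multiple and as an implied yearly rate. The Python sketch below is illustrative only: the annualized rate assumes steady compound growth over the two years, which the release does not state, and the variable names are hypothetical.

```python
# Public site counts from the release (rounded).
sites_june_1997 = 800_000
sites_june_1999 = 2_200_000
years = 2

# Overall multiple over the two-year period.
growth_factor = sites_june_1999 / sites_june_1997

# Implied yearly growth, assuming a constant compound rate (an assumption).
annual_growth_pct = 100 * (growth_factor ** (1 / years) - 1)

print(f"Growth factor: {growth_factor:.2f}x")
print(f"Implied annual growth: {annual_growth_pct:.0f} percent")
```

The multiple works out to 2.75, which the release rounds to "approximately tripled."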
"The Web has achieved the status of being one of the foremost information resources available today," said Ed O'Neill, consulting research scientist and manager of the Web Characterization Project. "Because of the Web's importance to libraries around the world, OCLC is committed to providing timely information that will assist them in understanding the Web and using its content."
In addition to conducting independent Web research, project staff are working with the World Wide Web Consortium's Web Characterization Activity, a cross-industry group committed to promoting the Web's evolution and ensuring its long-term interoperability and robustness.
Founded in 1978, the OCLC Office of Research is dedicated to research that explores the place of the library in the changing technology environment and develops tools that enhance the productivity of libraries and their users.