Web Characterization

Notice: The Web Characterization Project is on hiatus. No statistics for 2003 were released.

For a five-year overview of the public Web, please see our article in D-Lib Magazine, "Trends in the Evolution of the Public Web: 1998 - 2002".

This archived project page has not been updated since April 2003.

The Web Characterization Project conducts an annual Web sample to analyze trends in the size and content of the Web. Analysis based on the sample is publicly available. The sample is obtained by generating a list of random IPv4 addresses and then attempting to connect to port 80 at each address to identify the presence of a public Web service. If an HTTP service is identified, harvesting software captures the site and stores it for future analysis. The scope of the sample is confined to publicly available Web content. If our sampling interferes with your network operations, or if you have questions about our activity, please contact us at one of the addresses below.
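
The sampling step described above can be illustrated with a minimal Python sketch. It is not the project's harvesting software. The function names, the seed parameter, the timeout, and the HEAD request used to confirm an HTTP response are all assumptions made for illustration. The sketch also draws from the full 32-bit address space without excluding reserved or private ranges, which the project's own sampler may have handled differently.

```python
import ipaddress
import random
import socket


def random_ipv4(rng: random.Random) -> str:
    """Return a uniformly random IPv4 address in dotted-quad form."""
    return str(ipaddress.IPv4Address(rng.getrandbits(32)))


def sample_addresses(count: int, seed=None) -> list:
    """Draw `count` distinct random IPv4 addresses, in the order drawn."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    seen = set()
    sample = []
    while len(sample) < count:
        address = random_ipv4(rng)
        if address not in seen:  # skip the (rare) duplicate draw
            seen.add(address)
            sample.append(address)
    return sample


def has_http_service(address: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Connect to `port` and report whether the peer answers like an HTTP server.

    Assumption: a reply beginning with "HTTP/" to a bare HEAD request is
    treated as evidence of a Web service. Any connection error or timeout
    counts as "no service".
    """
    try:
        with socket.create_connection((address, port), timeout=timeout) as conn:
            conn.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            return conn.recv(16).startswith(b"HTTP/")
    except OSError:
        return False
```

A sample would then be probed with something like `[a for a in sample_addresses(1000) if has_http_service(a)]`. Because most randomly chosen addresses do not host a Web server, a real survey needs a very large sample and should probe politely, with short timeouts and a single connection attempt per address.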

Statistics (1998 - 2002)

The statistics reported on this site are part of an ongoing research effort. They may be revised or updated at any time.

Statistics categories:

Research team

  • Ed O'Neill (Consulting Research Scientist)
  • Brian Lavoie (Research Scientist)
  • Rick Bennett (Consulting Software Engineer)
  • Anya Dyer (Intern)
  • Sarah Worthington (Intern)