This project is now closed. The information on this page is provided for historical purposes only. Links and downloads may no longer work.

Web Characterization

The Web Characterization Project conducts an annual sample of the Web to analyze trends in its size and content. Analysis based on the sample is publicly available. The sample is obtained by generating a list of random IPv4 addresses and then attempting to connect to port 80 at each address to identify public Web services. If an HTTP service is identified, harvesting software captures the site and stores it for later analysis. The sample is confined to publicly available Web content. If our sampling interferes with your network operations, or if you have questions about our activity, please contact us at one of the addresses below.
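
The sampling step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual software: the function names `random_ipv4_addresses` and `has_web_service`, the uniform draw over the full 32-bit address space, and the two-second timeout are all assumptions made for the example. A successful TCP connection only shows that something is listening on port 80; confirming an HTTP service and harvesting the site would be separate steps not shown here.

```python
import ipaddress
import random
import socket


def random_ipv4_addresses(count, seed=None):
    """Return `count` dotted-quad IPv4 addresses drawn uniformly at random.

    Assumption: every 32-bit value is a candidate. The source does not say
    whether reserved or private ranges were filtered out.
    """
    rng = random.Random(seed)
    return [str(ipaddress.IPv4Address(rng.getrandbits(32))) for _ in range(count)]


def has_web_service(address, port=80, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds within `timeout` seconds.

    Assumption: a completed TCP handshake stands in for "a Web service is
    present". No HTTP request is sent.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A caller would generate a sample, for example `random_ipv4_addresses(1000)`, and pass each address to `has_web_service`, keeping those that return `True` for harvesting. The optional `seed` makes a sample reproducible. Probing addresses you do not control can trigger abuse reports, which is presumably why the original page invites affected operators to get in touch.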

Statistics (1998-2002)

The statistics reported on this site are part of an ongoing research effort. They may be revised or updated at any time.

Statistics categories:

  • Size and growth statistics
  • Country and language statistics
  • Linkage patterns
  • Miscellaneous statistics

Research team

  • Ed O'Neill (Consulting Research Scientist)
  • Brian Lavoie (Research Scientist)
  • Rick Bennett (Consulting Software Engineer)
  • Anya Dyer (Intern)
  • Sarah Worthington (Intern)