Trace Distribution
Compressed, ASCII format (66.2 MB)
Related Information
E. Katz, M. Butler, and R. McGrath.
A scalable HTTP server: The NCSA prototype.
Computer Networks and ISDN Systems, pages 240-249, November 1994.
Applications for Measurement and Benchmarking of I/O on Parallel Computers
This application uses a parallel web server based on the round-robin DNS
scheme described by Katz et al. Similar schemes are used by most busy
commercial web sites. We used the Apache 1.2 server as the base web
server, replicated on each of the participating hosts. The application
uses multiple processes per processor to implement multiple threads of
control. Over the course of a day it creates a large number of processes
(about 2,000), most of which terminate soon after starting; at any given
time there are no more than ten active processes.
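To illustrate how a round-robin DNS scheme spreads client load across
replicated servers, the following Python sketch rotates connections over a
pool of server addresses. The addresses, pool size, and the resolve/fetch
helpers are assumptions made for illustration; they are not part of the
original setup.

    import itertools
    import socket

    # Hypothetical pool of replicated web-server hosts; in a real round-robin
    # DNS setup these would be the A records published for a single hostname.
    SERVER_ADDRESSES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

    # Cycle through the addresses so successive lookups hit successive
    # replicas, mimicking the rotation a round-robin DNS server performs.
    _rotation = itertools.cycle(SERVER_ADDRESSES)

    def resolve(hostname: str) -> str:
        # Return the next replica address for the replicated hostname.
        return next(_rotation)

    def fetch(hostname: str, path: str) -> bytes:
        # Issue a minimal HTTP/1.1 GET against whichever replica the
        # rotation hands back.
        address = resolve(hostname)
        with socket.create_connection((address, 80), timeout=10) as sock:
            request = ("GET {} HTTP/1.1\r\n"
                       "Host: {}\r\n"
                       "Connection: close\r\n\r\n").format(path, hostname)
            sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(65536)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

In an actual deployment the rotation is performed by the DNS server, which
returns the A records for a single hostname in a different order on each
lookup, so unmodified clients are spread across the replicas.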
Input Dataset
We used NASA Kennedy Space Center's httpd logs for August 1995 to create
the document hierarchy as well as to drive the application. To account
for the explosive growth in web accesses since 1995, we collapsed the
request stream for the entire month into a single day, taking care to
preserve the time-of-day variations. That is, we merged the month's worth
of daily logs into a single day, preserving the time of day of each
request. The dataset served was 524 MB in size, stored in 13,457 files.
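A minimal sketch of this collapsing step is shown below, assuming the logs
are in the Common Log Format used by the NASA httpd traces (timestamps of
the form [01/Aug/1995:00:00:01 -0400]). The regular expression, the target
date, and the function name are illustrative assumptions, not the tool
actually used.

    import re

    # Common Log Format timestamp: [01/Aug/1995:00:00:01 -0400].
    # The regex and target day below are assumptions for this sketch.
    TIMESTAMP_RE = re.compile(
        r"\[(\d{2})/(\w{3})/(\d{4}):(\d{2}:\d{2}:\d{2}) ([+-]\d{4})\]")
    TARGET_DAY = "01/Aug/1995"

    def collapse_to_single_day(in_path: str, out_path: str) -> None:
        # Rewrite every log entry onto one calendar day, keeping its time
        # of day, then sort the merged stream so it replays as a single
        # 24-hour trace.
        entries = []
        with open(in_path, "r", errors="replace") as fin:
            for line in fin:
                m = TIMESTAMP_RE.search(line)
                if not m:
                    continue
                time_of_day = m.group(4)
                offset = m.group(5)
                merged = TIMESTAMP_RE.sub(
                    "[{}:{} {}]".format(TARGET_DAY, time_of_day, offset), line)
                entries.append((time_of_day, merged))
        # Lexicographic order on HH:MM:SS is chronological within one day.
        entries.sort(key=lambda e: e[0])
        with open(out_path, "w") as fout:
            fout.writelines(line for _, line in entries)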
Workload
There were seven participating hosts in the experiment. We used four
different hosts as clients to drive the experiment, with each client
responsible for making all of the HTTP requests to a single server.
Client requests were made using the HTTP/1.1 protocol, and servers
delivered data to the clients according to the size of each request.
Clients were connected to the servers via an ATM switch, so the timestamps
of the requests as seen by the web servers were accurate. The experiment
was run over a 24-hour period; in total, about 1.5 million HTTP requests
were served, delivering over 36 GB of data.
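A hedged sketch of what one replay client might look like follows, taking
the collapsed single-day log as input. The log parsing, server URL, and
pacing logic are assumptions for illustration, not the actual driver used
in the experiment.

    import time
    import urllib.request
    from datetime import datetime

    def replay(trace_path: str, server: str) -> None:
        # Replay a single-day trace against one server, issuing each GET at
        # the time of day recorded for it.  Assumes Common Log Format lines
        # rewritten onto a single day; the server argument is a placeholder.
        start_wall = time.monotonic()
        for line in open(trace_path, errors="replace"):
            try:
                # Request line looks like: "GET /path HTTP/1.0"
                path = line.split('"')[1].split()[1]
                # Timestamp looks like: 01/Aug/1995:HH:MM:SS
                stamp = line.split("[")[1].split()[0]
                t = datetime.strptime(stamp.split(":", 1)[1], "%H:%M:%S")
            except (IndexError, ValueError):
                continue
            # Seconds since midnight in the merged trace.
            offset = t.hour * 3600 + t.minute * 60 + t.second
            delay = offset - (time.monotonic() - start_wall)
            if delay > 0:
                time.sleep(delay)  # wait until the request's time of day
            try:
                urllib.request.urlopen(
                    "http://{}{}".format(server, path), timeout=30).read()
            except OSError:
                pass  # a real driver would record failures here

Each client would run a loop of this form against its assigned server, so
that request arrivals at the server follow the time-of-day pattern of the
original trace.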