What TCP/IP Protocol Headers Can Tell Us About the Web
|
Authors
|
F. Donelson Smith
Felix Hernandez Campos
Kevin Jeffay
David Ott
DiRT Group,
Department of Computer Science, University of North Carolina at Chapel Hill
|
Abstract
|
We report the results of a large-scale empirical study of web traffic. Our
study is based on over 500 GB of TCP/IP protocol-header traces collected in
1999 and 2000 (approximately one year apart) from the high-speed link
connecting The University of North Carolina at Chapel Hill to its Internet
service provider. We also use a set of smaller traces from the NLANR
repository taken at approximately the same times for comparison. The principal
results from this study are: (1) empirical data suitable for constructing
traffic-generating models of contemporary web traffic, (2) new
characterizations of TCP connection usage showing the effects of HTTP
protocol improvements, notably persistent connections (e.g., about 50%
of web objects are now transferred on persistent connections), and (3)
new characterizations of web usage and content structure that reflect
the influences of "banner ads," server load balancing, and content
distribution. A novel aspect of this study is a demonstration that
a relatively lightweight methodology based on passive tracing of only
TCP/IP headers and off-line analysis tools can provide timely,
high-quality data about web traffic. We hope this will encourage more
researchers to undertake ongoing data collection and provide the
research community with data about the rapidly evolving characteristics
of web traffic.