To better understand the observed behavior, we have developed deterministic models of how an IP network responds to a train of probe packets over a multi-hop connection. The models relate the arrival, departure, and service times of the probe packets at the routers. Using these model we can estimate the delay encountered by every probe packet at congested routers. We shall see that this encountered delay consists of two parts: a component due to cross traffic (which is to be expected); and a component due to the server going offline for few hundred milliseconds (which is totally unexpected).
We point out that our deterministic modeling approach is a departure from the traditional approach, which uses stochastic models validated using either laboratory testbeds or small-scale experiments. We believe that stochastic models, while appropriate at the design stage, are not particularly suited for network monitoring and diagnosis. This is because accurate system dynamics (user load, traffic patterns, routing dynamics, etc.) are not available at the design stage. However, the situation is different for network diagnosis and control. Here the network is available and indeed operational. Hence we can collect vast amounts of low-level information. The difficulty here relates to the manner in which this information can be used and interpreted. Further, we are concerned not only with extracting performance metrics but also with troubleshooting (e.g., identifying routers exhibiting abnormal behavior). This is where deterministic models are far more appropriate than stochastic models; the latter would require too much modeling and computational effort.
In all models developed, we consider transfer of packets from a given source host to a destination host, and assume that all packets between the source-destination pair of hosts follow the same path, comprising of one or more routers (or switches). The flow of packets along the fixed path is considered to be a virtual circuit. Intermediate routers perform the usual functions of receiving, storing, routing, and forwarding packets using FIFO discipline. Additionally, the routers may also process packets that belong to other virtual circuits, or may engage in certain house keeping activities. From the viewpoint of modeling the virtual circuit, each router (and associated circuit) may be viewed as a server with an input queue, and the virtual circuit as a tandem of n servers (see Figure 1).
Figure 1 Flow of traffic over a virtual circuit and its model
The first model (see Agrawala and Jain [4]) assumes that routers on the virtual circuit process packets belonging to the virtual circuit alone, and that the routers are work-conserving (no house keeping functions). It has been shown that the departure time of individual packets from a server (router, that is) is given by
,
where is the time of departure of packet j from server i, is the propagation time from server i-1 to server i, and is the processing time at server i for packet j. Assuming to be independent of j (or equivalently, packets are of a fixed length), it can be shown that the virtual circuit can indeed be modeled as in Figure 2, where b refers to the bottleneck server. Further, assuming propagation delays to be negligible,
.
Thus, in case an equally spaced set of packets are sent with an inter-packet send time of , , then the end-to-end delay is simply . Recall, that the model assumes that there is no cross traffic, and that the servers are work-conserving.
Figure 2 Resulting end-to-end delay over virtual circuit with no cross traffic
The above model has been extended to cover the case where there is cross traffic (see Jain, Agrawala, Sanghi [3]). The amount of cross traffic handled by a router i is captured by the parameter, . It is assumed that the cross traffic can be broken down into infinitesimally small packets, and a fraction of the available server capacity is used by the cross traffic. It can then be shown that
,
where is the arrival time of packet j at server i. An analysis of a virtual circuit yields the following important observations:
, then the packets do not encounter any queuing delays.
The fluid flow assumption is crucial to the above model. However, it has been observed over the Internet that there is a significant variation in end-to-end delays due to cross traffic. This model is incapable of explaining the excessive delays encountered periodically. Possibly, these are due to house keeping tasks undertaken by intermediate routers.
This brings us to the third model of a virtual circuit, wherein the cross traffic may not be broken down into a large number of uniformly spaced small packets. The cross traffic is assumed to be in the form of one or more tasks (transmission or house keeping) to be performed by a router. The basis for the model lies in the observation that the end-to-end delay for a circuit with no cross traffic is
.
The net effect of processing delay due to cross traffic at each of the intermediate routers may be aggregated in the form of a delay term , where is the additional time spent by intermediate server i to process cross traffic. This results in
.
The model essentially allows one to study virtual circuits where some of the router may not be work-conserving. If a server i were to engage in certain house keeping functions (or takes a coffee- break), the corresponding delaywould be equal to the coffee-break duration. The model, in fact, explains the observed behavior over the Internet, where some packets periodically experience a sudden and significant increase in the end-to-end delay, and each subsequent packet sent with a uniform inter-packet send time experiences a linearly decreasing delay. That is, if , then the end-to-end delay encountered by packet j, , is given by
.
Clearly, if is large due to a router taking a coffee-break, then the subsequent packets would also encounter large end-to-end delay, although a decreasing one.
The periodic large-magnitude delays in the above plot cannot be generated by cross traffic, exactly because of their regularity and magnitude. Enormous amounts of traffic would have to enter the router with such regularity and in such short time (almost certainly exceeding the bandwidth of the router links. Thus the only explanation for these spikes is that the server periodically goes offline for hundreds of milliseconds; we refer to this as server "coffee breaks". As a consequence, the encountered delay consists of two components:
From the frequency and magnitude of the variations, we see that there are several servers taking "coffee breaks". An interesting question is about the location of the server on break. Note that when a server takes a break the processing of packets on the forward as well as reverse path stops. Therefore when we examine the detailed characteristics of the extended delay we observe two peaks (see Figure 12 and 13). On one side the presence of two peaks confirms the coffee break theory. On the other, the distance between the two peaks gives an idea about the location of the server taking the break. The delay between the two peaks equals the roundtrip time between the router and the remote host. This remote distance shows up in a plot of the encountered delays as two peaks separated by that distance.
[Top] [Back to NetCalliper Project]