This research investigates flow control (scheduling) techniques
for multiple I/O-intensive parallel applications. Currently, many parallel
filesystems and I/O runtime libraries focus on maximizing the performance of individual
parallel applications with intense I/O requirements. Most studies have not investigated
the potential performance problems of handling tens to possibly hundreds of outstanding
I/O requests in a multi-workload environment. Some simple experiments have shown
that disk scheduling may not perform well for a large number of outstanding requests
for contiguous data on multiple disks. On single-disk I/O servers, as I/O queues
fill with large requests spanning many files, there is the potential for high seek
penalties that can lead to variability in request time and to lower I/O performance and
throughput.
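
The seek-penalty effect described above can be illustrated with a minimal sketch. It assumes a hypothetical workload in which each file occupies its own region of a single disk and requests from all files arrive interleaved. The function names, region sizes, and request counts are illustrative assumptions, not the experimental setup of this research. The sketch compares total head movement when the queue is served in arrival order against a single ascending sweep over the same requests.

```python
import random


def total_seek_distance(requests, start=0):
    """Sum of absolute head movements when serving block addresses in the given order."""
    head, total = start, 0
    for block in requests:
        total += abs(block - head)
        head = block
    return total


def make_queue(num_files=8, requests_per_file=16, file_span=100_000, seed=0):
    """Hypothetical workload: file f occupies blocks [f*file_span, (f+1)*file_span),
    and the queue interleaves one request per file in each arrival round."""
    rng = random.Random(seed)
    queue = []
    for _ in range(requests_per_file):
        for f in range(num_files):
            queue.append(f * file_span + rng.randrange(file_span))
    return queue


queue = make_queue()
fcfs_distance = total_seek_distance(queue)           # serve in arrival order
sweep_distance = total_seek_distance(sorted(queue))  # serve in one ascending sweep

print("arrival order:", fcfs_distance)
print("single sweep: ", sweep_distance)
```

Under these assumptions the arrival-order schedule repeatedly crosses all file regions, while the sorted schedule crosses the disk once. A real server policy would also have to weigh fairness and request latency, which this sketch ignores.
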
The goals of this research will be to 1) develop some simple server scheduling
policies and 2) investigate the benefits of these policies through performance evaluation
experiments. The evaluation will be done using both simulation and a "straw-man"
parallel filesystem. While the latter would allow experiments to be conducted on real
machines, the availability and accessibility of very large systems would be a major
concern, as would the limited control over variability in the system itself. At
the cost of longer turnaround times for experiments, simulation allows very large systems
to be simulated and detailed statistics to be gathered. Also, for system scaling studies,
it is impossible to change the speeds of components (e.g., disk, CPU, network) in real
systems, whereas this is very easy to do in a simulator. The experiments will be driven by I/O
traces from applications taken mostly from the scientific processing domain, with some
from non-scientific domains, including web servers and interactive applications. The
traces are gathered from applications ported and developed within the High Performance
Software Laboratory at the University of Maryland and from the Scalable I/O Project
(specifically, the Pablo Group at UIUC).
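
As a rough illustration of why scaling studies are easy in a simulator, the sketch below treats component speeds as plain configuration parameters. The `SystemConfig` fields, the serial (non-overlapping) cost model, and all numeric values are assumptions made for this example only; they do not describe the simulator used in this research.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemConfig:
    """Hypothetical component speeds for a simulated I/O server."""
    disk_mb_per_s: float
    network_mb_per_s: float
    cpu_overhead_ms: float


def request_time_ms(cfg, size_mb):
    """Estimated time for one request, assuming CPU, disk, and network
    stages run one after another with no overlap (a simplification)."""
    disk_ms = size_mb / cfg.disk_mb_per_s * 1000.0
    network_ms = size_mb / cfg.network_mb_per_s * 1000.0
    return cfg.cpu_overhead_ms + disk_ms + network_ms


# Illustrative values only.
base = SystemConfig(disk_mb_per_s=10.0, network_mb_per_s=40.0, cpu_overhead_ms=0.5)
faster_disk = replace(base, disk_mb_per_s=20.0)  # scale one component, keep the rest

for name, cfg in (("base", base), ("faster disk", faster_disk)):
    print(name, request_time_ms(cfg, size_mb=1.0))
```

Changing a single field and rerunning the same trace is the simulated counterpart of swapping hardware, which is the property the scaling studies rely on.
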
The expected benefits of this research include 1) a better understanding
of the behavior of parallel filesystems that are shared by multiple I/O-intensive
applications and 2) flow control techniques that may improve server efficiency as well as
application and system throughput.