I/O-intensive applications, such as satellite
data processing, data warehousing, and data mining, are increasingly relying on scalable
computing architectures for their I/O and computational requirements. It is critical that
a wide range of scalable computer architectures, from a cluster of workstations to an SMP,
provide sufficient support for these I/O-intensive applications. While recent studies show
that almost all of these architectures are unbalanced and insufficient for I/O-intensive
workloads, recent research trends such as network-attached storage, active (or
intelligent) disks, and serial I/O interconnects are promising avenues to remedy this
problem. With so many architectural options becoming possible, our research attempts to
evaluate these I/O architectures in an application-driven framework.
The main goal of our research is to improve the end-to-end
application performance. Our research effort consists of two parts. First, we analyze
the behavior of a suite of scientific and non-scientific I/O-intensive parallel
applications. Second, we use these applications to drive the evaluation of storage
architectures for multicomputers, with the aim of improving end-to-end application
performance.
Our application characterization effort concentrates on a suite of
(currently) thirteen I/O-intensive parallel applications from various domains, such as
databases, satellite data processing, and out-of-core scientific applications. Our main
objective for studying these applications is to determine what characteristics of I/O
architectures are desirable for future parallel machines. In addition to determining the usual
application characteristics, such as steady-state and peak I/O rates, spatial and temporal
access patterns, and I/O locality, we are investigating whether the application structure
allows it to disclose I/O request patterns to the file system and whether the file system
can predict the access patterns and the request inter-arrival times. Furthermore, we are
attempting to find out how to tune these applications for the underlying architecture to
achieve good I/O performance.
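As an illustration of the kind of characterization described above, the following is a minimal sketch of how such metrics might be computed from a request trace. It is not the tooling used in this project. The `IORequest` record, its fields, the `characterize` function, and the one-second window are all hypothetical, and the mean rate over the whole trace is used only as a rough stand-in for the steady-state rate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IORequest:
    """One traced request (hypothetical record layout, assumed for this sketch)."""
    timestamp: float  # seconds since the start of the trace
    offset: int       # starting byte offset within the file
    size: int         # number of bytes transferred


def characterize(trace, window=1.0):
    """Summarize a non-empty, time-ordered list of IORequest records."""
    if not trace:
        raise ValueError("trace is empty")

    start = trace[0].timestamp
    duration = trace[-1].timestamp - start
    total_bytes = sum(r.size for r in trace)

    # Mean rate over the whole trace, a rough proxy for the steady-state rate.
    mean_rate = total_bytes / duration if duration > 0 else None

    # Peak rate: the largest number of bytes seen in any fixed-size window.
    buckets = {}
    for r in trace:
        index = int((r.timestamp - start) // window)
        buckets[index] = buckets.get(index, 0) + r.size
    peak_rate = max(buckets.values()) / window

    # Inter-arrival times between consecutive requests.
    pairs = list(zip(trace, trace[1:]))
    gaps = [b.timestamp - a.timestamp for a, b in pairs]

    # Spatial pattern: fraction of requests that start exactly where the
    # previous request ended, i.e. strictly sequential accesses.
    sequential = sum(1 for a, b in pairs if b.offset == a.offset + a.size)

    return {
        "mean_rate": mean_rate,
        "peak_rate": peak_rate,
        "mean_interarrival": mean(gaps) if gaps else None,
        "sequential_fraction": sequential / len(pairs) if pairs else None,
    }


example_trace = [
    IORequest(timestamp=0.0, offset=0, size=100),
    IORequest(timestamp=0.5, offset=100, size=100),
    IORequest(timestamp=2.0, offset=500, size=300),
]
summary = characterize(example_trace)
```

A fixed window keeps the peak-rate estimate simple. A sliding window, or a per-process breakdown for parallel applications, would be natural refinements under the same assumptions.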
This research investigates a variety of architectural alternatives
for the I/O subsystem of parallel computers, ranging from network-attached disks to
active disks, i.e., disks with a sufficiently powerful processor and sufficiently large
on-board memory. We are focusing on the impact of these emerging I/O architectures on the
end-to-end performance of I/O-intensive parallel applications. We are investigating
not only how a particular application performs on a given architecture but also what is
required to program that architecture efficiently and what changes are required
in the application.