In recent years, I/O-intensive parallel programs have emerged as one of the
leading consumers of cycles on parallel machines. Examples of I/O-intensive applications
include satellite data processing, medical image databases, high-performance relational
databases, data mining, and detailed scientific modeling of complex phenomena. It is
critical that future parallel machines be designed to accommodate the characteristics of
I/O-intensive applications. To be able to do this, hardware designers need tools to
accurately predict and analyze the performance of alternative designs for these
applications. Likewise, application developers need tools to predict the performance of
their applications on existing and future parallel machines in a straightforward way, so
that they can be assured of good performance not just on existing machines, but also for
the foreseeable future. Performance prediction of applications on parallel machines is a
widely studied area. Previous work in this area has mainly focused on performance
prediction of compute-intensive scientific applications. Performance prediction for
data-intensive (I/O-intensive) applications on existing and future parallel machines poses
several challenges. The vast amount of data processed by these applications requires
expensive hardware configurations and makes direct experimentation on the target machine
virtually impossible. It also rules out the use of detailed simulation techniques,
because of long running times for simulations of large parallel configurations and large
datasets. The complexity of these applications hinders the application of analytical
methods.
In this work, we are developing a simulation-based framework to predict the
performance of data-intensive applications on existing and future parallel machines. Our
framework consists of two components: application emulators and a suite of simulators.
Application emulators accurately capture the behavior of data-intensive applications and
enable easy and flexible experimentation with critical application components (e.g., input
data partitioning, data declustering, and processing structure). Our
suite of simulators models the I/O and communication subsystems of the parallel machine at
a sufficiently detailed level for accuracy in predicting application performance, while
providing relatively coarse-grained models of the execution of instructions within each
processor. We have developed application emulators for three I/O-intensive applications:
two satellite data processing applications and a medical image database system for
large-scale parallel machines. We have also developed a suite of simulation models that are
sufficiently accurate and fast enough to simulate parallel machine
configurations of up to thousands of processors on a high-performance workstation.
We introduce a new technique, loosely coupled simulation, that embeds the
processing structure of the application in the form of a simple dependency graph into the
simulator while preserving the application workload. This technique allows accurate yet
relatively inexpensive performance prediction for very large-scale parallel machines.
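
As a rough illustration of the dependency-graph idea behind loosely coupled simulation, the following minimal Python sketch replays a small graph of operations in which each operation starts only after all of its predecessors have finished. It is not the framework's implementation: the function name `simulate`, the operation names, and the cost values are all hypothetical, and the sketch ignores contention for disks, network links, and processors, which a real simulator would have to model.

```python
from collections import deque


def simulate(costs, deps):
    """Return the finish time of every operation in a dependency graph.

    costs: dict mapping operation name -> duration (hypothetical time units)
    deps:  dict mapping operation name -> list of operations it must wait for
    Assumption: resources are unlimited, so an operation starts as soon as
    all of its predecessors have finished.
    """
    indegree = {op: len(deps.get(op, [])) for op in costs}
    children = {op: [] for op in costs}
    for op, parents in deps.items():
        for parent in parents:
            children[parent].append(op)

    # Process operations in topological order (Kahn's algorithm).
    ready = deque(op for op, degree in indegree.items() if degree == 0)
    finish = {}
    while ready:
        op = ready.popleft()
        start = max((finish[p] for p in deps.get(op, [])), default=0.0)
        finish[op] = start + costs[op]
        for child in children[op]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(finish) != len(costs):
        raise ValueError("dependency graph contains a cycle")
    return finish


# Hypothetical workload: two read/compute chains feeding one merge step.
costs = {"read0": 4, "read1": 5, "compute0": 2, "compute1": 2, "merge": 1}
deps = {
    "compute0": ["read0"],
    "compute1": ["read1"],
    "merge": ["compute0", "compute1"],
}
finish = simulate(costs, deps)
print(finish["merge"])  # 8.0
```

In this toy graph the two read/compute chains proceed independently and the merge step waits for both, so the predicted completion time is set by the longer chain plus the merge cost.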