Motivation
Until
recently, most applications developed for parallel machines avoided I/O as much as
possible (distributed databases have been a notable exception). Typical parallel
applications (usually scientific programs) would perform I/O only at the beginning and
end of execution, with the possible exception of infrequent checkpoints. This has been
changing: I/O-intensive parallel programs have emerged as one of the leading consumers of
cycles on parallel machines. This change has been driven by two trends. First, parallel
scientific applications are being used to process larger datasets that do not fit in
memory. Second, a large number of parallel machines are being used for non-scientific
applications, such as databases, data mining, and web servers for busy web sites (e.g.,
Altavista and NCSA). Characterizing these I/O-intensive applications is an important
problem that has a tremendous effect on the design of I/O subsystems, operating systems, and
filesystems.
To this end, we have traced seven parallel I/O-intensive applications. These
applications were run on eight nodes of an IBM SP-2. We used the AIX trace utility
to trace I/O-related system calls (open, close, read, write and seek). We also captured
all message-passing activity and context-switches. This allowed us to accurately compute
the inter-arrival times for I/O requests and to better understand the application
behavior. Some characteristics of these traces are described in a University of
Maryland technical report:
Mustafa Uysal, Anurag Acharya, and Joel Saltz. Requirements of
I/O Systems for Parallel Machines: An Application-driven Study. Technical
Report, CS-TR-3802, University of Maryland, College Park, May 1997.
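As a rough illustration of the inter-arrival computation mentioned above, the sketch below takes a list of request timestamps and returns the gaps between consecutive requests. It does not read the actual trace format, which is described separately; the function name `inter_arrival_times` and the sample timestamps are hypothetical, and the timestamps are assumed to have already been extracted from the trace records for a single process.

```python
from typing import List


def inter_arrival_times(timestamps: List[float]) -> List[float]:
    """Return the gaps between consecutive request timestamps.

    Assumption: timestamps are in a single unit (e.g. seconds) and have
    already been extracted from the trace records of one process.
    """
    ordered = sorted(timestamps)  # tolerate records that arrive out of order
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical timestamps (seconds) of four I/O requests; not real trace data.
example = [0.0, 0.5, 0.75, 2.0]
gaps = inter_arrival_times(example)
mean_gap = sum(gaps) / len(gaps)
```

Sorting first is a defensive choice for merged per-node traces; if the records are already ordered it has no effect on the result.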
We are making these traces available for the use of other researchers. The traces
are in ASCII. We provide a description of the trace format; utility programs to convert
to/from a binary format; and library routines to access the trace records in binary
format. For each of the applications, we provide a brief description of the application
itself, the input dataset and the workload.
Non-scientific applications
Scientific applications
Utilities
These files describe the trace formats and provide small utilities for working with
the trace files, such as programs to convert to/from binary and a library of routines to
manipulate trace records.
People
Last updated on Tue May 27 12:37:44 EDT 1997 by Mustafa Uysal (uysal@cs.umd.edu).