Maryland Applications for Measurement and Benchmarking of I/O on Parallel Computers
Motivation
Until recently, most applications developed for parallel machines
avoided I/O as much as possible (distributed databases have been a
notable exception). Typical parallel applications (usually scientific
programs) would perform I/O only at the beginning and the end of
execution, with the possible exception of infrequent checkpoints. This
has been changing: I/O-intensive parallel programs have emerged as one
of the leading consumers of cycles on parallel machines. This change
has been driven by two trends. First, parallel scientific applications
are being used to process larger datasets that do not fit in
memory. Second, a large number of parallel machines are being used for
non-scientific applications such as databases, data mining, and web
servers for busy web sites (e.g. AltaVista and NCSA). Characterizing
these I/O-intensive applications is an important problem that has a
tremendous effect on the design of I/O subsystems, operating systems,
and filesystems.
To this end, we have traced seven parallel I/O-intensive applications.
These applications were run on eight nodes of an IBM SP-2. We used the
AIX trace utility to trace I/O-related system calls (open,
close, read, write, and seek). We also captured all message-passing
activity and context switches. This allowed us to compute accurate
inter-arrival times for I/O requests and to better understand
application behavior. Some characteristics of these traces are
described in the following University of Maryland technical report:
Mustafa Uysal, Anurag Acharya, Joel Saltz.
Requirements of I/O Systems for Parallel Machines: An Application-driven Study.
Technical Report, CS-TR-3802, University of Maryland, College Park,
May 1997.
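The inter-arrival computation mentioned above can be illustrated with a
minimal sketch. It assumes a hypothetical record layout of
"timestamp in microseconds, node id, system call", one record per line;
this is not the published trace format, and the names SAMPLE_TRACE,
parse_records and inter_arrival_times are invented for this example.

```python
from collections import defaultdict

# Hypothetical record layout (NOT the published trace format):
#   <timestamp_microseconds> <node_id> <syscall>
SAMPLE_TRACE = """\
0 0 open
10000 0 read
15000 1 read
30000 0 write
45000 1 seek
50000 0 close
"""

# The I/O-related system calls that the traces record.
IO_CALLS = {"open", "close", "read", "write", "seek"}


def parse_records(text):
    """Yield (timestamp, node, syscall) tuples for I/O calls only."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        timestamp, node, syscall = int(fields[0]), int(fields[1]), fields[2]
        if syscall in IO_CALLS:
            yield timestamp, node, syscall


def inter_arrival_times(records):
    """Return {node: [gaps]} between consecutive I/O requests on each node."""
    last_seen = {}
    gaps = defaultdict(list)
    # Sort by timestamp so that each gap is measured between neighbors in time.
    for timestamp, node, _ in sorted(records):
        if node in last_seen:
            gaps[node].append(timestamp - last_seen[node])
        last_seen[node] = timestamp
    return dict(gaps)


gaps = inter_arrival_times(parse_records(SAMPLE_TRACE))
for node, values in sorted(gaps.items()):
    mean = sum(values) / len(values)
    print(f"node {node}: {len(values)} gaps, mean {mean:.0f} us")
```

Gaps are kept per node here because each node issues its own request
stream; merging all nodes into one stream would instead measure the
arrival pattern seen by a shared I/O subsystem.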
We are making these traces available for the use of other researchers.
The traces are in ASCII. We provide a description of the trace format,
utility programs to convert to and from a binary format, and library
routines to access the trace records in binary format. For each of the
applications, we provide a brief description of the application
itself, the input dataset and the workload.
Non-scientific applications
Scientific applications
Utilities
These files describe the trace formats and provide small utilities for
working with the trace files, such as converters to and from the binary
format and a library of routines to manipulate trace records.
People
Last updated on Tue May 27 12:37:44 EDT 1997
by Mustafa Uysal (uysal@cs.umd.edu).