Applications for Measurement and Benchmarking of I/O on Parallel Computers

Principal Investigators: Mustafa Uysal, Anurag Acharya, Joel Saltz

Until recently, most applications developed for parallel machines avoided I/O as much as possible (distributed databases have been a notable exception). Typical parallel applications (usually scientific programs) would perform I/O only at the beginning and the end of execution, with the possible exception of infrequent checkpoints. This has been changing: I/O-intensive parallel programs have emerged as one of the leading consumers of cycles on parallel machines.

This change has been driven by two trends. First, parallel scientific applications are being used to process larger datasets that do not fit in memory. Second, a large number of parallel machines are being used for non-scientific applications such as databases, data mining, and web servers for busy web sites (e.g. Altavista and NCSA). Characterizing these I/O-intensive applications is an important problem, because the results bear directly on the design of I/O subsystems, operating systems and filesystems.

To this end, we have traced seven parallel I/O-intensive applications. These applications were run on eight nodes of an IBM SP-2. We used the AIX trace utility to trace I/O-related system calls (open, close, read, write and seek). We also captured all message-passing activity and context switches. This allowed us to accurately compute the inter-arrival times for I/O requests and to better understand the application behavior. Some characteristics of these traces have been described in a University of Maryland Technical Report.

We are making these traces available for the use of other researchers. The traces are in ASCII. We provide a description of the trace format; utility programs to convert to and from a binary format; and library routines to access the trace records in binary format. For each of the applications, we provide a brief description of the application itself, the input dataset and the workload.
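As a rough illustration of the inter-arrival computation mentioned above, the following Python sketch groups I/O requests by node and takes the difference between consecutive timestamps. The record layout used here (integer microsecond timestamp, node id, call name) and the names `SAMPLE_RECORDS`, `IO_CALLS` and `inter_arrival_times` are assumptions made for this example; they are not the published trace format or the distributed library routines, so consult the trace format description before adapting it.

```python
from collections import defaultdict

# Hypothetical records: (timestamp_in_microseconds, node_id, call_name).
# This layout is an assumption for illustration, not the actual trace format.
SAMPLE_RECORDS = [
    (0, 0, "open"),
    (10000, 0, "read"),
    (12000, 1, "open"),
    (20000, 0, "send"),   # message-passing event, ignored below
    (25000, 0, "read"),
    (30000, 1, "write"),
    (50000, 0, "close"),
]

# The I/O-related system calls named in the text.
IO_CALLS = {"open", "close", "read", "write", "seek"}


def inter_arrival_times(records):
    """Return {node_id: [gap, ...]}: gaps between consecutive I/O requests per node."""
    stamps_by_node = defaultdict(list)
    for timestamp, node, call in records:
        if call in IO_CALLS:
            stamps_by_node[node].append(timestamp)

    gaps = {}
    for node, stamps in stamps_by_node.items():
        stamps.sort()  # do not rely on the input being time-ordered
        gaps[node] = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return gaps


if __name__ == "__main__":
    for node, node_gaps in sorted(inter_arrival_times(SAMPLE_RECORDS).items()):
        print(node, node_gaps)
```

Integer timestamps keep the differences exact. Grouping by node is one possible choice; a study of aggregate load on a shared I/O subsystem might instead merge all nodes into a single stream before taking differences.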