Parallel Text Search

Trace Distribution

Compressed, ASCII format (66.2MB)

Related Information

Sun Wu and Udi Manber. agrep - a fast approximate pattern-matching tool. In USENIX Conference Proceedings, pages 153-162, San Fransisco, CA, Winter 1992.

Applications for Measurement and Benchmarking of I/O on Parallel Computers

This application is a trivial parallelization of agrep program from University of Arizona. Agrep is capable of partial match and approximate searches. Our parallel version, called pgrep, can operate in one of two modes. In the first mode, the list of files to be searched is partitioned, round-robin, among the processors. In the second mode, each input file is block-partitioned among the processors, so that each processor searches a distinct portion of the file. This application performs I/O using synchronous read operations.

Input Dataset

To drive this application, we used the /users document hierarchy on the University of Maryland web server. The hierarchy contained ~475MB in 18,655 files.

Workload

We used a complex pattern including wildcards, disjunction and substitution: b#dy,[a-z]#gif,l#[n-z]t#,F#. In this pattern, "," is OR-construct; "#" is wildcard; and "[a-z]" is all the characters between "a" and "z". We allowed 3 substitutions on this pattern, i.e. any string which can be obtained by substituting 3 characters is also accepted by pgrep.