Parallel Text Search
Description
This application is a trivial parallelization of agrep program
from University of Arizona. agrep is capable of partial match
and approximate searches [1]. Our parallel version, called pgrep
, can operate in one of two modes. In the first mode, the list of
files to be searched is partitioned, round-robin, among the
processors. In the second mode, each input file is block-partitioned
among the processors, so that each processor searches a distinct
portion of the file. This application performs I/O using synchronous
read() operations.
Input Dataset
To drive this application, we used the "/users" document
hierarchy on the University of Maryland web server. The hierarchy
contained 475~MB in 18,655 files.
Workload
We used a complex pattern including wildcards, disjunction and
substitution: "b#dy,[a-z]#gif,l#[n-z]t#,F#"
.
In this
pattern, "," is OR-construct; "#" is wildcard; and
"[a-z]" is all the characters between "a" and "z". We allowed 3
substitutions on this pattern, i.e. any string which can be obtained
by substituting 3 characters is also accepted by the pgrep.
Traces
You can download the trace files in the following formats:
References
[1] Sun Wu and Udi Manber. agrep - a fast approximate
pattern-matching tool . In USENIX Conference Proceedings, pages
153-162, San Fransisco, CA, Winter 1992.
Last updated on Tue May 27 12:37:44 EDT 1997
by Mustafa Uysal (uysal@cs.umd.edu ).