Data Mining

Trace Distribution

Compressed, ASCII format (543KB)

Applications for Measurement and Benchmarking of I/O on Parallel Computers

This application extracts association rules from retail data, in particular buying patterns that characterize the shopping behavior of retail customers. It performs I/O using synchronous read() operations. A detailed description of this application can be found in:

Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison.
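To illustrate what "synchronous read() operations" means for the traced I/O pattern, here is a minimal sketch of a sequential scan in which every read blocks until it returns. It is not the application's actual code: the function name scan_partition and the 1 MB chunk size are assumptions chosen for illustration.

```python
import os


def scan_partition(path, chunk_size=1 << 20):
    """Scan a file front to back with blocking reads; return total bytes read.

    The chunk size is a hypothetical value, not one taken from the trace.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # os.read wraps the read() system call: the caller waits here
            # until data is returned or end-of-file is reached.
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty bytes object signals end-of-file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Because each call completes before the next is issued, a trace of such a loop shows one outstanding request per process at any time.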

Input Dataset

We used a database of 50 million transactions, with an average transaction size of 10 items and a maximal potentially frequent set size of 3. The synthetic data was generated from the following retail data model:

Fast Algorithms for Mining Association Rules in Large Databases.

The dataset for this program was 4 GB in size and was partitioned into 8 files, one per processor.

Workload

We used the "Find all rules" query, which extracts all possible association rules from the transaction database.
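As a rough illustration of what a "find all rules" query computes, the sketch below enumerates frequent itemsets by brute force and derives every rule that meets a confidence threshold. It is not the algorithm used by the traced application (see the papers cited above for that). The function names, the thresholds min_support and min_confidence, and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for all itemsets of up to max_size items
    whose support (fraction of transactions containing them) is at least
    min_support. Brute force; only suitable for tiny inputs."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= min_support:
                result[cand] = count / n
    return result


def all_rules(transactions, min_support, min_confidence, max_size=3):
    """Return (antecedent, consequent, support, confidence) tuples for
    every rule A -> B derived from a frequent itemset A | B."""
    freq = frequent_itemsets(transactions, min_support, max_size)
    rules = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so freq[a] is always present.
                confidence = support / freq[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical toy data; each transaction is a set of purchased items.
    toy = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for a, b, s, c in all_rules(toy, min_support=0.5, min_confidence=0.6):
        print(sorted(a), "->", sorted(b), f"support={s:.2f} confidence={c:.2f}")
```

The default max_size of 3 mirrors the maximal potentially frequent set size of the input dataset described above. A production miner would prune candidates level by level instead of testing every combination.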