3. Benchmark Engineering
3.1 Introduction to Benchmark Engineering
Creating an accurate benchmark requires considerable investment and expertise.
The usefulness of a benchmark depends largely on its composition and on the
assumptions made during the implementation process. The only truly accurate way
to measure the performance of a system is to test the software applications you
actually use on your own computer system. Benchmark engineering therefore
involves many important steps, ranging from defining a typical workload to
testing and releasing the final benchmark suite. The most important thing to be
aware of is that the same benchmark program can generate different results on
the same system under different configurations. There are four general
categories of benchmark tests:
- application-based tests, which run real applications and record
the execution time
- playback tests, which use logs of system calls made during
specific application activities and play them back in isolation
- synthetic tests, which approximate application activity in
specific subsystems.
- inspection tests, which do not attempt to mimic application
activity, but instead directly exercise specific subsystems.
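To make the synthetic category concrete, here is a minimal sketch in Python. The arithmetic kernel and iteration count are illustrative assumptions, not part of any standard suite: the test approximates floating-point activity with a fixed kernel and records the execution time, rather than running a real application.

```python
import time

def synthetic_fp_test(n=200_000):
    # Approximate floating-point activity with a fixed arithmetic
    # kernel instead of running a real application.
    start = time.perf_counter()
    acc = 0.0
    for i in range(1, n + 1):
        acc += (i * 0.5) / (i + 1.0)
    return time.perf_counter() - start

elapsed = synthetic_fp_test()
print(f"synthetic FP kernel: {elapsed:.4f} s")
```

An application-based test would instead time the real program end to end; the synthetic kernel trades that realism for isolation and repeatability.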
The next section will discuss factors that need to be considered during the
process of benchmark engineering.
3.2 Steps Involved in the Process of Benchmark Engineering
Benchmark engineering involves the typical steps of any engineering process.
The developer of a benchmark needs a clear understanding of what he or she is
trying to measure, and under what circumstances the measurements are going to
be conducted. The rules and requirements for inspecting a machine's performance
have to be defined precisely, and the factors to be considered in analyzing
results and reporting performance have to be specified. The following are the
essential steps involved in benchmark engineering:
- Establish standard criteria that the benchmark needs to test. This means
the developer needs to decide if he or she is constructing a system-level
benchmark or a component-level benchmark. Depending on the level of the
benchmark, a detailed analysis of a subsystem or a system needs to be
conducted. This includes:
- Examine the nature of the "workload" performed on the system that the
benchmark is about to test. The benchmark developer must come up with
real programs or kernels that are good representatives of day-to-day
workload on the system being tested. For instance, if the benchmark is
to evaluate the floating-point performance of a processor, including
graphics applications in the benchmark will not only fail to reflect the
actual performance capability but also defeat the purpose of benchmarking.
The following factors must be investigated thoroughly to simulate a
workload:
- types of operations, functions and procedures in the workload
- frequencies of different types of operations
- possible exceptions and their sources
- special requirements that need to be fulfilled to perform the
operations
- Implement or select the appropriate programs that represent the workload
investigated in the previous steps. The benchmark developer must balance
the mix of real programs, kernels and applications included in the
benchmark, based on the information gathered about the workload.
- Specify the types of tests that the benchmark can perform. The developer
must decide when and how the benchmark will perform:
- application-based tests
- playback tests
- synthetic tests
- inspection tests
- Perform the testing on a machine and analyze the benchmark results.
Testing needs to be done several times to uncover any mistakes that
the developer made in any of the previous steps. Benchmarking different
machines with the same system configuration will help identify any
drawbacks or weaknesses in the benchmark's performance. All findings and
results should be recorded and examined carefully.
- Fix any errors encountered during the previous step and repeat the
testing until satisfactory and reliable results are obtained.
Make sure the benchmark being implemented has all the properties of
a reliable benchmark. These properties will be discussed in the next
section.
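The workload-characterization step above, in particular matching operation frequencies, can be sketched in Python. The operation names and frequencies here are hypothetical placeholders for values a developer would gather by profiling the real workload:

```python
import random

# Hypothetical operation mix, as might be gathered by profiling the
# real workload: operation name -> observed relative frequency.
OP_MIX = {"read": 0.55, "write": 0.25, "compute": 0.15, "sync": 0.05}

def generate_workload(n_ops, seed=0):
    # Draw a synthetic operation trace whose frequencies approximate
    # the profiled mix, so the benchmark exercises the system the way
    # day-to-day use does.
    rng = random.Random(seed)
    ops = list(OP_MIX)
    weights = [OP_MIX[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n_ops)

trace = generate_workload(10_000)
for op in OP_MIX:
    print(f"{op}: {trace.count(op) / len(trace):.3f}")
```

A fixed seed keeps the generated trace reproducible, so two machines run exactly the same sequence of operations.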
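The repeated-testing step can likewise be sketched: run the workload several times and examine the run-to-run variation before trusting the numbers. The placeholder kernel and run count are illustrative assumptions:

```python
import statistics
import time

def run_once():
    # Placeholder kernel; a real benchmark would run the selected
    # representative programs here.
    start = time.perf_counter()
    sum(i * i for i in range(50_000))
    return time.perf_counter() - start

def analyze(n_runs=10):
    # Repeat the run and summarize: a high coefficient of variation
    # signals measurement noise or a mistake in an earlier step.
    times = [run_once() for _ in range(n_runs)]
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    return mean, stdev, stdev / mean

mean, stdev, cv = analyze()
print(f"mean={mean:.5f}s stdev={stdev:.5f}s cv={cv:.2%}")
```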
3.3 Properties of a Reliable Benchmark
A reliable benchmark should have the following properties to be commercially
successful:
- automated set-up of the operating environment for proper and consistent
execution of the workloads
- ability to collect and store performance data for user-specified projects
- statistical consistency for all categories of testing
- ability to represent the workload accurately
- ability to minimize the testing time and produce the results with minimum
variations across machines
- ability to set appropriate performance standards that take into account the
distinctive characteristics of platform capabilities such as multitasking,
3D rendering, etc.
- well-balanced composition and manageable size
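The first two properties, consistent set-up and stored performance data, can be sketched as a step that records the operating environment alongside each result, so runs on different machines remain comparable. The recorded fields here are an illustrative choice, not a standard schema:

```python
import json
import platform
import sys

def capture_environment():
    # Record the operating environment so every result can be tied
    # to the configuration that produced it.
    return {
        "python": sys.version.split()[0],
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }

def store_result(project, elapsed_s):
    # Bundle a measurement with its environment for later comparison.
    record = {
        "project": project,
        "elapsed_s": elapsed_s,
        "environment": capture_environment(),
    }
    return json.dumps(record)

print(store_result("fp-kernel", 0.042))
```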
3.4 Popular Reliable Benchmarks
The following are the most up-to-date benchmarks available on the market to
measure the performance of your machine. Each of them has its own unique
way of assuring accuracy and consistency. Due to time constraints, we are
unable to include a detailed analysis of their performance merits. To obtain
a complete analysis of their performance, please follow the corresponding
links: