3. Benchmark Engineering
3.1 Introduction to Benchmark Engineering
Creating an accurate benchmark requires considerable investment and expertise.
The usefulness of a benchmark depends largely on its composition and on the
assumptions made during the implementation process. The only truly accurate way
to measure the performance of a system is to test the software applications you
actually use on your own computer system. Benchmark engineering therefore
involves many important steps, ranging from defining a typical workload to
testing and releasing the final benchmark suite. The most important thing to be
aware of is that the same benchmark program can generate different results on
the same system under different configurations. There are four general
categories of benchmark tests:
- application-based tests, which run real applications and record
the execution time
- playback tests, which use logs of system calls made during
specific application activities and play them back in isolation
- synthetic tests, which approximate application activity in
specific subsystems.
- inspection tests, which do not attempt to mimic application
activity, but instead directly exercise specific subsystems.
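To make the synthetic category concrete, here is a minimal sketch in Python. The arithmetic kernel and iteration count are illustrative assumptions, not part of any standard suite: the test approximates floating-point activity with a fixed kernel and records the execution time, rather than running a real application.

```python
import time

def synthetic_fp_test(n=200_000):
    # Approximate floating-point activity with a fixed arithmetic
    # kernel instead of running a real application.
    start = time.perf_counter()
    acc = 0.0
    for i in range(1, n + 1):
        acc += (i * 0.5) / (i + 1.0)
    return time.perf_counter() - start

elapsed = synthetic_fp_test()
print(f"synthetic FP kernel: {elapsed:.4f} s")
```

An application-based test would instead time the real program end to end; the synthetic kernel trades that realism for isolation and repeatability.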
The next section will discuss factors that need to be considered during the
process of benchmark engineering.
3.2 Steps Involved in the Process of Benchmark Engineering
Benchmark engineering involves the typical steps of any engineering process.
The developer of a benchmark needs a clear understanding of what he or she is
trying to measure, and under what circumstances the measurements are going to
be conducted. The rules and requirements for inspecting a machine's performance
have to be defined precisely, and the factors to be considered in analyzing
results and reporting performance have to be specified. The following are the
essential steps involved in benchmark engineering:
- Establish standard criteria that the benchmark needs to test. This means
the developer needs to decide if he or she is constructing a system-level
benchmark or a component-level benchmark. Depending on the level of the
benchmark, a detailed analysis of a subsystem or a system needs to be
conducted. This includes:
- Examine the nature of the "workload" performed on the system that the
benchmark is about to test. The benchmark developer must come up with
real programs or kernels that are good representatives of day-to-day
workload on the system being tested. For instance, if the benchmark is
to evaluate the floating-point performance of a processor, including
graphics applications in the benchmark will not only fail to reflect the
actual performance capability but also defeat the purpose of benchmarking.
The following factors must be investigated thoroughly to simulate a
workload:
- types of operations, functions and procedures in the workload
- frequencies of different types of operations
- possible exceptions and their sources
- special requirements that need to be fulfilled to perform the
operations
- Implement or select the appropriate programs that represent the workload
investigated in the previous steps. The benchmark developer must balance
the mix of real programs, kernels and applications included in the
benchmark, based on the information gathered about the workload.
- Specify the types of tests that the benchmark can perform. The developer
must decide when and how the benchmark will perform:
- application-based tests
- playback tests
- synthetic tests
- inspection tests
- Perform the testing on a machine and analyze the benchmark results.
Testing needs to be done several times to uncover any mistakes that
the developer made in any of the previous steps. Benchmarking different
machines with the same system configuration will help identify any
drawbacks or weaknesses in the benchmark's performance. All findings and
results should be recorded and examined carefully.
- Fix any errors encountered during the previous step and repeat the
testing until satisfactory and reliable results are obtained.
Make sure the benchmark being implemented has all the properties of
a reliable benchmark. These properties will be discussed in the next
section.
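The workload-characterization step above, in particular matching operation frequencies, can be sketched in Python. The operation names and frequencies here are hypothetical placeholders for values a developer would gather by profiling the real workload:

```python
import random

# Hypothetical operation mix, as might be gathered by profiling the
# real workload: operation name -> observed relative frequency.
OP_MIX = {"read": 0.55, "write": 0.25, "compute": 0.15, "sync": 0.05}

def generate_workload(n_ops, seed=0):
    # Draw a synthetic operation trace whose frequencies approximate
    # the profiled mix, so the benchmark exercises the system the way
    # day-to-day use does.
    rng = random.Random(seed)
    ops = list(OP_MIX)
    weights = [OP_MIX[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n_ops)

trace = generate_workload(10_000)
for op in OP_MIX:
    print(f"{op}: {trace.count(op) / len(trace):.3f}")
```

A fixed seed keeps the generated trace reproducible, so two machines run exactly the same sequence of operations.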
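The repeated-testing step can likewise be sketched: run the workload several times and examine the run-to-run variation before trusting the numbers. The placeholder kernel and run count are illustrative assumptions:

```python
import statistics
import time

def run_once():
    # Placeholder kernel; a real benchmark would run the selected
    # representative programs here.
    start = time.perf_counter()
    sum(i * i for i in range(50_000))
    return time.perf_counter() - start

def analyze(n_runs=10):
    # Repeat the run and summarize: a high coefficient of variation
    # signals measurement noise or a mistake in an earlier step.
    times = [run_once() for _ in range(n_runs)]
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    return mean, stdev, stdev / mean

mean, stdev, cv = analyze()
print(f"mean={mean:.5f}s stdev={stdev:.5f}s cv={cv:.2%}")
```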
3.3 Properties of a Reliable Benchmark
A reliable benchmark should have the following properties to be commercially
successful:
- automated set-up of the operating environment for proper and consistent
execution of the workloads
- ability to collect and store performance data for user-specified projects
- statistical consistency for all categories of testing
- ability to represent the workload accurately
- ability to minimize the testing time and produce the results with minimum
variations across machines
- ability to set appropriate performance standards that take into account the
distinctive characteristics of platform capabilities such as multitasking,
3D rendering, etc.
- well-balanced composition and manageable size
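The first two properties, consistent set-up and stored performance data, can be sketched as a step that records the operating environment alongside each result, so runs on different machines remain comparable. The recorded fields here are an illustrative choice, not a standard schema:

```python
import json
import platform
import sys

def capture_environment():
    # Record the operating environment so every result can be tied
    # to the configuration that produced it.
    return {
        "python": sys.version.split()[0],
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }

def store_result(project, elapsed_s):
    # Bundle a measurement with its environment for later comparison.
    record = {
        "project": project,
        "elapsed_s": elapsed_s,
        "environment": capture_environment(),
    }
    return json.dumps(record)

print(store_result("fp-kernel", 0.042))
```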
3.4 Popular Reliable Benchmarks
The following are the most up-to-date benchmarks available on the market to
measure the performance of your machine. Each of them has its own unique
way of assuring accuracy and consistency. Due to time constraints, we are
unable to include a detailed analysis of their performance merits. To obtain
a complete analysis of their performance, please follow the corresponding
links: