5. Improving Benchmarks
The usefulness of a benchmark depends on several factors, and these
factors may change over time. A major factor is the benchmark's ability
to resist "cracking", that is, tuning the measured performance through
particular compilers, preprocessors, or benchmark-specific flags.
Once a benchmark has become standardized, there is
tremendous pressure to improve performance by targeted optimizations or
by aggressive interpretations of the rules for running the benchmark.
5.1 Optimizations that Affect Benchmark Performance
To illustrate why benchmarks may become obsolete, it helps to look at
why the SPEC92 benchmarks were replaced by SPEC95.
When the programs were adopted for the SPEC92 suite, several flaws were
already present, but their effect was minimal because code optimization
was difficult at that time. However, once a program became
a benchmark, compiler authors became resourceful in optimizing
specifically for it. The following is a
list of some of the optimizations that are used:
- Many compilers perform loop unrolling. They duplicate the code of the
loop body to generate larger basic blocks that can be more easily optimized
by other compilation techniques. This is a common optimization that
generally benefits programs with loops.
- Some compilers transform the conditions checked by if statements
into a logically equivalent form that can be
compiled into more efficient code.
- Some compilers combine load instructions across several iterations
of the loop. Instead of loading one 16-bit item at a time, the compiler
generates load instructions for 32-bit or 64-bit words and keeps the
extra items for use in later iterations of the loop. This is a legitimate
optimization if implemented properly. It benefits some programs more than
others, particularly programs that process data of these narrow types.
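As a rough, source-level illustration of the three transformations listed above (real compilers apply them to generated machine code, not to source text), the following Python sketch pairs each original form with a transformed one. All function names are hypothetical and chosen for this example. Python is used only for readability; these rewrites are not expected to make Python itself faster.

```python
import struct


def sum_rolled(values):
    """Baseline loop: one addition per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled(values):
    """Same sum with the loop body duplicated four times per iteration."""
    total = 0
    n = len(values)
    main = n - n % 4              # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
    for i in range(main, n):      # leftover iterations
        total += values[i]
    return total


def out_of_range_original(x, lo, hi):
    """Condition as a programmer might write it (integers assumed)."""
    return not (x >= lo and x <= hi)


def out_of_range_rewritten(x, lo, hi):
    """Logically equivalent form obtained with De Morgan's law."""
    return x < lo or x > hi


def sum_u16_narrow(buf):
    """Load one unsigned 16-bit little-endian item per iteration.

    Assumes len(buf) is even.
    """
    total = 0
    for off in range(0, len(buf), 2):
        (item,) = struct.unpack_from("<H", buf, off)
        total += item
    return total


def sum_u16_wide(buf):
    """Load one 64-bit word per iteration and split it into four items.

    Assumes len(buf) is even.
    """
    total = 0
    main = len(buf) - len(buf) % 8
    for off in range(0, main, 8):
        (word,) = struct.unpack_from("<Q", buf, off)
        for shift in (0, 16, 32, 48):
            total += (word >> shift) & 0xFFFF
    for off in range(main, len(buf), 2):   # items that do not fill a word
        (item,) = struct.unpack_from("<H", buf, off)
        total += item
    return total
```

Each pair returns the same result for the same input; only the amount of work done per loop iteration changes.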
5.2 Updating Benchmarks
Since technology is always improving, the benchmarks used to measure the
effect of changes in technology also need to be improved. Several issues
have to be addressed.
- Runtime. The length of time a benchmark runs is important
because, if the run is too short, small
fluctuations in the measurements have a significant impact on the observed
percentage improvements. SPEC therefore lengthened its benchmarks
to allow for future performance gains.
- Application size. Applications are growing in complexity and size, so
benchmarks may become less representative of what is run on current systems
if they are not adapted for larger programs. It is also important to mix
benchmarks requiring large resources with smaller programs.
- Application type. Just as applications are growing in size and
complexity, the range of application types is widening. These
other types should be considered so that the benchmarks cover the variety
and increased complexity of the workload.
- Portability. It is also important that benchmarks and tools used in the
process of benchmarking are independent of the operating system.
- Pre-emptive multitasking. Most systems can handle many tasks, such as
printing or switching between applications, at the same time by splitting
processing time between the multiple tasks.
- Increasing data sizes. There is an increase in the use of audio- and
video-intensive applications, which require more efficient I/O capabilities
for higher bandwidth.
- New system resources. There are new operating systems which make it
easier for developers to integrate newer capabilities into their
applications. These include better 3D graphics engines, telephony protocols,
and data types like audio and video which often require more computing power.
- Moving target. Over time, improvements in test performance are likely
to become specific to that test only. Updating benchmarks may
encourage general improvements and make test-specific optimizations less
effective.
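The runtime issue in the list above can be made concrete with a little arithmetic. The sketch below assumes each timing may be off by up to a fixed number of seconds (the hypothetical `jitter` parameter) and computes the range of percentage improvements that could then be observed. The function name and the sample figures are illustrative assumptions, not SPEC data.

```python
def improvement_range(old_time, new_time, jitter):
    """Return (lowest, highest) observable percentage improvement.

    Assumes each measured time may deviate from the true time by up to
    +/- jitter seconds. Improvement is (old - new) / old * 100.
    """
    # Least favorable case: old run measured short, new run measured long.
    lowest = (1 - (new_time + jitter) / (old_time - jitter)) * 100
    # Most favorable case: old run measured long, new run measured short.
    highest = (1 - (new_time - jitter) / (old_time + jitter)) * 100
    return lowest, highest


# The same true 10% improvement and the same 0.05 s jitter,
# measured on a short run and on a long run.
short_run = improvement_range(1.0, 0.9, 0.05)
long_run = improvement_range(100.0, 90.0, 0.05)
print(f"1 s run:   {short_run[0]:.2f}% to {short_run[1]:.2f}%")
print(f"100 s run: {long_run[0]:.2f}% to {long_run[1]:.2f}%")
```

With these assumed numbers, a true 10% improvement could be reported as anything from about 0% to about 19% on the one-second run, but only as about 9.9% to 10.1% on the hundred-second run.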