Characterizing the Memory Behavior of Java Workloads:
A Structured View and Opportunities for Optimizations
|
Authors
|
Yefim Shuf <yshuf@cs.princeton.edu> <yefim@us.ibm.com>
Computer Science Department, Princeton University
IBM T. J. Watson Research Center, Yorktown Heights
Mauricio J. Serrano <mauricio.j.serrano@intel.com>
Intel Microprocessor Research Labs, Santa Clara
Manish Gupta <mgupta@us.ibm.com>
IBM T. J. Watson Research Center, Yorktown Heights
Jaswinder Pal Singh <jps@cs.princeton.edu>
Computer Science Department, Princeton University
|
Abstract
|
This paper studies the memory behavior of important Java workloads
used in benchmarking Java Virtual Machines (JVMs), based on
instrumentation of both application and library code in a
state-of-the-art JVM, and provides structured information about these
workloads to help guide system design. We begin by characterizing
the inherent memory behavior of the benchmarks, including the
breakdown of heap accesses among different categories and the
hotness of references to fields and methods. We then provide detailed
information about misses in the data TLB and caches, including the
distribution of misses over different kinds of accesses and over
different methods. In the process, we make discoveries about TLB
behavior and about the limitations of data prefetching schemes
discussed in the literature when they are applied to pointer-intensive
Java code. Throughout this paper, we develop a set of recommendations for
computer architects and compiler writers on how to optimize computer
systems and system software to run Java programs more
efficiently. This paper also makes the first attempt to compare the
characteristics of SPECjvm98 to those of a server-oriented benchmark,
pBOB, and explain why the current set of SPECjvm98 benchmarks may not
be adequate for a comprehensive and objective evaluation of JVMs and
just-in-time (JIT) compilers.
We discover that the fraction of accesses to array elements is
quite significant, demonstrate that the number of "hot spots" in the
benchmarks is small, and show that field reordering cannot yield
significant performance gains. We also show that even a fairly large
L2 data cache is not effective for many Java benchmarks. We observe
that instructions used to prefetch data into the L2 data cache are
often squashed because of high TLB miss rates and because the TLB does
not usually have the translation information needed to prefetch the
data into the L2 data cache. We also find that co-allocation of
frequently used method tables can reduce the number of TLB misses and
lower the cost of accessing type information block entries in virtual
method calls and runtime type checking.