For those of us who are still grappling with exactly what a memory
consistency model is and what kinds of consistency models current machines
provide, I have found the following to be very helpful:
@Article{Adve:1996:SMC,
author = "Sarita V. Adve and Kourosh Gharachorloo",
title = "Shared memory consistency models: {A} tutorial",
journal = "Computer",
volume = "29",
number = "12",
pages = "66--76",
month = dec,
year = "1996",
coden = "CPTRB4",
ISSN = "0018-9162",
}
@TechReport{STAN//CSL-TR-95-685,
author = "Kourosh Gharachorloo",
title = "Memory Consistency Models for Shared-Memory
Multiprocessors",
type = "Ph.D. Thesis",
number = "CSL-TR-95-685",
month = dec,
year = "1995",
pages = "392",
url = "ftp://elib.stanford.edu/pub/reports/csl/tr/95/685/CSL-TR-95-685.pdf",
abstract = "The memory consistency model for a shared-memory
multiprocessor specifies the behavior of memory with
respect to read and write operations from multiple
processors. As such, the memory model influences many
aspects of system design, including the design of
programming languages, compilers, and the underlying
hardware. Relaxed models that impose fewer memory
ordering constraints offer the potential for higher
performance by allowing hardware and software to
overlap and reorder memory operations. However, fewer
ordering guarantees can compromise programmability and
portability. Many of the previously proposed models
either fail to provide reasonable programming semantics
or are biased toward programming ease at the cost of
sacrificing performance. Furthermore, the lack of
consensus on an acceptable model hinders software
portability across different systems. This dissertation
focuses on providing a balanced solution that directly
addresses the trade-off between programming ease and
performance. To address programmability, we propose an
alternative method for specifying memory behavior that
presents a higher level abstraction to the programmer.
We show that with only a few types of information
supplied by the programmer, an implementation can
exploit the full range of optimizations enabled by
previous models. Furthermore, the same information
enables automatic and efficient portability across a
wide range of implementations. To expose the
optimizations enabled by a model, we have developed a
formal framework for specifying the low-level ordering
constraints that must be enforced by an implementation.
Based on these specifications, we present a wide range
of architecture and compiler implementation techniques
for efficiently supporting a given model. Finally, we
evaluate the performance benefits of exploiting relaxed
models based on detailed simulations of realistic
parallel applications. Our results show that the
optimizations enabled by relaxed models are extremely
effective in hiding virtually the full latency of
writes in architectures with blocking reads (i.e.,
processor stalls on reads), with gains as high as
80\%. Architectures with non-blocking reads can
further exploit relaxed models to hide a substantial
fraction of the read latency as well, leading to a
larger overall performance benefit. Furthermore, these
optimizations complement gains from other latency
hiding techniques such as prefetching and multiple
contexts. We believe that the combined benefits in
hardware and software will make relaxed models
universal in future multiprocessors, as is already
evidenced by their adoption in several commercial
systems.",
institution = "Stanford University, Computer Systems Laboratory",
}
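
To make the abstract's point about reordering concrete, here is my own
minimal sketch (not taken from either paper; the class and field names
are mine) of the classic store-buffering litmus test in Java:

public class StoreBuffering {
    static int x = 0, y = 0;   // plain (non-volatile) shared fields
    static int r1, r2;         // the value each thread observes

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Under sequential consistency, r1 == 0 && r2 == 0 is impossible:
        // one of the writes must happen first. Relaxed models (and the
        // JMM for non-volatile fields) permit exactly that outcome,
        // because each processor may let its read bypass its buffered write.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}

A single run is unlikely to show the surprising outcome, but repeated
runs on hardware with store buffers (or with an optimizing JIT) can;
declaring x and y volatile rules it out under the revised JMM.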
Joel Jones
jjones@uiuc.edu