To introduce myself first: I am Sarita Adve. I am currently an associate
professor at Rice University, but will be moving to the University of
Illinois at Urbana-Champaign this fall. My research area is computer
architecture. Most relevant to this list, I did my PhD dissertation on
memory consistency models under Mark Hill at Wisconsin ('88-'93), and
continue to be involved
in this area. I am therefore very interested in seeing the issues being
discussed on this list be resolved, and hope that more architects and
PL/compiler folks will work together on this topic.
Bill summarized some of the threads on the list to me, and asked some
specific questions. I thought I'd respond to the entire list. I've included
his message to me below (Bill, hope that's ok).
*** Which memory models allow the execution below?
The Alpha and PowerPC memory models allow it. RMO did not allow it in the
initial version of SPARC V9; I can't locate my most recent revision of V9,
but don't think this aspect has changed. TSO and PSO also don't have this
problem. IA-32 also should not have this problem, although I haven't seen a
precise spec of the IA-32 model. I have not yet looked at the IA-64 model in
detail.
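Bill's litmus test (quoted at the end of this message) can be checked by brute force. The Python sketch below is a deliberately simplified model, not a formal definition of any of the architectures named above: processor 1's writes are assumed to become globally visible one at a time in program order (the effect of its memory barrier), and each of processor 2's reads is assumed to return the value from one of those global memory states. A model that keeps the dependent read after the first read never lets the second read see an older state than the first; a model in the style of Alpha or PowerPC permits that stale read. The names `STATES` and `outcomes` are illustrative.

```python
from itertools import product

# Global memory states produced by processor 1, in order. The memory barrier
# guarantees Mem[2] := 5 becomes visible before Mem[0] := 2.
INIT = {0: 1, 1: 3, 2: 4}
STATES = [
    INIT,
    {**INIT, 2: 5},          # after Mem[2] := 5
    {**INIT, 2: 5, 0: 2},    # after memory barrier; Mem[0] := 2
]


def outcomes(dependent_reads_ordered: bool) -> set:
    """Enumerate the (R1, R2) results possible for processor 2.

    Each read observes one of the global states. If dependent reads are
    ordered, the second read may not observe a state older than the first.
    """
    results = set()
    for t1, t2 in product(range(len(STATES)), repeat=2):
        if dependent_reads_ordered and t2 < t1:
            continue
        r1 = STATES[t1][0]      # R1 := Mem[0]
        r2 = STATES[t2][r1]     # R2 := Mem[R1]
        results.add((r1, r2))
    return results


print(sorted(outcomes(True)))    # [(1, 3), (2, 5)]
print(sorted(outcomes(False)))   # [(1, 3), (2, 4), (2, 5)]
```

Only the unordered model admits (2, 4): R1 sees the new Mem[0] while R2 still sees the old Mem[2].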
*** Do current implementations allow the execution below?
To the best of my knowledge, no current implementation allows the execution.
With today's designs, a processor could produce this result only by doing
"value prediction"; i.e., processor 2 would need to predict that the read of
Mem[0] would return 2 before that read completed. Value prediction is a very hot
topic in architecture research, but to my knowledge, commercial
implementations don't exist yet.
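To make the value-prediction scenario concrete, here is one hypothetical, hand-scheduled interleaving written in Python. The function name and the fixed event order are illustrative assumptions, not a description of any real pipeline. The point is step 4: a check that only compares the predicted value with the actual value is sufficient on a uniprocessor, yet here it accepts R2 = 4, because Mem[2] changed between the early dependent read and the verification.

```python
def run_with_value_prediction(predicted_r1: int) -> tuple:
    """One hand-scheduled interleaving of Bill's example (illustrative only)."""
    memory = {0: 1, 1: 3, 2: 4}

    # 1. Processor 2 guesses R1 instead of waiting for the read of Mem[0],
    #    and issues the dependent read R2 := Mem[R1] immediately.
    r2 = memory[predicted_r1]          # reads the old value 4 if the guess is 2

    # 2. Processor 1 runs: Mem[2] := 5; memory barrier; Mem[0] := 2.
    memory[2] = 5
    memory[0] = 2

    # 3. The real read of Mem[0] completes.
    r1 = memory[0]

    # 4. Uniprocessor-style verification: keep the speculative work if the
    #    guess matched; otherwise re-execute the dependent read.
    if r1 != predicted_r1:
        r2 = memory[r1]
    return r1, r2


print(run_with_value_prediction(2))    # (2, 4): correct guess, stale R2 kept
print(run_with_value_prediction(1))    # (2, 5): wrong guess, R2 re-executed
```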
*** Will future implementations allow the execution below?
Given how aggressively processors are employing speculation today and the
amount of work going on in value prediction, it is likely that future
processors will do value prediction. I don't expect this group to be able to
influence that decision since value prediction could potentially be a
valuable uniprocessor optimization. However, see below ...
*** Can processors implement value prediction for most cases, and yet be
"safe" for the cases we care about?
The answer to this question is (a guarded) yes. Processors can do whatever
speculation and reordering they want, but it is possible to have them
recover if these optimizations mess up multiprocessor interactions. This
type of technique is used, for example, in SGI Origin and all sequentially
consistent versions of HP's multiprocessors. The technique allows these
systems to appear SC even though they aggressively reorder memory
operations.
Nevertheless, I say "a guarded yes" to the question, because this technique
relies on certain hardware features that are quite standard today (e.g.,
hardware cache coherence), but may not be standard in the future. Similarly,
I haven't yet thought much about the impact on software shared-memory
systems.
Moreover, the cost of these techniques, of course, is some compromise in
performance, and some (but not inordinate) complexity. Therefore, not all
hardware vendors subscribe to this philosophy (otherwise, all systems would
be SC). In particular, the Alpha and PowerPC models do not preclude systems
that will have value prediction without the above safeguard.
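The recovery idea can be sketched in the same style. In this hypothetical Python model (the class, method, and parameter names are illustrative, not any vendor's design), processor 2 remembers each read it performed early. When the safeguard is on, a coherence invalidation to that address squashes the early read so it is redone when it retires. Real hardware would track this inside the processor and its caches, typically at cache-line granularity; here a single word stands in for a line. Turning the safeguard off reproduces the unsafe value-prediction outcome.

```python
class SpeculativeLoadBuffer:
    """Hypothetical model: reads performed early (out of order or on a
    predicted value) are remembered until they retire. With snooping on,
    an invalidation of one of those addresses squashes the early read so
    it is re-executed at retirement."""

    def __init__(self, memory: dict, snoop: bool):
        self.memory = memory
        self.snoop = snoop
        self.pending = {}                   # address -> value read early

    def early_load(self, addr: int) -> int:
        value = self.memory[addr]
        self.pending[addr] = value
        return value

    def on_invalidate(self, addr: int) -> None:
        if self.snoop:
            self.pending.pop(addr, None)    # squash: early value may be stale

    def retire(self, addr: int) -> int:
        if addr in self.pending:
            return self.pending.pop(addr)   # early value survived: keep it
        return self.memory[addr]            # squashed or never issued: re-execute


def run(safeguard: bool) -> tuple:
    memory = {0: 1, 1: 3, 2: 4}
    p2 = SpeculativeLoadBuffer(memory, snoop=safeguard)

    def p1_write(addr: int, value: int) -> None:
        p2.on_invalidate(addr)              # coherence invalidates p2's copy
        memory[addr] = value

    predicted_r1 = 2
    p2.early_load(predicted_r1)             # R2 := Mem[2] issued early; reads 4
    p1_write(2, 5)                          # Mem[2] := 5
    p1_write(0, 2)                          # memory barrier; Mem[0] := 2
    r1 = memory[0]                          # real read of Mem[0] returns 2
    if r1 != predicted_r1:
        p2.pending.clear()                  # misprediction: discard early reads
    return r1, p2.retire(r1)


print(run(safeguard=False))   # (2, 4): value check alone keeps the stale read
print(run(safeguard=True))    # (2, 5): the invalidation forced re-execution
```

The safeguard depends on every remote write generating an invalidation that processor 2 can observe, which is why the technique leans on hardware cache coherence.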
By the way, the techniques alluded to above were the motivation for Mark
Hill's paper cited earlier on this list. For those not familiar with them,
I'll be happy to provide references (on the techniques as well as
performance studies) or elaborate further.
*** So what does this mean?
- Given that Java has enough clout, it may be possible to "force" hardware
designers to accept that they have to provide the safeguard discussed for
future machines. But I think this should be the last resort.
- Before taking the leap to relying on hardware, I would suggest spending
some energy on determining if there is a pure software solution to this
problem for exactly the cases we want this to work on. Some suggestions have
already been made. But I think the first step should be to converge on the
precise minimal definition of safety that people really want. Doug Lea has
started this with the definition of immutable objects. Can we get agreement
on this?
- I also think more thought needs to be given to whether this is a desirable
constraint at all. My personal bias is with Raymie's position: if we expect
programmers to synchronize for general writes, then why not for
initialization writes? Also, I worry that no matter how we state this
exception to the general rule of needing synchronization, there would be
other cases that are almost similar but are not included in the exception
and would encourage confusion and bugs. But as Bill said, this is a
religious debate best decided in another forum.
- Final comment on going the hardware route: I also think that requiring
ordering between data dependent reads but not requiring ordering between
control dependent reads is inelegant. Almost all processors today support
predication or conditional instructions (IA-64 does it in the most general
form). Effectively, this feature converts control dependent operations to
data dependent ones (also called if-conversion). I think it would be
inelegant for the memory model to not impose an order on the same two
instructions when the dependence is expressed as a control dependence but to
have an order if expressed as data dependence. I suppose one could get
around this asymmetry with careful choice of wording, but it might be ugly
and is something to think about.
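The asymmetry can be seen in a small Python sketch. It assumes a hypothetical model rule that orders a read after an earlier read only when the later read's address is computed from the earlier read's value, and it tracks that dependence by tagging each value with the loads it came from (`Val`, `select`, and `Trace` are illustrative names). Both functions perform the same loads on processor 2; the second has simply been if-converted, replacing the branch with a select. Under the assumed rule, only the if-converted form picks up an ordering constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Val:
    """A register value plus the set of loads it was computed from."""
    v: int
    deps: frozenset = frozenset()


def eq(x: Val, k: int) -> Val:
    return Val(int(x.v == k), x.deps)


def select(cond: Val, a: Val, b: Val) -> Val:
    # Predicated / conditional-move style select: no branch, so the result
    # carries a data dependence on the condition and both inputs.
    return Val(a.v if cond.v else b.v, cond.deps | a.deps | b.deps)


class Trace:
    """Records which load pairs a hypothetical 'order data-dependent reads
    only' rule would force into program order."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.count = 0
        self.ordered = []              # (later load, earlier load it must follow)

    def load(self, addr: Val) -> Val:
        self.count += 1
        name = f"L{self.count}"
        for earlier in sorted(addr.deps):
            self.ordered.append((name, earlier))
        return Val(self.memory[addr.v], frozenset({name}))


def thread2_branch(t: Trace) -> Val:
    r1 = t.load(Val(0))
    if r1.v == 2:                      # control dependence on r1
        return t.load(Val(2))
    return t.load(Val(1))


def thread2_if_converted(t: Trace) -> Val:
    r1 = t.load(Val(0))
    addr = select(eq(r1, 2), Val(2), Val(1))   # data dependence on r1
    return t.load(addr)


mem = {0: 2, 1: 3, 2: 5}
a, b = Trace(mem), Trace(mem)
print(thread2_branch(a).v, a.ordered)        # 5 []
print(thread2_if_converted(b).v, b.ordered)  # 5 [('L2', 'L1')]
```

Both versions read the same locations and return the same value, yet the assumed rule constrains only the second, which is the inelegance described above.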
I have another set of comments on Bill's alternative memory model in the
Java Grande paper. I'll send those in a separate message.
Sarita
-----------------------------------------------------------------------
Prof. Sarita Adve
Dept. of Electrical Engg. - MS 366 Office: 3029 Duncan Hall
Rice University E-mail: sarita@rice.edu
6100 Main Street Phone : 713-737-5686
Houston, TX 77005 Fax : 713-737-6196
WWW: http://www-ece.rice.edu/~sarita
-----------------------------------------------------------------------
> -----Original Message-----
> From: Bill Pugh [mailto:pugh@cs.umd.edu]
> Sent: Wednesday, June 30, 1999 10:07 AM
> To: Sarita Adve
> Subject: Question on memory models
>
>
> Sarita,
>
> I don't know how much you've been following the traffic on the Java
> memory model mailing list, so I'll summarize:
>
> Consider the following.
>
> Initially, Mem[0] = 1, Mem[1] = 3, Mem[2] = 4
>
> Processor/thread 1:
>
> Mem[2] := 5
> memory barrier
> Mem[0] := 2
>
> Processor/thread 2:
>
> R1 := Mem[0]
> R2 := Mem[R1]
>
> On a number of processor memory models, including the Dec Alpha,
> these actions could result in processor/thread 2 loading 4 into R2
> (Seeing the new value for Mem[0] and the old value for Mem[2]).
>
> As has been discussed on the mailing list, this makes it very hard to
> do virtual method dispatch in a safe and fast way on a multiprocessor.
>
> Any comments? What other processors can see a 4 in register 2? Am I
> correct that you cannot on Sparc RMO?
>
> Bill
>
-------------------------------
This is the JavaMemoryModel mailing list, managed by Majordomo 1.94.4.
To send a message to the list, email JavaMemoryModel@cs.umd.edu
To send a request to the list, email majordomo@cs.umd.edu and put
your request in the body of the message (use the request "help" for help).
For more information, visit http://www.cs.umd.edu/~pugh/java/memoryModel