> -----Original Message-----
> From: Boehm, Hans [mailto:hans_boehm@hp.com]
> Sent: Wednesday, July 18, 2001 4:46 PM
> > Ok, I classify this issue as write atomicity. In other words, when a
> > write becomes visible to a processor, does it become visible to all
> > processors at the same time? There are at least three options:
> >
> > (1) Yes, a write becomes visible to all processors at the same time.
> > (2) A write can become visible to its own processor early, but becomes
> > visible to all other processors at the same time.
> > (3) A write can become visible to some other processors early.
> >
> > I believe your question is about whether we should permit option 3.
> Are there cases in which option 2 gives you some useful properties that
> option 3 doesn't?
>
> I'm not sure that as a programmer the distinction between 2 and 3 matters.
> Does it?
It depends on what the other ordering constraints are. For example, for a
model that prohibits all reorderings except W -> R (a la TSO), here is code
that can distinguish between options 2 and 3:
Initially A = Flag1 = Flag2 = 0

Thread 1:
  Flag1 = 1
  A = 1
  register1 = A
  register2 = Flag2

Thread 2:
  Flag2 = 1
  A = 2
  register3 = A
  register4 = Flag1

Result: register1 = 1, register2 = 0, register3 = 2, register4 = 0
This result can only happen if option 2 is permitted, and is independent of
option 3.
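For concreteness, here is a minimal Java sketch of that test with ordinary
(non-volatile) fields. The class name and harness are mine, the field names
mirror the example above, and whether the cited outcome is actually
observable on a given JVM also depends on compiler reordering, so this only
illustrates the shape of the test:

  class OptionTwoLitmus {
      // Shared fields, mirroring the example above; intentionally non-volatile.
      static int A, Flag1, Flag2;
      static int register1, register2, register3, register4;

      public static void main(String[] args) throws InterruptedException {
          Thread t1 = new Thread() {
              public void run() {
                  Flag1 = 1;
                  A = 1;
                  register1 = A;      // may return this thread's own write early (option 2)
                  register2 = Flag2;
              }
          };
          Thread t2 = new Thread() {
              public void run() {
                  Flag2 = 1;
                  A = 2;
                  register3 = A;
                  register4 = Flag1;
              }
          };
          t1.start(); t2.start();
          t1.join(); t2.join();
          // The outcome in question: register1=1, register2=0, register3=2, register4=0.
          System.out.println(register1 + " " + register2 + " "
                             + register3 + " " + register4);
      }
  }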
The example you've been citing (with four processors) needs option 3, and is
independent of option 2. Another such example with only three processors is:
Initially A = B = 0

Thread 1:
  A = 1

Thread 2:
  if (A == 1)
    B = 1

Thread 3:
  if (B == 1)
    register1 = A

Result: B = 1, register1 = 0
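For reference, the four-processor example mentioned above is presumably the
usual independent-reads-of-independent-writes test; under that assumption,
it looks roughly like this:

Initially A = B = 0

Thread 1:
  A = 1

Thread 2:
  B = 1

Thread 3:
  register1 = A
  register2 = B

Thread 4:
  register3 = B
  register4 = A

Result: register1 = 1, register2 = 0, register3 = 1, register4 = 0

Here Threads 3 and 4 see the two writes in opposite orders, which requires a
write to become visible to one reader before the other (option 3); no thread
reads its own write, so option 2 plays no role.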
How useful is it to prohibit 2 or 3 for programmers?
One could argue that it is important to prohibit option 3 for code like the
above (a requirement some also call causality).
I'm not sure how important it is to explicitly prohibit option 2 for
programmers. I haven't come across an algorithm that relies on prohibiting
it, but others on this list would know better. This optimization is tricky
because whether or not its effects become visible depends a lot on other
system constraints. If it becomes an important issue, I can look up previous
work to remind myself of all the related issues. Certainly, once we settle
the rest of the semantics, I'll go back to it.
In this context, I should also clarify one thing from my previous message:
> > Option (2) is related to local data dependences and program order
> > related memory model constraints. If we go with requiring that
> > program ordered volatile write followed by volatile read should be
> > usable for ordering, then option (2) is prohibited directly.
I should have said "if we go with requiring that program ordered volatile
write followed by volatile read should be usable for ordering, then **it
becomes easier to say that option (2) should not be visible for
volatiles**." That is, we don't have to explicitly prohibit the
optimization, but because of the other parts of the spec, its effect may not
be visible anyway.
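To make the "program ordered volatile write followed by volatile read usable
for ordering" property concrete, here is a small Dekker-style sketch. The
class and field names are mine, and whether Java should require this was
exactly the question under discussion; the property would forbid the outcome
register1 == 0 and register2 == 0:

  class VolatileDekker {
      static volatile int X = 0, Y = 0;
      static int register1, register2;

      // Run by Thread 1.
      static void thread1() {
          X = 1;              // volatile write
          register1 = Y;      // volatile read; under the proposed constraint,
                              // may not be ordered before the write
      }

      // Run by Thread 2.
      static void thread2() {
          Y = 1;              // volatile write
          register2 = X;      // volatile read
      }

      // If volatile write -> volatile read is "usable for ordering" in the
      // sense discussed above, then after both threads finish at least one
      // of register1, register2 must be 1.
  }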
> > As far as real machines - The last time I did a survey of this issue
> > (with Kourosh about five years ago), the only real machine that
> > permitted option (3) was the Cray T3D.
> Did you also look at large NUMA machines like an SGI O2K? Or was this
> done too early? I suspect we don't need to worry about the T3D, but it
> would be nice to have some evidence that NUMA machines aren't likely
> to end up weakening memory models in this respect.
The SGI O2K is supposed to be sequentially consistent. We looked at the
Convex Exemplar and the Sequent NUMA-Q. We certainly didn't find evidence
then that these machines permitted (3). Unfortunately, I don't remember
whether we found definitive documentation that they prohibited (3).
> >
> > My opinion: requiring non-normal writes to be atomic (i.e., prohibit
> > options 2 and 3) should be acceptable from hardware point of view.
> It sounds like it's possible on most architectures, but would in many
> cases require some sort of a barrier after a volatile write. Presumably
> that barrier would entail some cost, since it would wait for the store
> buffer to drain, and we know the store buffer is nonempty at that
> point? If the hardware guarantees option (2), I still don't see how you
> can eliminate that barrier, though you might be able to postpone it.
> Thus "acceptable" means acceptable at substantial cost?
>
Note that for option (3), the barrier would need to go after the read that
returned the value of the non-atomic write. But most machines today wouldn't
require it (including IA-64, Alpha, Origin, TSO). For option (2), yes, we
would need a barrier after volatile writes, but we would need this anyway if
we went with program order constraints for volatile write --> volatile read.
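As a rough sketch of where those barriers would conceptually sit, using the
earlier examples but treating the fields as volatile (since the barrier
question is about implementing the stronger semantics); the bracketed
comments are my own annotation of a straightforward mapping, not a statement
about any particular implementation:

  class BarrierPlacementSketch {
      static volatile int Flag1;      // for the option (2) case
      static volatile int A, B;       // for the option (3) case
      static int register1;

      // Option (2): under a straightforward mapping, a barrier would follow
      // each volatile write, draining the store buffer so the write becomes
      // visible to all processors before later accesses.
      static void writerSide() {
          Flag1 = 1;                  // volatile store
                                      // [barrier would go here]
      }

      // Option (3): the barrier would go after a read that returns the value
      // of a possibly non-atomically-visible write, e.g. Thread 3 of the
      // three-processor example above.
      static void readerSide() {
          if (B == 1) {
              // [barrier would go here, before the read of A, so A = 1 is
              //  guaranteed visible once B = 1 has been observed]
              register1 = A;
          }
      }
  }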
I should also reword the paragraph quoted above from my previous message:
> > My opinion: requiring non-normal writes to ***appear*** atomic (i.e.,
> > ***making options 2 and 3 invisible***) should be acceptable from
> > hardware point of view.
Sarita
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel