Thanks. That helps to clarify things, though I still have a few more
questions/issues:
> From: Sarita Adve [mailto:sadve@cs.uiuc.edu]
>
> Ok, I classify this issue as write atomicity. In other words, when a write
> becomes visible to a processor, does it become visible to all processors at
> the same time? There are at least three options:
>
> (1) Yes, a write becomes visible to all processors at the same time.
> (2) A write can become visible to its own processor early, but becomes
> visible to all other processors at the same time.
> (3) A write can become visible to some other processors early.
>
> I believe your question is about whether we should permit option 3.
Are there cases in which option 2 gives you some useful properties that
option 3 doesn't?
I'm not sure that as a programmer the distinction between 2 and 3 matters.
Does it?
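To make my own question concrete, here is a rough sketch of the usual
four-thread test (two independent writers, two readers) as I understand it.
If each reader's two loads stay in program order, the outcome r1=1, r2=0,
r3=1, r4=0 needs the readers to disagree about the order of the two writes,
which I believe only option 3 allows. The class and method names are
hypothetical, and x and y are declared volatile on the assumption that
volatile writes end up being required to be atomic, so the count should
stay at zero.

```java
// Hypothetical litmus harness; names are illustrative, not from any spec.
class Iriw {
    static volatile int x, y;      // the two independently written locations
    static int r1, r2, r3, r4;     // reader results, examined only after join()

    // Runs the test `runs` times and counts the outcome in which the two
    // readers see the writes to x and y in opposite orders.
    static int countForbidden(int runs) {
        int forbidden = 0;
        for (int i = 0; i < runs; i++) {
            x = 0;
            y = 0;
            Thread[] ts = {
                new Thread(() -> x = 1),                 // writer 1
                new Thread(() -> y = 1),                 // writer 2
                new Thread(() -> { r1 = x; r2 = y; }),   // reader A: x then y
                new Thread(() -> { r3 = y; r4 = x; })    // reader B: y then x
            };
            for (Thread t : ts) t.start();
            for (Thread t : ts) join(t);
            // Reader A saw x before y; reader B saw y before x.
            if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) forbidden++;
        }
        return forbidden;
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A test like this can only ever show that the outcome occurs; a zero count on
one machine says nothing about what the model permits.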
>
> The IA-64 model does not seem to permit option 3 for release and semaphore
> writes (I am tentative because their wording isn't formal, but all their
> examples show that it does not permit it). IA-64 does allow option (3) for
> other (non-release, non-semaphore) writes. For these "normal" writes, in my
> understanding, there is no general way to use a fence to ensure that the
> writes appear atomic.
That's also my understanding of the spec.
>
> The PowerPC model permits option 3 for all writes. The SYNC instruction
> could be used to get atomicity, but the original specification had some
> ambiguities about this and I am not sure if a correction was published.
>
> As far as real machines - The last time I did a survey of this issue (with
> Kourosh about five years ago), the only real machine that permitted option
> (3) was the Cray T3D.
Did you also look at large NUMA machines like an SGI O2K? Or was this done
too early? I suspect we don't need to worry about the T3D, but it would be
nice to have some evidence that NUMA machines aren't likely to end up
weakening memory models in this respect.
>
> As far as option (2), it enables reading early from a processor's own write
> buffer. Many systems allow this, including Alpha, SPARC, and Intel. However,
> it turns out that this is not detectable with Alpha and PowerPC models (and
> for that matter with SC).
>
> Option (2) is related to local data dependences and program order related
> memory model constraints. If we go with requiring that program ordered
> volatile write followed by volatile read should be usable for ordering, then
> option (2) is prohibited directly. If we want to go with more of TSO like
> semantics for program ordering of volatile write --> volatile read, then we
> would need to debate this issue.
>
> My opinion: requiring non-normal writes to be atomic (i.e., prohibit options
> 2 and 3) should be acceptable from hardware point of view.
It sounds like it's possible on most architectures, but would in many cases
require some sort of a barrier after a volatile write. Presumably that
barrier would entail some cost, since it would wait for the store buffer to
drain, and we know the store buffer is nonempty at that point? If the
hardware guarantees option (2), I still don't see how you can eliminate that
barrier, though you might be able to postpone it. Thus "acceptable" means
acceptable at substantial cost?
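The pattern I have in mind is the usual two-thread write-then-read test,
sketched below with hypothetical names. If a volatile write followed in
program order by a volatile read must be usable for ordering, then r1=0 and
r2=0 together must never be seen. On hardware that lets a read pass an
earlier buffered write, I would expect the implementation to need a barrier
somewhere between the write and the read to rule that out, which is where the
cost comes from.

```java
// Hypothetical litmus harness; names are illustrative, not from any spec.
class StoreBuffer {
    static volatile int x, y;   // each thread writes one and then reads the other
    static int r1, r2;          // results, examined only after join()

    // Runs the test `runs` times and counts the outcome in which both
    // threads read 0, i.e. both reads passed the other thread's write.
    static int countBothZero(int runs) {
        int bothZero = 0;
        for (int i = 0; i < runs; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });  // write x, read y
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });  // write y, read x
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (r1 == 0 && r2 == 0) bothZero++;
        }
        return bothZero;
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the volatile qualifiers removed, the same test would be allowed to report
a nonzero count, since nothing then orders the write before the read.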
Hans
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel