Silly me. If I had first looked at how similar issues were addressed
in multiprocessor Linux kernels running on different processors, I
could have saved myself a lot of time and trouble looking things up
and figuring them out. It's comforting that I reached the same
conclusions they did though.
A project at the Linux Scalability Effort looked into "data dependent
read barriers" that address exactly the issues encountered for final
field reads etc. (i.e., whether you really need a LoadLoad barrier
separating the load of a reference from the load of a field off that
reference). See:
http://lse.sourceforge.net/locking/wmbdd.html
and more generally
http://lse.sourceforge.net/locking/
and more generally
http://lse.sourceforge.net/
They also conclude that (of all processors Linux runs on) only Alphas
need load barrier instructions for dependent reads. Current 2.5
kernels have a "read_barrier_depends" function that is a no-op on
everything except Alphas. See below. They further propose dropping
read_barrier_depends and instead, on Alphas, using a "write barrier
shootout" across processors, triggered by StoreStore barriers, that
forces read-side clients to become consistent enough to read without
barriers. This keeps reads cheap, but is surely very expensive for
writes. This suggestion seems not to have made it into Linux kernels,
but it might be worth considering in JVMs.
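To make the "no-op on everything except Alphas" idea concrete, here is
a rough sketch of how such a barrier could be defined and used for a
reference-then-field load. It is not the kernel's source: the names
sketch_read_barrier_depends, struct node and load_field are made up
for illustration, and the Alpha branch assumes a GCC-style compiler
that accepts inline asm for the "mb" instruction.

```c
/* Hypothetical sketch, not kernel code. */
#if defined(__alpha__)
/* Alpha: issue a full "mb" so the dependent load cannot see stale data. */
#define sketch_read_barrier_depends() \
    __asm__ __volatile__("mb" : : : "memory")
#else
/* Elsewhere the data dependency already orders the loads: emit nothing. */
#define sketch_read_barrier_depends() do { } while (0)
#endif

struct node { int field; };

/* Load a reference, then a field off that reference. */
int load_field(struct node *const volatile *ref)
{
    struct node *n = *ref;           /* first load: the reference */
    sketch_read_barrier_depends();   /* expands to nothing unless on Alpha */
    return n->field;                 /* second load: depends on n */
}
```

On anything but an Alpha the macro expands to an empty statement, so
the read path costs only the two loads.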
Here's a good short description of it pasted from a posting by Dipankar
Sarma <dipankar@in.ibm.com> at http://lwn.net/Articles/5159/ .
You might find the example familiar-looking :-)
Sometime ago, during a discussion on lock-free lookups, it was
agreed that an additional memory barrier interface,
read_barrier_depends() that is lighter than an rmb(),
is necessary to make sure that data-dependent reads are not
re-ordered over this barrier. For many processors, data-dependency
enforces order, so this interface is a NOP, but for those that don't
(like alpha), it can be a read_barrier().
For example, the following code would force ordering (the initial
value of "a" is zero, "b" is one, and "p" is "&a"):
        CPU 0                           CPU 1
        b = 2;
        memory_barrier();
        p = &b;                         q = p;
                                        read_barrier_depends();
                                        d = *q;
because the read of "*q" depends on the read of "p" and these
two reads should be separated by a read_barrier_depends(). However,
the following code, with the same initial values for "a" and "b":
        CPU 0                           CPU 1
        a = 2;
        memory_barrier();
        b = 3;                          y = b;
                                        read_barrier_depends();
                                        x = a;
does not enforce ordering, since there is no data dependency between
the read of "a" and the read of "b". Therefore, on some CPUs, such
as Alpha, "y" could be set to 3 and "x" to 0. read_barrier()
needs to be used here, not read_barrier_depends().
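For readers who think in C11 atomics rather than kernel macros, the two
cases quoted above can be approximated as follows. This is only a
sketch under assumptions of mine: a release store stands in for
"memory_barrier(); p = &b;", memory_order_consume stands in for
read_barrier_depends() (most compilers currently treat it as acquire),
and an acquire fence stands in for read_barrier(). All function and
variable names are invented for the illustration.

```c
#include <stdatomic.h>

/* Case 1: dependent reads. Initially a = 0, b = 1, p = &a. */
static int dep_a = 0;
static int dep_b = 1;
static _Atomic(int *) dep_p = &dep_a;

/* "CPU 0": write b, then publish its address. */
void publish_dependent(void)
{
    dep_b = 2;
    /* The release store plays the role of memory_barrier(); p = &b; */
    atomic_store_explicit(&dep_p, &dep_b, memory_order_release);
}

/* "CPU 1": load the pointer, then load through it. */
int read_dependent(void)
{
    /* consume orders only loads that depend on q, like
       read_barrier_depends(). */
    int *q = atomic_load_explicit(&dep_p, memory_order_consume);
    return *q;   /* 0 if q == &a, 2 if q == &b, never the stale 1 */
}

/* Case 2: independent reads. Initially a = 0, b = 1. */
static atomic_int ind_a = 0;
static atomic_int ind_b = 1;

struct pair { int x, y; };

/* "CPU 0": write a, barrier, write b. */
void write_independent(void)
{
    atomic_store_explicit(&ind_a, 2, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ind_b, 3, memory_order_relaxed);
}

/* "CPU 1": no dependency links the two loads, so a real read
   barrier (here an acquire fence) is needed between them. */
struct pair read_independent(void)
{
    struct pair r;
    r.y = atomic_load_explicit(&ind_b, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    r.x = atomic_load_explicit(&ind_a, memory_order_relaxed);
    return r;    /* if y == 3 then x == 2 */
}
```

In real use the "CPU 0" and "CPU 1" functions would run on different
threads; the point is just which kind of ordering each read side asks
for.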
-Doug
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel