HotSpot will issue a store-byte instruction on all platforms,
even if the inner loop is unrolled. The SPARC hardware Does
The Right Thing. Intel makes no claims about what happens
on Intel hardware, which makes sense since lots of folks make
motherboards and much of the correctness depends on the
motherboard. Intel appears to Do The Right Thing in the
hardware I have in front of me.
Forbidding the compiler from write-combining in unrolled tight
loops over short element sizes is painful for certain uses:
character conversions and the like.
I can write-combine if I'm going to slather over the whole
array anyway, right? I.e., if there's already a race
condition and two threads are busy wiping over the whole
array, it doesn't matter whose write wins.
E.g., this is ok to unroll & write-combine:
for( int i=0; i<A.length; i++ )
A[i] = translate(A[i]);
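A minimal Java sketch of what hand write-combining this loop could look like, assuming a hypothetical `translate` (ASCII lowercase to uppercase) and using a little-endian `java.nio.ByteBuffer` view as a stand-in for the compiler's 32-bit loads and stores. Every byte in each word is translated and stored by the source loop anyway, so the combined store never writes a byte the original would not.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WholeArrayCombine {
    // Hypothetical per-byte translation: ASCII lowercase -> uppercase.
    static byte translate(byte b) {
        return (b >= 'a' && b <= 'z') ? (byte) (b - ('a' - 'A')) : b;
    }

    // Source loop: one byte load and one byte store per element.
    static void translateBytes(byte[] a) {
        for (int i = 0; i < a.length; i++)
            a[i] = translate(a[i]);
    }

    // Write-combined form: one 32-bit load and one 32-bit store per four
    // elements, then a scalar tail loop for any leftover bytes.
    static void translateCombined(byte[] a) {
        ByteBuffer buf = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            int x = buf.getInt(i);             // bytes a[i..i+3], a[i] lowest
            int y = 0;
            for (int k = 0; k < 4; k++) {
                byte b = (byte) (x >>> (8 * k));
                y |= (translate(b) & 0xFF) << (8 * k);
            }
            buf.putInt(i, y);                  // every byte here is a real store
        }
        for (; i < a.length; i++)
            a[i] = translate(a[i]);
    }

    public static void main(String[] args) {
        byte[] a = "hello, write-combining!".getBytes(StandardCharsets.US_ASCII);
        byte[] b = a.clone();
        translateBytes(a);
        translateCombined(b);
        // prints: HELLO, WRITE-COMBINING! true
        System.out.println(new String(b, StandardCharsets.US_ASCII) + " " + Arrays.equals(a, b));
    }
}
```

The inner `k` loop is left rolled for brevity; a compiler would unroll it.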
Really, what I am denied is reading an old value, then
writing the same old value back (as part of a
write-combining optimization). I can't write to an element
that the program isn't already writing.
E.g., this is NOT ok to write-combine:
for( int i=0; i<A.length; i+=2 /* skip every other element! */)
A[i] = translate(A[i]);
The write-combined code looks something like this (modulo
hastily written syntax errors and an alignment pre-loop;
byte order shown is little-endian, so B0 is A[i]):
byte A[];
for( int i=0; i<A.length; i+=4 /*unrolled!*/ ) {
  int X = *(int*)&A[i]; /* bogus Java syntax for an int-load from a byte array */
  int B0 = translate( X      & 0xFF) & 0xFF; /* A[i]:   translated            */
  int B1 =           (X>> 8) & 0xFF ;        /* A[i+1]: old value, written back */
  int B2 = translate((X>>16) & 0xFF) & 0xFF; /* A[i+2]: translated            */
  int B3 =           (X>>24) & 0xFF ;        /* A[i+3]: old value, written back */
  X = (B3<<24) | (B2<<16) | (B1<<8) | B0;
  *(int*)&A[i] = X;     /* bogus Java syntax for an int-write to a byte array */
}
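A runnable Java rendering of that pseudocode, as a sketch: a little-endian `ByteBuffer` view stands in for the int-load/int-store, `translate` is a hypothetical add-one function, and the array length is assumed to be a multiple of 4 (no pre-loop or tail).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class AlternateCombine {
    // Hypothetical translation: add 1 to a byte value, wrapping at 0xFF.
    static int translate(int b) {
        return (b + 1) & 0xFF;
    }

    // Source loop: stores only to even-indexed elements.
    static void everyOther(byte[] a) {
        for (int i = 0; i < a.length; i += 2)
            a[i] = (byte) translate(a[i] & 0xFF);
    }

    // Write-combined form, assuming a.length % 4 == 0. Bytes i+1 and i+3 are
    // loaded and stored back unchanged: a store the source loop never makes.
    static void everyOtherCombined(byte[] a) {
        ByteBuffer buf = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i + 4 <= a.length; i += 4) {
            int x  = buf.getInt(i);                 // one 32-bit load
            int b0 = translate(x & 0xFF);           // A[i]   translated
            int b1 = (x >>> 8) & 0xFF;              // A[i+1] old value
            int b2 = translate((x >>> 16) & 0xFF);  // A[i+2] translated
            int b3 = (x >>> 24) & 0xFF;             // A[i+3] old value
            buf.putInt(i, (b3 << 24) | (b2 << 16) | (b1 << 8) | b0); // one 32-bit store
        }
    }

    public static void main(String[] args) {
        byte[] a = {0, 10, 20, 30, 40, 50, (byte) 0xFF, 70};
        byte[] b = a.clone();
        everyOther(a);
        everyOtherCombined(b);
        // prints: [1, 10, 21, 30, 41, 50, 0, 70] true
        System.out.println(Arrays.toString(b) + " " + Arrays.equals(a, b));
    }
}
```

Run on one thread, the two versions agree; the difference only appears when some other thread stores to A[i+1] or A[i+3] between the load and the store.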
This will show word-tearing if another thread is trying to
write to the alternate bytes: the word store puts back stale
copies of A[i+1] and A[i+3]. I can live without this
optimization, but I still want to write-combine in the first
example above.
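To make the tearing concrete, here is a deterministic sketch that replays, on a single thread, one bad interleaving of two hypothetical threads: T1 runs the write-combined every-other-element loop over bytes 0..3, and T2 does a plain byte store to a[1]. The value 7 for the "translated" a[0] is arbitrary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LostByteStore {
    // Replays the interleaving step by step; no real threads are involved.
    static byte[] simulate() {
        byte[] a = new byte[4];
        ByteBuffer buf = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);

        int x = buf.getInt(0);      // T1: 32-bit load, sees a[1] == 0
        a[1] = 42;                  // T2: byte store lands between T1's load and store
        int y = (x & ~0xFF) | 7;    // T1: "translate" a[0] to 7, keep the other bytes
        buf.putInt(0, y);           // T1: 32-bit store writes the stale a[1] back

        return a;
    }

    public static void main(String[] args) {
        byte[] a = simulate();
        // prints: a[0]=7 a[1]=0  (T2's store of 42 is lost)
        System.out.println("a[0]=" + a[0] + " a[1]=" + a[1]);
    }
}
```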
Cliff
Doug Lea wrote:
> One of the harder cases to deal with about word-tearing is when
> different threads all write into different, adjacent elements of a
> shared array. As in:
>
> class SharedArray {
>   final static int N = 100;
>   final static byte[] array = new byte[N];
>
>   public static void main(String[] args) {
>     for (int i = 0; i < N; ++i) {
>       final int index = i;
>       Thread t = new Thread() {
>         volatile int old = 0;
>         public void run() {
>           for (int k = 0; k < 10000000; ++k) {
>             int current = ++array[index];
>             if ((current & 0xFF) != ((old+1) & 0xFF)) throw new Error();
>             old = current;
>           }
>         }
>       };
>       t.start();
>     }
>   }
> }
>
>
> Can/should we say that this is guaranteed to work only if "array" is
> declared as "volatile"? The argument here is that the array itself is
> shared, so should be marked as volatile (even though none of its
> elements are shared). This is basically the same story we give for
> other uses of volatile arrays. (The underlying snag is, as usual,
> that there is no syntax to declare the elements of arrays final or
> volatile.)
>
> This might be enough of a hook so that compilers could do the right
> thing (here, maybe use 32 bits for the elements) on machines otherwise
> susceptible to word-tearing.
>
> (BTW, this code runs without error on multiway sparcs using hotspot 1.4beta3)
>
> -Doug
>
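Doug's test starts its threads but never waits for them, so a torn store shows up only as an uncaught Error in some thread. A sketch of a self-checking variant, assuming Java 8+ (lambdas): it uses smaller, arbitrarily chosen N and iteration counts so it finishes quickly, keeps `old` in a local since only the owning thread reads it, joins every thread, and counts failures instead of throwing. A count of 0 means no tearing was observed in that run, which is evidence rather than proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedArrayCheck {
    // Smaller than the quoted test (N = 100, 10,000,000 iterations).
    static final int N = 8;
    static final int ITERS = 1_000_000;
    static final byte[] array = new byte[N];

    // One thread per element; returns how many threads saw their own
    // element change behind their back.
    static int run() {
        AtomicInteger failures = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < N; ++i) {
            final int index = i;
            Thread t = new Thread(() -> {
                int old = array[index];             // only this thread writes array[index]
                for (int k = 0; k < ITERS; ++k) {
                    int current = ++array[index];
                    if ((current & 0xFF) != ((old + 1) & 0xFF)) {
                        failures.incrementAndGet(); // a neighbour's store clobbered this byte
                        return;
                    }
                    old = current;
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();                           // join makes the failure count visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("threads that saw a torn update: " + run());
    }
}
```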
-------------------------------
JavaMemoryModel mailing list - http://www.cs.umd.edu/~pugh/java/memoryModel
This archive was generated by hypermail 2b29 : Thu Oct 13 2005 - 07:00:37 EDT