Intentionally writing code with data races is best reserved for low-level native implementations of synchronization primitives. Most programmers should simply not rely on any specific behavior in code containing data races. However, ensuring that every object is seen as properly initialized (assuming its constructor is written properly), even in code containing data races, seems a worthwhile property to guarantee.
The existing Java memory model affects both compiler optimizations and the insertion of memory barriers. Unfortunately, I have no empirical data on the performance impact of these issues. Part of the problem is that the impact may be minimal now but could grow as compilers and processor architectures become more aggressive.
More debate is needed on the Java memory model, and I have no illusions that this paper will settle the issue. But I hope it will be an important step in discussions leading to a solution.