Next: The JMM is too Up: Fixing the Java Memory Previous: Reality

Subsections

A New Proposal

In this section, I propose a new Java memory model. This model is closely coupled to the Java virtual machine. The rules for Java source programs can be derived by a simple and naive translation from Java source to Java bytecode, and then using the rules of this model. A Java thread executes read, write, lock, unlock and think actions:

A read action corresponds to a getfield, getstatic or arrayload Java bytecode instruction.
An write action corresponds to a putfield, putstatic or arraystore Java bytecode instruction.
A lock action corresponds to a monitorenter Java bytecode instruction.
A unlock action corresponds to a monitorexit Java bytecode instruction.
A think action corresponds to all other Java bytecode instructions.

A memory action is either a read or write action.

Within a thread, there is a direct dependence ordering between two actions A and BifA occurs before Bin the original program, and:

1.: there is a flow dependence from A to B (i.e., the value computed/read/written by A effects the action performed by B). Issues such as stack depth and stack manipulation instructions (e.g., swap) are ignored in determining flow dependences.
2.: A and B are lock and memory actions (either order).
3.: A is a write action and B is an unlock action (either order).
4.: A and B are both memory actions on the same variable and at least one of them is a write action.
5.: A and B are both memory action on volatile variables.

The required dependence ordering of actions is the transitive closure of direct dependence ordering. Actions within a thread can be ordered in any way that respects these orders. These constraints make it impossible to determine that a threads actions have been reordered, except through interaction with another thread or other external agent (e.g., a debugger).

Control Speculation

Note that there is no requirement that the ordering of actions respect control dependences (there is a control dependence when one instruction influences whether another instruction will be performed). The Sparc RMO memory model allows reordering of loads that doesn't respect control dependences (e.g., speculative loads), but doesn't allow speculative stores (since you can't undo them). Defining control dependence in Java is a little tricky, since many instructions (and all memory actions) can throw an exception that prevent following instructions from occurring). If we included such exceptions in computing control dependence, then we wouldn't be able to perform any reordering of writes at all.

Instead, we allow actions to be reordered as though the system had exact knowledge of the path of program execution. Loads may be done speculatively, and stores may be done in a manner that appears to be speculative. However, a store may not be performed unless it is guaranteed that the thread will execute the store (excluding situations such as a VirtualMachineError or ThreadDeath error). This is intended to allow the compiler to use any form of static or runtime analysis to predict which paths will be taken and which exceptions cannot be thrown.

Scalar Replacement

If a memory actionA and a read action Breference the same non-volatile variable andA and Bare reordered so thatB immediately follows A, then Bcan be replaced with a think action that computes the same value as was read/written byA. This rule subsumes both scalar replacement by the compiler and write buffers within a processor. For example, this rule, combined with the reordering rules above, allow for the behavior seen in Figure 7. Without this scalar replacement rule, such behavior would be illegal.

Dead Store Elimination

If two write actionsA and Breference the same non-volatile variable andA and Bare reordered so thatA immediately precedes B, then Acan be deleted.

Comparison with the Old JMM

My proposed Java memory model is neither stronger nor weaker than the existing Java memory model. My model requires that memory operations not be reordered in a way that violations the data dependences of the program, while the old model does not. However, it is hard to imagine how one could take advantage of the additional freedom offered by the old model.

On the other hand, my model does not require Coherence nor does it impose anomalous constraints such as shown in Figures 4 - 5.

Enforcing Coherence

The above proposal is designed so that in the absence of synchronization, it has no impact on compiler optimizations and can be executed on architectures such as the Sparc V9 RMO without memory barriers. However, it does not enforce Coherence, while the original JMM did. The only effect this has is on successive reads of the same variable.

The benifits of enforcing Coherence is unclear. But if is it desired, Coherence can be enforced by changing rule 4 so that there is a direct dependence even if bothA and Bare read actions.

**Figure 8:** Interference by other threads on same processor
$\begin{figure}\begin{center} \begin{tabular}{\vert l\vert l\vert\vert l\vert l\v... ...malous result: w = 2, x = 0, y = 1, z = 0} \end{tabular}\end{center}\end{figure}$

Threads, not Processors

One issue that needs to be addressed is that processor memory models are in terms of processors, while the Java memory model is in terms of threads and has no concept of processors. Consider the example in Figure 8. Threads 2 and 4 should only be able to see the writes to a[1] and a[2] through main memory. Which write happened first in main memory? If y = 0, then we must have w = 2 and the write of a[1] occurring before the write of a[2]. If z = 0, then we must have x = 2 and the write of a[2] occurring before the write of a[1]. This suggests that the result in Figure 8 can't happen.

However, unless we are careful, it can. Consider the case where, on processor 1, the write to a[1] is initiated first, followed by the instructions for thread 2. On processor 2, the write to a[2] is executed before the instructions for thread 4. All of the instructions in thread 2 and 4 finish execution before the writes from threads 1 and 3 exit the write buffers on processors 1 and 2. In this case, w and x will get their values from the write buffer, and y and z could get their values from the cache (which is coherent because the writes haven't exited the write buffer).

One way to fix this is to require a memory barrier when switching threads on a processor. On a multiprocessor that implemented the Sparc RMO memory model, you would need a Membar #Lookaside instruction as part of a context switch. The context switch is probably expensive enough that you won't notice the cost of the Membar instruction.

On an architecture such as the Tera, which has very fast context-switching (between instructions), this could prove to be more of a problem. It might be possible to weaken the memory model to allow for the execution shown in Figure 8, but I'll leave that for another time.

**Figure 9:** Reordering of field initialization and ref update
$\begin{figure}\begin{center} \begin{tabular}{\vert c\vert c\vert} \multicolumn{... ...ine \multicolumn{2}{c}{a = 0 (!?), 1 or 3} \end{tabular}\end{center}\end{figure}$

Next: The JMM is too Up: Fixing the Java Memory Previous: Reality

William Pugh