Next: The JMM is too
Up: Fixing the Java Memory
Previous: Reality
Subsections
A New Proposal
In this section, I propose a new Java memory model. This model is closely coupled
to the Java virtual machine. The rules for Java source programs
can be derived by a simple and naive translation from Java source to
Java bytecode, and then using the rules of this model.
A Java thread executes read, write, lock, unlock and think actions:
-
A read action corresponds to a getfield, getstatic or arrayload Java bytecode instruction.
-
An write action corresponds to a putfield, putstatic or arraystore Java bytecode instruction.
-
A lock action corresponds to a monitorenter Java bytecode instruction.
-
A unlock action corresponds to a monitorexit Java bytecode instruction.
-
A think action corresponds to all other Java bytecode instructions.
A memory action is either a read or write action.
Within a thread, there is a direct dependence ordering between two actions A and B if
A occurs before B in the original program,
and:
- 1.
- there is a flow dependence from A to B (i.e., the value
computed/read/written by A effects the action
performed by B). Issues such as stack depth and
stack manipulation instructions (e.g., swap)
are ignored in determining flow dependences.
- 2.
- A and B are lock and memory actions (either order).
- 3.
- A is a write action and B is an unlock action (either order).
- 4.
- A and B are both memory actions
on the same variable and at least
one of them is a write action.
- 5.
- A and B are both memory action on volatile variables.
The required dependence ordering of actions is the transitive closure of
direct dependence ordering. Actions within a thread can be ordered
in any way that respects these orders. These constraints
make it impossible to determine that a threads actions have been
reordered, except through interaction with another thread or other
external agent (e.g., a debugger).
Note that there is no requirement that the ordering of actions
respect control dependences (there is a control dependence
when one instruction influences whether another instruction
will be performed). The Sparc RMO memory model allows
reordering of loads that doesn't respect control dependences (e.g.,
speculative loads), but doesn't allow speculative stores (since you can't
undo them). Defining control dependence in Java is a little
tricky, since many instructions (and all memory actions) can throw
an exception that prevent following instructions from occurring). If
we included such exceptions in computing control dependence, then
we wouldn't be able to perform any reordering of writes at all.
Instead, we allow actions to be reordered
as though the system had exact knowledge of the path of
program execution. Loads may be done speculatively, and
stores may be done in a manner that appears to be speculative.
However, a store may not be performed unless it is guaranteed that
the thread will execute the store (excluding situations such
as a VirtualMachineError or ThreadDeath error). This is intended
to allow the compiler to use any form of static or runtime analysis to
predict which paths will be taken and which exceptions cannot be thrown.
If a memory action A and a read action B
reference the same non-volatile variable and A and B are
reordered so that B immediately follows A, then B can be replaced
with a think action that computes the same value as was read/written
by A. This rule subsumes both scalar replacement by the compiler
and write buffers within a processor. For example, this rule, combined
with the reordering rules above, allow for the behavior seen in
Figure 7.
Without this scalar replacement rule, such behavior would be illegal.
If two write actions A and B
reference the same non-volatile variable and A and B are
reordered so that A immediately precedes B, then A can be deleted.
My proposed Java memory model is neither stronger nor
weaker than the existing Java memory model. My model requires
that memory operations not be reordered in a way that
violations the data dependences of the program, while the
old model does not. However, it is hard to imagine how
one could take advantage of the additional freedom offered
by the old model.
On the other hand, my model does not require
Coherence nor does it impose anomalous constraints such as
shown in Figures 4 - 5.
The above proposal is designed so that in the absence of synchronization,
it has no impact on compiler optimizations and can be executed on
architectures such as the Sparc V9 RMO without memory barriers.
However, it does not enforce Coherence, while the original
JMM did. The only effect this has is on successive reads
of the same variable.
The benifits of enforcing Coherence is unclear. But if
is it desired, Coherence can be enforced by changing rule 4
so that there is a direct dependence even if both A and B are read actions.
Figure 8:
Interference by other threads on same processor
 |
One issue that needs to be addressed
is that processor memory models are in terms of processors,
while the Java memory model is in terms of threads
and has no concept of processors. Consider
the example in Figure 8.
Threads 2 and 4 should only be able to see the writes to
a[1] and a[2] through main memory. Which write
happened first in main memory?
If y = 0, then we must have w = 2 and
the write of a[1] occurring before the write of a[2].
If z = 0, then we must have x = 2 and
the write of a[2] occurring before the write of a[1].
This suggests that the result in Figure 8
can't happen.
However, unless we are careful, it can. Consider the case where, on
processor 1, the write to a[1] is initiated first, followed
by the instructions for thread 2. On processor 2, the write to a[2]
is executed before the instructions for thread 4. All of the instructions in
thread 2 and 4 finish execution before the writes from threads 1 and 3
exit the write buffers on processors 1 and 2. In this case, w and x
will get their values from the write buffer, and y and z
could get their values from the cache (which is coherent because the writes
haven't exited the write buffer).
One way to fix this is to require a memory barrier when switching threads
on a processor. On a multiprocessor that implemented
the Sparc RMO memory model,
you would need a Membar #Lookaside instruction as part of a context switch.
The context switch is probably expensive enough that you won't notice
the cost of the Membar instruction.
On an architecture such as the Tera, which has very fast context-switching (between
instructions), this could prove to be more of a problem.
It might be possible to weaken the memory model to allow for the execution shown in Figure
8, but I'll leave that for another time.
Figure 9:
Reordering of field initialization and ref update
 |
Next: The JMM is too
Up: Fixing the Java Memory
Previous: Reality
William Pugh