JSR-000XXX Revise the Specification of the Java™ Threads
Section 1: Identification
Submitting Participant: Univ. of Maryland
Name of Contact Person: William Pugh
E-Mail Address: pugh@cs.umd.edu
Telephone Number: 301-405-2705
Fax Number: 301-405-2744
Section 2: Request
2.1 - Please describe the proposed Specification:
A specification that describes the semantics of threads, locks, volatile variables
and data races. This specification will be a replacement for Chapter 17 of the
Java Language Specification (and Chapter 8 of the Java Virtual Machine Specification).
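To make the terms concrete, here is a minimal sketch of a data race and its lock-based counterpart; the class and method names (`SharedCounter`, `run`) are hypothetical, chosen only for illustration. Several threads increment two shared counters: `racy` is updated with no synchronization, so concurrent read-modify-write updates can be lost, while `guarded` is updated only while holding a lock, so its final value is always exact. The main thread reads both fields only after calling `join()` on every worker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: one field updated with a data race, one under a lock.
class SharedCounter {
    private int racy;                         // accessed without synchronization
    private int guarded;                      // accessed only while holding `lock`
    private final Object lock = new Object();

    void incrementRacy() {
        racy++;                               // unsynchronized read-modify-write: updates can be lost
    }

    void incrementGuarded() {
        synchronized (lock) {                 // mutual exclusion for the read-modify-write
            guarded++;
        }
    }

    /** Starts `threads` workers, each incrementing both counters `perThread` times. */
    static int[] run(int threads, int perThread) {
        SharedCounter c = new SharedCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.incrementRacy();
                    c.incrementGuarded();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();                     // wait for each worker to terminate
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { c.racy, c.guarded };
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        // guarded is always 400000; racy is at most 400000 and often smaller
        System.out.println("racy=" + r[0] + " guarded=" + r[1]);
    }
}
```

The racy result varies from run to run and platform to platform, which is exactly the behavior a thread specification has to bound.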
2.2 - What is the target Java platform?
(i.e., desktop, server, personal, embedded, card, etc.)
All platforms
2.3 - What need of the Java community
will be addressed by the proposed specification?
Programmers need to be able to understand which thread communication idioms
are legal, so that they can write reliable multithreaded software; JVM
implementors need to be able to build high-performance JVMs without violating
the Java specification.
2.4 - Why isn't this need met by existing specifications?
Chapter 17 of the Java Language Specification (and Chapter 8 of the Java
Virtual Machine Specification) describe the semantics of threads and locks,
as well as related features such as volatile variables.
Unfortunately, that specification has been found to be very hard to
understand and has many subtle, unintended implications. It is unclear
whether anyone actually understands the entire specification and its
implications. Many synchronization idioms recommended in books and articles
are invalid according to the existing specification. Some of these unintended
implications prohibit common compiler optimizations performed by essentially
all Java virtual machines, and would be prohibitively expensive to enforce on
many existing processor architectures.
A number of people have looked at the specification and concluded that
patching or modifying it could not produce a satisfactory and understandable
result. Therefore, we recommend that a replacement specification be developed.
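As one illustration of the optimization problem, consider redundant load elimination. The sketch below (hypothetical class and method names) shows code as written and the version a compiler would like to produce by reusing the first load of `p.x`. In a single thread the two are equivalent. If the existing specification is read as requiring each thread to see the writes to a single variable in a consistent order (coherence), the transformation becomes illegal: when `p` and `q` refer to the same object and another thread writes `x` between the loads, the optimized code can return a new value for `j` followed by an old value for `k`. Because a compiler usually cannot prove that `p` and `q` do not alias, that reading forbids the optimization in most code.

```java
import java.util.Arrays;

// Hypothetical illustration of redundant load elimination on possibly aliased reads.
class LoadElimination {
    static final class Point {
        int x;                        // plain (non-volatile) field
        Point(int x) { this.x = x; }
    }

    // Code as written: p.x is loaded twice.
    static int[] asWritten(Point p, Point q) {
        int i = p.x;
        int j = q.x;                  // q may be the same object as p
        int k = p.x;                  // second load of p.x
        return new int[] { i, j, k };
    }

    // What an optimizing compiler would like to emit: reuse the first load.
    // If p == q and another thread writes x between the loads, this version can
    // return j = new value and k = old value, i.e. the old value after the new one.
    static int[] optimized(Point p, Point q) {
        int i = p.x;
        int j = q.x;
        int k = i;                    // load eliminated
        return new int[] { i, j, k };
    }

    public static void main(String[] args) {
        Point a = new Point(1);
        Point b = new Point(2);
        // Single-threaded, the two versions always agree, aliased or not.
        System.out.println(Arrays.toString(asWritten(a, b)));   // [1, 2, 1]
        System.out.println(Arrays.toString(optimized(a, a)));   // [1, 1, 1]
    }
}
```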
2.5 - Please give a short description
of the underlying technology or technologies:
Some of the issues/features/goals we would consider in revising
this specification are:
- For "correctly synchronized" programs, the semantics
should be simple and intuitive. Of course, this depends
on defining "correctly synchronized" both formally and
intuitively.
- For incorrectly synchronized programs, it is not acceptable
to just say that the semantics are undefined (as has been
done in a number of other language specifications, such as
Modula-3 and Ada).
Instead, we need to determine which safety guarantees need
to be made so that incorrectly synchronized programs cannot
be used to attack the security of a system.
- A primary concern is the ability of unsophisticated programmers
to create reliable/correct multithreaded programs. To accomplish
this goal, we must balance the need for
- a simple, easy-to-understand model, and
- a model that does not overly restrict the possible
ways to write reliable programs.
- A secondary concern is to allow the creation of
high performance JVM implementations across a
wide range of platforms.
- There exist a number of dubious coding idioms, such as the double-check
idiom, that are designed to allow threads to communicate without
synchronization. Almost all such idioms are broken
under the existing semantics. Changing the semantics
to allow such idioms to work would impose substantial
performance penalties on certain platforms, even for
code that did not use the dubious idioms.
- It is expected that many of these synchronization-avoiding
idioms will also be broken under the revised semantics.
- We will develop educational material and lead an educational effort
to inform developers
of commonly used incorrect idioms.
- Where possible, we will develop tools that statically detect
some occurrences of common incorrect idioms.
- Strengthened semantics for volatile should allow many of these idioms
to be fixed by making a single field volatile.
- Except for a very few subtle and dubious cases, we do not anticipate
breaking any code that is guaranteed to work under the existing
semantics.
- The ability to declare a field as volatile was adopted from C/C++,
where it was originally used for memory-mapped I/O devices.
In Java, volatile is primarily (or solely) used for fields that will
be accessed without synchronization.
- Very little existing Java code uses volatile, because programmers
are unsure of its semantics and are unaware of its importance for
code that does not use synchronization.
- Unfortunately, the existing semantics of volatile are weak enough
that many (apparently reasonable) uses of volatile are invalid.
Furthermore, most existing JVMs do not correctly implement the
existing semantics for volatile.
- We will look at strengthening the semantics of volatile to make it
easier to use correctly.
- Most programmers assume that immutable objects (objects whose
fields are set only in their constructor), such as String, do
not need synchronization. Unfortunately, in an incorrectly
synchronized program, a thread may observe an immutable object
that is shared without synchronization appear to change. In particular,
a String may first appear to have the value "/tmp" and, on later
observation, appear to have the value "/usr". This has
clear and serious implications for security.
- The existing semantics allow this behavior, as do the
specifications of
shared memory multiprocessors with weak memory models.
This could be fixed by making all of the methods of the String
class synchronized. But this would be non-intuitive, and would
impose a performance penalty on all Java platforms, even though
it is needed only on a tiny percentage of them.
- We need to allow programmers to create classes that
represent truly immutable objects, while not imposing
a significant performance penalty on platforms where nothing
needs to be done to ensure true immutability.
- We expect to do this by strengthening the semantics of final
fields to allow a guarantee of true immutability, even in the presence
of data races. This strengthening would also permit
more aggressive compiler optimization of code using
final fields.
- As part of this change, we may prohibit the use of
native code and/or reflection to change
final fields (with some sort of back door provided
to allow backwards compatibility for System.in, System.out
and System.err).
- As part of this effort, we should try to understand the
potential implementation impact of any proposed semantics.
In particular, some proposed semantics will be more expensive
to implement on processors with weak memory models and expensive
synchronization.
- Some of the changes contemplated, particularly the changes
to the semantics of volatile and final, will require that
JVMs be changed in order to be compliant with the new specification.
- We will develop compatibility tests that will automatically test
whether a JVM enforces some of the guarantees made by
the thread specification (other guarantees may be difficult
or infeasible to test).
- We will also consider standards for how the thread-safety properties
of an API should be documented. Javadoc (correctly) does not list whether
a method is synchronized, because a method could be thread safe due to
internal synchronization. For example, is java.io.ByteArrayInputStream
guaranteed to be thread safe? Nothing in its documentation says that it
is, but Sun's standard implementation is. Making the implementations
of input and output streams synchronized was a dubious decision that
incurs substantial performance penalties.
Could a valid Java implementation provide unsynchronized I/O streams?
One possibility would be to devise Javadoc tags for thread safety and
guidelines for using them.
- The semantics of multithreaded programs are far more subtle
than previously thought. Even simple ideas like "happened previously"
can become subtle and complicated in the presence of multiple
threads.
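The double-check idiom discussed above, and the single-volatile-field fix the proposal anticipates, can be sketched as follows. This is a minimal sketch that assumes volatile semantics strengthened along the lines proposed above, under which a volatile write makes the fully constructed object visible to any thread whose volatile read sees it. It is not guaranteed to work under the existing specification, and without `volatile` it is one of the broken idioms described above. The names `LazyHelper`, `Helper`, `getHelper` and `sameInstanceFromThreads` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: double-checked lazy initialization fixed with one volatile field.
// Assumes strengthened volatile semantics; broken if `helper` is not volatile.
class LazyHelper {
    static final class Helper {
        final int value;
        Helper(int value) { this.value = value; }
    }

    private volatile Helper helper;   // the single field made volatile
    private int constructions;        // guarded by `this`; counts Helper creations

    Helper getHelper() {
        Helper h = helper;            // first check: one volatile read, no lock
        if (h == null) {
            synchronized (this) {
                h = helper;           // second check, now holding the lock
                if (h == null) {
                    constructions++;
                    h = new Helper(42);
                    helper = h;       // volatile write publishes the constructed object
                }
            }
        }
        return h;
    }

    synchronized int constructions() {
        return constructions;
    }

    /** Calls getHelper() from several threads; true if all saw one fully built instance. */
    static boolean sameInstanceFromThreads(int threads) {
        LazyHelper holder = new LazyHelper();
        Helper[] seen = new Helper[threads];
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            Thread w = new Thread(() -> seen[slot] = holder.getHelper());
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();             // seen[] is read only after every worker finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        for (Helper h : seen) {
            if (h != seen[0] || h.value != 42) return false;
        }
        return holder.constructions() == 1;
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceFromThreads(8));  // true
    }
}
```

The design choice is that the common path (object already initialized) costs one volatile read and no lock; only the first callers contend for the monitor.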
We will review other thread-related issues, such as class initialization,
asynchronous exceptions, finalizers, sleep, wait, join and
interrupts, in accordance
with the goals described above.
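Returning to the final-field goal listed under 2.5, here is a minimal sketch of the kind of class the strengthened final-field semantics are meant to protect; the names `FinalFieldPublication`, `ImmutablePath` and `raceOnce` are hypothetical. Every field of `ImmutablePath` is final and assigned once in the constructor, and `this` does not escape during construction. The object is then deliberately published through a plain, non-volatile static field, which is a data race. The sketch assumes final-field semantics of the kind proposed above: a reader that sees a non-null reference sees the values set by the constructor (here `"/tmp"`), never a default or a different value. The existing specification makes no such guarantee.

```java
// Hypothetical illustration: racy publication of an immutable object with final fields.
// Assumes strengthened final-field semantics as described in the proposal.
class FinalFieldPublication {
    /** Immutable: every field is final and is assigned exactly once, in the constructor. */
    static final class ImmutablePath {
        private final String path;
        private final int length;

        ImmutablePath(String path) {
            this.path = path;
            this.length = path.length();
            // `this` is not stored anywhere else before the constructor returns.
        }

        String path() { return path; }
        int length() { return length; }
    }

    // Deliberately neither volatile nor lock-protected: accesses to it are a data race.
    static ImmutablePath shared;

    /**
     * A reader thread polls `shared` at most `polls` times while the calling thread
     * publishes a new ImmutablePath. Returns the path the reader saw, or null.
     */
    static String raceOnce(int polls) {
        shared = null;                              // reset before the reader starts
        String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            for (int i = 0; i < polls; i++) {
                ImmutablePath p = shared;           // racy read
                if (p != null) {
                    seen[0] = p.path();             // relies on the final-field guarantee
                    return;
                }
            }
        });
        reader.start();
        shared = new ImmutablePath("/tmp");         // racy publication
        try {
            reader.join();                          // makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // Under the assumed semantics: prints either null (the reader gave up) or /tmp.
        System.out.println(raceOnce(1_000_000));
    }
}
```

No method of the class needs to be synchronized; the guarantee comes entirely from the fields being final, which is the low-cost route to true immutability the proposal describes.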
2.6 - Is there a proposed package
name for the API Specification? (i.e., javapi.something,
org.something,
com.something, etc.)
No
2.7 - Does the proposed specification
have any dependencies on specific operating systems, CPUs, or I/O devices
that you know of?
Many of the most surprising behaviors
of a multithreaded program can happen only on a shared memory multiprocessor
with a weak memory model (e.g., SMP Alpha systems). Similarly,
the cost of strengthening the specification will be highest
on shared memory multiprocessors
with a weak memory model.
However, the specification should make the same guarantees about behavior
on all platforms, even if in practice it might be impossible for some legal
behaviors to occur on certain platforms.
2.8 - Are there any security issues
that cannot be addressed by the current security model?
No
2.9 - Are there any internationalization or localization
issues?
No.
2.10 - Are there any existing specifications
that might be rendered obsolete, deprecated, or in need of revision as
a result of this work?
Chapter 17 of the JLS and Chapter 8 of the JVMS will be completely
replaced. Other changes may be needed elsewhere in the JLS and JVMS.
Section 3: Contributions
3.1 - Please list any existing documents,
specifications, or implementations that describe the technology. Please
include links to the documents if they are publicly available.
Discussion of the issues in this JSR has been ongoing for some
time. Some of these discussions have taken place on the Java Memory
Model mailing list. The Java Memory Model web page contains archives
of those discussions, as well as links to related resources.
Some of the relevant resources:
- Technical papers:
- "Fixing the Java Memory Model", Proceedings of the ACM 1999 Conference
on Java Grande, 1999, pages 89-98. (Slides from talk.)
This paper sets out some of the problems with the existing specification,
although the solution it proposes has been found to be inadequate and
is not recommended for consideration.
- "Improving the Java Memory Model with CRF", Jan-Willem Maessen, Arvind
and Xiaowei Shen, Computation Structures Group Memo 428, MIT, 2000.
- Bill's current proposal
- Descriptions of the double-check idiom:
- "Lazy Instantiation", Philip Bishop and Nigel Warren, JavaWorld Magazine.
- "Reality Check", Douglas C. Schmidt, C++ Report, SIGS, Vol. 8, No. 3,
March 1996.
- "Double-Checked Locking: An Optimization Pattern for Efficiently
Initializing and Accessing Thread-safe Objects", Douglas Schmidt and
Tim Harrison, 3rd Annual Pattern Languages of Program Design Conference, 1996.
- "Programming Java Threads in the Real World, Part 7", Allen Holub,
JavaWorld Magazine, April 1999.
3.2 - Explanation of how these items
might be used as a starting point for the work.
The specifications listed above could be used as a basis for the
public draft. Where they fall short of the stated goals or constraints,
further work will be needed to determine the best course of action.