Next: Compressing References
Up: Compressing Java Class Files
Previous: Basic approaches
Structuring information
To reduce redundancy in my archive format,
I redesigned the basic structure of information
in a Java classfile. You can think
of this restructuring
as an in-memory format for classfiles,
which is built and then encoded into a bytestream.
Some of the
things I did in my reorganization:
- Classnames are encoded as a package name and a
simple class name. All classes from the same package
will share the same package name, and classes
from different packages can share the same simple
class name.
For example, the package name java.lang will occur only once.
- In Java classfiles, the types of methods and fields
are encoded as strings. In my restructured format,
a method type is encoded as an array of classes
containing the return type and the argument types.
A field type is just a class. Primitive types
and array types are encoded as special class references
that are converted back to primitive types
when decompressed.
- Generic Attributes have been eliminated. Instead, additional
flags are set in the access flags that say whether specific
attributes apply to this object. For example, there
is a bit in the access flags for a Field definition
that tells whether the field has a constant value. If so,
then there is an additional reference to a constant value
(e.g., an integer or a string).
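The first two items above can be illustrated with a minimal sketch. It is not the code used for the archive format: the class name NameTables and its methods are hypothetical, and descriptor strings stand in for the class references that the restructured format would actually hold. It splits an internal class name into a shared package name and a simple class name, and breaks a method descriptor into a list with the return type first, followed by the argument types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and layout are assumptions made for illustration.
final class NameTables {
    // One shared entry per distinct package name and per distinct simple name.
    private final Map<String, String> packages = new HashMap<>();
    private final Map<String, String> simpleNames = new HashMap<>();

    // Splits an internal class name such as "java/lang/String" into
    // { package name, simple class name }, reusing previously seen strings.
    String[] split(String internalName) {
        int slash = internalName.lastIndexOf('/');
        String pkg = slash < 0 ? "" : internalName.substring(0, slash).replace('/', '.');
        String simple = internalName.substring(slash + 1);
        return new String[] { intern(packages, pkg), intern(simpleNames, simple) };
    }

    int distinctPackages() { return packages.size(); }

    // Returns the first string equal to s that was stored in the table.
    private static String intern(Map<String, String> table, String s) {
        String prev = table.putIfAbsent(s, s);
        return prev == null ? s : prev;
    }

    // Converts a method descriptor such as "(I[Ljava/lang/String;)V" into a
    // list holding the return type first, then the argument types in order.
    static List<String> methodType(String descriptor) {
        List<String> args = new ArrayList<>();
        int i = 1; // skip '('
        while (descriptor.charAt(i) != ')') {
            int end = endOfType(descriptor, i);
            args.add(descriptor.substring(i, end));
            i = end;
        }
        List<String> result = new ArrayList<>();
        result.add(descriptor.substring(i + 1)); // return type follows ')'
        result.addAll(args);
        return result;
    }

    // Returns the index just past the single type that starts at position i.
    private static int endOfType(String d, int i) {
        while (d.charAt(i) == '[') i++;                        // array dimensions
        if (d.charAt(i) == 'L') return d.indexOf(';', i) + 1;  // object type
        return i + 1;                                          // primitive type
    }
}
```

Splitting java/lang/String and java/lang/Object through the same NameTables instance yields the identical java.lang string object for both, which mirrors the idea that a package name occurs only once.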
Once we have a collection of class files in
our internal format, the wire code is generated/parsed by a preorder
traversal of the data structure, starting from
the roots.
As each edge is traversed, an appropriate reference is encoded.
As each primitive (int, long, float, or double) is encountered,
it is encoded.
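The traversal can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual encoder: GraphNode and PreorderEncoder are hypothetical names, the output is a list of readable tokens rather than a bytestream, and a back-reference is written simply as the index at which the object was first visited. How references are really encoded is a separate question that this sketch does not address.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type: a labeled object with primitive fields and outgoing edges.
final class GraphNode {
    final String label;
    final List<Integer> ints = new ArrayList<>();
    final List<GraphNode> edges = new ArrayList<>();
    GraphNode(String label) { this.label = label; }
}

final class PreorderEncoder {
    // Maps each visited object (by identity) to the order in which it was first seen.
    private final Map<GraphNode, Integer> seen = new IdentityHashMap<>();
    final List<String> out = new ArrayList<>();

    // Encodes the edge leading to n: a back-reference if n was already visited,
    // otherwise a "new" marker followed by n's contents in preorder.
    void encode(GraphNode n) {
        Integer id = seen.get(n);
        if (id != null) {
            out.add("ref#" + id);
            return;
        }
        seen.put(n, seen.size());
        out.add("new:" + n.label);
        for (int v : n.ints) out.add("int:" + v);      // primitives are written inline
        for (GraphNode child : n.edges) encode(child); // each edge becomes a reference
    }
}
```

A decoder that walks the same structure in the same order can rebuild the graph, since every token it reads is either a new object whose contents follow or a pointer to an object it has already reconstructed.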
The internal format for Code (attached to MethodDefinitions) is
more complicated.
I separate bytecode into streams of opcodes, register numbers,
integer constants, virtual method references, field references, and
so on. The encoding of bytecodes is discussed more thoroughly in
Section 7.
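To make the idea of separate streams concrete, here is a minimal sketch that handles only six JVM opcodes. The class StreamSplitter and its field names are hypothetical, and the sketch records raw constant pool indices for field and method operands, whereas the internal format described here would hold references to shared objects instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter covering a small subset of the JVM instruction set.
final class StreamSplitter {
    final List<Integer> opcodes = new ArrayList<>();
    final List<Integer> registers = new ArrayList<>();
    final List<Integer> intConstants = new ArrayList<>();
    final List<Integer> fieldRefs = new ArrayList<>();
    final List<Integer> virtualMethodRefs = new ArrayList<>();

    // Standard JVM opcode values for the instructions this sketch understands.
    static final int BIPUSH = 0x10, ILOAD = 0x15, ALOAD_0 = 0x2a,
                     IRETURN = 0xac, GETFIELD = 0xb4, INVOKEVIRTUAL = 0xb6;

    // Walks the bytecode once, sending each opcode and operand to its own stream.
    void split(byte[] code) {
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc++] & 0xff;
            opcodes.add(op);
            switch (op) {
                case BIPUSH:        // one signed byte constant
                    intConstants.add((int) code[pc++]);
                    break;
                case ILOAD:         // one unsigned byte register number
                    registers.add(code[pc++] & 0xff);
                    break;
                case GETFIELD:      // two-byte constant pool index
                    fieldRefs.add(u2(code, pc));
                    pc += 2;
                    break;
                case INVOKEVIRTUAL: // two-byte constant pool index
                    virtualMethodRefs.add(u2(code, pc));
                    pc += 2;
                    break;
                case ALOAD_0:
                case IRETURN:       // no operands
                    break;
                default:
                    throw new IllegalArgumentException("opcode not covered by sketch: " + op);
            }
        }
    }

    // Reads an unsigned big-endian 16-bit value.
    private static int u2(byte[] code, int pc) {
        return ((code[pc] & 0xff) << 8) | (code[pc + 1] & 0xff);
    }
}
```

Values of the same kind end up next to each other in each stream, which is the property that makes them easier to compress than the interleaved original.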
Figure 1:
Fragment of Internal format for class files
ClassDefinition [] classesDefined;
class PackageName { String name; }
class SimpleClassName { String name; }
class MethodName { String name; }
class FieldName { String name; }
class ClassRef {
PackageName & packageName;
SimpleClassName & simpleClassName;
}
class ClassDefinition {
ClassRef & thisClass;
int access_flags;
ClassRef & superClass;
ClassRef & [] interfaces;
MethodDefinition [] methods;
FieldDefinition [] fields;
}
class ExceptionRef {
ClassRef & clazz;
}
class MethodRef {
ClassRef & owner;
MethodName & methodName;
ClassRef & type[];
}
class MethodDefinition {
MethodRef & method;
int access_flags;
Code code;
ExceptionRef & exceptionsThrown[];
}
class FieldRef {
ClassRef & owner;
FieldName & fieldName;
ClassRef & type;
}
class FieldDefinition {
FieldRef & field;
int access_flags;
Object & constantValue;
}
& is used to indicate a reference to an object that may be shared and might have been seen before
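The constantValue member of FieldDefinition is present only when the access flags say so. A minimal sketch of that convention follows, with several assumptions: the class FieldFlags is hypothetical, the bit value 0x10000 is invented for illustration (chosen above the 16 bits of standard access flags), and the constant is written inline as an int rather than as a reference to a shared object.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a flag bit that announces an optional member.
final class FieldFlags {
    // Assumed bit position; the actual bit used by the format is not given here.
    static final int HAS_CONSTANT_VALUE = 0x10000;

    // Writes the flags, followed by the constant only when one is present.
    static byte[] write(int accessFlags, Integer constantValue) {
        boolean has = constantValue != null;
        ByteBuffer buf = ByteBuffer.allocate(has ? 8 : 4);
        buf.putInt(has ? accessFlags | HAS_CONSTANT_VALUE : accessFlags);
        if (has) buf.putInt(constantValue);
        return buf.array();
    }

    // Reads the constant back, or returns null when the flag bit is clear.
    static Integer readConstant(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int flags = buf.getInt();
        return (flags & HAS_CONSTANT_VALUE) != 0 ? Integer.valueOf(buf.getInt()) : null;
    }
}
```

A field without a constant costs nothing beyond its flags, and no attribute name or attribute length needs to be stored for a field that has one.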
William Pugh