Table 1: Benchmark programs studied in this paper

| Benchmark | sj0r (KB) | jar (KB) | sjar (KB) | sj0r.gz (KB) | sjar/sj0r | sjar/jar | sj0r.gz/sjar | Description |
|---|---|---|---|---|---|---|---|---|
| rt | 8,937 | 5,726 | 4,652 | 2,820 | 52% | 81% | 61% | Java 1.2 runtime |
| swingall | 3,265 | 2,193 | 1,657 | 998 | 51% | 76% | 60% | Sun's new set of GUI Widgets (JFC/Swing 1.1) |
| tools | 1,557 | 950 | 737 | 513 | 47% | 78% | 70% | Java 1.2 tools (javadoc, javac, jar, ...) |
| icebrowserbean | 226 | 125 | 116 | 88 | 52% | 93% | 76% | HTML browser |
| jmark20 | 309 | 189 | 173 | 91 | 56% | 91% | 53% | Byte's Java benchmark program |
| visaj | 2,189 | 1,524 | 1,157 | 703 | 53% | 76% | 61% | Visual GUI builder |
| ImageEditor | 454 | 359 | 257 | 162 | 57% | 72% | 63% | Image editor, distributed with VisaJ |
| Hanoi | 86 | 57 | 46 | 31 | 54% | 80% | 67% | Demo applet distributed with Jax |
| Hanoi_big | 56 | 37 | 30 | 20 | 53% | 80% | 67% | Hanoi, partially jax'd |
| Hanoi_jax | 38 | 22 | 21 | 16 | 55% | 96% | 74% | Hanoi, fully jax'd |
| javafig | 357 | 198 | 170 | 143 | 48% | 86% | 84% | Java version of xfig |
| javafig_dashO | 269 | 136 | 131 | 113 | 49% | 96% | 86% | javafig, processed by dashO |
| Programs from SPEC JVM98 (http://www.spec.org/osg/jvm98/) | | | | | | | | |
| 201_compress | 15 | 11 | 10 | 6 | 64% | 85% | 59% | Modified Lempel-Ziv method (LZW) |
| 202_jess | 270 | 183 | 136 | 64 | 50% | 74% | 47% | Java Expert Shell System based on NASA's CLIPS expert shell system |
| 205_raytrace | 52 | 31 | 24 | 15 | 47% | 78% | 64% | Raytracing a dinosaur (invoked by 227_mtrt) |
| 209_db | 10 | 6 | 6 | 5 | 56% | 94% | 84% | Performs multiple database functions on memory resident database |
| 213_javac | 516 | 274 | 226 | 143 | 44% | 82% | 63% | Sun's JDK 1.0.2 Java compiler |
| 222_mpegaudio | 120 | 68 | 62 | 45 | 51% | 91% | 73% | Decompresses MPEG Layer 3 audio |
| 228_jack | 115 | 74 | 55 | 36 | 48% | 74% | 65% | A Java parser generator that is based on the Purdue Compiler Construction Tool Set (PCCTS) |

Archive formats used in the table:
- sj0r: non-class files excluded, debugging information stripped, no compression
- jar: non-class files excluded, class files as distributed (debugging information often not stripped), files compressed individually
- sjar: non-class files excluded, debugging information stripped, files compressed individually
- sj0r.gz: non-class files excluded, debugging information stripped, individual files not compressed, jar file gzip'd as a whole

In this paper, I explore wire-formats for collections of Java class files.
I assume that bandwidth is the most precious resource.
Time required to compress a Java archive is relatively unimportant,
while the time required to decompress must be reasonable (not significantly
longer than using gzip).
The wire-format is a sequential format: all of the class files
must be decompressed in sequence.
As they are decompressed, they can
be written to disk as a conventional jar file or separate classfiles.
These would be completely conventional classfiles that could be used by
a standard JVM.
Alternatively, each class can be directly loaded into a JVM as
it is decompressed, saving the expense of constructing the classfile.
For this, a custom classloader would be required, but no other changes
to the JVM would be required.
See Section 11 for a discussion of eager class loading.
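To make the custom-classloader option concrete, here is a minimal sketch of defining each class as soon as its bytes come off the stream. It does not implement the wire-format of this paper: the stream layout (a count followed by name, length, and bytes records inside a gzip stream) is a stand-in assumed purely for illustration, and the names `EagerLoader`, `loadAll`, `pack`, and `minimalClass` are hypothetical. The helper `minimalClass` hand-builds a tiny, method-less class file so the example is self-contained.

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

// Hypothetical loader: defines classes one by one while the stream is decompressed.
class EagerLoader extends ClassLoader {

    // Reads (name, length, bytes) records in sequence; each class is handed to the
    // JVM immediately, without writing a classfile or jar to disk first.
    List<Class<?>> loadAll(InputStream wire) {
        List<Class<?>> loaded = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(wire))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] body = new byte[in.readInt()];
                in.readFully(body);
                loaded.add(defineClass(name, body, 0, body.length));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loaded;
    }

    // Builds a minimal class file: public class <name> extends Object, no members.
    static byte[] minimalClass(String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(0xCAFEBABE);                 // magic
            out.writeShort(0);                        // minor version
            out.writeShort(52);                       // major version (Java 8)
            out.writeShort(5);                        // constant_pool_count = entries + 1
            out.writeByte(7); out.writeShort(2);      // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name.replace('.', '/')); // #2 Utf8 class name
            out.writeByte(7); out.writeShort(4);      // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");     // #4 Utf8
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                        // this_class
            out.writeShort(3);                        // super_class
            out.writeShort(0);                        // interfaces_count
            out.writeShort(0);                        // fields_count
            out.writeShort(0);                        // methods_count
            out.writeShort(0);                        // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Packs generated classes into the stand-in stream layout described above.
    static byte[] pack(String... names) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            out.writeInt(names.length);
            for (String name : names) {
                byte[] body = minimalClass(name);
                out.writeUTF(name);
                out.writeInt(body.length);
                out.write(body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        EagerLoader loader = new EagerLoader();
        for (Class<?> c : loader.loadAll(new ByteArrayInputStream(pack("WireDemoA", "WireDemoB")))) {
            System.out.println("defined " + c.getName());
        }
    }
}
```

Because the stream is sequential, a class that refers to a not-yet-decoded class would need the archive to be ordered appropriately or the loader to defer resolution; the sketch sidesteps this by using classes that depend only on `java.lang.Object`.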
While it would be possible to include debugging information in a wire-format,
we would typically prefer to save space by excluding it.
I do not encode the LineNumberTable, LocalVariableTable, or
SourceFile attributes. Also, because my approach requires renumbering
entries in the constant pool, I exclude any unrecognized attributes
(we would not be able to update references to the constant pool in
unrecognized attributes).
I also exclude any non-class files (e.g., PNG image files) from
the archive when performing my size calculations.
I report compression as the size of the
compressed object, as a percentage of the size of the original object.
To have a consistent and fair comparison of the size of my archive format with
standard jar files, I performed the following
transformations to the benchmarks I studied:
- Remove LineNumberTable, LocalVariableTable and SourceFile attributes
- Garbage collect the constant pool (remove unused constants)
- Sort entries in the constant pool according to type
- Sort UTF constants according to their content
These changes typically reduce jar file size by about 20%.
Sorting of the constant pool entries can give an improvement of
several percent when the class file is compressed, because it enables zlib
to do a better job of finding repeated patterns.
In this paper,
when I report the size of original and compressed class files, those sizes
reflect the improvements gained by these transformations.
Any improvements I report for the new techniques in this paper
reflect improvements beyond those gained by
removing debugging information
and garbage collecting the constant pool.
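The constant-pool sorting and the renumbering it forces can be sketched on a simplified model. This is an illustration under stated assumptions, not the tool used for the measurements: a pool entry is reduced to a tag plus a text payload, indices are 0-based, "according to type" is taken to mean ascending tag value, and the names `PoolSorter`, `Entry`, and `sortPool` are hypothetical. A real class file has a 1-based pool, two-slot Long and Double entries, and index references inside entries, code, and attributes that must all be rewritten through the returned map.

```java
import java.util.*;

class PoolSorter {
    static final int UTF8 = 1;   // CONSTANT_Utf8 tag in the class file format

    // Simplified constant-pool entry: a tag and a textual payload.
    static class Entry {
        final int tag;
        final String text;
        Entry(int tag, String text) { this.tag = tag; this.text = text; }
    }

    // Sorts the pool in place by tag, and Utf8 entries by content.
    // Returns remap where remap[oldIndex] == newIndex.
    static int[] sortPool(List<Entry> pool) {
        Integer[] order = new Integer[pool.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;

        // Arrays.sort on objects is stable, so ties keep their original order.
        Arrays.sort(order, (a, b) -> {
            Entry x = pool.get(a), y = pool.get(b);
            if (x.tag != y.tag) return Integer.compare(x.tag, y.tag);
            if (x.tag == UTF8) return x.text.compareTo(y.text);
            return 0;
        });

        int[] remap = new int[pool.size()];
        List<Entry> sorted = new ArrayList<>();
        for (int newIndex = 0; newIndex < order.length; newIndex++) {
            remap[order[newIndex]] = newIndex;
            sorted.add(pool.get(order[newIndex]));
        }
        pool.clear();
        pool.addAll(sorted);
        return remap;
    }

    public static void main(String[] args) {
        List<Entry> pool = new ArrayList<>(Arrays.asList(
            new Entry(UTF8, "zeta"), new Entry(7, "SomeClass"), new Entry(UTF8, "alpha")));
        System.out.println(Arrays.toString(sortPool(pool)));
    }
}
```

The need for this map is also why unrecognized attributes are excluded: any constant-pool index stored inside an attribute of unknown layout could not be passed through `remap`.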
I will often refer to gzip and zlib compression
interchangeably. However, in most situations
where I apply gzip compression, I do
not include the 18 bytes for the GZIP header and trailer.
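The 18 bytes correspond to the fixed 10-byte gzip header plus the 8-byte trailer (CRC-32 and uncompressed length). A small sketch using `java.util.zip` can check this by compressing the same data once as a gzip stream and once as a raw deflate stream with no wrapper; the class and method names here are hypothetical, and the comparison assumes both paths use the default compression level.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

class GzipOverhead {

    // Size of the data as a complete gzip stream (header + deflate body + trailer).
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Size of the deflate body alone: nowrap=true omits any header or checksum.
    static int rawDeflateSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Deflater raw = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try (DeflaterOutputStream d = new DeflaterOutputStream(out, raw)) {
            d.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            raw.end();   // release native resources of the caller-supplied Deflater
        }
        return out.size();
    }

    public static void main(String[] args) {
        byte[] data = "java/lang/Object java/lang/Object".getBytes(StandardCharsets.UTF_8);
        System.out.println("gzip overhead: " + (gzipSize(data) - rawDeflateSize(data)) + " bytes");
    }
}
```

For archives of only a few kilobytes, such as 209_db or 201_compress in Table 1, this fixed overhead is a visible fraction of the total, which is one reason to leave it out when comparing encodings.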
Gzip'd jar files of uncompressed class files
The compression in normal jar files is done on
a file-by-file basis. We can achieve better compression
if we compress an entire jar file whose individual
files have not been compressed separately.
In tables and text, I refer to these as j0r.gz files
(0 for no compression within the jar file).
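The difference between the two layouts can be sketched with `java.util.zip`. This is an illustration on synthetic data, not a reproduction of the measurements in Table 1: `ZipOutputStream` stands in for the jar tool (a jar is a zip archive, here without a manifest), the file contents are made-up strings that share many substrings the way class files share constant-pool text, and the names `J0rGz`, `archive`, `gzip`, and `sampleFiles` are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

class J0rGz {

    // Builds a zip archive; entries are deflated individually (jar-style)
    // or stored uncompressed (j0r-style).
    static byte[] archive(Map<String, byte[]> files, boolean compressEntries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                byte[] data = f.getValue();
                ZipEntry e = new ZipEntry(f.getKey());
                if (!compressEntries) {
                    // STORED entries need size and CRC set before putNextEntry.
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    e.setMethod(ZipEntry.STORED);
                    e.setSize(data.length);
                    e.setCompressedSize(data.length);
                    e.setCrc(crc.getValue());
                }
                zip.putNextEntry(e);
                zip.write(data);
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    // Compresses a whole byte array as a single gzip stream.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    // Synthetic stand-ins for class files with heavily shared content.
    static Map<String, byte[]> sampleFiles() {
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < 40; i++) {
            String body = "java/lang/Object;java/lang/String;java/util/Vector;<init>()V;Code;toString;" + i;
            files.put("pkg/Class" + i + ".class", body.getBytes(StandardCharsets.UTF_8));
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = sampleFiles();
        System.out.println("jar-style (per-file deflate): " + archive(files, true).length + " bytes");
        System.out.println("j0r.gz (stored, then gzip'd): " + gzip(archive(files, false)).length + " bytes");
    }
}
```

Compressing the archive as a whole lets the compressor exploit repetition across files and in the zip headers themselves, which per-file compression cannot see. The cost is that entries can no longer be extracted individually without decompressing everything before them.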
William Pugh