Next: Evaluation
Up: Compressing Java Class Files
Previous: Compressing Sets of Strings
One reason that my packed format is more compact is that
multiple class files are combined into a single packed
format that shares information.
If each class file were packed separately, the
total amount of data that needs to be communicated would increase.
Another question is how much of the compression in my packed format
is due to gzip, and how much is because of the more compact encoding.
On normal classfiles, gzip provides a compression factor of about 2.
These effects of combining classfiles and using gzip are broken out in
Table 5.
Not using gzip may be appropriate on very lightweight clients where
running zip is impossible or too expensive.
Table 5: Effects of separate packing and not gzipping
(values are % of size of jar file of gzip'd classfiles)

| Option                           | javac | mpegaudio |
| Standard                         | 22%   | 37%       |
| Packed Separately                | 52%   | 56%       |
| Not gzip'd                       | 49%   | 99%       |
| Packed Separately and not gzip'd | 87%   | 118%      |

There is one issue we must be careful about when decompressing
an archive. Normally, when we need to create a reference
to a constant pool entry in a reconstructed classfile,
we can just assign the element referenced to any free slot
in the constant pool. However, the bytecode LDC instruction
has a one-byte operand, so it can only encode an index in the range 1-255.
LDC instructions can only reference integer, float and string
constants.
The first fix is to assign integer, float and string constant
pool entries the smallest available index.
Other constant pool entries are assigned the largest
available index; we transmit the total number of constant pool
entries required as part of our encoding.
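
As a rough illustration of this first fix, the following Java sketch hands out indices from both ends of the constant pool. The class and method names are hypothetical and not taken from the packed format's implementation, and the sketch assumes that every entry occupies exactly one slot (long and double entries, which take two slots, are ignored).

```java
import java.util.Arrays;

// Hypothetical sketch: names and structure are illustrative assumptions.
class PoolIndexSketch {
    // ldcLoadable[i] is true when entry i is an integer, float or string
    // constant. totalEntries is the count transmitted in the encoding, so
    // the decoder knows where the high end of the pool is before it has
    // seen every entry. Assumption: each entry occupies one slot.
    static int[] assignIndices(boolean[] ldcLoadable, int totalEntries) {
        int[] index = new int[ldcLoadable.length];
        int low = 1;              // constant pool indices start at 1
        int high = totalEntries;  // other entries fill downward from the top
        for (int i = 0; i < ldcLoadable.length; i++) {
            index[i] = ldcLoadable[i] ? low++ : high--;
        }
        return index;
    }

    public static void main(String[] args) {
        boolean[] kinds = {false, true, false, true};
        System.out.println(Arrays.toString(assignIndices(kinds, kinds.length)));
    }
}
```

With four entries in the order other, constant, other, constant, the two constants receive indices 1 and 2 and the other entries receive 4 and 3.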
This almost fixes the problem. However, if there are more than 255
integer, float and string constants referenced in a classfile, which ones
are assigned small indices? We would like to ensure that the same set
of constants is assigned small indices as in the original classfile;
otherwise, we would have to change
some LDC instructions to LDC_W instructions, which are one byte
longer. This would then require patching every jump offset that spans
a changed instruction.
Instead, if an integer, float or string constant is referenced
with an LDC_W instruction, then it is assigned a high constant
pool index; if it is referenced with an LDC instruction, it is assigned
a low constant pool index. This assumes that a classfile
doesn't reference the same constant pool entry with both an LDC and an LDC_W
instruction. It would be inefficient to do so, and can be fixed (and made
more efficient) when the classfile is encoded if necessary.
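
The rule can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name, the representation of references as {opcode, constantId} pairs, and the choice to reject a mixed LDC/LDC_W reference with an exception are all hypothetical. The opcode values 0x12 and 0x13 are the standard JVM encodings of LDC and LDC_W.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and data layout are illustrative assumptions.
class LdcPlacementSketch {
    static final int LDC = 0x12;    // one-byte constant pool index operand
    static final int LDC_W = 0x13;  // two-byte constant pool index operand

    // refs[i] = {opcode, constantId}: load instructions in decode order.
    // Returns a map from constantId to its assigned constant pool index.
    static Map<Integer, Integer> assign(int[][] refs, int totalEntries) {
        Map<Integer, Integer> indexOf = new HashMap<>();
        Map<Integer, Integer> firstOpcode = new HashMap<>();
        int low = 1;
        int high = totalEntries;
        for (int[] ref : refs) {
            int opcode = ref[0];
            int constantId = ref[1];
            Integer seen = firstOpcode.get(constantId);
            if (seen == null) {
                // First reference decides the placement: LDC needs a low
                // index, LDC_W can use any index and so is placed high.
                firstOpcode.put(constantId, opcode);
                indexOf.put(constantId, opcode == LDC ? low++ : high--);
            } else if (seen != opcode) {
                // The scheme assumes this never happens in an encoded file.
                throw new IllegalStateException(
                    "constant " + constantId + " used by both LDC and LDC_W");
            }
        }
        return indexOf;
    }

    public static void main(String[] args) {
        int[][] refs = {{LDC_W, 7}, {LDC, 3}, {LDC, 3}, {LDC_W, 9}};
        System.out.println(assign(refs, 3));
    }
}
```

In this sketch the conflict is detected rather than repaired; the text above leaves the repair to the encoder.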
This almost fixes the problem, except that an
integer, float or string constant can also be referenced
as a constant value for a field. We use an additional bit
in the access flags for a field
to encode whether a constant value int/float/string
should be assigned a high index.
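
A minimal Java sketch of such a flag bit is shown here. The text does not say which bit is used, so the value 0x8000 is purely an assumption chosen for illustration, as are the class and method names. The extra bit exists only in the packed encoding and is cleared again before the access flags are written to the reconstructed classfile.

```java
// Hypothetical sketch: the bit position and all names are assumptions.
class FieldFlagSketch {
    // Assumed spare bit in the 16-bit field access flags.
    static final int HIGH_INDEX_CONSTANT = 0x8000;

    // Encoder side: mark fields whose constant value needs a high index.
    static int encode(int accessFlags, boolean constantUsesHighIndex) {
        return constantUsesHighIndex
            ? (accessFlags | HIGH_INDEX_CONSTANT)
            : accessFlags;
    }

    // Decoder side: read the marker from the packed flags.
    static boolean constantUsesHighIndex(int packedFlags) {
        return (packedFlags & HIGH_INDEX_CONSTANT) != 0;
    }

    // Decoder side: recover the original access flags for the classfile.
    static int decode(int packedFlags) {
        return packedFlags & ~HIGH_INDEX_CONSTANT;
    }

    public static void main(String[] args) {
        int publicStaticFinal = 0x0019; // ACC_PUBLIC | ACC_STATIC | ACC_FINAL
        int packed = encode(publicStaticFinal, true);
        System.out.println(Integer.toHexString(packed));
        System.out.println(Integer.toHexString(decode(packed)));
    }
}
```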
Table 6: Compression ratios

|                | Size in KBytes                  | Size as % of jar format | Size as % of packed format             |
| Benchmark      | jar   | j0r.gz | Jazz  | Packed | j0r.gz | Jazz  | Packed | Strings | Opcodes | Ints | Refs | Misc |
| 209_db         | 6     | 5      | 4     | 3      | 84%    | 66%   | 49%    | 34%     | 28%     | 9%   | 17%  | 13%  |
| 201_compress   | 10    | 6      | 4     | 3      | 59%    | 41%   | 29%    | 29%     | 32%     | 14%  | 17%  | 8%   |
| Hanoi_jax      | 21    | 16     | 12    | 7      | 74%    | 58%   | 32%    | 21%     | 30%     | 13%  | 27%  | 9%   |
| 205_raytrace   | 24    | 15     | 12    | 7      | 64%    | 50%   | 30%    | 20%     | 33%     | 9%   | 22%  | 16%  |
| Hanoi_big      | 30    | 20     | 15    | 9      | 67%    | 52%   | 29%    | 25%     | 27%     | 14%  | 26%  | 8%   |
| Hanoi          | 46    | 31     | 23    | 13     | 67%    | 49%   | 29%    | 22%     | 29%     | 12%  | 29%  | 8%   |
| 228_jack       | 55    | 36     | 30    | 17     | 65%    | 55%   | 30%    | 32%     | 21%     | 14%  | 21%  | 11%  |
| 222_mpegaudio  | 62    | 45     | 34    | 23     | 73%    | 54%   | 37%    | 9%      | 24%     | 37%  | 12%  | 18%  |
| icebrowserbean | 116   | 88     | 80    | 39     | 76%    | 69%   | 34%    | 21%     | 31%     | 11%  | 26%  | 12%  |
| javafig_dashO  | 131   | 113    | 102   | 53     | 86%    | 78%   | 41%    | 23%     | 28%     | 8%   | 29%  | 12%  |
| 202_jess       | 136   | 64     | 42    | 23     | 47%    | 31%   | 17%    | 23%     | 28%     | 12%  | 26%  | 11%  |
| javafig        | 170   | 143    | 122   | 64     | 84%    | 71%   | 38%    | 28%     | 26%     | 8%   | 27%  | 11%  |
| jmark20        | 173   | 91     | 86    | 35     | 53%    | 50%   | 20%    | 22%     | 25%     | 13%  | 28%  | 12%  |
| 213_javac      | 226   | 143    | 90    | 50     | 63%    | 40%   | 22%    | 18%     | 29%     | 15%  | 27%  | 11%  |
| ImageEditor    | 257   | 162    | 123   | 64     | 63%    | 48%   | 25%    | 22%     | 28%     | 16%  | 24%  | 10%  |
| tools          | 737   | 513    | 477   | 204    | 70%    | 65%   | 28%    | 26%     | 27%     | 10%  | 27%  | 11%  |
| visaj          | 1,157 | 703    | 691   | 238    | 61%    | 60%   | 21%    | 23%     | 26%     | 12%  | 31%  | 8%   |
| swingall       | 1,657 | 998    | 887   | 338    | 60%    | 54%   | 20%    | 19%     | 28%     | 13%  | 31%  | 9%   |
| rt             | 4,652 | 2,820  | 8,435 | 1,069  | 61%    | 181%  | 23%    | 22%     | 28%     | 13%  | 27%  | 10%  |

jar: Size of jar file with individual class files stripped of debugging information and compressed
j0r.gz: Size of gzip of jar file with class files stripped of debugging information but not compressed
Jazz: Size of Jazz archive [BHV98] (See Section 13.1)
Packed: Size of archive produced by techniques in this paper

Figure 2: Graph of compression ratios

William Pugh