Java bytecode sequences are a mixture of opcodes, integer constants, register numbers, constant pool references and branch offsets. As has been suggested previously [EEF+97], we might be able to achieve better compression if we split that information into separate streams and compress them independently.
Note the ``we might'': the improvement isn't guaranteed, because some instruction patterns depend on the registers and other values in the bytecode sequence. For example, local 0 is initially (and generally throughout a method) used to store this (for non-static methods), so an aload instruction is much more commonly followed by a getfield instruction when the aload loads local 0. As it turns out, we would pick this pattern up even after the bytecode is separated into streams, because a special opcode (aload_0) is used for loading a reference from local 0. When we separate out the operands from the opcodes, we don't separate out the implicit operands in opcodes such as iconst_2 and aload_0.
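The separation described above can be sketched roughly as follows. This is only an illustration, not the implementation used for the results reported here: the stream names, the helper functions split_streams, merge_streams and compressed_sizes, and the use of zlib as a stand-in for gzip are all assumptions, and the opcode table covers only a handful of instructions. Note that aload_0 and iconst_2 carry no explicit operand, so their implicit operands stay in the opcode stream.

```python
import zlib

# Operand size in bytes and target stream for a few JVM opcodes.
# Assumption: a real splitter would cover the whole instruction set,
# including wide, tableswitch and lookupswitch, which are ignored here.
OPERANDS = {
    0x05: (0, None),        # iconst_2 (implicit constant stays in the opcode)
    0x2A: (0, None),        # aload_0  (implicit register stays in the opcode)
    0x60: (0, None),        # iadd
    0xAC: (0, None),        # ireturn
    0x10: (1, "ints"),      # bipush   <byte constant>
    0x15: (1, "regs"),      # iload    <local index>
    0x19: (1, "regs"),      # aload    <local index>
    0x36: (1, "regs"),      # istore   <local index>
    0xB4: (2, "cpool"),     # getfield <constant pool index>
    0xB5: (2, "cpool"),     # putfield <constant pool index>
    0xA1: (2, "branches"),  # if_icmplt <branch offset>
    0xA7: (2, "branches"),  # goto      <branch offset>
}

STREAM_NAMES = ("opcodes", "ints", "regs", "cpool", "branches")


def split_streams(code):
    """Split a bytecode sequence into one byte stream per kind of value."""
    streams = {name: bytearray() for name in STREAM_NAMES}
    pc = 0
    while pc < len(code):
        op = code[pc]
        size, name = OPERANDS[op]
        streams["opcodes"].append(op)
        if size:
            streams[name] += code[pc + 1:pc + 1 + size]
        pc += 1 + size
    return {name: bytes(data) for name, data in streams.items()}


def merge_streams(streams):
    """Rebuild the original bytecode; the opcode stream drives the merge."""
    out = bytearray()
    pos = {name: 0 for name in STREAM_NAMES}
    for op in streams["opcodes"]:
        out.append(op)
        size, name = OPERANDS[op]
        if size:
            start = pos[name]
            out += streams[name][start:start + size]
            pos[name] = start + size
    return bytes(out)


def compressed_sizes(code):
    """Compressed size of the whole sequence and of each separate stream."""
    whole = len(zlib.compress(code, 9))
    parts = {name: len(zlib.compress(data, 9))
             for name, data in split_streams(code).items()}
    return whole, parts


# aload_0; getfield #7; bipush 3; iadd; istore 1; iload 1; ireturn
SAMPLE = bytes([0x2A, 0xB4, 0x00, 0x07, 0x10, 0x03,
                0x60, 0x36, 0x01, 0x15, 0x01, 0xAC])
```

On an input this small the per-stream overhead of the compressor dominates, so compressed_sizes is only meaningful on the bytecode of whole class files or archives; the sample is there to show that the split is lossless.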
Table 4 shows sample compression factors for bytecodes, and for various components of bytecodes. As you can see, we get substantially better compression factors for a sequence of opcodes than for a sequence of bytecodes. In some unusual cases, such as mpegaudio, the compression factors are exceptionally high. The other sequences don't always compress as well, but the overall effect is a substantial win.
Although this approach substantially decreased the number of opcodes, gzipping the resulting sequence of opcodes gave a result that was only slightly better than gzipping the original sequence of opcodes (see ``Custom opcodes'' in Table 4). As implemented, computing the custom opcodes was relatively expensive, but decoding them during decompression was very inexpensive. However, given the meager improvements, I decided not to incorporate this technology in the results reported here. Using custom opcodes may be attractive in situations where gzip compression is not being used (because it is not available on the client or it is too expensive to run on the client).