Our approach to handling strings is similar to that for
objects in general.
The first time a string is encountered, we encode a special index
to indicate a value not seen before, and we
write the Unicode string using the UTF encoding. Different
categories of strings (e.g., string constants or method names)
are put into separate streams.
String lengths are written to
a stream separate from the Unicode characters (mixing the
two degrades compression).
When a string is encountered again, we encode a reference to
it using the scheme used for objects in general, as discussed
in Section 5 (e.g., the index into a
move-to-front queue or a fixed-id).
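As a rough illustration, the following Python sketch shows one way the scheme described above could be organized. It is not the authors' implementation, and several details are assumptions: the names `StringEncoder` and `decode` are hypothetical, index 0 is assumed to be the "not seen before" escape, standard UTF-8 stands in for Java's UTF encoding, and only the move-to-front variant of the reference scheme is shown (the fixed-id alternative is omitted). Each category keeps its own index, length, and character streams, so lengths never interleave with character data.

```python
from collections import defaultdict


class StringEncoder:
    """Hypothetical sketch: per-category streams, move-to-front references."""

    NEW = 0  # assumed escape index meaning "string not seen before"

    def __init__(self):
        # One set of streams per string category (e.g. "const", "method").
        self.index_streams = defaultdict(list)      # escape value or MTF reference
        self.length_streams = defaultdict(list)     # byte lengths of new strings only
        self.char_streams = defaultdict(bytearray)  # encoded characters, no lengths mixed in
        self.mtf = defaultdict(list)                # per-category move-to-front queue

    def encode(self, category, s):
        queue = self.mtf[category]
        if s in queue:
            pos = queue.index(s)
            # Reference is the queue position, shifted by one past the escape value.
            self.index_streams[category].append(pos + 1)
            queue.pop(pos)
        else:
            self.index_streams[category].append(self.NEW)
            data = s.encode("utf-8")  # stand-in for Java's UTF encoding
            self.length_streams[category].append(len(data))
            self.char_streams[category].extend(data)
        queue.insert(0, s)  # the most recently used string moves to the front


def decode(index_stream, length_stream, char_stream):
    """Rebuild the string sequence for one category from its three streams."""
    queue, out = [], []
    lengths = iter(length_stream)
    offset = 0
    for idx in index_stream:
        if idx == StringEncoder.NEW:
            n = next(lengths)
            s = bytes(char_stream[offset:offset + n]).decode("utf-8")
            offset += n
        else:
            s = queue.pop(idx - 1)  # undo the +1 shift applied by the encoder
        queue.insert(0, s)  # mirror the encoder's move-to-front update
        out.append(s)
    return out


if __name__ == "__main__":
    enc = StringEncoder()
    for name in ["foo", "bar", "foo", "foo", "bar"]:
        enc.encode("method", name)
    print(enc.index_streams["method"])   # escape values and MTF references
    print(enc.length_streams["method"])  # lengths of the two new strings
    print(bytes(enc.char_streams["method"]))
```

Encoding `foo, bar, foo, foo, bar` yields the index stream `[0, 0, 2, 1, 2]`: two escapes for the new strings, then move-to-front references. Repeated uses of a recent string produce small indices such as 1, which is the property that makes a move-to-front queue compress well.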