This question addresses issues associated with developing variable length
codes specific to
known or anticipated data sets.
A file containing 11 unique symbols
is to be encoded using Huffman coding. The frequencies of every
letter occuring in the file were tabulated, and the Huffman code in the
following table was generated by minimizing the weighted path length in the
tree. A fixed length code for the 11 unique symbols
was also generated.
Plain Text |
Huffman |
4-Bit |
Letter |
Code Word |
Code Word |
A |
100 |
0001 |
D |
1111101 |
0010 |
E |
0 |
0011 |
I |
110 |
0100 |
M |
11110 |
0101 |
O |
1111100 |
0110 |
R |
11101 |
0111 |
S |
11100 |
1000 |
T |
101 |
1001 |
space |
1111110 |
0000 |
linefeed |
1111111 |
1111 |
- 2.1 (6)
- Build the Huffman coding tree corresponding to the above code.
- 2.2 (6)
- True or False: Letter A occurs more frequently in the file
than letter I. Explain your answer for full credit.
- 2.3 (6)
- True or False: The Huffman code for a given set of
symbols and frequencies is unique. That is, exactly ONE coding scheme can
be extracted from this information. Explain
your answer for full credit.
- 2.4 (6)
- When decoding a string using a Huffman code, how do you tell
when one character stops and another one begins? That is, why don't we need
a ``new character'' or ``stop'' symbol?
- 2.5 (6)
- Is the term ``tree'' really an accurate name for the
Huffman coding tree? That is, is there another name that we could give this
structure in the context of the course thus far? Explain your answer for
full credit.
- 2.6 (6)
- If we receive another file in ASCII containing only those 9 unique symbols
and encode it using the table above,
which would you expect to produce a shorter encoded file--the
4-bit fixed length code or the Huffman code given
above? Explain your answer for full credit.
MM Hugue
2012-03-08