18 - Huffman coding

 1. Huffman coding is a (lossless) algorithm developed by David A. Huffman while he was a student at MIT
Note: You do not need to watch the video to answer the question, but it would be good to go through it to revise this topic



 2. The idea behind Huffman coding is based upon the ….

  frequency of the letters (e.g. 'A') or binary patterns (e.g. 101) in any given character set. A higher number of bits is used to encode

  None of these options

  frequency of occurrence of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently.

  infrequency of a symbol contained in binary bits

 3. In Huffman encoding the symbol that is the ___________________ in that sequence gets a new code that is very small

  least frequent

  present link

  most frequent

  furthest link

 4. The least frequent symbol will get a code that is ….

  very long

  full of binary digits

  converted to ASCII

  already compressed

 5. Since the frequencies of symbols vary across messages, there is no one Huffman coding that will work for all messages



 6. In the string, “beep boop beer", what is the frequency of the letter 'b'?
Character	Frequency
‘b’		?
‘e’		4
‘p’		2
‘ ‘		2
‘o’		2
‘r’		1
‘!’		1





 7. Spaces would count as a character and in "beep boop beer", how many are there?





 8. A tree structure is important in Huffman encoding. How would you define a tree?
Trees in Computer Science
Each element in the tree is called a node.
The node at the top is called the root node
Like a family tree, the 'parents' are placed
at a higher level, above the 'children'. 
In computing, each 'node' records the memory
locations of all the nodes to which it is 
connected (i.e parents or child nodes). It is __________________

   a physical data storage device

  a single variable concatenated together with other variables to form a node like tree structure

  binary input seqeuence that is automatically generated

  a data structure that allows efficient searching, sorting and managing of data

 9. A Binary tree is a specific type of tree. Fill in the blanks
Parent nodes can only have ____________________ and child nodes can have one parent node. Notice that each 'arriving' number or letter goes either to the left of the right of the 'root'. 

  one child node

  two child nodes

  four child nodes

  three child nodes

 10. Lossless techniques are often called 'dictionary compression'. What does this mean?

  It means a physical dictionary can be used as a guide for the algorithm to compress files

  a physical dictionary can be used to store and look up terms

  A programmed dictionary can automate the process of compression so it takes no encoding

  A short code word can be used to represent a longer pattern of data

 11. Fill in the blanks for the following explanation of dictionary compression:
Dictionary compression
A text file contains 10,000 repetitions of the same chunk of text, 
in this case a twenty letter chunk as follows:


This is a 20 letter word and therefore would require 20 bytes of storage
(one byte per letter).

That comes to 20 x 10,000 = 200KB just for the text: "ABCDEFGHIJKLMNOPQRST"

Dictionary compression would

1. ____________________________________________

2. It would then maintain an index as a reminder that it had made the

3. Doing this would reduce the file size down from 200KB to 10KB
(95% space saving in the file!)

  replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a longer sequence of letters

  replace the 20 letter word with a 15 letter random word

  replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a blank space

  replace the 20 letter word with a code that is generated to temporarily takes its place

 12. To rebuild the original data, the decompressor would read the index then replace every instance of the codewords with its full word. With this approach….

  All the data is lost

  No data has been lost

  Most data has been lost

  Only a very small proportion of data has been lost

 13. Fill in the blanks for the following (explanation of how the tree is created)
Lets say you have a set of numbers and their frequency of use and want to create a 
huffman encoding for them:

        ---------       -----
       	     5            1
             7            2
            10            3
            15            4
            20            5
            45            6

Creating a huffman tree is simple. Sort this list by frequency and make the
__________________________________, creating a parent node with a frequency 
that is the sum of the two lower element's frequencies:

    	/  \
      5:1   7:2

  two-lowest elements (combined) into the root node

  two-highest elements into root nodes

  two-lowest elements into parents

  two-lowest elements into leaves

 14. The shortest code word possible is a ………

  4 bit nibble

  single bit (0 or 1) used to represent the most common item in the file

  16 bit binary sequence

  1 byte code word used to represent the most common word

 15. The principle of Huffman encoding is that you are encoding as efficiently as possible, using….

  short code words to represent the least frequent words

  automatic dictionary compression, for which no coding is necessary

  the longest code words to represent every single byte

  the shortest code words to represent the most common words

 16. Fill in the blanks - What are the three main steps involved in Huffman encoding?
Huffman encoding, the three main steps
1. Create a _________________________
(count how many occurences of each letter is in the word
and place it in a table)

2. Create a _________________________ from the frequency data.
(most common data item nearest the top and less common further down)

3. Create __________________.
('walk' down the path to the letter from the root (rightmost) node,
taking note of the binary value on each branch that is travelled to
get to it. 

  1. Binary sequences 2. Tree 3. Code Word

  1. Binary Table 2. Tree 3. Binary sequence

  1. Frequency table 2. Tree 3. Code Words

  1. Tree 2. Huffman Tree 3. Huffman codes

 17. In the following diagram, note the following, and fill in the blanks where appropriate:
The 12 is derived from _________________
and should match the total frequency of the 
data we are compressing.

If it is too low or too high, 
a mistake has been made.

Note: Each right (upper/higher) branch has 
been labelled with a 1 and every left branch 
(lower) has been labelled with a 0.

  7 + 7

  12 + 0

  1+2 (concatenated to give 12)

  5 + 7

 18. The C = 3 in this tree means ….

  The frequency of all the words beginning with C is 3

  None of the above

  The frequency of '3' is C

  The frequency of 'C' in the file is 3

 19. To work out the code word for any letter, 'walk' down the path to the letter from the root (rightmost) node.
What is the code word, in reference to this diagram, for the letter 'c'?

  Answer: 1100

  Answer: 1011

  Answer: 0011

  Answer: 1110

 20. What is the code word, in reference to the diagram, for the letter 'd'?

  Answer: 0100

  Answer: 0011

  Answer: 1000


 21. From the diagram, we can deduce that ____ is the most frequent letter because it has the smallest code possible.





 22. Size of file =
Note: multiply the length of the code word (in bits) by the frequency of the code word, and this will give the total number of bits used to encode the letter in the compressed file. We can then add up the total bits used for each letter to get the size of the final file. 

  None of the above

  SUBTRACT (code word length x code word frequency)

  SUM(code word length x code word frequency)

  code word length x code word length

 23. Complete the calculation to show the saving calculation in this particular case.
Huffman encoding
The term "abracadabra" is stored as: 01101001110011110110100 
Add up the number of 1s and 0s and this will give you the number of bits required to store.  

Total = 23 bits. 
Compare this with the space needed to store in ASCII:  11 characters = 11 bytes 
11 x 8 bits = 88 bits. 
The saving in this case would be: 

  88 + 23 = 111

  88 – 23 = 65 bits saved.

  88 bits saved

  None of the above

 24. For “the big bugbit the little beetle” the encoding for 't' would be _________ and for 'b' would be______
A useful resource with examples for this topic can be found at:

  t = 00, b =00

  t = 10, b = 001

  t = 11, b = 11

  t = 01, b = 100

 25. Fill in the blanks to show what the calculation for space saving is:
Huffman encoding space saving
The phrase “the big bug bit the little beetle” can now be represented 
in binary as: 01 1010 110 00 100 1011 11111 00 100 11110 11111 00 100 
1011 01 00 01 1010 110 00 1110 1011 01 01 1110 110 00 100 110 110 01 1110 110 
Count up the number of 1s and 0s to calculate the number of bits required 
to store.  

Total = 101 
Compare this with the space needed to store in ASCII:  33 characters 

The calculation is therefore: ____________

  264 + 264 = bits saved

  33 x 8 bits = 264 bits. 264 - 101 = bits saved

  101 - 33 = bits saved

  101 +33 = bits saved