Preview

18 - Huffman coding

 1. Huffman coding is a (lossless) algorithm developed by David A. Huffman while he was a student at MIT
Note: You do not need to watch the video to answer the question, but it would be good to go through it to revise this topic

  TRUE

  FALSE

 2. The idea behind Huffman coding is based upon the ….

  frequency of the letters (e.g. 'A') or binary patterns (e.g. 101) in any given character set. A higher number of bits is used to encode

  infrequency of a symbol contained in binary bits

  frequency of occurrence of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently.

  None of these options

 3. In Huffman encoding the symbol that is the ___________________ in that sequence gets a new code that is very small

  present link

  furthest link

  least frequent

  most frequent

 4. The least frequent symbol will get a code that is ….

  converted to ASCII

  very long

  full of binary digits

  already compressed

 5. Since the frequencies of symbols vary across messages, there is no one Huffman coding that will work for all messages

  FALSE

  TRUE

 6. In the string, “beep boop beer", what is the frequency of the letter 'b'?
Character	Frequency
‘b’		?
‘e’		4
‘p’		2
‘ ‘		2
‘o’		2
‘r’		1
‘!’		1

  2

  5

  3

  4

 7. Spaces would count as a character and in "beep boop beer", how many are there?

  4

  5

  6

  2

 8. A tree structure is important in Huffman encoding. How would you define a tree?
Trees in Computer Science
=========================
Each element in the tree is called a node.
The node at the top is called the root node
Like a family tree, the 'parents' are placed
at a higher level, above the 'children'. 
In computing, each 'node' records the memory
locations of all the nodes to which it is 
connected (i.e parents or child nodes). It is __________________
______________________________________________
datarep_huffman_q1.png

  a single variable concatenated together with other variables to form a node like tree structure

   a physical data storage device

  binary input seqeuence that is automatically generated

  a data structure that allows efficient searching, sorting and managing of data

 9. A Binary tree is a specific type of tree. Fill in the blanks
Parent nodes can only have ____________________ and child nodes can have one parent node. Notice that each 'arriving' number or letter goes either to the left of the right of the 'root'. 
datarep_huffman_q2.jpg

  one child node

  four child nodes

  three child nodes

  two child nodes

 10. Lossless techniques are often called 'dictionary compression'. What does this mean?

  A programmed dictionary can automate the process of compression so it takes no encoding

  a physical dictionary can be used to store and look up terms

  It means a physical dictionary can be used as a guide for the algorithm to compress files

  A short code word can be used to represent a longer pattern of data

 11. Fill in the blanks for the following explanation of dictionary compression:
Dictionary compression
=======================
A text file contains 10,000 repetitions of the same chunk of text, 
in this case a twenty letter chunk as follows:

"ABCDEFGHIJKLMNOPQRST"

This is a 20 letter word and therefore would require 20 bytes of storage
(one byte per letter).

That comes to 20 x 10,000 = 200KB just for the text: "ABCDEFGHIJKLMNOPQRST"

Dictionary compression would

1. ____________________________________________

2. It would then maintain an index as a reminder that it had made the
replacement

3. Doing this would reduce the file size down from 200KB to 10KB
(95% space saving in the file!)

  replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a blank space

  replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a longer sequence of letters

  replace the 20 letter word with a 15 letter random word

  replace the 20 letter word with a code that is generated to temporarily takes its place

 12. To rebuild the original data, the decompressor would read the index then replace every instance of the codewords with its full word. With this approach….

  All the data is lost

  Most data has been lost

  No data has been lost

  Only a very small proportion of data has been lost

 13. Fill in the blanks for the following (explanation of how the tree is created)
Lets say you have a set of numbers and their frequency of use and want to create a 
huffman encoding for them:

	FREQUENCY	VALUE
        ---------       -----
       	     5            1
             7            2
            10            3
            15            4
            20            5
            45            6

Creating a huffman tree is simple. Sort this list by frequency and make the
__________________________________, creating a parent node with a frequency 
that is the sum of the two lower element's frequencies:

    	12:*
    	/  \
      5:1   7:2

  two-lowest elements (combined) into the root node

  two-highest elements into root nodes

  two-lowest elements into parents

  two-lowest elements into leaves

 14. The shortest code word possible is a ………

  single bit (0 or 1) used to represent the most common item in the file

  16 bit binary sequence

  1 byte code word used to represent the most common word

  4 bit nibble

 15. The principle of Huffman encoding is that you are encoding as efficiently as possible, using….

  automatic dictionary compression, for which no coding is necessary

  the longest code words to represent every single byte

  short code words to represent the least frequent words

  the shortest code words to represent the most common words

 16. Fill in the blanks - What are the three main steps involved in Huffman encoding?
Huffman encoding, the three main steps
=====================================
1. Create a _________________________
(count how many occurences of each letter is in the word
and place it in a table)

2. Create a _________________________ from the frequency data.
(most common data item nearest the top and less common further down)


3. Create __________________.
('walk' down the path to the letter from the root (rightmost) node,
taking note of the binary value on each branch that is travelled to
get to it. 

  1. Frequency table 2. Tree 3. Code Words

  1. Binary sequences 2. Tree 3. Code Word

  1. Binary Table 2. Tree 3. Binary sequence

  1. Tree 2. Huffman Tree 3. Huffman codes

 17. In the following diagram, note the following, and fill in the blanks where appropriate:
The 12 is derived from _________________
and should match the total frequency of the 
data we are compressing.

If it is too low or too high, 
a mistake has been made.

Note: Each right (upper/higher) branch has 
been labelled with a 1 and every left branch 
(lower) has been labelled with a 0.
datarep_huffman_q3.jpg

  12 + 0

  7 + 7

  5 + 7

  1+2 (concatenated to give 12)

 18. The C = 3 in this tree means ….
datarep_huffman_q3.jpg

  The frequency of '3' is C

  The frequency of 'C' in the file is 3

  The frequency of all the words beginning with C is 3

  None of the above

 19. To work out the code word for any letter, 'walk' down the path to the letter from the root (rightmost) node.
What is the code word, in reference to this diagram, for the letter 'c'?
datarep_huffman_q4.png

  Answer: 1110

  Answer: 1011

  Answer: 1100

  Answer: 0011

 20. What is the code word, in reference to the diagram, for the letter 'd'?
datarep_huffman_q5.png

  Answer:0000

  Answer: 1000

  Answer: 0011

  Answer: 0100

 21. From the diagram, we can deduce that ____ is the most frequent letter because it has the smallest code possible.
datarep_huffman_q5.png

  a

  b

  d

  c

 22. Size of file =
Note: multiply the length of the code word (in bits) by the frequency of the code word, and this will give the total number of bits used to encode the letter in the compressed file. We can then add up the total bits used for each letter to get the size of the final file. 

  SUBTRACT (code word length x code word frequency)

  None of the above

  code word length x code word length

  SUM(code word length x code word frequency)

 23. Complete the calculation to show the saving calculation in this particular case.
Huffman encoding
=================
The term "abracadabra" is stored as: 01101001110011110110100 
 
Add up the number of 1s and 0s and this will give you the number of bits required to store.  

Total = 23 bits. 
 
Compare this with the space needed to store in ASCII:  11 characters = 11 bytes 
 
11 x 8 bits = 88 bits. 
 
The saving in this case would be: 

  88 + 23 = 111

  None of the above

  88 bits saved

  88 – 23 = 65 bits saved.

 24. For “the big bugbit the little beetle” the encoding for 't' would be _________ and for 'b' would be______
A useful resource with examples for this topic can be found at: http://www.paullong.net/downloads/huffmancoding.pdf

  t = 10, b = 001

  t = 00, b =00

  t = 11, b = 11

  t = 01, b = 100

 25. Fill in the blanks to show what the calculation for space saving is:
Huffman encoding space saving
==============================
The phrase “the big bug bit the little beetle” can now be represented 
in binary as: 01 1010 110 00 100 1011 11111 00 100 11110 11111 00 100 
1011 01 00 01 1010 110 00 1110 1011 01 01 1110 110 00 100 110 110 01 1110 110 
 
Count up the number of 1s and 0s to calculate the number of bits required 
to store.  

Total = 101 
 
Compare this with the space needed to store in ASCII:  33 characters 

The calculation is therefore: ____________

  101 - 33 = bits saved

  264 + 264 = bits saved

  101 +33 = bits saved

  33 x 8 bits = 264 bits. 264 - 101 = bits saved