Preview

18 - Huffman coding

1. Huffman coding is a (lossless) algorithm developed by David A. Huffman while he was a student at MIT

Note: You do not need to watch the video to answer the question, but it would be good to go through it to revise this topic

FALSE

TRUE

2. The idea behind Huffman coding is based upon the ….

frequency of occurrence of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently.

infrequency of a symbol contained in binary bits

frequency of the letters (e.g. 'A') or binary patterns (e.g. 101) in any given character set. A higher number of bits is used to encode

None of these options

3. In Huffman encoding the symbol that is the ___________________ in that sequence gets a new code that is very small

least frequent

most frequent

present link

furthest link

4. The least frequent symbol will get a code that is ….

already compressed

full of binary digits

very long

converted to ASCII

5. Since the frequencies of symbols vary across messages, there is no one Huffman coding that will work for all messages

FALSE

TRUE

6. In the string, “beep boop beer", what is the frequency of the letter 'b'?

Character	Frequency
‘b’		?
‘e’		4
‘p’		2
‘ ‘		2
‘o’		2
‘r’		1
‘!’		1

7. Spaces would count as a character and in "beep boop beer", how many are there?

8. A tree structure is important in Huffman encoding. How would you define a tree?

Trees in Computer Science
=========================
Each element in the tree is called a node.
The node at the top is called the root node
Like a family tree, the 'parents' are placed
at a higher level, above the 'children'. 
In computing, each 'node' records the memory
locations of all the nodes to which it is 
connected (i.e parents or child nodes). It is __________________
______________________________________________

binary input seqeuence that is automatically generated

a physical data storage device

a single variable concatenated together with other variables to form a node like tree structure

a data structure that allows efficient searching, sorting and managing of data

9. A Binary tree is a specific type of tree. Fill in the blanks

Parent nodes can only have ____________________ and child nodes can have one parent node. Notice that each 'arriving' number or letter goes either to the left of the right of the 'root'.

three child nodes

two child nodes

one child node

four child nodes

10. Lossless techniques are often called 'dictionary compression'. What does this mean?

A short code word can be used to represent a longer pattern of data

a physical dictionary can be used to store and look up terms

It means a physical dictionary can be used as a guide for the algorithm to compress files

A programmed dictionary can automate the process of compression so it takes no encoding

11. Fill in the blanks for the following explanation of dictionary compression:

Dictionary compression
=======================
A text file contains 10,000 repetitions of the same chunk of text, 
in this case a twenty letter chunk as follows:

"ABCDEFGHIJKLMNOPQRST"

This is a 20 letter word and therefore would require 20 bytes of storage
(one byte per letter).

That comes to 20 x 10,000 = 200KB just for the text: "ABCDEFGHIJKLMNOPQRST"

Dictionary compression would

1. ____________________________________________

2. It would then maintain an index as a reminder that it had made the
replacement

3. Doing this would reduce the file size down from 200KB to 10KB
(95% space saving in the file!)

replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a longer sequence of letters

replace the 20 letter word with a code that is generated to temporarily takes its place

replace the 20 letter word with a 15 letter random word

replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a blank space

12. To rebuild the original data, the decompressor would read the index then replace every instance of the codewords with its full word. With this approach….

No data has been lost

Only a very small proportion of data has been lost

All the data is lost

Most data has been lost

13. Fill in the blanks for the following (explanation of how the tree is created)

Lets say you have a set of numbers and their frequency of use and want to create a 
huffman encoding for them:

	FREQUENCY	VALUE
        ---------       -----
       	     5            1
             7            2
            10            3
            15            4
            20            5
            45            6

Creating a huffman tree is simple. Sort this list by frequency and make the
__________________________________, creating a parent node with a frequency 
that is the sum of the two lower element's frequencies:

    	12:*
    	/  \
      5:1   7:2

two-lowest elements into parents

two-lowest elements into leaves

two-highest elements into root nodes

two-lowest elements (combined) into the root node

14. The shortest code word possible is a ………

single bit (0 or 1) used to represent the most common item in the file

4 bit nibble

1 byte code word used to represent the most common word

16 bit binary sequence

15. The principle of Huffman encoding is that you are encoding as efficiently as possible, using….

automatic dictionary compression, for which no coding is necessary

the shortest code words to represent the most common words

short code words to represent the least frequent words

the longest code words to represent every single byte

16. Fill in the blanks - What are the three main steps involved in Huffman encoding?

Huffman encoding, the three main steps
=====================================
1. Create a _________________________
(count how many occurences of each letter is in the word
and place it in a table)

2. Create a _________________________ from the frequency data.
(most common data item nearest the top and less common further down)


3. Create __________________.
('walk' down the path to the letter from the root (rightmost) node,
taking note of the binary value on each branch that is travelled to
get to it.

1. Tree 2. Huffman Tree 3. Huffman codes

1. Binary Table 2. Tree 3. Binary sequence

1. Frequency table 2. Tree 3. Code Words

1. Binary sequences 2. Tree 3. Code Word

17. In the following diagram, note the following, and fill in the blanks where appropriate:

The 12 is derived from _________________
and should match the total frequency of the 
data we are compressing.

If it is too low or too high, 
a mistake has been made.

Note: Each right (upper/higher) branch has 
been labelled with a 1 and every left branch 
(lower) has been labelled with a 0.

12 + 0

7 + 7

1+2 (concatenated to give 12)

5 + 7

18. The C = 3 in this tree means ….

The frequency of all the words beginning with C is 3

None of the above

The frequency of 'C' in the file is 3

The frequency of '3' is C

19. To work out the code word for any letter, 'walk' down the path to the letter from the root (rightmost) node.

What is the code word, in reference to this diagram, for the letter 'c'?

Answer: 1011

Answer: 1100

Answer: 0011

Answer: 1110

20. What is the code word, in reference to the diagram, for the letter 'd'?

Answer: 0100

Answer: 0011

Answer: 1000

Answer:0000

21. From the diagram, we can deduce that ____ is the most frequent letter because it has the smallest code possible.

22. Size of file =

Note: multiply the length of the code word (in bits) by the frequency of the code word, and this will give the total number of bits used to encode the letter in the compressed file. We can then add up the total bits used for each letter to get the size of the final file.

code word length x code word length

SUM(code word length x code word frequency)

None of the above

SUBTRACT (code word length x code word frequency)

23. Complete the calculation to show the saving calculation in this particular case.

Huffman encoding
=================
The term "abracadabra" is stored as: 01101001110011110110100 
 
Add up the number of 1s and 0s and this will give you the number of bits required to store.  

Total = 23 bits. 
 
Compare this with the space needed to store in ASCII:  11 characters = 11 bytes 
 
11 x 8 bits = 88 bits. 
 
The saving in this case would be:

88 + 23 = 111

88 bits saved

88 – 23 = 65 bits saved.

None of the above

24. For “the big bugbit the little beetle” the encoding for 't' would be _________ and for 'b' would be______

A useful resource with examples for this topic can be found at: http://www.paullong.net/downloads/huffmancoding.pdf

t = 11, b = 11

t = 01, b = 100

t = 10, b = 001

t = 00, b =00

25. Fill in the blanks to show what the calculation for space saving is:

Huffman encoding space saving
==============================
The phrase “the big bug bit the little beetle” can now be represented 
in binary as: 01 1010 110 00 100 1011 11111 00 100 11110 11111 00 100 
1011 01 00 01 1010 110 00 1110 1011 01 01 1110 110 00 100 110 110 01 1110 110 
 
Count up the number of 1s and 0s to calculate the number of bits required 
to store.  

Total = 101 
 
Compare this with the space needed to store in ASCII:  33 characters 

The calculation is therefore: ____________

101 +33 = bits saved

33 x 8 bits = 264 bits. 264 - 101 = bits saved

264 + 264 = bits saved

101 - 33 = bits saved

Report an error