# 18 - Huffman coding

1. Huffman coding is a (lossless) algorithm developed by David A. Huffman while he was a student at MIT
`Note: You do not need to watch the video to answer the question, but it would be good to go through it to revise this topic`

TRUE

FALSE

2. The idea behind Huffman coding is based upon the ….

frequency of the letters (e.g. 'A') or binary patterns (e.g. 101) in any given character set. A higher number of bits is used to encode

None of these options

frequency of occurrence of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently.

infrequency of a symbol contained in binary bits

3. In Huffman encoding the symbol that is the ___________________ in that sequence gets a new code that is very small

least frequent

most frequent

4. The least frequent symbol will get a code that is ….

very long

full of binary digits

converted to ASCII

5. Since the frequencies of symbols vary across messages, there is no one Huffman coding that will work for all messages

TRUE

FALSE

6. In the string, “beep boop beer", what is the frequency of the letter 'b'?
```Character	Frequency
‘b’		?
‘e’		4
‘p’		2
‘ ‘		2
‘o’		2
‘r’		1
‘!’		1```

2

4

3

5

7. Spaces would count as a character and in "beep boop beer", how many are there?

6

5

4

2

8. A tree structure is important in Huffman encoding. How would you define a tree?
```Trees in Computer Science
=========================
Each element in the tree is called a node.
The node at the top is called the root node
Like a family tree, the 'parents' are placed
at a higher level, above the 'children'.
In computing, each 'node' records the memory
locations of all the nodes to which it is
connected (i.e parents or child nodes). It is __________________
______________________________________________
``` a physical data storage device

a single variable concatenated together with other variables to form a node like tree structure

binary input seqeuence that is automatically generated

a data structure that allows efficient searching, sorting and managing of data

9. A Binary tree is a specific type of tree. Fill in the blanks
`Parent nodes can only have ____________________ and child nodes can have one parent node. Notice that each 'arriving' number or letter goes either to the left of the right of the 'root'. ` one child node

two child nodes

four child nodes

three child nodes

10. Lossless techniques are often called 'dictionary compression'. What does this mean?

It means a physical dictionary can be used as a guide for the algorithm to compress files

a physical dictionary can be used to store and look up terms

A programmed dictionary can automate the process of compression so it takes no encoding

A short code word can be used to represent a longer pattern of data

11. Fill in the blanks for the following explanation of dictionary compression:
```Dictionary compression
=======================
A text file contains 10,000 repetitions of the same chunk of text,
in this case a twenty letter chunk as follows:

"ABCDEFGHIJKLMNOPQRST"

This is a 20 letter word and therefore would require 20 bytes of storage
(one byte per letter).

That comes to 20 x 10,000 = 200KB just for the text: "ABCDEFGHIJKLMNOPQRST"

Dictionary compression would

1. ____________________________________________

2. It would then maintain an index as a reminder that it had made the
replacement

3. Doing this would reduce the file size down from 200KB to 10KB
(95% space saving in the file!)```

replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a longer sequence of letters

replace the 20 letter word with a 15 letter random word

replace the 20 letter "ABCDEFGHIJKLMNOPQRST" with a blank space

replace the 20 letter word with a code that is generated to temporarily takes its place

12. To rebuild the original data, the decompressor would read the index then replace every instance of the codewords with its full word. With this approach….

All the data is lost

No data has been lost

Most data has been lost

Only a very small proportion of data has been lost

13. Fill in the blanks for the following (explanation of how the tree is created)
```Lets say you have a set of numbers and their frequency of use and want to create a
huffman encoding for them:

FREQUENCY	VALUE
---------       -----
5            1
7            2
10            3
15            4
20            5
45            6

Creating a huffman tree is simple. Sort this list by frequency and make the
__________________________________, creating a parent node with a frequency
that is the sum of the two lower element's frequencies:

12:*
/  \
5:1   7:2```

two-lowest elements (combined) into the root node

two-highest elements into root nodes

two-lowest elements into parents

two-lowest elements into leaves

14. The shortest code word possible is a ………

4 bit nibble

single bit (0 or 1) used to represent the most common item in the file

16 bit binary sequence

1 byte code word used to represent the most common word

15. The principle of Huffman encoding is that you are encoding as efficiently as possible, using….

short code words to represent the least frequent words

automatic dictionary compression, for which no coding is necessary

the longest code words to represent every single byte

the shortest code words to represent the most common words

16. Fill in the blanks - What are the three main steps involved in Huffman encoding?
```Huffman encoding, the three main steps
=====================================
1. Create a _________________________
(count how many occurences of each letter is in the word
and place it in a table)

2. Create a _________________________ from the frequency data.
(most common data item nearest the top and less common further down)

3. Create __________________.
('walk' down the path to the letter from the root (rightmost) node,
taking note of the binary value on each branch that is travelled to
get to it. ```

1. Binary sequences 2. Tree 3. Code Word

1. Binary Table 2. Tree 3. Binary sequence

1. Frequency table 2. Tree 3. Code Words

1. Tree 2. Huffman Tree 3. Huffman codes

17. In the following diagram, note the following, and fill in the blanks where appropriate:
```The 12 is derived from _________________
and should match the total frequency of the
data we are compressing.

If it is too low or too high,

Note: Each right (upper/higher) branch has
been labelled with a 1 and every left branch
(lower) has been labelled with a 0.``` 7 + 7

12 + 0

1+2 (concatenated to give 12)

5 + 7

18. The C = 3 in this tree means …. The frequency of all the words beginning with C is 3

None of the above

The frequency of '3' is C

The frequency of 'C' in the file is 3

19. To work out the code word for any letter, 'walk' down the path to the letter from the root (rightmost) node.
`What is the code word, in reference to this diagram, for the letter 'c'?` 20. What is the code word, in reference to the diagram, for the letter 'd'? 21. From the diagram, we can deduce that ____ is the most frequent letter because it has the smallest code possible. a

b

c

d

22. Size of file =
`Note: multiply the length of the code word (in bits) by the frequency of the code word, and this will give the total number of bits used to encode the letter in the compressed file. We can then add up the total bits used for each letter to get the size of the final file. `

None of the above

SUBTRACT (code word length x code word frequency)

SUM(code word length x code word frequency)

code word length x code word length

23. Complete the calculation to show the saving calculation in this particular case.
```Huffman encoding
=================
The term "abracadabra" is stored as: 01101001110011110110100

Add up the number of 1s and 0s and this will give you the number of bits required to store.

Total = 23 bits.

Compare this with the space needed to store in ASCII:  11 characters = 11 bytes

11 x 8 bits = 88 bits.

The saving in this case would be: ```

88 + 23 = 111

88 – 23 = 65 bits saved.

88 bits saved

None of the above

24. For “the big bugbit the little beetle” the encoding for 't' would be _________ and for 'b' would be______
`A useful resource with examples for this topic can be found at: http://www.paullong.net/downloads/huffmancoding.pdf`

t = 00, b =00

t = 10, b = 001

t = 11, b = 11

t = 01, b = 100

25. Fill in the blanks to show what the calculation for space saving is:
```Huffman encoding space saving
==============================
The phrase “the big bug bit the little beetle” can now be represented
in binary as: 01 1010 110 00 100 1011 11111 00 100 11110 11111 00 100
1011 01 00 01 1010 110 00 1110 1011 01 01 1110 110 00 100 110 110 01 1110 110

Count up the number of 1s and 0s to calculate the number of bits required
to store.

Total = 101

Compare this with the space needed to store in ASCII:  33 characters

The calculation is therefore: ____________```

264 + 264 = bits saved

33 x 8 bits = 264 bits. 264 - 101 = bits saved

101 - 33 = bits saved

101 +33 = bits saved