π Background Information
Huffman codes are a prefix code that can be used to compress a text file in a lossless way. In order to create a Huffman code from a string of data, you create a binary tree which encodes the characters and their frequencies in the string. Based on that tree, you can create a table which relates different characters to binary strings based on their frequencies.
Oh, and the answer to the joke in the title of the lab? RAR files π¦
π― Problem Statement
Write a pair of functions which can compress and decompress a string using Huffman codes.
β Acceptance Criteria
- The compression function should take a plaintext string as its only argument and generate (1) a table of frequencies and codes for each character as well as (2) a string of zeros and ones which represent the compressed text. You can choose to return these values directly from the function or have them written to disk - whatever is easiest for you.
- The decompression function should take the string of zeros and ones along with the table of frequencies and codes that were generated by the compression function as arguments. Then, it should return the original string provided by the user.
- You must generate the Huffman codes in your compression function from scratch. You cannot use any built-in Huffman coding utilities in the language you are writing your code in.
- You are welcome to use the built-in data structures provided by your language (tree, hashmap, etc.).
π Dev Notes
- Your compression should differentiate capital and lower case letters. They are different!
- Your compression should include symbols and spaces. They are important when recreating your original text!
- Normally, you would probably return the compressed text as a binary of zeros and ones. In this lab, just return a string of zeros and ones - it will make it easier to debug and grade your code. Of course, converting that string to zeros and ones is not too difficult if you are interested in experimenting with it in the future.
π₯οΈ Example Output
Suppose I want to compress the text βHello Worldβ. This might look like:
compress("Hello World")
The function might write a table to disk as follows:
Symbol | Frequency | Code |
---|---|---|
space | 1 | 000 |
H | 1 | 1110 |
e | 1 | 1111 |
l | 3 | 10 |
o | 2 | 110 |
W | 1 | 001 |
r | 1 | 010 |
d | 1 | 011 |
It might also return the following compressed string: β11101111101011000000111001010011β.
If I wanted to decompress the compressed string, I might write:
decompress("11101111101011000000111001010011")
This might read the table from disk and use it to return βHello Worldβ.
π Thought Provoking Questions
- Would your code work for text written in Japanese? Why or why not?
- If a character shows up more frequently in the text, it tends to have a smaller sized code. Why is this a good thing?
πΌ Add-Ons For the Portfolio
N/A
π Useful Links
π Works Cited
N/A