What is a Dinosaur's Favorite Form of Compression?

🔖 Background Information

Huffman codes are a prefix code that can be used to compress a text file in a lossless way. In order to create a Huffman code from a string of data, you create a binary tree which encodes the characters and their frequencies in the string. Based on that tree, you can create a table which relates different characters to binary strings based on their frequencies.

Oh, and the answer to the joke in the title of the lab? RAR files 🦖

🎯 Problem Statement

Write a pair of functions which can compress and decompress a string using Huffman codes.

✅ Acceptance Criteria

The compression function should take a plaintext string as its only argument and generate (1) a table of frequencies and codes for each character as well as (2) a string of zeros and ones which represent the compressed text. You can choose to return these values directly from the function or have them written to disk - whatever is easiest for you.
The decompression function should take the string of zeros and ones along with the table of frequencies and codes that were generated by the compression function as arguments. Then, it should return the original string provided by the user.
You must generate the Huffman codes in your compression function from scratch. You cannot use any built-in Huffman coding utilities in the language you are writing your code in.
You are welcome to use the built-in data structures provided by your language (tree, hashmap, etc.).

📋 Dev Notes

Your compression should differentiate capital and lower case letters. They are different!
Your compression should include symbols and spaces. They are important when recreating your original text!
Normally, you would probably return the compressed text as a binary of zeros and ones. In this lab, just return a string of zeros and ones - it will make it easier to debug and grade your code. Of course, converting that string to zeros and ones is not too difficult if you are interested in experimenting with it in the future.

🖥️ Example Output

Suppose I want to compress the text “Hello World”. This might look like:

compress("Hello World")

The function might write a table to disk as follows:

Symbol	Frequency	Code
space	1	000
H	1	1110
e	1	1111
l	3	10
o	2	110
W	1	001
r	1	010
d	1	011

It might also return the following compressed string: “11101111101011000000111001010011”.

If I wanted to decompress the compressed string, I might write:

decompress("11101111101011000000111001010011")

This might read the table from disk and use it to return “Hello World”.

📝 Thought Provoking Questions

Would your code work for text written in Japanese? Why or why not?
If a character shows up more frequently in the text, it tends to have a smaller sized code. Why is this a good thing?

More Than Equations

Explorer

What is a Dinosaur's Favorite Form of Compression?

🔖 Background Information

🎯 Problem Statement

✅ Acceptance Criteria

📋 Dev Notes

🖥️ Example Output

📝 Thought Provoking Questions

💼 Add-Ons For the Portfolio

🔗 Useful Links

📘 Works Cited

Graph View

Table of Contents

Backlinks