Software engineers have always developed new ways of fitting a lot of data into a small space. It was true when our hard drives were tiny, and the advent of the internet has just made it more critical. File compression plays a big part in connecting us, letting us send less data down the line so we can have faster downloads and fit more connections onto busy networks.
Explaining exactly how compression works would involve some very complicated math, certainly more than we can cover in this article, but you don’t need to understand the math precisely to understand the basics.
The most popular libraries for compressing text rely on two compression algorithms, using both at the same time to achieve very high compression ratios. These two algorithms are “LZ77” and “Huffman coding.” Huffman coding is quite complicated, and we won’t be going into detail on it here. In short, it uses some fancy math to assign shorter binary codes to the letters that appear most often, shrinking file sizes in the process. If you want to learn more about it, check out this article on how the code works, or this explainer by Computerphile.
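The core idea is easy to sketch, even if real encoders use a more elaborate variant. Here is a minimal, illustrative Huffman coder in Python; the function name and sample string are our own for this example:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # One heap entry per symbol: (frequency, tiebreak id, {symbol: code}).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes
        # with 0 and 1. Rare symbols sink deeper and get longer codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("compression is fun")
# "s" appears three times, "f" only once, so "s" gets the shorter code.
print(codes)
```

The key property is that the resulting codes are prefix-free, so the bitstream can be decoded unambiguously even though the codes have different lengths.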
LZ77, on the other hand, is relatively simple and is what we’ll be talking about here. It seeks to remove duplicate words and replace them with a smaller “key” that represents the word.
Take a short piece of text that uses the word “howtogeek” three times. The LZ77 algorithm would look at that text, notice the repetition, keep the first “howtogeek,” and swap the later occurrences for a short key such as (h). Then, when it wants to read the text back, it would replace every instance of (h) with “howtogeek,” bringing us back to the original phrase.
We call compression like this “lossless”—the data you put in is the same as the data you get out. Nothing is lost.
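That round trip can be sketched in a few lines of Python. This is a toy version of the “key” scheme described above, not how real LZ77 stores anything (and it would break if the key text already appeared in the input), but it shows the lossless property directly:

```python
def compress_with_keys(text, word, key):
    # Keep the first occurrence of `word`, replace every later one with `key`.
    first = text.find(word)
    if first == -1:
        return text
    head = text[:first + len(word)]
    tail = text[first + len(word):].replace(word, key)
    return head + tail

def decompress_with_keys(text, word, key):
    # Expand every key back into the full word.
    return text.replace(key, word)

original = "howtogeek is great, howtogeek is useful, howtogeek is fun"
packed = compress_with_keys(original, "howtogeek", "(h)")
print(packed)
# Lossless: decompressing returns exactly the text we started with.
print(decompress_with_keys(packed, "howtogeek", "(h)") == original)
```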
In reality, LZ77 doesn’t use a list of keys. Instead, it replaces the second and third occurrences with a reference pointing back in memory, essentially saying “go back this far and copy this many characters.”
So now, when it reaches one of those references, it will look back to the original “howtogeek” and read that instead.
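Here is a compact (and deliberately slow) Python sketch of that back-reference idea: each output entry says “go back offset characters, copy length characters, then append one new character.” Real implementations are far more optimized, but the round trip below is genuinely lossless:

```python
def lz77_compress(data, window=255):
    # Emit (offset, length, next_char) triples. Offset 0 means
    # "no match found, this is just a literal character".
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        for off in range(1, i - start + 1):
            length = 0
            # `length % off` lets a match overlap the text it is copying,
            # which is how "abababab" compresses from a single "ab".
            while (i + length < len(data) - 1
                   and data[i - off + (length % off)] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    text = []
    for off, length, ch in triples:
        for _ in range(length):
            text.append(text[-off])  # copy one character from `off` back
        text.append(ch)
    return "".join(text)

msg = "howtogeek howtogeek howtogeek"
triples = lz77_compress(msg)
print(triples)
print(lz77_decompress(triples) == msg)
```

Note how the second and third “howtogeek” collapse into a single long back-reference, exactly as described above.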
If you’re interested in a more detailed explanation, this video from Computerphile is pretty helpful.
Now, this is an idealized example. In reality, most text is compressed by matching repeated sequences as short as a few characters. For example, the sequence “the” would be reused even when it appears inside words like “there,” “their,” and “then.” With repetitive text, you can get some crazy compression ratios. Take a text file with the word “howtogeek” repeated 100 times. The original text file is three kilobytes in size. When compressed, though, it only takes up 158 bytes. That’s nearly 95% compression.
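You can verify this kind of ratio yourself with Python’s built-in zlib module, which implements DEFLATE, exactly the LZ77-plus-Huffman combination described earlier (the exact byte counts vary slightly between zlib versions):

```python
import zlib

# The word "howtogeek " repeated 100 times, as raw bytes.
text = ("howtogeek " * 100).encode()
packed = zlib.compress(text, level=9)

print(f"original:   {len(text)} bytes")
print(f"compressed: {len(packed)} bytes")
print(f"saved:      {100 * (1 - len(packed) / len(text)):.0f}%")
```

Almost the entire file collapses into a handful of back-references, so the compressed size is dominated by the first copy of the word plus a little header overhead.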
Now obviously, that’s a pretty extreme example since we just had the same word repeated over and over. In general practice, you’ll probably get around 30-40% compression using a compression format like ZIP on a file that’s mostly text.
This LZ77 algorithm applies to all binary data, by the way, and not just text, though text generally is easier to compress due to how many repeated words most languages use. A language like Chinese might be a little harder to compress than English, for example.
Image, video, and audio compression work very differently. Unlike with text, where you can have lossless compression and no data is lost, with images we have what’s called “lossy compression,” where some data is thrown away. And the more you compress, the more data you lose.
This is what leads to those horrible-looking JPEGs that people have uploaded, shared, and screenshotted multiple times. Each time the image gets compressed, it loses some data.
Here’s an example. This is a screenshot I took that has not been compressed at all.
I then took that screenshot and ran it through Photoshop multiple times, each time exporting it as a low-quality JPEG. Here’s the result.
Looks pretty bad, right?
Well, this is only a worst-case scenario, exporting at 0% JPEG quality each time. For comparison, here’s a 50% quality JPEG, which is nearly indistinguishable from the source PNG image unless you blow it up and take a close look.
The PNG for this image was 200 KB in size, but this 50% quality JPEG is only 28 KB.
So how does it save so much space? Well, the JPEG algorithm is a feat of engineering. Most uncompressed image formats store a simple grid of numbers, with one or more numbers per pixel recording its color.
JPEG does none of this. Instead, it breaks the image into 8×8 blocks of pixels and stores each block using something called a Discrete Cosine Transform: a collection of cosine waves added together at varying intensities. Each block gets 64 of these patterns, but most of them barely contribute and can be thrown away. That, roughly, is what the quality slider for JPEG in Photoshop and other image apps controls: how aggressively those patterns get discarded. The apps then use Huffman encoding to reduce the file size even further.
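To make that concrete, here is a pure-Python sketch of the 8×8 DCT at the heart of JPEG. The gradient test block and the crude “drop small coefficients” threshold are our own illustration; real JPEG uses per-frequency quantization tables instead:

```python
import math

N = 8  # JPEG works on 8x8 pixel blocks

def alpha(k):
    # Normalization factor that makes the transform orthonormal.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(block):
    # Forward 2-D DCT-II: the weight of each of the 64 cosine patterns.
    return [[alpha(u) * alpha(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]

def idct2(coeffs):
    # Inverse DCT: rebuild the pixels from the cosine weights.
    return [[sum(
                alpha(u) * alpha(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]

# A smooth gradient block, like a small patch of sky.
block = [[x * 8 + y for y in range(N)] for x in range(N)]
coeffs = dct2(block)

# "Lossy" step: throw away every pattern whose weight is close to zero.
kept = [[c if abs(c) > 1.0 else 0.0 for c in row] for row in coeffs]
rebuilt = idct2(kept)

used = sum(1 for row in kept for c in row if c != 0.0)
print(f"kept {used} of 64 coefficients")
```

For smooth content like this, only a handful of the 64 patterns carry real weight, and the block reconstructed from them is nearly indistinguishable from the original. That is where JPEG’s savings come from.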
This gives JPEGs an insanely high compression ratio, which can reduce a file that would be multiple megabytes down to a couple of kilobytes, depending on the quality. Of course, if you use it too much, you end up with this:
That image is horrible. But minor amounts of JPEG compression can have a significant impact on file size, and this makes JPEG very useful for image compression on websites. Most pictures you see online are compressed to save on download times, especially for mobile users with poor data connections. In fact, all the images on How-To Geek have been compressed to make page loading quicker, and you probably never noticed.
Video works a bit differently from images. You’d think video formats would just compress each frame using JPEG, and they do compress individual frames, but there’s a better method for video.
We use something called “interframe compression,” which calculates the changes between each frame and only stores those. So, for example, if you have a relatively still shot that takes up several seconds in a video, a lot of space gets saved because the compression algorithm doesn’t need to store all the stuff in the scene that doesn’t change. Interframe compression is the main reason we have digital TV and web video at all. Without it, videos would be hundreds of gigabytes, more than the average hard drive size in 2005 when YouTube launched.
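The idea is easy to model: store the first frame whole, then store only what changed. This toy Python sketch (frames as flat pixel lists, in a simple keyframe-plus-diffs layout we made up for illustration) shows why a mostly still scene costs almost nothing:

```python
def diff_frames(prev, curr):
    # Record only the pixels that changed since the previous frame.
    return {i: px for i, (old, px) in enumerate(zip(prev, curr)) if old != px}

def apply_diff(prev, diff):
    frame = list(prev)
    for i, px in diff.items():
        frame[i] = px
    return frame

# A tiny "video": one bright pixel moving across a black background.
frames = [
    [0, 0, 0, 0, 9],
    [0, 0, 0, 9, 0],
    [0, 0, 9, 0, 0],
    [0, 9, 0, 0, 0],
]

# Store one full keyframe, then just the per-frame changes.
stored = [frames[0]] + [diff_frames(frames[i - 1], frames[i]) for i in range(1, len(frames))]

# Playback: rebuild each frame from the previous one plus its diff.
playback = [stored[0]]
for diff in stored[1:]:
    playback.append(apply_diff(playback[-1], diff))
print(playback == frames)
```

Each diff holds just two changed pixels instead of a full five-pixel frame; scale that up to millions of pixels per frame and the savings become enormous. It also shows why confetti hurts: when every pixel changes, the diffs are as big as the frames themselves.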
Also, since interframe compression works best with mostly stationary video, this is why confetti ruins video quality.
Note: GIF does not do this, which is why animated GIFs tend to be kept short and small in dimensions but still have a pretty big file size.
Another thing to keep in mind about video is its bitrate, the amount of data allowed in every second of footage. If your bitrate is 200 kb/s, for example, your video will look pretty bad. Quality goes up as the bitrate goes up, but after a couple of megabits per second, you get diminishing returns.
This is a zoomed frame taken from a video of a jellyfish. The one on the left is at 3Mb/s, and the one on the right is 100Mb/s.
A 30x increase in file size, but not much increase in quality. Generally, YouTube videos sit around 2-10Mb/s depending on your connection, as anything more would probably not be noticed.
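The file-size math behind those numbers is simple: bits per second times seconds, divided by eight to get bytes. A quick sketch, ignoring container overhead and audio:

```python
def video_size_mb(bitrate_mbps, seconds):
    # Bitrate is in megaBITS per second; divide by 8 for megaBYTES.
    return bitrate_mbps * seconds / 8

# A one-minute clip at the two bitrates compared above.
low = video_size_mb(3, 60)
high = video_size_mb(100, 60)
print(f"3 Mb/s:   {low} MB")    # 22.5 MB
print(f"100 Mb/s: {high} MB")   # 750.0 MB
```

The 100 Mb/s version is about 33 times larger for a barely visible difference in quality, which is exactly why streaming services cap bitrates where the returns diminish.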
This demo does work better with actual video, so if you want to check it out for yourself, you can download the same bitrate test videos used here.
Audio compression works very similarly to text and image compression. Where JPEG removes detail from an image that you won’t see, audio compression removes sounds that you won’t hear. You might not need to hear the scrape of the guitar pick on the string if the actual guitar is much, much louder.
MP3 also uses bitrate, ranging from 48 and 96 kbps (the low end) to 128 and 240 kbps (pretty good) to 320 kbps (high-end audio), and you will likely only hear the difference with exceptionally good headphones (and ears).
There are also lossless compression codecs for audio, the main one being FLAC, which shrinks audio files without throwing any data away (it relies on linear prediction rather than LZ77, but the result is entirely lossless). Some people swear by FLAC’s perfect audio quality, but with the prevalence of MP3, it seems most people either can’t tell or don’t mind the difference.