On the first day of 2016, Mozilla terminated support for a weakening security technology called SHA-1 in the Firefox web browser. Almost immediately, they reversed their decision, as it would cut access to some older websites. But in February 2017, their fears finally came true: researchers broke SHA-1 by creating the first real-world collision attack. Here’s what all that means.
What Is SHA-1?
The SHA in SHA-1 stands for Secure Hash Algorithm, and, simply put, you can think of it as a kind of math problem or method that scrambles the data that is put into it. Developed by the United States NSA, it’s a core component of many technologies used to encrypt important transmissions on the internet. Common encryption methods SSL and TLS, which you might have heard of, can use a hash function like SHA-1 to create the signed certificates you see in your browser toolbar.
We won’t go deep into the math and computer science of any of the SHA functions, but here’s the basic idea. A “hash” is a fixed-length code computed from any input data, and it is intended to be effectively unique to that input. Even a small, random string of letters put into a hash function like SHA-1 returns a long string of characters of a set length (160 bits for SHA-1, usually written as 40 hexadecimal characters), and the function is designed to make it practically impossible to turn that string back into the original data. This is how password storage usually works. When you create a password, your password input is hashed and stored by the server. Upon your return, when you type in your password, it is hashed again. If it matches the original hash, the input can be assumed to be the same, and you’ll be granted access to your data.
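To make the idea concrete, here is a minimal sketch in Python using the standard hashlib module. The function names (sha1_hex, check_password) and the sample password are made up for illustration, and this is a toy: real password systems add a per-user salt and use a deliberately slow algorithm rather than plain SHA-1.

```python
import hashlib
import hmac


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of a string as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# The output length is the same no matter how long the input is.
print(len(sha1_hex("a")))           # a one-letter input
print(len(sha1_hex("a" * 10_000)))  # a ten-thousand-letter input

# Toy password check: the server keeps only the hash, never the password.
# (Hypothetical example; real systems use salted, slow password hashes.)
stored_hash = sha1_hex("correct horse battery staple")


def check_password(attempt: str, stored: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return hmac.compare_digest(sha1_hex(attempt), stored)


print(check_password("correct horse battery staple", stored_hash))
print(check_password("wrong guess", stored_hash))
```

Both length checks print 40, and only the matching password passes the comparison.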
Hash functions are useful primarily because they make it easy to tell if the input, for instance, a file or a password, has changed. When the input data is secret, like a password, it is nearly impossible to reverse the hash and recover the original data. This is a bit different from “encryption”, which scrambles data so that it can be descrambled later using ciphers and secret keys. Hashes are simply meant to ensure data integrity, to make sure that everything is the same. Git, the version control and distribution software for open source code, uses SHA-1 hashes for this very reason.
That’s a lot of technical information, but to put it simply: a hash is not the same thing as encryption. It is used to identify whether data, such as a file, has changed, not to hide data so it can be recovered later.
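As an illustration of hashing for integrity, the sketch below reproduces how Git names a file's contents (a "blob"): it hashes a short header followed by the bytes of the file. The function name git_blob_id is made up for this example, and the sketch covers only blobs in repositories that use SHA-1, not commits or trees.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes the header "blob <size in bytes>\0" followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"The quick brown fox\n"
modified = b"The quick brown fix\n"  # one letter changed

print(git_blob_id(original))
print(git_blob_id(modified))
print(git_blob_id(original) == git_blob_id(modified))
```

Because the ID depends on every byte, a changed file gets a different ID, which is how Git notices that content has been altered.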
How Does This Technology Affect Me?
Let’s say you need to visit a website privately. Your bank, your email, even your Facebook account all use encryption to keep the data you send them private. A professional website will provide encryption by obtaining a certificate from a trusted authority: a third party, trusted to ensure that the encryption is on the level, private between the website and user, and not being spied on by any other party. This relationship with the third party, called a Certificate Authority, or CA, is crucial, since any user can create a “self-signed” certificate; you can even do it yourself on a machine running Linux with OpenSSL. Symantec and Digicert are two widely known CA companies, for example.
Let’s run through a theoretical scenario: How-To Geek wants to keep logged-in users’ sessions private with encryption, so it petitions a CA like Symantec with a Certificate Signing Request, or CSR. How-To Geek creates a public key and private key for encrypting and decrypting data sent over the internet. The CSR sends the public key to Symantec along with information about the website. Symantec checks the key against its records to verify that the data has not been changed by any party, which works because any small change in the data makes the hash radically different.
Those public keys and digital certificates are signed using hash functions, because the output of these functions is easy to check. A public key and certificate with a verified hash from Symantec (in our example), an authority, assure a user of How-To Geek that the key is unchanged, and not sent from someone malicious.
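The claim that a small change makes the hash radically different can be checked directly. The sketch below is only an illustration: the two byte strings are invented stand-ins for certificate request data, not a real CSR, and differing_bits is a hypothetical helper name.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 160 SHA-1 output bits differ between two inputs."""
    digest_a = int.from_bytes(hashlib.sha1(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


# Hypothetical request data that differs by a single character.
request_a = b"CN=www.howtogeek.com, O=How-To Geek"
request_b = b"CN=www.howtogeek.con, O=How-To Geek"

print(hashlib.sha1(request_a).hexdigest())
print(hashlib.sha1(request_b).hexdigest())
print(differing_bits(request_a, request_b), "of 160 bits differ")
```

For a well-behaved hash function, about half of the output bits are expected to flip, so the two hashes look unrelated even though the inputs are nearly identical.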
Because the hash is easy to monitor and impossible (some would say “difficult”) to reverse, the correct, verified hash signature means that the certificate and the connection can be trusted, and both sides can agree to send data encrypted from end to end. But what if the hash wasn’t actually unique?
What Is a Collision Attack, and Is It Possible in the Real World?
You might have heard of the “Birthday Problem” in mathematics, although you might not have known what it was called. The basic idea is that if you gather a large enough group of people, chances are pretty high that two or more people will have the same birthday. Higher than you’d expect, in fact, enough that it seems like a weird coincidence. In a group as small as 23 people, there’s a 50% chance that two will share a birthday.
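The 23-person figure can be verified with a few lines of Python. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is made up for the example.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday."""
    probability_all_unique = 1.0
    for already_taken in range(people):
        # Each new person must avoid every birthday already taken.
        probability_all_unique *= (days - already_taken) / days
    return 1.0 - probability_all_unique


for group_size in (10, 22, 23, 50):
    print(group_size, round(shared_birthday_probability(group_size), 3))
```

The probability crosses one half between 22 and 23 people, which is where the figure in the text comes from.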
This is the inherent weakness in all hashes, including SHA-1. Theoretically, the SHA function should create a unique hash for any data that is put into it, but as the number of hashes grows, it becomes more likely that two different pieces of data will produce the same hash. So one could create an untrusted certificate with an identical hash to a trusted certificate. If they got you to install that untrusted certificate, it could masquerade as trusted, and distribute malicious data.
Finding two different files that produce the same hash is called a collision attack. At least one large-scale collision attack is known to have already happened for MD5 hashes. But on Feb. 23rd, 2017, Google announced SHAttered, the first-ever crafted collision for SHA-1. Google was able to create a PDF file that had the same SHA-1 hash as another PDF file, despite having different content.
SHAttered was performed on a PDF file. PDFs are a relatively loose file format; lots of tiny, bit-level changes can be made without preventing readers from opening them or causing any visible differences. PDFs are also often used to deliver malware. While SHAttered could work on other types of files, like ISOs, certificates are rigidly specified, making such an attack unlikely.
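To see the birthday effect at work on a hash, the sketch below searches for two different messages whose SHA-1 hashes share the same first 6 hexadecimal characters (24 bits). This is a scaled-down toy, not a real attack on SHA-1: the full hash is 160 bits, so the same simple search would be far out of reach. The function names and the "message-N" inputs are invented for the example.

```python
import hashlib
from itertools import count


def truncated_sha1(data: bytes, hex_chars: int = 6) -> str:
    """Return only the first few hex characters of the SHA-1 hash."""
    return hashlib.sha1(data).hexdigest()[:hex_chars]


def find_truncated_collision(hex_chars: int = 6):
    """Try messages until two of them share the same truncated hash."""
    seen = {}  # truncated hash -> first message that produced it
    for attempt in count():
        message = f"message-{attempt}".encode("ascii")
        tag = truncated_sha1(message, hex_chars)
        if tag in seen:
            return seen[tag], message, attempt + 1
        seen[tag] = message


first, second, attempts = find_truncated_collision()
print(first, second)
print("truncated hashes:", truncated_sha1(first), truncated_sha1(second))
print("messages tried:", attempts, "out of", 16 ** 6, "possible values")
```

A match typically turns up after a few thousand messages, far fewer than the roughly 16.7 million possible 24-bit values, for the same reason 23 people are enough for a likely shared birthday.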
So how easy is this attack to perform? SHAttered was based on a method discovered by Marc Stevens in 2012, and carrying it out took roughly 2^63 (9.223 quintillion) SHA-1 operations, a staggering number. However, that is still more than 100,000 times fewer operations than would be required to achieve the same result with brute force. Google found that with 110 high-end graphics cards working in parallel, it would take approximately one year to produce a collision. Renting this compute time from Amazon AWS would cost about $110,000. Keep in mind that as prices drop for computer parts and you can get more power for less, attacks like SHAttered become easier to pull off.
$110,000 may seem like a lot, but it’s within the realm of affordability for some organizations, which means real-life cybervillains could forge digital document signatures, interfere with backup and version control systems like Git and SVN, or make a malicious Linux ISO appear legitimate.
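The scale of these numbers is easier to check than to picture. The arithmetic below assumes that "brute force" means a generic birthday search, which for a 160-bit hash like SHA-1 needs on the order of 2^80 operations; the 9.223 quintillion figure is 2^63. The variable names are only for illustration.

```python
sha1_output_bits = 160

# A generic birthday search needs about 2^(n/2) hashes for an n-bit output.
brute_force_operations = 2 ** (sha1_output_bits // 2)  # 2^80

# 9.223 quintillion, the number of SHA-1 operations quoted for the attack.
shattered_operations = 2 ** 63

speedup = brute_force_operations // shattered_operations

print(f"{shattered_operations:,}")
print(f"{brute_force_operations:,}")
print(f"{speedup:,} times fewer operations than brute force")
```

The ratio comes out to 131,072, which matches the "100,000 times" comparison in round numbers.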
Fortunately, there are mitigating factors preventing such attacks. SHA-1 is rarely used for digital signatures anymore. Certificate Authorities no longer provide certificates signed with SHA-1, and both Chrome and Firefox have dropped support for them. Linux distributions typically release more frequently than once per year, making it impractical for an attacker to create a malicious version and then generate one padded to have the same SHA-1 hash.
On the other hand, some attacks based on SHAttered are already happening in the real world. The SVN version control system uses SHA-1 to differentiate files. Uploading the two PDFs with identical SHA-1 hashes to an SVN repository will corrupt it.
How Can I Protect Myself from SHA-1 Attacks?
There’s not a lot for the typical user to do. If you’re using checksums to compare files, you should use SHA-2 (for example, SHA-256) or SHA-3 rather than SHA-1 or MD5. Likewise, if you’re a developer, be sure to use more modern hashing algorithms like SHA-2 or SHA-3, or a purpose-built password hashing function like bcrypt when storing passwords. If you’re worried that SHAttered has been used to give two distinct files the same hash, Google has released a tool on the SHAttered site that can check for you.
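If you want to compare a downloaded file against a published SHA-256 checksum yourself, a short Python script is enough. This is a minimal sketch using the standard hashlib module; the function names and the file name in the usage note are placeholders, not part of any particular tool.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files (like ISOs) don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 checksum with the value a publisher lists."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_checksum("download.iso", checksum_from_website) returns True only when the file on disk matches what the publisher listed, where both arguments are values you supply.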