Wrote this in about an hour for one of our publications next month. Comments welcome. It's a first draft and I'm still thinking about what people should do about it other than not panicing.
How Broken is SHA-1 and What are the Implications for Banking Security?
Mark Horvath, Microsoft Corporation
A recent paper by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu, Finding Collisions in the Full SHA-1 (
http://www.infosec.sdu.edu.cn/paper/sha1-crypto-auth-new-2-yao.pdf) has raised some questions about the strength of the SHA-1 hashing algorithm and it’s application in the commercial space, specifically in the areas of financial cryptography and online banking applications. In this discussion I’ll talk about three things, what is SHA-1, what does it mean to say it has been “broken” and what should the FS community do about this news?
SHA-1 is one of dozens of algorithms called hashing functions, mathematical operations which take an arbitrary sized text (any where from 1 to 2^64 bits) and reduce it down to a specifically sized, nearly unique signature. In the case of SHA-1, that’s 160 bits. The algorithm is written in such a way that’s it’s very sensitive to any changes in the input text and changing so much as a single character will produce a completely different result. So, I could feed in any document, say a mortgage application, and I would get 160 bits that would be unique to the information in that document. This provides a record or signature that I could use to verify the information later. Anytime I ran the algorithm on the document or an exact copy, I would get the same signature. On the other hand, if I changed a few characters in it (say from owing $200,000 to $000,200) and re-ran SHA-1, I would get a completely different signature. This, combined with a little public key cryptography, allows us to create digital signatures so that electronic documents can be verified at anytime to make sure that they have not changed.
I said that the signature was “nearly unique”. That’s because the algorithm takes arbitrary sized documents and reduces them to the same sized 160 bit signature. There is a chance that another document could hash to the same set of output bits, which is called a collision. Because of the nature of the algorithm, the two texts would likely look very different, one looking like a mortgage, the other looking very much like garbage. In a space 2^64 documents, very few of them look like normal or plain text with words, sentences etc. Most of look like the results of monkeys hitting random keys at typewriters. The exact probability of a collision depends very much on the details of the algorithm, which is where the problem comes in. Up until now it was believed that it would take 2^80 operations to get a collision in what is called a brute-force search, i.e. trying every possible input text to get a match.
Imagine you had a cluster of 1000 computers dedicated to the task, each one capable of trying 1,000,000 tests per second. (this is the approximate computing power of a large settlement network like SWIFT or DTCC). It would take 12,089,258,196,114,629.1747 seconds to “break” the algorithm, or 38,500,822 years, 3 months 11days, 1 hour and 25 minutes (give or take a minute). That’s a while. And that’s for one document! I’m sure over the course of 38 million years Moore’s law would help, or you could add more computers to the cluster, but generally it’s considered “secure” in that no one outside the NSA would waste that many resources on changing a mortgage.
The point of the Wang, Yin and Yu paper is that the probability isn’t 280 it’s more like 2^69, or 2^11 ( 2048 ) times more likely. This means it would take only 5,902,958,103,571.5961 seconds or 187992 or so years. Worse, but still a very long time. And this is assuming hardware and programming skills outside the access of governments or worldwide finance institutions.
The real concern here is not the drop of 2048 in ease of breaking the algorithm, it’s that the algorithm has been shown to have a weakness. It’s not impossible that the algorithm, with further study, might have additional problems, thus dropping the security still further.
It’s important to remember that this is a mathematical algorithm and it is not specific to any given vendor’s implementation of it on their software. Everyone using SHA-1, whether it be on Linux, Windows, VAX, Unix, FORTRAN, COBOL or even coded into the firmware of hardened cryptographic devices like Cylink, RSA, nCypher, SPYRUS or those used by NSA is, if they implemented it correctly, effected by this result. Hence this is a community problem, which needs to be solved through the normal security and cryptographic standards processes.
So what is an FSI Security professional to do? At this stage, the important thing is to avoid panic and recognize the problem for what it is, i.e. the normal cryptographic vetting process. As of this writing there has not been any demonstrated, successful attack on any text hashed using SHA-1 (although you might want to watch for one in the next 100,000 years or so). Good security policy would suggest the following actions:
Inventory where you are currently using the SHA-1 hashing algorithm in your digital signatures.
Other hashing algorithms exist, some much stronger than SHA-1, so start considering what it would take for your organization to switch in the future.
Work with your software vendors and see what they suggest.
Keep informed of the latest developments on security algorithms and see how SHA-1 and other tools develop.
Again, let me repeat, there is no cause for any kind of panic or mass exodus from SHA-1, but it is something to keep in mind over the next few years when upgrading your security systems.