Computer Science

Hashes

A cryptographic hash is a short, fixed-length representation of a message that is easy to compute and practically impossible to reverse.

What do hashes look like?

Here are three quick examples of a particular kind of hash called SHA1:

Hello --> f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0
Hell! --> 935bfde1944e8560527e62ba63ba4801debc2b43
Hello, I love you. --> 3026976bf0dce32e68bdc09a17067ec7a3b969c8

Regardless of the length of the input string, a 40-character representation—a hash—of the message is produced. Note also that changing even just a single character in the message produces a completely different hash value.

You can recreate these hashes (or make your own) in the Terminal:

echo -n "Hello" | openssl sha1
echo -n 'Hell!' | openssl sha1
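The same digests can also be computed from a program. The following is a minimal sketch using Python's standard hashlib module; the helper name sha1_hex is only an illustration, not something from the original examples.

```python
import hashlib


def sha1_hex(message: str) -> str:
    """Return the SHA-1 digest of a text message as a hexadecimal string."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


for message in ["Hello", "Hell!", "Hello, I love you."]:
    digest = sha1_hex(message)
    # The digest length stays the same no matter how long the input is.
    print(f"{message} --> {digest} ({len(digest)} characters)")
```

Each line of output should show a 40-character digest, whatever the length of the input.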

Superficially this may look like encryption, but it's not. With encryption, you can take the encrypted message and, given the right key, convert it back to the original plaintext. A hash, on the other hand, is a one-way function: an input produces a given output, and the output doesn't allow for retrieving the input.

Why would that be useful?

Using hashes to verify a file's authenticity

Hash functions are used for lots of different things in computers. One of the main uses is to verify the authenticity of a message or a file.

Take a look at this screenshot of the download page from the Raspberry Pi website. Underneath each download link is an SHA1 hash value, also called a digest, or sometimes a checksum.

How is this useful? We've already seen that even a small change in the value of the message produces a completely different checksum. If we want to make sure that the file we download from the website is the actual file that they want us to have, and not a file that has been delivered to us by a bad guy, we can:

  1. Download the file.
  2. Calculate the hash or checksum of the file.
  3. Compare that hash with the value published on the website. If the hashes match, we know our file hasn't been altered, either by accident or by bad guys.

I'm going to get the Torrent file and use Transmission, a BitTorrent client, to download the NOOBS file that I can use to run my Raspberry Pi.

BitTorrent is legal!

BitTorrent is a software system that allows people to easily share files that are distributed over a number of different computers. Many businesses use BitTorrent to distribute files and software.

Most people understand that under most circumstances it is not legal to share copyrighted information. It is this exchange of copyrighted material that is unlawful, not the use of BitTorrent.

Once the file has downloaded, use openssl sha1 <filename> to get the checksum of the file, and carefully compare it with the published value. If they're the same, you know that the file is legitimate.

rwhite@MotteRouge$ cd ../Downloads
rwhite@MotteRouge$ openssl sha1 NOOBS_v1_9_0.zip
SHA1(NOOBS_v1_9_0.zip)= 94f7ee8a067ac57c6d35523d99d1f0097f8dc5cc
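Comparing 40 characters by eye is error-prone, so the check can be automated. Here is a minimal sketch in Python, assuming the file is already on disk and the published digest has been copied from the download page. The function names file_sha1 and matches_published are hypothetical helpers written for this illustration.

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-1 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Keep reading until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published: str) -> bool:
    """Compare the file's digest with the value published on the website."""
    # Ignore surrounding whitespace and letter case in the pasted value.
    return file_sha1(path) == published.strip().lower()
```

For the download above, a call such as matches_published("NOOBS_v1_9_0.zip", "94f7ee8a067ac57c6d35523d99d1f0097f8dc5cc") would return True only if the file on disk hashes to the published value.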

Using hashes to represent passwords

When you sign up for an online service, you are almost always asked to create a password that you will use to authenticate your log-in. When logging in, if you don't have the correct username and corresponding password, you are not allowed access to your account.

When you first create your password, websites that are acting responsibly will not actually store your password as plaintext—they will create a hash of your password, and save that information for your account in their database.

My Amazon account uses the password "I like to buy stuff", but that's not what Amazon has stored on their computers. They store the hash 765982fea7bf63db408d096b3841d994c659e926, or at least that's what they'd have stored if that was really my password, and if they used SHA1 (unsalted) to create the hash value.

Why do they do this? Security! Suppose someone breaks into their computers and steals the database of account information (this has actually happened to Target, Home Depot, and Chase bank). Even if the thieves get the password data, they won't know the actual passwords. They'll only know the hashes of the passwords, and because a hash is a one-way function, they can't reverse the process to find out what the true password is.

Storing passwords in plaintext is wrong.

Facebook Stored Millions Of Passwords In Plaintext—Change Yours Now [Wired magazine, March 19, 2019]

So how does Amazon know what your password is? They don't! They saved the hash of your password when you first signed up, and every time you log in, they take the password you entered and compute its hash. Because a given hash function produces the same value every time for the same input, they can compare the hash of the password you just entered with the hash saved when you created the account. If the hashes match, you (probably) entered the same password both times.
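The sign-up and log-in flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accounts dictionary stands in for a real database, the function names are hypothetical, and unsalted SHA1 is used purely to mirror the example in the text, not because it is a good choice for a real service.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> stored password hash.
accounts = {}


def hash_password(password: str) -> str:
    """Hash a password with unsalted SHA-1 (illustration only, not for real systems)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def sign_up(username: str, password: str) -> None:
    """Store only the hash of the password, never the plaintext."""
    accounts[username] = hash_password(password)


def log_in(username: str, password: str) -> bool:
    """Hash the entered password and compare it with the stored hash."""
    stored = accounts.get(username)
    if stored is None:
        return False
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(stored, hash_password(password))


sign_up("rwhite", "I like to buy stuff")
print(log_in("rwhite", "I like to buy stuff"))  # same password, same hash
print(log_in("rwhite", "I like to sell stuff"))  # different password, different hash
```

Note that the plaintext password never appears in accounts; only its hash is kept.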

What is "salting?"

SHA-1 is no longer considered a secure hashing system. It has been superseded by newer algorithms such as SHA-256 and SHA-512, both members of the SHA-2 family.
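In code, moving from SHA-1 to SHA-512 is usually a small change. A minimal sketch with Python's standard hashlib module, hashing the same message with both algorithms to compare the digest lengths:

```python
import hashlib

message = "Hello".encode("utf-8")

# Same message, two algorithms: only the constructor name changes.
sha1_digest = hashlib.sha1(message).hexdigest()
sha512_digest = hashlib.sha512(message).hexdigest()

print(len(sha1_digest))    # 40 hex characters (160 bits)
print(len(sha512_digest))  # 128 hex characters (512 bits)
```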