A hashing algorithm is a function that converts any input data into a fixed-length output known as a hash. It doesn’t matter whether the input is a single letter, a page from a novel, or an entire set of encyclopedias. Each input produces an output of uniform length, expressed as an alphanumeric string, and in practice no two different inputs produce the same output.
This article focuses on how hashing algorithms work and why they are crucial for data security. We’ll define what a hashing algorithm is and look at the properties required for an algorithm to perform at a high level. We’ll then highlight a few prominent hashing algorithms and explain how they power real-world applications, including blockchain technology.
Properties of An Effective Hashing Algorithm
Hashing algorithms are used in all sorts of applications that require fast, secure, and consistent data processing. Regardless of their design variations, all effective hashing algorithms share the same five properties. If a hashing algorithm possesses these five properties, it’s considered a suitable choice for maintaining data security.
Property #1: Easy For Computers To Run
Hashing algorithms must be designed such that you can take any input and produce an output almost immediately. Thanks to advances in computing power over the past 50 years, normal computers can handle hashing of large inputs or a high volume of inputs with ease.
Property #2: Same Input Always Gives Same Output
Hashing algorithms need to be deterministic. If you run the exact same input 10,000 times, the output must be the same, 10,000 times over. This is essential for ensuring data integrity. If you find that the output is different, then you can know with certainty that the input was changed before hashing. In this way, the output of a hashing algorithm can act as an electronic fingerprint for the input.
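As a minimal sketch of this property, the snippet below hashes one input 10,000 times with SHA-256 from Python's standard hashlib module. The helper name sha256_hex and the sample message are assumptions made for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

message = "The same input, every time"  # hypothetical sample input

# Hash the identical input 10,000 times and collect the distinct digests.
distinct_digests = {sha256_hex(message) for _ in range(10_000)}
print(len(distinct_digests))  # 1: every run produced the same digest

# Changing the input by a single character yields a different digest.
print(sha256_hex(message) == sha256_hex(message + "!"))  # False
```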
Property #3: Output Gives No Clues About Input
All outputs of a specific hashing algorithm are of uniform length and format. So no matter whether you enter a short word or a full essay as the input, the output will be the same length. The output should always appear to be a random alphanumeric string. In fact, the security of hashing algorithms relies upon attackers not being able to deduce an input just by looking at an output. This means that outputs can't reflect any characteristics of their input.
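The uniform-length claim is easy to check. This short sketch, using SHA-256 as an assumed example algorithm, hashes a one-byte input and a one-million-byte input and compares the lengths of the two outputs.

```python
import hashlib

short_input = b"a"
long_input = b"a" * 1_000_000  # one million bytes

short_digest = hashlib.sha256(short_input).hexdigest()
long_digest = hashlib.sha256(long_input).hexdigest()

# Both digests are 64 hexadecimal characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))  # 64 64
```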
Property #4: Very Hard To Find Inputs That Produce Same Output
An effective hashing algorithm takes inputs of any size and, in practice, gives each one its own output. The challenge is that there are an infinite number of possible inputs and a finite number of outputs, since outputs are all of a fixed length, so inputs that share an output must exist. What matters is that the probability of anyone finding two such inputs is approximately zero. Algorithms that produce longer outputs are generally considered more reliable than ones that produce shorter outputs, since this reduces the chances that someone will find a “collision” in which two inputs give the same output. Once a practical method for finding collisions is discovered for a particular hashing algorithm, that algorithm is considered "broken" and insecure.
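To see why output length matters, the sketch below deliberately weakens SHA-256 by keeping only the first 2 bytes (16 bits) of each digest. With so few possible outputs, a collision typically turns up within a few hundred attempts, while the full 256-bit digests of the same two inputs still differ. The function names and the counter-based inputs are assumptions for illustration; this is not an attack on SHA-256 itself.

```python
import hashlib

def truncated_digest(data: bytes, num_bytes: int) -> bytes:
    """Return only the first num_bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:num_bytes]

def find_collision(num_bytes: int) -> tuple[bytes, bytes, int]:
    """Try inputs b"0", b"1", ... until two share a truncated digest."""
    seen: dict[bytes, bytes] = {}
    counter = 0
    while True:
        candidate = str(counter).encode("ascii")
        digest = truncated_digest(candidate, num_bytes)
        if digest in seen:
            return seen[digest], candidate, counter + 1
        seen[digest] = candidate
        counter += 1

first, second, attempts = find_collision(num_bytes=2)
print(f"{first!r} and {second!r} collide after {attempts} attempts")

# The untruncated digests of the same two inputs are still different.
print(hashlib.sha256(first).digest() == hashlib.sha256(second).digest())  # False
```

Each extra bit of output doubles the number of possible digests, which is why the same search against a full 256-bit output is not feasible.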
Property #5: Virtually Impossible To Reverse Engineer An Input
Hashing algorithms are often called one-way hash functions. That's because they are designed to be irreversible. It should be really easy to take an input and produce an output. In contrast, it should be next to impossible to take a specific output and learn its input. Many hashing algorithms rely on operations such as modular arithmetic and bitwise mixing, which discard information about the input and leave no known practical way to work backward from the output.
How Are Hashing Algorithms Used?
There are three major benefits of using algorithms to hash data: faster processing speeds, data integrity, and password security.
The idea behind faster processing speeds is quite simple. Searching through millions or even billions of entries of various lengths by comparing them one by one would take an extremely long time. When each entry is first run through a hashing algorithm, its fixed-size output can serve as an index, so a lookup can go almost directly to the matching entry.
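A toy hash table illustrates the lookup idea. In this sketch, each key is hashed to pick one of 1,024 buckets, so a lookup inspects a handful of entries instead of all 10,000. The bucket count, function names, and record format are assumptions. Production hash tables (including Python's own dict) generally use faster, non-cryptographic hash functions, so SHA-256 is used here only to stay consistent with the rest of the article.

```python
import hashlib

NUM_BUCKETS = 1024  # arbitrary size chosen for this sketch

def bucket_index(key: str) -> int:
    """Map a key of any length to a bucket number using its SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

buckets: list[list[tuple[str, str]]] = [[] for _ in range(NUM_BUCKETS)]

def put(key: str, value: str) -> None:
    """Store a key/value pair in the bucket chosen by the key's hash."""
    buckets[bucket_index(key)].append((key, value))

def get(key: str):
    """Look only in the one bucket the key hashes to, not the whole table."""
    for stored_key, value in buckets[bucket_index(key)]:
        if stored_key == key:
            return value
    return None

for i in range(10_000):
    put(f"record-{i}", f"value-{i}")

print(get("record-4242"))  # value-4242
print(get("missing"))      # None
```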
This increased performance is especially beneficial for scientists who use data hashing for DNA and RNA sequencing. Because data can be retrieved quickly, researchers can run more simulations in a shorter amount of time. As a result, it’s possible to speed up the treatment discovery process for medical conditions and diseases.
Data integrity is an important use case for hashing. Outputs are an easy way to ensure that data hasn’t been tampered with or edited at any point in time. When saving a file, for example, it’s possible to use a hashing algorithm to produce a hash of that document. If you access that file again later on, you can run the file through the same hashing algorithm and compare the two outputs, which only takes a split second. If the outputs match, the file is unchanged, so there’s no need to search the files manually to see if any changes have been made. Email protocols similarly use hashing to create a digital signature that can be used to verify that messages haven't been altered in transit.
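The file check described above might look like the following sketch. It records a SHA-256 hash of a temporary file, then shows that the hash still matches while the file is untouched and stops matching after a single byte is appended. The function name file_sha256, the chunk size, and the file contents are all assumptions for illustration.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Create a throwaway file to stand in for a saved document.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(b"Quarterly report, final version.\n")
    document_path = tmp.name

recorded_hash = file_sha256(document_path)  # taken when the file is saved
unchanged = file_sha256(document_path) == recorded_hash

# Simulate tampering by appending a single byte.
with open(document_path, "ab") as handle:
    handle.write(b"!")
tampered = file_sha256(document_path) != recorded_hash

os.remove(document_path)
print(unchanged, tampered)  # True True
```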
Hashing is also effective for protecting sensitive data. Passwords for email, social media, banking, and other applications are usually run through hashing algorithms before being saved on the application’s central servers. The goal is to ensure that user passwords remain secure, even when a security breach occurs.
To protect users, websites typically don’t store passwords in plain text on their central servers. If they did, and a hacker managed to gain access to these central servers, the hacker would be able to extract every user’s password in a readable, plain text format. At that point, the hacker would be able to access any user’s account without hindrance. To protect against this scenario, websites run every password through a hashing algorithm before storing it on a central server. Thus, all the passwords appear as garbled strings of data. A hacker would not be able to determine a single user’s password, even if they gain access to the website’s central servers.
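Here is a hedged sketch of storing and checking a password without keeping the plain text. Note that it goes a step beyond what the article describes: instead of a single pass of a plain hash, it uses PBKDF2 with a random salt (hashlib.pbkdf2_hmac), which is a common way to apply a hash such as SHA-256 to passwords. The function names and the iteration count are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store instead of the password itself."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each new password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_key)

# Only the salt and derived key would be saved on the server.
salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))  # True
print(verify_password("wrong guess", salt, stored_key))                   # False
```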
A Hashing Algorithm In Practice
To get a better understanding of how hashing algorithms work, let’s look at an example of SHA-256, one of the most prominent hashing algorithms. SHA-256 possesses all five properties of an effective hashing algorithm. It’s also widely considered to be a secure and reliable choice for real-world applications.
For this example, we could use literally any input. It could be an entire book, if we wanted. To keep things simple, we’ll use a short input: "What Is A Hashing Algorithm?" Let's see what happens when we run it through SHA-256 with an online hashing tool.
Input #1: What Is A Hashing Algorithm?
Output #1: adda2416ed2a096a39e47c6ca8ae5ad1583b1b5d92b212b65286f4271f76751e
As you can see, Output #1 is seemingly random. There's no way you could guess the input just by looking at the output. But if we run Input #1 through the SHA-256 hash algorithm once again, we will get the exact same result again. No matter how many times we try Input #1, and no matter how random Output #1 seems, Input #1 will always produce Output #1. So, in that sense, the result isn’t really random at all.
Now, let’s say we wanted to change our input very slightly. If we change even the smallest detail, SHA-256 will produce a completely different output. Let’s just remove the question mark from the example above and observe how the output changes.
Input #2: What Is A Hashing Algorithm
Output #2: eeb62c7676771f01ec02962e4c7cf3e47b076fb2b7a588315a84bd197c155132
As you can see, Output #1 looks completely different from Output #2. This is what we mean when we say that the output is seemingly random. We don’t really know why the outputs appear in the combination of letters and numbers that they do, making it impossible to find clues about the input. We also can’t see any correlation between Output #1 and Output #2. Input #1 has one more character than Input #2, yet both Output #1 and Output #2 have an exact length of 64 characters.
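The same experiment can be run locally instead of with an online tool. The sketch below hashes both inputs with Python's hashlib and counts how many of the 256 output bits differ. It assumes the inputs are encoded as UTF-8 with no trailing space or newline. If the online tool hashed slightly different bytes, the printed digests will not match the ones shown above.

```python
import hashlib

input_1 = "What Is A Hashing Algorithm?"
input_2 = "What Is A Hashing Algorithm"

output_1 = hashlib.sha256(input_1.encode("utf-8")).hexdigest()
output_2 = hashlib.sha256(input_2.encode("utf-8")).hexdigest()

print(output_1)
print(output_2)
print(len(output_1), len(output_2))  # 64 64

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(output_1, 16) ^ int(output_2, 16)).count("1")
print(differing_bits)
```

For a well-designed hash, roughly half of the output bits are expected to change when the input changes even slightly, which is why the two outputs show no visible relationship.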
Both outputs appear to be garbled strings that don’t really make any sense. This isn’t just the way that SHA-256 works. It’s actually how all hashing algorithms work. The only slight difference is that some hashing algorithms produce shorter strings while others produce longer strings.
Examples of Hashing Algorithms
Hashing algorithms are usually introduced as families, which include slightly different implementations of the same general design. For instance, SHA-2 includes SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. While all of these hashing algorithms are similar, they differ slightly in the way they create a hash, or output, from a given input.
Examples of today’s most common classes of hashing algorithms include:
- BLAKE
- Message Digest Algorithm (MD)
- RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
- Secure Hashing Algorithm (SHA)
Let's take a closer look at each of these classes of hashing algorithms. Along the way, we'll run the same input ("Hashing") through each algorithm to see how the outputs compare.
BLAKE
BLAKE is a series of hashing algorithms that includes BLAKE, BLAKE2, and BLAKE3. BLAKE2 is used in a number of applications, including digital signature algorithms, message authentication, and integrity protection in Public Key Infrastructure (PKI), secure communication protocols, cloud storage, intrusion detection, forensic suites, and version control systems. BLAKE3, the newest addition to the series, is a standalone algorithm that was first announced in January 2020.
BLAKE2s-256 Output: daece55690fb0a6f1833fb11d67a91ee7dfaa1ae57221c6fd6e9319066119145
Message Digest (MD) Algorithm
MD is a family of four hashing algorithms that includes MD2, MD4, MD5, and MD6. MD5 is the most popular of these algorithms. It was first published in 1992 and is mainly used to verify that text or files have been unaltered by using checksums. MD5 is no longer considered secure because collisions can be found with relative ease, given the strength of modern computers.
MD5 Output: befd1ea261d11ae5ba4f3f0363313c52
RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
RIPEMD is a family of five hashing algorithms: RIPEMD, RIPEMD-128, RIPEMD-160, RIPEMD-256, and RIPEMD-320. RIPEMD-160 was first published in 1996 and is still used today. Along with the SHA-256 hashing algorithm, RIPEMD-160 is used to produce Bitcoin (BTC) addresses for P2PKH and P2SH transactions. RIPEMD-160 was chosen because its 160-bit outputs are short while still making collisions sufficiently unlikely.
RIPEMD-160 Output: 61dc4c6ac2d3e5ed2bf34ce5f053a388df6200fc
Secure Hash Algorithms (SHA)
SHA is a family of four hashing algorithms: SHA-0, SHA-1, SHA-2, and SHA-3. Although SHA-3 is the newest member of the family, SHA-2 is more widely used today. Since January 2016, Certificate Authorities (CAs) have issued only SHA-2 SSL certificates. This means nearly all websites rely on SHA-2 to sign the certificates that secure the encrypted connection between a web server and a web browser. SHA-2 is also used by many websites for password hashing.
SHA-256 Output: 89a4382b6164bfe171507d674d5673551d87274b1bfdeba70940d326b186f5ee
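The outputs in this section can be recomputed with a few lines of Python. This sketch hashes the input "Hashing" (assumed to be plain ASCII with no trailing newline) with each algorithm and prints the digest along with its length in bits. RIPEMD-160 is provided by the underlying OpenSSL library and may not be available on every system, so the sketch skips it when it is missing.

```python
import hashlib

message = b"Hashing"

digests = {
    "BLAKE2s-256": hashlib.blake2s(message).hexdigest(),
    # usedforsecurity=False flags this as a non-security use of MD5.
    "MD5": hashlib.md5(message, usedforsecurity=False).hexdigest(),
    "SHA-256": hashlib.sha256(message).hexdigest(),
}

# RIPEMD-160 depends on the local OpenSSL build and may be unavailable.
try:
    digests["RIPEMD-160"] = hashlib.new("ripemd160", message).hexdigest()
except ValueError:
    pass

for name, digest in digests.items():
    # Each hexadecimal character encodes 4 bits of the digest.
    print(f"{name:12} {len(digest) * 4:4d} bits  {digest}")
```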
Hashing Algorithms and Blockchain Technology
Hashing algorithms are particularly helpful when applied to blockchain technology. Let’s look at how hashing algorithms are used to send, receive, and validate transactions on a blockchain.
Public Key Hashing
Blockchain networks often use addresses rather than public keys. Public keys are run through a hashing algorithm. The output, called a pubkey hash, is then used to create an address. This helps to improve private key security on public networks. It also provides a shorter, more user-friendly public identifier for sending and receiving transactions.
Before one person is able to send funds to another, the data for that particular transaction is run through a hashing algorithm. The output becomes one component of the digital signature that the sender must attach for the transaction to be approved by the peer-to-peer network. The recipient of the transaction can then use this output to verify that the transaction data has not been altered and that the accompanying digital signature is authentic.
A Merkle tree is a data structure that uses a hashing algorithm to take a large body of data and derive a single output called the Merkle Root. This single output (a 64-character alphanumeric string when SHA-256 is used) acts as an electronic fingerprint for an entire body of data. Using the Merkle Root, computers on blockchain networks can verify thousands of transactions extremely efficiently and securely.
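A simplified sketch of deriving a Merkle Root follows. It hashes each item with SHA-256, then repeatedly hashes neighboring pairs until one digest remains, duplicating the last hash when a level has an odd count. This is an illustrative assumption rather than any particular blockchain's exact rules (Bitcoin, for example, applies SHA-256 twice at each step), and the transaction strings are made up.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()

def merkle_root(items: list[bytes]) -> str:
    """Hash each item, then hash pairs upward until one digest remains."""
    if not items:
        raise ValueError("at least one item is required")
    level = [sha256(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

transactions = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]  # made-up data
root = merkle_root(transactions)
print(len(root))  # 64

# Altering any single transaction changes the root.
tampered = [b"alice->bob:50", b"bob->carol:2", b"carol->dave:1"]
print(merkle_root(tampered) == root)  # False
```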
Whereas proof-of-stake (PoS) blockchains rely on staking to validate new blocks and secure the network, proof-of-work (PoW) blockchains rely on cryptocurrency mining. Hashing algorithms are a major component of mining. Stated very simply, miners compete to mine new blocks by running blockchain data through a specific hashing algorithm over and over until they produce a hash that is less than or equal to a predetermined “target.” Once a miner finds a satisfactory block hash, they broadcast it to the network. If consensus is reached, the block is considered valid and gets added to the blockchain.
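The mining loop can be sketched as a toy proof-of-work search. The code below appends an incrementing number (commonly called a nonce) to some block data and hashes the result with SHA-256 until the hash, read as a number, is less than or equal to a target. The block data, the 16-bit difficulty, and the way the nonce is encoded are assumptions chosen so the search finishes quickly. Real networks use far harder targets and their own block formats.

```python
import hashlib

def mine(block_data: bytes, difficulty_bits: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the block hash is at or below the target."""
    # A smaller target means fewer acceptable hashes and therefore more work.
    target = (1 << (256 - difficulty_bits)) - 1
    nonce = 0
    while True:
        candidate = block_data + nonce.to_bytes(8, "big")
        block_hash = hashlib.sha256(candidate).hexdigest()
        if int(block_hash, 16) <= target:
            return nonce, block_hash
        nonce += 1

block_data = b"example block data"  # stand-in for real block contents
nonce, block_hash = mine(block_data, difficulty_bits=16)
print(nonce, block_hash)

# Anyone can verify the result with a single hash, unlike finding it.
check = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).hexdigest()
print(check == block_hash)  # True
```

The asymmetry is the point of the design: finding a valid nonce takes many attempts on average, while checking one takes a single hash.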