Understanding Hash Functions: MD5 vs SHA Explained

Published March 2026 · 10 min read

What Are Hash Functions and Why Do They Matter?

A hash function is a mathematical algorithm that takes an input of any size and produces a fixed-size output, commonly called a hash, digest, or checksum. Whether you feed it a single character or an entire operating system image, the output is always the same length. This seemingly simple property underpins a vast range of technologies that developers interact with every day.

Every time you download a file and verify its checksum, commit code to Git, store a password in a database, or validate a digital signature, hash functions are doing the heavy lifting behind the scenes. They are one of the fundamental building blocks of modern computer science, bridging the gap between raw data and trust.

In this guide, we will explore how hash functions work from the ground up, examine the most widely used algorithms — MD5, SHA-1, SHA-256, and SHA-512 — and help you understand when to use each one. By the end, you will have the knowledge to make informed decisions about hashing in your own projects.

How Hash Functions Work

At their core, hash functions perform a one-way transformation. They take an input (often called the message or pre-image) and run it through a series of mathematical operations — bitwise shifts, XOR operations, modular arithmetic, and logical functions — to produce a fixed-length output.

Consider a simple example. Hashing the string hello with SHA-256 always produces the same 64-character hexadecimal string:

Input:  "hello"
Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Now change just one character — capitalize the h to H:

Input:  "Hello"
Output: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

The two outputs look completely different. This behavior is called the avalanche effect — a tiny change in the input produces a dramatically different hash. This is not a bug; it is a fundamental design goal that makes hash functions useful for detecting even the smallest modifications to data.
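The avalanche effect is easy to measure. The following minimal sketch uses Python's standard hashlib to hash the two strings above and count how many of the 256 output bits differ; the helper name bit_difference is an illustrative choice, not part of any library.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 bit wherever the two digests disagree.
    return bin(da ^ db).count("1")

print(hashlib.sha256(b"hello").hexdigest())
print(hashlib.sha256(b"Hello").hexdigest())
print(bit_difference(b"hello", b"Hello"), "of 256 bits differ")
```

For a well-behaved hash the count should land somewhere near 128, which is half of the output bits.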

Key Characteristics

  • Deterministic: The same input always produces the same output. There is no randomness involved — hash the string a million times and you get the identical result every time.
  • Fixed output size: Regardless of whether the input is 1 byte or 1 terabyte, the hash length is always the same (e.g., 256 bits for SHA-256).
  • One-way (pre-image resistance): Given a hash value, it should be computationally infeasible to reconstruct the original input. You cannot “reverse” a hash.
  • Fast to compute: Hash functions are designed to process data quickly, making them practical for real-world use on large files and high-throughput systems.
  • Avalanche effect: A single-bit change in the input flips approximately half the bits in the output, making the two hashes appear completely unrelated.
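Two of these characteristics, determinism and fixed output size, can be confirmed in a few lines. This is a small sketch with Python's hashlib; sha256_hex is a hypothetical helper name used only for this illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Deterministic: hashing the same bytes twice gives the same digest.
assert sha256_hex(b"hello") == sha256_hex(b"hello")

# Fixed output size: 1 byte or 1 MiB, the digest is always 64 hex characters.
for size in (1, 1_000, 1_048_576):
    digest = sha256_hex(b"a" * size)
    print(f"{size:>9} bytes -> {len(digest)} hex chars")
```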

Properties of a Good Cryptographic Hash Function

Not all hash functions are created equal. A hash function used in a hash table for fast lookups has different requirements from one used to verify software integrity. Cryptographic hash functions must satisfy a stricter set of properties to be considered secure:

1. Pre-image Resistance

Given a hash output h, it should be computationally infeasible to find any input m such that hash(m) = h. In practical terms, if someone gives you a hash value, you should not be able to figure out what the original input was. This is what makes hash functions “one-way.”

2. Second Pre-image Resistance

Given an input m1, it should be infeasible to find a different input m2 such that hash(m1) = hash(m2). This protects against an attacker who knows the original file and tries to create a modified version with the same hash.

3. Collision Resistance

It should be infeasible to find any two distinct inputs m1 and m2 that produce the same hash. Note the subtle difference from second pre-image resistance: here the attacker is free to choose both inputs, which makes the attack easier. For an n-bit hash, a generic birthday search finds a collision after roughly 2^(n/2) evaluations, compared with roughly 2^n for a second pre-image. A collision in a hash function used for digital signatures could allow an attacker to forge documents.
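To see why freedom over both inputs helps the attacker, it can be useful to attack a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits (an assumption made purely so the search finishes quickly) and runs a birthday search; tiny_hash and find_collision are illustrative names.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, bits: int = 24) -> int:
    """SHA-256 truncated to its first `bits` bits (deliberately weak)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 24):
    """Birthday search: remember every digest seen until one repeats."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

m1, m2, tries = find_collision()
print(f"{m1!r} and {m2!r} collide after {tries} hashes")
```

With a 24-bit output a collision typically appears after a few thousand hashes, whereas matching one specific digest would take millions of attempts on average. The same square-root gap applies to full-size hashes, only at infeasible scales.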

4. Avalanche Effect

As demonstrated earlier, even a single-bit change in the input should result in a completely different output. A well-designed hash function ensures that each output bit depends on every input bit through a complex chain of operations. This property makes it impossible to deduce the relationship between similar inputs by comparing their hashes.

5. Efficiency

A practical hash function must be fast enough to process large volumes of data in reasonable time. SHA-256 can hash several hundred megabytes per second on modern hardware. However, for password hashing, we actually want the opposite — deliberate slowness to thwart brute-force attacks. This is why specialized algorithms like bcrypt and Argon2 exist.

MD5: The Legacy Standard

MD5 (Message Digest Algorithm 5) was designed by Ronald Rivest in 1991 as a successor to MD4. For over a decade, it was the go-to hash function for everything from password storage to file integrity verification. MD5 produces a 128-bit (16-byte) hash, typically represented as a 32-character hexadecimal string.

MD5("Hello, World!") = 65a8e27d8879283831b664bd8b7f0ad4

How MD5 Works Internally

MD5 processes input data in 512-bit (64-byte) blocks. The message is first padded so its length is a multiple of 512 bits. The algorithm maintains a 128-bit state divided into four 32-bit words (A, B, C, D), which are initialized with specific constants. Each block goes through four rounds of 16 operations each, using nonlinear functions, modular addition, and bitwise rotations to mix the data into the state.
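The steps described above map onto a short pure-Python sketch. It follows the structure published in RFC 1321 (padding, a four-word state, 64 steps per block) and is meant only for study; names such as md5_hex, SHIFTS, and K are illustrative, and real code should call hashlib.md5 instead.

```python
import math
import struct

# Per-step left-rotation amounts: four rounds of 16 steps each.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Per-step additive constants, derived from the sine function.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def md5_hex(message: bytes) -> str:
    # Initial state: four 32-bit words A, B, C, D.
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Padding: a 0x80 byte, zeros up to 56 mod 64 bytes, then the
    # original length in bits as a 64-bit little-endian integer.
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    message += b"\x80"
    message += b"\x00" * ((56 - len(message)) % 64)
    message += struct.pack("<Q", bit_len)

    for offset in range(0, len(message), 64):
        m = struct.unpack("<16I", message[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            # Each round uses its own nonlinear function and word order.
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(f, SHIFTS[i])) & MASK
        # Fold this block's result into the running state.
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK

    return struct.pack("<4I", a0, b0, c0, d0).hex()

print(md5_hex(b"Hello, World!"))
```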

Known Vulnerabilities

The security of MD5 began to erode in the mid-1990s when theoretical weaknesses were discovered in its compression function. The decisive blow came in 2004 when Chinese researchers Xiaoyun Wang and Hongbo Yu demonstrated practical collision attacks, producing two different inputs with the same MD5 hash in a matter of hours. By 2008, researchers had created rogue CA certificates using MD5 collisions, proving that the vulnerability had real-world consequences.

Warning: MD5 is cryptographically broken. It should never be used for security purposes such as digital signatures, certificate validation, or password hashing. Modern hardware can generate MD5 collisions in seconds.

Where MD5 Is Still Used

Despite its cryptographic weaknesses, MD5 remains in use for non-security purposes where collision resistance is not critical:

  • Checksums for file downloads: Quick verification that a file was not corrupted during transfer (not protection against tampering).
  • Data deduplication: Identifying duplicate files in storage systems where malicious collision is not a concern.
  • Cache keys: Generating unique identifiers for cached content based on input data.
  • Legacy system compatibility: Many older systems and protocols still reference MD5 hashes.

The SHA Family: From SHA-1 to SHA-512

The Secure Hash Algorithm (SHA) family is published by the National Institute of Standards and Technology (NIST). SHA-1 and the SHA-2 variants were designed by the National Security Agency (NSA), while SHA-3 was selected through a public NIST competition. The family has evolved through several generations, each addressing weaknesses found in its predecessors.

SHA-1 (160-bit) — Deprecated

SHA-1 was published in 1995 and produces a 160-bit (20-byte) hash, represented as 40 hexadecimal characters. For many years it was the standard for SSL certificates, Git commit hashes, and code signing.

SHA-1("Hello, World!") = 0a0a9f2a6772942557ab5355d76af442f8f65e01

In 2017, Google and CWI Amsterdam demonstrated the first practical SHA-1 collision (known as the “SHAttered” attack), producing two different PDF files with the same SHA-1 hash. Since then, all major browsers and certificate authorities have deprecated SHA-1 for security purposes. Git has been transitioning to SHA-256 as well.

Warning: SHA-1 is considered broken for cryptographic use. Do not use it for digital signatures, certificates, or any application where collision resistance is required. Major browsers reject SHA-1 signed certificates.

SHA-256 (256-bit) — The Current Standard

SHA-256 is part of the SHA-2 family, published by NIST in 2001. It produces a 256-bit (32-byte) hash, represented as 64 hexadecimal characters. SHA-256 is currently the most widely used cryptographic hash function and is considered secure for all practical purposes.

SHA-256("Hello, World!") = dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

SHA-256 processes data in 512-bit blocks through 64 rounds of operations. It is the backbone of many critical systems: TLS/SSL certificates, Bitcoin mining, code signing, JWT signatures, and government standards. No practical collision or pre-image attacks against SHA-256 have been found.

Fun fact: Bitcoin's proof-of-work system is built on SHA-256, applied twice to each block header. Miners repeatedly hash candidate headers, varying a nonce each time, until the result falls below a network-set target, which in practice means the hash starts with a certain number of zeros. The Bitcoin network currently performs over 500 exahashes (500 quintillion SHA-256 computations) per second.
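The mining loop can be imitated at toy scale. The sketch below is not Bitcoin's real header format or difficulty rule: it simply appends an 8-byte nonce to arbitrary bytes and looks for a double SHA-256 digest with a given number of leading zero hex digits. The function name mine and its parameters are assumptions made for illustration.

```python
import hashlib
from itertools import count

def mine(header: bytes, zero_hex_digits: int = 4) -> tuple[int, str]:
    """Try nonces until the double SHA-256 digest starts with enough zeros."""
    prefix = "0" * zero_hex_digits
    for nonce in count():
        candidate = header + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(candidate).digest()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest

nonce, digest = mine(b"toy block header")
print(nonce, digest)
```

Each extra zero hex digit multiplies the expected work by 16, which is how difficulty can be raised without changing the algorithm.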

SHA-384 (384-bit)

SHA-384 is a truncated version of SHA-512. It uses the same algorithm as SHA-512 but with different initial hash values and truncates the final output to 384 bits (48 bytes), represented as 96 hexadecimal characters. It offers a middle ground between SHA-256 and SHA-512 and is commonly used in TLS cipher suites that require higher security margins.
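Because the initial hash values differ, a SHA-384 digest is not simply the first 96 hex characters of the SHA-512 digest of the same input. A quick check with Python's hashlib shows this:

```python
import hashlib

msg = b"Hello, World!"
sha384 = hashlib.sha384(msg).hexdigest()
sha512 = hashlib.sha512(msg).hexdigest()

print(len(sha384), len(sha512))   # 96 128
# Different initial values mean SHA-384 is not just a prefix of SHA-512.
print(sha384 == sha512[:96])      # False
```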

SHA-512 (512-bit)

SHA-512 produces a 512-bit (64-byte) hash, represented as 128 hexadecimal characters. It processes data in 1024-bit blocks through 80 rounds of operations, using 64-bit words internally.

SHA-512("Hello, World!") = 374d794a95cdcfd8b35993185fef9ba368f160d8daf432d08ba9f1ed1e5abe6cc69291e0fa2fe0006a52570ef18c19def4e617c33ce52ef0a6e5fbe318cb0387

On 64-bit processors, software implementations of SHA-512 are often faster than SHA-256 because SHA-512 works on 64-bit words and processes twice as much data per block. CPUs with dedicated SHA-256 instructions can reverse this, so measure on your target hardware. SHA-512 provides a larger security margin and is preferred in environments where maximum security is required, such as government classified communications and long-term archival integrity.
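Relative speed depends heavily on the CPU and on how the underlying library was built, so it is worth measuring rather than assuming. The rough benchmark below is a sketch only: throughput_mb_s is a hypothetical helper, and the 8 MiB payload and five repeats are arbitrary choices.

```python
import hashlib
import time

def throughput_mb_s(name: str, data: bytes, repeats: int = 5) -> float:
    """Best-of-N hashing throughput for one algorithm, in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6

payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes
for name in ("sha256", "sha512"):
    print(f"{name}: {throughput_mb_s(name, payload):.0f} MB/s")
```

The numbers will differ from machine to machine, and on processors with hardware SHA-256 support the ordering may be the opposite of what pure software would give.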

MD5 vs SHA: Side-by-Side Comparison

Here is a comprehensive comparison of the most commonly used hash algorithms to help you choose the right one for your use case:

Property        | MD5               | SHA-1             | SHA-256           | SHA-512
Output Size     | 128 bits (32 hex) | 160 bits (40 hex) | 256 bits (64 hex) | 512 bits (128 hex)
Block Size      | 512 bits          | 512 bits          | 512 bits          | 1024 bits
Rounds          | 64 (4 x 16)       | 80                | 64                | 80
Speed           | Fastest           | Fast              | Moderate          | Fast on 64-bit
Security        | Broken            | Broken            | Secure            | Secure
Collision Found | 2004              | 2017              | None              | None
Year Published  | 1991              | 1995              | 2001              | 2001
Recommended Use | Checksums only    | Legacy only       | General purpose   | High security

Recommendation: When in doubt, use SHA-256. It provides an excellent balance of security, speed, and broad ecosystem support. Choose SHA-512 when you need the highest security margin or when running on 64-bit hardware where it may actually be faster than SHA-256.
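The output and block sizes in the comparison can be read straight from Python's hashlib, which exposes them as digest_size and block_size (both in bytes). The helper name sizes_in_bits is an illustrative choice, and the sketch assumes MD5 is enabled in the local Python build.

```python
import hashlib

def sizes_in_bits(name: str) -> tuple[int, int]:
    """Return (output size, block size) in bits for a hashlib algorithm."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8

for name in ("md5", "sha1", "sha256", "sha512"):
    out_bits, block_bits = sizes_in_bits(name)
    print(f"{name:<7} output {out_bits:>3} bits, block {block_bits:>4} bits")
```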

Common Use Cases for Hash Functions

Password Hashing (with Salting)

Storing passwords in plaintext is a catastrophic security failure. Instead, applications hash passwords before storing them. When a user logs in, the server hashes the provided password and compares it to the stored hash. If they match, the password is correct — without ever storing the actual password.

However, hashing alone is not enough. Attackers use precomputed tables of common password hashes (called rainbow tables) to reverse hashes instantly. The solution is salting: appending a unique random string to each password before hashing. This ensures that even identical passwords produce different hashes.

password: "mypassword123"
salt:     "x9Kp2mQ7"
hash:     SHA-256("mypassword123" + "x9Kp2mQ7")
stored:   "x9Kp2mQ7$a3f2b8c1d4e5f6..." (salt + hash)

Important: For actual password storage, do not use raw SHA-256. Use purpose-built password hashing algorithms like bcrypt, scrypt, or Argon2. These algorithms are intentionally slow and include built-in salting, making brute-force attacks orders of magnitude harder.
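As one possible illustration of a purpose-built algorithm, Python's standard library exposes scrypt through hashlib.scrypt (available when Python is built against OpenSSL 1.1 or later). The sketch below is a minimal example under that assumption; the function names and the cost parameters (n=2**14, r=8, p=1) are illustrative and should be tuned for your own hardware.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters; tune them for your own hardware.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare."""
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(key, expected)

salt, key = hash_password("mypassword123")
print(verify_password("mypassword123", salt, key))  # True
print(verify_password("wrong-guess", salt, key))    # False
```

Because every call draws a new salt, two users with the same password end up with different stored values.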

File Integrity and Checksums

When you download software, the provider often lists a SHA-256 hash alongside the download link. After downloading, you hash the file locally and compare the result. If the hashes match, you can be confident the file was not corrupted during transfer. The check also detects tampering, provided you obtained the published hash over a channel the attacker could not modify (for example, an HTTPS page or a signed checksum file).

$ sha256sum ubuntu-24.04-desktop-amd64.iso
<64-character SHA-256 digest>  ubuntu-24.04-desktop-amd64.iso
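The same check can be scripted. The following sketch reads the file in chunks so that large images never have to fit in memory; sha256_file and matches_published are hypothetical helper names, and the published digest is whatever value the provider lists.

```python
import hashlib
import hmac

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local digest with the one listed by the provider."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())
```

A call such as matches_published("ubuntu-24.04-desktop-amd64.iso", published) then returns True only when the two digests are identical.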

Digital Signatures

Digital signature schemes do not sign the entire document directly. Instead, they hash the document first and then sign the hash. This is far more efficient because the hash is a fixed small size regardless of document length. The recipient hashes the received document and verifies the signature against that hash. If verification succeeds, the document has not been altered since it was signed.
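The hash-then-sign flow can be sketched with textbook RSA using only the standard library. This is a toy under loud assumptions: the key is built from two small Mersenne primes, there is no padding, and the names sign, verify, and digest_as_int are invented for this example. Real systems should use a vetted library and a standard scheme such as RSA-PSS or Ed25519.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes. Far too small for real
# use, and real schemes also add padding; this is for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest_as_int(document: bytes) -> int:
    """Hash the document, then reduce the digest to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N

def sign(document: bytes) -> int:
    """Sign the hash of the document, not the document itself."""
    return pow(digest_as_int(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Recover the signed hash and compare it with a fresh hash."""
    return pow(signature, E, N) == digest_as_int(document)

signature = sign(b"Pay Alice 10 EUR")
print(verify(b"Pay Alice 10 EUR", signature))    # True
print(verify(b"Pay Alice 1000 EUR", signature))  # False
```

However long the document is, the signing operation only ever sees a fixed-size number, which is the efficiency gain described above.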

Data Deduplication

Cloud storage systems and backup tools use hash functions to detect duplicate data. By hashing each file or data block, the system can quickly determine if the same content already exists in storage. If it does, only a reference to the existing data is stored instead of a second copy. Services like Dropbox and Google Drive use this technique to save enormous amounts of storage space.
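A minimal in-memory sketch of the idea follows. DedupStore is a hypothetical class written for this article, not how any particular service implements deduplication; it keys stored content by its SHA-256 digest so identical bytes are kept only once.

```python
import hashlib

class DedupStore:
    """In-memory store that keeps one copy of each distinct content."""

    def __init__(self):
        self.blobs = {}  # digest -> content
        self.names = {}  # file name -> digest

    def put(self, name: str, content: bytes) -> bool:
        """Store content under name; return True if new bytes were written."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content
        self.names[name] = digest  # duplicates only add a reference
        return is_new

    def get(self, name: str) -> bytes:
        """Look up content through the name's digest."""
        return self.blobs[self.names[name]]

store = DedupStore()
print(store.put("a.txt", b"same bytes"))  # True
print(store.put("b.txt", b"same bytes"))  # False
print(len(store.blobs))                   # 1
```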

Caching and Content Addressing

Hash functions enable content-addressable storage, where data is referenced by its hash rather than a file path or URL. Git uses SHA-1 hashes (transitioning to SHA-256) to identify every object in a repository — commits, trees, and blobs. Docker uses SHA-256 digests to identify container image layers. This approach guarantees that if the content changes, the address changes, naturally invalidating any stale caches.
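Git's blob IDs make a concrete example of content addressing. For a SHA-1 repository, Git hashes a short header ("blob", the content length, and a NUL byte) followed by the file content. The helper name git_blob_id below is an illustrative choice.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID that Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what git hash-object prints for the same bytes, so any change to the content produces a different object ID.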

Hash Functions in Code

Let us look at how to compute hash values in three popular environments. Each platform provides built-in APIs — you do not need to install third-party libraries for basic hashing.

JavaScript (Web Crypto API)

Modern browsers provide the SubtleCrypto API for cryptographic operations. It supports SHA-1, SHA-256, SHA-384, and SHA-512 (but not MD5, as it is considered insecure):

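// Hash a string with SHA-256 and return the digest as a lowercase hex string.
async function sha256(message) {
  // digest() operates on bytes, so encode the string as UTF-8 first.
  const encoder = new TextEncoder();
  const data = encoder.encode(message);
  const hashBuffer = await crypto.subtle.digest("SHA-256", data);
  // Convert the ArrayBuffer into two-digit hex values, one per byte.
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, "0")).join("");
}

// Top-level await needs an ES module; otherwise call this from an async function.
const hash = await sha256("Hello, World!");
// "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"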

Python (hashlib)

Python's standard library includes the hashlib module, which provides access to every common hash algorithm:

import hashlib

message = "Hello, World!"

md5_hash = hashlib.md5(message.encode()).hexdigest()
sha1_hash = hashlib.sha1(message.encode()).hexdigest()
sha256_hash = hashlib.sha256(message.encode()).hexdigest()
sha512_hash = hashlib.sha512(message.encode()).hexdigest()

print(f"MD5:     {md5_hash}")
print(f"SHA-1:   {sha1_hash}")
print(f"SHA-256: {sha256_hash}")
print(f"SHA-512: {sha512_hash}")

Node.js (crypto module)

Node.js provides the built-in crypto module for hashing. Unlike the browser API, it supports MD5 as well:

import { createHash } from "node:crypto";

function hash(algorithm, message) {
  return createHash(algorithm).update(message).digest("hex");
}

const message = "Hello, World!";

console.log("MD5:   ", hash("md5", message));
console.log("SHA-1: ", hash("sha1", message));
console.log("SHA-256:", hash("sha256", message));
console.log("SHA-512:", hash("sha512", message));

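Hashing large files: For files that do not fit in memory, use streaming/incremental hashing. Python and Node.js both support updating a hash object with chunks of data. In Node.js, you can pipe a file read stream directly into a hash object. In Python, call update() repeatedly with each chunk. The Web Crypto API is the exception: crypto.subtle.digest() takes the entire input in a single call and offers no incremental interface, so very large inputs in the browser need a different approach.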
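In Python, incremental hashing gives exactly the same digest as hashing the whole input at once, no matter how the data is split. A short sketch, with sha256_chunks as an illustrative helper name:

```python
import hashlib

def sha256_chunks(chunks) -> str:
    """Feed an iterable of byte chunks into one SHA-256 object."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

parts = [b"Hello, ", b"World", b"!"]
# Chunked and one-shot hashing agree on the final digest.
assert sha256_chunks(parts) == hashlib.sha256(b"Hello, World!").hexdigest()
print(sha256_chunks(parts))
```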

Security Best Practices

Hash functions are only as secure as how you use them. Here are the essential guidelines every developer should follow:

Do Not Use MD5 or SHA-1 for Security

Both MD5 and SHA-1 have demonstrated practical collision attacks. Using them for digital signatures, certificate validation, or any integrity-critical application leaves you vulnerable to forgery. If you encounter MD5 or SHA-1 in a codebase, treat it as technical debt that needs to be migrated to SHA-256 at minimum.

Use bcrypt or Argon2 for Password Hashing

General-purpose hash functions like SHA-256 are designed to be fast. For password hashing, fast is the enemy. An attacker with a GPU can compute billions of SHA-256 hashes per second, which makes brute-forcing short or common passwords practical. Password-specific algorithms solve this by being intentionally slow and resource-intensive:

  • bcrypt: Time-tested, adjustable work factor, includes built-in salt. Widely supported across languages.
  • scrypt: Memory-hard, making it resistant to GPU and ASIC attacks. Used by some cryptocurrency systems.
  • Argon2: Winner of the 2015 Password Hashing Competition. Offers tunable memory, time, and parallelism parameters. Its Argon2id variant is the current recommended choice for new projects.

Always Salt Your Hashes

A salt is a random value added to the input before hashing. Without salting, identical inputs produce identical hashes, making rainbow table attacks possible. Each record should have its own unique salt, stored alongside the hash. Password hashing libraries like bcrypt and Argon2 handle salting automatically.

SHA-256 as the Minimum for Integrity

For file integrity checks, digital signatures, HMAC-based authentication, and any security-sensitive hashing, SHA-256 should be your baseline. If your threat model includes nation-state adversaries or you need to protect data for decades, consider SHA-384 or SHA-512 for a larger security margin.

Validate Algorithm Choices at the System Level

Do not let user input or external data dictate which hash algorithm is used. Hardcode the algorithm or check it against an explicit allowlist in your configuration. This prevents downgrade attacks where an attacker forces your system to use a weaker algorithm.
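One way to enforce this is a small wrapper that refuses anything outside an allowlist. The sketch below assumes a policy of SHA-256, SHA-384, and SHA-512 only; ALLOWED_ALGORITHMS and safe_digest are hypothetical names, and the set should reflect your own requirements.

```python
import hashlib

# Assumed policy for this example: SHA-2 variants of 256 bits or more.
ALLOWED_ALGORITHMS = frozenset({"sha256", "sha384", "sha512"})

def safe_digest(algorithm: str, data: bytes) -> str:
    """Hash data only if the requested algorithm is on the allowlist."""
    name = algorithm.lower()
    if name not in ALLOWED_ALGORITHMS:
        raise ValueError(f"hash algorithm not permitted: {algorithm!r}")
    return hashlib.new(name, data).hexdigest()
```

A request for "md5" or "sha1" then fails with a ValueError instead of silently producing a weak digest.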

Looking ahead: SHA-3 (Keccak) was standardized by NIST in 2015 as an alternative to the SHA-2 family. While SHA-2 remains secure, SHA-3 uses a fundamentally different internal structure (sponge construction), providing a safety net if any structural weakness is found in SHA-2. Consider SHA-3 for new systems where forward-looking security is a priority.
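Python's hashlib already ships SHA-3, so trying it next to SHA-2 takes only a few lines. The helper name sha3_256_hex is illustrative.

```python
import hashlib

def sha3_256_hex(data: bytes) -> str:
    """Return the SHA3-256 digest of data as a hex string."""
    return hashlib.sha3_256(data).hexdigest()

msg = b"Hello, World!"
print("SHA-256: ", hashlib.sha256(msg).hexdigest())
print("SHA3-256:", sha3_256_hex(msg))
```

Both digests are 256 bits long, but they are unrelated values because the two algorithms share no internal structure.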

Try It Yourself

Reading about hash functions is one thing — seeing them in action is another. The best way to build intuition is to experiment: try hashing different inputs, change a single character and watch the output transform completely, or hash the same input with multiple algorithms to compare their output lengths.

Our hash generator tool supports MD5, SHA-1, SHA-256, SHA-384, and SHA-512. Everything runs entirely in your browser, so your data never leaves your machine.
