sequenceDiagram
actor Alice
participant Hash Function
actor Craig
participant Cracking Rig
Alice ->> Alice: Make 𝐤
Alice ->> Hash Function: Encrypt my password 𝐤.
Hash Function ->> Alice: hash(𝐤) = 𝐞
Alice ->> Craig: Here's my encrypted password 𝐞.
Craig ->> Cracking Rig: Find the input used to make 𝐞.
loop For all choices of 𝐤'
Cracking Rig ->> Cracking Rig: Does hash(𝐤') = 𝐞?
end
alt Craig finds 𝐤
Cracking Rig ->> Craig: Found 𝐤
Craig ->> Alice: Your password is 𝐤. I win!
else Craig can't find 𝐤
Craig ->> Cracking Rig: Stop trying to reverse 𝐞.
Craig ->> Alice: I can't figure out your password. You win.
end
The Mathematics Behind Password Safety
Let’s get cracking
I recently wrote a gentle intro on password safety (including password management tool recommendations) for the 32-Bit Café’s guide on staying safe online. Since that article’s more geared for a non-technical audience, I didn’t want to overwhelm any potential readers with technical jargon or mathematical equations. However, it would be a shame not to talk about the mathematics behind password cracking and secure password generation on my own website.
I learned about the importance of password safety the hard way when one of my social media accounts was hacked. Back then, I was an easy target for attackers because I used the same set of weak passwords for my accounts on different websites. Someone guessed, or cracked, my social media account’s password and converted my profile into a cesspool of sunglasses advertisements and shady links. While I fortunately recovered my account in time, my frustration from the attack kicked me into overdrive. It took me about a day or two to set up a password manager, reset all my passwords, and delete the “advertisements” that “I” spammed on my friends’ pages.
What strategies can we use to prevent attacks like this from happening? Personally, I like to learn how a cryptographic system works with thought experiments, puzzles, and games; passwords are no different. So, let’s imagine a password-cracking “game” between two players: Alice, the password defender, and Craig, the password cracker. Understanding this game from both Alice and Craig’s perspectives can help us learn about how to both execute and prevent password-related attacks in the future.
Rules of the Password Cracking Game
Here’s how the “game” is played:
- Alice makes up a password \(\mathbf{k} = \{k_{1}, \ldots, k_{N}\}\). The password \(\mathbf{k}\) contains \(N\) characters; Alice can use any character in the table below to make her password.
- Alice applies a cryptographic hash function to encrypt her password \(\mathbf{k}\) before sending it to Craig. While the hash function easily produces the same output from the same input, figuring out the input from the hash function’s output is nearly impossible.
- Alice sends the hashed password \(\mathbf{e}\) to Craig to see if he can figure out the original password \(\mathbf{k}\). She may give him a hint about how it was made.
- Craig uses his password cracking rig to try to undo the cryptographic hash function. It checks if the hash of a password candidate \(\mathbf{k}^{\prime}\) is equal to the encrypted password \(\mathbf{e}\). By doing this, the cracking rig only learns whether \(\mathbf{k}^{\prime}\) is equal to \(\mathbf{k}\). This process repeats many times.
- If the cracking rig finds the original password \(\mathbf{k}\), he tells Alice her password \(\mathbf{k}\) and wins the game.
- If the cracking rig can’t find the original password \(\mathbf{k}\), then he tells Alice he can’t find the password and loses the game.
| Character Type | Characters | Count |
|---|---|---|
| Lowercase letters | abcdefghijklmnopqrstuvwxyz |
26 |
| Uppercase letters | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
26 |
| Numbers | 0123456789 |
10 |
| Punctuation | ␠1 and !"#$%&'()*+,-./:;>=<?@[\]^_`{|}~ |
33 |
| Total | 95 |
We can draw the game’s sequence diagram below.
Craig can both alternate between password-guessing methods and use whatever information Alice gives him about the password to speed up his password-guessing process.
Password Cracking with Craig
First, let’s observe the password-cracking game from Craig’s perspective. What are the different methods he can use to get Alice’s password? Let’s assume that Craig’s password cracking rig can compare the encrypted password \(\mathbf{e}\) to the hashes of one trillion, or \(1 \times 10^{12}\), different password candidates in a single second.
Brute Force Attack
In the first game, Alice chooses the password abc123 and tells Craig that her password has 6 characters. Craig can crack short passwords like this one with a brute force attack: he can try every single password possible in the hopes of obtaining the correct password through pure trial and error.2
Alice: “My password’s hash is
2c6c8ab6ba8b9c98a1939450eb4089ed. It’s 6 characters long.”
Craig: “It’sabc123.”
If Craig knows that Alice’s password is \(N\) characters long, he will have to check at most \(95^{N}\) different passwords before one correct guess. In this case, that means \(7.35 \times 10^{11}\) different strings to check… which takes about 0.735 seconds to complete on his cracking rig.
\[ \frac{95^{6} \text{ [hash]} \text{}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \approx 0.735 \text{ [s]} \]
While this naive strategy would work for small passwords, the time needed to crack longer passwords increases exponentially. For example, a seven-character password would take about 70 seconds for Craig to crack by brute force, and an eight-character password would take about 111 minutes.
Dictionary Attack
In the second game, Alice chooses the password armadillo, and tells Craig that her password is a word she found in the dictionary. With this information, Craig wouldn’t need to search all possible passwords. Instead, he could go through the list of words in the dictionary to find Alice’s password. This variation on the brute-force method that cycles through lists of words is evidently called a word list attack or dictionary attack.3 Alternatively, if Alice’s password is common enough, Craig could just as easily comb through lists of common passwords that he found online to find it.
Alice: “My password’s hash is
db8ddc6bcba7c583be30fef8f81c35fa. It’s a word I picked from the dictionary, so–”
Craig: “It’sarmadillo.”
Alice: “Oh wow, that fast?”
The Moby Project’s word list contains 639,995 English words,4 and SecLists has a list of the top million most common passwords.5 It would take Craig a little over a microsecond to comb through both word lists.
\[ \frac{1.640 \times 10^{6} \text{ [hash]} \text{}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \approx 1.64 \times 10^{-6} \text{ [s]} \]
Rule-Based Attack
In the third game, Alice wants to choose the word aardvark as her password. What if Alice based her password on a dictionary word, but she changed it so that it would be unrecognizable to a dictionary attack? Maybe she substitutes characters, combines two words, or truncates the length of a longer word to fit inside the desired password length. Regardless of what rule Alice comes up with to protect her password from a dictionary, Craig can still crack Alice’s password with a rule attack. This expansion on the dictionary attack guesses different modifications of words provided from the original dictionary attack.6
Alice: “My password’s hash is
ee7c795c3f00e2f2e1e6cb7d9e4a1f2d. I took an ordinary word from the dictionary and made some elite changes.”
Craig: “It’sA4rdv4rk. Substituting A’s for 4’s is so last decade.”
Mask Attack
In the fourth game, Alice wants to choose the word axolotl as her password; however, she knows that Craig would crack this, she got her word from the dictionary. So, she instead adds two numbers before her password, two capital letters after, and a special character at the end and tells Craig how she made her password. Craig’s response is to use a mask attack. Using the information that Alice gave him, Craig once again can narrow the range of Alice’s possible passwords to brute force his way into the correct solution.7
Alice: “My password’s hash is
db108ac097de42ff0588aa0c5a932788. I’ve added two numbers before 7 lowercase letters in the middle, and added two uppercase letters and a special character in the end.”
Craig: “Awesome, thanks for the tip. Let me just set up my cracking rig to use the mask?d?d?l?l?l?l?l?l?l?u?u?s.”
Craig: “I figured it out - it’s19axolotlJX@.”
If Craig had to crack this password with brute force alone, it would have taken about seventeen thousand years. Thanks to the hint Alice gave him, he’s whittled down his rig’s cracking time by ten orders of magnitude to under half a minute.
\[ \begin{aligned} \text{brute force:} \quad & \frac{95^{12} \text{ [hash]}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \\ &\approx 5.404 \times 10^{11} \text{ [s]} \\ \\ \text{mask attack:} \quad & \frac{10^{2} \times 26^{7} \times 33\text{ [hash]}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \\ &\approx 26.505 \text{ [s]} \\ \\ \end{aligned} \]
Hybrid Attack
In the fifth game, Alice wants to try to base her password on another word, but this time she won’t tell Craig as much information.
Alice: “My password’s hash is
f7c4d717530a8f89d2173d1a394764d4. It’s made from a word in the dictionary and three numbers.”
Craig: “Your password isabalone915.”
While tacking on extra numbers seems like it’ll help increase a password’s strength, it only increased the already-small dictionary attack cracking time by 3 orders of magnitude from over a microsecond to over a millisecond.
\[ \frac{ \left( 1.64 \times 10^{6} \right) \times \left( 10^{3} \right) \text{ [hash]} \text{}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \approx 1.64 \times 10^{-3} \text{ [s]} \]
Password Generation with Alice
Let’s play this password-guessing game from Alice’s perspective. How can she make complex passwords that are difficult for Craig to crack?
Information Entropy
In information theory, the entropy \(H_{X}\) of a discrete random variable \(X\) quantifies its “uncertainty” by determining the information needed to specify its distribution over its set of outcomes \(\mathcal{X}\). The entropy definition is dependent on the different outcomes of random variable’s probabilities, \(\mathbb{P}(X = x)\).
For example, let’s say that Alice rolls a fair six-sided die that has the set of labeled sides \(\mathcal{R} = \{1, 1, 2, 2, 2, 3\}\) to choose her password. She can define the random variable \(R\) as the action of rolling the die. She can calculate the probabilities of rolling each die face with the table below.
| Die Face, \(r\) | Probability, \(\mathbb{P}(R = r)\) |
|---|---|
| 1 | \(\frac{1}{3}\) |
| 2 | \(\frac{1}{2}\) |
| 3 | \(\frac{1}{6}\) |
How would she quantify the “uncertainty” of a single roll of her die? If her die’s faces had all 1’s, then she would be certain of its outcome, so a probability of 1 has an uncertainty of 0. The less certain she is of her die’s outcome, the higher its uncertainty value should be. So, she defines the “uncertainty” of her die’s outcome with the following equation.
\[ \mathbb{I}(R = r) = -\log_{2}(\mathbb{P}(R = r)) \]
The choice of the logarithm’s base determines the units of information. When the base equals 2, then the uncertainty and entropy are measured in shannons (a.k.a., bits of entropy), the amount of information contained in a single bit with equal probability of storing a “1” or a “0”. Alice can now define the entropy of this die-rolling password generation method by using the probabilities of each outcome as weights for a weighted average of the outcomes’ uncertainties.8
\[ \begin{aligned} H_{R} &= \sum_{r \in \mathcal{R}} \mathbb{P}(R = r) \mathbb{I}(R = r) \\ &= \frac{1}{3} \left(- \log_{2} \frac{1}{3} \right) + \frac{1}{2} \left(- \log_{2} \frac{1}{2} \right) + \frac{1}{6} \left(- \log_{2} \frac{1}{6} \right) \\ &\approx 1.459 \text{ [Sh]} \end{aligned} \]
Calculating the uncertainty measure of a password requires knowing the probability distribution of all passwords ever. While she could assume that every password has equal probability of occurring, she knows that’s not true, since the distribution of user-chosen passwords of length \(N\) isn’t uniform.9 Craig also knows this isn’t true; he already checked Alice’s passwords against common ones like password, 123456, and dragon.10 Alice can’t rely on just the definition of information entropy for checking password strength, so she will need to think of password generation methods whose outputs are difficult to cross-reference with common password lists.11
Choosing Random Characters
Alice wants to make her password \(\mathbf{k}\) by choosing \(N\) characters at random. Since each character has 95 different outcomes, Alice’s password can be one of \(95^{N}\) different possible passwords with equal probability. If she defines the random variable \(K_{N}\) as making a password with this method, then she can calculate its entropy in terms of \(N\).
\[ \begin{aligned} H_{K_{N}} &= -95^{N} \left( \frac{1}{95^{N}} \log_{2} \left( \frac{1}{95^{N}} \right) \right) \\ &= N\log_{2}(95) \text{ [Sh]} \end{aligned} \]
This equation helps Alice notice that increasing the length of a password \(N\) increases the entropy more than increasing its range of characters. This is because the password length \(N\) has a linear relationship with the entropy \(H\) while the password character range (equal to 95) has a logarithmic relationship with the entropy \(H\).
She lets \(N = 16\) and chooses the password 3-X'S-?4NL9vn,?M. With the assumption that all 16-character passwords are equally likely, its information content is an impressive 105.118 shannons. It contains no words, no visible patterns, and hasn’t shown up in a password list yet, so Craig cannot rely on anything other than brute force to crack it. It would take Craig over 1 trillion years for his crack this password with his password-cracking rig.
\[ \frac{95^{16} \text{ [hash]} \text{}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \approx 4.401 \times 10^{19} \text{ [s]} \]
Alice: “OK, Craig, my password’s hash is
c117a55b6ba8f11fc4220b7b68519e78and is sixteen random characters. Have you figured it out yet?”
Craig: “I’ve been running my computers nonstop for a week, and I still haven’t figured it out.”
Alice: “Haha, yeah! You’ll never figure out that my password was… uh… I think I actually forgot it myself.”
Passphrases
While randomly generated passwords are secure, they are an absolute pain for Alice to remember. Alice knows that she can’t just use a password based on one word; what if she chains words together to form a passphrase? Alice brings back her dictionary and randomly selects four words to combine. She chooses the password joyous_festival_beforested_parapet. While it’s easy for her to memorize by picturing a beautiful rooftop party in the forest, it has a less impressive information content of 85.142 shannons (again, assuming all combinations are equally likely).
Alice: “Final game - My password’s hash is
e938c88ef7c5600b3c56143811be3834, and it’s four words I strung together with underscores. Can you guess it?”
Craig: “Still no dice. You win.”
Alice: “Aw, yeah!”
It would have taken Craig at most 5827 years to crack that password. While significantly less than one trillion years, it would be a waste of Craig’s time to try to crack it.
\[ \frac{(6.39 \times 10^{5})^{4} \text{ [hash]}}{1} \times \frac{1 \text{ [s]}}{1 \times 10^{12} \text{ [hash]}} \approx 1.667 \times 10^{11} \text{ [s]} \]
Conclusion
There are more types of password attacks than the ones that I’ve discussed in this article. These are all still important to learn about and safeguard against.
- Credential Stuffing: The attacker uses login credentials taken from a data breach and tries it on other websites to access the victim’s account. If you use the same password on other sites, your other accounts will become compromised if your login credentials get exposed by a data breach. The best ways to prevent this attack is using different secure passwords for each website and enabling multifactor authentication.12
- Man-In-The-Middle (MITM) Attack: The attacker intercepts the connection between the user and the authentication service.13
- Keylogging: Keyloggers are a type of malware that record the victim’s key presses and send the logs to the attacker. Computers infected with keyloggers can export login credentials typed with the keyboard to the attacker.
- Phishing: The victim is tricked into entering their login credentials into a fake website so that an attacker can get their passwords.14 You should always double-check website URLs before logging into accounts.15
- Password Spraying: After attempting to crack a first site’s password, an attacker tries the same password on other sites before returning to the first site. This spreads out the repeated attempts on the first website to circumvent lockout policies.16
- Shoulder Surfing: The attacker spies on their victim and watches them enter login credentials. What information about the login credentials the attacker learns (e.g., password length) is dependent on the locations of the attacker and the victim.
- Video Recording: The attacker uses the victim’s device camera to analyze how they enter the login credentials.17
- Keyboard Acoustic Side-Channel Attack: This one’s my personal favorite. With deep learning, an attacker can accurately predict what their victim’s password is by listening to the victim’s keyboard strokes through a video call or a nearby phone.18
OK, so what was the point of this thought experiment? The only reason why Alice gave hints for each of her passwords to Craig was so that I could explain each type of attack separately. However, a real attacker wouldn’t need hints because they can run several variations of the same attack in parallel to crack passwords more efficiently. Attack and defense are two sides of the same coin in cryptography, so it’s important to know one side to bolster your knowledge of the other.
References
Footnotes
This character “
␠” is a substitute for “space”.↩︎Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering.↩︎
Raza et al., “A Survey of Password Attacks and Comparative Analysis on Methods for Secure Authentication”.↩︎
Raza et al., “A Survey of Password Attacks and Comparative Analysis on Methods for Secure Authentication”.↩︎
Harrison, Toreini, and Mehrnezhad, “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards”.↩︎