I learned the difference between haphazard and random a long time ago, on a
university statistics course. Since then, I’ve been wary of inventing passwords
by just “thinking random” or using an obfuscation algorithm on something
memorable (“replace Es by 3s, replace Ls by 7s”, or whatever). The concern is
that there is really no way to know how much entropy there is in such a token
(in the information-theoretic sense), and it is probably less than you might
think. People tend to guess high when asked how much entropy there is in
something; most are surprised to hear that English text carries only around one
bit of entropy per letter, depending on the context.
If you know how much information entropy there is in your password, you have a
good idea of how much work it would take for an attacker to guess your password
by brute force: N bits of entropy means they may have to try as many as 2^N
possibilities, and about half that many on average.
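To put numbers on that, here is a small Java sketch of the arithmetic (the class and method names are just for illustration). It assumes every one of the 2^N candidates is equally likely, which is exactly what a properly random password gives you and what a haphazard one does not.

```java
import java.math.BigInteger;

// Brute-force arithmetic for a secret drawn uniformly from 2^N possibilities.
// Names are illustrative only.
public class BruteForce {

    // Worst case: the attacker may have to try every one of the 2^N candidates.
    static BigInteger worstCaseGuesses(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    // On average the right candidate turns up after about half of them: 2^(N-1).
    // Meaningful for bits >= 1.
    static BigInteger averageGuesses(int bits) {
        return BigInteger.ONE.shiftLeft(bits - 1);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {32, 64}) {
            System.out.printf("%d bits: worst case %s guesses, about %s on average%n",
                    bits, worstCaseGuesses(bits), averageGuesses(bits));
        }
    }
}
```

Running it gives 4294967296 worst-case guesses for 32 bits and 18446744073709551616 for 64 bits. Each extra bit doubles the attacker's work, so going from 32 to 64 bits multiplies it by 2^32, about 4.3 billion.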
One way to do this that I’ve used for several years is to take a fixed amount of
real randomness and express it in hexadecimal. For example, I might say this to
get a password with 32 bits (4 bytes) of entropy:
$ dd if=/dev/random bs=1 count=4 | od -t x1
...
0000000 14 37 a8 37
A password like 1437a837 is probably at the edge of memorability for most
people, but I know that it has 32 bits worth of strength to it. So, what is one
to do if there is a need for a stronger password, say one containing 64 bits of
entropy? Certainly d4850aca371ce23c isn’t the answer for most of us.
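If you would rather not depend on dd and od, the same step can be sketched in Java with java.security.SecureRandom. Treat this as a substitute rather than an exact equivalent: which entropy source SecureRandom uses by default depends on the platform and JVM configuration, and it is not guaranteed to read /dev/random. Each byte becomes two hex digits, so 4 bytes give an 8-digit, 32-bit password and 8 bytes a 16-digit, 64-bit one. The class and method names are hypothetical.

```java
import java.security.SecureRandom;

// Rough Java counterpart of: dd if=/dev/random bs=1 count=N | od -t x1
// (the underlying entropy source is whatever SecureRandom defaults to here).
public class RandomHex {

    private static final SecureRandom RNG = new SecureRandom();

    // Returns `bytes` random bytes as lowercase hex: two hex digits per byte,
    // so the result carries 8 * bytes bits of entropy.
    static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RNG.nextBytes(buf);
        StringBuilder sb = new StringBuilder(2 * bytes);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xff)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomHex(4)); // 32 bits, same shape as 1437a837
        System.out.println(randomHex(8)); // 64 bits, same shape as d4850aca371ce23c
    }
}
```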
When I recently needed to generate a higher-entropy, but still memorable,
password, I remembered a technique used by some one-time password systems and
described in RFC 2289. It uses a dictionary of 2048 (2^11) short English words,
so each word stands for an 11-bit fragment of a 64-bit random number; six words
carry 66 bits, enough for the whole 64-bit string with two bits left over for a
checksum. In this scheme, our unmemorable d4850aca371ce23c becomes:
RUSE MET LORD CURT REEL ION
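The bit bookkeeping behind this is small enough to sketch. As I read RFC 2289, the checksum is the sum of the 32 two-bit pairs in the 64-bit value, keeping only its low two bits; those two bits are appended on the right to make 66 bits, which are cut into six 11-bit indexes into the dictionary, most significant first. The sketch stops at the indexes, since the 2048-word table is too long to reproduce here, and the class and method names are hypothetical rather than taken from SixWord.java.

```java
import java.util.Arrays;

// Sketch of the RFC 2289 bit layout:
// 64-bit value + 2-bit checksum = 66 bits = six 11-bit dictionary indexes.
// Class and method names are hypothetical (not from SixWord.java).
public class SixWordIndexes {

    // Sum all 32 two-bit pairs of the 64-bit value and keep the low two bits.
    static int checksum(long value) {
        int sum = 0;
        for (int shift = 0; shift < 64; shift += 2) {
            sum += (int) ((value >>> shift) & 0x3);
        }
        return sum & 0x3;
    }

    // Split value + checksum into six indexes in [0, 2048), most significant first.
    static int[] toIndexes(long value) {
        int[] idx = new int[6];
        // The first five words take the top 55 bits of the value, 11 bits at a time.
        for (int i = 0; i < 5; i++) {
            idx[i] = (int) ((value >>> (53 - 11 * i)) & 0x7FF);
        }
        // The last word holds the remaining 9 value bits followed by the 2 checksum bits.
        idx[5] = (int) ((value & 0x1FF) << 2) | checksum(value);
        return idx;
    }

    // Reverse of toIndexes; throws if an index is out of range or the checksum fails.
    static long fromIndexes(int[] idx) {
        if (idx.length != 6) throw new IllegalArgumentException("need six indexes");
        for (int i : idx) {
            if (i < 0 || i >= 2048) throw new IllegalArgumentException("index out of range: " + i);
        }
        long value = 0;
        for (int i = 0; i < 5; i++) {
            value = (value << 11) | idx[i];
        }
        value = (value << 9) | (idx[5] >>> 2);
        if (checksum(value) != (idx[5] & 0x3)) {
            throw new IllegalArgumentException("checksum mismatch");
        }
        return value;
    }

    public static void main(String[] args) {
        long value = Long.parseUnsignedLong("d4850aca371ce23c", 16);
        int[] idx = toIndexes(value);
        System.out.println(Arrays.toString(idx));
        System.out.println(Long.toHexString(fromIndexes(idx)));
    }
}
```

Decoding reverses the split and recomputes the checksum. Two bits is a weak check: it catches many mistyped words, but not all of them.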
I couldn’t find any code that would take the hexadecimal representation of a
random bit string to its RFC 2289 form, so I wrote some myself. You can download
SixWord.java if you’d like to see what I ended up with or need something like
this yourself.
The code is dominated by an array holding the RFC 2289 dictionary of 2048 short
words, and another array holding the 27 test vectors given in the RFC. When
run, the program first checks itself against the test vectors, then prompts for
a hex string. You can use spaces in the input if you’re pasting something you
got out of od, for example. The result should be a six-word phrase you might
have a chance of remembering. But if you put 64 bits worth of randomness in, you
know that phrase is just as strong a password as the hex gibberish was.
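Input handling of that kind might look like the sketch below. It is my guess at the behaviour, not code lifted from SixWord.java, and parse64 is a hypothetical name. It assumes you paste only the hex bytes, not od’s leading offset column (0000000), and it insists on exactly 16 hex digits, since anything shorter would quietly give you fewer than 64 bits.

```java
// Hypothetical input handling: accept hex with embedded spaces, as pasted from od -t x1.
public class HexInput {

    static long parse64(String input) {
        // Drop all whitespace so "d4 85 0a ca ..." and "d4850aca..." are treated alike.
        String hex = input.replaceAll("\\s+", "");
        // Exactly 16 hex digits means exactly 64 bits; reject anything else.
        if (!hex.matches("[0-9a-fA-F]{16}")) {
            throw new IllegalArgumentException("expected 16 hex digits (64 bits), got: " + hex);
        }
        // parseUnsignedLong accepts values with the top bit set, such as d4850aca371ce23c.
        return Long.parseUnsignedLong(hex, 16);
    }

    public static void main(String[] args) {
        long v = parse64("d4 85 0a ca 37 1c e2 3c");
        System.out.println(Long.toHexString(v)); // prints d4850aca371ce23c
    }
}
```

Long.parseUnsignedLong matters here: a 64-bit value with its top bit set, like d4850aca371ce23c, is out of range for the signed Long.parseLong, which would throw a NumberFormatException.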