“A nearly impenetrable thicket of geekitude…”

Imperfect

Posted on November 1, 2009 at 13:16

Many of us, particularly if we have been programmers, have got into the habit of regarding computers as flawless execution engines. People with more of an electronics background tend to be a bit more sceptical, I think.

For several months now, I’ve been trying to figure out why I couldn’t burn a usable Fedora 11 DVD to upgrade one of my oldest machines. I had checked the SHA-256 hash of the download, then copied the file from the server where I run BitTorrent across to a desktop machine’s external hard drive. The burned disc verified against the image on the machine that created it, but the installation self-test always failed, claiming the disc was corrupt. I tried burning from the same image on another machine; I tried burning at different speeds; I tried different blank DVDs. No change.

Finally, today, I thought to try verifying the hash on the copied image rather than the original one. It was different. Comparing the original download with the copy, I discovered two locations in the copy where byte 0x12 of a block had dropped the 0x08 bit.
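For anyone wanting to repeat this kind of check, here is a minimal Python sketch of the two steps: hash each file, then list the bytes that differ. It is not the tooling I actually used; the function names `sha256_of` and `differing_bytes` and the 1 MiB chunk size are assumptions made purely for illustration.

```python
import hashlib

CHUNK = 1 << 20  # read 1 MiB at a time so a DVD image need not fit in RAM


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_bytes(original, copy):
    """Yield (offset, original_byte, copy_byte, xor_mask) for each mismatch.

    If the files differ in length, only their common prefix is compared.
    """
    offset = 0
    with open(original, "rb") as a, open(copy, "rb") as b:
        while True:
            chunk_a = a.read(CHUNK)
            chunk_b = b.read(CHUNK)
            if not chunk_a or not chunk_b:
                break
            if chunk_a != chunk_b:
                # Only walk byte by byte through chunks known to differ.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        yield offset + i, x, y, x ^ y
            offset += min(len(chunk_a), len(chunk_b))
```

If the two digests differ, iterating over `differing_bytes(original_path, copy_path)` gives the offset of each bad byte. A dropped 0x08 bit shows up as an XOR mask of 0x08, with that bit set in the original byte and clear in the copy.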

It’s probably not a coincidence that the machine on which I made the corrupted copy has recently come back from a couple of extended “warranty repair” holidays during which first the main system logic board and then (at my strong and repeated insistence) the actual DRAM were replaced. The machine had been having some intermittent problems involving applications shutting down unexpectedly; these looked like memory issues to me but the manufacturer’s diagnostics had always given it a clean bill of health. As an old-school computer guy, of course, I know that the manufacturer’s diagnostics never detect real memory issues.

The moral of the story? I’m not sure there is one: “faulty hardware sometimes gives the wrong answer” seems rather an obvious thing to say. On the other hand, if you are aware of the concept of metastability in electronics, you know that there’s no such thing as perfect hardware as long as the logic needs to talk to the outside world. So we can reduce the frequency of odd weirdness to the point where we never expect to encounter it, but we can never make it go away altogether.
