I first read about the Engineyard Programming Contest yesterday and I thought it was a silly contest, winnable only through the application of raw brute force.
For some reason, I woke up this morning obsessed with it. This is despite the fact that this “competition” is basically a lottery, in which you buy tickets with basic programming skills and large amounts of computing time.
In the spirit of sharing, I have a few (fairly obvious) things I’ve noticed in an evening of messing around.
I’m not any kind of cryptographer, but from what I know about the known weaknesses in SHA-1, none of them will apply significantly to this contest. Maybe I’m wrong though.
The Avalanche effect means you don’t have to change much in the input to see a big change in the output. So making large changes (whole word permutations) is a waste of cycles.
Permutating the word list at all is almost unnecessary. One core on my 2.2Ghz Macbook pro takes 45 minutes to check all 7.7 billion combinations of printable five-character strings for a single word list combination. Once you add the possibilities for varying capitalisation in a single sentence (at least 2^40), you have more permutations in a single word list string than a single core can run in many times the 30 hours of test time. So distributing word list permutations is, at most, the “top level” job to distribute work to each cpu.
SHA-1 uses 64-byte blocks so if your total string is more than 64 bytes and the first 64 bytes don’t change, you can calculate that hash separately just once. Testing on an 85-character test string (the one from the competition blog posting), this got me from 1.6 million hash checks per second per core to 2.5 million/second/core.
Using gcc’s __builtin_popcount() and/or the x86 popcntl instruction lets you compute hamming distance in a handful of instructons.
None of this matters at all, although it’s fun to think about. Even with all these optimisations, I still have at most 16 cores (work and home) to run this on. The winner will have hundreds or thousands of parallel cores at their disposal.
Programming skills seem to only play a minor part. Several hours of optimisation only yielded me a 60% improvement compared to my original naive C program. Although, one of the posters on HN suggested he was only getting a tenth of that performance, which suggests a combination of language choice and savvy performance design may be statistically significant in the long run.
I will laugh if the winner is a shady programmer with a medium sized botnet as his or her disposal.
Does anyone have any more optimisations to share? Despite it being kinda pointless, I find this kind of thing really interesting. I honestly don’t plan to enter, except for maybe out of curiosity to see how close a single consumer computer can get to the winning results.