This week I have been working with a colleague on optimising a piece of code. Our first implementation, which we started with last week, took over 80ms to run my test case. We were confident that the overall algorithm was the right one for the job, so we set aside that (usually most fruitful) source of optimisation and simply changed the code around: we converted a recursion to a tail recursion and then to a loop, inlined some other functions and simplified the resulting code, and implemented a simple caching scheme. The result is a hundredfold speedup: the same test case now runs in about 0.8ms. And that’s just from altering the C++; we haven’t even started on the highly platform-specific assembly-level optimisations that we could do.
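For anyone who hasn’t seen the recursion-to-loop transformation before, here is a minimal sketch of the general idea. It is not our actual code: the sum-of-squares workload and all the function names are made up for illustration, and the cache is just the simplest kind I could think of (a map from input to result), which is an assumption rather than a description of what we built.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in workload: the sum of the squares 1..n.

// Step 1: plain recursion. An addition is still pending after the
// recursive call returns, so every level needs its own stack frame.
std::uint64_t sum_squares_recursive(std::uint64_t n) {
    if (n == 0) return 0;
    return n * n + sum_squares_recursive(n - 1);
}

// Step 2: tail recursion. The running total is carried in an
// accumulator, so the recursive call is the last thing that happens.
std::uint64_t sum_squares_tail(std::uint64_t n, std::uint64_t acc = 0) {
    if (n == 0) return acc;
    return sum_squares_tail(n - 1, acc + n * n);
}

// Step 3: loop. The tail call becomes "update the variables and go
// round again", with no function-call overhead at all.
std::uint64_t sum_squares_loop(std::uint64_t n) {
    std::uint64_t acc = 0;
    while (n != 0) {
        acc += n * n;
        --n;
    }
    return acc;
}

// Step 4: a simple cache in front of the loop version. Repeated
// queries for the same n skip the computation entirely.
// Note: the static map is not thread-safe; this is only a sketch.
std::uint64_t sum_squares_cached(std::uint64_t n) {
    static std::unordered_map<std::uint64_t, std::uint64_t> cache;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;
    std::uint64_t result = sum_squares_loop(n);
    cache.emplace(n, result);
    return result;
}
```

All four versions return the same answer for the same input. Whether each step actually buys any speed depends on the compiler and on the real workload, so measure before and after each change.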
There can’t be many jobs where you can say at the end of the week that you made something one hundred times better.