Code Performance Matters Again

Posted by Daniel McAloon in Technology
With the advent of cloud computing, processing power is coming under the microscope more and more. Last year, you could just buy a 16-core system and be done with it, for the most part. If your code was a little inefficient, the load ran a little high, but there really wasn’t a problem. For most developers, it’s not like you’re writing Digg and need to make sure you can handle a million page requests a day. So what if your site is a little inefficient, right?
Well, think again. Now you’re putting your site on “the cloud” that you’ve heard so much about. On the cloud, each processor cycle costs money. Google AppEngine charges by the CPU core hour, as does Mosso. The more wasted cycles in your code, the more it costs to run, operation after operation. If your code uses a custom sorting function, and you went with bubble sort because “it was only 50 milliseconds slower than merge sort and I can’t be bothered to write merge sort by hand,” then be prepared for the added cost over a month’s worth of page requests: each extraneous second of CPU time, at 50,000 page views per day, adds up to 417 HOURS of CPU time per month.
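That arithmetic is easy to check for yourself. A quick sketch, using the same traffic figures assumed above:

```python
# Back-of-the-envelope cost of wasted CPU time, using the figures above:
# one extra second of CPU per page view, 50,000 page views per day,
# over a 30-day month.
PAGE_VIEWS_PER_DAY = 50_000
WASTED_SECONDS_PER_VIEW = 1
DAYS_PER_MONTH = 30

wasted_seconds = PAGE_VIEWS_PER_DAY * WASTED_SECONDS_PER_VIEW * DAYS_PER_MONTH
wasted_hours = wasted_seconds / 3600
print(f"{wasted_hours:.0f} hours of wasted CPU per month")  # 417 hours
```

Swap in your own traffic numbers and your host’s per-hour rate, and you have a dollar figure for every inefficiency you’ve been shrugging off.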
Big-O notation hasn’t really been important for the majority of programmers for the last 10 to 15 years or so. Loop unrolling, extra checks, junk variables floating around in your code, all of that stuff would just average out to “good enough” speeds once the final product was in place. Unless you’re working on the Quake engine, any change that would shave off less than 200ms probably isn’t worth the time it would take to re-engineer the code. Now, though, you have to think a lot harder about the cost of your inefficient code.
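To make the bubble-sort example concrete, here’s a rough comparison you can run yourself. The `bubble_sort` below is a toy implementation for illustration, and the exact timings will vary by machine, but the quadratic-versus-linearithmic gap is the point:

```python
import random
import timeit

def bubble_sort(items):
    # O(n^2): repeatedly swap adjacent out-of-order pairs.
    items = list(items)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

data = [random.random() for _ in range(2_000)]
slow = timeit.timeit(lambda: bubble_sort(data), number=1)
fast = timeit.timeit(lambda: sorted(data), number=1)
print(f"bubble sort: {slow:.3f}s, built-in sort: {fast:.5f}s")
```

The gap grows quadratically with input size, which is exactly what Big-O notation predicts, and exactly what shows up on your bill at the end of the month.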
Developers who have been used to a near-infinite supply of open CPU cycles need to rethink their approach to programming large or complex systems. You’ve been paying for public bandwidth for a long time, and it’s time to start thinking about CPU in the same manner. You have a limited amount of “total CPU” you can use per month before AppEngine’s limits kick in and you begin getting charged for it. If you’re using a different host, your bill will simply go up. You need to treat CPU like you treat bandwidth: minimize your access to it just like you’d minimize access to the public internet, and keep your memory profile low.
The problem with this approach is that the entire programming profession has been moving away from thinking about individual CPU cycles. Helper classes, template libraries, enormous include files full of rarely-used functions: they all contribute to the CPU and memory glut of the modern application. We, as an industry, are going to need to cut back on that. You can see some strides in this direction with dynamic include functions and lazy-loading libraries that wait to parse an include file until an object or function in it is actually used for the first time. However, that’s only the first step. If you’re going to be living on the cloud, cutting down on the number of times you access your libraries isn’t good enough; you need to cut down on the computational complexity of the libraries themselves. No more complex database queries to find a unique ID before you insert. No more custom hashing functions that take 300 cycles per character. No more rolling your own sorting functions. And certainly no more doing things in code that should be done in a database query.
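The lazy-loading idea can be sketched in a few lines of Python. This is an illustrative wrapper, not any particular library’s API; it defers the actual import until the first attribute access, so a module you never touch on a given request never costs you its parse time:

```python
import importlib

class LazyModule:
    """Defer importing a module until the first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing imported yet

    def __getattr__(self, attr):
        # Called only for attributes not found on the instance,
        # i.e. anything we want to forward to the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# Using the standard json module as the stand-in "heavy" dependency:
json = LazyModule("json")          # no import has happened yet
print(json.dumps({"lazy": True}))  # import happens here, on first use
```

Real dependencies are rarely as cheap as `json`, of course; the bigger the include, the more this pattern saves on requests that never touch it.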
Really good programmers are going to become even more valuable than they already are once management realizes it’s paying for CPU cycles, not just “a server.” When you can put a dollar figure on code efficiency, you’ll have that much more leverage with managers and in job interviews. I wouldn’t be surprised if, in the near future, interviewers started asking about the running cost of an algorithm as a stand-in for its efficiency. I also wouldn’t be surprised if database strategy changed in the face of per-cycle billing. We’ve all (hopefully) been aiming for third normal form in our databases, but JOINs eat up a lot of CPU cycles. You may see websites in the near future that run off large denormalized tables that are rebuilt every evening.
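Here’s a sketch of that nightly-denormalization idea, using an in-memory SQLite database so it’s self-contained. The table and column names are made up for illustration; the shape of the trade is what matters: pay for the JOIN once per night in a batch job, not once per page view.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Normalized source tables (third normal form).
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'brian');
    INSERT INTO posts VALUES (1, 1, 'Hello'), (2, 2, 'World');
""")

# Nightly batch job: rebuild one flat, denormalized read table.
db.executescript("""
    DROP TABLE IF EXISTS posts_flat;
    CREATE TABLE posts_flat AS
        SELECT p.id, p.title, u.name AS author
        FROM posts p JOIN users u ON u.id = p.user_id;
""")

# Page-view path: a single-table read, no JOIN spent per request.
rows = db.execute("SELECT title, author FROM posts_flat").fetchall()
print(sorted(rows))  # [('Hello', 'ada'), ('World', 'brian')]
```

The flat table is stale until the next rebuild, which is exactly the trade-off: freshness for cycles.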
As for all those application developers out there who don’t have a client machine to execute code for them: you’re just going to have to learn to write more efficiently, I guess. Sorry.