Code Performance Matters Again

March 18, 2009

With the advent of cloud computing, processing power is coming under the microscope more and more. Last year, you could just buy a 16-core system and be done with it, for the most part. If your code was a little inefficient, the load would run a little high, but there really wasn't a problem. For most developers, it's not like you're writing Digg and need to make sure you can handle a million page requests a day. So what if your site is a little inefficient, right?

Well, think again. Now you're putting your site on "the cloud" that you've heard so much about. On the cloud, each processor cycle costs money. Google AppEngine charges by the CPU core hour, as does Mosso. The more wasted cycles in your code, the more it costs to run each operation. If your code uses a custom sorting function and you went with bubble sort because "it was only 50 milliseconds slower than merge sort and I can't be bothered to write merge sort by hand," then be prepared for the added cost over a month's worth of page requests. Each extraneous second of CPU time per request, at 50,000 page views per day, adds up to 417 HOURS of CPU time per month.
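To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python; the $0.10-per-CPU-hour rate is a hypothetical placeholder, not an actual price from AppEngine, Mosso, or anyone else:

    # Back-of-the-envelope cost of one extra second of CPU time per request.
    # The hourly rate is an assumed placeholder, not a real provider price.
    page_views_per_day = 50000
    days_per_month = 30
    wasted_seconds_per_request = 1.0      # the "extraneous" second from the example
    assumed_rate_per_cpu_hour = 0.10      # hypothetical dollars per CPU-hour

    wasted_hours_per_month = (page_views_per_day * days_per_month *
                              wasted_seconds_per_request) / 3600.0
    monthly_cost = wasted_hours_per_month * assumed_rate_per_cpu_hour

    print("Wasted CPU hours per month: %.0f" % wasted_hours_per_month)  # ~417
    print("Added cost at the assumed rate: $%.2f" % monthly_cost)       # ~$41.67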

Big-O notation hasn't really been important for the majority of programmers for the last 10 to 15 years or so. Loop unrolling, extra checks, junk variables floating around in your code: all of that stuff would just average out to "good enough" speeds once the final product was in place. Unless you're working on the Quake engine, any change that would shave off less than 200ms probably isn't worth the time it would take to re-engineer the code. Now, though, you have to think a lot harder about the cost of your inefficient code.

Developers who have been used to having a near-infinite supply of open CPU cycles need to rethink their approach to programming large or complex systems. You've been paying for public bandwidth for a long time, and it's time to think about CPU in the same manner. You have a limited amount of "total CPU" that you can use per month before AppEngine's quota limits kick in and you begin getting charged for it. If you're using a different host, your bill simply goes up. You need to treat this the same way you treat bandwidth: minimize your use of the CPU just like you'd minimize access to the public internet, and keep your memory profile low.

The problem with this approach is that the entire programming profession has been moving away from worrying about individual CPU cycles. Helper classes, template libraries, enormous include files full of rarely-used functions: they all contribute to the CPU and memory glut of the modern application. We, as an industry, are going to need to cut back on that. You can see some strides in this direction with dynamic include mechanisms and libraries that wait to parse an include file until the program actually uses that object or function for the first time. However, that's only the first step. If you're going to be living on the cloud, cutting down on the number of times you access your libraries isn't good enough. You need to cut down on the computational complexity of the libraries themselves. No more complex database queries to find a unique ID before you insert. No more custom hashing functions that take 300 cycles per character. No more rolling your own sorting functions. And certainly no more doing things in code that should be done in a database query.
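As a small illustration of the "stop rolling your own" point, here is a hedged sketch that times a hand-written bubble sort against Python's built-in sorted() on the same data; the exact numbers will vary by machine, but the gap is usually dramatic:

    # A hand-rolled bubble sort timed against Python's built-in sort on the
    # same data. The built-in sort is implemented in C and is the cheap option.
    import random
    import timeit

    def bubble_sort(items):
        items = list(items)
        for i in range(len(items)):
            for j in range(len(items) - i - 1):
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
        return items

    data = [random.random() for _ in range(2000)]

    hand_rolled = timeit.timeit(lambda: bubble_sort(data), number=5)
    built_in = timeit.timeit(lambda: sorted(data), number=5)

    print("bubble sort: %.3fs   built-in sorted(): %.3fs" % (hand_rolled, built_in))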

Really good programmers are going to become even more valuable than they already are once management realizes that they're paying for CPU cycles, not just "a server." When you can put a dollar figure on code efficiency, you'll have that much more leverage with managers and in job interviews. I wouldn't be surprised if, in the near future, interviewers started asking about the dollar cost of an algorithm as a proxy for its efficiency. I also wouldn't be surprised if database strategy changed in the face of per-cycle billing. We've all (hopefully) been aiming for third normal form in our databases, but JOINs eat a lot of CPU cycles. You may see websites in the near future that run off large denormalized tables that are rebuilt every evening.
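As a purely hypothetical sketch of that nightly-rebuild idea (table and column names are invented, and SQLite stands in for whatever database you actually run), an expensive JOIN gets materialized once per evening into a flat table that daytime page views can read directly:

    # Hypothetical nightly denormalization job: materialize an expensive JOIN
    # into one flat table so daytime reads hit a single table instead of
    # paying for the JOIN on every request. Schema names are invented.
    import sqlite3

    def rebuild_report_table(conn):
        cur = conn.cursor()
        cur.execute("DROP TABLE IF EXISTS report_orders_flat")
        cur.execute("""
            CREATE TABLE report_orders_flat AS
            SELECT o.id AS order_id, o.total, c.name AS customer_name
            FROM orders o
            JOIN customers c ON c.id = o.customer_id
        """)
        conn.commit()

    if __name__ == "__main__":
        # In-memory sample data so the sketch runs on its own; a real job would
        # point at the production database and be fired from cron every evening.
        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
            INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
            INSERT INTO orders VALUES (10, 1, 19.99), (11, 2, 5.00);
        """)
        rebuild_report_table(conn)
        print(conn.execute("SELECT * FROM report_orders_flat").fetchall())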

So take advantage of the cloud for your computing needs, but remember that it's an entirely different beast. Code efficiency matters more in these new times. Luckily, "web 2.0" has given us one good tool for cutting CPU time. AJAX, combined with client-side JavaScript, lets a web developer build a tool where the server does little more than fetch the proper data and return it. Searching, sorting, and paging can all be done on the client side in a well-designed application. By moving a lot of the "busy work" to the client, you can save a lot of CPU cycles on the server.
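Here is a minimal sketch of that "server just fetches and returns the data" idea as a bare WSGI app; the dataset and port are invented for illustration, and in a real application the rows would come from a database while the browser's JavaScript handled searching, sorting, and paging:

    # Minimal WSGI app that only returns data as JSON; searching, sorting,
    # and paging are left to client-side JavaScript. The data is hard-coded
    # here purely for the sketch.
    import json
    from wsgiref.simple_server import make_server

    WIDGETS = [
        {"id": 1, "name": "sprocket", "price": 3.50},
        {"id": 2, "name": "flange", "price": 7.25},
    ]

    def app(environ, start_response):
        body = json.dumps(WIDGETS).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        # The browser requests the JSON and sorts/filters/pages it itself,
        # so the server's metered cycles go only to the fetch and the response.
        make_server("localhost", 8000, app).serve_forever()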

For all those application developers out there who don't have a client to execute code for them: you're just going to have to learn to write more efficient code, I guess. Sorry.

-Daniel

Comments

March 21st, 2009 at 12:05pm

Maybe some insight from a developer here.
I'm not expecting clouds to really change a lot, since you should generally focus on writing good code that is easy to use and extend (programmer time is still very expensive).

That said, and as always, after you are "done" there should be a phase where you profile your application and then focus your energy on the 5% of your code that ends up taking most of the time.
In your example the bubble sort would stand out as an issue and would then be replaced with a better version, but I've also seen a lot of completely unrelated bottlenecks that we never suspected while looking at the code.
So instead of trying to pre-optimize, spend more time on writing it properly, with proper abstractions, so that it becomes easy to fix your code later.

I'm somewhat with you on denormalized databases for heavy web-based applications where performance is more important than storage efficiency or clean design, and I expect "Drizzle" to make a good showing in the web world once it's finally done.

PS: Joins don't have to be expensive if you properly reduce the result set in your query and JOIN on indexed columns.
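(For anyone curious, the profiling pass described above might look something like this with Python's standard-library cProfile; handle_request() here is just a made-up stand-in for a real entry point.)

    # Profile a request handler with the standard-library profiler and print
    # the functions that eat the most cumulative time. handle_request() is a
    # placeholder standing in for the application's real entry point.
    import cProfile
    import pstats

    def handle_request():
        return sorted(str(i) for i in range(50000))  # placeholder workload

    profiler = cProfile.Profile()
    profiler.enable()
    handle_request()
    profiler.disable()

    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)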

March 24th, 2009 at 3:30pm

What goes around comes around... back in the late '80s, when I was writing COBOL to bring home the bacon, the IBM 3090 mainframe to which we were all connected was billed to us based on CPU time. Naturally, my manager insisted on CPU-time-friendly code!

March 26th, 2009 at 1:30pm

I don't agree at all. Let's go back a couple of years, to when cloud more or less started. Amazon was one of the first to evolve this. They charged per use, so little Joe could just test the engine he wanted without having to pay a fortune. But soon all the cloud providers realized that little Joe alone wasn't enough. People who need heavy computing had dedicated servers, like here on SL, so the providers weren't attracting that market, because it costs a fortune to use a full server on the cloud; the business model was built on small slices and pay-per-use, not on one client using a full server for 30 days. So they needed to attract those customers as well, where the money is, so they started to offer plans. They went from pay-per-MB and per-GB pricing to paying a fixed price for a specific allotment. Now they are offering big slices that are just individual servers you pay a fixed monthly cost for. That just brings us back to dedicated servers.

People don't want a surprise on their bill at the end of the month. They want fixed plans and limits. That's why cloud will eventually lead back to just leasing a dedicated server where you can use the hardware at its full potential. Don't get me wrong, cloud is a business model that will work, but for testers, or for people who need to deploy services only for a fixed amount of time, let's say a few days. It will never work for businesses with heavy month-in, month-out computing needs. If it did, nobody would be renting servers anymore, and most do; in fact more and more people are buying and leasing servers, jumping from shared models like shared hosting to dedicated models like leased servers and VPSes. So cloud is shared; if you need a dedicated cloud it's not cloud anymore, it's just your own servers!

And there is a big problem with cloud: the cheaper cloud computing gets, the more powerful the servers you will be able to buy here at SL. Intel and the hardware companies don't want to have only three big clients. They want the world. If cloud is the future then nobody will be buying computers anymore and we will just have a console to access the cloud. I don't think so. People need and want control. It's human nature. There will probably be laptops and small handheld devices with powerful computing in your hand, and you will still log into external services. Cloud will work for specific services, the way electricity works today, but people will still want and need full dedicated computing in their hands: on their PDAs, cell phones, laptops, cars, etc. It will be a mix.

Google is one of the biggest promoters, since they think they will host the world this way. That would mean Softlayer and most businesses would be out. Do you think so? I don't. Hardware will always get cheaper, and people will always get better and more powerful things each day. That means if cloud costs $50 a month for the equivalent of a small Celeron with 512 MB of RAM, you can probably get a full server with 2 GB of RAM for that same $50. Cloud will always be more costly than just connecting a single box, because it requires more layers of complexity. Simple things will always be cheaper. So cloud will focus on the pay-per-use model only. I'm sure most people don't even know how much computing they use or need. They would not jump into a model that can drain them down to the last penny.

As for this article: programmers will just say "don't use small clouds for this," just like Windows Vista doesn't work with 64 MB of RAM.

March 29th, 2009 at 5:42pm

Gary,

You would be surprised. iSeries still had this system when I last worked on them, back in 2003.
Your iSeries was pre-programmed for an amount of "interactive" CPU time and an amount of "batch" CPU time.
A lot of green-screen applications are interactive, so the machines got more expensive as you bought them with more interactive time, even though the hardware was exactly the same.

Some companies actually make software that allows running interactive applications both ways (once as normal and once as a batch application with its own telnet 5250 frontend).

Horribly off topic, I know.

In general you need to code with common sense and good software design, and then you can start trimming the fat as needed, based on real usage data. That does mean you need to performance-test the application in a realistic environment and not just throw it into production and see what happens... hmmm... if only..
