Spindle Theory

March 9, 2009

Comments

March 12th, 2009 at 4:05pm

I'm surprised you didn't plug softlayer's storagelayer iscsi service. It's a good way to get a bunch of spindles/tps without having to install a bunch of disks in your local machine.

Not to mention private network + portable ips + iscsi + xen server with live migration == really sweet.

March 18th, 2009 at 5:43pm

Just a quick note - you've called it RAID 10, but described RAID 0+1, RAID 10 is N RAID1 mirrored pairs, with RAID0 striping across them.

There are a few advantages of RAID 10 over RAID 0+1 even though they are similar - namely, proper RAID10 handles disk failures with less I/O impact overall, and can potentially handle 2 disk failures (if the right two disks fail), as opposed to RAID 0+1 where two simultaneous disk failures will certainly destroy the array.

Leave a Reply

Filtered HTML

  • Web page addresses and e-mail addresses turn into links automatically.
  • You can enable syntax highlighting of source code with the following tags: <pre>, <blockcode>, <bash>, <c>, <cpp>, <drupal5>, <drupal6>, <java>, <javascript>, <php>, <python>, <ruby>. The supported tag styles are: <foo>, [foo].
  • Allowed HTML tags: <a> <em> <strong> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.

Spindle: the axis to which hard drive platters are attached.

I'm going to start this blog off with making a statement: 250 + 250 > 500. Some of you already know where this is going and you're sitting there thinking 'well, duh'. Other readers probably think I've just failed basic math. If you're in the first group, scan through and see if I bring up something you haven't thought of. If you're in the second group, this article could be useful to you.

Let us suppose you are running a medium popularity website. You have a forum, a database and you serve a bunch of video or image data. You knew that for content plus backups you'd need about 500GB. You have two hard drives, 2 x 250 GB. Content is on drive zero, and drive one is used for holding backups (you are routinely backing up, aren't you?). Being a good and noble system administrator you've kept an eye on your resource usage over time and lately you've noticed that your load average is going up.

Say you were to run a 'top' and get back something like this:

Top

First a couple of caveats:

1) For the astute in the class... yes, I've made up the numbers above.. I don't have a distressed machine handy but the made up numbers come from what I've seen in actual cases.

2) A load of 15 may not be bad for your machine and workload. The point here is that if your load is normally 5 and lately its been more like 15 then something is going on. It is all about knowing what is normal for your particular system.

So, what is top saying? Its saying that on average you've got 14 or 15 things going on and wanting to run. You'll notice from the swap line that the machine isn't particularly hitting swap space so you're probably not having a RAM issue. Lets look closer at the CPU line.

Cpu(s): 10.3% us, 5.7% sy, 0.0% ni, 15.0% id, 80.3% wa, 0.0% hi, 0.0% si

10% user time, 5% system time.. doesn't seem so bad. Even have 15% idle time. But wait, what do we have here.. 80% wa? Who is wa and why is he hogging all the time? wa% is the percentage of time your system is spending waiting on an IO request to finish. Frequently this is time spent waiting on your hard drive to deliver data. If the processor can't work on something right now (say because it needs some data) that thing goes on the run stack. Well what you can end up with is that you have a bunch of processes on the run stack because the CPU is waiting on the hard drive to cough up some data for each one. The hard drive is doing its best but sometimes there is just too much going on.

So how do we dig down further? So glad you asked. Our next friend is going to be 'iostat'. The iostat command will show you which hard drive partitions are spending most of the time processing requests. Normally when I run it I'm not concerned with the 'right then' results or even the raw numbers... rather I'm looking for the on-going action trend so I'll run it as 'iostat 5' which tells it to re-poll every 5 seconds.

isStat

(again I've had to fudge the numbers)

So if you were running 'iostat 5' and over time you were seeing the above formation what this tells you is that disk0 (or hda, in the chart) is doing all the work. This makes sense.. previously we discussed that your setup is disk0 - content, disk1 - backups. What we know so far is that disk0 is spinning its little heart out servicing requests while disk1 is sitting back at the beach just watching and giggling. This is hardly fair.

How do you right this injustice? Well, that depends on your workloads and whether you want to (or can) go local or remote. The general idea is going to be to re-arrange the way you do your storage to spread the workload around to multiple spindles and you could see a nice gain in performance (make sure a thing and its backup is NOT on the same disk though). The exact best ideas for your situation is going to be a case-by-case thing. If you have multiple busy websites you might just start with putting the busiest one on disk1, then see how it goes. If you have only one really active site though then you have to look at how that workload is structured. Are there multiple parts to that site which could be spread between disks? The whole point of all this is that you want to eliminate bottlenecks on performance. A single hard drive can be just such a bottleneck.

If the workload is largely writes (most of the time files being created/uploaded) then you're looking at local solutions such as making better use of existing drives in the system or adding new drives to the system. Depending on what you are doing it might be possible to jump up to faster drives as well, say 15,000 RPM drives. I should include mention here of RAID0. RAID0 is striping without parity. What you have is multiple physical drives presented to the operating system as one single unit. What happens is that as a file is written, parts of it end up on different drives so you have multiple spindles going for each request. This can be wickedly fast. HOWEVER...it is also dangerous because if you lose one drive you potentially have lost the entire volume. Make no mistake, hard drives will fail and they'll find the most irritating time to do it. If you think you would want to use a RAID0 and you cannot afford the downtime when a drive fails taking the whole volume with it then you might look at a RAID10. Ten is a RAID0 that is mirrored against another RAID0. This provides some fault tolerance against a failed drive.

If the workload is mostly just reading, say you host images or videos, then you could do multiple drives in the system or the "Big Dog" of adding spindles for read-only workloads is something like our CDNLayer product where a global network of caches stores the image files and people get them from their nearby cache. This takes quite a bit of the I/O load off your system and also (subject to network routing goofiness) your visitors could be getting your content from a box that is hundreds or thousands of miles closer to them.

So, with apologies to my math teacher friend, there are times when 250 + 250 > 500 can be true. Until next time, have a good day and respect your spindles!

Categories: 
Keywords:
Categories:

Comments

March 12th, 2009 at 4:05pm

I'm surprised you didn't plug softlayer's storagelayer iscsi service. It's a good way to get a bunch of spindles/tps without having to install a bunch of disks in your local machine.

Not to mention private network + portable ips + iscsi + xen server with live migration == really sweet.

March 18th, 2009 at 5:43pm

Just a quick note - you've called it RAID 10, but described RAID 0+1, RAID 10 is N RAID1 mirrored pairs, with RAID0 striping across them.

There are a few advantages of RAID 10 over RAID 0+1 even though they are similar - namely, proper RAID10 handles disk failures with less I/O impact overall, and can potentially handle 2 disk failures (if the right two disks fail), as opposed to RAID 0+1 where two simultaneous disk failures will certainly destroy the array.

Leave a Reply

Filtered HTML

  • Web page addresses and e-mail addresses turn into links automatically.
  • You can enable syntax highlighting of source code with the following tags: <pre>, <blockcode>, <bash>, <c>, <cpp>, <drupal5>, <drupal6>, <java>, <javascript>, <php>, <python>, <ruby>. The supported tag styles are: <foo>, [foo].
  • Allowed HTML tags: <a> <em> <strong> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.