Author Archive: William Francis

April 13, 2008

You Don’t Know What You Don’t Know

Around the office I am commonly considered a "low-level" software engineer. If you are in the business of computer programming you know this means I generally have various pieces of computer hardware strewn about my work area, and an ASCII chart hanging on my wall complete with a cheat-table so I can quickly convert numbers between binary, decimal, and hex. If you are not in the business of developing software, think of me as a guy who couldn’t decide if he wanted to be an electrical engineer or a computer programmer and thus, through his own indecision, eventually found himself stuck somewhere in between. I know a bit about both but am not an expert in either. (I think the Roman word for this sort of limbo is purgatory, but I find it pretty cozy most days.)

At any rate when a project comes along that walks the fence between the realms of hardware and software my name naturally comes up. Such was the case a few weeks ago when one of our systems administrators had the need to retrieve the serial number from the RAM chips already installed in a number of servers. He asked me if it could be done. I looked and saw the information was reported in the BIOS of one of my machines, so I promptly responded with a “you bet”. After all, if the BIOS can display the information on the screen I should be able to as well. Right? I told him it would take a week.

The problem in this career field, where I have worked for some ten years now, is that you don’t know what you don’t know. Fast forward two weeks. Now think the Friday before Easter. That’s right, the one I am supposed to be off lounging around the house in my pajamas. It took a little longer to pull that serial number than I expected. If you’re interested, the slowdown turned out to be that the information existed at a physical memory address that was not easily accessible from Microsoft Windows (luckily for the BIOS, it gets to display the data before an operating system is loaded).
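For the curious, here is a rough sketch of what the parsing side of such a job can look like once the raw bytes are in hand. It assumes the firmware data in question is an SMBIOS structure table, which is my assumption for illustration and not something stated above, and it assumes the memory-device record (type 17) keeps the serial number's string index at offset 0x18, as the SMBIOS layout describes. The function names are hypothetical, and getting the raw table out of the operating system is the hard, platform-specific part that this sketch skips entirely.

```python
import struct

MEMORY_DEVICE = 17          # assumed: SMBIOS structure type for a memory device
END_OF_TABLE = 127          # SMBIOS end-of-table marker type
SERIAL_INDEX_OFFSET = 0x18  # assumed: byte holding the serial's string number


def iter_structures(table):
    """Yield (type, formatted_area, strings) for each structure in a raw table."""
    pos = 0
    while pos + 4 <= len(table):
        stype, length, _handle = struct.unpack_from("<BBH", table, pos)
        if length < 4:
            break  # malformed header; stop rather than loop forever
        formatted = table[pos:pos + length]
        # The string set follows the formatted area and ends with a double NUL.
        end = table.find(b"\x00\x00", pos + length)
        if end == -1:
            break
        raw = table[pos + length:end]
        strings = [s.decode("ascii", "replace") for s in raw.split(b"\x00")] if raw else []
        yield stype, formatted, strings
        pos = end + 2
        if stype == END_OF_TABLE:
            break


def memory_serials(table):
    """Collect the serial-number string of every memory-device structure."""
    serials = []
    for stype, formatted, strings in iter_structures(table):
        if stype != MEMORY_DEVICE or len(formatted) <= SERIAL_INDEX_OFFSET:
            continue
        index = formatted[SERIAL_INDEX_OFFSET]  # 1-based; 0 means "no string"
        if 1 <= index <= len(strings):
            serials.append(strings[index - 1].strip())
    return serials
```

Feeding `memory_serials` a byte string holding the raw table returns one serial per populated memory-device record. Records that are too short or carry no serial string are skipped rather than guessed at.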

Remember the old Chevy Chase movie "Funny Farm"? Chevy’s character is driving around lost when he passes the old man sitting on his porch in a rocking chair. Chevy stops his vehicle, rolls down his window, and says: “Excuse me Sir. Can you tell me how you would get to Redbud?” The old man leans forward, spits, and replies: “If I were going to Redbud I sure as hell wouldn’t start from here.”

Like Mr. Chase’s character in the movie, I didn’t get to pick where I started the journey from. We need the data available to us after the operating system boots. So I am hacking my way through it. I’m nearly there now. Close enough at least that I felt comfortable taking a break from the code and blowing off some steam by writing this blog. And the truth is, while I might have been whining just a bit I actually have enjoyed this project immensely. I appreciate the fact that the management here at SoftLayer gives us the opportunity to challenge ourselves and then grow to meet those challenges. We are encouraged to “get our hands dirty”. When I finish up here I will have a deeper understanding of how the BIOS relates to the operating system (and through the BIOS indirectly to the hardware).

As for our customers, well, it just so happens once I got to digging around in the binary mud there was a whole lot of other useful insight buried in the swirls of all those zeros and ones. Instead of extracting just the serial numbers I am pulling about a dozen pages of hardware data points we can use in statistical analysis for predicting failures, standards compliance, and availability trends. Like I said, you don’t know what you don’t know. But sometimes you are pleasantly surprised once you find out. By promoting such an amiable work environment, fostering creativity, and encouraging innovation, SoftLayer continues to boldly go where no other hosting company has gone before.

Alright, time to climb down from the pulpit and finish up my software.

Thanks for listening!

-William

January 30, 2008

That's SMART

My grandmother used to say an ounce of prevention is worth a pound of cure. Usually this was her polite way of telling me to pick my skateboard up off the stairs before she stepped on it and broke her neck or to put a sheet of newspaper over her antique kitchen table before I began refueling my model airplane. All very sound advice looking back. And now here I find myself repeating the same adage some twenty years later in the context of predicting mechanical drive failure. An ounce of prevention is worth a pound of cure.

Hard disk drive manufacturers recognized both the reality and the advantages of being able to predict normal hard disk failures associated with drive degradation back in the mid-1990s. This led a number of leading hard disk makers to collaborate on a standard which eventually became known as SMART. This acronym stands for Self-Monitoring, Analysis and Reporting Technology and, when used properly, is a formidable weapon in any system administrator's arsenal.

The basic concept is that firmware on the hard disk itself will record and report key "attributes" of that drive which when monitored and analyzed over time can be used to predict and avoid catastrophic hard disk failures. Anyone who has been around computers for more than a day knows the terrible feeling that manifests in the pit of your stomach when it becomes apparent that your server or workstation will not boot because the hard disk has cratered. Luckily, we ALL of course back up our hard drives daily! Right?

All kidding aside, even with a recent backup, just the task of restoring and getting your system back in working order is a serious hassle, and it’s not something you get the luxury of scheduling if the machine is critical to operations and failed in the middle of your work day or, worse yet, the middle of your beauty sleep. That is where SMART comes in. When properly used, SMART data can give “clues” that a drive is reaching a failure point--prior to it failing. This in turn means you can schedule a drive cloning and replacement within your next regular maintenance window. Really, aside from a hard disk that lasts forever, what more could an administrator ask for?

SMART drive data has been described as a jigsaw puzzle. That's because it takes monitoring a myriad of data points consistently over time to be able to put together a picture of your hard disk health. The idea is that an administrator regularly records and analyzes characteristics about the installed spinning media and looks for early warning signs that something is going wrong. While different drives have different data points, some of the key and most common attributes are:

  • head flying height
  • data throughput performance
  • spin-up time
  • re-allocated sector count
  • seek error rate
  • seek time performance
  • spin retry count
  • drive calibration retry count

These items are considered typical drive health indicators and should be base-lined at drive installation and then monitored for significant degradation. While the experts still disagree on the exact value of SMART data analysis, I have seen sources that claim at least 30% of drive failures can be detected some 60 days prior to the actual failure through the monitoring of SMART data.
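To make the baseline-and-monitor idea concrete, here is a minimal sketch in Python. It is not the software I wrote; the `Reading` type, the `flag_attributes` function, and the 10-point drop limit are all hypothetical choices for illustration. It assumes each attribute is reported as a normalized value where higher means healthier, alongside a vendor threshold at or below which the attribute counts as failing.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    value: int      # normalized value reported by the drive (assumed: higher is healthier)
    threshold: int  # vendor threshold; at or below it the attribute counts as failing


def flag_attributes(baseline, current, max_drop=10):
    """Return names of attributes that look worrying compared to the baseline.

    baseline and current map attribute names to Reading objects. An attribute
    is flagged if it has reached its threshold or has fallen more than
    max_drop points since the baseline was recorded.
    """
    flagged = []
    for name, now in current.items():
        if now.value <= now.threshold:
            flagged.append(name)
            continue
        then = baseline.get(name)
        if then is not None and then.value - now.value > max_drop:
            flagged.append(name)
    return flagged


# Made-up numbers: one snapshot at installation, one taken later.
at_install = {
    "spin-up time": Reading(100, 21),
    "re-allocated sector count": Reading(100, 36),
    "seek error rate": Reading(80, 30),
}
today = {
    "spin-up time": Reading(98, 21),
    "re-allocated sector count": Reading(85, 36),
    "seek error rate": Reading(30, 30),
}

print(flag_attributes(at_install, today))
```

With these invented readings the re-allocated sector count is flagged for its 15-point drop and the seek error rate for reaching its threshold, while the spin-up time passes. A real monitor would keep a history of snapshots rather than a single baseline, but the comparison at its heart is about this simple.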

Of course not all drive failures can be predicted. Plus some failures are caused by factors other than drive degradation. Consider drives damaged by power surges or drives that are dropped in shipping as good examples of drive failures that cannot normally be detected through SMART monitoring. However in my humble opinion even one hard disk failure prevented over the course of my career is something to celebrate--unless you happen to own stock in McNeil Consumer Healthcare, a.k.a. the distributors of Tylenol!

So what does this have to do with SoftLayer? Well I am certainly not claiming that SoftLayer is going to predict all your hard drive disasters so there is no reason for you to back up your data. In fact, I recommend not just backing it up but backing it up in geographically disparate locations (did I mention we have data centers in Dallas and Seattle?). What I do mean to share is that technologies like SMART data are just one of the many ways SoftLayer is currently investigating to improve what is already the best hosting company in the business.

I should know. I was tasked with writing the low-level software to extract this data. That’s right. SoftLayer has engineers working at the application layer, down at the device driver layer, and everywhere in between. If that doesn’t give you a warm fuzzy about your hosting company, I don’t know what will.

-William

January 14, 2008

Growth is a Good Thing. No Really.

The high-pitched whine of a drill sends a shiver down my spine. I jump a little in my seat at a loud bang followed by shuffling feet and mumbled voices. I involuntarily cower at the unmistakable sound of a saw blade spinning—gaining momentum—biting. Nope, I'm not sitting in a theater watching Eli Roth's next installment in the Hostel franchise. In fact, I'm at the office.

That's right. I'm sitting at my desk. Sitting at my desk and trying hard to ignore the plethora of singing power tools and crooning contractors who for the last two months have been busy putting up drywall, wiring electrical outlets, installing locks, and occasionally setting off the fire alarm. It's the sound of growth. And at the risk of conjuring up images of bad '80s haircuts, guys in way-too-tight jeans, and shirts where the collars just wouldn't seem to stay down--one might dare refer to the ruckus as "growing pains".

Make no mistake about it, growing is painful. Take it from me. I think I was 19 before I managed to grow enough facial hair to require the use of a razor. Combine that tidbit of info with the fact that I had every 8-bit computer known to man proudly on display in my room right next to my impressive collection of latex Hollywood style monster masks and you'll start to get the picture. Growing requires a lot of work and allows almost no planning as humans have a habit of blossoming in their own sweet time. Companies are no different.

So while management did everything possible to make the required building expansion as unobtrusive as possible, well, it's still construction work within earshot of a whole team of developers, technicians, and engineers. That's just the way it is. And while I may complain about the noise and distractions now and again, there is also something very comforting about knowing that I am working at a place that is growing. Growing phenomenally, in a time when not all technology companies are faring so well.

When the dust settles there will be a lot of new space.

More space means a lot of new hires. More space means more opportunity for existing employees. And yes, more space means more work for everyone involved. Having worked for three failed ventures in as many years, I can tell you I am more than happy to be putting my time and effort and energies into something that is successful; something that continues to be more successful every day. It feels good to be on the winning team for a change. Hearing what some of the other engineers here are saying I don't think I'm alone in that sentiment.

That's not to say I'll miss the noise when the construction is all said and done, which, in case you are interested, sounds to be winding down. As for SoftLayer, well, something tells me we are just getting started.

-William
