Spares at the Ready
Posted by Sam Fleitman in Executive Blog, Infrastructure
In Steve’s last post, he discussed the logic of outsourcing. His rationale included the cost of redundant internet connections, the server itself, a UPS, a small AC unit and so on. He covered many good reasons to get the server out of the broom closet and into a real datacenter. However, I would like to add one more often-overlooked component to that argument: the Spares Kit.
Let’s say that you do purchase your own server, set it up in the broom closet (or a real datacenter, for that matter), and provide the necessary power, cooling and internet connectivity. What about spare parts?
If you lose a hard drive on that server, do you have a spare available for replacement? Maybe so. Hard drives are common parts with mechanical components that are liable to fail, so you might have that covered. Beyond the spare drive, the server is probably configured with some level of RAID, so you’re likely well protected there.
What if the RAID card itself fails? It happens, and it happens with every brand of card.
What about RAM? Do you keep a spare DIMM handy? Or, if one stick starts showing errors, do you just plan to remove it and run with less RAM until a replacement arrives on site? The application might run slower because it’s memory-starved or because your memory is no longer interleaved, but that might be a risk you are willing to take.
How about a power supply? Maybe you keep a spare on the shelf, or maybe the server has dual power supplies. If so, are those power supplies plugged into separate power strips on separate circuits, backed up by separate UPSs?
What if the onboard NIC becomes flaky or fails completely? Do you keep a spare motherboard handy?
If you rely on out-of-band management of your server through an IPMI, Lights Out or DRAC card, what happens if that card goes bad while you’re on vacation?
Even if you have every necessary spare part for your server, or you run multiple servers in a load-balanced configuration inside the broom closet, what happens if you lose your switch, your load balancer, your router or your… What happens if that little AC unit you purchased shuts down on Friday night and the broom closet heats up all weekend until the server overheats? Do you have temperature sensors in the closet configured to send you an alert? Even if you do, the alert just means that you now have to drive back to the office to empty the spot cooler’s water pail.
You might think that some of these scenarios are a bit far-fetched, but I can assure you that they’re not. At SoftLayer, we have spares of everything. We maintain hundreds of servers in inventory at all times, we keep a fully stocked inventory room of critical components, and we staff it all 24/7 and back it all up with a 4-hour SLA.
Some people do have all of their bases covered. Others are willing to take a chance. But even if you convince your employer that it’s OK to take those chances, how do you think the boss will respond when something actually fails and critical services go offline?