Posts Tagged 'Backup'

February 8, 2013

Data Center Power-Up: Installing a 2-Megawatt Generator

When I was a kid, my living room often served as a "job site" where I managed a fleet of construction vehicles. Scaled-down versions of cranes, dump trucks, bulldozers and tractor-trailers littered the floor, and I oversaw the construction (and subsequent destruction) of some pretty monumental projects. Fast-forward a few years (or decades), and not much has changed except that the "heavy machinery" has gotten a lot heavier, and I'm a lot less inclined to "destruct." As SoftLayer's vice president of facilities, part of my job is to coordinate the early logistics of our data center expansions, and as it turns out, that responsibility often involves overseeing some of the big rigs that my parents tripped over in my youth.

The video below documents the installation of a new Cummins two-megawatt diesel generator for a pod in our DAL05 data center. You see the crane prepare for the work by installing counter-balance weights, and work starts with the team placing a utility transformer on its pad outside our generator yard. A truck pulls up with the generator base in tow, and you watch the base get positioned and lowered into place. The base looks so large because it also serves as the generator's 4,000 gallon "belly" fuel tank. After the base is installed, the generator is trucked in, and it is delicately picked up, moved, lined up and lowered onto its base. The last step you see is the generator housing being installed over the generator to protect it from the elements. At this point, the actual "installation" is far from over — we need to hook everything up and test it — but those steps don't involve the nostalgia-inducing heavy machinery you probably came to this post to see:

When we talk about the "megawatt" capacity of a generator, we're talking about the bandwidth of power available for use when the generator is operating at full capacity. One megawatt is one million watts, so a two-megawatt generator could power 20,000 100-watt light bulbs at the same time. This power can be sustained for as long as the generator has fuel, and we have service level agreements to keep us at the front of the line to get more fuel when we need it. Here are a few other interesting use-cases that could be powered by a two-megawatt generator:

  • 1,000 Average Homes During Mild Weather
  • 400 Homes During Extreme Weather
  • 20 Fast Food Restaurants
  • 3 Large Retail Stores
  • 2.5 Grocery Stores
  • A SoftLayer Data Center Pod Full of Servers (Most Important Example!)
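
The comparisons above come down to simple division. Here is a minimal Python sketch of that arithmetic; the function name is made up for illustration, and the per-unit figures are just the two-megawatt capacity divided by the counts in the list, not measured loads.

```python
GENERATOR_WATTS = 2_000_000  # two megawatts, as stated in the post


def watts_per_unit(count):
    """Average power available to each unit if `count` units share the generator."""
    return GENERATOR_WATTS / count


# The light-bulb figure from the post: 2 MW divided by 100 W per bulb.
bulbs = GENERATOR_WATTS // 100

# Counts taken from the list above; the labels are shorthand.
for label, count in [
    ("home, mild weather", 1000),
    ("home, extreme weather", 400),
    ("fast food restaurant", 20),
    ("large retail store", 3),
    ("grocery store", 2.5),
]:
    print(f"{label}: about {watts_per_unit(count) / 1000:.0f} kW each")
```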

Every SoftLayer facility has an n+1 power architecture. If we need three generators to provide power for three data center pods in one location, we'll install four. This additional capacity allows us to balance the load on generators when they're in use, and we can take individual generators offline for maintenance without jeopardizing our ability to support the power load for all of the facility's data center pods.
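
As a rough illustration of the n+1 rule, the Python sketch below counts the generators needed to carry a load and adds one spare. The function names and the 2 MW per pod load are assumptions for the example, not SoftLayer's actual sizing method.

```python
import math


def generators_to_install(load_mw, generator_mw, spares=1):
    """Generators needed to carry the load, plus `spares` extra (n+1 when spares=1)."""
    needed = math.ceil(load_mw / generator_mw)
    return needed + spares


def survives_one_offline(load_mw, generator_mw, installed):
    """True if the remaining generators can carry the load with one unit offline."""
    return (installed - 1) * generator_mw >= load_mw


# Three pods at an assumed 2 MW each, served by 2 MW generators.
installed = generators_to_install(load_mw=6.0, generator_mw=2.0)
print(installed, survives_one_offline(6.0, 2.0, installed))
```

With the spare in place, any single generator can be taken offline for maintenance and the remaining units still cover the full load.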

Those of you who fondly remember Tonka trucks and CAT crane toys are the true target audience for this post, but even if you weren't big into construction toys when you were growing up, you'll probably still appreciate the work we put into safeguarding our facilities from a power perspective. You don't often see the "outside the data center" work that goes into putting a new SoftLayer data center pod online, so I thought I'd give you a glimpse. Are there any topics from an operations or facilities perspective that you also want to see?

-Robert

January 13, 2010

Always Have a Backup Plan...

Everyone always says it’s a good idea to have a backup plan just in case your primary plan bites the dust. I couldn’t agree more. Recently my personal Xbox 360 failed and this has caused plenty of grief in my household. I used my Xbox to stream content from Windows Media Player on my desktop to the TV (via Media Center edition of Windows XP). This has worked great and has been able to provide me with a means to entertain my child. Of course, this going out has caused a screaming baby because now she can’t watch her “movies”.

Now, had I had a proper backup plan, this wouldn’t be an issue. See, I put all of my trust into a single device and/or single method to accomplish something. When this device failed, my operation came to a halt. I didn’t listen to the advice I’m always telling our customers… have a backup or backup plan. This is where our “extra services” come into play. Not only do we offer backup solutions (eVault, NAS…) but we also offer solutions that give you access to high-availability configurations (Citrix XenServer, for example). With XenServer you can configure a cluster of systems and set up automatic failover. This would prevent any major outages of your website/services. If this isn’t something you think would work for you, utilizing eVault backups might. We now offer eVault Bare Metal Restore. Now, the problem is somehow applying these to my Xbox so my kiddo can go back to watching her movies... Long story short, don’t rely on a single solution. Always have a backup plan or system in place to prevent headaches in the future. You won’t regret it if you do.

December 21, 2009

Why Redundancy is Important!

The other day everything was going so well – I woke up in a fantastic mood, ate a great lunch, accomplished all of my work for the day, and left on time! I was thrilled that my day went so well, and then a catastrophic failure occurred. I walked out to my car, grabbed my keychain from my pocket and found that my car key was gone. My only car key! This scenario can happen not just with car keys, but with your server and data as well. Redundancy can give you peace of mind and save you from an expensive mistake like the one I made with my car. The worst part is that I could have easily prevented this scenario by just having a backup key. Some good practices to provide redundancy are listed below:

1. Redundant backups – If you have a backup on a local disk, it is good to also have an offsite backup or a storage solution backup, so that you have a redundant way to recover in the event something catastrophic occurs.

2. Redundant DNS – You can run your own DNS server and use our secondary DNS service, or even set up another DNS server for failover.

3. RAID arrays – Having a RAID array such as RAID 1, RAID 5 or RAID 10 gives you an added level of protection for your drive’s data (this is no substitute for backups, just added protection).

4. Failover – An example of this is a production server: if the server fails, there is another server set up and ready to take its place. This can be done manually or with our services such as our hardware load balancer or NetScaler solutions.

5. Contact information – In the event you are unable to be reached, it is a good idea to have someone else available to make contact with us to address support issues, etc.

By following some of these practices you can avoid encountering issues that are generally avoidable and that would cost you a lot of downtime and headaches. I know I have learned from my key mistake!

December 7, 2009

Availability with NetScaler VPX and Global Load Balancing

The concept of a Single Point of Failure (SPoF) refers to the fact that somewhere between your clients and your servers there is a single point that, if it fails, causes downtime. The SPoF can be the server, the network, or the power grid. The dragon Single Point of Failure is always going to be there stalking you; the idea is to push the SPoF far enough out that you have done the best you can with your ability and budget.

At the server level you could combat SPoF by using redundant power supplies and disks. You can also have redundant servers fronted by a load balancer. One of the benefits of load balancer technology is that the traffic for an application is spread between multiple app servers. You have the ability to take an app server out of rotation for upgrades and maintenance. When you’re done, you bring the server back online; the load balancer notices it is UP on the next check and the server is back in service.
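
To make the "out of rotation, back in rotation" idea concrete, here is a toy round-robin balancer in Python. It is only a sketch of the general technique: the class, the server names and the `is_up` callback are invented for the example, and a real load balancer does its health checks over the network.

```python
class LoadBalancer:
    """Toy round-robin balancer that skips servers failing their health check."""

    def __init__(self, servers, is_up):
        self.servers = list(servers)
        self.is_up = is_up  # callable: server name -> bool (hypothetical check)
        self.in_rotation = set(self.servers)
        self._next = 0

    def run_health_checks(self):
        """Re-evaluate every server; recovered servers rejoin automatically."""
        self.in_rotation = {s for s in self.servers if self.is_up(s)}

    def pick(self):
        """Return the next in-rotation server, or None if all are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.in_rotation:
                return server
        return None
```

Taking a server down for maintenance just means its check starts failing; once it passes again, the next call to `run_health_checks()` puts it back in service.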

Using a NetScaler VPX you can even have two groups of servers: one group that generally answers your queries and another that usually does something else. Through the Backup Virtual Server function, the second group acts as a backup in case all of the primary servers for a service have to be taken down.

Result: no Single Point of Failure for the app servers.

What happens if you are load balancing and have to take the load balancer out of service for upgrades or maintenance? Right, now we’ve moved SPoF up a level. One way to handle this is by using the NetScaler VPX product we have at SoftLayer. A pair of VPX instances (NodeA/NodeB) can be teamed in a failover cluster so that if the primary VPX is taken down (either by human action or because the hardware failed) the secondary VPX will begin answering for the IPs within a few seconds and processing the actions. When you bring NodeA back online it slips into the role of secondary until such time as NodeB fails or is taken down. I will note here that VPX instances do have a dependency on certain network resources, and a failure of those resources can take both VPX instances down.
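
The role swapping described above can be modelled in a few lines of Python. This is a toy state model of an active/passive pair, not NetScaler's actual failover protocol; the class and method names are invented for the example.

```python
class FailoverPair:
    """Toy active/passive pair: a returning node stays secondary (no preemption)."""

    def __init__(self, node_a="NodeA", node_b="NodeB"):
        self.primary = node_a
        self.secondary = node_b
        self.up = {node_a: True, node_b: True}

    def node_down(self, node):
        self.up[node] = False
        # Promote the secondary only if the failed node was answering traffic.
        if node == self.primary and self.up[self.secondary]:
            self.primary, self.secondary = self.secondary, self.primary

    def node_up(self, node):
        self.up[node] = True
        # A returning node takes over only if the current primary is down.
        if node == self.secondary and not self.up[self.primary]:
            self.primary, self.secondary = self.secondary, self.primary

    def answering(self):
        """Name of the node answering for the IPs, or None if both are down."""
        return self.primary if self.up[self.primary] else None
```

Note the last case: if both nodes are down, `answering()` returns `None`, which is the shared-dependency situation the paragraph above warns about.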

Result: Loss of a single VPX is not a Single Point of Failure.

So what’s next? A wide-ranging power failure or general network failure of either the frontend or the backend network could render both of the NetScalers in a city unusable or even the entire facility unusable. This can be worked around by having resources in two cities which are able to process queries for your users and by using the Global Load Balancer product we offer. GLB load balances between the cities using DNS results. A power failure taking down Seattle just means your queries go to Dallas instead. Why not skip the VPX layer and just GLB to the app servers? You could, if you don’t have a need for the other functionalities from the VPX.
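
Here is a minimal Python sketch of the DNS-based idea: hand out addresses only for cities that are currently healthy. The function name and the addresses (taken from the reserved documentation ranges) are illustrative, and this is not how the Global Load Balancer product is implemented.

```python
def glb_answer(sites, is_healthy):
    """Return the addresses of healthy sites only, in the order they were listed.

    `sites` maps a city name to the address served for that city.
    """
    return [addr for city, addr in sites.items() if is_healthy(city)]


# Hypothetical addresses from the reserved documentation ranges.
sites = {"Seattle": "192.0.2.10", "Dallas": "198.51.100.10"}
health = {"Seattle": True, "Dallas": True}

print(glb_answer(sites, health.get))   # both cities answer
health["Seattle"] = False              # power failure takes Seattle down
print(glb_answer(sites, health.get))   # queries now go to Dallas only
```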

Result: no Single Point of Failure at the datacenter level.

Having redundant functionality between cities takes planning, it takes work, and it takes funding. You have to consider synchronization of content. The web content is easy. Run something like an rsync from time to time. Synching the database content between machines or across cities is a bit more complicated. I’ve seen some customers use the built-in replication capabilities of their database software while others will do a home-grown process such as having their application servers write to multiple database servers. You also have to consider issues of state for your application. Can your application handle bouncing between cities?

Redundancy planning is not always fun but it is required for serious businesses, even if the answer is ultimately to not do any redundancy. People, hardware and processes will fail. Whether a failure event is a nightmare or just an annoyance depends on your preparation.

November 16, 2009

How Many Recovery Plans Do We Need?

Several of our bloggers have written about backups in The InnerLayer. This morning, I had an experience that makes me wonder how many recovery plans we need.

I walked out of the house to the driveway and saw that my left rear tire was flat. An enormous nail had punctured my tire right in the middle of the tread, and the slow leak deflated the tire overnight. To recover from this disaster, I needed to get my vehicle drivable and get to the Discount Tire location near my house so that they could fix the flat. Below is a log of how the recovery plans worked out.

Recovery Plan #1: Call roadside assistance. While waiting on them to change my tire, log on from home and get some work done before going to Discount Tire. I have leased four different brands of vehicles over the past 10 years, and roadside assistance was always included with the lease. So I call the 800 number and they tell me I don’t have roadside assistance. (Note to self: read the fine print on the next lease.) Result: FAIL

Recovery Plan #2: Inflate tire with can of Fix-a-Flat. I retrieved the can from my garage, followed the instructions, and when I depressed the button to fill the tire, the can was defective and the contents spewed from the top of the can rather than filling the tire. Result: FAIL

Recovery Plan #3: Use foot-operated bicycle pump to inflate tire and drive to Discount Tire. I have actually done this successfully before with slow leaks like this one. It is third in priority because it is harder and more tiring than the first two options. So I go to my garage and look at where the pump is stored. It isn’t there. I scour the garage to find it. It is gone. Result: FAIL

Recovery Plan #4: Change out of office clothes into junky clothes, drag out the jack and spare and change the tire myself. This is number four in priority because it is the biggest hassle. I will spare you all the slapstick comedy of a finance guy jacking up a vehicle and changing the tire (finding the special key for the locking lug nuts was an interesting sub-plot to the whole story), so I’ll summarize and say RESULT: Success!

As a side note, I must give props to Discount Tire. Having bought tires there before, I was in their database as a customer and they fixed the flat and installed it on my vehicle for no charge. I recommend them!

All this got me to thinking about not only having backups, but having redundant recovery plans. Sure, you’ve got a recent copy of all your data – that’s great! Now, what’s your plan for restoring that data? If you have an experience like my flat tire recovery this morning, it might be a good idea to think through several ways to recover and restore the data. Our EVault offering will certainly be one good strategy.
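
The same "try the next plan" logic carries over to restoring data. Below is a small Python sketch that runs recovery plans in priority order until one succeeds; the function name and the plan names are made up for illustration, and each action is just a callable you would supply.

```python
def recover(plans):
    """Try each (name, action) plan in priority order until one succeeds.

    An action succeeds by returning True; returning False or raising an
    exception counts as a failure. Returns (name of the plan that worked, log).
    """
    log = []
    for name, action in plans:
        try:
            ok = bool(action())
        except Exception as exc:
            ok = False
            log.append(f"{name}: FAIL ({exc})")
        else:
            log.append(f"{name}: {'Success' if ok else 'FAIL'}")
        if ok:
            return name, log
    return None, log


def defective_can():
    raise RuntimeError("can was defective")


# The flat-tire morning, replayed as a list of plans.
winner, log = recover([
    ("roadside assistance", lambda: False),
    ("Fix-a-Flat", defective_can),
    ("bicycle pump", lambda: False),
    ("change the tire myself", lambda: True),
])
print(winner)
print("\n".join(log))
```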

November 6, 2009

Think Large, Think Global!

As an executive at SoftLayer, one of the things that I am amazed by is the number of unique and extremely innovative ideas that we see on a daily basis from our customers. We love the fact that these groups understand the value of what we do, while focusing their energy on their core competencies. It’s the perfect relationship for us and one that we try to cultivate and grow continuously.

One of the challenges that we face is sharing information related to the entire breadth of our service offerings in a simple and useful way. Our business model is such that the cycle from first contact to purchase decision tends to be short. Most customers typically come in with a specified set of required services. We often hear comments like “we didn’t know you offered that as well” from customers that come to us with a shopping list and take advantage of the self-service capabilities that we offer. Global load balancing, CDN, and Data Center to Data Center back-up are all examples of products that we have heard get overlooked. It’s a tough balance between overselling and letting a tech-savvy customer work his way through the waters (so to speak).

One of the other challenges that we face here is overcoming the “we don’t need that” syndrome. I look at it practically and associate it with insurance: it’s never needed until something occurs that makes it a must-have. In tech terms, I recently read an article on CNNMoney.com, “The Tech Catastrophe you’re ignoring,” that typifies this “we don’t need that” syndrome. The article covers the idea of back-ups for your data. It discusses how the business of dead drive recovery is growing globally at staggering rates, and how that is due to people not backing up data on a continuous basis. We hear this loud and clear at SoftLayer when a customer accidentally loses data and wishes they had spent the extra few dollars a month to back it up. It seems trivial post-incident, but pre-incident it’s one of those decisions that gets passed on quite frequently.

As mentioned, the uniqueness and innovation that lives in SoftLayer’s service offering is tremendous. As our CEO hammers home the message of think large and think global to us every day, I want to pass that message onto our customers. What you do is driving industry, innovation and all that comes along with it. We hope that the decision making process for you as a customer is driven by thinking large and thinking globally and that you take advantage of the solutions that we offer to make your work more functional, more secure, more robust, and more effective. I can’t imagine telling my boss that ‘we didn’t need that’ if it was something that we did need and it was right in front of me. I am sure many of you share that sentiment!

October 19, 2009

I have backups…Don’t I?

There is some confusion out there about what makes a good way to back up your data. In this article we will go over several good ways to make and store your backups, along with a few ways that are not recommended.

When it comes to backups, storing them off your server (or at least on a secondary drive that is not running your system) is the best solution, with fully off-site storage being the recommended course.

When RAID comes into consideration, just because the drives are redundant (a mirrored setup) does not mean the data is safe. Several situations can cause a complete RAID failure: the RAID controller failing, the array developing a bad stripe, failure of more than one drive (this does happen, though rarely), and out-of-date firmware on the drives or the RAID card causing errors. Using a network storage device like our eVault or NAS storage is also an excellent way to store backups off the system. The last thing to consider is keeping your backups up to date. I suggest making a new backup every week at minimum (if you have very active sites or databases I would recommend an every-other-day or daily backup). It is up to you or your server administrator to keep up with your backups and make sure they are kept up to date. If you have a hardware failure and your backups are well out of date, it’s almost like not having them at all.

In closing, consider the service you provide and whether your data is safe, secure, and recoverable. These things are key to running a successful server and website.
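
One simple way to keep an eye on freshness is to check the age of the newest file in your backup directory. The Python sketch below does that; the function names, the directory layout and the seven-day default are assumptions taken from the weekly minimum suggested above.

```python
import time
from pathlib import Path


def newest_backup_age_days(backup_dir, now=None):
    """Age in days of the most recently modified file in `backup_dir`, or None if empty."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 86400


def backups_are_stale(backup_dir, max_age_days=7, now=None):
    """True if there is no backup at all or the newest one is older than the limit."""
    age = newest_backup_age_days(backup_dir, now)
    return age is None or age > max_age_days
```

Run from a scheduled job, a check like `backups_are_stale("/path/to/backups")` (the path is a placeholder) could trigger an alert before an out-of-date backup becomes a problem.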

August 18, 2009

Backups Are Not the Whole Story

Last night while making my regular backup for my World of Warcraft configuration, I thought about the blog and I didn't remember seeing an article that went into more detail than "backups are good" about backing up and restoring data.

If you've been around the InnerLayer for a while you will have noticed that backing up of data comes up periodically.  This happens because we frequently see customers whose world is turned upside down due to a mistyped command wiping out their data.  If you just thought "that won't happen to me... I'm careful at a prompt"... well, how about a cracker getting in via an IIS zero day exploit?  Kernel bug corrupting the filesystem?  Hard drive failure?  Data loss will happen to you, whatever the cause.

Data that is not backed up is data that isn't viewed as important by the server administrator.  As the title of this blog mentioned, backing up isn't the end of the server administrator's responsibility.  Consider the following points.

  • Is the backup in a safe location?  Backing up to the same hard drive which houses the live data is not a good practice.
  • Is the backup valid?  Did the commands to create it all run properly?  Did they get all the information you need?  Do you have enough copies?
  • Can your backup restore a single file or directory?  Do you know how to restore it?  Simply put, a restore is getting data from a backup back into a working state on a system.

Backup Safety
At a minimum backups should be stored on a separate hard drive from the data which the backup is protecting.  Better would be a local copy of the backup on the machine in use and having a copy of the backup off the machine, perhaps in eVault, on a NAS which is _NOT_ always mounted, even on another server.  Why?  The local backup gives you quick access to the content while the off-machine copies give you the safety that if one of your employees does a secure wipe on the machine in question you haven't lost the data and the backup.
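
A quick sanity check for the "separate hard drive" minimum is to compare the device IDs of the data and backup locations. The Python sketch below uses `os.stat().st_dev`; the helper names are invented, and the check is only a rough proxy, since two partitions on the same physical disk report different device IDs.

```python
import os


def on_same_device(data_path, backup_path):
    """True if both paths live on the same filesystem device (st_dev)."""
    return os.stat(data_path).st_dev == os.stat(backup_path).st_dev


def risky_backup_locations(data_path, backup_paths):
    """Return the backup paths that share a device with the live data."""
    return [p for p in backup_paths if on_same_device(data_path, p)]
```

An empty list from `risky_backup_locations` does not prove the backups are safe, but a non-empty one is a clear sign that a backup sits next to the data it is meant to protect.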

Validity
A backup is valid if it gets all the data you need to bring your workload back online in the event of a failure.  This could be web pages, database data, config files (frequently forgotten) and notes on how things work together.  Information systems get complicated and if you've got a Notepad file somewhere listing how Tab A goes into Slot B, that should be in your backups.  Yes, you know how it works... great, you get hit by a bus, does your co-admin know how that system is put together?  Don't forget dependencies.  A forum website is pretty worthless if it is backed up but the database to which it looks is not.  For me another mark of a valid backup is one which has some history.  Do not backup today and delete yesterday.  Leave a week or more of backups available.  People don't always notice immediately that something has broken.
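
Keeping history means pruning by age rather than overwriting yesterday's copy. Here is a small Python sketch of a retention rule that keeps at least a week of backups; the function name and the seven-day default are illustrative assumptions.

```python
from datetime import date, timedelta


def backups_to_delete(backup_dates, today, keep_days=7):
    """Return the backup dates older than the retention window, oldest first.

    The newest backup is never returned, even if it falls outside the window.
    """
    if not backup_dates:
        return []
    cutoff = today - timedelta(days=keep_days)
    newest = max(backup_dates)
    return sorted(d for d in backup_dates if d < cutoff and d != newest)


today = date(2009, 8, 18)
daily = [today - timedelta(days=n) for n in range(10)]  # ten daily backups
print(backups_to_delete(daily, today))
```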

Restores
A good way to test a restore is get a 2nd server for a month configured the same as your primary then take the backup from the primary and restore it onto the secondary.  See what happens.  Maybe it will go great.  Probably you will run into issues.  Forget about a small operating system tweak made some morning at 4am?  How about time?  How long does it take to go from a clean OS install to a working system?  If this time is too long, you might have too much going on one server and need to split up your workload among a few servers.  As with everything else in maintaining a server, practicing your restores is not a one-time thing.  Schedule yourself a couple of days once a quarter to do a disaster simulation.
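
After a test restore, one way to see "what happens" is to compare checksums of the original files against the restored copies. The Python sketch below does this with SHA-256; the function names are made up, and it reads whole files into memory, which is fine for a sketch but not for very large files.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to `root`) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_differences(original, restored):
    """Return (missing, changed) relative paths found after a test restore."""
    a, b = tree_digests(original), tree_digests(restored)
    missing = sorted(set(a) - set(b))
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return missing, changed
```

Two empty lists mean every file from the original tree came back byte-for-byte; it says nothing about services actually starting, so it complements the full restore drill rather than replacing it.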

For those who might be looking at this and saying "That is a lot of work": yes, it is.  It is part of running a server.  I do this myself on a regular basis for a small server hosting e-mail and web data for some friends.  I have a local "configbackup" directory on the server which has the mail configs, the server configs, the nameserver configs and the database data.  In my case, I've told my users straight up that their home directories are their own responsibility.  Maybe you can do that, maybe not.  Weekly that configbackup data is copied to a file server here at my apartment.  The file server itself is backed up periodically to a USB drive which is kept at a friend's house.

August 10, 2009

Backups: It’s Good to Have Them!

Man, was this a weekend for me and backups! The first one I needed was for my second advance free fall course at skydive101.com on Friday (my Saturday). I jumped out of the plane a little early, and one of my two instructors was not ready and did not jump on my call. Thank god I had a backup. Once my chute deployed (this being only my second jump out of a perfectly good airplane), I looked up to check my slider and make sure my parachute was deployed and everything was correct. I did not see my brake handles. I said some curse words I won’t put in this blog, and then told myself it was OK: I had a backup. I was still at 4,000 feet at this time, with another 1,500 feet to decide whether to release my main and pull the backup. Luckily I found my brake handles, pulled down twice, and they worked and everything was OK.

So I get home after a day of skydiving and having fun in the sun (it is rather rare in Seattle, though not this summer) and I notice my trusty old Windows XP terminal has multiple errors on it. I do the first thing I always do with my personal Microsoft machines and reboot it. Ouch: no operating system found, bad hard drive! Thank god I ghosted that machine two weeks ago, as I figured the old IDE drives were on their last legs. Saturday morning comes around, which is a big day for me, as I am hosting a party that night at a local night club. I notice I have a few (8,900) email messages on my BlackBerry, so I decide it is about time to delete some. I tell it to delete, look back at it, and four minutes later it says APP ERR. PLEASE RESTART. You guessed it: it wouldn’t boot back up, and I had to force an OS onto it and restore from my backup of a month ago, which reminded me that I need to start backing up my BlackBerry more often.

So the moral of the story is that it is always good to have a backup, and we have plenty of backup options for you, so if you don’t have one, I would suggest contacting SLales. I would also suggest that everyone try skydiving at least once in their lifetime.

February 1, 2008

I Outsourced It

Have you ever wanted to tell your CIO that? His response might be, "You outsourced what??" You respond, "it!" With a perplexed look he asks again, "You outsourced what, it?" Again you respond with, "All of it." His reaction at that point could go either way. Most CIO types today can't grasp the savings associated with outsourcing, and even the ones that DO understand would then have to go to the CEO's office and inform him or her that all of the company's valuable data will now be housed in a safe and secure facility off-site on dedicated servers... or "Hosted IT" even. Stop reading and go tell your CEO that right now. I'll wait...go ahead.

Ok, I see that you are back. Are you still employed? We are hiring if you need a new job: resumes@softlayer.com

Ok, really, how do you think that conversation would go? I have had that same conversation with ex-bosses and owners of small and medium sized businesses in the past and most of the time they don't go very well. Granted they were a few years ago so hopefully times are changing.

I have been told a few times, "No, I don't want to pay $300 per month for a server we don't own and put my data on it! That is ridiculous, just go buy me a new $3,500 server and we will put it in our local Datacenter, Server Room, Broom Closet, Bathroom, Office Manager's office..." Well, you get my drift.

"But Sir, with this outsourced server we could easily have off-site backups, more processing power, some cool redundancy and it will not annoy everyone in the office with the loud fans and heat generation. And when we have a power outage in the office and everyone goes home for the day, they will be able to work from home because the server will still be online. Oh yeah, and our company website and email will still be functioning as well."

"Are you insane? Those challenges are so easy to overcome. We will simply add a small air conditioner to the broom closet and buy a big UPS system that will keep the server alive in the event of another power outage, and we can hire a service to come by every morning to pick up tapes and deliver them to an offsite bunker. Instead of a single connection to the internet we can buy two and have redundant connections also."

"Sir, I am no accountant, but by the time you pay someone to keep up with the depreciation of a new server, buy and install a small A/C unit and UPS unit, pay for a 2nd internet connection that will sit idle and pay a service to DRIVE here daily I really think the outsourced server would be cheaper. Not to mention in the event of data loss we could get the data restored to the server much quicker than waiting on a service to physically bring it to us." An interesting note here is, I don't care what kind of offsite data bunker you have, the Monster in Cloverfield IS going to destroy it so think multiple copies of data in multiple cities!

"Well I have made my decision; we will not be outsourcing my very valuable data - Hackers might get it, it is more secure here, so leave my office. Before you go could you please try to get my printer working again, and I am getting this annoying pop-up about spyware and it seems that my iTunes files have lost their license and I used to have a folder called Docs on my desktop with everyone's salary in it that is missing and my PDA will not sync..." (zzzzzzz) -- OUTSOURCE IT!

-Skinman
