Scaling SoftLayer

April 9, 2012

SoftLayer is in the business of helping businesses scale. You need 1,000 cloud computing instances? We'll make sure our system can get them online in 10 minutes. You need to spin up some beefy dedicated servers loaded with dual 8-core Intel Xeon E5-2670 processors and high-capacity SSDs for a new application's I/O-intensive database? We'll get it online anywhere in the world in under four hours. Everywhere you look, you'll see examples of how we help our customers scale, but what you don't hear much about is how our operations team scales our infrastructure to ensure we can accommodate all of our customers' growth.

When we launch a new data center, there's usually a lot of fanfare. When AMS01 and SNG01 came online, we talked about the thousands of servers that are online and ready. We meet huge demand for servers on a daily basis, and that presents us with a challenge: What happens when the inventory of available servers starts dwindling?

Truck Day.

Truck Day is not limited to a single day of the year (or even a single day in a given month) ... It's what we call any date our operations team sets for delivery and installation of new hardware. We tell all of our teams about the next Truck Day in each location so SLayers from every department can join the operations team in unboxing and preparing servers/racks for installation. The operations team gets more hands to speed up the unloading process, and every employee has an opportunity to get first-hand experience in how our data centers operate.

If you want a refresher course about what happens on a Truck Day, you can reference Sam Fleitman's "Truck Day Operations" blog, and if you want a peek into what it looks like, you can watch Truck Day at SR02.DAL05. I don't mean to make this post all about Truck Day, but Truck Day is instrumental in demonstrating the way SoftLayer scales our own infrastructure.

Let's say we install 1,000 servers to officially launch a new pod. Because each pod has slots for 5,000 servers, we have space/capacity for 3,000-4,000 more servers in the server room, so as soon as more server hardware becomes available, we'll order it and start preparing for our next Truck Day to supplement the pod's inventory. You'd be surprised how quickly 1,000 servers can be ordered, and because it's not very easy to overnight a pallet of servers, we have to take into account lead time and shipping speeds ... To accommodate our customers' growth, we have to stay one step ahead in our own growth.
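To make that "one step ahead" idea concrete, here is a minimal back-of-the-envelope sketch in Python. It is not how our operations team actually plans Truck Days. The pod capacity and launch install count come from the example above (which leaves up to 4,000 open slots), while the daily order rate, the lead time and the function names are hypothetical values chosen purely for illustration.

```python
import math

POD_CAPACITY = 5000    # from the example above: server slots per pod
LAUNCH_INSTALL = 1000  # from the example above: servers installed at launch


def free_slots(capacity: int, installed: int) -> int:
    """Slots still open in the pod after the installed servers are racked."""
    return capacity - installed


def days_of_inventory(available: int, daily_orders: float) -> float:
    """Days until the available servers run out at a steady order rate."""
    if daily_orders <= 0:
        return math.inf  # nobody is ordering, so inventory never runs out
    return available / daily_orders


def should_schedule_truck_day(available: int, daily_orders: float,
                              lead_time_days: float) -> bool:
    """True when inventory would run out before a new shipment could arrive."""
    return days_of_inventory(available, daily_orders) <= lead_time_days


if __name__ == "__main__":
    # Hypothetical inputs: 50 servers ordered per day, 21-day hardware lead time.
    print(free_slots(POD_CAPACITY, LAUNCH_INSTALL))
    print(should_schedule_truck_day(LAUNCH_INSTALL, daily_orders=50,
                                    lead_time_days=21))
```

Under those assumed numbers, 1,000 available servers last about 20 days, which is shorter than the assumed three-week lead time, so the next order would already need to be on its way.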

This morning in a meeting, I saw a pretty phenomenal bullet that got me thinking about this topic:

Truck Day — 4/3 (All Sites): 2,673 Servers

In nine different data center facilities around the world, more than 2,500 servers were delivered, unboxed, racked and brought online. Last week. In one day.

Now I know the operations team wasn't looking for any kind of recognition ... They were just reporting that everything went as planned. Given the fact that an accomplishment like that is "just another day at SoftLayer" for those guys, they definitely deserve recognition for the amazing work they do. We host some of the most popular platforms, games and applications on the Internet, and the DC-Ops team plays a huge role in scaling SoftLayer so our customers can scale themselves.

-@gkdog