Posts Tagged 'Operations'

November 1, 2013

Paving the Way for the DevOps Revolution

The traditional approach to software development has been very linear: Your development team codes a release and sends it over to a team of quality engineers to be tested. When everything looks good, the code gets passed over to IT operations to be released into production. Each of these teams operates within its own silo and makes changes independent of the other groups, and at any point in the process, it's possible a release can get kicked back to the starting line. With the meteoric rise of agile development — a development philosophy geared toward iterative and incremental code releases — that old waterfall-type development approach is being abandoned in favor of a DevOps approach.

DevOps — a fully integrated development and operations approach — streamlines the software development process in an agile development environment by consolidating development, testing and release responsibilities into one cohesive team. This way, ideas, features and other developments can be released quickly and iteratively to respond to changing and growing market needs, avoiding the delays of long, drawn-out, rigidly scheduled release cycles.

To help you visualize the difference between the traditional approach and the DevOps approach, take a look at these two pictures:

Traditional Waterfall Development

DevOps

Unfortunately, many businesses struggle to adopt the DevOps approach because they simply update their org chart by merging their traditional teams, but their development philosophy doesn't change at the same time. As a result, I've encountered a lot of companies that have been jaded by previous attempts to move to a DevOps model, and I'm not alone. There is a significant need in the marketplace for some good old-fashioned DevOps expertise.

A couple months ago, my friend Raj Bhargava pinged me with a phenomenal idea to put on a DevOps "un-conference" in Boulder, Colorado, to address the obvious need he's observed for DevOps education and best practices. Raj is a serial, multiple-exit entrepreneur from Boulder, and he is the co-founder and CEO of a DevOps-focused startup there called JumpCloud. When he asked if I would like to co-chair the event and have SoftLayer as a headline sponsor alongside JumpCloud, the answer was a quick and easy "Yes!"

Sure, there have been other DevOps-related conferences around the world, but ours was designed to be different from the outset. As strange as it may sound, half of the conference intentionally occurred outside of the conference: One of our highest priorities was to strike up conversations between the participants before, during and after the event. If we're putting on a conference to encourage a collaborative development approach, it would be counterproductive for us to use a top-down, linear approach to engaging the attendees, right?

I'm happy to report that this inaugural attempt at our untested concept was an amazing success. We kept the event private for our first run at the concept, but the event was bursting at the seams with brilliant developers and tech influencers. Brad Feld and our friends from the Foundry Group invited all of their portfolio CEOs and CTOs. David Cohen, co-founder of Techstars and head honcho at Bullet Time Ventures, did the same. JumpCloud and SoftLayer helped round out the attendee list with a few of our most innovative partners as well as a few technologists from within our own organizations. It was an incredible mix of super-smart tech pros, business leaders and VCs from all over the world.

With such a diverse group of attendees, the conversations at the event were engaging, energizing and profound. We discussed everything from how startups should incorporate automation into their business plans at the outset to how the practice of DevOps evolves as companies scale quickly. At the end of the day, we brought all of those theoretical discussions back down to the ground by sharing case studies of real companies that have had unbelievable success in incorporating DevOps into their businesses. I had the honor of wrapping up the event as moderator of a panel with Jon Prall from Sendgrid, Scott Engstrom from Gnip and Richard Miller of Mocavo, and I couldn't have been happier with the response.

I'd like to send a big thanks to everyone who participated, especially our cosponsors — JumpCloud, VictorOps, Authentic8, DH Capital, SendGrid, Cooley, Pivot Desk, SVP and Pantheon.

I'm looking forward to opening this up to the world next year!

-@PaulFord

August 22, 2013

Network Cabling Controversy: Zip Ties v. Hook & Loop Ties

More than 210,000 users have watched a YouTube video of our data center operations team cabling a row of server racks in San Jose. More than 95 percent of the ratings left on the video are positive, and more than 160 comments have been posted in response. To some, those numbers probably seem unbelievable, but to anyone who has ever cabled a data center rack or dealt with a poorly cabled data center rack, the time-lapse video is enthralling, and it seems to have catalyzed a healthy debate: At least a dozen comments on the video question/criticize how we organize and secure the cables on each of our server racks. It's high time we addressed this "zip ties v. hook & loop (Velcro®)" cable bundling controversy.

The most widely recognized standards for network cabling have been published by the Telecommunications Industry Association and Electronics Industries Alliance (TIA/EIA). Unfortunately, those standards don't specify the physical method used to secure cables, but it's generally understood that if you tie cables too tightly, the cable's geometry will be affected, possibly deforming the copper, modifying the twisted pairs or otherwise physically causing performance degradation. That understanding raises the question of whether zip ties are inherently inferior to hook & loop ties for network cabling applications.

As you might have observed in the "Cabling a Data Center Rack" video, SoftLayer uses nylon zip ties when we bundle and secure the network cables on our data center server racks. The decision to use zip ties rather than hook & loop ties was made during SoftLayer's infancy. Our team had a vision for an automated data center that wouldn't require much server/cable movement after a rack is installed, and zip ties were much stronger and more "permanent" than hook & loop ties. Zip ties allow us to tighten our cable bundles easily so those bundles are more structurally solid (and prettier). In short, zip ties were better for SoftLayer data centers than hook & loop ties.

That conclusion is contrary to the prevailing opinion in the world of networking that zip ties are evil and that hook & loop ties are among only a few acceptable materials for "good" network cabling. We hear audible gasps from some network engineers when they see those little strips of nylon bundling our Ethernet cables. We know exactly what they're thinking: Zip ties negatively impact network performance because they're easily over-tightened, and cables in zip-tied bundles are more difficult to replace. After they pick their jaws up off the floor, we debunk those myths.

The first myth (that zip ties can negatively impact network performance) has a valid basis, but its significance is much greater in theory than it is in practice. While I couldn't track down any scientific experiments that demonstrate the maximum tension a cable tie can exert on a bundle of cables before the traffic through those cables is affected, I have a good amount of empirical evidence to fall back on from SoftLayer data centers. Since 2006, SoftLayer has installed more than 400,000 patch cables in data centers around the world (using zip ties), and we've *never* encountered a fault in a network cable that was the result of a zip tie being over-tightened ... And we're not shy about tightening those ties.

The fact that nylon zip ties are cheaper than most (all?) of the other more "acceptable" options is a fringe benefit. By securing our cable bundles tightly, we keep our server racks clean and uniform:

SoftLayer Cabling

The second myth (that cables in zip-tied bundles are more difficult to replace) is also somewhat flawed when it comes to SoftLayer's use case. Every rack is pre-wired to deliver five Ethernet cables — two public, two private and one out-of-band management — to each "rack U," which provides enough connections to support a full rack of 1U servers. If larger servers are installed in a rack, we won't need all of the network cables wired to the rack, but if those servers are ever replaced with smaller servers, we don't have to re-run network cabling. Network cables aren't exposed to the tension, pressure or environmental changes of being moved around (even when servers are moved), so external forces don't cause much wear. The most common physical "failures" of network cables are typically associated with RJ45 jack crimp issues, and those RJ45 ends are easily replaced.

Let's say a cable does need to be replaced, though. Servers in SoftLayer data centers have redundant public and private network connections, but in this theoretical example, we'll assume network traffic can only travel over one network connection and a data center technician has to physically replace the cable connecting the server to the network switch. With all of those zip ties around those cable bundles, how long do you think it would take to bring that connection back online? (Hint: That's kind of a trick question.) See for yourself:

The answer in practice is "less than one minute" ... The "trick" in that trick question is that the zip ties around the cable bundles are irrelevant when it comes to physically replacing a network connection. Data center technicians use temporary cables to make a direct server-to-switch connection, and they schedule an appropriate time to perform a permanent replacement (which actually involves removing and replacing zip ties). In the video above, we show a temporary cable being installed in about 45 seconds, and we also demonstrate the process of creating, installing and bundling a permanent network cable replacement. Even with all of those villainous zip ties, everything is done in less than 18 minutes.

Many of the comments on YouTube bemoan the idea of having to replace a single cable in one of these zip-tied bundles, but as you can see, the process isn't very laborious, and it doesn't vary significantly from the amount of time it would take to perform the same maintenance with a Velcro®-secured cable bundle.

Zip ties are inferior to hook & loop ties for network cabling? Myth(s): Busted.

-@khazard

P.S. Shout-out to Elijah Fleites at DAL05 for expertly replacing the network cable on an internal server for the purposes of this video!

December 17, 2012

Big Data at SoftLayer: The Importance of IOPS

The jet flow gates in the Hoover Dam can release up to 73,000 cubic feet — the equivalent of 546,040 gallons — of water per second at 120 miles per hour. Imagine replacing those jet flow gates with a single garden hose that pushes 25 gallons per minute (or 0.42 gallons per second). Things would get ugly pretty quickly. In the same way, a massive "big data" infrastructure can be crippled by insufficient IOPS.

IOPS — Input/Output Operations Per Second — measure storage performance in terms of the number of read and write operations a device can perform in a second. IOPS are a primary concern for database environments where content is being written and queried constantly, and when we take those database environments to the extreme (big data), the importance of IOPS can't be overstated: If you aren't able to perform database reads and writes quickly in a big data environment, it doesn't matter how many gigabytes, terabytes or petabytes you have in your database ... You won't be able to efficiently access, add to or modify your data set.
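
As a rough, back-of-the-envelope illustration of that point (the IOPS figures here are hypothetical, chosen only to bracket the kinds of numbers discussed later in this post), consider how long it would take just to touch every 8KB block of a 1TB data set:

```python
# Hypothetical illustration: time to randomly read every 8 KB block of a
# 1 TB data set at a given IOPS level. The IOPS values are made up for
# illustration; they are not benchmark results.
BLOCK_SIZE = 8 * 1024               # 8 KB per random read
DATA_SET = 1024**4                  # 1 TB data set
total_ops = DATA_SET // BLOCK_SIZE  # ~134 million reads

for iops in (500, 20000):
    hours = total_ops / iops / 3600
    print(f"{iops:>6} IOPS -> {hours:6.1f} hours to touch the whole data set")

# Output:
#    500 IOPS ->   74.6 hours to touch the whole data set
#  20000 IOPS ->    1.9 hours to touch the whole data set
```

The capacity is identical in both cases; the only thing that changed is how many operations per second the storage can sustain.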

As we worked with 10gen to create, test and tweak SoftLayer's MongoDB engineered servers, our primary focus was performance. Since the performance of massively scalable databases is dictated by the read and write operations to that database's data set, we invested significant resources into maximizing the IOPS for each engineered server ... And that involved a lot more than just swapping hard drives out of servers until we found a configuration that worked best. Yes, "Disk I/O" — the number of input/output operations a given disk can perform — plays a significant role in big data IOPS, but many other factors limit big data performance. How is performance impacted by network-attached storage? At what point will a given CPU become a bottleneck? How much RAM should be included in a base configuration to accommodate the load we expect our users to put on each tier of server? Are there operating system changes that can optimize the performance of a platform like MongoDB?

The resulting engineered servers are a testament to the blood, sweat and tears that were shed in the name of creating a reliable, high-performance big data environment. And I can prove it.

Most shared virtual instances — the scalable infrastructure many users employ for big data — rely on network-attached storage. When data has to be queried over a network connection (rather than from a local disk), you introduce latency and more "moving parts" that have to work together. Disk I/O might be amazing on the enterprise SAN where your data lives, but because that data is not stored on-server with your processor or memory resources, performance can sporadically go from "Amazing" to "I Hate My Life" depending on network traffic. When I tested the IOPS for network-attached storage on a large competitor's virtual instances, I saw an average of around 400 IOPS per mount. It's difficult to say whether that's "not good enough" because every application will have different needs in terms of concurrent reads and writes, but it certainly could be better. We performed some internal testing of the IOPS for the hard drive configurations in our Medium and Large MongoDB engineered servers to give you an apples-to-apples comparison.

Before we get into the tests, here are the specs for the servers we're using:

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
CentOS 6 64-bit
36GB RAM
1Gb Network - Bonded

Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
CentOS 6 64-bit
128GB RAM
1Gb Network - Bonded

The numbers shown in the tables below reflect the average number of IOPS we recorded with a 100% random read/write workload on each of these engineered servers. To measure these IOPS, we used a tool called fio with an 8k block size and an iodepth of 128. Keep in mind the 400 IOPS per mount we saw from the virtual instance using network-attached storage as a point of comparison.
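
If you'd like to run a comparable test against your own storage, a job along the following lines should be close. Only the 8k block size, the iodepth of 128 and the fully random workload come from the description above; the runtime, file size, read/write split, ioengine and target directory are assumptions, so adjust them to match your environment:

```python
import json
import subprocess

# Minimal sketch of a random read/write benchmark similar to the one described
# above: fio, 8k blocks, iodepth 128. Runtime, size, mix and ioengine are
# assumptions; /var/lib/mongo/data is simply the mount point being exercised.
cmd = [
    "fio",
    "--name=randrw-8k",
    "--directory=/var/lib/mongo/data",
    "--rw=randrw", "--rwmixread=50",   # 100% random I/O, half reads / half writes
    "--bs=8k", "--iodepth=128",
    "--ioengine=libaio", "--direct=1",
    "--size=4G", "--runtime=60", "--time_based",
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
job = json.loads(result.stdout)["jobs"][0]
print(f"Random read IOPS:  {job['read']['iops']:.0f}")
print(f"Random write IOPS: {job['write']['iops']:.0f}")
```

With that methodology in mind, here's how our "base" configurations perform: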

Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 2937
Random Write IOPS - /var/lib/mongo/logs 1306
Random Read IOPS - /var/lib/mongo/data 1720
Random Write IOPS - /var/lib/mongo/data 772
Random Read IOPS - /var/lib/mongo/data/journal 19659
Random Write IOPS - /var/lib/mongo/data/journal 8869
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 30269
Random Write IOPS - /var/lib/mongo/logs 13124
Random Read IOPS - /var/lib/mongo/data 33757
Random Write IOPS - /var/lib/mongo/data 14168
Random Read IOPS - /var/lib/mongo/data/journal 19644
Random Write IOPS - /var/lib/mongo/data/journal 8882
   
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 4820
Random Write IOPS - /var/lib/mongo/logs 2080
Random Read IOPS - /var/lib/mongo/data 2461
Random Write IOPS - /var/lib/mongo/data 1099
Random Read IOPS - /var/lib/mongo/data/journal 19639
Random Write IOPS - /var/lib/mongo/data/journal 8772
 
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32403
Random Write IOPS - /var/lib/mongo/logs 13928
Random Read IOPS - /var/lib/mongo/data 34536
Random Write IOPS - /var/lib/mongo/data 15412
Random Read IOPS - /var/lib/mongo/data/journal 19578
Random Write IOPS - /var/lib/mongo/data/journal 8835

Clearly, the 400 IOPS per mount you'd see from SAN-based storage can't hold a candle to the performance of a physical disk, regardless of whether it's SAS or SSD. As you'd expect, the "Journal" reads and writes show roughly the same IOPS across all of the configurations because all four configurations use 2 x 64GB SSD drives in RAID1. In both server sizes, SSD drives provide better Data mount read/write performance than the 15K SAS drives, and the results suggest that having more physical drives in a Data mount will provide higher average IOPS. To put that observation to the test, I maxed out the number of hard drives in both configurations (10 in the 2U MD server and 34 in the 4U LG server) and recorded the results:

Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 7175
Random Write IOPS - /var/lib/mongo/logs 3481
Random Read IOPS - /var/lib/mongo/data 6468
Random Write IOPS - /var/lib/mongo/data 1763
Random Read IOPS - /var/lib/mongo/data/journal 18383
Random Write IOPS - /var/lib/mongo/data/journal 8765
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32160
Random Write IOPS - /var/lib/mongo/logs 12181
Random Read IOPS - /var/lib/mongo/data 34642
Random Write IOPS - /var/lib/mongo/data 14545
Random Read IOPS - /var/lib/mongo/data/journal 19699
Random Write IOPS - /var/lib/mongo/data/journal 8764
   
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 17566
Random Write IOPS - /var/lib/mongo/logs 11918
Random Read IOPS - /var/lib/mongo/data 9978
Random Write IOPS - /var/lib/mongo/data 6526
Random Read IOPS - /var/lib/mongo/data/journal 18522
Random Write IOPS - /var/lib/mongo/data/journal 8722
 
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 34220
Random Write IOPS - /var/lib/mongo/logs 15388
Random Read IOPS - /var/lib/mongo/data 35998
Random Write IOPS - /var/lib/mongo/data 17120
Random Read IOPS - /var/lib/mongo/data/journal 17998
Random Write IOPS - /var/lib/mongo/data/journal 8822

It should come as no surprise that by adding more drives into the configuration, we get better IOPS, but you might be wondering why the results aren't "betterer" when it comes to the IOPS in the SSD drive configurations. While the IOPS numbers improve going from four to ten drives in the medium engineered server and six to thirty-four drives in the large engineered server, they don't increase as significantly as the IOPS differences in the SAS drives. This is what I meant when I explained that several factors contribute to and potentially limit IOPS performance. In this case, the limiting factor throttling the (ridiculously high) IOPS is the RAID card we are using in the servers. We've been working with our RAID card vendor to test a new card that will open a little more headroom for SSD IOPS, but that replacement card doesn't provide the consistency and reliability we need for these servers (which is just as important as speed).
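
One way to see that ceiling is to compare how much the measured Data mount read IOPS grew against how much the drive count grew. The snippet below is simply arithmetic on the numbers already published in the tables above:

```python
# Illustrative arithmetic on the data-mount random read numbers published above:
# how much did measured IOPS grow when the drive count grew?
results = {
    # (server, media): {drive count: measured random read IOPS on /var/lib/mongo/data}
    ("Medium", "15K SAS"): {4: 1720, 10: 6468},
    ("Medium", "SSD"):     {4: 33757, 10: 34642},
    ("Large",  "15K SAS"): {6: 2461, 34: 9978},
    ("Large",  "SSD"):     {6: 34536, 34: 35998},
}

for (server, media), iops in results.items():
    (n1, r1), (n2, r2) = sorted(iops.items())
    print(f"{server:<6} {media:<7}: drives x{n2 / n1:.1f} -> read IOPS x{r2 / r1:.2f}")

# Medium 15K SAS: drives x2.5 -> read IOPS x3.76
# Medium SSD    : drives x2.5 -> read IOPS x1.03
# Large  15K SAS: drives x5.7 -> read IOPS x4.05
# Large  SSD    : drives x5.7 -> read IOPS x1.04
```

The SAS arrays keep scaling as spindles are added, while both SSD arrays sit in the same 34,000-36,000 read IOPS range no matter how many drives are behind the controller, which is exactly the RAID card ceiling described above.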

There are probably a dozen other observations I could point out about how each result compares with the others (and why), but I'll stop here and open the floor for you. Do you notice anything interesting in the results? Does anything surprise you? What kind of IOPS performance have you seen from your server/cloud instance when running a tool like fio?

-Kelly

November 16, 2012

Going Global: Domo Arigato, Japan

I'm SoftLayer's director of international operations, so I have the unique pleasure of spending a lot of time on airplanes and in hotels as I travel between Dallas, Amsterdam, Singapore and wherever else our event schedule dictates. In the past six months, I've spent most of my time in Asia, and I've tried to take advantage of the opportunity to relearn the culture to help shape SoftLayer Asia's business.

To really get a sense of the geographic distance between Dallas and Singapore, find a globe, put one index finger on Dallas and put your other index finger on Singapore. To travel from one location to the other, you fly to the other side of the planet. Given the space considerations, our network map uses a scaled-down representative topology to show our points of presence in a single view, and you get a sense of how much artistic license was used when you actually make the trip to Singapore.

Global Network

The longest currently scheduled commercial flight on the planet takes you from Singapore to Newark in a cool 19 hours, but I choose to maintain my sanity rather than set world records for the amount of time spent in a metal tube. I usually hop from Dallas to Tokyo (a mere 14 hours away), where I spend a few days, and then I get on another plane down to Singapore.

The break between the two legs of the trip serves a few different purposes ... I get a much needed escape from the confines of an airplane, I'm able to spend time in an amazing city (where I lived 15 years ago), and I can use the opportunity to explore the market for SoftLayer. Proximity and headcount dictated that we spend most of our direct marketing and sales time focusing on the opportunities radiating from Singapore, so we haven't been able to spend as much time as we'd like in Japan. Fortunately, we've been able to organically grow our efforts in the country through community-based partnerships and sponsorships, and we owe a great deal of our success to our partners in the region and our new-found friends. I've observed from our experience in Japan that the culture breeds two contrasting business realities that create challenges and opportunities for companies like SoftLayer: Japan is insular and Japan is global.

When I say that Japan is insular, I mean that IT purchases are generally made in the realm of either Japanese firms or foreign firms that have spent decades building reputation in market. Becoming a trusted part of that market is a time-consuming (and expensive) endeavor, and it's easy for a business to be dissuaded as an outsider. The contrasting reality that Japanese businesses also have a huge need for global reach is where SoftLayer can make an immediate impact.

Consider the Japanese electronics and automobile industries. Both were built internally before making the leap to other geographies, and over the course of decades, they have established successful brands worldwide. Japanese gaming companies, social media companies and vibrant start-up communities follow a similar trend ... only faster. The capital investment required to go global is negligible compared to that of their forebears because they don't need to build factories or put elaborate logistics operations in place anymore. Today, a Japanese company with a SaaS solution, a game or a social media experience can successfully share it with the world in a matter of minutes or hours at minimal cost, and that's where SoftLayer is able to immediately serve the Japanese market.

The process of building the SoftLayer brand in Asia has been accelerated by the market's needs, and we don't take that for granted. We plan to continue investing in local communities and working with our partners to become a trusted and respected resource in the market, and we are grateful for the opportunities those relationships have opened for us ... Or as Styx would say, "Domo Arigato, Mr. Roboto."

-@quigleymar

July 19, 2012

The Human Element of SoftLayer - DAL05 DC Operations

One of the founding principles of SoftLayer is automation. Automation has enabled this company to provide our customers with a world class experience, and it enables employees to provide excellent service. It allows us to quickly deploy a variety of solutions at the click of a button, and it guarantees consistency in the products that we deliver. Automation isn't the whole story, though. The human element plays a huge role in SoftLayer's success.

As a Site Manager for the corporate facility, I thought I could share a unique perspective on what that human element looks like, specifically through the lens of the Server Build Team's responsibilities. You recently heard how my colleague, Broc Chalker, became an SBT, so I wanted to take it a step further by providing a high-level breakdown of how the Server Build Team enables SoftLayer to keep up with the operational demands of a rapidly growing, global infrastructure provider.

The Server Build Team is responsible for filling all of the beautiful data center environments you see in pictures and videos of SoftLayer facilities. Every day, they are in the DC, building out new rows for inventory. It sounds pretty simple, but it's actually a pretty involved process. When it comes to prepping new rows, our primary focus is redundancy (for power, cooling and network). Each rack gets dual power feeds, four network switches in a stacked configuration (two public network, two private network) and an additional switch that provides KVM access to the servers. To make it possible to fill the rack with servers, we also have to make sure it's organized well, and that takes a lot of time. Just watch the video of the Go Live Crew cabling a server rack in SJC01, and you can see how time- and labor-intensive the process is. And if there are any mistakes or if the cables don't look clean, we'll cut all the ties and start over again.

In addition to preparing servers for new orders, SBTs also handle hardware-related requests. This can involve anything from changing out components for a build, performing upgrades / maintenance on active servers, or even troubleshooting servers. Any one of these requests has to be treated with significant urgency and detail.

The responsibilities do not end there. Server Build Technicians also perform a walk of the facility twice per shift. During this walk, technicians check for visual alerts on the servers and do a general facility check of all SoftLayer pods. Note: Each data center facility features one or more pods or "server rooms," each built to the same specifications to support up to 5,000 servers.

The DAL05 facility has a total of four pods, and at the end of the build-out, we should be running 18,000-20,000 servers in this facility alone. Over the past year, we completed the build out of SR02 and SR03 (pod 2 and 3, respectively), and we're finishing the final pod (SR04) right now. We've spent countless hours building servers and monitoring operating system provisions when new orders roll in, and as our server count increases, our team has grown to continue providing the support our existing customers expect and deserve when it comes to upgrade requests and hardware-related support tickets.

To be successful, we have to stay ahead of the game from an operations perspective. The DAL05 crew is working hard to build out this facility's last pod (SR04), but for the sake of this blog post, I pulled everyone together for a quick photo op to introduce you to the team.

DAL05 Day / Evening Team and SBT Interns (with the remaining racks to build out in DAL05):
DAL05 DC Ops

DAL05 Overnight Server Build Technician Team:
DAL05 DC Ops

Let us know if there's ever anything we can do to help you!

-Joshua

July 13, 2012

When Opportunity Knocks

I've been working in the web hosting industry for nearly five years now, and as is the case with many of the professionals of my generation, I grew up side by side with the capital-I Internet. Over those five years, the World Wide Web has evolved significantly, and it's become a need. People need the Internet to communicate, store information, enable societal connectivity and entertain. And they need it 24 hours per day, seven days a week. To affirm that observation, you just need to look at an excerpt from a motion submitted to the Human Rights Council and recently passed by the United Nations General Assembly:

The General Session ... calls upon all States to promote and facilitate access to the Internet and international cooperation aimed at the development of media and information and communications facilities in all countries.

After a platform like the Internet revolutionizes the way we see the world, it's culturally impossible to move backward. Its success actually inspires us to look forward for the next world-changing innovation. Even the most non-technical citizen of the Internet has come to expect those kinds of innovations as the Internet and its underlying architecture have matured and seem to be growing like Moore's Law: Getting faster, better, and bigger all the time. The fact that SoftLayer is able to keep up with that growth (and even continue innovating in the process) is one of the things I admire most about the company.

I love that our very business model relies on our ability to enable our customers' success. Just look at how unbelievably successful companies like Tumblr and HostGator have become, and you start to grasp how big of a deal it is that we can help their businesses. We're talking billions of pageviews per month and hundreds of thousands of businesses that rely on SoftLayer through our customers. And that's just through two customers. Because we're on the cutting edge, and we provide unparalleled access and functionality, we get to see a lot of the up-and-coming kickstarts that are soon to hit it big, and we get to help them keep up with their own success.

On a personal level, I love that SoftLayer provides opportunities for employees. Almost every department has a career track you can follow as you learn more about the business and get a little more experience, and you're even able to transition into another department if you're drawn to a new passion. I recently moved to the misty Northwest (Seattle) when SoftLayer gave me the opportunity, and after working in the data center, I decided to pursue a role as a systems administrator. It took a lot of hard work, but I made the move. Hard work is recognized, and every opportunity I've taken advantage of has paid off. You probably think I'm biased because I've done well in the organization, and that might be a fair observation, but in reality, the opportunities don't just end with me.

One of my favorite stories to share about SoftLayer is the career path of my best friend, Goran. I knew he was a hard worker, so I referred him to the company a few years ago, and he immediately excelled as an Operations Tech. He proved himself on the Go-Live Crew in Amsterdam by playing a big role in the construction of AMS01, and he was promoted to a management position in that facility. He had been missing Europe for the better part of a decade, and SoftLayer gave him a way to go back home while doing what he loves (and what he's good at).

If Goran's story isn't enough for you, I could tell you about Robert. He started at SoftLayer as a data center tech, and he worked hard to become a systems administrator; then he was named a site manager, then he was promoted to senior operations manager, and now he's the Director of Operations. You'll recognize him as the guy with all of the shirts in Lance's "Earn Your Bars" blog post from December. He took every rung on the ladder hand-over-hand because no challenge could overwhelm him. He sought out what needed to be done without being asked, and he was proactive about making SoftLayer even better.

I could tell you about dozens of others in the company that have the same kinds of success stories because they approached the opportunities SoftLayer provided them with a passion and positive attitude that can't be faked. If being successful in an organization makes you biased, we're all biased. We love this environment. We're presented with opportunities and surrounded by people encouraging us to take advantage of those opportunities, and as a result, we can challenge ourselves and reach our potential. No good idea is ignored, and no hard work goes unrecognized.

I'm struggling to suppress the countless "opportunity" stories I've seen in my tenure at SoftLayer, but I think the three stories above provide a great cross-section of what it looks like to work for SoftLayer. If you like being challenged (and being rewarded for your hard work), you might want to take this opportunity to see which SoftLayer Career could be waiting for you.

When opportunity knocks, let it in.

-Hilary

July 12, 2012

An Insider's Look at SoftLayer's Growth in Amsterdam

Last week, SoftLayer was featured on the NOS national news here in the Netherlands in a segment that allowed us to tell our story and share how we're settling into our new Amsterdam home. I've only been a SLayer for about nine months now, and as I watched the video, I started to reflect on how far we've come in such a surprisingly short time. Take a second to check it out (don't worry, it's not all in Dutch):

To say that I had to "hit the ground running" when I started at SoftLayer would be an understatement. The day after I got the job, I was on a plane to SoftLayer's Dallas headquarters to meet the team behind the company. To be honest, it was a pretty daunting task, but I was energized at the opportunity to learn how SoftLayer became the largest privately owned hosting company in the world from the people who started it. When I look back at the interview Kevin recorded with me, I'm surprised that I didn't look like a deer in the headlights. At the time, AMS01 was still in the build-out phase, so my tours and meetings in DAL05 were both informative and awe-inspiring.

When I returned to Europe, I was energized to start playing my role in the company's new pursuit of its global goals.

It didn't take long before I started seeing the same awe-inspiring environment take shape in our Amsterdam facility ... So much so that I'm convinced that at least a few of the "Go Live Crew" members were superhuman. As it turns out, when you build identical data center pods in every location around the world, you optimize the process and figure out the best ways to efficiently use your time.

By the time the Go Live Crew started packing following the successful (and on-time) launch of AMS01, I started feeling the pressure. The first rows of server racks were already being filled by customers, but the massive data center space seemed impossibly large when I started thinking about how quickly we could fill it. Most of my contacts in Europe were not familiar with the SoftLayer name, and because my assigned region was Europe, the Middle East and Africa — a HUGE, diverse region with many languages, cultures and currencies — I knew I had my work cut out for me.

I thought, "LET'S DO THIS!"

EMEA is home to some of the biggest hosting markets in the world, so my first-week whirlwind tour of Dallas actually set the stage quite nicely for what I'd be doing in the following months: Racking up air miles, jumping onto trains, attending countless trade shows, meeting with press, reaching out to developer communities and corresponding with my fellow SLayers in the US and Asia ... All while managing the day-to-day operations of the Amsterdam office. As I look back at that list, I'm amazed how the team came together to make sure everything got done.

We have come a long way.

As I started writing this blog, BusinessReview Europe published a fantastic piece on SoftLayer in their July 2012 magazine (starting on page 172) that seems to succinctly summarize how we've gotten where we are today: "Innovation Never Sleeps."

BusinessReview Europe

Our first pod is almost full of servers humming and flashing. When we go to tradeshows and conferences throughout Europe, people not only know SoftLayer, many of them are customers with servers in AMS01. That's the kind of change we love.

The best part of my job right now is that our phenomenal success in the past nine months is just a glimmer of what the future holds. Come to think of it, we're going to need some more people.

-@jpwisler

May 30, 2012

What Does Automation Look Like?

Innovation. Automation. Innovation. Automation. Innovation. Automation. That's been our heartbeat since SoftLayer was born on May 5, 2005. The "Innovation" piece is usually the most visible component of that heartbeat while "Automation" usually hangs out behind the scenes (enabling the "Innovation"). When we launch a new product line like Object Storage, add new functionality to the SoftLayer API, announce a partnership with a service provider like RightScale, or simply receive and rack the latest and greatest server hardware from our vendors, our automated platform allows us to do it quickly and seamlessly. Because our platform is built to do exactly what it's supposed to without any manual intervention, it's easily overlooked.

But what if we wanted to show what automation actually looks like?

It seems like a silly question to ask. If our automated platform is powered by software built by the SoftLayer development team, there's no easy way to show what that automation looks like ... At least not directly. While the bits and bytes aren't easily visible, the operational results of automation are exceptionally photogenic. Let's take a look at a few examples of what automation enables to get an indirect view of what it actually looks like.

Example: A New Server Order

A customer orders a dedicated server. That customer wants a specific hardware configuration with a specific suite of software in a specific data center, and it needs to be delivered within four hours. What does that usually look like from an operations perspective?

SoftLayer Server Rack

If you want to watch those blinking lights for two or three hours, you'll have effectively watched a new server get provisioned at SoftLayer. When an order comes in, the automated provisioning system finds a server matching the order's hardware requirements in the requested data center facility, and the software is installed before the server is handed over to the customer.
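
Conceptually, that matching step boils down to something like the sketch below. To be clear, this is not SoftLayer's provisioning code (the real system handles far more than this); it's a hypothetical illustration of the "find an available server that satisfies the order in the requested facility" step, with made-up field names:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Server:
    # Hypothetical inventory record; field names are illustrative only.
    server_id: str
    datacenter: str
    cpu_cores: int
    ram_gb: int
    available: bool = True

def match_order(inventory: List[Server], datacenter: str,
                min_cores: int, min_ram_gb: int) -> Optional[Server]:
    """Pick the first available server in the requested facility that meets
    or exceeds the ordered hardware configuration."""
    for server in inventory:
        if (server.available
                and server.datacenter == datacenter
                and server.cpu_cores >= min_cores
                and server.ram_gb >= min_ram_gb):
            server.available = False  # reserve it for this order
            return server
    return None  # nothing free: a signal to rack more inventory

# Usage: candidate = match_order(inventory, "DAL05", min_cores=16, min_ram_gb=64)
```

Everything downstream of that match (OS install, network configuration, handoff) is what the automated platform takes care of while you watch the blinking lights.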

Example: Server Reboot or Operating System Reload

A customer needs to reboot a server or install a new operating system. Whether they want a soft reboot, a hard reboot with a full power cycle or a blank operating system install, the scene in the data center will look eerily familiar:

SoftLayer Server Rack

Gone are the days of server build technicians wheeling a terminal over to every server that needs work done. From thousands of miles away, a customer can remotely "unplug" his or her server via the rack's power strip, initiate a soft reboot or reinstall an operating system. But what if they want even more accessibility?

Example: What's on the Screen?

When remotely rebooting or power cycling a server isn't enough, a customer might want someone in the data center to wheel a monitor cart over to their server in the rack to look at any of the messages that can only be read with a monitor attached. This would generally happen behind the server, but for the sake of this example, we'll just watch the data center technician pass in front of the servers to get to the back:

SoftLayer Server Rack

Yeah, you probably could have seen that one coming.

Because KVM over IP is included on every server, physical carts carrying "keyboard, video and mouse" are few and far between. By automating customers' access to their server and providing as much virtual access as we possibly can, we're able to "get out of the way" of our technical users and only step in to help when that help is needed.
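
Those remote reboots and power cycles are exposed through the SoftLayer API as well, so they can be scripted rather than clicked. A minimal sketch, assuming the softlayer-python client library is installed, your API credentials are set in the environment and 12345 stands in for a real hardware ID; the method names are my reading of the SoftLayer_Hardware_Server service:

```python
import SoftLayer  # pip install softlayer

# Assumes SL_USERNAME / SL_API_KEY are set in the environment.
client = SoftLayer.create_client_from_env()

server_id = 12345  # placeholder hardware ID

# Soft reboot: ask the operating system to restart gracefully.
client.call('Hardware_Server', 'rebootSoft', id=server_id)

# Or pull the (virtual) power plug and plug it back in.
# client.call('Hardware_Server', 'powerCycle', id=server_id)
```

The same pattern applies to the other remote operations described above.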

I could go on and on with examples of cloud computing upgrades and downgrades, provisioning a firewall or adding a load balancer, but I'll practice a little restraint. If you want the full effect, you can scroll up and watch the blinking lights a little while longer.

Automation looks like what you don't see. No humanoid robots or needlessly complex machines (that I know of) ... Just a data center humming along with some beautiful flashing server lights.

-Duke

P.S. If you want to be able to remotely bask in the glow of some blinking server lights, bookmark the larger-sized SoftLayer Rack animated gif ... You could even title the bookmark, "Check on the Servers."

April 9, 2012

Scaling SoftLayer

SoftLayer is in the business of helping businesses scale. You need 1,000 cloud computing instances? We'll make sure our system can get them online in 10 minutes. You need to spin up some beefy dedicated servers loaded with dual 8-core Intel Xeon E5-2670 processors and high-capacity SSDs for a new application's I/O-intensive database? We'll get it online anywhere in the world in under four hours. Everywhere you look, you'll see examples of how we help our customers scale, but what you don't hear much about is how our operations team scales our infrastructure to ensure we can accommodate all of our customers' growth.

When we launch a new data center, there's usually a lot of fanfare. When AMS01 and SNG01 came online, we talked about the thousands of servers that are online and ready. We meet huge demand for servers on a daily basis, and that presents us with a challenge: What happens when the inventory of available servers starts dwindling?

Truck Day.

Truck Day isn't limited to a single day of the year (or even a single day in a given month) ... It's what we call any date our operations team sets for the delivery and installation of new hardware. We communicate the next Truck Day in each location to all of our teams so SLayers from every department can join the operations team in unboxing and preparing servers/racks for installation. The operations team gets more hands to speed up the unloading process, and every employee has an opportunity to get first-hand experience in how our data centers operate.

If you want a refresher course about what happens on a Truck Day, you can reference Sam Fleitman's "Truck Day Operations" blog, and if you want a peek into what it looks like, you can watch Truck Day at SR02.DAL05. I don't mean to make this post all about Truck Day, but Truck Day is instrumental in demonstrating the way SoftLayer scales our own infrastructure.

Let's say we install 1,000 servers to officially launch a new pod. Because each pod has slots for 5,000 servers, we have space/capacity for 3,000-4,000 more servers in the server room, so as soon as more server hardware becomes available, we'll order it and start preparing for our next Truck Day to supplement the pod's inventory. You'd be surprised how quickly 1,000 servers can be ordered, and because it's not very easy to overnight a pallet of servers, we have to take into account lead time and shipping speeds ... To accommodate our customers' growth, we have to stay one step ahead in our own growth.

This morning in a meeting, I saw a pretty phenomenal bullet that got me thinking about this topic:

Truck Day — 4/3 (All Sites): 2,673 Servers

In nine different data center facilities around the world, more than 2,500 servers were delivered, unboxed, racked and brought online. Last week. In one day.

Now I know the operations team wasn't looking for any kind of recognition ... They were just reporting that everything went as planned. Given the fact that an accomplishment like that is "just another day at SoftLayer" for those guys, they definitely deserve recognition for the amazing work they do. We host some of the most popular platforms, games and applications on the Internet, and the DC-Ops team plays a huge role in scaling SoftLayer so our customers can scale themselves.

-@gkdog

December 13, 2011

Do Your Homework!

As far back as I can remember, I hated homework. Homework was cutting into MY time as a kid, then teenager, then young adult ... and since I am still a "young adult," that's where I have to stop my list. One of the unfortunate realizations that I've come to in my "young adult" life is that homework can be a good thing. I know that sounds crazy, so I've come prepared with a couple of examples:

The Growing Small Business Example
You run a small Internet business, and you've been slowly growing over the years until suddenly you get your product/service mix just right and a wave of customers are beating down the door ... or in your case, they're beating down your website. The excitement of the surge in business is quickly replaced by panic, and you find yourself searching for cheap web servers that can be provisioned quickly. You find one that looks legit and you buy a dozen new dedicated servers and some cloud storage.

You alert your customers of the maintenance window and spend the weekend migrating your now-valuable site to the new infrastructure. On Monday, you get the new site tuned and ready, and you hit the "go" button. Your customers are back, flocking to the site again, and all is golden. As the site gains more traffic over the next couple of weeks, you start to see some network lag and some interesting issues with hardware. You see a thread or two in the social media world about your shiny new site becoming slow and cumbersome, and you look at the network graphs, where you notice there are some capacity issues with your provider.

Frustrated, you do a little "homework," and you find out that the cheap service provider you chose has a sketchy history and many complaints about the quality of their network. As a result, you go on a new search for a hosting provider with good reviews, and you have to hang another maintenance sign while you do all the hard work behind the scenes once again. Not doing your homework before making the switch in this case probably cost you a good amount of sleep, some valuable business, and the quality of service you wanted to provide your customers.

The Compliance-Focused Example
I still live, eat, and breathe compliance for SoftLayer, and we had an eye-opening experience when sorting through the many compliance differences. As you probably recall (Skinson 1634AR15), I feel like everyone should agree to an all-inclusive compliance model and stick to just that one, but that feeling hasn't caught on anywhere outside of our office.

In 2011, SoftLayer ramped up some of our compliance efforts and started planning for 2012. With all the differences in how compliance processes for things like FISMA, HIPAA, PCI Levels 1-4, SSAE 16, SOC 1 and SOC 2 are measured, it was tough to work on one without affecting another. We were working with a few different vendors, and if we flipped "Switch A," Auditor #1 was happy. When we told Auditor #2 that we flipped "Switch A," they hated it so much they almost started crying. It started to become the good ol' "our way is not just the better way, it's the only way" scenario.

So what did we do? Homework! We spent the last six months looking at all the compliances and mapping them against each other. Surprisingly enough, we started noticing a lot of similarities. From there, we started interviewing auditing and compliance firms and finally found one that was ahead of us in the similarity game and already had a matrix of similarities and best practices that affect most (if not all) of the compliances we wanted to focus on.

Not only did a little homework save us a ton of cash in the long run, it saved the small trees and bushes under the offices of our compliance department from the bodies that would inevitably crash down on them when we all scampered away from the chaos and confusion seemingly inherent in pursuing multiple different compliance standards at the same time.

The moral of the story: Kiddos, do your homework. It really is good for something, we promise.

-@Skinman454
