Infrastructure Posts

June 9, 2014

Visualizing a SoftLayer Billing Order

In my time spent as a data and object modeler, I’ve dealt with both good and bad examples of model visualization. As an IBMer through the Rational acquisition, I have been using modeling tools for a long time. I can appreciate a nice diagram shining a ray of light on an object structure, and abhor a behemoth spaghetti diagram.

When I started studying SoftLayer’s API documentation, I saw both the relational and hierarchical nature of SoftLayer’s concept model. The naming convention of API services and data types embodies their hierarchical structure. While reading about “relational properties” in data types, I thought it would be helpful to see diagrams showing relationships between services and data types versus clicking through reference pages. After all, diagramming data models is a valuable complement to verbal descriptions.

One way people can deal with complex data models is to digest them a little at a time. I can’t imagine a complete data model diagram of SoftLayer’s cloud offering, but I can try to visualize small portions of it. In this spirit, after reviewing article and blog entries on creating product orders using SoftLayer’s API, I drew an E-R diagram, using IBM Rational Software Architect, of basic order elements.

The diagram, Figure 1, should help people understand data entities involved in creating SoftLayer product orders and the relationships among the entities. In particular, IBM Business Partners implementing custom re-branded portals to support the ordering of SoftLayer resources will benefit from visualization of the data model. Picture this!

Figure 1. Diagram of the SoftLayer Billing Order

A user account can have many associated billing orders, which are composed of billing order items. Billing order items can contain multiple order containers that hold a product package. Each package can have several configurations including product item categories. They can be composed of product items with each item having several possible prices.

-Andrew

Andrew Hoppe, Ph.D., is a Worldwide Channel Solutions Architect for SoftLayer, an IBM Company.

March 7, 2014

Why the Cloud Scares Traditional IT

My background is "traditional IT." I've been architecting and promoting enterprise virtualization solutions since 2002, and over the past few years, public and hybrid cloud solutions have become a serious topic of discussion ... and in many cases, contention. The customers who gasped with excitement when VMware rolled out a new feature for their on-premises virtualized environments would dismiss any recommendations of taking a public cloud or a hybrid cloud approach. Off-premises cloud environments were surrounded by marketing hype, and the IT departments considering them had legitimate concerns, especially around security and compliance.

I completely understood their concerns, and until recently, I often agreed with them. The cloud model is intimidating. If you've had control over every aspect of your IT environment for a few decades, you don't want to give up access to your infrastructure, much less have to trust another company to protect your business-critical information. But now, I think about those concerns as the start of a conversation about cloud, rather than a "no-go" zone. The cloud is different, but a company's approach to it should still be the same.

What do I mean by that? Enterprise developers and engineers still have to serve as architects to determine the functional and operational requirements for their services. In that process, they need to determine the suitability of a given platform for the computing workload and the company's business objectives and core competencies. Unfortunately, many of the IT decision-makers don't consider the bigger business context, and they choose to build their own "public" IaaS offerings to accommodate internal workloads, and in many cases, their own external clients.

This approach might makes sense for service providers, integrators and telcos because infrastructure resources are core components of their businesses, but I've seen the same thing happen at financial institutions, rental companies, and even an airline. Over time, internal IT departments carved out infrastructure-services revenue streams that are totally unrelated to the company's core business. The success of enterprise virtualization often empowered IT departments through cost savings and automation — making the promise of delivering public cloud “in-house” a natural extension and seemingly attractive proposition. Reshaping their perspectives around information security and compliance in that way is often a functional approach, but is it money well spent?

Instead of spending hundreds of thousands or millions of dollars in capital to build out (often commoditized) infrastructure, these businesses could be investing those resources in developing and marketing their core business areas. To give you an example of how a traditional IT task is performed in the cloud, I can share my experience from when I first accessed my SoftLayer account: I deployed a physical ESX host alongside a virtual compute instance, fully pre-configured with OS and vCenter, and I connected it via VPN to my existing (on-prem) vCenter environment. In the old model, that process would have probably taken a couple of days to complete, and I got it done in 3 hours.

Now more than ever, it is the responsibility of the core business line to validate internal IT strategies and evaluate alternatives. Public cloud is not always the right answer for all workloads, but driven by the rapidly evolving maturity and proliferation of IaaS, PaaS and SaaS offerings, most organizations will see significant benefits from it. Ultimately, the best way to understand the potential value is just to give it a try.

-Andy

Andreas Groth is an IBM worldwide channel solutions architect, focusing primarily on SoftLayer. Follow him on Twitter: @andreasgroth

February 6, 2014

Building a Bridge to the OpenStack API

OpenStack is experiencing explosive growth in the cloud market. With more than 200 companies contributing code to the source and new installations coming online every day, OpenStack is pushing hard to become a global standard for cloud computing. Dozens of useful tools and software products have been developed using the OpenStack API, so a growing community of administrators, developers and IT organizations have access to easy-to-use, powerful cloud resources. This kind of OpenStack integration is great for users on a full OpenStack cloud, but it introduces a challenge to providers and users on other cloud platforms: Should we consider deploying or moving to an OpenStack environment to take advantage of these tools?

If a cloud provider spends years developing a unique platform with a proprietary API, implementing native support for the OpenStack API or deploying a full OpenStack solution may be cost prohibitive, even with significant customer and market demand. The provider can either bite the bullet to implement OpenStack compatibility, hope that a third party library like libclouds or fog is updated to support its API, or choose to go it alone and develop an ecosystem of products around its own API.

Introducing Jumpgate

When we were faced with this situation at SoftLayer, we chose a fourth option. We wanted to make the process of creating an OpenStack-compatible API simpler and more modular. That's where Jumpgate was born. Jumpgate is a middleware that acts as a compatibility layer between the OpenStack API and a provider's proprietary API. Externally, it exposes endpoints that adhere to OpenStack's published and accepted API specification, which it then translates into the provider's API using a series of drivers. Think of it as a mechanism to enable passing from one realm/space into another — like the jumpgates featured in science fiction works.

Connection

How Jumpgate Works
Let's take a look at a high-level example: When you want to create a new virtual instance on OpenStack, you might use the Horizon dashboard or the Nova command line client. When you issue the request, the tool first makes a REST call to a Keystone endpoint for authentication, which returns an authorization token. The client then makes another REST call to a Nova endpoint, which manages the computing instances, to create the actual virtual instance. Nova may then make calls to other tools within the cluster for networking (Quantum), image information (Glance), block storage (Cinder), or more. In addition, your client may also send requests directly to some of these endpoints to query for status updates, information about available resources, and so on.

With Jumpgate, your tool first hits the Jumpgate middleware, which exposes a Keystone endpoint. Jumpgate takes the request, breaks it apart into its relevant pieces, then loads up your provider's appropriate API driver. Next, Jumpgate reformats your request into a form that the driver supports and sends it to the provider's API endpoint. Once the response comes back, Jumpgate again uses the driver to break apart the proprietary API response, reformats it into an OpenStack compatible JSON payload, and sends it back to your client. The result is that you interact with an OpenStack-compatible API, and your cloud provider processes those interactions on their own backend infrastructure.

Internally, Jumpgate is a lightweight middleware built in Python using the Falcon Framework. It provides endpoints for nearly every documented OpenStack API call and allows drivers to attach handlers to these endpoints. This modular approach allows providers to implement only the endpoints that are of the highest importance, rolling out OpenStack API compatibility in stages rather than in one monumental effort. Since it sits alongside the provider's existing API, Jumpgate provides a new API interface without risking the stability already provided by the existing API. It's a value-add service that increases customer satisfaction without a huge increase in cost. Once full implementations is finished, a provider with a proprietary cloud platform can benefit from and offer all the tools that are developed to work with the OpenStack API.

Jumpgate allows providers to test the proper OpenStack compatibility of their drivers by leveraging the OpenStack Tempest test suite. With these tests, developers run the full suite of calls used by OpenStack itself, highlighting edge cases or gaps in functionality. We've even included a helper script that allows Tempest to only run a subset of tests rather than the entire suite to assist with a staged rollout.

Current Development
Jumpgate is currently in an early alpha stage. We've built the compatibility framework itself and started on the SoftLayer drivers as a reference. So far, we've implemented key endpoints within Nova (computing instances), Keystone (identification and authorization), and Glance (image management) to get most of the basic functionality within Horizon (the web dashboard) working. We've heard that several groups outside SoftLayer are successfully using Jumpgate to drive products like Trove and Heat directly on SoftLayer, which is exciting and shows that we're well beyond the "proof of concept" stage. That being said, there's still a lot of work to be done.

We chose to develop Jumpgate in the open with a tool set that would be familiar to developers working with OpenStack. We're excited to debut this project for the broader OpenStack community, and we're accepting pull requests if you're interested in contributing. Making more clouds compatible with the OpenStack API is important and shouldn’t be an individual undertaking. If you're interested in learning more or contributing, head over to our in-flight project page on GitHub: SoftLayer Jumpgate. There, you'll find everything you need to get started along with the updates to our repository. We encourage everyone to contribute code or drivers ... or even just open issues with feature requests. The more community involvement we get, the better.

-Nathan

Categories: 
January 31, 2014

Simplified OpenStack Deployment on SoftLayer

"What is SoftLayer doing with OpenStack?" I can't even begin to count the number of times I've been asked that question over the last few years. In response, I'll usually explain how we've built our object storage platform on top of OpenStack Swift, or I'll give a few examples of how our customers have used SoftLayer infrastructure to build and scale their own OpenStack environments. Our virtual and bare metal cloud servers provide a powerful and flexible foundation for any OpenStack deployment, and our unique three-tiered network integrates perfectly with OpenStack's Compute and Network node architecture, so it's high time we make it easier to build an OpenStack environment on SoftLayer infrastructure.

To streamline and simplify OpenStack deployment for the open source community, we've published Opscode Chef recipes for both OpenStack Grizzly and OpenStack Havana on GitHub: SoftLayer Chef-Openstack. With Chef and SoftLayer, your own OpenStack cloud is a cookbook away. These recipes were designed with the needs of growth and scalability in mind. Let's take a deeper look into what exactly that means.

OpenStack has adopted a three-node design whereby a controller, compute, and network node make up its architecture:

OpenStack Architecture on SoftLayer

Looking more closely at any one node reveal the services it provides. Scaling the infrastructure beyond a few dozen nodes, using this model, could create bottlenecks in services such as your block store, OpenStack Cinder, and image store, OpenStack Glance, since they are traditionally located on the controller node. Infrastructure requirements change from service to service as well. For example OpenStack Neutron, the networking service, does not need much disk I/O while the Cinder storage service might heavily rely on a node's hard disk. Our cookbook allows you to choose how and where to deploy the services, and it even lets you break apart the MySQL backend to further improve platform performance.

Quick Start: Local Demo Environment

To make it easy to get started, we've created a rapid prototype and sandbox script for use with Vagrant and Virtual Box. With Vagrant, you can easily spin up a demo environment of Chef Server and OpenStack in about 15 minutes on moderately good laptops or desktops. Check it out here. This demo environment is an all-in-one installation of our Chef OpenStack deployment. It also installs a basic Chef server as a sandbox to help you see how the SoftLayer recipes were deployed.

Creating a Custom OpenStack Deployment

The thee-node OpenStack model does well in small scale and meets the needs of many consumers; however, control and customizability are the tenants for the design of the SoftLayer OpenStack Chef cookbook. In our model, you have full control over the configuration and location of eleven different components in your deployed environment:

Our Chef recipes will take care of populating the configuration files with the necessary information so you won't have to. When deploying, you merely add the role for the matching service to a hardware or virtual server node, and Chef will deploy the service to it with all the configuration done automatically, including adding multiple Neutron, Nova, and Cinder nodes. This approach allows you to tailor the needs of each service to the hardware it will be deployed to--you might put your Neutron hardware node on a server with 10-gigabit network interfaces and configure your Cinder hardware node with RAID 1+0 15k SAS drives.

OpenStack is a fast growing project for the implementation of IaaS in public and private clouds, but its deployment and configuration can be overwhelming. We created this cookbook to make the process of deploying a full OpenStack environment on SoftLayer quick and straightforward. With the simple configuration of eleven Chef roles, your OpenStack cloud can be deployed onto as little as one node and scaled up to many as hundreds (or thousands).

To follow this project, visit SoftLayer on GitHub. Check out some of our other projects on GitHub, and let us know if you need any help or want to contribute.

-@marcalanjones

August 22, 2013

Network Cabling Controversy: Zip Ties v. Hook & Loop Ties

More than 210,000 users have watched a YouTube video of our data center operations team cabling a row of server racks in San Jose. More than 95 percent of the ratings left on the video are positive, and more than 160 comments have been posted in response. To some, those numbers probably seem unbelievable, but to anyone who has ever cabled a data center rack or dealt with a poorly cabled data center rack, the time-lapse video is enthralling, and it seems to have catalyzed a healthy debate: At least a dozen comments on the video question/criticize how we organize and secure the cables on each of our server racks. It's high time we addressed this "zip ties v. hook & loop (Velcro®)" cable bundling controversy.

The most widely recognized standards for network cabling have been published by the Telecommunications Industry Association and Electronics Industries Alliance (TIA/EIA). Unfortunately, those standards don't specify the physical method to secure cables, but it's generally understood that if you tie cables too tight, the cable's geometry will be affected, possibly deforming the copper, modifying the twisted pairs or otherwise physically causing performance degradation. This understanding begs the question of whether zip ties are inherently inferior to hook & loop ties for network cabling applications.

As you might have observed in the "Cabling a Data Center Rack" video, SoftLayer uses nylon zip ties when we bundle and secure the network cables on our data center server racks. The decision to use zip ties rather than hook & loop ties was made during SoftLayer's infancy. Our team had a vision for an automated data center that wouldn't require much server/cable movement after a rack is installed, and zip ties were much stronger and more "permanent" than hook & loop ties. Zip ties allow us to tighten our cable bundles easily so those bundles are more structurally solid (and prettier). In short, zip ties were better for SoftLayer data centers than hook & loop ties.

That conclusion is contrary to the prevailing opinion in the world of networking that zip ties are evil and that hook & loop ties are among only a few acceptable materials for "good" network cabling. We hear audible gasps from some network engineers when they see those little strips of nylon bundling our Ethernet cables. We know exactly what they're thinking: Zip ties negatively impact network performance because they're easily over-tightened, and cables in zip-tied bundles are more difficult to replace. After they pick their jaws up off the floor, we debunk those myths.

The first myth (that zip ties can negatively impact network performance) is entirely valid, but its significance is much greater in theory than it is in practice. While I couldn't track down any scientific experiments that demonstrate the maximum tension a cable tie can exert on a bundle of cables before the traffic through those cables is affected, I have a good amount of empirical evidence to fall back on from SoftLayer data centers. Since 2006, SoftLayer has installed more than 400,000 patch cables in data centers around the world (using zip ties), and we've *never* encountered a fault in a network cable that was the result of a zip tie being over-tightened ... And we're not shy about tightening those ties.

The fact that nylon zip ties are cheaper than most (all?) of the other more "acceptable" options is a fringe benefit. By securing our cable bundles tightly, we keep our server racks clean and uniform:

SoftLayer Cabling

The second myth (that cables in zip-tied bundles are more difficult to replace) is also somewhat flawed when it comes to SoftLayer's use case. Every rack is pre-wired to deliver five Ethernet cables — two public, two private and one out-of-band management — to each "rack U," which provides enough connections to support a full rack of 1U servers. If larger servers are installed in a rack, we won't need all of the network cables wired to the rack, but if those servers are ever replaced with smaller servers, we don't have to re-run network cabling. Network cables aren't exposed to the tension, pressure or environmental changes of being moved around (even when servers are moved), so external forces don't cause much wear. The most common physical "failures" of network cables are typically associated with RJ45 jack crimp issues, and those RJ45 ends are easily replaced.

Let's say a cable does need to be replaced, though. Servers in SoftLayer data centers have redundant public and private network connections, but in this theoretical example, we'll assume network traffic can only travel over one network connection and a data center technician has to physically replace the cable connecting the server to the network switch. With all of those zip ties around those cable bundles, how long do you think it would take to bring that connection back online? (Hint: That's kind of a trick question.) See for yourself:

The answer in practice is "less than one minute" ... The "trick" in that trick question is that the zip ties around the cable bundles are irrelevant when it comes to physically replacing a network connection. Data center technicians use temporary cables to make a direct server-to-switch connection, and they schedule an appropriate time to perform a permanent replacement (which actually involves removing and replacing zip ties). In the video above, we show a temporary cable being installed in about 45 seconds, and we also demonstrate the process of creating, installing and bundling a permanent network cable replacement. Even with all of those villainous zip ties, everything is done in less than 18 minutes.

Many of the comments on YouTube bemoan the idea of having to replace a single cable in one of these zip-tied bundles, but as you can see, the process isn't very laborious, and it doesn't vary significantly from the amount of time it would take to perform the same maintenance with a Velcro®-secured cable bundle.

Zip ties are inferior to hook & loop ties for network cabling? Myth(s): Busted.

-@khazard

P.S. Shout-out to Elijah Fleites at DAL05 for expertly replacing the network cable on an internal server for the purposes of this video!

July 24, 2013

Deconstructing SoftLayer's Three-Tiered Network

When Sun Microsystems VP John Gage coined the phrase, "The network is the computer," the idea was more wishful thinking than it was profound. At the time, personal computers were just starting to show up in homes around the country, and most users were getting used to the notion that "The computer is the computer." In the '80s, the only people talking about networks were the ones selling network-related gear, and the idea of "the network" was a little nebulous and vaguely understood. Fast-forward a few decades, and Gage's assertion has proven to be prophetic ... and it happens to explain one of SoftLayer's biggest differentiators.

SoftLayer's hosting platform features an innovative, three-tier network architecture: Every server in a SoftLayer data center is physically connected to public, private and out-of-band management networks. This "network within a network" topology provides customers the ability to build out and manage their own global infrastructure without overly complex configurations or significant costs, but the benefits of this setup are often overlooked. To best understand why this network architecture is such a game-changer, let's examine each of the network layers individually.

SoftLayer Private Network

Public Network

When someone visits your website, they are accessing content from your server over the public network. This network connection is standard issue from every hosting provider since your content needs to be accessed by your users. When SoftLayer was founded in 2005, we were the first hosting provider to provide multiple network connections by default. At the time, some of our competitors offered one-off private network connections between servers in a rack or a single data center phase, but those competitors built their legacy infrastructures with an all-purpose public network connection. SoftLayer offers public network connection speeds up to 10Gbps, and every bare metal server you order from us includes free inbound bandwidth and 5TB of outbound bandwidth on the public network.

Private Network

When you want to move data from one server to another in any of SoftLayer's data centers, you can do so quickly and easily over the private network. Bandwidth between servers on the private network is unmetered and free, so you don't incur any costs when you transfer files from one server to another. Having a dedicated private network allows you to move content between servers and facilities without fighting against or getting in the way of the users accessing your server over the public network.

It should come as no surprise to learn that all private network traffic stays on SoftLayer's network exclusively when it travels between our facilities. The blue lines in this image show how the private network connects all of our data centers and points of presence:

SoftLayer Private Network

To fully replicate the functionality provided by the SoftLayer private network, competitors with legacy single-network architecture would have to essentially double their networking gear installation and establish safeguards to guarantee that customers can only access information from their own servers via the private network. Because that process is pretty daunting (and expensive), many of our competitors have opted for "virtual" segmentation that logically links servers to each other. The traffic between servers in those "virtual" private networks still travels over the public network, so they usually charge you for "private network" bandwidth at the public bandwidth rate.

Out-of-Band Management Network

When it comes to managing your server, you want an unencumbered network connection that will give you direct, secure access when you need it. Splitting out the public and private networks into distinct physical layers provides significant flexibility when it comes to delivering content where it needs to go, but we saw a need for one more unique network layer. If your server is targeted for a denial of service attack or a particular ISP fails to route traffic to your server correctly, you're effectively locked out of your server if you don't have another way to access it. Our management-specific network layer uses bandwidth providers that aren't included in our public/private bandwidth mix, so you're taking a different route to your server, and you're accessing the server through a dedicated port.

If you've seen pictures or video from a SoftLayer data center (or if you've competed in the Server Challenge), you probably noticed the three different colors of Ethernet cables connected at the back of every server rack, and each of those colors carries one of these types of network traffic exclusively. The pink/red cables carry public network traffic, the blue cables carry private network traffic, and the green cables carry out-of-band management network traffic. All thirteen of our data centers have the same colored cables in the same configuration doing the same jobs, so we're able to train our operations staff consistently between all thirteen of our data centers. That consistency enables us to provide quicker service when you need it, and it lessens the chance of human error on the data center floor.

The most powerful server on the market can be sidelined by a poorly designed, inefficient network. If "the network is the computer," the network should be a primary concern when you select your next hosting provider.

-@khazard

July 16, 2013

Riak Performance Analysis: Bare Metal v. Virtual

In December, I posted a MongoDB performance analysis that showed the quantitative benefits of using bare metal servers for MongoDB workloads. It should come as no surprise that in the wake of SoftLayer's Riak launch, we've got some similar data to share about running Riak on bare metal.

To run this test, we started by creating five-node clusters with Riak 1.3.1 on SoftLayer bare metal servers and on a popular competitor's public cloud instances. For the SoftLayer environment, we created these clusters using the Riak Solution Designer, so the nodes were all provisioned, configured and clustered for us automatically when we ordered them. For the public cloud virtual instance Riak cluster, each node was provisioned indvidually using a Riak image template and manually configured into a cluster after all had come online. To optimize for Riak performance, I made a few tweaks at the OS level of our servers (running CentOS 64-bit):

Noatime
Nodiratime
barrier=0
data=writeback
ulimit -n 65536

The common Noatime and Nodiratime settings eliminate the need for writes during reads to help performance and disk wear. The barrier and writeback settings are a little less common and may not be what you'd normally set. Although those settings present a very slight risk for loss of data on disk failure, remember that the Riak solution is deployed in five-node rings with data redundantly available across multiple nodes in the ring. With that in mind and considering each node also being deployed with a RAID10 storage array, you can see that the minor risk for data loss on the failure of a single disk in the entire solution would have no impact on the entire data set (as there are plenty of redundant copies for that data available). Given the minor risk involved, the performance increases of those two settings justify their use.

With all of the nodes tweaked and configured into clusters, we set up Basho's test harness — Basho Bench — to remotely simulate load on the deployments. Basho Bench allows you to create a configurable test plan for a Riak cluster by configuring a number of workers to utilize a driver type to generate load. It comes packaged as an Erlang application with a config file example that you can alter to create the specifics for the concurrency, data set size, and duration of your tests. The results can be viewed as CSV data, and there is an optional graphics package that allows you to generate the graphs that I am posting in this blog. A simplified graphic of our test environment would look like this:

Riak Test Environment

The following Basho Bench config is what we used for our testing:

{mode, max}.
{duration, 120}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator,{int_to_bin,{uniform_int,1000000}}}.
{value_generator,{exponential_bin,4098,50000}}.
{riakc_pb_ips, [{10,60,68,9},{10,40,117,89},{10,80,64,4},{10,80,64,8},{10,60,68,7}]}.
{riakc_pb_replies, 2}.
{operations, [{get, 10},{put, 1}]}.

To spell it out a little simpler:

Tests Performed

Data Set: 400GB
10:1 Query-to-Update Operations
8 Concurrent Client Connections
Test Duration: 2 Hours

You may notice that in the test cases that use SoftLayer "Medium" Servers, the virtual provider nodes are running 26 virtual compute units against our dual proc hex-core servers (12 cores total). In testing with Riak, memory is important to the operations than CPU resources, so we provisioned the virtual instances to align with the 36GB of memory in each of the "Medium" SoftLayer servers. In the public cloud environment, the higher level of RAM was restricted to packages with higher CPU, so while the CPU counts differ, the RAM amounts are as close to even as we could make them.

One final "housekeeping" note before we dive into the results: The graphs below are pulled directly from the optional graphics package that displays Basho Bench results. You'll notice that the scale on the left-hand side of graphs differs dramatically between the two environments, so a cursory look at the results might not tell the whole story. Click any of the graphs below for a larger version. At the end of each test case, we'll share a few observations about the operations per second and latency results from each test. When we talk about latency in the "key observation" sections, we'll talk about the 99th percentile line — 99% of the results had latency below this line. More simply you could say, "This is the highest latency we saw on this platform in this test." The primary reason we're focusing on this line is because it's much easier to read on the graphs than the mean/median lines in the bottom graphs.

Riak Test 1: "Small" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Small Riak Server Node
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
4 x 500GB SATAII – RAID10
1Gb Bonded Network
Virtual Provider Node
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
4 x 500GB Network Storage – RAID10
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

The SoftLayer environment showed much more consistency in operations per second with an average throughput around 450 Op/sec. The virtual environment throughput varied significantly between about 50 operations per second to more than 600 operations per second with the trend line fluctuating slightly between about 220 Op/sec and 350 Op/sec.

Comparing the latency of get and put requests, the 99th percentile of results in the SoftLayer environment stayed around 50ms for gets and under 200ms for puts while the same metric for the virtual environment hovered around 800ms in gets and 4000ms in puts. The scale of the graphs is drastically different, so if you aren't looking closely, you don't see how significantly the performance varies between the two.

Riak Test 2: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 300GB 15K SAS – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

Similar to the results of Test 1, the throughput numbers from the bare metal environment are more consistent (and are consistently higher) than the throughput results from the virtual instance environment. The SoftLayer environment performed between 1500 and 1750 operations per second on average while the virtual provider environment averaged around 1200 operations per second throughout the test.

The latency of get and put requests in Test 2 also paints a similar picture to Test 1. The 99th percentile of results in the SoftLayer environment stayed below 50ms and under 400ms for puts while the same metric for the virtual environment averaged about 250ms in gets and over 1000ms in puts. Latency in a big data application can be a killer, so the results from the virtual provider might be setting off alarm bells in your head.

Riak Test 3: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 128GB SSD – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

In Test 3, we're using the same specs in our virtual provider nodes, so the results for the virtual node environment are the same in Test 3 as they are in Test 2. In this Test, the SoftLayer environment substitutes SSD hard drives for the 15K SAS drives used in Test 2, and the throughput numbers show the impact of that improved I/O. The average throughput of the bare metal environment with SSDs is between 1750 and 2000 operations per second. Those numbers are slightly higher than the SoftLayer environment in Test 2, further distancing the bare metal results from the virtual provider results.

The latency of gets for the SoftLayer environment is very difficult to see in this graph because the latency was so low throughout the test. The 99th percentile of puts in the SoftLayer environment settled between 500ms and 625ms, which was a little higher than the bare metal results from Test 2 but still well below the latency from the virtual environment.

Summary

The results show that — similar to the majority of data-centric applications that we have tested — Riak has more consistent, better performing, and lower latency results when deployed onto bare metal instead of a cluster of public cloud instances. The stark differences in consistency of the results and the latency are noteworthy for developers looking to host their big data applications. We compared the 99th percentile of latency, but the mean/median results are worth checking out as well. Look at the mean and median results from the SoftLayer SSD Node environment: For gets, the mean latency was 2.5ms and the median was somewhere around 1ms. For puts, the mean was between 7.5ms and 11ms and the median was around 5ms. Those kinds of results are almost unbelievable (and that's why I've shared everything involved in completing this test so that you can try it yourself and see that there's no funny business going on).

It's commonly understood that local single-tenant resources that bare metal will always perform better than network storage resources, but by putting some concrete numbers on paper, the difference in performance is pretty amazing. Virtualizing on multi-tenant solutions with network attached storage often introduces latency issues, and performance will vary significantly depending on host load. These results may seem obvious, but sometimes the promise of quick and easy deployments on public cloud environments can lure even the sanest and most rational developer. Some applications are suited for public cloud, but big data isn't one of them. But when you have data-centric apps that require extreme I/O traffic to your storage medium, nothing can beat local high performance resources.

-Harold

February 8, 2013

Data Center Power-Up: Installing a 2-Megawatt Generator

When I was a kid, my living room often served as a "job site" where I managed a fleet of construction vehicles. Scaled-down versions of cranes, dump trucks, bulldozers and tractor-trailers littered the floor, and I oversaw the construction (and subsequent destruction) of some pretty monumental projects. Fast-forward a few years (or decades), and not much has changed except that the "heavy machinery" has gotten a lot heavier, and I'm a lot less inclined to "destruct." As SoftLayer's vice president of facilities, part of my job is to coordinate the early logistics of our data center expansions, and as it turns out, that responsibility often involves overseeing some of the big rigs that my parents tripped over in my youth.

The video below documents the installation of a new Cummins two-megawatt diesel generator for a pod in our DAL05 data center. You see the crane prepare for the work by installing counter-balance weights, and work starts with the team placing a utility transformer on its pad outside our generator yard. A truck pulls up with the generator base in tow, and you watch the base get positioned and lowered into place. The base looks so large because it also serves as the generator's 4,000 gallon "belly" fuel tank. After the base is installed, the generator is trucked in, and it is delicately picked up, moved, lined up and lowered onto its base. The last step you see is the generator housing being installed over the generator to protect it from the elements. At this point, the actual "installation" is far from over — we need to hook everything up and test it — but those steps don't involve the nostalgia-inducing heavy machinery you probably came to this post to see:

When we talk about the "megawatt" capacity of a generator, we're talking about the bandwidth of power available for use when the generator is operating at full capacity. One megawatt is one million watts, so a two-megawatts generator could power 20,000 100-watt light bulbs at the same time. This power can be sustained for as long as the generator has fuel, and we have service level agreements to keep us at the front of the line to get more fuel when we need it. Here are a few other interesting use-cases that could be powered by a two-megawatt generator:

  • 1,000 Average Homes During Mild Weather
  • 400 Homes During Extreme Weather
  • 20 Fast Food Restaurants
  • 3 Large Retail Stores
  • 2.5 Grocery Stores
  • A SoftLayer Data Center Pod Full of Servers (Most Important Example!)

Every SoftLayer facility has an n+1 power architecture. If we need three generators to provide power for three data center pods in one location, we'll install four. This additional capacity allows us to balance the load on generators when they're in use, and we can take individual generators offline for maintenance without jeopardizing our ability to support the power load for all of the facility's data center pods.

Those of you who are in the fondly remember Tonka trucks and CAT crane toys are the true target audience for this post, but even if you weren't big into construction toys when you were growing up, you'll probably still appreciate the work we put into safeguarding our facilities from a power perspective. You don't often see the "outside the data center" work that goes into putting a new SoftLayer data center pod online, so I thought it'd give you a glimpse. Are there an topics from an operations or facilities perspectives that you also want to see?

-Robert

December 20, 2012

MongoDB Performance Analysis: Bare Metal v. Virtual

Developers can be cynical. When "the next great thing in technology" is announced, I usually wait to see how it performs before I get too excited about it ... Show me how that "next great thing" compares apples-to-apples with the competition, and you'll get my attention. With the launch of MongoDB at SoftLayer, I'd guess a lot of developers outside of SoftLayer and 10gen have the same "wait and see" attitude about the new platform, so I put our new MongoDB engineered servers to the test.

When I shared MongoDB architectural best practices, I referenced a few of the significant optimizations our team worked with 10gen to incorporate into our engineered servers (cheat sheet). To illustrate the impact of these changes in MongoDB performance, we ran 10gen's recommended benchmarking harness (freely available for download and testing of your own environment) on our three tiers of engineered servers alongside equivalent shared virtual environments commonly deployed by the MongoDB community. We've made a pretty big deal about the performance impact of running MongoDB on optimized bare metal infrastructure, so it's time to put our money where our mouth is.

The Testing Environment

For each of the available SoftLayer MongoDB engineered servers, data sets of 512kb documents were preloaded onto single MongoDB instances. The data sets were created with varying size compared to available memory to allow for data sets that were both larger (2X) and smaller than available memory. Each test also ensured that the data set was altered during the test run frequently enough to prevent the queries from caching all of the data into memory.

Once the data sets were created, JMeter server instances with 4 cores and 16GB of RAM were used to drive 'benchrun' from the 10gen benchmarking harness. This diagram illustrates how we set up the testing environment (click for a better look):

MongoDB Performance Analysis Setup

These Jmeter servers function as the clients generating traffic on the MongoDB instances. Each client generated random query and update requests with a ratio of six queries per update (The update requests in the test were to ensure that data was not allowed to fully cache into memory and never exercise reads from disk). These tests were designed to create an extreme load on the servers from an exponentially increasing number of clients until the system resources became saturated, and we recorded the resulting performance of the MongoDB application.

At the Medium (MD) and Large (LG) engineered server tiers, performance metrics were run separately for servers using 15K SAS hard drive data mounts and servers using SSD hard drive data mounts. If you missed the post comparing the IOPS statistics between different engineered server hard drive configurations, be sure to check it out. For a better view of the results in a given graph, click the image included in the results below to see a larger version.

Test Case 1: Small MongoDB Engineered Servers vs Shared Virtual Instance

Servers

Small (SM) MongoDB Engineered Server
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
2 x 500GB SATAII - RAID1
1Gb Network
Virtual Provider Instance
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
2 x 500GB Network Storage - RAID1
1Gb Network
 

Tests Performed

Small Data Set (8GB of .5mb documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 32
Test duration spanned 48 hours
Average Read Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Read Operations per Second
by Concurrent ClientMongoDB Performance Analysis
Average Write Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Write Operations per Second
by Concurrent ClientMongoDB Performance Analysis

Test Case 2: Medium MongoDB Engineered Servers vs Shared Virtual Instance

Servers (15K SAS Data Mount Comparison)

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
4 x 300GB 15K SAS - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
30GB RAM
2 x 64GB Network Storage - RAID1 (Journal Mount)
4 x 300GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (32GB of .5mb documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
Average Read Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Read Operations per Second
by Concurrent ClientMongoDB Performance Analysis
Average Write Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Write Operations per Second
by Concurrent ClientMongoDB Performance Analysis

Servers (SSD Data Mount Comparison)

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
4 x 400GB SSD - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
30GB RAM
2 x 64GB Network Storage - RAID1 (Journal Mount)
4 x 300GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (32GB of .5mb documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
Average Read Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Read Operations per Second
by Concurrent ClientMongoDB Performance Analysis
Average Write Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Write Operations per Second
by Concurrent ClientMongoDB Performance Analysis

Test Case 3: Large MongoDB Engineered Servers vs Shared Virtual Instance

Servers (15K SAS Data Mount Comparison)

Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
64-bit CentOS
128GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
6 x 600GB 15K SAS - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
64GB RAM (Maximum available on this provider)
2 x 64GB Network Storage - RAID1 (Journal Mount)
6 x 600GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (64GB of .5mb documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
Average Read Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Read Operations per Second
by Concurrent ClientMongoDB Performance Analysis
Average Write Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Write Operations per Second
by Concurrent ClientMongoDB Performance Analysis

Servers (SSD Data Mount Comparison)

Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
64-bit CentOS
128GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
6 x 400GB SSD - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
64GB RAM (Maximum available on this provider)
2 x 64GB Network Storage - RAID1 (Journal Mount)
6 x 600GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (64GB of .5mb documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned over 48 hours
Average Read Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Read Operations per Second
by Concurrent ClientMongoDB Performance Analysis
Average Write Operations per Second
by Concurrent Client
MongoDB Performance Analysis
Peak Write Operations per Second
by Concurrent ClientMongoDB Performance Analysis

Impressions from Performance Testing

The results speak for themselves. Running a Mongo DB big data solution on a shared virtual environment has significant drawbacks when compared to running MongoDB on a single-tenant bare metal offering. Disk I/O is by far the most limiting resource for MongoDB, and relying on shared network-attached storage (with much lower disk I/O) makes this limitation very apparent. Beyond the average and peak statistics above, performance varied much more significantly in the virtual instance environment, so it's not as consistent and predictable as a bare metal.

Highlights:

  • When a working data set is smaller than available memory, query performance increases.
  • The number of clients performing queries has an impact on query performance because more data is being actively cached at a rapid rate.
  • The addition of a separate Journal Mount volume significantly improves performance. Because the Small (SM) engineered server does not include a secondary mount for Journals, whenever MongoDB began to journal, the disk I/O associated with journalling was disruptive to the query and update operations performed on the Data Mount.
  • The best deployments in terms of operations per second, stability and control were the configurations with a RAID10 SSD Data Mount and a RAID1 SSD Journal Mount. These configurations are available in both our Medium and Large offerings, and I'd highly recommend them.

-Harold

December 17, 2012

Big Data at SoftLayer: The Importance of IOPS

The jet flow gates in the Hoover Dam can release up to 73,000 cubic feet — the equivalent of 546,040 gallons — of water per second at 120 miles per hour. Imagine replacing those jet flow gates with a single garden hose that pushes 25 gallons per minute (or 0.42 gallons per second). Things would get ugly pretty quickly. In the same way, a massive "big data" infrastructure can be crippled by insufficient IOPS.

IOPS — Input/Output Operations Per Second — measure computer storage in terms of the number of read and write operations it can perform in a second. IOPS are a primary concern for database environments where content is being written and queried constantly, and when we take those database environments to the extreme (big data), the importance of IOPS can't be overstated: If you aren't able perform database reads and writes quickly in a big data environment, it doesn't matter how many gigabytes, terabytes or petabytes you have in your database ... You won't be able to efficiently access, add to or modify your data set.

As we worked with 10gen to create, test and tweak SoftLayer's MongoDB engineered servers, our primary focus centered on performance. Since the performance of massively scalable databases is dictated by the read and write operations to that database's data set, we invested significant resources into maximizing the IOPS for each engineered server ... And that involved a lot more than just swapping hard drives out of servers until we found a configuration that worked best. Yes, "Disk I/O" — the amount of input/output operations a given disk can perform — plays a significant role in big data IOPS, but many other factors limit big data performance. How is performance impacted by network-attached storage? At what point will a given CPU become a bottleneck? How much RAM should included in a base configuration to accommodate the load we expect our users to put on each tier of server? Are there operating system changes that can optimize the performance of a platform like MongoDB?

The resulting engineered servers are a testament to the blood, sweat and tears that were shed in the name of creating a reliable, high-performance big data environment. And I can prove it.

Most shared virtual instances — the scalable infrastructure many users employ for big data — use network-attached storage for their platform's storage. When data has to be queried over a network connection (rather than from a local disk), you introduce latency and more "moving parts" that have to work together. Disk I/O might be amazing on the enterprise SAN where your data lives, but because that data is not stored on-server with your processor or memory resources, performance can sporadically go from "Amazing" to "I Hate My Life" depending on network traffic. When I've tested the IOPS for network-attached storage from a large competitor's virtual instances, I saw an average of around 400 IOPS per mount. It's difficult to say whether that's "not good enough" because every application will have different needs in terms of concurrent reads and writes, but it certainly could be better. We performed some internal testing of the IOPS for the hard drive configurations in our Medium and Large MongoDB engineered servers to give you an apples-to-apples comparison.

Before we get into the tests, here are the specs for the servers we're using:

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
CentOS 6 64-bit
36GB RAM
1Gb Network - Bonded
Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
CentOS 6 64-bit
128GB RAM
1Gb Network - Bonded
 

The numbers shown in the table below reflect the average number of IOPS we recorded with a 100% random read/write workload on each of these engineered servers. To measure these IOPS, we used a tool called fio with an 8k block size and iodepth at 128. Remembering that the virtual instance using network-attached storage was able to get 400 IOPS per mount, let's look at how our "base" configurations perform:

Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 2937
Random Write IOPS - /var/lib/mongo/logs 1306
Random Read IOPS - /var/lib/mongo/data 1720
Random Write IOPS - /var/lib/mongo/data 772
Random Read IOPS - /var/lib/mongo/data/journal 19659
Random Write IOPS - /var/lib/mongo/data/journal 8869
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 30269
Random Write IOPS - /var/lib/mongo/logs 13124
Random Read IOPS - /var/lib/mongo/data 33757
Random Write IOPS - /var/lib/mongo/data 14168
Random Read IOPS - /var/lib/mongo/data/journal 19644
Random Write IOPS - /var/lib/mongo/data/journal 8882
   
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 4820
Random Write IOPS - /var/lib/mongo/logs 2080
Random Read IOPS - /var/lib/mongo/data 2461
Random Write IOPS - /var/lib/mongo/data 1099
Random Read IOPS - /var/lib/mongo/data/journal 19639
Random Write IOPS - /var/lib/mongo/data/journal 8772
 
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32403
Random Write IOPS - /var/lib/mongo/logs 13928
Random Read IOPS - /var/lib/mongo/data 34536
Random Write IOPS - /var/lib/mongo/data 15412
Random Read IOPS - /var/lib/mongo/data/journal 19578
Random Write IOPS - /var/lib/mongo/data/journal 8835

Clearly, the 400 IOPS per mount results you'd see in SAN-based storage can't hold a candle to the performance of a physical disk, regardless of whether it's SAS or SSD. As you'd expect, the "Journal" reads and writes have roughly the same IOPS between all of the configurations because all four configurations use 2 x 64GB SSD drives in RAID1. In both configurations, SSD drives provide better Data mount read/write performance than the 15K SAS drives, and the results suggest that having more physical drives in a Data mount will provide higher average IOPS. To put that observation to the test, I maxed out the number of hard drives in both configurations (10 in the 2U MD server and 34 in the 4U LG server) and recorded the results:

Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 7175
Random Write IOPS - /var/lib/mongo/logs 3481
Random Read IOPS - /var/lib/mongo/data 6468
Random Write IOPS - /var/lib/mongo/data 1763
Random Read IOPS - /var/lib/mongo/data/journal 18383
Random Write IOPS - /var/lib/mongo/data/journal 8765
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32160
Random Write IOPS - /var/lib/mongo/logs 12181
Random Read IOPS - /var/lib/mongo/data 34642
Random Write IOPS - /var/lib/mongo/data 14545
Random Read IOPS - /var/lib/mongo/data/journal 19699
Random Write IOPS - /var/lib/mongo/data/journal 8764
   
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 17566
Random Write IOPS - /var/lib/mongo/logs 11918
Random Read IOPS - /var/lib/mongo/data 9978
Random Write IOPS - /var/lib/mongo/data 6526
Random Read IOPS - /var/lib/mongo/data/journal 18522
Random Write IOPS - /var/lib/mongo/data/journal 8722
 
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 34220
Random Write IOPS - /var/lib/mongo/logs 15388
Random Read IOPS - /var/lib/mongo/data 35998
Random Write IOPS - /var/lib/mongo/data 17120
Random Read IOPS - /var/lib/mongo/data/journal 17998
Random Write IOPS - /var/lib/mongo/data/journal 8822

It should come as no surprise that by adding more drives into the configuration, we get better IOPS, but you might be wondering why the results aren't "betterer" when it comes to the IOPS in the SSD drive configurations. While the IOPS numbers improve going from four to ten drives in the medium engineered server and six to thirty-four drives in the large engineered server, they don't increase as significantly as the IOPS differences in the SAS drives. This is what I meant when I explained that several factors contribute to and potentially limit IOPS performance. In this case, the limiting factor throttling the (ridiculously high) IOPS is the RAID card we are using in the servers. We've been working with our RAID card vendor to test a new card that will open a little more headroom for SSD IOPS, but that replacement card doesn't provide the consistency and reliability we need for these servers (which is just as important as speed).

There are probably a dozen other observations I could point out about how each result compares with the others (and why), but I'll stop here and open the floor for you. Do you notice anything interesting in the results? Does anything surprise you? What kind of IOPS performance have you seen from your server/cloud instance when running a tool like fio?

-Kelly

Subscribe to infrastructure