Posts Tagged 'Analysis'

July 16, 2013

Riak Performance Analysis: Bare Metal v. Virtual

In December, I posted a MongoDB performance analysis that showed the quantitative benefits of using bare metal servers for MongoDB workloads. It should come as no surprise that in the wake of SoftLayer's Riak launch, we've got some similar data to share about running Riak on bare metal.

To run this test, we started by creating five-node clusters with Riak 1.3.1 on SoftLayer bare metal servers and on a popular competitor's public cloud instances. For the SoftLayer environment, we created these clusters using the Riak Solution Designer, so the nodes were all provisioned, configured and clustered for us automatically when we ordered them. For the public cloud virtual instance Riak cluster, each node was provisioned individually using a Riak image template and manually configured into a cluster after all of the nodes had come online. To optimize for Riak performance, I made a few tweaks at the OS level of our servers (running 64-bit CentOS):

noatime
nodiratime
barrier=0
data=writeback
ulimit -n 65536

The common noatime and nodiratime mount options eliminate access-time writes during reads, which helps both performance and disk wear. The barrier and writeback settings are a little less common and may not be what you'd normally use. Although those settings present a very slight risk of data loss on disk failure, remember that the Riak solution is deployed in five-node rings with data redundantly available across multiple nodes in the ring. Since each node is also deployed with a RAID10 storage array, the failure of a single disk anywhere in the solution would have no impact on the data set as a whole: plenty of redundant copies of that data remain available. Given that minor risk, the performance increases from those two settings justify their use.
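
For reference, here is a sketch of how those tweaks are typically applied on a CentOS node. The device name, mount point, and riak user below are placeholders for illustration, so adjust them to match your own layout:

# /etc/fstab entry for the Riak data volume (hypothetical device and mount point)
/dev/sda3    /var/lib/riak    ext4    noatime,nodiratime,data=writeback,barrier=0    0 0

# /etc/security/limits.conf entries to persist the higher open file limit (ulimit -n 65536)
riak    soft    nofile    65536
riak    hard    nofile    65536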

With all of the nodes tweaked and configured into clusters, we set up Basho's test harness — Basho Bench — to remotely simulate load on the deployments. Basho Bench lets you create a configurable test plan for a Riak cluster: you configure a number of concurrent workers that use a driver to generate load. It comes packaged as an Erlang application with an example config file that you can alter to set the concurrency, data set size, and duration of your tests. The results can be viewed as CSV data, and an optional graphics package can generate the graphs that I'm posting in this blog. A simplified graphic of our test environment would look like this:

Riak Test Environment

The following Basho Bench config is what we used for our testing:

{mode, max}.
{duration, 120}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator,{int_to_bin,{uniform_int,1000000}}}.
{value_generator,{exponential_bin,4098,50000}}.
{riakc_pb_ips, [{10,60,68,9},{10,40,117,89},{10,80,64,4},{10,80,64,8},{10,60,68,7}]}.
{riakc_pb_replies, 2}.
{operations, [{get, 10},{put, 1}]}.

To spell it out a little more simply:

Tests Performed

Data Set: 400GB
10:1 Query-to-Update Operations
8 Concurrent Client Connections
Test Duration: 2 Hours

You may notice that in the test cases that use SoftLayer "Medium" servers, the virtual provider nodes are running 26 virtual compute units against our dual-proc hex-core servers (12 cores total). In testing with Riak, memory is more important to the operations than CPU resources, so we provisioned the virtual instances to align with the 36GB of memory in each of the "Medium" SoftLayer servers. In the public cloud environment, that higher level of RAM was only available in packages with higher CPU counts, so while the CPU counts differ, the RAM amounts are as close to even as we could make them.

One final "housekeeping" note before we dive into the results: The graphs below are pulled directly from the optional graphics package that displays Basho Bench results. You'll notice that the scale on the left-hand side of the graphs differs dramatically between the two environments, so a cursory look at the results might not tell the whole story. Click any of the graphs below for a larger version. At the end of each test case, we'll share a few observations about the operations per second and latency results from each test. When we talk about latency in the "key observations" sections, we'll talk about the 99th percentile line: 99% of the requests had latency at or below this line, so it's effectively the worst latency all but a handful of requests saw on that platform in that test. The primary reason we're focusing on this line is that it's much easier to read on the graphs than the mean/median lines in the bottom graphs.
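
If you'd rather pull that number out of raw results than eyeball it on a graph, the calculation is simple. Here's a minimal Python sketch; the input format (one latency sample in milliseconds per line) is an assumption for illustration, not the actual Basho Bench CSV layout:

# percentile_99.py - report the 99th-percentile latency from a file of raw
# latency samples, one value in milliseconds per line (hypothetical format).
import math
import sys

def percentile(samples, pct):
    # Nearest-rank method: the smallest sample with at least pct% of samples at or below it.
    ordered = sorted(samples)
    rank = max(int(math.ceil(pct / 100.0 * len(ordered))), 1)
    return ordered[rank - 1]

if __name__ == "__main__":
    latencies = [float(line) for line in open(sys.argv[1]) if line.strip()]
    print("99th percentile latency: %.1f ms" % percentile(latencies, 99))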

Riak Test 1: "Small" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Small Riak Server Node
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
4 x 500GB SATAII – RAID10
1Gb Bonded Network
Virtual Provider Node
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
4 x 500GB Network Storage – RAID10
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

The SoftLayer environment showed much more consistency in operations per second, with average throughput around 450 Op/sec. The virtual environment's throughput varied significantly, from about 50 operations per second to more than 600 operations per second, with the trend line fluctuating between roughly 220 Op/sec and 350 Op/sec.

Comparing the latency of get and put requests, the 99th percentile of results in the SoftLayer environment stayed around 50ms for gets and under 200ms for puts, while the same metric for the virtual environment hovered around 800ms for gets and 4000ms for puts. The scale of the graphs is drastically different, so if you aren't looking closely, you might not see how significantly the performance differs between the two.

Riak Test 2: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 300GB 15K SAS – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

Similar to the results of Test 1, the throughput numbers from the bare metal environment are more consistent (and are consistently higher) than the throughput results from the virtual instance environment. The SoftLayer environment performed between 1500 and 1750 operations per second on average while the virtual provider environment averaged around 1200 operations per second throughout the test.

The latency of get and put requests in Test 2 also paints a similar picture to Test 1. The 99th percentile of results in the SoftLayer environment stayed below 50ms for gets and under 400ms for puts, while the same metric for the virtual environment averaged about 250ms for gets and over 1000ms for puts. Latency in a big data application can be a killer, so the results from the virtual provider might be setting off alarm bells in your head.

Riak Test 3: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 128GB SSD – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

In Test 3, we're using the same specs in our virtual provider nodes, so the results for the virtual node environment are the same in Test 3 as they are in Test 2. In this test, the SoftLayer environment substitutes SSDs for the 15K SAS drives used in Test 2, and the throughput numbers show the impact of that improved I/O. The average throughput of the bare metal environment with SSDs is between 1750 and 2000 operations per second. Those numbers are slightly higher than the SoftLayer environment in Test 2, further distancing the bare metal results from the virtual provider results.

The latency of gets for the SoftLayer environment is very difficult to see in this graph because the latency was so low throughout the test. The 99th percentile of puts in the SoftLayer environment settled between 500ms and 625ms, which was a little higher than the bare metal results from Test 2 but still well below the latency from the virtual environment.

Summary

The results show that — similar to the majority of data-centric applications that we have tested — Riak delivers more consistent, higher-throughput, and lower-latency results when deployed on bare metal instead of on a cluster of public cloud instances. The stark differences in the consistency of the results and in latency are noteworthy for developers looking to host their big data applications. We compared the 99th percentile of latency, but the mean/median results are worth checking out as well. Look at the mean and median results from the SoftLayer SSD node environment: For gets, the mean latency was 2.5ms and the median was somewhere around 1ms. For puts, the mean was between 7.5ms and 11ms and the median was around 5ms. Those kinds of results are almost unbelievable (and that's why I've shared everything involved in completing this test, so that you can try it yourself and see that there's no funny business going on).

It's commonly understood that local, single-tenant resources like bare metal will always outperform network storage resources, but putting some concrete numbers on paper shows just how big the difference in performance can be. Virtualizing on multi-tenant solutions with network-attached storage often introduces latency issues, and performance will vary significantly depending on host load. These results may seem obvious, but sometimes the promise of quick and easy deployments on public cloud environments can lure even the sanest and most rational developer. Some applications are suited for public cloud, but big data isn't one of them. When you have data-centric apps that require extreme I/O traffic to your storage medium, nothing beats local, high-performance resources.

-Harold

April 17, 2012

High Performance Computing for Everyone

This guest blog was submitted by Sumit Gupta, senior director of NVIDIA's Tesla High Performance Computing business.

The demand for greater levels of computational performance remains insatiable in the high performance computing (HPC) and technical computing industries, as researchers, geophysicists, biochemists, and financial quants continue to seek out and solve the world's most challenging computational problems.

However, access to high-powered HPC systems has been a constant problem. Researchers must compete for supercomputing time at popular open labs like Oak Ridge National Laboratory in Tennessee. And small and medium-size businesses, and even large companies, cannot afford to constantly build out larger computing infrastructure for their engineers.

Imagine the new discoveries that could happen if every researcher had access to an HPC system. Imagine how dramatically the quality and durability of products would improve if every engineer could simulate product designs 20, 50 or 100 more times.

This is where NVIDIA and SoftLayer come in. Together, we are bringing accessible and affordable HPC computing to a much broader universe of researchers, engineers and software developers from around the world.

GPUs: Accelerating Research

High-performance NVIDIA Tesla GPUs (graphics processing units) are quickly becoming the go-to solution for HPC users because of their ability to accelerate all types of commercial and scientific applications.

From Beijing to Silicon Valley — and just about everywhere in between — GPUs are enabling breakthroughs and discoveries in biology, chemistry, genomics, geophysics, data analytics, finance, and many other fields. They are also driving computationally intensive applications, like data mining and numerical analysis, to much higher levels of performance — as much as 100x faster.

The GPU's "secret sauce" is its unique ability to provide power-efficient HPC performance while working in conjunction with a system's CPU. With this "hybrid architecture" approach, each processor is free to do what it does best: GPUs accelerate the parallel research application work, while CPUs process the sequential work.

The result is an often dramatic increase in application performance.

SoftLayer: Affordable, On-demand HPC for the Masses

Now, we're coupling GPUs with easy, real-time access to computing resources that don't break the bank. SoftLayer has created exactly that with a new GPU-accelerated hosted HPC solution. The service uses the same technology that powers some of the world's fastest HPC systems, including dual-processor Intel E5-2600 (Sandy Bridge) based servers with one or two NVIDIA Tesla M2090 GPUs:

NVIDIA Tesla

SoftLayer also offers an on-demand, consumption-based billing model that allows users to access HPC resources when and how they need to. And, because SoftLayer is managing the systems, users can keep their own IT costs in check.

You can get more system details and pricing information here: SoftLayer HPC Servers

I'm thrilled that we are able to bring the value of hybrid HPC computing to larger numbers of users. And, I can't wait to see the amazing engineering and scientific advances they'll achieve.

-Sumit Gupta, NVIDIA - Tesla

November 22, 2011

Semper Fi + Innovate or Die

How can I emphasize how cool my job is and how much I like it? I can't believe SoftLayer pays me to do what I love. I should really be paying tuition for the experience I'm gaining here (Note to the CFO: Let's forget the "I should be paying to work here" part when we go through my next annual review).

My name is Beau Carpenter, and I'm writing my first blog for SoftLayer to introduce myself and share some of my background and experience to give you an idea of what life is like for someone in finance at a hosting company. In a nutshell, my mission is to understand, organize and report every dollar that comes into and goes out of the company. These financial reports are reviewed internally, shared with our investors and used when we have a trigger event like the merger with The Planet last year.

To give you a little background about who I am, the most notable thing about me is that I'm a third generation Marine. My grandfather served in WWII, my father served in Vietnam, and I joined during the Gulf War, serving from 1991–1995. After completing my tour and receiving an honorable discharge, I returned home to Texas to get my education and start working ... while growing a family of four.

After I earned my bachelor's degree, I went to work at Rice University for Richard Smalley, who won the 1996 Nobel Prize in Chemistry for the discovery of fullerenes and is often called the father of nanotechnology. Rick was a fantastic mentor, and when he recommended that I join Rice's MBA program, I thought it was a pretty good idea. It didn't hurt that his glowing recommendation gave me a great foot in the door to the program. I earned my MBA from Rice in May of 2005 and headed out into the corporate world ... if you can call SoftLayer "corporate."

The majority of my coworkers probably have no idea what I do because I spend a lot of time tucked away in my office running numbers. As you probably could have guessed, in financial analysis/reporting, strong numbers are a lot easier to report than bad ones, and SoftLayer's numbers have been so good that they keep me up at night. I know that sounds strange, but I'm up every Sunday night and every month-end at midnight so I can communicate our company's progress for the past week or month as soon as it is over. Some may not find that late-night work appealing, but being a numbers jockey, I can't help but be excited about sharing the latest information ... even if it could technically wait until the next morning.

I've been in denial for a few years, but after rereading that last paragraph, I have to admit I'm officially a nerd now.

I had done financial and nonfinancial metrics analysis for a couple of companies before I landed at SoftLayer, and the difference between this company and the others I've worked for is night and day. The culture here is healthy and positive, everyone's focused on their work, and the company provides a lot of perks to keep everyone going. Energy drinks, super-cool coffee machines, endless snacks ... but the most important perk is the general sense of camaraderie you get from being around a team of professionals who are passionate about their work.

Kevin asked me how I'd compare my experience at SoftLayer to my experience in the Marines, and I think the similarities that resonate most are the shared sense of purpose and the close ties I have with my team.

Semper Fi + Innovate or Die.

-Beau

March 31, 2011

The Path to Hosting 19+ Million Domain Names

If you own a business, your goal is to be wildly successful. You might measure that success by financial growth, operational efficiency or customer satisfaction, but at the end of the day, you want to execute on your vision and keep that success going. With SoftLayer's management team, company culture, innovative platform and focus on the customer experience, we've managed to become a phenomenally successful and fast-growing company.

I run the Market Intelligence group at SoftLayer, and my team is responsible for reviewing success metrics internally and in comparison with many of our competitors. We have a wealth of data at our fingertips, and one of the most interesting statistics I track is related to market switching data.

Today, I was looking closely at some of our most recent domain name data, and I came across some pretty amazing information. We have millions of data points instantly available for filtering and sorting, so we can produce some pretty insightful market intelligence that can help us make better business and customer decisions.

While reviewing that domain name information, I did a quick pivot exercise in Excel to see the number of domain names hosted by SoftLayer - not just domains whose DNS we host, but a pretty comprehensive view of the number of domains hosted on our infrastructure. As of March 1, 2011, we had 19,164,117 domains. Yes, you read that correctly: More than 19 million domains are hosted by SoftLayer. To give that a little context, the total domain name pool was 282,602,796, so we host about 6.78% of all domain names on the Internet.
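
That share is easy to sanity-check with a couple of lines of Python:

# SoftLayer-hosted domains as a share of all registered domains (March 1, 2011)
softlayer_domains = 19164117
total_domains = 282602796
print("%.2f%%" % (100.0 * softlayer_domains / total_domains))  # prints 6.78%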

That's impressive, but it's not the end of the story.

The number of net new domains coming to SoftLayer on a monthly basis is even more remarkable ... From October 2010 to March 2011 - a six-month snapshot - the total number of domains hosted on SoftLayer infrastructure saw compounded growth of 124%:

Domain Growth

What will the next six months hold? You can bet I'll be refreshing the data to keep an eye on it. Without extrapolating much other information, I'd say that the growth numbers are astounding and they're indicative of an unwavering confidence from our customers.

-Todd

July 15, 2009

Subjecting Subjectivity To Math

I recently read an article about an endeavor that is currently being undertaken to develop a "Speech Analysis Algorithm Crafted to Detect and Help Dissatisfied Customers". In short, a team of engineers is hoping to create software that will recognize when a caller is becoming stressed and immediately phone a manager to alert them to a developing situation. Wow! It is rare to see math and science applied to something so subjective. After all, math is used to quantify and measure things based on a known quantity or a baseline. In this particular effort, I would surmise that the engineers' most difficult task will be determining how to establish a unique baseline for each unique call and caller. Once upon a time, as a student of Electrical Engineering, I took on my share of convolution integrals, and that's a path I do not care to venture down again. I've also taken on my share of convoluted customer calls in a past life and witnessed our frontline assisting customers in complex situations here at SoftLayer.

Until there is an application that can detect and address a conversation that may be heading in the wrong direction, we have to rely on good ole' training and experience. With each call and query, the baseline is reset. I'd even go further and say that with each exchange, the baseline is reset as our Customer Service Agents seek information to get to the root of the issue. It's not hard to imagine the frustration that can build in a back-and-forth conversation as two people look to come to a solution or an amiable conclusion, just as it is understandable that sometimes a customer may simply need to vent. How do you calculate and anticipate those scenarios?

I wish much success to the team involved in the customer service speech analysis program. And programmatically speaking, I see many CASE, SWITCH, FOR, WHILE, BREAK, CONTINUE, IF, ELSE, ELSE IF, NEXT statements in your future. Good Luck!

June 24, 2009

Clouds and Elephants

So there I was after work today, sitting in my favorite watering hole drinking my Jagerbomb, when Caira, my bartender, asked what was on my mind. I told her that I had been working with clouds and elephants all day at work, and neither of those things is little. She laughed and asked if I had stopped anywhere to get a drink prior to her bar. I replied that no, I was serious: I had to make some large clouds and a stampede of elephants work together. I then explained to her what Hadoop was. Hadoop is a popular open source implementation of Google's MapReduce. It allows transformation and extensive analysis of large data sets using thousands of nodes while processing petabytes of data. It is used by websites such as Yahoo!, Facebook, and Baidu, China's leading search engine. I explained to her what cloud computing was (multiple computing nodes working together), hence my reference to the clouds, and how Hadoop was named after the stuffed elephant that belonged to the child of one of its creators, Doug Cutting. Now she doesn't think I'm quite as crazy.
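
For anyone who hasn't met the elephant before, the programming model behind Hadoop is easy to sketch. The toy word count below shows the idea in plain Python (a conceptual sketch, not the Hadoop API): a map step emits key/value pairs, a shuffle groups the pairs by key, and a reduce step aggregates each group. Hadoop does the same thing, but distributes the work across thousands of nodes.

# A toy, single-process illustration of the MapReduce model (not Hadoop itself).
from collections import defaultdict

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce: aggregate all of the counts grouped under one key.
    return word, sum(counts)

def mapreduce(lines):
    grouped = defaultdict(list)  # the "shuffle" phase: group emitted values by key
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())

print(mapreduce(["clouds and elephants", "elephants never forget"]))
# {'clouds': 1, 'and': 1, 'elephants': 2, 'never': 1, 'forget': 1}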
