Riak Performance Analysis: Bare Metal v. Virtual

July 16, 2013

In December, I posted a MongoDB performance analysis that showed the quantitative benefits of using bare metal servers for MongoDB workloads. It should come as no surprise that in the wake of SoftLayer's Riak launch, we've got some similar data to share about running Riak on bare metal.

To run this test, we started by creating five-node clusters with Riak 1.3.1 on SoftLayer bare metal servers and on a popular competitor's public cloud instances. For the SoftLayer environment, we created these clusters using the Riak Solution Designer, so the nodes were all provisioned, configured and clustered for us automatically when we ordered them. For the public cloud virtual instance Riak cluster, each node was provisioned indvidually using a Riak image template and manually configured into a cluster after all had come online. To optimize for Riak performance, I made a few tweaks at the OS level of our servers (running CentOS 64-bit):

Noatime
Nodiratime
barrier=0
data=writeback
ulimit -n 65536

The common Noatime and Nodiratime settings eliminate the need for writes during reads to help performance and disk wear. The barrier and writeback settings are a little less common and may not be what you'd normally set. Although those settings present a very slight risk for loss of data on disk failure, remember that the Riak solution is deployed in five-node rings with data redundantly available across multiple nodes in the ring. With that in mind and considering each node also being deployed with a RAID10 storage array, you can see that the minor risk for data loss on the failure of a single disk in the entire solution would have no impact on the entire data set (as there are plenty of redundant copies for that data available). Given the minor risk involved, the performance increases of those two settings justify their use.

With all of the nodes tweaked and configured into clusters, we set up Basho's test harness — Basho Bench — to remotely simulate load on the deployments. Basho Bench allows you to create a configurable test plan for a Riak cluster by configuring a number of workers to utilize a driver type to generate load. It comes packaged as an Erlang application with a config file example that you can alter to create the specifics for the concurrency, data set size, and duration of your tests. The results can be viewed as CSV data, and there is an optional graphics package that allows you to generate the graphs that I am posting in this blog. A simplified graphic of our test environment would look like this:

Riak Test Environment

The following Basho Bench config is what we used for our testing:

{mode, max}.
{duration, 120}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator,{int_to_bin,{uniform_int,1000000}}}.
{value_generator,{exponential_bin,4098,50000}}.
{riakc_pb_ips, [{10,60,68,9},{10,40,117,89},{10,80,64,4},{10,80,64,8},{10,60,68,7}]}.
{riakc_pb_replies, 2}.
{operations, [{get, 10},{put, 1}]}.

To spell it out a little simpler:

Tests Performed

Data Set: 400GB
10:1 Query-to-Update Operations
8 Concurrent Client Connections
Test Duration: 2 Hours

You may notice that in the test cases that use SoftLayer "Medium" Servers, the virtual provider nodes are running 26 virtual compute units against our dual proc hex-core servers (12 cores total). In testing with Riak, memory is important to the operations than CPU resources, so we provisioned the virtual instances to align with the 36GB of memory in each of the "Medium" SoftLayer servers. In the public cloud environment, the higher level of RAM was restricted to packages with higher CPU, so while the CPU counts differ, the RAM amounts are as close to even as we could make them.

One final "housekeeping" note before we dive into the results: The graphs below are pulled directly from the optional graphics package that displays Basho Bench results. You'll notice that the scale on the left-hand side of graphs differs dramatically between the two environments, so a cursory look at the results might not tell the whole story. Click any of the graphs below for a larger version. At the end of each test case, we'll share a few observations about the operations per second and latency results from each test. When we talk about latency in the "key observation" sections, we'll talk about the 99th percentile line — 99% of the results had latency below this line. More simply you could say, "This is the highest latency we saw on this platform in this test." The primary reason we're focusing on this line is because it's much easier to read on the graphs than the mean/median lines in the bottom graphs.

Riak Test 1: "Small" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Small Riak Server Node
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
4 x 500GB SATAII – RAID10
1Gb Bonded Network
Virtual Provider Node
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
4 x 500GB Network Storage – RAID10
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

The SoftLayer environment showed much more consistency in operations per second with an average throughput around 450 Op/sec. The virtual environment throughput varied significantly between about 50 operations per second to more than 600 operations per second with the trend line fluctuating slightly between about 220 Op/sec and 350 Op/sec.

Comparing the latency of get and put requests, the 99th percentile of results in the SoftLayer environment stayed around 50ms for gets and under 200ms for puts while the same metric for the virtual environment hovered around 800ms in gets and 4000ms in puts. The scale of the graphs is drastically different, so if you aren't looking closely, you don't see how significantly the performance varies between the two.

Riak Test 2: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 300GB 15K SAS – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

Similar to the results of Test 1, the throughput numbers from the bare metal environment are more consistent (and are consistently higher) than the throughput results from the virtual instance environment. The SoftLayer environment performed between 1500 and 1750 operations per second on average while the virtual provider environment averaged around 1200 operations per second throughout the test.

The latency of get and put requests in Test 2 also paints a similar picture to Test 1. The 99th percentile of results in the SoftLayer environment stayed below 50ms and under 400ms for puts while the same metric for the virtual environment averaged about 250ms in gets and over 1000ms in puts. Latency in a big data application can be a killer, so the results from the virtual provider might be setting off alarm bells in your head.

Riak Test 3: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 128GB SSD – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

In Test 3, we're using the same specs in our virtual provider nodes, so the results for the virtual node environment are the same in Test 3 as they are in Test 2. In this Test, the SoftLayer environment substitutes SSD hard drives for the 15K SAS drives used in Test 2, and the throughput numbers show the impact of that improved I/O. The average throughput of the bare metal environment with SSDs is between 1750 and 2000 operations per second. Those numbers are slightly higher than the SoftLayer environment in Test 2, further distancing the bare metal results from the virtual provider results.

The latency of gets for the SoftLayer environment is very difficult to see in this graph because the latency was so low throughout the test. The 99th percentile of puts in the SoftLayer environment settled between 500ms and 625ms, which was a little higher than the bare metal results from Test 2 but still well below the latency from the virtual environment.

Summary

The results show that — similar to the majority of data-centric applications that we have tested — Riak has more consistent, better performing, and lower latency results when deployed onto bare metal instead of a cluster of public cloud instances. The stark differences in consistency of the results and the latency are noteworthy for developers looking to host their big data applications. We compared the 99th percentile of latency, but the mean/median results are worth checking out as well. Look at the mean and median results from the SoftLayer SSD Node environment: For gets, the mean latency was 2.5ms and the median was somewhere around 1ms. For puts, the mean was between 7.5ms and 11ms and the median was around 5ms. Those kinds of results are almost unbelievable (and that's why I've shared everything involved in completing this test so that you can try it yourself and see that there's no funny business going on).

It's commonly understood that local single-tenant resources that bare metal will always perform better than network storage resources, but by putting some concrete numbers on paper, the difference in performance is pretty amazing. Virtualizing on multi-tenant solutions with network attached storage often introduces latency issues, and performance will vary significantly depending on host load. These results may seem obvious, but sometimes the promise of quick and easy deployments on public cloud environments can lure even the sanest and most rational developer. Some applications are suited for public cloud, but big data isn't one of them. But when you have data-centric apps that require extreme I/O traffic to your storage medium, nothing can beat local high performance resources.

-Harold