Posts Tagged 'Testing'

December 20, 2012

MongoDB Performance Analysis: Bare Metal v. Virtual

Developers can be cynical. When "the next great thing in technology" is announced, I usually wait to see how it performs before I get too excited about it ... Show me how that "next great thing" compares apples-to-apples with the competition, and you'll get my attention. With the launch of MongoDB at SoftLayer, I'd guess a lot of developers outside of SoftLayer and 10gen have the same "wait and see" attitude about the new platform, so I put our new MongoDB engineered servers to the test.

When I shared MongoDB architectural best practices, I referenced a few of the significant optimizations our team worked with 10gen to incorporate into our engineered servers (cheat sheet). To illustrate the impact of these changes on MongoDB performance, we ran 10gen's recommended benchmarking harness (freely available for download and testing of your own environment) on our three tiers of engineered servers alongside equivalent shared virtual environments commonly deployed by the MongoDB community. We've made a pretty big deal about the performance impact of running MongoDB on optimized bare metal infrastructure, so it's time to put our money where our mouth is.

The Testing Environment

For each of the available SoftLayer MongoDB engineered servers, data sets of 512KB documents were preloaded onto single MongoDB instances. The data sets were sized relative to available memory so that some were larger (2X) and some smaller than the memory available on the server. Each test also altered the data set frequently enough during the run to prevent the queries from caching all of the data in memory.
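As a rough illustration of that sizing, the sketch below works out how many 512KB documents make up a data set of a given size and how the data set compares with a server's RAM. The function names are hypothetical and are not part of the benchmarking harness; the 8GB and 16GB figures are just the Small tier's RAM and twice that.

```python
KB = 1024
GB = 1024 ** 3
DOC_SIZE_BYTES = 512 * KB  # 512KB documents, as used in the tests


def document_count(data_set_gb):
    """Number of 512KB documents needed to fill a data set of the given size."""
    return int(data_set_gb * GB // DOC_SIZE_BYTES)


def memory_ratio(data_set_gb, ram_gb):
    """Data set size as a multiple of available RAM (2.0 means '2X memory')."""
    return data_set_gb / ram_gb


if __name__ == "__main__":
    # An 8GB data set on an 8GB server, and a 2X data set on the same server.
    for size_gb in (8, 16):
        print(size_gb, "GB:", document_count(size_gb), "documents,",
              memory_ratio(size_gb, 8), "x RAM")
```

At 2,048 documents per gigabyte, even the smallest data set here runs to thousands of documents, which is why the preload step happens before any client traffic starts.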

Once the data sets were created, JMeter server instances with 4 cores and 16GB of RAM were used to drive 'benchrun' from the 10gen benchmarking harness. This diagram illustrates how we set up the testing environment:

[Diagram: MongoDB Performance Analysis Setup]

These JMeter servers function as the clients generating traffic on the MongoDB instances. Each client generated random query and update requests at a ratio of six queries per update. (The update requests were included so that the data could never be fully cached in memory, which would have kept the tests from exercising reads from disk.) These tests were designed to create an extreme load on the servers from an exponentially increasing number of clients until the system resources became saturated, and we recorded the resulting performance of the MongoDB application.
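The shape of that load can be sketched in a few lines. This is not the benchrun harness or its API; it only models the client counts and the operation mix. It assumes that "exponentially increasing" means doubling at each step, which the 1-to-32 and 1-to-128 ranges suggest but this post does not state, and the function names are hypothetical.

```python
import random


def client_steps(max_clients):
    """Concurrent client counts, doubling from 1 up to max_clients (assumed)."""
    steps, n = [], 1
    while n <= max_clients:
        steps.append(n)
        n *= 2
    return steps


def operation_mix(total_ops, queries_per_update=6, seed=0):
    """Randomly ordered operations with one update for every six queries."""
    updates = total_ops // (queries_per_update + 1)
    ops = ["update"] * updates + ["query"] * (total_ops - updates)
    random.Random(seed).shuffle(ops)  # seeded so a run can be repeated
    return ops


if __name__ == "__main__":
    print(client_steps(32))
    print(operation_mix(14))
```

Shuffling a fixed pool keeps the 6:1 ratio exact for each batch; drawing each operation independently at random would only approximate it.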

At the Medium (MD) and Large (LG) engineered server tiers, performance metrics were run separately for servers using 15K SAS hard drive data mounts and servers using SSD data mounts. If you missed the post comparing the IOPS statistics between different engineered server hard drive configurations, be sure to check it out.

Test Case 1: Small MongoDB Engineered Servers vs Shared Virtual Instance

Servers

Small (SM) MongoDB Engineered Server

  • Single 4-core Intel 1270 CPU
  • 64-bit CentOS
  • 8GB RAM
  • 2 x 500GB SATAII - RAID1
  • 1Gb Network

Virtual Provider Instance

  • 4 Virtual Compute Units
  • 64-bit CentOS
  • 7.5GB RAM
  • 2 x 500GB Network Storage - RAID1
  • 1Gb Network

Tests Performed

  • Small Data Set (8GB of 0.5MB documents)
  • 200 iterations of 6:1 query-to-update operations
  • Concurrent client connections exponentially increased from 1 to 32
  • Test duration spanned 48 hours

[Graphs: Average Read, Peak Read, Average Write, and Peak Write Operations per Second by Concurrent Client]

Test Case 2: Medium MongoDB Engineered Servers vs Shared Virtual Instance

Servers (15K SAS Data Mount Comparison)

Medium (MD) MongoDB Engineered Server

  • Dual 6-core Intel 5670 CPUs
  • 64-bit CentOS
  • 36GB RAM
  • 2 x 64GB SSD - RAID1 (Journal Mount)
  • 4 x 300GB 15K SAS - RAID10 (Data Mount)
  • 1Gb Network - Bonded

Virtual Provider Instance

  • 26 Virtual Compute Units
  • 64-bit CentOS
  • 30GB RAM
  • 2 x 64GB Network Storage - RAID1 (Journal Mount)
  • 4 x 300GB Network Storage - RAID10 (Data Mount)
  • 1Gb Network

Tests Performed

  • Small Data Set (32GB of 0.5MB documents)
  • 200 iterations of 6:1 query-to-update operations
  • Concurrent client connections exponentially increased from 1 to 128
  • Test duration spanned 48 hours

[Graphs: Average Read, Peak Read, Average Write, and Peak Write Operations per Second by Concurrent Client]

Servers (SSD Data Mount Comparison)

Medium (MD) MongoDB Engineered Server

  • Dual 6-core Intel 5670 CPUs
  • 64-bit CentOS
  • 36GB RAM
  • 2 x 64GB SSD - RAID1 (Journal Mount)
  • 4 x 400GB SSD - RAID10 (Data Mount)
  • 1Gb Network - Bonded

Virtual Provider Instance

  • 26 Virtual Compute Units
  • 64-bit CentOS
  • 30GB RAM
  • 2 x 64GB Network Storage - RAID1 (Journal Mount)
  • 4 x 300GB Network Storage - RAID10 (Data Mount)
  • 1Gb Network

Tests Performed

  • Small Data Set (32GB of 0.5MB documents)
  • 200 iterations of 6:1 query-to-update operations
  • Concurrent client connections exponentially increased from 1 to 128
  • Test duration spanned 48 hours

[Graphs: Average Read, Peak Read, Average Write, and Peak Write Operations per Second by Concurrent Client]

Test Case 3: Large MongoDB Engineered Servers vs Shared Virtual Instance

Servers (15K SAS Data Mount Comparison)

Large (LG) MongoDB Engineered Server

  • Dual 8-core Intel E5-2620 CPUs
  • 64-bit CentOS
  • 128GB RAM
  • 2 x 64GB SSD - RAID1 (Journal Mount)
  • 6 x 600GB 15K SAS - RAID10 (Data Mount)
  • 1Gb Network - Bonded

Virtual Provider Instance

  • 26 Virtual Compute Units
  • 64-bit CentOS
  • 64GB RAM (Maximum available on this provider)
  • 2 x 64GB Network Storage - RAID1 (Journal Mount)
  • 6 x 600GB Network Storage - RAID10 (Data Mount)
  • 1Gb Network

Tests Performed

  • Small Data Set (64GB of 0.5MB documents)
  • 200 iterations of 6:1 query-to-update operations
  • Concurrent client connections exponentially increased from 1 to 128
  • Test duration spanned 48 hours

[Graphs: Average Read, Peak Read, Average Write, and Peak Write Operations per Second by Concurrent Client]

Servers (SSD Data Mount Comparison)

Large (LG) MongoDB Engineered Server

  • Dual 8-core Intel E5-2620 CPUs
  • 64-bit CentOS
  • 128GB RAM
  • 2 x 64GB SSD - RAID1 (Journal Mount)
  • 6 x 400GB SSD - RAID10 (Data Mount)
  • 1Gb Network - Bonded

Virtual Provider Instance

  • 26 Virtual Compute Units
  • 64-bit CentOS
  • 64GB RAM (Maximum available on this provider)
  • 2 x 64GB Network Storage - RAID1 (Journal Mount)
  • 6 x 600GB Network Storage - RAID10 (Data Mount)
  • 1Gb Network

Tests Performed

  • Small Data Set (64GB of 0.5MB documents)
  • 200 iterations of 6:1 query-to-update operations
  • Concurrent client connections exponentially increased from 1 to 128
  • Test duration spanned over 48 hours

[Graphs: Average Read, Peak Read, Average Write, and Peak Write Operations per Second by Concurrent Client]

Impressions from Performance Testing

The results speak for themselves. Running a MongoDB big data solution on a shared virtual environment has significant drawbacks when compared to running MongoDB on a single-tenant bare metal offering. Disk I/O is by far the most limiting resource for MongoDB, and relying on shared network-attached storage (with much lower disk I/O) makes this limitation very apparent. Beyond the average and peak statistics above, performance varied much more in the virtual instance environment, so it's not as consistent and predictable as bare metal.

Highlights:

  • When a working data set is smaller than available memory, query performance increases.
  • The number of clients performing queries has an impact on query performance because more data is being actively cached at a rapid rate.
  • The addition of a separate Journal Mount volume significantly improves performance. Because the Small (SM) engineered server does not include a secondary mount for journals, whenever MongoDB began to journal, the disk I/O associated with journaling disrupted the query and update operations performed on the Data Mount.
  • The best deployments in terms of operations per second, stability and control were the configurations with a RAID10 SSD Data Mount and a RAID1 SSD Journal Mount. These configurations are available in both our Medium and Large offerings, and I'd highly recommend them.

-Harold

October 20, 2008

Can I Touch Your Meatball, Please?

A few years ago I injured my arm. I won’t go into details about the stupid things some of us do when we are off work, but the long and the short of it was that I ended up with a broken elbow. The surgery to repair the damage left me with a knot near my elbow. Hardly noticeable, in my opinion, but there if you know what you are looking for.

Not too long after the accident, my son, who was 5 going on 6, asked if he could have a friend spend the night. Sure. I picked the two of them up, loaded them in the back seat, and headed for my house. When we reached the first red light between the school and my house, I snatched my Diet Mountain Dew from the console and took a big swig. I felt a tap on my shoulder. I turned my head. It was my son’s friend.

“Mr. Francis,” he said shyly. I thought I knew what was coming. His mom had been very specific. No caffeine.

“Yes,” I replied quickly, tilting the bottle to my lips, operating on the premise that the best defense was a good offense and that if I just drained the soda entirely my problem would be solved.

“Can I touch your meatball, please?”

About then is when the carbonated soda came spewing forth from both nostrils.

“What?” I sputtered, my eyes watering and my nose burning. I checked the rearview mirror, certain Chris Hansen from Dateline’s “To Catch a Predator” was going to be smiling at me from the backseat along with an entire NBC camera crew. There was no Chris Hansen. Just my son and his school buddy giggling.

“Your meatball,” the kid said, pointing to the bump near my elbow. My own child nodded enthusiastically.

Ah, now I understood. There had simply been a miscommunication.

Certain the last thing I needed was some kid going home telling his parents Mr. Francis let him touch his meatball, I politely told him not only could he not touch my meatball but it would be best if we didn't talk about my meatball at all. Both boys seemed mildly disappointed but quickly got over it when I suggested we make a detour for the nearest McDonald’s.

Fast forward to a couple weeks ago when we had our monthly development meeting here in the SoftLayer headquarters facility. Our VP of Development, Matt Chilek, gave us a talk about the importance of clear and concise communications. Specifically, error messages in the portal.

The SoftLayer customer portal is probably the most sophisticated tool of its kind for remote management of servers. So no matter how much testing we do internally, now and again an error will pop up. Sometimes, these errors are legitimate bugs. Other times, they are runtime issues, such as a temporary outage of a database or some support hardware. In either case, how we present the error to the customer is of the utmost importance.

I’ll give you an example. The first time I worked on the WSUS update page in the portal, if my application failed to get a response from the MS Windows Update Server I threw up an error message: “fatal error”. Which is accurate. Sort of. The error is fatal to the application at that particular time. But that doesn’t really give the customer or our datacenter technicians a lot to go on. A better error message is “No response from WSUS server @192.100.12.1. This server could be temporarily offline for maintenance or updates. Please try again in a few minutes. If the problem persists contact technical support.”
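As a rough sketch of the difference, the hypothetical helper below builds the second style of message from details the code already has on hand when the request fails: which service did not answer, where it lives, and whether the failure is likely to be temporary. The function name and parameters are illustrative only and are not the portal's actual code.

```python
def describe_outage(service, address, temporary=True):
    """Build an error message that says what failed, where, and what to do next."""
    parts = [f"No response from {service} server @{address}."]
    if temporary:
        # Tell the reader the problem may clear up without any action.
        parts.append("This server could be temporarily offline for maintenance or updates.")
        parts.append("Please try again in a few minutes.")
    parts.append("If the problem persists contact technical support.")
    return " ".join(parts)


if __name__ == "__main__":
    print("fatal error")  # the first attempt: accurate, but nothing to go on
    print(describe_outage("WSUS", "192.100.12.1"))
```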

While both error messages alert us that something went wrong, the second lets us know what the error was. Exactly which hardware was the culprit. And that the issue might only be temporary so give it a few minutes before crying that the sky is falling. Clear. Concise. To the point. That is the only way to keep a tool as complex and feature-rich as the SoftLayer portal from overwhelming our customers and employees alike.

So the SoftLayer development team is making a concerted effort to do just that. And we could really use the help of SoftLayer employees from other departments, as well as our customers who use the portal on a regular basis, in pointing out any areas where the language used or information presented is not as clear as it could be. It only takes a minute to fill out a ticket with a note to the dev team, and, in the end, it is you who will benefit.

Alright, I suppose I should get back to writing code instead of writing about writing code. But first I think I’ll make a quick trip to the employee break room to grab some caffeine. And if by chance you run into me in the hallway, no you can’t touch my meatball—so don’t even ask.

-William

July 25, 2008

Thinkin' Like a Programmer

"I can't figure this out. My email client says I can't attach more than 10 M of data, but then it says I have 16501 K of data attached, and it can't send that. What's a 'M'? What's a 'K'? Why is the second number so big? I only attached a few files!"

I explained to my uncle that "M" stood for Megabyte, and "K" stood for Kilobyte. That a simple calculation to convert "K"s to "M"s was to take the last three digits off the "K"s and you had the size in "M"s, give or take one. That he had 16-17 Megabytes of data attached to his email, and he can only have 10.

His response was to wonder (1) why didn't the client just tell him he had "too much data," and (2) why did the program give him Ms AND Ks, instead of picking one?

My reply consisted of (1) it did, that's what the message said, and (2) because the programmer was thinking like a programmer.

See, my uncle is a very, very smart man. He worked in a video arcade as the guy who rewired the arcade machines when they exploded when somebody poured a Coke on them. He knew how the machines worked in and out. And got paid good money. When he moved back to Texas, he took up industrial and residential electric work, and is now a fully licensed foreman who's in high demand all through the area. When he says "I won't take a job that pays less than $20 an hour," it's not because he's picky, it's because he doesn't have to. Sharp as a tack. But he's not a computer pro. Not a problem, people can't be pros at everything. This ain't the 1700s, where you can pick up a test tube and learn everything known about chemistry in a few days.

But why would a programmer write an error message for an email program that would be unreadable to end users? Because it's perfectly readable to him! When my uncle read out the message, my first response was "You have about 7 Megabytes too many attachments. Send a second email."

Therefore, a programmer checking his work would think this was a great error message. Not only does it tell you that the email can't be sent, but it tells you why. The limit is in Megabytes, but email messages are typically sent in Kilobytes, so the data is already there. See how helpful I am! And the unit conversion between Ks and Ms is very easy; programmers do it 10 times a day and wouldn't even notice it.

That's why we have end user testing, to try to catch these things that programmers won't notice. It's just a simple conversion of units! But for an electrician trying to send an email, it was as opaque to him as if he had told me that I had a single pole dual throw make-break when I need a dual pole single throw break-make. It makes perfect sense, if you're used to it. And if I think about it for a minute, I could figure it out most likely… but the point is, his error message is useless to me as it's formatted. But it makes perfect sense to him.
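For what it's worth, doing the conversion for the user is a small change. The sketch below is a hypothetical helper, not the email client's code, that reports both numbers in one unit and says how much is over the limit. It assumes 1 M is 1024 K; the drop-three-digits shortcut above divides by 1000 instead, which is why it only lands within "give or take one."

```python
KB_PER_MB = 1024  # assumption: binary megabytes


def attachment_error(attached_kb, limit_mb):
    """Return a message in megabytes only, or an empty string if under the limit."""
    attached_mb = attached_kb / KB_PER_MB
    over_mb = attached_mb - limit_mb
    if over_mb <= 0:
        return ""
    return (f"Your attachments total {attached_mb:.1f} MB, "
            f"but the limit is {limit_mb} MB. "
            f"Remove about {over_mb:.1f} MB or send a second email.")


if __name__ == "__main__":
    # The numbers from my uncle's email: 16501 K attached, 10 M allowed.
    print(attachment_error(16501, 10))
```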

So, what's the moral of the story? Well, moral 1 is, try to be sure that all users of your product can understand what you say. We have an extensive testing process here at SoftLayer to make sure our data screens are usable without any confusion. Moral 2 is that programmers don't "actively" attempt to "keep people from using their computers" by "making their programs too complex." For us, it's completely transparent and useful, as useful to us as a circuit diagram is to an electrician. Just let us know if we make something a bit too opaque; it wasn't on purpose, and sometimes it's an easy fix. We were just thinking like programmers.

-Zoey
