<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: MongoDB Performance Analysis: Bare Metal v. Virtual</title>
	<atom:link href="http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/</link>
	<description>A Behind the Scenes Look at the Best Hosting Provider in the World</description>
	<lastBuildDate>Tue, 18 Jun 2013 10:20:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
	<item>
		<title>By: Big Data at SoftLayer: Riak &#8211; SoftLayer Blog</title>
		<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/comment-page-1/#comment-47699</link>
		<dc:creator>Big Data at SoftLayer: Riak &#8211; SoftLayer Blog</dc:creator>
		<pubDate>Tue, 30 Apr 2013 15:15:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softlayer.com/?p=10070#comment-47699</guid>
		<description>[...] environment on bare metal infrastructure, we made life a lot easier for developers who demanded performance and on-demand scalability for their big data applications, and it&#8217;s clear that our simple [...]</description>
		<content:encoded><![CDATA[<p>[...] environment on bare metal infrastructure, we made life a lot easier for developers who demanded performance and on-demand scalability for their big data applications, and it&#8217;s clear that our simple [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harold Hannon</title>
		<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/comment-page-1/#comment-45550</link>
		<dc:creator>Harold Hannon</dc:creator>
		<pubDate>Fri, 21 Dec 2012 16:09:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softlayer.com/?p=10070#comment-45550</guid>
		<description>@David, the virtual instance provider did not offer instances that exactly matched our offerings, so we did our best to find the closest match among the competitor&#039;s offerings (which in some cases just didn&#039;t have enough RAM). In reality, because the test blends updates, inserts, and queries, there is very little opportunity for MongoDB to cache records for queries: the data set is changing rapidly. This was done to help prevent MongoDB from running exclusively in memory.

When running &#039;top&#039;, iostat, and mongostat during these tests, you could observe that the limiting factor on both platforms was, by far, storage I/O. If the tests had been query-only, then you are 100% correct that they would have been heavily skewed and the data fairly worthless. Given the overall data set size and the fact that records were being updated while querying, the 0.5 GB and 2 GB differences in memory size did not have a significant impact on these tests. Even the larger difference in the final offering (once again, the virtual provider just didn&#039;t have an offering that matched) doesn&#039;t indicate that RAM had much of an impact. If it had, it would have shown up as a large skew in the query results only, which doesn&#039;t appear to be the case in the numbers. Once again, this is because care was taken to avoid allowing MongoDB to cache too much data in memory.

I hope that helps; it was a little challenging to line up the two platforms.</description>
		<content:encoded><![CDATA[<p>@David, the virtual instance provider did not offer instances that exactly matched our offerings, so we did our best to find the closest match among the competitor&#8217;s offerings (which in some cases just didn&#8217;t have enough RAM). In reality, because the test blends updates, inserts, and queries, there is very little opportunity for MongoDB to cache records for queries: the data set is changing rapidly. This was done to help prevent MongoDB from running exclusively in memory.</p>
<p>When running &#8216;top&#8217;, iostat, and mongostat during these tests, you could observe that the limiting factor on both platforms was, by far, storage I/O. If the tests had been query-only, then you are 100% correct that they would have been heavily skewed and the data fairly worthless. Given the overall data set size and the fact that records were being updated while querying, the 0.5 GB and 2 GB differences in memory size did not have a significant impact on these tests. Even the larger difference in the final offering (once again, the virtual provider just didn&#8217;t have an offering that matched) doesn&#8217;t indicate that RAM had much of an impact. If it had, it would have shown up as a large skew in the query results only, which doesn&#8217;t appear to be the case in the numbers. Once again, this is because care was taken to avoid allowing MongoDB to cache too much data in memory.</p>
<p>I hope that helps; it was a little challenging to line up the two platforms.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lennie</title>
		<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/comment-page-1/#comment-45545</link>
		<dc:creator>Lennie</dc:creator>
		<pubDate>Fri, 21 Dec 2012 13:57:54 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softlayer.com/?p=10070#comment-45545</guid>
		<description>I work at a provider; we are thinking of deploying Ceph, maybe with a local cache on the virtualisation hosts.

You can obviously also create separate groups in Ceph for different types of devices, such as SSD- and HDD-backed groups.

Ceph basically has a similar setup to Lustre (possibly Ceph has a superior design), and Lustre is in use on a number of TOP500 systems, so at least in theory it should scale pretty much as far as the budget can take you. As Ceph is fairly new, I&#039;m sure there are still some limits, of course.</description>
		<content:encoded><![CDATA[<p>I work at a provider; we are thinking of deploying Ceph, maybe with a local cache on the virtualisation hosts.</p>
<p>You can obviously also create separate groups in Ceph for different types of devices, such as SSD- and HDD-backed groups.</p>
<p>Ceph basically has a similar setup to Lustre (possibly Ceph has a superior design), and Lustre is in use on a number of TOP500 systems, so at least in theory it should scale pretty much as far as the budget can take you. As Ceph is fairly new, I&#8217;m sure there are still some limits, of course.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Mytton</title>
		<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/comment-page-1/#comment-45544</link>
		<dc:creator>David Mytton</dc:creator>
		<pubDate>Fri, 21 Dec 2012 11:39:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softlayer.com/?p=10070#comment-45544</guid>
		<description>In every case, the data set is too large to fit into memory on the virtual provider instance, but there is more than enough RAM on the SoftLayer server. Wouldn&#039;t this skew the results by causing page faults on the virtualised instance where there would be none on the SoftLayer instance?</description>
		<content:encoded><![CDATA[<p>In every case, the data set is too large to fit into memory on the virtual provider instance, but there is more than enough RAM on the SoftLayer server. Wouldn&#8217;t this skew the results by causing page faults on the virtualised instance where there would be none on the SoftLayer instance?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harold Hannon</title>
		<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/comment-page-1/#comment-45526</link>
		<dc:creator>Harold Hannon</dc:creator>
		<pubDate>Thu, 20 Dec 2012 16:33:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softlayer.com/?p=10070#comment-45526</guid>
		<description>Good questions, Lennie!

The largest part of the performance hit comes from having a network hop to the network-attached storage. I have done some personal testing on our CCIs with local storage, and tests where the data was completely cached in memory on CCIs with remote storage. In both cases MongoDB performed well. Now to be fair, I didn&#039;t do a local storage test on the other vendor (it wasn&#039;t available), but our own internal testing indicates that the majority of the performance hit is in the I/O to the storage. In addition to being a network hop, most network-attached storage solutions don&#039;t have the ability to offer QoS per mount. This makes a shared environment with network storage not only less performant in most cases, but also unpredictable in its performance variation.

The compute side of the equation really comes into play when dealing with data sets smaller than available memory, where MongoDB can cache the entire working set. When this happens, the bus speed/clock speed of the compute resources, rather than the storage I/O, becomes the limiting factor. In these cases, if your provider can clearly tell you what kind of CPU architecture you are running on, you can make a sound decision on whether the platform will meet your needs. Raw compute power (we are talking cores and architecture now) only matters when you are doing map-reduce aggregation operations. These seem to be the only thing that begins to utilize CPU over the other resources.

On the flash cache, I would love to find a provider that would allow such an option. You could always &quot;roll your own&quot; and just have a locally mounted SSD available. We haven&#039;t tested any flash cache setups ourselves yet, but in theory that would help alleviate some of the I/O issues if your working set was smaller than the cache. I think if I had that option, though, I would just mount the local SSD as the data drive itself and be done with it. In certain cases, though, where your total data set is large but your working data set is smaller than the cache SSD, it might be advisable to try out a flash cache setup and see if it helps. It would be fun to test it regardless!</description>
		<content:encoded><![CDATA[<p>Good questions, Lennie!</p>
<p>The largest part of the performance hit comes from having a network hop to the network-attached storage. I have done some personal testing on our CCIs with local storage, and tests where the data was completely cached in memory on CCIs with remote storage. In both cases MongoDB performed well. Now to be fair, I didn&#8217;t do a local storage test on the other vendor (it wasn&#8217;t available), but our own internal testing indicates that the majority of the performance hit is in the I/O to the storage. In addition to being a network hop, most network-attached storage solutions don&#8217;t have the ability to offer QoS per mount. This makes a shared environment with network storage not only less performant in most cases, but also unpredictable in its performance variation.</p>
<p>The compute side of the equation really comes into play when dealing with data sets smaller than available memory, where MongoDB can cache the entire working set. When this happens, the bus speed/clock speed of the compute resources, rather than the storage I/O, becomes the limiting factor. In these cases, if your provider can clearly tell you what kind of CPU architecture you are running on, you can make a sound decision on whether the platform will meet your needs. Raw compute power (we are talking cores and architecture now) only matters when you are doing map-reduce aggregation operations. These seem to be the only thing that begins to utilize CPU over the other resources.</p>
<p>On the flash cache, I would love to find a provider that would allow such an option. You could always &#8220;roll your own&#8221; and just have a locally mounted SSD available. We haven&#8217;t tested any flash cache setups ourselves yet, but in theory that would help alleviate some of the I/O issues if your working set was smaller than the cache. I think if I had that option, though, I would just mount the local SSD as the data drive itself and be done with it. In certain cases, though, where your total data set is large but your working data set is smaller than the cache SSD, it might be advisable to try out a flash cache setup and see if it helps. It would be fun to test it regardless!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lennie</title>
		<link>http://blog.softlayer.com/2012/mongodb-performance-analysis-bare-metal-v-virtual/comment-page-1/#comment-45524</link>
		<dc:creator>Lennie</dc:creator>
		<pubDate>Thu, 20 Dec 2012 15:47:05 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softlayer.com/?p=10070#comment-45524</guid>
		<description>Thanks for the tests.

My first question would be: what about non-virtual with shared network-attached storage?

How does that perform? That would show what hit you take by using virtualisation, and whether the performance variance comes from the virtualisation or from the shared network-attached storage.

What if you add an SSD as a cache on the host that does the virtualisation (think flashcache or bcache)? Does that solve a large part of the problem?</description>
		<content:encoded><![CDATA[<p>Thanks for the tests.</p>
<p>My first question would be: what about non-virtual with shared network-attached storage?</p>
<p>How does that perform? That would show what hit you take by using virtualisation, and whether the performance variance comes from the virtualisation or from the shared network-attached storage.</p>
<p>What if you add an SSD as a cache on the host that does the virtualisation (think flashcache or bcache)? Does that solve a large part of the problem?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
