Infrastructure Posts

December 17, 2012

Big Data at SoftLayer: The Importance of IOPS

The jet flow gates in the Hoover Dam can release up to 73,000 cubic feet — the equivalent of 546,040 gallons — of water per second at 120 miles per hour. Imagine replacing those jet flow gates with a single garden hose that pushes 25 gallons per minute (or 0.42 gallons per second). Things would get ugly pretty quickly. In the same way, a massive "big data" infrastructure can be crippled by insufficient IOPS.

IOPS — Input/Output Operations Per Second — measure computer storage in terms of the number of read and write operations it can perform in a second. IOPS are a primary concern for database environments where content is being written and queried constantly, and when we take those database environments to the extreme (big data), the importance of IOPS can't be overstated: If you aren't able to perform database reads and writes quickly in a big data environment, it doesn't matter how many gigabytes, terabytes or petabytes you have in your database ... You won't be able to efficiently access, add to or modify your data set.

As we worked with 10gen to create, test and tweak SoftLayer's MongoDB engineered servers, our primary focus centered on performance. Since the performance of massively scalable databases is dictated by the read and write operations to that database's data set, we invested significant resources into maximizing the IOPS for each engineered server ... And that involved a lot more than just swapping hard drives out of servers until we found a configuration that worked best. Yes, "Disk I/O" — the number of input/output operations a given disk can perform — plays a significant role in big data IOPS, but many other factors limit big data performance. How is performance impacted by network-attached storage? At what point will a given CPU become a bottleneck? How much RAM should be included in a base configuration to accommodate the load we expect our users to put on each tier of server? Are there operating system changes that can optimize the performance of a platform like MongoDB?

The resulting engineered servers are a testament to the blood, sweat and tears that were shed in the name of creating a reliable, high-performance big data environment. And I can prove it.

Most shared virtual instances — the scalable infrastructure many users employ for big data — use network-attached storage for their platform's storage. When data has to be queried over a network connection (rather than from a local disk), you introduce latency and more "moving parts" that have to work together. Disk I/O might be amazing on the enterprise SAN where your data lives, but because that data is not stored on-server with your processor or memory resources, performance can sporadically go from "Amazing" to "I Hate My Life" depending on network traffic. When I tested the IOPS of network-attached storage on a large competitor's virtual instances, I saw an average of around 400 IOPS per mount. It's difficult to say whether that's "not good enough" because every application will have different needs in terms of concurrent reads and writes, but it certainly could be better. We performed some internal testing of the IOPS for the hard drive configurations in our Medium and Large MongoDB engineered servers to give you an apples-to-apples comparison.

Before we get into the tests, here are the specs for the servers we're using:

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
CentOS 6 64-bit
1Gb Network - Bonded
Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
CentOS 6 64-bit
1Gb Network - Bonded

The numbers shown in the table below reflect the average number of IOPS we recorded with a 100% random read/write workload on each of these engineered servers. To measure these IOPS, we used a tool called fio with an 8k block size and iodepth at 128. Remembering that the virtual instance using network-attached storage was able to get 400 IOPS per mount, let's look at how our "base" configurations perform:

Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 2937
Random Write IOPS - /var/lib/mongo/logs 1306
Random Read IOPS - /var/lib/mongo/data 1720
Random Write IOPS - /var/lib/mongo/data 772
Random Read IOPS - /var/lib/mongo/data/journal 19659
Random Write IOPS - /var/lib/mongo/data/journal 8869
Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 30269
Random Write IOPS - /var/lib/mongo/logs 13124
Random Read IOPS - /var/lib/mongo/data 33757
Random Write IOPS - /var/lib/mongo/data 14168
Random Read IOPS - /var/lib/mongo/data/journal 19644
Random Write IOPS - /var/lib/mongo/data/journal 8882
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 4820
Random Write IOPS - /var/lib/mongo/logs 2080
Random Read IOPS - /var/lib/mongo/data 2461
Random Write IOPS - /var/lib/mongo/data 1099
Random Read IOPS - /var/lib/mongo/data/journal 19639
Random Write IOPS - /var/lib/mongo/data/journal 8772
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32403
Random Write IOPS - /var/lib/mongo/logs 13928
Random Read IOPS - /var/lib/mongo/data 34536
Random Write IOPS - /var/lib/mongo/data 15412
Random Read IOPS - /var/lib/mongo/data/journal 19578
Random Write IOPS - /var/lib/mongo/data/journal 8835

Clearly, the 400 IOPS per mount you'd see from SAN-based storage can't hold a candle to the performance of a physical disk, regardless of whether it's SAS or SSD. As you'd expect, the "Journal" reads and writes have roughly the same IOPS across all of the configurations because all four configurations use 2 x 64GB SSD drives in RAID1. In both server sizes, SSD drives provide better Data mount read/write performance than the 15K SAS drives, and the results suggest that having more physical drives in a Data mount will provide higher average IOPS. To put that observation to the test, I maxed out the number of hard drives in both configurations (10 in the 2U MD server and 34 in the 4U LG server) and recorded the results:

Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 7175
Random Write IOPS - /var/lib/mongo/logs 3481
Random Read IOPS - /var/lib/mongo/data 6468
Random Write IOPS - /var/lib/mongo/data 1763
Random Read IOPS - /var/lib/mongo/data/journal 18383
Random Write IOPS - /var/lib/mongo/data/journal 8765
Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32160
Random Write IOPS - /var/lib/mongo/logs 12181
Random Read IOPS - /var/lib/mongo/data 34642
Random Write IOPS - /var/lib/mongo/data 14545
Random Read IOPS - /var/lib/mongo/data/journal 19699
Random Write IOPS - /var/lib/mongo/data/journal 8764
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 17566
Random Write IOPS - /var/lib/mongo/logs 11918
Random Read IOPS - /var/lib/mongo/data 9978
Random Write IOPS - /var/lib/mongo/data 6526
Random Read IOPS - /var/lib/mongo/data/journal 18522
Random Write IOPS - /var/lib/mongo/data/journal 8722
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 34220
Random Write IOPS - /var/lib/mongo/logs 15388
Random Read IOPS - /var/lib/mongo/data 35998
Random Write IOPS - /var/lib/mongo/data 17120
Random Read IOPS - /var/lib/mongo/data/journal 17998
Random Write IOPS - /var/lib/mongo/data/journal 8822

It should come as no surprise that by adding more drives into the configuration, we get better IOPS, but you might be wondering why the results aren't "betterer" when it comes to the IOPS in the SSD drive configurations. While the IOPS numbers improve going from four to ten drives in the medium engineered server and six to thirty-four drives in the large engineered server, they don't increase as significantly as the IOPS differences in the SAS drives. This is what I meant when I explained that several factors contribute to and potentially limit IOPS performance. In this case, the limiting factor throttling the (ridiculously high) IOPS is the RAID card we are using in the servers. We've been working with our RAID card vendor to test a new card that will open a little more headroom for SSD IOPS, but that replacement card doesn't yet provide the consistency and reliability we need for these servers (which are just as important as speed).

There are probably a dozen other observations I could point out about how each result compares with the others (and why), but I'll stop here and open the floor for you. Do you notice anything interesting in the results? Does anything surprise you? What kind of IOPS performance have you seen from your server/cloud instance when running a tool like fio?
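If you'd like to generate numbers you can compare directly with the tables above, the test we ran looks roughly like the following fio invocation. The block size and iodepth match our tests; the target directory, file size, runtime and ioengine are assumptions you'd adjust for your own environment.

    # Random read test against the data mount (hypothetical paths/values;
    # only --bs=8k and --iodepth=128 are taken from the tests described above)
    fio --name=rand-read --directory=/var/lib/mongo/data \
        --rw=randread --bs=8k --iodepth=128 --ioengine=libaio --direct=1 \
        --size=4G --runtime=60 --time_based --group_reporting

    # Swap --rw=randread for --rw=randwrite to measure random write IOPS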


December 6, 2012

MongoDB: Architectural Best Practices

With the launch of our MongoDB solutions, developers can provision powerful, optimized, horizontally scaling NoSQL database clusters in real-time on bare metal infrastructure in SoftLayer data centers around the world. We worked tirelessly with our friends at 10gen — the creators of MongoDB — to build and tweak hardware and software configurations that enable peak MongoDB performance, and the resulting platform is pretty amazing. As Duke mentioned in his blog post, those efforts followed 10gen's MongoDB best practices, but what he didn't mention was that we created some architectural best practices of our own for MongoDB deployments on our platform.

The MongoDB engineered servers that you order from SoftLayer already implement several of the recommendations you'll see below, and I'll note which have been incorporated as we go through them. Given the scope of the topic, it's probably easiest to break down this guide into a few sections to make it a little more digestible. Let's take a look at the architectural best practices of running MongoDB through the phases of the roll-out process: Selecting a deployment strategy to prepare for your MongoDB installation, the installation itself, and the operational considerations of running it in production.

Deployment Strategy

When planning your MongoDB deployment, you should follow Sun Tzu's (modified) advice: "If you know the [friend] and know yourself, you need not fear the result of a hundred battles." "Friend" is substituted for "enemy" in this advice because the other party is MongoDB. If you aren't familiar with MongoDB, the top of your to-do list should be to read MongoDB's official documentation. That information will give you the background you'll need as you build and use your database. When you feel comfortable with what MongoDB is all about, it's time to "know yourself."

Your most important consideration will be the current and anticipated sizes of your data set. Understanding the volume of data you'll need to accommodate will be the primary driver for your choice of individual physical nodes as well as your sharding plans. Once you've established an expected size of your data set, you need to consider the importance of your data and how tolerant you are of the possibility of lost or lagging data (especially in replicated scenarios). With this information in hand, you can plan and start testing your deployment strategy.

It sounds a little strange to hear that you should test a deployment strategy, but when it comes to big data, you want to make sure your databases start with a strong foundation. You should perform load testing scenarios on a potential deployment strategy to confirm that a given architecture will meet your needs, and there are a few specific areas that you should consider:

Memory Sizing
MongoDB (like many data-oriented applications) works best when the data set can reside in memory. Nothing performs better than a MongoDB instance that does not require disk I/O. Whenever possible, select a platform that has more available RAM than your working data set size. If your data set exceeds the available RAM for a single node, then consider using sharding to increase the amount of available RAM in a cluster to accommodate the larger data set. This will maximize the overall performance of your deployment. If you notice page faults when you put your database under production load, they may indicate that you are exceeding the available RAM in your deployment.
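A quick, informal way to check for that situation (a sketch, not an official 10gen procedure) is to watch the "faults" column in mongostat while your database is under load; sustained non-zero values suggest the working set has outgrown RAM. The hostname below is a placeholder.

    # Print stats every 5 seconds; watch the "faults" column for page faults
    mongostat --host mongo1.example.com 5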

Disk Type
If speed is not your primary concern, or if you have a data set that is far larger than any available in-memory strategy can support, selecting the proper disk type for your deployment is important. IOPS will be key in selecting your disk type, and obviously, the higher the IOPS, the better the performance of MongoDB. Local disks should be used whenever possible (as network storage can cause high latency and poor performance for your deployment). It's also advised that you use RAID 10 when creating disk arrays.

To give you an idea of what kind of IOPS to expect from a given type of drive, these are the approximate ranges of IOPS per drive in SoftLayer MongoDB engineered servers:

SATA II – 100-200 IOPS
15K SAS – 300-400 IOPS
SSD – 7,000-8,000 IOPS (read) 19,000-20,000 IOPS (write)

CPU
Clock speed and the number of available processors become considerations if you anticipate using MapReduce. It has also been noted that when running a MongoDB instance with the majority of the data in memory, clock speed can have a major impact on overall performance. If you are planning to use MapReduce or you're able to operate with a majority of your data in memory, consider a deployment strategy that includes a CPU with a high clock/bus speed to maximize your operations per second.
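If you're weighing the MapReduce question, the jobs in question are the kind you'd run from the mongo shell along these lines. The database, collection and field names here are invented for illustration.

    # Count documents per category with a simple map/reduce job
    mongo mydb <<'EOF'
    var map = function() { emit(this.category, 1); };
    var reduce = function(key, values) { return Array.sum(values); };
    // Results land in the "category_counts" collection
    db.products.mapReduce(map, reduce, { out: "category_counts" });
    EOF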

Replication
Replication provides high availability for your data if a node fails in your cluster. It should be standard to replicate across at least three nodes in any MongoDB deployment. The most common configuration for replication with three nodes is a 2x1 deployment — two nodes in a primary data center with a backup server in a secondary data center:

MongoDB Replication
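Standing up that kind of 2x1 replica set comes down to a single rs.initiate() call from the mongo shell. The sketch below uses made-up hostnames; giving the member in the secondary data center priority 0 keeps it from being elected primary.

    # Initialize a three-member replica set (hypothetical hosts)
    mongo --host mongo1.dal.example.com <<'EOF'
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "mongo1.dal.example.com:27017" },             // primary data center
        { _id: 1, host: "mongo2.dal.example.com:27017" },             // primary data center
        { _id: 2, host: "mongo3.wdc.example.com:27017", priority: 0 } // backup data center
      ]
    });
    EOF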

Sharding
If you anticipate a large, active data set, you should deploy a sharded MongoDB deployment. Sharding allows you to partition a single data set across multiple nodes. You can allow MongoDB to automatically distribute the data across nodes in the cluster, or you can define a shard key and create range-based sharding for that key.

Sharding may also help write performance, so you can elect to shard even if your data set is small but requires a high volume of updates or inserts. It's important to note that when you deploy a sharded set, MongoDB requires three (and only three) config server instances, which are specialized Mongo runtimes that track the current shard configuration. Loss of one of these nodes will cause the cluster to go into a read-only mode (for the configuration only) and will require that all of the config servers be brought back online before any configuration changes can be made.
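For reference, once the config servers and mongos routers are up, enabling range-based sharding on a collection looks something like this from the mongo shell (the database, collection and shard key names are hypothetical):

    # Run against a mongos router, not an individual shard
    mongo --host mongos1.example.com <<'EOF'
    sh.enableSharding("appdata");
    // Range-based sharding on the userId field
    sh.shardCollection("appdata.events", { userId: 1 });
    EOF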

Write Safety Mode
There are several write safety modes that govern how MongoDB will handle the persistence of the data to disk. It is important to consider which mode best fits your needs for both data integrity and performance. The following write safety modes are available:

None – This mode provides a deferred writing strategy that is non-blocking. This allows for high performance; however, there is a small chance that data can be lost if a node fails. There is also the possibility that data written to one node in a cluster will not be immediately available on all nodes in that cluster for read consistency. The 'None' strategy also provides no protection in the case of network failures. That lack of protection makes this mode highly unreliable; it should only be used when performance is a priority and data integrity is not a concern.

Normal – This is the default for MongoDB if you do not select any other mode. It provides a deferred writing strategy that is non-blocking. This allows for high performance; however, there is a small chance that data can be lost if a node fails. There is also the possibility that data written to one node in a cluster will not be immediately available on all nodes in that cluster for read consistency.

Safe – This mode will block until MongoDB has acknowledged that it has received the write request but will not block until the write is actually performed. This provides a better level of data integrity and will ensure that read consistency is achieved within a cluster.

Journal Safe – Journals provide a recovery option for MongoDB. Using this mode will ensure that the data has been acknowledged and a Journal update has been performed before returning.

Fsync - This mode provides the highest level of data integrity and blocks until a physical write of the data has occurred. This comes with a degradation in performance and should be used only if data integrity is the primary concern for your application.
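In the current (2012-era) shell and drivers, these modes map to options on the getLastError command that follows a write. As a rough sketch (the collection name is invented), a "Journal Safe" write from the mongo shell looks like this:

    mongo mydb <<'EOF'
    db.events.insert({ type: "click", ts: new Date() });
    // Block until the write is acknowledged and journaled (j: true);
    // use fsync: true instead to wait for a full flush to disk
    db.runCommand({ getLastError: 1, j: true });
    EOF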

Testing the Deployment
Once you've determined your deployment strategy, test it with a data set similar to your production data. 10gen has several tools to help you with load testing your deployment, and the console has a tool named 'benchrun' which can execute operations from within a JavaScript test harness. These tools will return operation information as well as latency numbers for each of those operations. If you require more detailed information about the MongoDB instance, consider using the mongostat command or MongoDB Monitoring Service (MMS) to monitor your deployment during the testing.


Installation
When performing the installation of MongoDB, a few considerations can help create both a stable and performance-oriented solution. 10gen recommends the use of CentOS (64-bit) as the base operating system if at all possible. If you try installing MongoDB on a 32-bit operating system, you might run into file size limits that cause issues, and if you feel the urge to install it on Windows, you'll see performance issues if virtual memory begins to be utilized by the OS to make up for a lack of RAM in your deployment. As a result, 32-bit operating systems and Windows operating systems should be avoided on MongoDB servers. SoftLayer provisions CentOS 6.X 64-bit operating systems by default on all of our MongoDB engineered server deployments.

When you've got CentOS 64-bit installed, you should also make the following changes to maximize your performance (all of which are included by default on all SoftLayer engineered servers; a consolidated example follows this list):

Set SSD Read Ahead Defaults to 16 Blocks - SSD drives have excellent seek times, allowing the Read Ahead to be shrunk to 16 blocks. Spinning disks might require slightly more buffering, so those have been set to 32 blocks.

noatime - Adding the noatime option eliminates the need for the system to make writes to the file system for files which are simply being read — or in other words: Faster file access and less disk wear.

Turn NUMA Off in BIOS - Linux, NUMA and MongoDB tend not to work well together. If you are running MongoDB on NUMA hardware, we recommend turning NUMA off (running with an interleave memory policy). If you don't, problems will manifest in strange ways like massive slowdowns for periods of time or high system CPU time.

Set ulimit - We have set the ulimit to 64000 for open files and 32000 for user processes to prevent failures due to a loss of available file handles or user processes.

Use ext4 - We have selected ext4 over ext3. We found ext3 to be very slow in allocating files (or removing them). Additionally, access within large files is poor with ext3.
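Pulled together in one place, those OS-level changes look roughly like the following on CentOS. The device names, mount point and mongod paths are placeholders, so check your own disk layout before applying any of this.

    # Read Ahead: 16 blocks for SSD volumes, 32 for spinning disks (example devices)
    blockdev --setra 16 /dev/sdb
    blockdev --setra 32 /dev/sdc

    # noatime on the MongoDB data volume (example /etc/fstab entry, ext4)
    # /dev/sdc1  /var/lib/mongo  ext4  defaults,noatime  0  0

    # If NUMA can't be disabled in the BIOS, start mongod with interleaved memory
    numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf

    # Raise limits on open files and user processes
    ulimit -n 64000
    ulimit -u 32000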

One last tip on installation: Keep the Journal and Data volumes on distinct physical volumes. If the Journal and Data directories reside on a single physical volume, flushes to the Journal will interrupt data access and cause spikes of high latency within your MongoDB deployment.


Production
Once a MongoDB deployment has been promoted to production, there are a few recommendations for monitoring and optimizing performance. You should always have the MMS agent running on all MongoDB instances to help monitor the health and performance of your deployment. This tool is also very useful if you have 10gen MongoDB Cloud Subscriptions because it provides useful debugging data for the 10gen team during support interactions. In addition to MMS, you can use the mongostat command (mentioned in the deployment section) to see runtime information about the performance of a MongoDB node. If either of these tools flags performance issues, sharding or indexing are first-line options to resolve them (a quick example follows this list):

Indexes - Indexes should be created for a MongoDB deployment if monitoring tools indicate that field based queries are performing poorly. Always use indexes when you are querying data based on distinct fields to help boost performance.

Sharding - Sharding can be leveraged when the overall performance of the node is suffering because of a large operating data set. Be sure to shard before you get in the red; the system only splits chunks for sharding on insert or update, so if you wait too long to shard, you may have uneven distribution for a period of time (or forever, depending on your data set and shard key strategy).
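As a quick illustration of both options (the database, collection and field names are invented; run the sh.status() portion against a mongos if your deployment is sharded):

    mongo appdata <<'EOF'
    // Index a field that shows up in slow, field-based queries
    db.events.ensureIndex({ userId: 1 });
    // For sharded clusters, sh.status() shows how chunks are distributed,
    // which helps you spot the uneven distribution described above
    sh.status();
    EOF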

I know it seems like we've covered a lot over the course of this blog post, but this list of best practices is far from exhaustive. If you want to learn more, the MongoDB forums are a great resource to connect with the rest of the MongoDB community and learn from their experiences, and the documentation on MongoDB's site is another phenomenal resource. The best people to talk to when it comes to questions about MongoDB are the folks at 10gen, so I also highly recommend taking advantage of MongoDB Cloud Subscriptions to get their direct support for your one-off questions and issues.


December 5, 2012

Breaking Down 'Big Data' - Database Models

Forrester defines big data as "techniques and technologies that make capturing value from data at an extreme scale economical." Gartner says, "Big data is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets." Big data demands extreme horizontal scale that traditional IT management can't handle, and it's not a challenge exclusive to the Facebooks, Twitters and Tumblrs of the world ... Just look at the Google search volume for "big data" over the past eight years:

Big Data Search Interest

Developers are collectively facing information overload. As storage has become more and more affordable, it's easier to justify collecting and saving more data. Users are more comfortable with creating and sharing content, and we're able to track, log and index metrics and activity that previously would have been deleted in consideration of space constraints or cost. As the information age progresses, we are collecting more and more data at an ever-accelerating pace, and we're sharing that data at an incredible rate.

To understand the different facets of this increased usage and demand, Gartner came up with the three V's of big data that vary significantly from traditional data requirements: Volume, Velocity and Variety. Larger, more abundant pieces of data ("Volume") are coming at a much faster speed ("Velocity") in formats like media and walls of text that don't easily fit into a column-and-row database structure ("Variety"). Given those equally important factors, many of the biggest players in the IT world have been hard at work to create solutions that provide the scale and speed developers need when they build social, analytics, gaming, financial or medical apps with large data sets.

When we talk about scaling databases here, we're talking about scaling horizontally across multiple servers rather than scaling vertically by upgrading a single server — adding more RAM, increasing HDD capacity, etc. It's important to make that distinction because it leads to a unique challenge shared by all distributed computer systems: The CAP Theorem. According to the CAP theorem, a distributed storage system must choose to sacrifice either consistency (that everyone sees the same data) or availability (that you can always read/write) while having partition tolerance (where the system continues to operate despite arbitrary message loss or the failure of part of the system).

Let's take a look at a few of the most common database models, what their strengths are, and how they handle the CAP theorem compromise of consistency v. availability:

Relational Databases

What They Do: Stores data in rows/columns. Parent-child records can be joined remotely on the server. Provides speed over scale. Some capacity for vertical scaling, poor capacity for horizontal scaling. This type of database is where most people start.
Horizontal Scaling: In a relational database system, horizontal scaling is possible via replication (sharing data between redundant nodes to ensure consistency), and some people have success with sharding (horizontal partitioning of data), but those techniques add a lot of complexity.
CAP Balance: Prefer consistency over availability.
When to use: When you have highly structured data, and you know what you'll be storing. Great when production queries will be predictable.
Example Products: Oracle, SQLite, PostgreSQL, MySQL

Document-Oriented Databases

What They Do: Stores data in documents. Parent-child records can be stored in the same document and returned in a single fetch operation with no join. The server is aware of the fields stored within a document, can query on them, and return their properties selectively.
Horizontal Scaling: Horizontal scaling is provided via replication, or replication + sharding. Document-oriented databases also usually support relatively low-performance MapReduce for ad-hoc querying.
CAP Balance: Generally prefer consistency over availability
When to Use: When your concept of a "record" has relatively bounded growth, and can store all of its related properties in a single doc.
Example Products: MongoDB, CouchDB, BigCouch, Cloudant
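To make the "parent-child records in one document" point concrete, here's a rough MongoDB-flavored sketch (the names are invented): the order and its line items live in a single document, so one query returns everything with no join.

    mongo shopdb <<'EOF'
    db.orders.insert({
      orderId: 1001,
      customer: "Acme Co.",
      items: [                        // child records embedded in the parent
        { sku: "A-100", qty: 2 },
        { sku: "B-205", qty: 1 }
      ]
    });
    // One fetch returns the parent and all of its children
    db.orders.findOne({ orderId: 1001 });
    EOF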

Key-Value Stores

What They Do: Stores an arbitrary value at a key. Most can perform simple operations on a single value. Typically, each property of a record must be fetched in multiple trips, with Redis being an exception. Very simple, and very fast.
Horizontal Scaling: Horizontal scale is provided via sharding.
CAP Balance: Generally prefer consistency over availability.
When to Use: Very simple schemas, caching of upstream query results, or extreme speed scenarios (like real-time counters)
Example Products: CouchBase, Redis, PostgreSQL HStore, LevelDB
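The "real-time counters" case is a good illustration of why these stores are so fast: With Redis, for example, a counter is just an atomic increment on a single key (the key name here is invented).

    # Each page view bumps one key; GET reads the current count back
    redis-cli INCR pageviews:home
    redis-cli GET pageviews:home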

BigTable-Inspired Databases

What They Do: Data is put into column-oriented stores inspired by Google's BigTable paper. These systems have tunable CAP parameters and can be adjusted to prefer either consistency or availability. Both of the major implementations are fairly operationally intensive.
Horizontal Scaling: Good speed and very wide horizontal scale capabilities.
CAP Balance: Prefer consistency over availability
When to Use: When you need consistency and write performance that scales past the capabilities of a single machine. HBase in particular has been used with around 1,000 nodes in production.
Example Products: HBase, Cassandra (inspired by both BigTable and Dynamo)

Dynamo-Inspired Databases

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency.
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

Each of the database models has strengths and weaknesses, and there are huge communities that support each of the open source examples I gave in each model. If your database is a bottleneck or you're not getting the flexibility and scalability you need to handle your application's volume, velocity and variety of data, start looking at some of these "big data" solutions.

Tried any of the above models and have feedback that differs from ours? Leave a comment below and tell us about it!


December 4, 2012

Big Data at SoftLayer: MongoDB

In one day, Facebook's databases ingest more than 500 terabytes of data, Twitter processes 500 million Tweets and Tumblr users publish more than 75 million posts. With such an unprecedented volume of information, developers face significant challenges when it comes to building an application's architecture and choosing its infrastructure. As a result, demand has exploded for "big data" solutions — resources that make it possible to process, store, analyze, search and deliver data from large, complex data sets. In light of that demand, SoftLayer has been working in strategic partnership with 10gen — the creators of MongoDB — to develop a high-performance, on-demand, big data solution. Today, we're excited to announce the launch of specialized MongoDB servers at SoftLayer.

If you've configured an infrastructure to accommodate big data, you know how much of a pain it can be: You choose your hardware, you configure it to run NoSQL, you install an open source NoSQL project that you think will meet your needs, and you keep tweaking your environment to optimize its performance. Assuming you have the resources (and patience) to get everything running efficiently, you'll wind up with the horizontally scalable database infrastructure you need to handle the volume of content you and your users create and consume. SoftLayer and 10gen are making that process a whole lot easier.

Our new MongoDB solutions take the time and guesswork out of configuring a big data environment. We give you an easy-to-use system for designing and ordering everything you need. You can start with a single server or roll out multiple servers in a single replica set across multiple data centers, and in under two hours, an optimized MongoDB environment is provisioned and ready to be used. I stress that it's an "optimized" environment because that's been our key focus. We collaborated with 10gen engineers on hardware and software configurations that provide the most robust performance for MongoDB, and we incorporated many of their MongoDB best practices. The resulting "engineered servers" are big data powerhouses:

MongoDB Configs

From each engineered server base configuration, you can customize your MongoDB server to meet your application's needs, and as you choose your upgrades from the base configuration, you'll see the thresholds at which you should consider upgrading other components. As your data set's size and the number of indexes in your database increase, you'll need additional RAM, CPU, and storage resources, but you won't need them in the same proportions — certain components become bottlenecks before others. Sure, you could upgrade all of the components in a given database server at the same rate, but if, say, you update everything when you only need to upgrade RAM, you'd be adding (and paying for) unnecessary CPU and storage capacity.

Using our new Solution Designer, it's very easy to graphically design a complex multi-site replica set. Once you finalize your locations and server configurations, you'll click "Order," and our automated provisioning system will kick into high gear. It deploys your server hardware, installs CentOS (with OS optimizations to provide MongoDB performance enhancements), installs MongoDB, installs MMS (MongoDB Monitoring Service) and configures the network connection on each server to cluster it with the other servers in your environment. A process that may have taken days of work and months of tweaking is completed in less than four hours. And because everything is standardized and automated, you run much less risk of human error.

MongoDB Configs

One of the other massive benefits of working so closely with 10gen is that we've been able to integrate 10gen's MongoDB Cloud Subscriptions into our offering. Customers who opt for a MongoDB Cloud Subscription get additional MongoDB features (like SSL and SNMP support) and support direct from the MongoDB authority. As an added bonus, since the 10gen team has an intimate understanding of the SoftLayer environment, they'll be able to provide even better support to SoftLayer customers!

You shouldn't have to sacrifice agility for performance, and you shouldn't have to sacrifice performance for agility. Most of the "big data" offerings in the market today are built on virtual servers that can be provisioned quickly but offer meager performance levels relative to running the same database on bare metal infrastructure. To get the performance benefits of dedicated hardware, many users have chosen to build, roll out and tweak their own configurations. With our MongoDB offering, you get the on-demand availability and flexibility of a cloud infrastructure with the raw power and full control of dedicated hardware.

If you've been toying with the idea of rolling out your own big data infrastructure, life just got a lot better for you.


October 23, 2012

Tips from the Abuse Department: Know Spam. Stop Spam.

As an abuse administrator, I'm surrounded by spam on a daily basis. When someone sends an abuse-related complaint to our contact address, it gets added to our ticket queue, and our Abuse SLayers take time to investigate and follow up with the customers whose servers violate our acceptable use policy. The majority of those abuse-related submissions are reporting spam coming from our network, and in my interaction with customers, I've noticed that spam (and the source of spam) is widely misunderstood.

Most spam tickets we create on customer accounts pinpoint spam sent from a compromised or exploited server. Our direct customer didn't send the phishing email, malware distribution, pharmacy advertisement or pornographic spam, but that activity came from their account. While they're accountable for the abusive behavior coming from their server, in many cases, they don't know that there's a problem until we post an abuse ticket on their account. These servers are targeted and compromised by common techniques and exploits that could have been easily avoided, but they aren't very well known outside the world of abuse.

To protect yourself from a spammer, you need to think like a spammer. You need to understand how someone might try to exploit your environment so that you can prevent them from doing so. As you're looking at ways to secure your server proactively, make sure you target these five exploits in particular:

1. User Auth Login

This is by far the most common exploit used to send spam. This method involves a person or script using the credentials of a user to send spam through a domain's mail server. The majority of these incidents are caused by malware on a client PC that obtains the login and password for a domain user and uses that information to log on and send mail from the client PC through the server. Often, these spam messages are sent through a botnet command structure.

When an account is compromised, simply changing the password for the compromised user on the server usually won't stop the abuse. We see quite a few accounts that continue to send spam after an initial abuse ticket results in a password change. Most servers sending spam with this method are found to be sending only a small amount of spam at any given time to avoid detection. The low volume of spam sent per server is made up for by the thousands of servers being used for the same spamming campaigns.

In order to stop the User Auth Login exploit, a customer needs to clean all of the malicious software (malware) from their environments. To prevent future User Auth Login compromises, users should be made aware of the potential dangers of untrusted software, and if they believe their machines are infected, they need to know what to do.

2. Tell-a-friend Exploitation

The User Auth Login technique is the most common method employed by spammers, but the "tell-a-friend" script exploitation isn't far behind when it comes to the volume of affected servers. This spamming method targets websites that use scripts to invite users to refer friends to a page or product. Spammers will use the 'Your Message' field in one of these scripts to input their own content and links, and they'll push the actual page referral link to the bottom of the message. When these site scripts aren't secure, the spammer will use them to send hundreds or thousands of messages.

To avoid having your website fall victim to this type of spam, be very wary of any widget or script you add. If you need to add Facebook, Twitter and email "share" functionality to your site, make sure you incorporate a tell-a-friend script that does not allow customizable messages and does not accept more than one recipient email address. Also, users won't need the "cc" or "bcc" fields, so you can be sure those are axed as well. If you can't find a good "share" script that you're comfortable with from a security perspective, it might be a good idea to remove that functionality to avoid exploitation.

3. Uploaded Mailers

Spam sent via an uploaded third-party mailer can sometimes prove difficult for admins to locate. An uploaded third-party mailer could be capable of creating its own outbound SMTP connection, and that would allow a program to bypass the existing MTA on the server and render any legitimate mail logs useless for investigation. Another challenge is that a PHP mailer can be uploaded to a location within a user's web content, and that mailer is run by the user 'nobody' (the default Apache user).

We strongly suggest configuring your server so that mail headers show the script's actual user (rather than the Apache default user) and the location on the server the script is running from. Many times, these kinds of mailers are maliciously uploaded after a user's FTP password has been compromised, so be sure your FTP login information is secure.
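One way to get that kind of visibility for PHP-based mailers (assuming PHP 5.3 or later and mail that flows through PHP's mail() function) is the mail.add_x_header setting, which stamps every outgoing message with the UID and filename of the script that sent it:

    ; /etc/php.ini
    ; Adds an "X-PHP-Originating-Script" header with the UID and filename
    ; of the script that called mail(), making rogue mailers easier to trace
    mail.add_x_header = On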

4. Software Exploits

The "software exploits" category casts a huge shadow. Every piece of software on a server — from mail servers, content management systems and control panels to the operating system itself — can be targeted by hackers. They probe servers to find security vulnerabilities and weak coding, and when they find a vulnerability, they take control.

The hacker who found the software vulnerability might not actually take advantage of the exploit immediately. That user may sell access to other entities for their use, and that use often ends up being spam. In addition to having strong firewall rules and access restrictions, you should update and maintain the current stable versions of all software on your servers.

5. WordPress Exploits

WordPress exploits would technically fall under the "Software Exploits" category, but I'm breaking them out into their own category simply due to the volume of spam issues that result from exploiting this particular piece of software. The first step to protecting against spam being sent through this source is to make sure you have the latest version of WordPress installed. With that done, be sure to research the latest security plugins for that version and install any that are applicable to your environment.

These five techniques are not the only ones used by spammers to take advantage of your environment, but they are some of the most common. To protect yourself from becoming a source of spam, make your servers a more difficult target to exploit. To stop spam, you need to know spam. Now that you know spam, it's time to stop it. Ask questions, test your environment regularly and watch your logs for any unexplained usage.


October 8, 2012

Don't Let Your Success Bring You Down

Last week, I got an email from a huge technology conference about their new website, exciting new speaker lineup and the availability of early-bird tickets. I clicked on a link from that email, and I found that their fancy new website was down. After giving up on getting my early-bird discount, I surfed over to Facebook, and I noticed a post from one of my favorite blogs, Dutch Cowboys, about another company's interesting new product release. I clicked the link to check out the product, and THAT site was down, too. It's painfully common for some of the world's most popular sites and applications to buckle under the strain of their own success ... Just think back to when Diablo III was launched: Demand crushed their servers on release day, and the gamers who waited patiently to get online with their copy turned to the world of social media to express their visceral anger about not being able to play the game.

The question everyone asks is why this kind of thing still happens. To a certain extent, the reality is that most entrepreneurs don't know what they don't know. I spoke with a woman who was going to be featured on BBC's Dragons' Den, and she said that the traffic from the show's viewers crippled most (if not all) of the businesses that were presented on the program. She needed to safeguard against that happening to her site, and she didn't know how to do that.

Fortunately, it's pretty easy to keep sites and applications online with on-demand infrastructure and auto-scaling tools. Unfortunately, most business owners don't know how easy it is, so they don't take advantage of the resources available to them. Preparing a website, game or application for its own success doesn't have to be expensive or time consuming. With pay-for-what-you-use pricing and "off the shelf" cloud management solutions, traffic-caused outages do NOT have to happen.

First impressions are extremely valuable, and if I wasn't really interested in that conference or the new product Dutch Cowboys blogged about, I'd probably never go back to those sites. Most Internet visitors would not. I cringe to think about the potential customers lost.

Businesses spend a lot of time and energy on user experience and design, and they don't think to devote the same level of energy to their infrastructure. In the '90s, sites crashing or slowing was somewhat acceptable since the interwebs were exploding beyond the available infrastructure's capabilities. Now, there's no excuse.

If you're launching a new site, product or application, how do you get started?

The first thing you need to do is understand what resources you need and where the potential bottlenecks are when hundreds, thousands or even millions of people want to see what you're launching. You don't need to invest in infrastructure to accommodate all of that traffic, but you need to know how you can add that infrastructure when you need it.

One of the easiest ways to prepare for your own success without getting bogged down by the bits and bytes is to take advantage of resources from some of our technology partners (and friends). If you have a PHP, Ruby on Rails or Node.js application, Engine Yard will help you deploy and manage a specialized hosting environment. When you need a little more flexibility, RightScale's cloud management product lets you easily manage your environment in "a single integrated solution for extreme efficiency, speed and control." If your biggest concern is your database's performance and scalability, Cloudant has an excellent cloud database management service.

Invest a little time in getting ready for your success, and you won't need to play catch-up when that success comes to you. Given how easy it is to prepare and protect your hosting environment these days, outages should go the way of the 8-track player.


August 27, 2012

IPv4 v. IPv6 - What's the Difference?

About a year ago, Phil Jackson and I recorded a podcast-esque click-through of a presentation that explained the difference between IPv4 and IPv6 address space, and as a testament to the long-tail nature of blog posts, Internet Society's Deploy360 Blog shared the video. With a hint of nostalgia, I clicked "play" on the video.

I laughed. I cried. I found it informative. I noticed a few places where it could have been better.

We recorded the video in response to a tweet from one of our Twitter followers, and the off-the-cuff dialog wound up being somewhere in between "accessible, informative and funny" and "overly detailed, too long and obviously improvised." Because there aren't many people who want to listen to two guys give a 15-minute presentation on IP addresses when they could be watching a Songified review of Five Guys Burgers and Fries or an epic data center tour, I thought I'd distill the information from the video into a quick blog post that spells out some of the major distinctions between IPv4 and IPv6 so you can scan it, interject your own "witty" banter and have your favorite YouTube viral video playing in the background.

IP Address Overview

An IP address is like a telephone number or a street address. When you connect to the Internet, your device (computer, smartphone, tablet) is assigned an IP address, and any site you visit has an IP address. The IP addressing system we've been using since the birth of the Internet is called IPv4, and the new addressing system is called IPv6. The reason we have to supplement the IPv4 address system (and ultimately eclipse it) with IPv6 is that the Internet is running out of available IPv4 address space, and IPv6 provides an exponentially larger pool of IP addresses ... Let's look at the numbers:

  • Total IPv4 Space: 4,294,967,296 addresses
  • Total IPv6 Space: 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses

Even saying the IPv6 space is "exponentially larger" doesn't really paint the picture of the difference in size.

IPv4 Addresses

To understand why the IPv4 address space is limited to four billion addresses, we can break down an IPv4 address. An IPv4 address is a 32-bit number made up of four octets (8-bit numbers) in decimal notation, separated by periods. A bit can either be a 1 or a 0 (two possibilities), so an octet has 2^8 distinct possibilities — 256 of them, to be exact. Because we start numbering at 0, the possible values of an octet in an IPv4 address run from 0 to 255.

Examples of IPv4 addresses:,,

If an IPv4 address is made up of four sections with 256 possibilities in each section, to find the total number of possibilities in the entire IPv4 pool, you'd just multiply 256*256*256*256 to get to the 4,294,967,296 number. To look at it another way, you've got 32 bits, so 2^32 will get you to the same total.

IPv6 Addresses

IPv6 addresses are based on 128 bits. Using the same math as above, we can take 2^128 and find the total IPv6 address pool (which I won't copy again here because it takes up too much space). Because the IPv6 pool is so much larger than the IPv4 pool, it'd be much more difficult to define the space in the same decimal notation ... you'd have 2^32 possibilities in each section.

To allow for that massive IPv6 pool to be used a little more easily, IPv6 addresses are broken down into eight 16-bit sections, separated by colons. Because each section is 16 bits, it can have 2^16 variations (65,536 distinct possibilities). Using decimal numbers between 0 and 65,535 would still be pretty long-winded, so IPv6 addresses are expressed with hexadecimal notation (16 different characters: 0-9 and a-f).

Example of an IPv6 address: 2607:f0d0:4545:3:200:f8ff:fe21:67cf

That's still a mouthful, but it's a little more manageable than the decimal alternatives.

CIDR Slash (/) Notation

When people talk about blocks of IP addresses, they generally use CIDR slash (/) notation, where a block might look like this: ... When you glance at that number, you might assume, "Okay, so you have through," but CIDR notation is not showing you the range of addresses; it's telling you the size of the "network" part of the allocation.

IP addresses are made up of two parts — the network and the host. The "network" part of the address tells us the number of bits that stay the same at the beginning of the block of IPs, while the "host" part of the address is made up of the bits that define the different possible IP addresses in the block. In CIDR notation, a /24 is telling us that the first 24 bits of the address are defined by the network, so we have 8 bits (32 total bits minus 24 network bits) in the host — 2^8 is 256 distinct addresses. The IPv4 address block includes through

IPv4 address blocks can be as large as a /8 (given to regional registries like ARIN and APNIC), and they can be as small as a /32 (which is a single IP address).
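If you want to check the math yourself, the host-count calculation is simple enough to run in a shell; the /24 below is just the example from above.

    # Host bits = 32 total bits - 24 network bits, and 2^8 = 256 addresses in a /24
    prefix=24
    echo $(( 2 ** (32 - prefix) ))    # prints 256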

Why Provision So Many IPv6 Addresses?

When SoftLayer provisions an IPv6 address block on a server, we give a /64 block of IPv6 addresses ... Or 18,446,744,073,709,551,616 IPv6 addresses to each server. That number seems excessive, but the /64 block size is the "smallest" IPv6 allocation block.

Providers like SoftLayer are allocated /32 blocks of IPv6 addresses. The difference between a /32 and a /64 is 32 bits (2^32) ... Bonus points if you can remember where you've seen that number before. What that means is that SoftLayer is given a block of IP addresses so large that we could provision 4,294,967,296 /64 blocks of IPv6 addresses ... Or put more remarkably: In one /32 block of IPv6 space, there are the same number of /64 blocks of IPv6 addresses as there are TOTAL IPv4 addresses.

So while it's pretty much impossible to use a full /64 of IPv6 addresses on a server, it's equally difficult for SoftLayer to burn through its /32 block.

So Now What?

IPv4 space is running out quickly. If your site isn't running a dual-stack IPv6 configuration yet, it's possible that you're going to start missing traffic from users who are only able to access the Internet over IPv6 (which is not backwards compatible with IPv4). If your Internet Service Provider (ISP) doesn't support IPv6 yet, you won't be able to access websites that are broadcast only with IPv6 addresses.

The percentage of instances of each of those cases is relatively small, but it's only going to get larger ... And it only takes one missed customer to make you regret not taking the steps to incorporate IPv6 into your infrastructure.


August 17, 2012

SoftLayer Private Clouds - Provisioning Speed

SoftLayer Private Clouds are officially live, and that means you can now order and provision your very own private cloud infrastructure on Citrix CloudPlatform quickly and easily. Chief Scientist Nathan Day introduced private clouds on the blog when it was announced at Cloud Expo East, and CTO Duke Skarda followed up with an explanation of the architecture powering SoftLayer Private Clouds. The most amazing claim: You can order a private cloud infrastructure and spin up its first virtual machines in a matter of hours rather than days, weeks or months.

If you've ever looked at building your own private cloud in the past, the "days, weeks or months" timeline isn't very surprising — you have to get the hardware provisioned, the software installed and the network configured ... and it all has to work together. Hearing that SoftLayer Private Clouds can be provisioned in "hours" probably seems too good to be true to administrators who have tried building a private cloud in the past, so I thought I'd put it to the test by ordering a private cloud and documenting the experience.

At 9:30am, I walked over to Phil Jackson's desk and asked him if he would be interested in helping me out with the project. By 9:35am, I had him convinced (proof), and the clock was started.

When we started the order process, part of our work was already done for us:

SoftLayer Private Clouds

To guarantee peak performance of the CloudPlatform management server, SoftLayer selected the hardware for us: A single processor quad core Xeon 5620 server with 6GB RAM, GigE, and two 2.0TB SATA II HDDs in RAID1. With the management server selected, our only task was choosing our host server and where we wanted the first zone (host server and management server) to be installed:

SoftLayer Private Clouds

For our host server, we opted for a dual processor quad core Xeon 5504 with the default specs, and we decided to spin it up in DAL05. We added (and justified) a block of 16 secondary IP addresses for our first zone, and we submitted the order. The time: 9:38am.

At this point, it would be easy for us to game the system to shave off a few minutes from the provisioning process by manually approving the order we just placed (since we have access to the order queue), but we stayed true to the experiment and let it be approved as it normally would be. We didn't have to wait long:

SoftLayer Private Clouds

At 9:42am, our order was approved, and the pressure was on. How long would it take before we were able to log into the CloudStack portal to create a virtual machine? I'd walked over to Phil's desk 12 minutes earlier, and we still had to get two physical servers online and configured to work with each other on CloudPlatform. Luckily, the automated provisioning process took on the brunt of that pressure.

Both server orders were sent to the data center, and the provisioning system selected two pieces of hardware that best matched what we needed. Our exact configurations weren't available, so an SBT in the data center was dispatched to make the appropriate hardware changes to meet our needs, and the automated system kicked into high gear. IP addresses were assigned to the management and host servers, and we were able to monitor each server's progress in the customer portal. The hardware was tested and prepared for OS install, and when it was ready, the base operating systems were loaded — CentOS 6 on the management server and Citrix XenServer 6 on the host server. After CentOS 6 finished provisioning on the management server, CloudStack was installed. Then we got an email:

SoftLayer Private Clouds

At 11:24am, less than two hours from when I walked over to Phil's desk, we had two servers online and configured with CloudStack, and we were ready to provision our first virtual machines in our private cloud environment.

We logged into CloudStack and added our first instance:

SoftLayer Private Clouds

We configured our new instance in a few clicks, and we clicked "Launch VM" at 11:38am. It came online in just over 3 minutes (11:42am):

SoftLayer Private Clouds

I got from "walking to Phil's desk" to having a multi-server private cloud infrastructure running a VM in exactly two hours and twelve minutes. For fun, I created a second VM on the host server, and it was provisioned in 31.7 seconds. It's safe to say that the claim that SoftLayer takes "hours" to provision a private cloud has officially been confirmed, but we thought it would be fun to add one more wrinkle to the system: What if we wanted to add another host server in a different data center?

From the "Hardware" tab in the SoftLayer portal, we selected "Add Zone" to from the "Actions" in the "Private Clouds" section, and we chose a host server with four portable IP addresses in WDC01. The zone was created, and the host server went through the same hardware provisioning process that our initial deployment went through, and our new host server was online in < 2 hours. We jumped into CloudStack, and the new zone was created with our host server ready to provision VMs in Washington, D.C.

Given how quick the instances were spinning up in the first zone, we timed a few in the second zone ... The first instance was online in about 4 minutes, and the second was running in 26.8 seconds.

SoftLayer Private Clouds

By the time I went out for a late lunch at 1:30pm, we'd spun up a new private cloud infrastructure with geographically dispersed zones that launched new cloud instances in under 30 seconds. Not bad.

Don't take my word for it, though ... Order a SoftLayer Private Cloud and see for yourself.


July 27, 2012

SoftLayer 'Cribs' ≡ DAL05 Data Center Tour

The highlight of any customer visit to a SoftLayer office is always the data center tour. The infrastructure in our data centers is the hardware platform on which many of our customers build and run their entire businesses, so it's not surprising that they'd want a first-hand look at what's happening inside the DC. Without exception, visitors are impressed when they walk out of a SoftLayer data center pod ... even if they've been in dozens of similar facilities in the past.

What about the customers who aren't able to visit us, though? We can post pictures, share stats, describe our architecture and show you diagrams of our facilities, but those mediums can't replace the experience of an actual data center tour. In the interest of bridging the "data center tour" gap for customers who might not be able to visit SoftLayer in person (or who want to show off their infrastructure), we decided to record a video data center tour.

If you've seen "professional" video data center tours in the past, you're probably positioning a pillow on top of your keyboard right now to protect your face if you fall asleep from boredom when you hear another baritone narrator voiceover and see CAD mock-ups of another "enterprise class" facility. Don't worry ... That's not how we roll:

Josh Daley — whose role as site manager of DAL05 made him the ideal tour guide — did a fantastic job, and I'm looking forward to feedback from our customers about whether this data center tour style is helpful and/or entertaining.

If you want to see more videos like this one, "Like" it, leave comments with ideas and questions, and share it wherever you share things (Facebook, Twitter, your refrigerator, etc.).


July 19, 2012

The Human Element of SoftLayer - DAL05 DC Operations

One of the founding principles of SoftLayer is automation. Automation has enabled this company to provide our customers with a world class experience, and it enables employees to provide excellent service. It allows us to quickly deploy a variety of solutions at the click of a button, and it guarantees consistency in the products that we deliver. Automation isn't the whole story, though. The human element plays a huge role in SoftLayer's success.

As a Site Manager for the corporate facility, I thought I could share a unique perspective on what that human element looks like, specifically through the lens of the Server Build Team's responsibilities. You recently heard how my colleague, Broc Chalker, became an SBT, and so I wanted to take it a step further by providing a high-level breakdown of how the Server Build Team enables SoftLayer to keep up with the operational demands of a rapidly growing, global infrastructure provider.

The Server Build Team is responsible for filling all of the beautiful data center environments you see in pictures and videos of SoftLayer facilities. Every day, they are in the DC, building out new rows for inventory. It sounds simple, but it's actually a pretty involved process. When it comes to prepping new rows, our primary focus is redundancy (for power, cooling and network). Each rack is fed by dual power sources and connected to four switches in a stacked configuration (two for the public network, two for the private network), plus an additional switch that provides KVM access to the servers. To make it possible to fill the rack with servers, we also have to make sure it's organized well, and that takes a lot of time. Just watch the video of the Go Live Crew cabling a server rack in SJC01, and you can see how time- and labor-intensive the process is. And if there are any mistakes or if the cables don't look clean, we'll cut all the ties and start over again.


In addition to preparing servers for new orders, SBTs also handle hardware-related requests. This can involve anything from changing out components for a build and performing upgrades or maintenance on active servers to troubleshooting servers. Any one of these requests has to be treated with significant urgency and attention to detail.


The responsibilities do not end there. Server Build Technicians also perform a walk of the facility twice per shift. During this walk, technicians check for visual alerts on the servers and do a general facility check of all SoftLayer pods. Note: Each data center facility features one or more pods or "server rooms," each built to the same specifications to support up to 5,000 servers.


The DAL05 facility has a total of four pods, and at the end of the build-out, we should be running 18,000-20,000 servers in this facility alone. Over the past year, we completed the build out of SR02 and SR03 (pod 2 and 3, respectively), and we're finishing the final pod (SR04) right now. We've spent countless hours building servers and monitoring operating system provisions when new orders roll in, and as our server count increases, our team has grown to continue providing the support our existing customers expect and deserve when it comes to upgrade requests and hardware-related support tickets.


To be successful, we have to stay ahead of the game from an operations perspective. The DAL05 crew is working hard to build out this facility's last pod (SR04), but for the sake of this blog post, I pulled everyone together for a quick photo op to introduce you to the team.

DAL05 Day / Evening Team and SBT Interns (with the remaining racks to build out in DAL05):
DAL05 DC Ops

DAL05 Overnight Server Build Technician Team:
DAL05 DC Ops

Let us know if there's ever anything we can do to help you!

