Posts Tagged 'Architecture'

July 24, 2013

Deconstructing SoftLayer's Three-Tiered Network

When Sun Microsystems VP John Gage coined the phrase, "The network is the computer," the idea was more wishful thinking than it was profound. At the time, personal computers were just starting to show up in homes around the country, and most users were getting used to the notion that "The computer is the computer." In the '80s, the only people talking about networks were the ones selling network-related gear, and the idea of "the network" was a little nebulous and vaguely understood. Fast-forward a few decades, and Gage's assertion has proven to be prophetic ... and it happens to explain one of SoftLayer's biggest differentiators.

SoftLayer's hosting platform features an innovative, three-tier network architecture: Every server in a SoftLayer data center is physically connected to public, private and out-of-band management networks. This "network within a network" topology provides customers the ability to build out and manage their own global infrastructure without overly complex configurations or significant costs, but the benefits of this setup are often overlooked. To best understand why this network architecture is such a game-changer, let's examine each of the network layers individually.


Public Network

When someone visits your website, they are accessing content from your server over the public network. This network connection is standard issue from every hosting provider since your content needs to be accessible to your users. When SoftLayer was founded in 2005, we were the first hosting provider to include multiple network connections by default. At the time, some of our competitors offered one-off private network connections between servers in a rack or a single data center phase, but those competitors built their legacy infrastructures around an all-purpose public network connection. SoftLayer offers public network connection speeds up to 10Gbps, and every bare metal server you order from us includes free inbound bandwidth and 5TB of outbound bandwidth on the public network.

Private Network

When you want to move data from one server to another in any of SoftLayer's data centers, you can do so quickly and easily over the private network. Bandwidth between servers on the private network is unmetered and free, so you don't incur any costs when you transfer files from one server to another. Having a dedicated private network allows you to move content between servers and facilities without fighting against or getting in the way of the users accessing your server over the public network.

It should come as no surprise to learn that all private network traffic stays on SoftLayer's network exclusively when it travels between our facilities. The blue lines in this image show how the private network connects all of our data centers and points of presence:

SoftLayer Private Network

To fully replicate the functionality provided by the SoftLayer private network, competitors with legacy single-network architecture would have to essentially double their networking gear installation and establish safeguards to guarantee that customers can only access information from their own servers via the private network. Because that process is pretty daunting (and expensive), many of our competitors have opted for "virtual" segmentation that logically links servers to each other. The traffic between servers in those "virtual" private networks still travels over the public network, so they usually charge you for "private network" bandwidth at the public bandwidth rate.

Out-of-Band Management Network

When it comes to managing your server, you want an unencumbered network connection that will give you direct, secure access when you need it. Splitting out the public and private networks into distinct physical layers provides significant flexibility when it comes to delivering content where it needs to go, but we saw a need for one more unique network layer. If your server is targeted for a denial of service attack or a particular ISP fails to route traffic to your server correctly, you're effectively locked out of your server if you don't have another way to access it. Our management-specific network layer uses bandwidth providers that aren't included in our public/private bandwidth mix, so you're taking a different route to your server, and you're accessing the server through a dedicated port.

If you've seen pictures or video from a SoftLayer data center (or if you've competed in the Server Challenge), you probably noticed the three different colors of Ethernet cables connected at the back of every server rack, and each of those colors carries one of these types of network traffic exclusively. The pink/red cables carry public network traffic, the blue cables carry private network traffic, and the green cables carry out-of-band management network traffic. All thirteen of our data centers use the same colored cables in the same configuration doing the same jobs, so we're able to train our operations staff consistently across every facility. That consistency enables us to provide quicker service when you need it, and it lessens the chance of human error on the data center floor.

The most powerful server on the market can be sidelined by a poorly designed, inefficient network. If "the network is the computer," the network should be a primary concern when you select your next hosting provider.

-@khazard

July 9, 2013

When to Consider Riak for Your Big Data Architecture

In my Breaking Down 'Big Data' – Database Models post, I briefly covered the most common database models, their strengths, and how they handle the CAP theorem — how a distributed storage system balances demands of consistency and availability while maintaining partition tolerance. Here's what I said about Dynamo-inspired databases:

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

This type of key/value store architecture is very different from the document-oriented MongoDB solutions we launched at the end of last year, so we worked with Basho to prioritize development of high-performance Riak solutions on our global platform. Since you already know about MongoDB, let's take a few minutes to meet the new kid on the block.

Riak is a distributed database architected for availability, fault tolerance, operational simplicity and scalability. Riak is masterless, so each node in a Riak cluster is the same and contains a complete, independent copy of the Riak package. This design makes the Riak environment highly fault tolerant and scalable, and it also aids in replication — if a node goes down, you can still read, write and update data.
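
To see what that looks like from an application's perspective, here's a minimal sketch using Basho's Python client for Riak. The host, port, bucket and key names are placeholders for illustration, not part of any specific SoftLayer configuration:

    # A minimal sketch using Basho's Python client for Riak.
    # Connection details, bucket and key names are illustrative only.
    import riak

    # Connect to any node in the cluster -- Riak is masterless,
    # so every node can accept reads and writes.
    client = riak.RiakClient(protocol='pbc', host='10.0.0.10', pb_port=8087)

    bucket = client.bucket('user_profiles')

    # Store a value; Riak persists it to several nodes before the
    # write is reported as successful.
    profile = bucket.new('user:1001', data={'name': 'Ada', 'plan': 'enterprise'})
    profile.store()

    # Read it back -- any node can serve the request, even if the node
    # that originally accepted the write is down.
    fetched = bucket.get('user:1001')
    print(fetched.data)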

As you approach the daunting prospect of choosing a big data architecture, there are a few simple questions you need to answer:

  1. How much data do/will I have?
  2. In what format am I storing my data?
  3. How important is my data?

Riak may be the choice for you if [1] you're working with more than three terabytes of data, [2] your data is stored in multiple data formats, and [3] your data must always be available. What does that kind of need look like in real life, though? Luckily, we've had a number of customers kick Riak's tires on SoftLayer bare metal servers, so I can share a few of the use cases we've seen that have benefited significantly from Riak's unique architecture.

Use Case 1 – Digital Media
An advertising company that serves over 10 billion ads per month must be able to quickly deliver its content to millions of end users around the world. Meeting that demand with relational databases would require a complex configuration of expensive, vertically scaled hardware, but the workload can be scaled out horizontally much more easily with Riak. In a matter of only a few hours, the company is up and running with an ad-serving infrastructure that includes a back-end Riak cluster in Dallas with a replication cluster in Singapore along with an application tier on the front end with Web servers, load balancers and CDN.

Use Case 2 – E-commerce
An e-commerce company needs 100-percent availability. If any part of a customer's experience fails, whether it be on the website or in the shopping cart, sales are lost. Riak's fault tolerance is a big draw for this kind of use case: Even if one node or component fails, the company's data is still accessible, and the customer's user experience is uninterrupted. The shopping cart structure is critical, and Riak is built to be available ... It's a perfect match.

As an additional safeguard, the company can take advantage of simple multi-datacenter replication in their Riak Enterprise environment to geographically disperse content closer to its customers (while also serving as an important tool for disaster recovery and backup).

Use Case 3 – Gaming
With customers like Broken Bulb and Peak Games, SoftLayer is no stranger to the gaming industry, so it should come as no surprise that we've seen interesting use cases for Riak from some of our gaming customers. When a game developer incorporated Riak into a new game to store player data like user profiles, statistics and rankings, the performance of the bare metal infrastructure blew him away. As a result, the game's infrastructure was redesigned to also pull gaming content like images, videos and sounds from the Riak database cluster. Since the environment is so easy to scale horizontally, the process on the infrastructure side took no time at all, and the multimedia content in the game is getting served as quickly as the player data.

Databases are common bottlenecks for many applications, but they don't have to be. Making the transition from scaling vertically (upgrading hardware, adding RAM, etc.) to scaling horizontally (spreading the work intelligently across multiple nodes) alleviates many of the pain points for a quickly growing database environment. Have you made that transition? If not, what's holding you back? Have you considered implementing Riak?

-@marcalanjones

April 30, 2013

Big Data at SoftLayer: Riak

Big data is only getting bigger. Late last year, SoftLayer teamed up with 10gen to launch a high-performance MongoDB solution, and since then, many of our customers have been clamoring for us to support other big data platforms in the same way. By automating the provisioning process of a complex big data environment on bare metal infrastructure, we made life a lot easier for developers who demanded performance and on-demand scalability for their big data applications, and it's clear that our simple formula produced amazing results. As Marc mentioned when he started breaking down big data database models, document-oriented databases like MongoDB are phenomenal for certain use cases, and in other situations, a key-value store might be a better fit. With that in mind, we called up our friends at Basho and started building a high-performance architecture specifically for Riak ... And I'm excited to announce that we're launching it today!

Riak is an open source, distributed database platform based on the principles enumerated in Amazon's Dynamo paper. It uses a simple key/value model for object storage, and it was architected for high availability, fault tolerance, operational simplicity and scalability. A Riak cluster is composed of multiple nodes that are all connected, all communicating and sharing data automatically. If one node were to fail, the other nodes would automatically share the data that the failed node was storing and processing until that node is back up and running or a new node is added. See the diagram below for a simple illustration of how adding a node to a cluster works within Riak.

Riak Nodes

We will support both the open source and the Enterprise versions of Riak. The open source version is a great place to start. It has all of the database functionality of Riak Enterprise, but it is limited to a single cluster. The Enterprise version supports replication between clusters across data centers, giving you lots of architectural options. You can use replication to build highly available, live-live failover applications. You can also use it to distribute your application's data across regions, giving you a global platform that you can update anywhere in the world and know that those modifications will be available anywhere else. Riak Enterprise customers also receive 24×7 coverage, both from SoftLayer and Basho. This includes SoftLayer's one-hour guaranteed response for Severity 1 hardware issues and unlimited support available via our secure web portal, email and phone.

The business use case for this flexibility is simple: If you need to scale up or down, nodes can easily be added or removed as your requirements change. You can opt for a single-data center environment with a few nodes, or you can broaden your architecture to a multi-data center deployment with a 40-node cluster. While these capabilities are inherent in Riak, they can be complicated to build and configure, so we spent countless hours working with Basho to streamline Riak deployment on the SoftLayer platform. The fruit of that labor can be found in our Riak Solution Designer:

Riak Solution Designer

The server configurations and packages in the Riak Solution Designer have been selected to deliver the performance, availability and stability that our customers expect from their bare metal and virtual cloud infrastructure at SoftLayer. With a few quick clicks, you can order a fully configured Riak environment, and it'll be provisioned and online for you in two to four hours. And everything you order is on a month-to-month contract.

Thanks to the hard work done by the SoftLayer development group and Basho's team, we're proud to be the first in the marketplace to offer a turn-key Riak solution on bare metal infrastructure. You don't need to sacrifice performance and agility for simplicity.

For more information, visit SoftLayer.com/Riak or contact our sales team.

-Duke

December 6, 2012

MongoDB: Architectural Best Practices

With the launch of our MongoDB solutions, developers can provision powerful, optimized, horizontally scaling NoSQL database clusters in real-time on bare metal infrastructure in SoftLayer data centers around the world. We worked tirelessly with our friends at 10gen — the creators of MongoDB — to build and tweak hardware and software configurations that enable peak MongoDB performance, and the resulting platform is pretty amazing. As Duke mentioned in his blog post, those efforts followed 10gen's MongoDB best practices, but what he didn't mention was that we created some architectural best practices of our own for MongoDB deployments on our platform.

The MongoDB engineered servers that you order from SoftLayer already implement several of the recommendations you'll see below, and I'll note which have been incorporated as we go through them. Given the scope of the topic, it's probably easiest to break down this guide into a few sections to make it a little more digestible. Let's take a look at the architectural best practices of running MongoDB through the phases of the roll-out process: Selecting a deployment strategy to prepare for your MongoDB installation, the installation itself, and the operational considerations of running it in production.

Deployment Strategy

When planning your MongoDB deployment, you should follow Sun Tzu's (modified) advice: "If you know the [friend] and know yourself, you need not fear the result of a hundred battles." "Friend" was substituted for the "enemy" in this advice because the other party is MongoDB. If you aren't familiar with MongoDB, the top of your to-do list should be to read MongoDB's official documentation. That information will give you the background you'll need as you build and use your database. When you feel comfortable with what MongoDB is all about, it's time to "know yourself."

Your most important consideration will be the current and anticipated sizes of your data set. Understanding the volume of data you'll need to accommodate will be the primary driver for your choice of individual physical nodes as well as your sharding plans. Once you've established an expected size of your data set, you need to consider the importance of your data and how tolerant you are of the possibility of lost or lagging data (especially in replicated scenarios). With this information in hand, you can plan and start testing your deployment strategy.

It sounds a little strange to hear that you should test a deployment strategy, but when it comes to big data, you want to make sure your databases start with a strong foundation. You should perform load testing scenarios on a potential deployment strategy to confirm that a given architecture will meet your needs, and there are a few specific areas that you should consider:

Memory Sizing
MongoDB (like many data-oriented applications) works best when the data set can reside in memory. Nothing performs better than a MongoDB instance that does not require disk I/O. Whenever possible, select a platform that has more available RAM than your working data set size. If your data set exceeds the available RAM for a single node, then consider using sharding to increase the amount of available RAM in a cluster to accommodate the larger data set. This will maximize the overall performance of your deployment. If you notice page faults when you put your database under production load, they may indicate that you are exceeding the available RAM in your deployment.
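
If you'd rather check for page faults programmatically than watch logs, the serverStatus command exposes a running counter. Here's a minimal sketch with the PyMongo driver; the connection string is a placeholder, and the extra_info.page_faults field is reported on Linux builds of MongoDB:

    # Spot-check page-fault activity via the serverStatus command.
    # The connection string is a placeholder; page_faults appears under
    # extra_info on Linux builds of MongoDB.
    from pymongo import MongoClient

    client = MongoClient('mongodb://10.0.0.20:27017')
    status = client.admin.command('serverStatus')

    faults = status.get('extra_info', {}).get('page_faults')
    print('page faults since startup:', faults)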

Disk Type
If speed is not your primary concern or if you have a data set that is far larger than any available in-memory strategy can support, selecting the proper disk type for your deployment is important. IOPS will be key in selecting your disk type, and obviously, the higher the IOPS, the better MongoDB will perform. Local disks should be used whenever possible (as network storage can cause high latency and poor performance for your deployment). It's also advised that you use RAID 10 when creating disk arrays.

To give you an idea of what kind of IOPS to expect from a given type of drive, these are the approximate ranges of IOPS per drive in SoftLayer MongoDB engineered servers:

SATA II – 100-200 IOPS
15K SAS – 300-400 IOPS
SSD – 7,000-8,000 IOPS (read) 19,000-20,000 IOPS (write)

CPU
Clock speed and the number of available processors become a consideration if you anticipate using MapReduce. It has also been noted that when running a MongoDB instance with the majority of the data in memory, clock speed can have a major impact on overall performance. If you are planning to use MapReduce or you're able to operate with a majority of your data in memory, consider a deployment strategy that includes a CPU with a high clock/bus speed to maximize your operations per second.

Replication
Replication provides high availability of your data if a node fails in your cluster. It should be standard to replicate with at least three nodes in any MongoDB deployment. The most common configuration for replication with three nodes is a 2x1 deployment — two nodes in a primary data center with a backup node in a secondary data center:

MongoDB Replication
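
On the application side, connecting to the replica set (rather than to a single node) is what lets reads and writes survive a node failure. Here's a minimal PyMongo sketch; the hostnames and replica set name are placeholders:

    # Connecting to a three-node replica set; hostnames and the replica
    # set name ('rs0') are placeholders for illustration.
    from pymongo import MongoClient

    client = MongoClient(
        'mongodb://node-a:27017,node-b:27017,node-c:27017/?replicaSet=rs0'
    )

    # The driver discovers the current primary and routes writes to it;
    # if that node fails, a new primary is elected and writes resume.
    db = client.appdata
    db.sessions.insert_one({'user_id': 1001, 'active': True})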

Sharding
If you anticipate a large, active data set, you should deploy a sharded MongoDB deployment. Sharding allows you to partition a single data set across multiple nodes. You can allow MongoDB to automatically distribute the data across nodes in the cluster or you may elect to define a shard key and create range-based sharding for that key.

Sharding may also help write performance, so you can elect to shard even if your data set is small but requires a high number of updates or inserts. It's important to note that when you deploy a sharded set, MongoDB will require three (and only three) config server instances, which are specialized Mongo runtimes that track the current shard configuration. Loss of one of these nodes will cause the cluster to go into a read-only mode (for the configuration only) and will require that all nodes be brought back online before any configuration changes can be made.
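
To make that concrete, the setup for range-based sharding boils down to two admin commands issued against a mongos router. Here's a minimal PyMongo sketch; the database, collection and shard key names are placeholders:

    # Enable sharding and define a range-based shard key.
    # These commands are issued against a mongos router; the database,
    # collection and key names here are illustrative only.
    from pymongo import MongoClient

    mongos = MongoClient('mongodb://mongos01:27017')

    mongos.admin.command('enableSharding', 'appdata')
    mongos.admin.command(
        'shardCollection', 'appdata.events',
        key={'created_at': 1}  # range-based sharding on a timestamp field
    )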

Write Safety Mode
There are several write safety modes that govern how MongoDB will handle the persistence of the data to disk. It is important to consider which mode best fits your needs for both data integrity and performance. The following write safety modes are available:

None – This mode provides a deferred, non-blocking writing strategy. It allows for high performance, but there is a small chance that data can be lost if a node fails. There is also the possibility that data written to one node in a cluster will not be immediately available on all nodes in that cluster for read consistency. The 'None' strategy also provides no protection in the case of network failures. That lack of protection makes this mode highly unreliable, so it should only be used when performance is a priority and data integrity is not a concern.

Normal – This is the default mode for MongoDB if you do not select any other. It provides a deferred, non-blocking writing strategy. It allows for high performance, but there is a small chance that data can be lost if a node fails, and data written to one node in a cluster may not be immediately available on all nodes in that cluster for read consistency.

Safe – This mode will block until MongoDB has acknowledged that it has received the write request but will not block until the write is actually performed. This provides a better level of data integrity and will ensure that read consistency is achieved within a cluster.

Journal Safe – Journals provide a recovery option for MongoDB. Using this mode will ensure that the data has been acknowledged and a Journal update has been performed before returning.

Fsync – This mode provides the highest level of data integrity and blocks until a physical write of the data has occurred. This comes with a degradation in performance and should be used only if data integrity is the primary concern for your application.
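
In the drivers, these modes roughly correspond to write concern options. Here's a minimal sketch of how the modes above translate in the PyMongo driver; option names have shifted across driver versions, so treat the mapping as illustrative rather than exact:

    # Rough mapping of the write safety modes above onto PyMongo write
    # concern options; the option names are those of recent driver
    # versions and are illustrative, not an exact one-to-one mapping.
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient('mongodb://10.0.0.20:27017')
    events = client.appdata.events

    # 'None'/'Normal': fire-and-forget -- don't wait for acknowledgement.
    events.with_options(write_concern=WriteConcern(w=0)).insert_one({'a': 1})

    # 'Safe': block until the server acknowledges the write request.
    events.with_options(write_concern=WriteConcern(w=1)).insert_one({'a': 2})

    # 'Journal Safe': block until the write is committed to the journal.
    events.with_options(write_concern=WriteConcern(w=1, j=True)).insert_one({'a': 3})

    # 'Fsync': request a flush to the data files (highest integrity, slowest).
    events.with_options(write_concern=WriteConcern(w=1, fsync=True)).insert_one({'a': 4})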

Testing the Deployment
Once you've determined your deployment strategy, test it with a data set similar to your production data. 10gen has several tools to help you with load testing your deployment, and the mongo shell includes a tool named 'benchRun' which can execute operations from within a JavaScript test harness. These tools will return operation information as well as latency numbers for each of those operations. If you require more detailed information about the MongoDB instance, consider using the mongostat command or MongoDB Monitoring Service (MMS) to monitor your deployment during the testing.

Installation

When performing the installation of MongoDB, a few considerations can help create both a stable and performance-oriented solution. 10gen recommends the use of CentOS (64-bit) as the base operating system if at all possible. If you try installing MongoDB on a 32-bit operating system, you might run into file size limits that cause issues, and if you feel the urge to install it on Windows, you'll see performance issues if virtual memory begins to be utilized by the OS to make up for a lack of RAM in your deployment. As a result, 32-bit operating systems and Windows operating systems should be avoided on MongoDB servers. SoftLayer provisions CentOS 6.X 64-bit operating systems by default on all of our MongoDB engineered server deployments.

When you've got CentOS 64-bit installed, you should also make the following changes to maximize your performance (all of which are included by default on all SoftLayer engineered servers):

Set SSD Read Ahead Defaults to 16 Blocks - SSD drives have excellent seek times, allowing the Read Ahead to be shrunk to 16 blocks. Spinning disks might require slight buffering, so these have been set to 32 blocks.

noatime - Adding the noatime option eliminates the need for the system to make writes to the file system for files which are simply being read — or in other words: Faster file access and less disk wear.

Turn NUMA Off in BIOS - Linux, NUMA and MongoDB tend not to work well together. If you are running MongoDB on NUMA hardware, we recommend turning NUMA off (running with an interleave memory policy). If you don't, problems will manifest in strange ways like massive slowdowns for periods of time or high system CPU time.

Set ulimit - We have set the ulimit to 64000 for open files and 32000 for user processes to prevent failures due to a loss of available file handles or user processes.

Use ext4 - We have selected ext4 over ext3. We found ext3 to be very slow in allocating files (or removing them). Additionally, access within large files is poor with ext3.

One last tip on installation: Make the Journal and Data volumes be distinct physical volumes. If the Journal and Data directories reside on a single physical volume, flushes to the Journal will interrupt the access of data and provide spikes of high latency within your MongoDB deployment.

Operations

Once a MongoDB deployment has been promoted to production, there are a few recommendations for monitoring and optimizing performance. You should always have the MMS agent running on all MongoDB instances to help monitor the health and performance of your deployment. This tool is also very useful if you have a 10gen MongoDB Cloud Subscription because it provides useful debugging data for the 10gen team during support interactions. In addition to MMS, you can use the mongostat command (mentioned in the deployment section) to see runtime information about the performance of a MongoDB node. If either of these tools flags performance issues, sharding or indexing are first-line options to resolve them:

Indexes - Indexes should be created for a MongoDB deployment if monitoring tools indicate that field-based queries are performing poorly. Always use indexes when you are querying data based on distinct fields to help boost performance (see the sketch after these items).

Sharding - Sharding can be leveraged when the overall performance of the node is suffering because of a large operating data set. Be sure to shard before you get into the red; the system only splits chunks for sharding on insert or update, so if you wait too long to shard, you may end up with uneven distribution for a period of time (or indefinitely, depending on your data set and shard key strategy).
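
For the indexing case, creating an index from the driver is a one-liner. Here's a minimal PyMongo sketch with illustrative database, collection and field names:

    # Create indexes for fields that monitoring shows are queried often.
    # Database, collection and field names are placeholders.
    from pymongo import ASCENDING, MongoClient

    client = MongoClient('mongodb://10.0.0.20:27017')
    db = client.appdata

    # Single-field index for queries that filter on user_id.
    db.events.create_index([('user_id', ASCENDING)])

    # Compound index for queries that filter on user_id and sort by created_at.
    db.events.create_index([('user_id', ASCENDING), ('created_at', ASCENDING)])

    print(db.events.index_information())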

I know it seems like we've covered a lot over the course of this blog post, but this list of best practices is far from exhaustive. If you want to learn more, the MongoDB forums are a great resource to connect with the rest of the MongoDB community and learn from their experiences, and the documentation on MongoDB's site is another phenomenal resource. The best people to talk to when it comes to questions about MongoDB are the folks at 10gen, so I also highly recommend taking advantage of MongoDB Cloud Subscriptions to get their direct support for your one-off questions and issues.

-Harold

December 5, 2012

Breaking Down 'Big Data' - Database Models

Forrester defines big data as "techniques and technologies that make capturing value from data at an extreme scale economical." Gartner says, "Big data is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets." Big data demands extreme horizontal scale that traditional IT management can't handle, and it's not a challenge exclusive to the Facebooks, Twitters and Tumblrs of the world ... Just look at the Google search volume for "big data" over the past eight years:

Big Data Search Interest

Developers are collectively facing information overload. As storage has become more and more affordable, it's easier to justify collecting and saving more data. Users are more comfortable with creating and sharing content, and we're able to track, log and index metrics and activity that previously would have been deleted in consideration of space restraints or cost. As the information age progresses, we are collecting more and more data at an ever-accelerating pace, and we're sharing that data at an incredible rate.

To understand the different facets of this increased usage and demand, Gartner came up with the three V's of big data that vary significantly from traditional data requirements: Volume, Velocity and Variety. Larger, more abundant pieces of data ("Volume") are coming at a much faster speed ("Velocity") in formats like media and walls of text that don't easily fit into a column-and-row database structure ("Variety"). Given those equally important factors, many of the biggest players in the IT world have been hard at work to create solutions that provide the scale and speed developers need when they build social, analytics, gaming, financial or medical apps with large data sets.

When we talk about scaling databases here, we're talking about scaling horizontally across multiple servers rather than scaling vertically by upgrading a single server — adding more RAM, increasing HDD capacity, etc. It's important to make that distinction because it leads to a unique challenge shared by all distributed computer systems: The CAP Theorem. According to the CAP theorem, a distributed storage system must choose to sacrifice either consistency (that everyone sees the same data) or availability (that you can always read/write) while maintaining partition tolerance (where the system continues to operate despite arbitrary message loss or the failure of part of the system).

Let's take a look at a few of the most common database models, what their strengths are, and how they handle the CAP theorem compromise of consistency v. availability:

Relational Databases

What They Do: Stores data in rows/columns. Parent-child records can be joined remotely on the server. Provides speed over scale. Some capacity for vertical scaling, poor capacity for horizontal scaling. This type of database is where most people start.
Horizontal Scaling: In a relational database system, horizontal scaling is possible via replication — sharing data between redundant nodes to ensure consistency — and some people have success with sharding — horizontal partitioning of data — but those techniques add a lot of complexity.
CAP Balance: Prefer consistency over availability.
When to Use: When you have highly structured data and you know what you'll be storing. Great when production queries will be predictable.
Example Products: Oracle, SQLite, PostgreSQL, MySQL

Document-Oriented Databases

What They Do: Stores data in documents. Parent-child records can be stored in the same document and returned in a single fetch operation with no join. The server is aware of the fields stored within a document, can query on them, and return their properties selectively.
Horizontal Scaling: Horizontal scaling is provided via replication, or replication + sharding. Document-oriented databases also usually support relatively low-performance MapReduce for ad-hoc querying.
CAP Balance: Generally prefer consistency over availability
When to Use: When your concept of a "record" has relatively bounded growth, and can store all of its related properties in a single doc.
Example Products: MongoDB, CouchDB, BigCouch, Cloudant

Key-Value Stores

What They Do: Stores an arbitrary value at a key. Most can perform simple operations on a single value. Typically, each property of a record must be fetched in multiple trips, with Redis being an exception. Very simple, and very fast.
Horizontal Scaling: Horizontal scale is provided via sharding.
CAP Balance: Generally prefer consistency over availability.
When to Use: Very simple schemas, caching of upstream query results, or extreme speed scenarios (like real-time counters)
Example Products: CouchBase, Redis, PostgreSQL HStore, LevelDB

BigTable-Inspired Databases

What They Do: Store data in column-oriented structures inspired by Google's BigTable paper. These databases have tunable CAP parameters and can be adjusted to prefer either consistency or availability, and both of the major implementations are somewhat operationally intensive.
Horizontal Scaling: Good speed and very wide horizontal scale capabilities.
CAP Balance: Prefer consistency over availability.
When to Use: When you need consistency and write performance that scales past the capabilities of a single machine. HBase in particular has been used with around 1,000 nodes in production.
Example Products: HBase, Cassandra (inspired by both BigTable and Dynamo)

Dynamo-Inspired Databases

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency.
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

Each of the database models has strengths and weaknesses, and there are huge communities that support each of the open source examples I gave in each model. If your database is a bottleneck or you're not getting the flexibility and scalability you need to handle your application's volume, velocity and variety of data, start looking at some of these "big data" solutions.

Tried any of the above models and have feedback that differs from ours? Leave a comment below and tell us about it!

-@marcalanjones

August 21, 2012

High Performance Computing - GPU v. CPU

Sometimes, technical conversations can sound like people are just making up tech-sounding words and acronyms: "If you want HPC to handle Gigaflops of computational operations, you probably need to supplement your server's CPU and RAM with a GPU or two." It's like hearing a shady auto mechanic talk about replacing gaskets on double overhead flange valves or hearing Chris Farley (in Tommy Boy) explain that he was "just checking the specs on the endline for the rotary girder" ... You don't know exactly what they're talking about, but you're pretty sure they're lying.

When we talk about high performance computing (HPC), a natural tendency is to go straight into technical specifications and acronyms, but that makes the learning curve steeper for people who are trying to understand why a solution is better suited for certain types of workloads than technology they are already familiar with. With that in mind, I thought I'd share a quick explanation of graphics processing units (GPUs) in the context of central processing units (CPUs).

The first thing that usually confuses people about GPUs is the name: "Why do I need a graphics processing unit on a server? I don't need to render the visual textures from Crysis on my database server ... A GPU is not going to benefit me." It's true that you don't need cutting-edge graphics on your server, but a GPU's power isn't limited to "graphics" operations. The "graphics" part of the name reflects the original intention for the kind of processing GPUs perform, but in the last ten years or so, developers and engineers have adapted that processing power for more general-purpose computing.

GPUs were designed with a highly parallel structure that allows large blocks of data to be processed at one time — similar computations are made on data at the same time (rather than in order). If you assigned the task of rendering a 3D environment to a CPU, it would slow to a crawl because it handles requests more linearly. Because GPUs are better at performing repetitive tasks on large blocks of data than CPUs, you start to see the benefit of enlisting a GPU in a server environment.

The Folding@home project and bitcoin mining are two of the most visible distributed computing projects that GPUs are accelerating, and they're perfect examples of workloads made exponentially faster with the parallel processing power of graphics processing units. You don't need to be folding proteins or mining bitcoin to get the performance benefits, though; if you are taxing your CPUs with repetitive compute tasks, a GPU could make your life a lot easier.

If that still doesn't make sense, I'll turn the floor over to the Mythbusters in a presentation for our friends at NVIDIA:

SoftLayer uses NVIDIA Tesla GPUs in our high performance computing servers, so developers can use "Compute Unified Device Architecture" (CUDA) to easily take advantage of their GPU's capabilities.

Hopefully, this quick rundown is helpful in demystifying the "technobabble" about GPUs and HPC ... As a quick test, see if this sentence makes more sense now than it did when you started this blog: "If you want HPC to handle Gigaflops of computational operations, you probably need to supplement your server's CPU and RAM with a GPU or two."

-Phil

October 4, 2011

An Introduction to Redis

I recently had the opportunity to get re-acquainted with Redis while evaluating solutions for a project on the Product Innovation team here at SoftLayer. I'd actually played with it a couple of times before, but this time it "clicked." Or my brain broke. Either way, I see a lot of potential for Redis now.

No one product is a perfect fit for all of your data storage needs, of course. There are such fundamental tradeoffs to be made in designing storage architectures that you should be immediately suspicious of any product that claims to fit every need.

The best solutions tend to be products that actually embrace these tradeoffs. Redis, for instance, has sacrificed a small amount of data durability in exchange for being awesome.

What is it?

Redis is a key/value store, but describing it that way is sort of like calling a helicopter a "vehicle." It's a technically correct description, but it leaves out some important stuff.

You can think of it like a sophisticated older brother of Memcached. Like Memcached, it presents a flat keyspace, and you can set those keys to string values. It also shares Memcached's ability to perform remote atomic operations, like "incr" and "append." These are really handy, because you can modify remote data without fetching it first, and you have an assurance that you're the only one performing that operation at that instant.

Redis takes this concept of remote commands on data and goes completely nuts with it. The database is aware of data structures like hashes, lists and sets in addition to simple string values. You can sort, union, intersect, slice and dice to your heart's content without fetching any data. Redis is a data structure server. You can treat it like remote memory, and this has an awesome immediate benefit for a programmer: your code and brain are already optimized for these data types.

But it's not just about making storage simpler. It's fast, too. Crazy fast. If you make intelligent use of its data structures, it's possible to serve a lot of traffic from relatively modest hardware. Redis 2.4 can easily handle ~50k list appends a second on my notebook. With batching, it can append 2 million items to a list on a remote host in about 1.28 seconds.
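
To make the "data structure server" idea concrete, here's a minimal sketch using the redis-py client (the zadd signature shown is the one used by recent versions of that client). The key names are placeholders, and the pipeline at the end is the kind of batching those numbers refer to:

    # A minimal sketch with the redis-py client; key names are illustrative.
    import redis

    r = redis.Redis(host='10.0.0.30', port=6379)

    # Remote atomic operations -- no fetch/modify/store round trip needed.
    r.incr('pageviews:home')
    r.append('log:today', 'user 1001 logged in\n')

    # Server-side data structures: keep a leaderboard in a sorted set and
    # read back the top three without pulling the whole set down.
    r.zadd('leaderboard', {'ada': 4200, 'grace': 3900, 'linus': 3100})
    print(r.zrevrange('leaderboard', 0, 2, withscores=True))

    # Batching with a pipeline cuts per-command network round trips, which
    # is where bulk-append numbers like the ones above come from.
    pipe = r.pipeline()
    for i in range(10000):
        pipe.rpush('queue:jobs', 'job-%d' % i)
    pipe.execute()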

It allows the remote, atomic and performant manipulation of data structures. It took me a little while to realize exactly how useful that is.

What's wrong with it?

Nothing. Move along.

OK, it's a little short on durability. Redis uses memory as its primary store and periodically flushes to disk. A common configuration is to do so every second.

That sounds pretty reasonable. If a server goes down, you could lose a second of data. Keep in mind, however, how many operations Redis can perform in a second. If you're in a high-volume environment, that could be a lot of data. It's not for your financial transactions.

Its availability options are also relatively limited: Currently, it only supports master/slave replication. Clustering support is planned for an upcoming release. It's looking pretty powerful, but it will take some real-world testing to know its performance impact.

These challenges should be taken into consideration, and it will probably be clear if you're in a situation where the current tradeoffs aren't a good fit.

In my experience, a lot of developers seriously overestimate the consequences of their application losing small amounts of data. Also consider whether or not the chance of losing a second (or less) of data genuinely represents a bigger threat to your application than any other compromises you might have made.

More Information
You can check out the slightly aging docs or browse the impressively simple source. There are probably already bindings for your language of choice as well.

-Tim

April 26, 2011

Hybrid Hosting - What Does it Really Mean?

In our first 3 Bars - 3 Questions video interview, SoftLayer CTO Duke Skarda talked about Hybrid Hosting with Kevin, and last week, I tackled the topic in a session at the Texas Technology Summit in Houston. If you have a few minutes and want to learn a little more about SoftLayer's take on hybrid computing and hybrid hosting, you can pull up a virtual chair and see my presentation here:

Even though hybrid hosting is relatively young, it has a great deal of potential. Unlike some of the hyped technologies and developments we hear about all the time, hybrid hosting isn't going to replace everything that came before it ... On the contrary, hybrid hosting encompasses everything that came before it, allowing for flexibility and functionality that you can't find in any of the individual component technologies.

We weren't able to record all of the questions and answers at the end of the session, but one of the most surprising themes I noticed was a misunderstanding of what "Cloud Infrastructure" meant. Those questions reminded me of a fantastic BrightTALK Cloud Infrastructure Online Summit that featured several interesting and informative sessions about how cloud computing is changing the way businesses are thinking about deploying and managing their IT infrastructure. I know it seems like we're preaching to the choir by posting this on the SoftLayer Blog, but take a look at the BrightTALK Summit's webcast topics to see if any would be helpful to you as you talk about this mysterious "cloud" thing.

-@toddmitchell
