Posts Tagged ‘Riak’

April 30, 2013

Big Data at SoftLayer: Riak

By in Cloud, Executive Blog, SoftLayer, Technology

Big data is only getting bigger. Late last year, SoftLayer teamed up with 10Gen to launch a high-performance MongoDB solution, and since then, many of our customers have been clamoring for us to support other big data platforms in the same way. By automating the provisioning process of a complex big data environment on bare metal infrastructure, we made life a lot easier for developers who demanded performance and on-demand scalability for their big data applications, and it’s clear that our simple formula produced amazing results. As Marc mentioned when he started breaking down big data database models, document-oriented databases like MongoDB are phenomenal for certain use-cases, and in other situations, a key-value store might be a better fit. With that in mind, we called up our friends at Basho and started building a high-performance architecture specifically for Riak … And I’m excited to announce that we’re launching it today!

Riak is an open source, distributed database platform based on the principles enumerated in the DynamoDB paper. It uses a simple key/value model for object storage, and it was architected for high availability, fault tolerance, operational simplicity and scalability. A Riak cluster is composed of multiple nodes that are all connected, all communicating and sharing data automatically. If one node were to fail, the other nodes would automatically share the data that the failed node was storing and processing until the node is back up and running or a new node is added. See the diagram below for a simple illustration of how adding a node to a cluster works within Riak.

Riak Nodes

We will support both the open source and the Enterprise versions of Riak. The open source version is a great place to start. It has all of the database functionality of Riak Enterprise, but it is limited to a single cluster. The Enterprise version supports replication between clusters across data centers, giving you lots of architectural options. You can use replication to build highly available, live-live failover applications. You can also use it to distribute your application’s data across regions, giving you a global platform that you can update anywhere in the world and know that those modifications will be available anywhere else. Riak Enterprise customers also receive 24×7 coverage, both from SoftLayer and Basho. This includes SoftLayer’s one-hour guaranteed response for Severity 1 hardware issues and unlimited support available via our secure web portal, email and phone.

The business use-case for this flexibility is that if you need to scale up or down, nodes can be easily added or taken down as your requirements change. You can opt for a single-data center environment with a few nodes or you can broaden your architecture to a multi-data center deployment with a 40-node cluster. While these capabilities are inherent in Riak, they can be complicated to build and configure, so we spent countless hours working with Basho to streamline Riak deployment on the SoftLayer platform. The fruit of that labor can be found in our Riak Solution Designer:

Riak Solution Designer

The server configurations and packages in the Riak Solution Designer have been selected to deliver the performance, availability and stability that our customers expect from their bare metal and virtual cloud infrastructure at SoftLayer. With a few quick clicks, you can order a fully configured Riak environment, and it’ll be provisioned and online for you in two to four hours. And everything you order is on a month-to-month contract.

Thanks to the hard work done by the SoftLayer development group and Basho’s team, we’re proud to be the first in the marketplace to offer a turn-key Riak solution on bare metal infrastructure. You don’t need to sacrifice performance and agility for simplicity.

For more information, visit SoftLayer.com/Riak or contact our sales team.

-Duke

December 5, 2012

Breaking Down ‘Big Data’ – Database Models

By in Development, Executive Blog, Infrastructure, Technology

Forester defines big data as “techniques and technologies that make capturing value from data at an extreme scale economical.” Gartner says, “Big data is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets.” Big data demands extreme horizontal scale that traditional IT management can’t handle, and it’s not a challenge exclusive to the Facebooks, Twitters and Tumblrs of the world … Just look at the Google search volume for “big data” over the past eight years:

Big Data Search Interest

Developers are collectively facing information overload. As storage has become more and more affordable, it’s easier to justify collecting and saving more data. Users are more comfortable with creating and sharing content, and we’re able to track, log and index metrics and activity that previously would have been deleted in consideration of space restraints or cost. As the information age progresses, we are collecting more and more data at an ever-accelerating pace, and we’re sharing that data at an incredible rate.

To understand the different facets of this increased usage and demand, Gartner came up with the three V’s of big data that vary significantly from traditional data requirements: Volume, Velocity and Variety. Larger, more abundant pieces of data (“Volume”) are coming at a much faster speed (“Velocity”) in formats like media and walls of text that don’t easily fit into a column-and-row database structure (“Variety”). Given those equally important factors, many of the biggest players in the IT world have been hard at work to create solutions that provide the scale and speed developers need when they build social, analytics, gaming, financial or medical apps with large data sets.

When we talk about scaling databases here, we’re talking about scaling horizontally across multiple servers rather than scaling vertically by upgrading a single server — adding more RAM, increasing HDD capacity, etc. It’s important to make that distinction because it leads to a unique challenge shared by all distributed computer systems: The CAP Theorem. According to the CAP theorem, a distributed storage system must choose to sacrifice either consistency (that everyone sees the same data) or availability (that you can always read/write) while having partition tolerance (where the system continues to operate despite arbitrary message loss or failure of part of the system occurs).

Let’s take a look at a few of the most common database models, what their strengths are, and how they handle the CAP theorem compromise of consistency v. availability:

Relational Databases

What They Do: Stores data in rows/columns. Parent-child records can be joined remotely on the server. Provides speed over scale. Some capacity for vertical scaling, poor capacity for horizontal scaling. This type of database is where most people start.
Horizontal Scaling: In a relational database system, horizontal scaling is possible via replication — dharing data between redundant nodes to ensure consistency — and some people have success sharding — horizontal partitioning of data — but those techniques add a lot of complexity.
CAP Balance: Prefer consistency over availability.
When to use: When you have highly structured data, and you know what you’ll be storing. Great when production queries will be predictable.
Example Products: Oracle, SQLite, PostgreSQL, MySQL

Document-Oriented Databases

What They Do: Stores data in documents. Parent-child records can be stored in the same document and returned in a single fetch operation with no join. The server is aware of the fields stored within a document, can query on them, and return their properties selectively.
Horizontal Scaling: Horizontal scaling is provided via replication, or replication + sharding. Document-oriented databases also usually support relatively low-performance MapReduce for ad-hoc querying.
CAP Balance: Generally prefer consistency over availability
When to Use: When your concept of a “record” has relatively bounded growth, and can store all of its related properties in a single doc.
Example Products: MongoDB, CouchDB, BigCouch, Cloudant

Key-Value Stores

What They Do: Stores an arbitrary value at a key. Most can perform simple operations on a single value. Typically, each property of a record must be fetched in multiple trips, with Redis being an exception. Very simple, and very fast.
Horizontal Scaling: Horizontal scale is provided via sharding.
CAP Balance: Generally prefer consistency over availability.
When to Use: Very simple schemas, caching of upstream query results, or extreme speed scenarios (like real-time counters)
Example Products: CouchBase, Redis, PostgreSQL HStore, LevelDB

BigTable-Inspired Databases

What They Do: Data put into column-oriented stores inspired by Google’s BigTable paper. It has tunable CAP parameters, and can be adjusted to prefer either consistency or availability. Both are sort of operationally intensive.
Horizontal Scaling: Good speed and very wide horizontal scale capabilities.
CAP Balance: Prefer consistency over availability
When to Use: When you need consistency and write performance that scales past the capabilities of a single machine. Hbase in particular has been used with around 1,000 nodes in production.
Example Products: Hbase, Cassandra (inspired by both BigTable and Dynamo)

Dynamo-Inspired Databases

What They Do: Distributed key/value stores inspired by Amazon’s Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency,
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

Each of the database models has strengths and weaknesses, and there are huge communities that support each of the open source examples I gave in each model. If your database is a bottleneck or you’re not getting the flexibility and scalability you need to handle your application’s volume, velocity and variety of data, start looking at some of these “big data” solutions.

Tried any of the above models and have feedback that differs from ours? Leave a comment below and tell us about it!

-@marcalanjones