In my Breaking Down ‘Big Data’ – Database Models post, I briefly covered the most common database models, their strengths, and how they handle the CAP theorem, which describes how a distributed storage system balances the demands of consistency and availability while maintaining partition tolerance. Here’s what I said about Dynamo-inspired databases:
What They Do: Distributed key/value stores inspired by Amazon’s Dynamo paper. A key written to a Dynamo ring is persisted on several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide the best horizontal scalability along with extremely strong data durability.
CAP Balance: Prefer availability over consistency
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch
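The write behavior in the quoted summary, where a key is stored on several nodes before success is reported, is usually called a quorum write. The sketch below is a toy Python model, not Riak’s implementation: it assumes each key has N replicas (three here) and that a write is reported as successful once at least W of them acknowledge. The `Replica` class and `quorum_write` function are hypothetical names for illustration.

```python
# Toy model of a Dynamo-style quorum write (an illustration, not Riak's code).
# Assumptions: each key has N replicas, and a write is reported as successful
# once at least W of them acknowledge it. A replica marked down never acks.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key: str, value: str) -> bool:
        """Persist the value and acknowledge, unless this replica is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def quorum_write(replicas: list, key: str, value: str, w: int) -> bool:
    """Send the write to every replica; succeed only if at least w acknowledge."""
    acks = sum(1 for replica in replicas if replica.store(key, value))
    return acks >= w


replicas = [Replica("node-a"), Replica("node-b"), Replica("node-c")]  # N = 3
replicas[2].up = False  # simulate one failed node

print(quorum_write(replicas, "user:42", "profile-v1", w=2))  # True: 2 of 3 acked
print(quorum_write(replicas, "user:42", "profile-v2", w=3))  # False: only 2 acks
```

In this toy, the second call reports failure even though two replicas have already stored the value; real Dynamo-style systems cope with such partial writes through versioning and repair mechanisms. Riak exposes comparable settings, such as a bucket’s `n_val` and a per-request `w` value.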
This type of key/value store architecture is very different from the document-oriented MongoDB solutions we launched at the end of last year, so we worked with Basho to prioritize development of high-performance Riak solutions on our global platform. Since you already know about MongoDB, let’s take a few minutes to meet the new kid on the block.
Riak is a distributed database architected for availability, fault tolerance, operational simplicity and scalability. Riak is masterless, so every node in a Riak cluster is identical and contains a complete, independent copy of the Riak package. This design makes a Riak environment highly fault tolerant and scalable, and because each piece of data is replicated across several nodes, you can still read, write and update data if a node goes down.
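Masterless routing can be pictured as a hash ring that every node can compute for itself. The Python sketch below is a simplified model under stated assumptions: each node gets a few hashed positions on the ring, a key’s replicas are the next N distinct nodes clockwise from the key’s hash, and a down node is skipped so the next healthy node stands in for it. Real Riak instead divides its ring into a fixed number of partitions (vnodes); the `Ring` class and its method names are hypothetical.

```python
# Simplified consistent-hash ring in the spirit of Dynamo/Riak (not Riak's code).
# Assumption: each node owns a few hashed points; real Riak uses fixed partitions.
import bisect
import hashlib


def ring_hash(text: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(text.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes: list, points_per_node: int = 8):
        # Every node owns several points on the ring, sorted by position.
        self._points = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [pos for pos, _ in self._points]
        self.nodes = set(nodes)

    def preference_list(self, key: str, n: int, down: frozenset = frozenset()) -> list:
        """Walk clockwise from the key's position, collecting up to n distinct
        healthy nodes. Nodes listed in `down` are skipped, so the next healthy
        node on the ring takes over their share of reads and writes."""
        wanted = min(n, len(self.nodes - down))
        start = bisect.bisect(self._positions, ring_hash(key))
        chosen = []
        for step in range(len(self._points)):
            if len(chosen) == wanted:
                break
            _, node = self._points[(start + step) % len(self._points)]
            if node not in down and node not in chosen:
                chosen.append(node)
        return chosen


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
healthy = ring.preference_list("cart:1001", n=3)
degraded = ring.preference_list("cart:1001", n=3, down=frozenset({healthy[0]}))
print("replicas:", healthy)
print("with", healthy[0], "down:", degraded)
```

Because any node can build the same `Ring` from the member list, any node can coordinate a request, which is the property a masterless design relies on.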
As you approach the daunting prospect of choosing a big data architecture, there are a few simple questions you need to answer:
- How much data do/will I have?
- In what format am I storing my data?
- How important is my data?
Riak may be the choice for you if you’re working with more than three terabytes of data, your data is stored in multiple data formats, and your data must always be available. What does that kind of need look like in real life, though? Luckily, we’ve had a number of customers kick Riak’s tires on SoftLayer bare metal servers, so I can share a few of the use cases we’ve seen that have benefited significantly from Riak’s unique architecture.
Use Case 1 – Digital Media
An advertising company that serves over 10 billion ads per month must be able to quickly deliver its content to millions of end users around the world. Meeting that demand with relational databases would require a complex configuration of expensive, vertically scaled hardware, but it can be scaled out horizontally much more easily with Riak. In only a few hours, the company is up and running with an ad-serving infrastructure that includes a back-end Riak cluster in Dallas with a replication cluster in Singapore, along with an application tier on the front end with Web servers, load balancers and CDN.
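One reason a Dynamo-style cluster scales out more easily is consistent hashing: when a node joins the ring, only the keys that now fall to the new node have to move. The Python sketch below compares that with naive hash-mod-N placement when a fifth node is added. The ring layout (64 hashed points per node), the synthetic key names and the helper functions are assumptions for illustration; Riak itself rebalances fixed-size partitions rather than individual keys.

```python
# Compare key movement when a fifth node joins: consistent hashing vs hash mod N.
# Assumptions: 64 hashed points per node and synthetic keys; not Riak's code.
import bisect
import hashlib


def h(text: str) -> int:
    return int(hashlib.sha256(text.encode()).hexdigest(), 16)


def build_ring(nodes: list, points_per_node: int = 64) -> tuple:
    """Return (sorted positions, owner of each position)."""
    points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(points_per_node))
    return [pos for pos, _ in points], [node for _, node in points]


def ring_owner(ring: tuple, key: str) -> str:
    """Owner is the first point clockwise from the key's hash."""
    positions, owners = ring
    return owners[bisect.bisect(positions, h(key)) % len(positions)]


def modulo_owner(nodes: list, key: str) -> str:
    """Naive placement: hash the key and take it modulo the node count."""
    return nodes[h(key) % len(nodes)]


keys = [f"ad:{i}" for i in range(10_000)]
old_nodes = ["node-a", "node-b", "node-c", "node-d"]
new_nodes = old_nodes + ["node-e"]
old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)

ring_moved = sum(ring_owner(old_ring, k) != ring_owner(new_ring, k) for k in keys) / len(keys)
mod_moved = sum(modulo_owner(old_nodes, k) != modulo_owner(new_nodes, k) for k in keys) / len(keys)
print(f"consistent hashing moved {ring_moved:.0%} of keys")
print(f"hash mod N moved {mod_moved:.0%} of keys")
```

Going from four nodes to five, expect roughly a fifth of the keys to move under the ring (all of them onto the new node) and roughly four-fifths under hash mod N, since a key stays put only when its hash leaves the same remainder modulo 4 and modulo 5.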
Use Case 2 – E-commerce
An e-commerce company needs 100-percent availability. If any part of a customer’s experience fails, whether on the website or in the shopping cart, sales are lost. Riak’s fault tolerance is a big draw for this kind of use case: even if one node or component fails, the company’s data is still accessible and the customer’s experience is uninterrupted. The shopping cart is critical, and Riak is built to stay available, so it’s a perfect match.
As an additional safeguard, the company can take advantage of simple multi-datacenter replication in their Riak Enterprise environment to geographically disperse content closer to its customers (while also serving as an important tool for disaster recovery and backup).
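Keeping the cart writable during failures, and replicating it across datacenters, means two copies of the same cart can occasionally be updated concurrently. Amazon’s Dynamo paper describes resolving such a conflict for a shopping cart by merging the versions. The sketch below assumes each cart version carries a vector clock (a map of node name to counter) and that concurrent versions are merged by taking the union of their items; the function names are hypothetical, and this is not Riak’s own resolution logic, although Riak can hand such sibling versions back to the application to merge when a bucket is configured to allow multiple values.

```python
# Toy sibling resolution for a shopping cart (an illustration, not Riak's code).
# Assumption: each version is (vector_clock, items), where the vector clock
# maps a node name to a counter and items is a frozenset of product IDs.


def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge_clocks(a: dict, b: dict) -> dict:
    """Pointwise maximum of two vector clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def resolve(v1: tuple, v2: tuple) -> tuple:
    """Return one cart version: the newer one, or a union of concurrent siblings."""
    clock1, items1 = v1
    clock2, items2 = v2
    if descends(clock1, clock2):
        return v1  # v1 already includes everything in v2
    if descends(clock2, clock1):
        return v2  # v2 already includes everything in v1
    # Neither descends from the other: they are concurrent siblings.
    return merge_clocks(clock1, clock2), items1 | items2


original = ({"node-1": 1}, frozenset({"book"}))
edit_a = ({"node-1": 2}, frozenset({"book", "lamp"}))             # added via node-1
edit_b = ({"node-1": 1, "node-2": 1}, frozenset({"book", "mug"}))  # added via node-2

clock, items = resolve(edit_a, edit_b)
print(sorted(items))  # ['book', 'lamp', 'mug']
print(resolve(edit_a, original) is edit_a)  # True: edit_a supersedes original
```

The trade-off, also noted in the Dynamo paper, is that a union merge never loses an added item but can bring back an item that one side had removed.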
Use Case 3 – Gaming
With customers like Broken Bulb and Peak Games, SoftLayer is no stranger to the gaming industry, so it should come as no surprise that we’ve seen interesting use cases for Riak from some of our gaming customers. When a game developer incorporated Riak into a new game to store player data like user profiles, statistics and rankings, the performance of the bare metal infrastructure blew him away. As a result, the game’s infrastructure was redesigned to also pull gaming content like images, videos and sounds from the Riak database cluster. Since the environment is so easy to scale horizontally, the process on the infrastructure side took no time at all, and the multimedia content in the game is getting served as quickly as the player data.
Databases are common bottlenecks for many applications, but they don’t have to be. Making the transition from scaling vertically (upgrading hardware, adding RAM, etc.) to scaling horizontally (spreading the work intelligently across multiple nodes) alleviates many of the pain points for a quickly growing database environment. Have you made that transition? If not, what’s holding you back? Have you considered implementing Riak?