Author Archive: Marc Jones

December 5, 2012

Breaking Down ‘Big Data’ – Database Models

By Marc Jones in Development, Executive Blog, Infrastructure, Technology

Forrester defines big data as “techniques and technologies that make capturing value from data at an extreme scale economical.” Gartner says, “Big data is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets.” Big data demands extreme horizontal scale that traditional IT management can’t handle, and it’s not a challenge exclusive to the Facebooks, Twitters and Tumblrs of the world … Just look at the Google search volume for “big data” over the past eight years:

Big Data Search Interest

Developers are collectively facing information overload. As storage has become more and more affordable, it’s easier to justify collecting and saving more data. Users are more comfortable with creating and sharing content, and we’re able to track, log and index metrics and activity that previously would have been deleted because of space constraints or cost. As the information age progresses, we are collecting more and more data at an ever-accelerating pace, and we’re sharing that data at an incredible rate.

To understand the different facets of this increased usage and demand, Gartner came up with the three V’s of big data that vary significantly from traditional data requirements: Volume, Velocity and Variety. Larger, more abundant pieces of data (“Volume”) are coming at a much faster speed (“Velocity”) in formats like media and walls of text that don’t easily fit into a column-and-row database structure (“Variety”). Given those equally important factors, many of the biggest players in the IT world have been hard at work to create solutions that provide the scale and speed developers need when they build social, analytics, gaming, financial or medical apps with large data sets.

When we talk about scaling databases here, we’re talking about scaling horizontally across multiple servers rather than scaling vertically by upgrading a single server (adding more RAM, increasing HDD capacity, etc.). It’s important to make that distinction because it leads to a unique challenge shared by all distributed computer systems: the CAP theorem. According to the CAP theorem, a distributed storage system must choose to sacrifice either consistency (that everyone sees the same data) or availability (that you can always read/write) while having partition tolerance (where the system continues to operate despite arbitrary message loss or failure of part of the system).
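
To make that trade-off concrete, here is a minimal sketch of two replicas during a network partition. The class name `TwoReplicaStore` and its `mode` flag are invented for illustration and do not correspond to any real database; the sketch only models the choice between refusing a write and letting replicas diverge.

```python
class TwoReplicaStore:
    """Two replicas of one value; writes arrive at replica A and are copied to B."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = None
        self.replica_b = None
        self.partitioned = False  # True when A cannot reach B

    def write(self, value):
        """Return True if the write is accepted, False if it is refused."""
        if not self.partitioned:
            self.replica_a = value
            self.replica_b = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            return False
        # Availability first: accept on A; B stays stale until the partition heals.
        self.replica_a = value
        return True


cp = TwoReplicaStore("CP")
cp.write("v1")
cp.partitioned = True
cp_accepted = cp.write("v2")   # refused, both replicas still agree

ap = TwoReplicaStore("AP")
ap.write("v1")
ap.partitioned = True
ap_accepted = ap.write("v2")   # accepted, replicas now disagree
```

The "CP" store stays consistent but stops taking writes; the "AP" store keeps taking writes but readers of replica B see old data until the two sides are reconciled.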

Let’s take a look at a few of the most common database models, what their strengths are, and how they handle the CAP theorem compromise of consistency v. availability:

Relational Databases

What They Do: Stores data in rows/columns. Parent-child records can be joined remotely on the server. Provides speed over scale. Some capacity for vertical scaling, poor capacity for horizontal scaling. This type of database is where most people start.
Horizontal Scaling: In a relational database system, horizontal scaling is possible via replication (sharing data between redundant nodes to ensure consistency), and some people have success with sharding (horizontal partitioning of data), but those techniques add a lot of complexity.
CAP Balance: Prefer consistency over availability.
When to Use: When you have highly structured data, and you know what you’ll be storing. Great when production queries will be predictable.
Example Products: Oracle, SQLite, PostgreSQL, MySQL
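
As a rough illustration of the sharding mentioned above, the sketch below routes each record key to one of several shards by hashing it. The shard names and the helper functions `shard_for` and `group_by_shard` are hypothetical; real deployments also have to handle resharding, joins across shards and failover, which is where the added complexity comes from.

```python
import hashlib

# Hypothetical shard names; in practice each would map to a database host.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key, shards=SHARDS):
    """Hash the key so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by shard; a query spanning these keys must visit every group."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shards), []).append(key)
    return groups


user_keys = ["user:%d" % n for n in range(100)]
placement = group_by_shard(user_keys)
```

A lookup by key touches exactly one shard, but anything that spans keys (a join or an aggregate) has to be assembled by the application from several shards.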

Document-Oriented Databases

What They Do: Stores data in documents. Parent-child records can be stored in the same document and returned in a single fetch operation with no join. The server is aware of the fields stored within a document, can query on them, and return their properties selectively.
Horizontal Scaling: Horizontal scaling is provided via replication, or replication + sharding. Document-oriented databases also usually support relatively low-performance MapReduce for ad-hoc querying.
CAP Balance: Generally prefer consistency over availability.
When to Use: When your concept of a “record” has relatively bounded growth, and can store all of its related properties in a single doc.
Example Products: MongoDB, CouchDB, BigCouch, Cloudant
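
The single-fetch idea can be sketched with plain Python dictionaries standing in for a document collection. The `posts` data and the `fetch` helper are made up for illustration and are not the API of any product listed above.

```python
# In-memory stand-in for a document collection, keyed by document id.
posts = {
    "post-1": {
        "title": "Breaking Down Big Data",
        "author": "Marc",
        # Child records live inside the parent document, so no join is needed.
        "comments": [
            {"user": "alice", "text": "Nice overview"},
            {"user": "bob", "text": "What about graph databases?"},
        ],
    }
}


def fetch(collection, doc_id, fields=None):
    """Return a whole document, or only the requested top-level fields."""
    doc = collection[doc_id]
    if fields is None:
        return dict(doc)
    return {name: doc[name] for name in fields if name in doc}


whole_post = fetch(posts, "post-1")                 # parent and children in one fetch
title_only = fetch(posts, "post-1", ["title"])      # selective field return
```

This works well while the list of comments stays bounded; a document whose child list grows without limit is a sign the record should be split up.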

Key-Value Stores

What They Do: Store an arbitrary value at a key. Most can perform simple operations on a single value. Typically, fetching all the properties of a record takes multiple trips, with Redis being an exception. Very simple, and very fast.
Horizontal Scaling: Horizontal scale is provided via sharding.
CAP Balance: Generally prefer consistency over availability.
When to Use: Very simple schemas, caching of upstream query results, or extreme speed scenarios (like real-time counters).
Example Products: Couchbase, Redis, PostgreSQL hstore, LevelDB
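
The two use cases named above, counters and caching, can be sketched with a toy in-memory store. `KeyValueStore` and `cached` are illustrative names only; the `incr` method is loosely modeled on the increment operation that stores such as Redis offer, but this is not a client for any real product.

```python
class KeyValueStore:
    """Toy in-memory key-value store supporting get, set and increment."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, amount=1):
        """Add to a counter, creating it at zero if the key is missing."""
        value = int(self._data.get(key, 0)) + amount
        self._data[key] = value
        return value


def cached(store, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    hit = store.get(key)
    if hit is None:
        hit = compute()
        store.set(key, hit)
    return hit


store = KeyValueStore()
store.incr("page:home:views")
views = store.incr("page:home:views", 2)

calls = []
def slow_query():
    calls.append(1)          # stands in for an expensive upstream query
    return "report-data"

first = cached(store, "report:2012", slow_query)
second = cached(store, "report:2012", slow_query)   # served from the store
```

Every operation addresses exactly one key, which is what makes this model so easy to shard.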

BigTable-Inspired Databases

What They Do: Store data in column-oriented stores inspired by Google’s BigTable paper. These stores have tunable CAP parameters and can be adjusted to prefer either consistency or availability. Both example products listed below are fairly operationally intensive.
Horizontal Scaling: Good speed and very wide horizontal scale capabilities.
CAP Balance: Prefer consistency over availability.
When to Use: When you need consistency and write performance that scales past the capabilities of a single machine. HBase in particular has been used with around 1,000 nodes in production.
Example Products: HBase, Cassandra (inspired by both BigTable and Dynamo)
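
The BigTable paper describes its data model as a sorted map indexed by row key, column key and timestamp. A single-machine sketch of that shape follows; `TinyColumnStore` is a made-up name, and the sketch leaves out everything that makes the real systems distributed (tablets, regions, compaction).

```python
import bisect


class TinyColumnStore:
    """Sorted map of row key -> column -> timestamped versions of a value."""

    def __init__(self):
        self._rows = {}          # row key -> {column -> [(timestamp, value), ...]}
        self._sorted_keys = []   # row keys kept in sorted order for range scans

    def put(self, row, column, value, timestamp):
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row, column):
        """Return the newest value stored for a cell, or None."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, stop):
        """Return row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        return self._sorted_keys[lo:hi]


table = TinyColumnStore()
table.put("b1", "meta:title", "second row", timestamp=1)
table.put("a1", "meta:title", "old title", timestamp=1)
table.put("a1", "meta:title", "new title", timestamp=2)
table.put("c1", "meta:title", "third row", timestamp=1)
```

Because rows are kept sorted by key, a contiguous key range can be handed to one server, which is the basis for splitting a table across many machines.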

Dynamo-Inspired Databases

What They Do: Distributed key/value stores inspired by Amazon’s Dynamo paper. A key written to a Dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency.
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch
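
The ring write described above can be sketched as consistent hashing plus a required number of acknowledgements. The `Ring` class, its `replicas` and `required_acks` parameters, and the `down` set are assumptions made for this example; the sketch also omits techniques such as hinted handoff that Dynamo-style systems use to stay writable when nodes fail.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._points = sorted((_hash(n), n) for n in nodes)
        self.storage = {n: {} for n in nodes}   # per-node key/value data
        self.down = set()                        # nodes currently unreachable

    def preference_list(self, key):
        """Walk clockwise from the key's position and take the next N nodes."""
        start = bisect.bisect(self._points, (_hash(key), ""))
        count = min(self.replicas, len(self._points))
        return [self._points[(start + i) % len(self._points)][1] for i in range(count)]

    def put(self, key, value, required_acks=2):
        """Write to every reachable replica; succeed only with enough acks."""
        acks = 0
        for node in self.preference_list(key):
            if node not in self.down:
                self.storage[node][key] = value
                acks += 1
        return acks >= required_acks


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("cart:1001")
ok = ring.put("cart:1001", {"items": 3})

ring.down.update(owners[:2])                     # two of the three replicas fail
degraded = ring.put("cart:1001", {"items": 4})   # only one ack, so reported as failed
```

Raising `required_acks` makes a reported success more durable; lowering it keeps the store writable with more nodes down.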

Each of the database models has strengths and weaknesses, and there are huge communities that support each of the open source examples I gave in each model. If your database is a bottleneck or you’re not getting the flexibility and scalability you need to handle your application’s volume, velocity and variety of data, start looking at some of these “big data” solutions.

Tried any of the above models and have feedback that differs from ours? Leave a comment below and tell us about it!

-@marcalanjones

April 24, 2012

RightScale + SoftLayer: The Power of Cloud Automation

By Marc Jones in Cloud, Executive Blog, SoftLayer, Technology

SoftLayer’s goal is to provide unparalleled value to the customers who entrust their business-critical computing to us — whether via dedicated hosting, managed hosting, cloud computing or a hybrid environment of all three. We provide the best platform on the market, delivering convenience, ease of use, compelling return on investment (ROI), significant competitive advantage, and consistency in a world where the only real constant seems to be change.

That value proposition is one of the biggest driving forces behind our partnership with RightScale. We’re cloud computing soul mates.

RightScale

RightScale understands the power of automation, and as a result, they’ve created a cloud management platform that they like to say delivers “abstraction with complete customization.” RightScale customers can easily deploy and manage applications across public, private and hybrid cloud environments, unencumbered by the underlying details. They are free to run efficient, scalable, highly available applications with visibility into and control over their computing resources available in one place.

As you know, SoftLayer is fueled by automation as well, and it’s one of our primary differentiators. We’re able to deliver a phenomenal customer experience because every aspect of our platform is fully and seamlessly automated to accelerate provisioning, mitigate human error and provide customers with access and features that our competitors can only dream of. Our customers get simple and total control over an ever-expanding number of back-end services and functions through our easy-to-use Customer Portal and via an open, robust API.

The compatibility between SoftLayer and RightScale is probably pretty clear already, but if you needed another point to ponder, you can ruminate on the fact that we both share expertise and focus across a number of vertical markets. The official announcement of the SoftLayer and RightScale partnership will be particularly noteworthy and interesting in the Internet-based business and online gaming market segments.

It didn’t take long to find an amazing customer success story that demonstrated the value of the new SoftLayer-RightScale partnership. Broken Bulb Game Studios — the developer of social games such as My Town, Braaains, Ninja Warz and Miscrits — is already harnessing the combined feature sets made possible by our partnership with RightScale to simplify its deployment process and scale to meet its customers’ expectations as its games find audiences and growing favor on Facebook. Don’t take our word for it, though … Check out the Broken Bulb quote in today’s press release announcing the partnership.

Broken Bulb Game Studios

Broken Bulb and other developers of social games recognize the importance of getting concepts to market at breakneck speed. They also understand the critical importance of intelligently managing IT resources throughout a game’s life cycle. What they want is fully automated control over computing resources so that they can be allocated dynamically and profitably in immediate response to market signals, and they’re not alone.

Game developers of all sorts — and companies in a growing number of vertical markets — will need and want the same fundamental computing-infrastructure agility.

Our partnership with RightScale is only beginning. You’re going to see some crazy innovation happening now that our cloud computing mad scientists are all working together.

-Marc

February 15, 2012

SoftLayer + OpenStack Swift = SoftLayer Object Storage

By Marc Jones in Cloud, Executive Blog, Infrastructure, SoftLayer, Technology

Since our inception in 2005, SoftLayer’s goal has been to provide an array of on-demand data center and hosting services that combine exceptional access, control, scalability and security with unparalleled network robustness and ease of use … That’s why we’re so excited to unveil SoftLayer Object Storage to our customers.

Based on OpenStack Object Storage (codenamed Swift) — open-source software that allows the creation of redundant, scalable object storage on clusters of standardized servers — SoftLayer Object Storage provides customers with new opportunities to leverage cost-effective cloud-based storage and to simultaneously realize significant capex-related cost savings.

OpenStack has been phenomenally successful thanks to a global software community comprised of developers and other technologists that has built and tweaked a standards-based, massively scalable open-source platform for public and private cloud computing. The simple goal of the OpenStack project is to deliver code that enables any organization to create and offer feature-rich cloud computing services from industry-standard hardware. The overarching OpenStack technology consists of several interrelated project components: One for compute, one for an image service, one for object storage, and a few more projects in development.

SoftLayer Object Storage
Like the OpenStack Swift system on which it is based, SoftLayer Object Storage is not a file system or real-time data-storage system; rather, it’s a long-term storage system for a more permanent type of static data that can be retrieved, leveraged and updated when necessary. Typical applications for this type of storage can involve virtual machine images, photo storage, email storage and backup archiving.

One of the primary benefits of Object Storage is the role that it can play in automating and streamlining data storage in cloud computing environments. SoftLayer Object Storage offers rich metadata features and search capability that can be leveraged to automate the way unstructured data gets accessed. In this way, SoftLayer Object Storage will provide organizations with new capabilities for improving overall data management and storage efficiency.

File Storage v. Object Storage
To better understand the difference between file storage and object storage, let’s look at how file storage and object storage differ when it comes to metadata and search for a simple photo image. When a digital camera or camera-enabled phone snaps a photo, it embeds a series of metadata values in the image. If you save the image in a standard image file format, you can search for it by standard file properties like name, date and size. If you save the same image with additional metadata as an object, you can set object metadata values for the image (after reading them from the image file). This detail provides granular search capability based on the metadata keys and values, in addition to the standard object properties. Here is a sample comparison of an image’s metadata value in both systems:

File Metadata         Object Metadata
Name: img01.jpg       Name: img01.jpg
Date: 2012-02-13      Date: 2012-02-13
Size: 1.2MB           Size: 1.2MB
                      Manufacturer: CASIO
                      Model: QV-4000
                      x-Resolution: 72.00
                      y-Resolution: 72.00
                      PixelXDimension: 2240
                      PixelYDimension: 1680
                      FNumber: f/4.0
                      Exposure Time: 1/659 sec.

Using the rich metadata and search capability enabled by object storage, you would be able to search for all images with a dimension of 2240×1680 or a resolution of 72×72 in a quick/automated fashion. The object storage system “understands” more about what is being stored because it is able to differentiate files based on characteristics that you define.
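
The kind of metadata search described here can be sketched in a few lines. The `objects` list and the `search` helper are hypothetical and only model the idea of filtering on user-defined key-value pairs; they are not the SoftLayer Object Storage or OpenStack Swift API, and the second image is invented for contrast.

```python
# Each stored object carries its standard properties plus user-defined metadata.
objects = [
    {
        "name": "img01.jpg",
        "size": "1.2MB",
        "metadata": {
            "Manufacturer": "CASIO",
            "Model": "QV-4000",
            "PixelXDimension": "2240",
            "PixelYDimension": "1680",
        },
    },
    {
        # Hypothetical second image, added only to give the search something to exclude.
        "name": "img02.jpg",
        "size": "0.8MB",
        "metadata": {
            "Manufacturer": "CASIO",
            "Model": "QV-4000",
            "PixelXDimension": "1600",
            "PixelYDimension": "1200",
        },
    },
]


def search(items, **criteria):
    """Return names of objects whose metadata matches every key/value given."""
    return [
        item["name"]
        for item in items
        if all(item["metadata"].get(key) == value for key, value in criteria.items())
    ]


large_images = search(objects, PixelXDimension="2240", PixelYDimension="1680")
casio_images = search(objects, Manufacturer="CASIO")
```

A plain file system could answer the same question only by opening every image and reading its embedded metadata.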

What Makes SoftLayer Object Storage Different?
SoftLayer Object Storage offers several unique features and ways for SoftLayer customers to upload, access and manage data:

  • Search — Quickly access information through user-defined metadata key-value pairs, file name or unique identifier
  • CDN — Serve your content globally over our high-performance content delivery network
  • Private Network — Free, secure private network traffic between all data centers and storage cluster nodes
  • API — Access to a full-feature OpenStack-compatible API with additional support for CDN and search integration
  • Portal — Web application integrated into the SoftLayer portal
  • Mobile — iPhone and Android mobile apps, with Windows Phone app coming soon
  • Language Bindings — Feature-complete bindings for Java, PHP, Python and Ruby*

*Language bindings, documentation, and guides are available on SLDN.

We think SoftLayer Object Storage will be attractive to a broad range of current and prospective customers, from web-centric businesses dependent on file sharing and content distribution to legal/medical/financial-services companies which possess large volumes of data that must be stored securely while remaining readily accessible.

SoftLayer Object Storage significantly extends our cloud-services portfolio while substantially enriching the storage capabilities that we bring to our customers. What are you waiting for? Go order yourself some object storage @ $0.12/GB!

-Marc

February 1, 2012

Flex Images: Blur the Line Between Cloud and Dedicated

By Marc Jones in Cloud, Executive Blog, SoftLayer, Technology

Our customers are not concerned with technology for technology’s sake. Information technology should serve a purpose; it should function as an integral means to a desired end. Understandably, our customers are focused, first and foremost, on their application architecture and infrastructure. They want, and need, the freedom and flexibility to design their applications to their specifications.

Many companies leverage the cloud to take advantage of core features that enable robust, agile architectures. Elasticity (ability to quickly increase or decrease compute capacity) and flexibility (choice such as cores, memory and storage) combine to provide solutions that scale to meet the demands of modern applications.

Another widely used feature of cloud computing is image-based provisioning. Rapid provisioning of cloud resources is accomplished, in part, through the use of images. Imaging capability extends beyond the use of base images, allowing users to create customized images that preserve their software installs and configurations. The images persist in an image library, allowing users to launch new cloud instances based on their images.

But why should images only be applicable to virtualized cloud resources?

Toward that end, we’re excited to introduce SoftLayer Flex Images, a new capability that allows us to capture images of physical and virtual servers, store them all in one library, and rapidly deploy those images on either platform.

SoftLayer Flex Images

Physical servers now share the core features of virtual servers: elasticity and flexibility. With Flex Images, you can move seamlessly between physical and virtual environments as your needs change.

Let’s say you’re running into resource limits in a cloud server environment—your data-intensive server is I/O bound—and you want to move the instance to a more powerful dedicated server. Using Flex Images, you can create an image of your cloud server and, extending our I/O bound example, deploy it to a custom dedicated server with SSD drives.

Conversely, a dedicated environment can be quickly replicated on multiple cloud instances if you want the scaling capability of the cloud to meet increased demand. Maybe your web heads run on dedicated servers, but you’re starting to see periods of usage that stress your servers. Create a Flex Image from your dedicated server and use it to deploy cloud instances to meet demand.

Flex Image technology blurs the distinctions—and breaks down the walls—between virtual and physical computing environments.

We don’t think of Flex Images as a new product. Instead, like our network, our portal, our automated platform, and our globe-spanning geographic diversity, Flex Image capability is a free resource for our customers (with the exception of standard nominal costs in storing the Flex Images).

We think Flex Images not only represents great value but also provides a further example of how SoftLayer innovates continually to bring new capabilities and the highest possible level of customer control to our automated services platform.

To sum up, here are some of the key features and benefits of SoftLayer Flex Images:

  • Universal images that can be used interchangeably on dedicated or cloud systems
  • Unified image library for archiving, managing, sharing, and publishing images
  • Greater flexibility and higher scalability
  • Rapid provisioning of new dedicated and cloud environments
  • Available via SoftLayer’s management portal and API

Flex Images are available now in public beta. We invite you to try them out, and, as always, we want to hear what you think.

-Marc

February 16, 2011

3 Bars | 3 Questions: Big Data and Search

By Marc Jones in 3 Bars 3 Questions, Culture, Development, SoftLayer, Technology

Last week, Duke chose me as this week’s “3 Bars | 3 Questions” participant, so my desk chair became the hot seat this afternoon. The topic of discussion: “Big Data and Search.”

Have you started working with big data? What’s the best method you’ve found to keep it organized and accessible? How do you scale your infrastructure to maintain performance?

-Marc