Author Archive: Marc Jones

January 31, 2014

Simplified OpenStack Deployment on SoftLayer

"What is SoftLayer doing with OpenStack?" I can't even begin to count the number of times I've been asked that question over the last few years. In response, I'll usually explain how we've built our object storage platform on top of OpenStack Swift, or I'll give a few examples of how our customers have used SoftLayer infrastructure to build and scale their own OpenStack environments. Our virtual and bare metal cloud servers provide a powerful and flexible foundation for any OpenStack deployment, and our unique three-tiered network integrates perfectly with OpenStack's Compute and Network node architecture, so it's high time we make it easier to build an OpenStack environment on SoftLayer infrastructure.

To streamline and simplify OpenStack deployment for the open source community, we've published Opscode Chef recipes for both OpenStack Grizzly and OpenStack Havana on GitHub: SoftLayer Chef-Openstack. With Chef and SoftLayer, your own OpenStack cloud is a cookbook away. These recipes were designed with growth and scalability in mind. Let's take a deeper look into what exactly that means.

OpenStack has adopted a three-node design whereby a controller, compute, and network node make up its architecture:

OpenStack Architecture on SoftLayer

Looking more closely at any one node reveals the services it provides. Scaling the infrastructure beyond a few dozen nodes with this model could create bottlenecks in services such as your block store, OpenStack Cinder, and your image store, OpenStack Glance, since both are traditionally located on the controller node. Infrastructure requirements also vary from service to service. For example, OpenStack Neutron, the networking service, needs very little disk I/O, while the Cinder storage service may rely heavily on a node's hard disks. Our cookbook allows you to choose how and where to deploy the services, and it even lets you break apart the MySQL backend to further improve platform performance.

Quick Start: Local Demo Environment

To make it easy to get started, we've created a rapid prototype and sandbox script for use with Vagrant and Virtual Box. With Vagrant, you can easily spin up a demo environment of Chef Server and OpenStack in about 15 minutes on a moderately powerful laptop or desktop. Check it out here. This demo environment is an all-in-one installation of our Chef OpenStack deployment. It also installs a basic Chef server as a sandbox to help you see how the SoftLayer recipes were deployed.

Creating a Custom OpenStack Deployment

The three-node OpenStack model does well at small scale and meets the needs of many consumers; however, control and customizability are the tenets of the SoftLayer OpenStack Chef cookbook's design. In our model, you have full control over the configuration and location of eleven different components in your deployed environment:

Our Chef recipes will take care of populating the configuration files with the necessary information so you won't have to. When deploying, you merely add the role for the matching service to a hardware or virtual server node, and Chef will deploy the service to it with all the configuration done automatically, including adding multiple Neutron, Nova, and Cinder nodes. This approach lets you match each service to hardware suited to its needs: you might put your Neutron node on a server with 10-gigabit network interfaces and configure your Cinder node with RAID 1+0 15K SAS drives.
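Conceptually, the role-to-node assignment described above looks something like the mapping below. This is a minimal Python sketch with invented role and host names, not the cookbook's actual role names:

```python
# Hypothetical illustration of assigning service roles to nodes in a
# scaled-out OpenStack deployment. All role and host names are invented.
role_assignments = {
    "controller01": ["os-controller", "os-mysql"],
    "network01":    ["os-network"],         # e.g., 10-gigabit NICs
    "compute01":    ["os-compute"],
    "compute02":    ["os-compute"],
    "cinder01":     ["os-block-storage"],   # e.g., RAID 1+0 15K SAS drives
}

def nodes_for_role(assignments, role):
    """Return the (sorted) list of nodes that carry a given role."""
    return sorted(host for host, roles in assignments.items()
                  if role in roles)

print(nodes_for_role(role_assignments, "os-compute"))
```

Scaling out is then a matter of adding another host entry with the appropriate role and letting Chef converge it.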

OpenStack is a fast growing project for the implementation of IaaS in public and private clouds, but its deployment and configuration can be overwhelming. We created this cookbook to make the process of deploying a full OpenStack environment on SoftLayer quick and straightforward. With the simple configuration of eleven Chef roles, your OpenStack cloud can be deployed onto as few as one node and scaled up to as many as hundreds (or thousands).

To follow this project, visit SoftLayer on GitHub. Check out some of our other projects on GitHub, and let us know if you need any help or want to contribute.

-@marcalanjones

July 9, 2013

When to Consider Riak for Your Big Data Architecture

In my Breaking Down 'Big Data' – Database Models post, I briefly covered the most common database models, their strengths, and how they handle the CAP theorem — how a distributed storage system balances the demands of consistency and availability while maintaining partition tolerance. Here's what I said about Dynamo-inspired databases:

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

This type of key/value store architecture is very different from the document-oriented MongoDB solutions we launched at the end of last year, so we worked with Basho to prioritize development of high-performance Riak solutions on our global platform. Since you already know about MongoDB, let's take a few minutes to meet the new kid on the block.

Riak is a distributed database architected for availability, fault tolerance, operational simplicity and scalability. Riak is masterless, so each node in a Riak cluster is the same and contains a complete, independent copy of the Riak package. This design makes the Riak environment highly fault tolerant and scalable, and it also aids in replication — if a node goes down, you can still read, write and update data.
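To see why a masterless design keeps data readable and writable through node failures, it helps to sketch Dynamo-style replica placement, which Riak builds on: keys hash onto a ring, and each key is stored on the next N distinct nodes clockwise from its position. This is an illustrative Python sketch, not Riak's actual implementation:

```python
import hashlib
from bisect import bisect_right

# Toy consistent-hash ring: every node owns a position on a fixed-size
# ring, and a key's replicas are the next n_replicas nodes clockwise.
def ring_position(value, ring_size=2**32):
    digest = hashlib.sha1(value.encode()).hexdigest()
    return int(digest, 16) % ring_size

def preference_list(key, nodes, n_replicas=3):
    """Return the nodes responsible for storing copies of `key`."""
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    start = bisect_right(positions, ring_position(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(n_replicas)]

nodes = ["riak1", "riak2", "riak3", "riak4", "riak5"]
print(preference_list("user:1001", nodes))  # three nodes hold this key
```

Because three nodes hold each key, losing any one node leaves two live copies, which is the property behind "if a node goes down, you can still read, write and update data."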

As you approach the daunting prospect of choosing a big data architecture, there are a few simple questions you need to answer:

  1. How much data do/will I have?
  2. In what format am I storing my data?
  3. How important is my data?

Riak may be the choice for you if [1] you're working with more than three terabytes of data, [2] your data is stored in multiple data formats, and [3] your data must always be available. What does that kind of need look like in real life, though? Luckily, we've had a number of customers kick Riak's tires on SoftLayer bare metal servers, so I can share a few of the use cases we've seen that have benefited significantly from Riak's unique architecture.

Use Case 1 – Digital Media
An advertising company that serves over 10 billion ads per month must be able to quickly deliver its content to millions of end users around the world. Meeting that demand with relational databases would require a complex configuration of expensive, vertically scaled hardware, but the database tier can be scaled out horizontally much more easily with Riak. In a matter of only a few hours, the company is up and running with an ad-serving infrastructure that includes a back-end Riak cluster in Dallas with a replication cluster in Singapore along with an application tier on the front end with Web servers, load balancers and CDN.

Use Case 2 – E-commerce
An e-commerce company needs 100-percent availability. If any part of a customer's experience fails, whether it be on the website or in the shopping cart, sales are lost. Riak's fault tolerance is a big draw for this kind of use case: Even if one node or component fails, the company's data is still accessible, and the customer's user experience is uninterrupted. The shopping cart structure is critical, and Riak is built to be available ... It's a perfect match.

As an additional safeguard, the company can take advantage of simple multi-datacenter replication in their Riak Enterprise environment to geographically disperse content closer to its customers (while also serving as an important tool for disaster recovery and backup).

Use Case 3 – Gaming
With customers like Broken Bulb and Peak Games, SoftLayer is no stranger to the gaming industry, so it should come as no surprise that we've seen interesting use cases for Riak from some of our gaming customers. When a game developer incorporated Riak into a new game to store player data like user profiles, statistics and rankings, the performance of the bare metal infrastructure blew him away. As a result, the game's infrastructure was redesigned to also pull gaming content like images, videos and sounds from the Riak database cluster. Since the environment is so easy to scale horizontally, the process on the infrastructure side took no time at all, and the multimedia content in the game is getting served as quickly as the player data.

Databases are common bottlenecks for many applications, but they don't have to be. Making the transition from scaling vertically (upgrading hardware, adding RAM, etc.) to scaling horizontally (spreading the work intelligently across multiple nodes) alleviates many of the pain points for a quickly growing database environment. Have you made that transition? If not, what's holding you back? Have you considered implementing Riak?

-@marcalanjones

December 5, 2012

Breaking Down 'Big Data' - Database Models

Forrester defines big data as "techniques and technologies that make capturing value from data at an extreme scale economical." Gartner says, "Big data is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets." Big data demands extreme horizontal scale that traditional IT management can't handle, and it's not a challenge exclusive to the Facebooks, Twitters and Tumblrs of the world ... Just look at the Google search volume for "big data" over the past eight years:

Big Data Search Interest

Developers are collectively facing information overload. As storage has become more and more affordable, it's easier to justify collecting and saving more data. Users are more comfortable with creating and sharing content, and we're able to track, log and index metrics and activity that previously would have been deleted in consideration of space restraints or cost. As the information age progresses, we are collecting more and more data at an ever-accelerating pace, and we're sharing that data at an incredible rate.

To understand the different facets of this increased usage and demand, Gartner came up with the three V's of big data that vary significantly from traditional data requirements: Volume, Velocity and Variety. Larger, more abundant pieces of data ("Volume") are coming at a much faster speed ("Velocity") in formats like media and walls of text that don't easily fit into a column-and-row database structure ("Variety"). Given those equally important factors, many of the biggest players in the IT world have been hard at work to create solutions that provide the scale and speed developers need when they build social, analytics, gaming, financial or medical apps with large data sets.

When we talk about scaling databases here, we're talking about scaling horizontally across multiple servers rather than scaling vertically by upgrading a single server — adding more RAM, increasing HDD capacity, etc. It's important to make that distinction because it leads to a unique challenge shared by all distributed computer systems: The CAP Theorem. According to the CAP theorem, a distributed storage system must choose to sacrifice either consistency (that everyone sees the same data) or availability (that you can always read/write) while maintaining partition tolerance (the system continues to operate despite arbitrary message loss or the failure of part of the system).
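In practice, the consistency/availability trade-off is often tunable rather than all-or-nothing. Many Dynamo-style stores expose three knobs: N (replicas per key), W (write acknowledgements required) and R (read acknowledgements required). A rough Python sketch of the quorum arithmetic, stated generally rather than for any specific product:

```python
# When R + W > N, every read quorum overlaps every write quorum, so a
# read is guaranteed to see at least one up-to-date replica (stronger
# consistency). When R + W <= N, reads and writes can succeed without
# overlapping, trading possible staleness for availability and latency.
def read_overlaps_write(n, r, w):
    """True if read and write quorums must intersect (consistent reads)."""
    return r + w > n

# N=3 with W=2, R=2: quorums overlap, reads see the latest write.
assert read_overlaps_write(3, 2, 2)
# N=3 with W=1, R=1: fast and highly available, but reads may be stale.
assert not read_overlaps_write(3, 1, 1)
```

The "CAP Balance" lines in the models below are essentially each database family's default position on this dial.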

Let's take a look at a few of the most common database models, what their strengths are, and how they handle the CAP theorem compromise of consistency v. availability:

Relational Databases

What They Do: Stores data in rows/columns. Parent-child records can be joined remotely on the server. Provides speed over scale. Some capacity for vertical scaling, poor capacity for horizontal scaling. This type of database is where most people start.
Horizontal Scaling: In a relational database system, horizontal scaling is possible via replication — sharing data between redundant nodes to ensure consistency — and some people have success with sharding — horizontal partitioning of data — but those techniques add a lot of complexity.
CAP Balance: Prefer consistency over availability.
When to use: When you have highly structured data, and you know what you'll be storing. Great when production queries will be predictable.
Example Products: Oracle, SQLite, PostgreSQL, MySQL

Document-Oriented Databases

What They Do: Stores data in documents. Parent-child records can be stored in the same document and returned in a single fetch operation with no join. The server is aware of the fields stored within a document, can query on them, and return their properties selectively.
Horizontal Scaling: Horizontal scaling is provided via replication, or replication + sharding. Document-oriented databases also usually support relatively low-performance MapReduce for ad-hoc querying.
CAP Balance: Generally prefer consistency over availability
When to Use: When your concept of a "record" has relatively bounded growth, and can store all of its related properties in a single doc.
Example Products: MongoDB, CouchDB, BigCouch, Cloudant

Key-Value Stores

What They Do: Stores an arbitrary value at a key. Most can perform simple operations on a single value. Typically, each property of a record must be fetched in multiple trips, with Redis being an exception. Very simple, and very fast.
Horizontal Scaling: Horizontal scale is provided via sharding.
CAP Balance: Generally prefer consistency over availability.
When to Use: Very simple schemas, caching of upstream query results, or extreme speed scenarios (like real-time counters)
Example Products: CouchBase, Redis, PostgreSQL HStore, LevelDB
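The sharding approach mentioned above can be sketched in a few lines of Python. This toy store (an illustration, not any particular product's implementation) routes each key to one of a fixed set of shards by hashing the key:

```python
import hashlib

# Toy hash-sharded key-value store: each key deterministically maps to
# one shard, so capacity and throughput grow by adding shards.
class ShardedKV:
    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key):
        # First byte of the MD5 digest picks the shard for this key.
        digest = hashlib.md5(key.encode()).digest()
        return self.shards[digest[0] % len(self.shards)]

    def set(self, key, value):
        self._shard(key)[key] = value

    def get(self, key, default=None):
        return self._shard(key).get(key, default)

store = ShardedKV()
store.set("session:42", {"user": "alice"})
print(store.get("session:42"))
```

Note that naive modulo sharding like this forces most keys to move when the shard count changes, which is one reason production systems favor consistent hashing instead.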

BigTable-Inspired Databases

What They Do: Stores data in column-oriented stores inspired by Google's BigTable paper. CAP parameters are tunable, so these databases can be adjusted to prefer either consistency or availability. The major implementations are all fairly operationally intensive.
Horizontal Scaling: Good speed and very wide horizontal scale capabilities.
CAP Balance: Prefer consistency over availability
When to Use: When you need consistency and write performance that scales past the capabilities of a single machine. HBase in particular has been used with around 1,000 nodes in production.
Example Products: HBase, Cassandra (inspired by both BigTable and Dynamo)

Dynamo-Inspired Databases

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency.
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

Each of the database models has strengths and weaknesses, and there are huge communities that support each of the open source examples I gave in each model. If your database is a bottleneck or you're not getting the flexibility and scalability you need to handle your application's volume, velocity and variety of data, start looking at some of these "big data" solutions.

Tried any of the above models and have feedback that differs from ours? Leave a comment below and tell us about it!

-@marcalanjones

April 24, 2012

RightScale + SoftLayer: The Power of Cloud Automation

SoftLayer's goal is to provide unparalleled value to the customers who entrust their business-critical computing to us — whether via dedicated hosting, managed hosting, cloud computing or a hybrid environment of all three. We provide the best platform on the market, delivering convenience, ease of use, compelling return on investment (ROI), significant competitive advantage, and consistency in a world where the only real constant seems to be change.

That value proposition is one of the biggest driving forces behind our partnership with RightScale. We're cloud computing soul mates.

RightScale

RightScale understands the power of automation, and as a result, they've created a cloud management platform that they like to say delivers "abstraction with complete customization." RightScale customers can easily deploy and manage applications across public, private and hybrid cloud environments, unencumbered by the underlying details. They are free to run efficient, scalable, highly available applications with visibility into and control over their computing resources available in one place.

As you know, SoftLayer is fueled by automation as well, and it's one of our primary differentiators. We're able to deliver a phenomenal customer experience because every aspect of our platform is fully and seamlessly automated to accelerate provisioning, mitigate human error and provide customers with access and features that our competitors can only dream of. Our customers get simple and total control over an ever-expanding number of back-end services and functions through our easy-to-use Customer Portal and via an open, robust API.

The compatibility between SoftLayer and RightScale is probably pretty clear already, but if you needed another point to ponder, you can ruminate on the fact that we both share expertise and focus across a number of vertical markets. The official announcement of the SoftLayer and RightScale partnership will be particularly noteworthy and interesting in the Internet-based business and online gaming market segments.

It didn't take long to find an amazing customer success story that demonstrated the value of the new SoftLayer-RightScale partnership. Broken Bulb Game Studios — the developer of social games such as My Town, Braaains, Ninja Warz and Miscrits — is already harnessing the combined feature sets made possible by our partnership with RightScale to simplify its deployment process and scale to meet its customers' expectations as its games find audiences and growing favor on Facebook. Don't take our word for it, though ... Check out the Broken Bulb quote in today's press release announcing the partnership.

Broken Bulb Game Studios

Broken Bulb and other developers of social games recognize the importance of getting concepts to market at breakneck speed. They also understand the critical importance of intelligently managing IT resources throughout a game's life cycle. What they want is fully automated control over computing resources so that they can be allocated dynamically and profitably in immediate response to market signals, and they're not alone.

Game developers of all sorts — and companies in a growing number of vertical markets — will need and want the same fundamental computing-infrastructure agility.

Our partnership with RightScale is only beginning. You're going to see some crazy innovation happening now that our cloud computing mad scientists are all working together.

-Marc

February 15, 2012

SoftLayer + OpenStack Swift = SoftLayer Object Storage

Since our inception in 2005, SoftLayer's goal has been to provide an array of on-demand data center and hosting services that combine exceptional access, control, scalability and security with unparalleled network robustness and ease of use ... That's why we're so excited to unveil SoftLayer Object Storage to our customers.

Based on OpenStack Object Storage (codenamed Swift) — open-source software that allows the creation of redundant, scalable object storage on clusters of standardized servers — SoftLayer Object Storage provides customers with new opportunities to leverage cost-effective cloud-based storage and to simultaneously realize significant capex-related cost savings.

OpenStack has been phenomenally successful thanks to a global software community of developers and other technologists who have built and tweaked a standards-based, massively scalable open-source platform for public and private cloud computing. The simple goal of the OpenStack project is to deliver code that enables any organization to create and offer feature-rich cloud computing services from industry-standard hardware. The overarching OpenStack technology consists of several interrelated project components: One for compute, one for an image service, one for object storage, and a few more projects in development.

SoftLayer Object Storage
Like the OpenStack Swift system on which it is based, SoftLayer Object Storage is not a file system or real-time data-storage system; rather, it's a long-term storage system for a more permanent type of static data that can be retrieved, leveraged and updated when necessary. Typical applications for this type of storage include virtual machine images, photo storage, email storage and backup archiving.

One of the primary benefits of Object Storage is the role that it can play in automating and streamlining data storage in cloud computing environments. SoftLayer Object Storage offers rich metadata features and search capability that can be leveraged to automate the way unstructured data gets accessed. In this way, SoftLayer Object Storage will provide organizations with new capabilities for improving overall data management and storage efficiency.

File Storage v. Object Storage
To better understand the difference between file storage and object storage, let's look at how file storage and object storage differ when it comes to metadata and search for a simple photo image. When a digital camera or camera-enabled phone snaps a photo, it embeds a series of metadata values in the image. If you save the image in a standard image file format, you can search for it by standard file properties like name, date and size. If you save the same image with additional metadata as an object, you can set object metadata values for the image (after reading them from the image file). This detail provides granular search capability based on the metadata keys and values, in addition to the standard object properties. Here is a sample comparison of an image's metadata value in both systems:

File Metadata        Object Metadata
Name: img01.jpg      Name: img01.jpg
Date: 2012-02-13     Date: 2012-02-13
Size: 1.2MB          Size: 1.2MB
                     Manufacturer: CASIO
                     Model: QV-4000
                     x-Resolution: 72.00
                     y-Resolution: 72.00
                     PixelXDimension: 2240
                     PixelYDimension: 1680
                     FNumber: f/4.0
                     Exposure Time: 1/659 sec.
Using the rich metadata and search capability enabled by object storage, you would be able to search for all images with a dimension of 2240x1680 or a resolution of 72x72 in a quick/automated fashion. The object storage system "understands" more about what is being stored because it is able to differentiate files based on characteristics that you define.
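That kind of metadata-driven search can be illustrated with a short Python sketch. The data below is hypothetical and the code is a conceptual illustration, not SoftLayer's implementation:

```python
# Objects carry arbitrary metadata key-value pairs alongside their
# standard properties, so queries can filter on any metadata field,
# something a file system's name/date/size attributes can't express.
objects = [
    {"name": "img01.jpg", "meta": {"PixelXDimension": "2240",
                                   "PixelYDimension": "1680",
                                   "Model": "QV-4000"}},
    {"name": "img02.jpg", "meta": {"PixelXDimension": "1024",
                                   "PixelYDimension": "768"}},
]

def search(objs, **criteria):
    """Return names of objects whose metadata matches every key=value."""
    return [o["name"] for o in objs
            if all(o["meta"].get(k) == v for k, v in criteria.items())]

print(search(objects, PixelXDimension="2240", PixelYDimension="1680"))
```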

What Makes SoftLayer Object Storage Different?
SoftLayer Object Storage offers several unique features and ways for SoftLayer customers to upload, access and manage data:

  • Search — Quickly access information through user-defined metadata key-value pairs, file name or unique identifier
  • CDN — Serve your content globally over our high-performance content delivery network
  • Private Network — Free, secure private network traffic between all data centers and storage cluster nodes
  • API — Access to a full-feature OpenStack-compatible API with additional support for CDN and search integration
  • Portal — Web application integrated into the SoftLayer portal
  • Mobile — iPhone and Android mobile apps, with Windows Phone app coming soon
  • Language Bindings — Feature-complete bindings for Java, PHP, Python and Ruby*

*Language bindings, documentation, and guides are available on SLDN.
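In OpenStack Swift-compatible APIs, user-defined object metadata travels as X-Object-Meta-* HTTP headers on upload requests. As a rough illustration of that convention only (endpoint, container, and authentication handling are omitted), a client might build the metadata headers like this:

```python
# Build Swift-style metadata headers for an object upload. Swift exposes
# user-defined metadata as headers of the form X-Object-Meta-<Key>.
def object_metadata_headers(metadata):
    """Turn a metadata dict into X-Object-Meta-* request headers."""
    return {"X-Object-Meta-" + key.title(): str(value)
            for key, value in metadata.items()}

headers = object_metadata_headers({"model": "QV-4000", "fnumber": "f/4.0"})
print(headers)
```

These headers would accompany a PUT of the object itself; the stored key-value pairs are what the search feature above can then index and query.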

We think SoftLayer Object Storage will be attractive to a broad range of current and prospective customers, from web-centric businesses dependent on file sharing and content distribution to legal/medical/financial-services companies that possess large volumes of data that must be stored securely while remaining readily accessible.

SoftLayer Object Storage significantly extends our cloud-services portfolio while substantially enriching the storage capabilities that we bring to our customers. What are you waiting for? Go order yourself some object storage @ $0.12/GB!

-Marc

February 1, 2012

Flex Images: Blur the Line Between Cloud and Dedicated

Our customers are not concerned with technology for technology's sake. Information technology should serve a purpose; it should function as an integral means to a desired end. Understandably, our customers are focused, first and foremost, on their application architecture and infrastructure. They want, and need, the freedom and flexibility to design their applications to their specifications.

Many companies leverage the cloud to take advantage of core features that enable robust, agile architectures. Elasticity (ability to quickly increase or decrease compute capacity) and flexibility (choice such as cores, memory and storage) combine to provide solutions that scale to meet the demands of modern applications.

Another widely used feature of cloud computing is image-based provisioning. Rapid provisioning of cloud resources is accomplished, in part, through the use of images. Imaging capability extends beyond the use of base images, allowing users to create customized images that preserve their software installs and configurations. The images persist in an image library, allowing users to launch new cloud instances based on their images.

But why should images only be applicable to virtualized cloud resources?

Toward that end, we're excited to introduce SoftLayer Flex Images, a new capability that allows us to capture images of physical and virtual servers, store them all in one library, and rapidly deploy those images on either platform.

SoftLayer Flex Images

Physical servers now share the core features of virtual servers—elasticity and flexibility. With Flex Images, you can move seamlessly between cloud and dedicated environments as your needs change.

Let's say you're running into resource limits in a cloud server environment—your data-intensive server is I/O bound—and you want to move the instance to a more powerful dedicated server. Using Flex Images, you can create an image of your cloud server and, extending our I/O bound example, deploy it to a custom dedicated server with SSD drives.

Conversely, a dedicated environment can be quickly replicated on multiple cloud instances if you want the scaling capability of the cloud to meet increased demand. Maybe your web heads run on dedicated servers, but you're starting to see periods of usage that stress your servers. Create a Flex Image from your dedicated server and use it to deploy cloud instances to meet demand.

Flex Image technology blurs the distinctions—and breaks down the walls—between virtual and physical computing environments.

We don't think of Flex Images as a new product. Instead—like our network, our portal, our automated platform, and our globe-spanning geographic diversity—Flex Image capability is a free resource for our customers (aside from the standard nominal cost of storing the images).

We think Flex Images not only represents great value but also provides a further example of how SoftLayer innovates continually to bring new capabilities and the highest possible level of customer control to our automated services platform.

To sum up, here are some of the key features and benefits of SoftLayer Flex Images:

  • Universal images that can be used interchangeably on dedicated or cloud systems
  • Unified image library for archiving, managing, sharing, and publishing images
  • Greater flexibility and higher scalability
  • Rapid provisioning of new dedicated and cloud environments
  • Available via SoftLayer's management portal and API

Flex Images are available now in public beta. We invite you to try them out, and, as always, we want to hear what you think.

-Marc

February 16, 2011

3 Bars | 3 Questions: Big Data and Search

Last week, Duke chose me as this week's "3 Bars | 3 Questions" participant, so my desk chair became the hot seat this afternoon. The topic of discussion: "Big Data and Search."

Have you started working with big data? What's the best method you've found to keep it organized and accessible? How do you scale your infrastructure to maintain performance?

-Marc
