Posts Tagged 'Data'

September 18, 2014

The Cloud Doesn't Bite, Part III

Why it's OK to be a server-hugger—a cloud server hugger.

(This is the final post in a three-part series. Read the first and second posts here.)

By now, you probably understand the cloud enough to know what it is and does. Maybe it's something you've even considered for your own business. But you're still not sold. You still have nagging concerns. You still have questions that you wish you could ask, but you're pretty sure no cloud company would dignify those questions with an honest, legitimate response.

Well, we're a cloud company, and we'll answer those questions.

Inspired by a highly illuminating (!) thread on Slashdot about the video embedded below, we've noticed that some of you aren't ready to get your head caught up in the cloud just yet. And that's cool. But let's see if maybe we can put a few of those fears to rest right now.

“[The] reason that companies are hesitant to commit all of their IT to the cloud [relates to] keeping control. It's not about jobs, it's about being sure that critical services are available when you need them. Whenever you see ‘in the CLOUD!’, mentally replace it with ‘using someone else's server’—all of a sudden it looks a whole lot less appealing. Yes, you gain some flexibility, but you lose a LOT of control. I like my data to not be in the hands of someone else. If I don't control the actual machine that has my data on it, then I don't control the data.”

You guys are control FREAKS! And rightfully so. But some of us actually don't take that away from you. Believe it or not, we make it easier for you.

In fact, sometimes you even get to manage your own infrastructure—and that means you can do anything an employee can do. You'll probably even get so good at it that you'll wonder why we don't pay you.

But it doesn't stop at mere management. Oh, no, no, no, friends. You can even take it one further and build, manage, and have total control over your very own private cloud of virtual servers. Yes, yours, and yours only. Now announcing you, the shot caller.

The point is, you don't lose any control over your data in the cloud. None. 'Cause cloud companies don't play like that.

“The first rule of computer security is physical access, which is impossible with cloud services, which means they are inherently insecure.”

Curious. So since you can't physically touch your money in your bank account, does that mean it's a free-for-all on your savings? Let us know; we'll bring buckets.

“These cloud guys always forget to mention one glaring problem with their model — they're not adding any new software to the picture.”

Ready for us to blow your minds? We're actually adding software all the time; you just don't see it—but you do feel it.

Your friendly Infrastructure as a Service (IaaS) providers out there are doing a lot of development behind the scenes. An internal software update might let us deploy servers 10 minutes faster, for example. You won't see that, but that doesn't mean it's not happening. If you're happy with your servers, then rest assured you're seeing some sweet software in action. Some cloud companies aren't exclusively focused on software the way a SaaS provider like Salesforce is, but that doesn't mean their software is dial-up grade.

“I personally don't trust the cloud. Think about it for a moment. You are putting your data on a server, and you have no clue as to where it is. You have no clue about who else is able to see that data, and you have no clue about who is watching as you access your data and probably no clue if that server is up to date on security patches.”

Just ask. Simply ask all of these questions, and you'll have all of these answers. Not to be cheeky, but all of this is information you can and do have a right to know before you commit to anything. We're not sure what makes you think you don't, but you do. Your own due diligence on behalf of your data makes that a necessity, not a luxury.

“As long as I'm accountable, I want the hardware and software under my control. That way when something goes wrong and my boss calls and asks 'WTF?', I can give him something more than 'Well, I called Amazon and left a message with our account representative.'”

We can't speak for Amazon, but cloud companies often offer multiple ways you can get a hold of a real, live person because we get that you want to talk to us, like, yesterday. Yes, we totally get you. And we want to fix whatever ails you. In the cloud, that is.

But what makes you think we won't know when something goes wrong before you do? (Checkmate.)

“No matter how much marketing jargon you spew at people, ‘the cloud’ is still just a bunch of servers. Stop lying.”

Why yes, yes, it is. Who's lying to you about that? You're right. "They" should stop lying.

The concept of "the cloud" is simply about where the servers are located and how you consume computing, storage, and networking resources. In "the cloud," your servers are accessed remotely via a network connection (often the Internet, for most of the clouds you know and love) rather than being housed and accessed locally in a server room or elsewhere on your premises, wherever you happen to be while performing your computing. But no one's trying to pull the wool over your eyes with that one.

Think about it this way: If servers at your location are "on the ground," then servers away from your location can be considered "in the cloud." And that's all there is to it.

Did we help? Did we clear the cloudy haze? We certainly hope so.

But this is just the beginning, and our door is always open for you to question, criticize, and wax philosophical with us when it comes to all things cloud. So get at us. You can chat with us live via our homepage, message us or post up on Facebook, or sling a tweet at a SLayer. We've got real, live people manning their stations. Consider the gauntlet thrown.

-Fayza

August 14, 2013

Setting Up Your New Server

As a technical support specialist at SoftLayer, I work with new customers regularly, and I field a lot of the same kinds of questions about setting up a new server. So I thought I'd put together a quick guide for other new customers who want to implement a few best practices when setting up and securing a new server. This documentation is based on my personal server setup experience and on the experience I've had helping customers with their new servers, so it shouldn't be considered an exhaustive, authoritative guide ... just a helpful and informative one.

Protect Your Data

First and foremost, configure backups for your server. The server is worthless without your data. Data is your business. An old adage says, "It's better to have and not need than to need and not have." Imagine what would happen to your business if you lost just some of your data. There's no excuse for neglecting backups when configuring your new server. SoftLayer does not back up your server, but we offer several options for data protection and backup to fit your needs.

Control panels like cPanel and Plesk include backup functionality and can be configured to back up automatically and regularly to an FTP/NAS account. Configure backups now, before doing anything else, and before migrating or copying your data to the server. This first (nearly empty) backup will be quick. Test the backup by restoring the data. If your server has RAID, it's important to remember that RAID is not backup!
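
If you're managing the server without a control panel, a simple script plus cron can cover the basics. Here's a minimal sketch, assuming your NAS is mounted at /mnt/nas and your content lives in /var/www and MySQL; every path here is a placeholder to adjust for your own setup:

    #!/bin/bash
    # Minimal nightly backup sketch: dump databases and archive the web root
    # to an FTP/NAS mount, keeping seven days of history.
    BACKUP_DIR="/mnt/nas/backups"
    STAMP=$(date +%F)
    mkdir -p "$BACKUP_DIR"

    # Dump all MySQL databases (assumes credentials in /root/.my.cnf)
    mysqldump --all-databases | gzip > "$BACKUP_DIR/mysql-$STAMP.sql.gz"

    # Archive the web root
    tar -czf "$BACKUP_DIR/www-$STAMP.tar.gz" /var/www

    # Prune backups older than seven days
    find "$BACKUP_DIR" -type f -mtime +7 -delete

Schedule it from cron (e.g., 0 3 * * * /usr/local/bin/backup.sh for 3 a.m. daily), and restore a file from it occasionally to prove the backups actually work.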

For more tips about setting up and checking your backups, check out "Risk Management: The Importance of Redundant Backups."

Use Strong Passwords

I've seen some very weak and vulnerable passwords on customers' servers. SoftLayer sets a random, complex password on every new server that is provisioned. Don't change it to a weak password based on names, birthdays and other trivia that can be found or guessed easily. Remember, a strong password doesn't have to be a complicated one: xkcd: Password Strength
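
If you want a quick way to generate a password like that, OpenSSL, which ships with virtually every Linux distribution, can do it in one line:

    # 16 random bytes, base64-encoded: a ~22-character random password
    openssl rand -base64 16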

Write down your passwords: "If I write them down and then protect the piece of paper — or whatever it is I wrote them down on — there is nothing wrong with that. That allows us to remember more passwords and better passwords." And: "We're all good at securing small pieces of paper. I recommend that people write their passwords down on a small piece of paper, and keep it with their other valuable small pieces of paper: in their wallet." Just don't use any of these passwords.
I've gone electronic and use 1Password, and in the process I discovered just how many passwords I deal with. With strong, random passwords like these, you don't have to change your password frequently, and if you do, you don't have to worry about remembering the new one or updating all of your notes. If passwords are too much of a hassle ...

Or Don't Use Passwords

One of the wonderful things about SSH/SFTP on Linux/FreeBSD is that SSH keys obviate the problem of passwords. Mac and Linux/FreeBSD systems have an SSH client installed by default! There are also a lot of great SSH clients available for every platform you'll use to access your server. For Windows, I recommend PuTTY, and for iOS, Panic Prompt.
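
Setting up key-based logins takes about a minute. A minimal sketch from a Mac or Linux workstation, with the hostname as a placeholder:

    # Generate a keypair on your workstation (set a passphrase when prompted)
    ssh-keygen -t rsa -b 4096

    # Copy the public key to the server (you'll enter your password one last time);
    # if ssh-copy-id isn't available, append ~/.ssh/id_rsa.pub to the server's
    # ~/.ssh/authorized_keys manually.
    ssh-copy-id root@server.example.com

    # From now on, you log in with the key instead of a password
    ssh root@server.example.com

Once keys are working, you can disable password logins entirely by setting "PasswordAuthentication no" in /etc/ssh/sshd_config and restarting sshd.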

Firewall

Firewalls block unwanted network connections. Configuring a firewall manually can get very complicated, especially with protocols like FTP, which opens random ports on either the client or the server. A quick way to deal with this is the system-config-securitylevel-tui tool. Better still, use a firewall front end such as APF or CSF. These tools also simplify blocking and unblocking IPs:

Firewall   Allow         Block         Unblock
APF        apf -a <IP>   apf -d <IP>   apf -u <IP>
CSF*       csf -a <IP>   csf -d <IP>   csf -dr <IP>

*CSF has a handy search command: csf -g <IP>.

SoftLayer customers should be sure to allow SoftLayer IP ranges through the firewall so we can better support you when you have questions or need help. Beyond blocking and allowing IP addresses, it's also important to lock down the ports on your server: the only open ports on your system should be the ones you want to use. Here's a quick list of some of the most common ports (a sketch of locking them down with CSF follows these lists):

cPanel ports

  • 2078 - Web Disk
  • 2083 - cPanel control panel
  • 2087 - WHM control panel
  • 2096 - webmail

Other

  • 22 - SSH (secure shell - Linux)
  • 53 - DNS name servers
  • 3389 - RDP (Remote Desktop Protocol - Windows)
  • 8443 - Plesk control panel

Mail Ports

  • 25 - SMTP
  • 110 - POP3
  • 143 - IMAP
  • 465 - SMTPS
  • 993 - IMAPS
  • 995 - POP3S

Web Server Ports

  • 80 - HTTP
  • 443 - HTTPS
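
With CSF, for example, locking the server down to just the ports you need is a matter of editing the allow lists in /etc/csf/csf.conf and reloading. A sketch for a cPanel web and mail server; trim the lists to match the services you actually run:

    # /etc/csf/csf.conf
    TCP_IN = "22,25,53,80,110,143,443,465,993,995,2078,2083,2087,2096"
    TCP_OUT = "22,25,53,80,113,443"
    UDP_IN = "53"
    UDP_OUT = "53,113,123"

Then reload the rules:

    csf -r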

DNS

DNS is a naming system for computers and services on the Internet. Domain names like "softlayer.com" and "manage.softlayer.com" are easier to remember than IP addresses like 66.228.118.53, or 2607:f0d0:1000:11:1::4 in IPv6. DNS looks up a domain's A record (AAAA record for IPv6) to retrieve its IP address. The opposite of an A record is a PTR record: PTR records resolve an IP address to a domain name.

Hostname
A hostname is the human-readable label you assign to your server to help you differentiate it from your other devices. A hostname should resolve to its server's main IP address, and the IP should resolve back to the hostname via a PTR record. This configuration is extremely important for email ... assuming you don't want all of your emails rejected as spam.

Avoid using "www" at the beginning of a hostname because it may conflict with a website on your server. The hostnames you choose don't have to be dry or boring. I've seen some pretty awesome hostname naming conventions over the years (Simpsons characters, Greek gods, superheroes), so if you aren't going to go with a traditional naming convention, you can get creative and have some fun with it. A server's hostname can be changed in the customer portal and in the server's control panel. In cPanel, the hostname can easily be set in "Networking Setup"; in Plesk, it's set in "Server Preferences". Without a control panel, you can update the hostname from your operating system (e.g., Red Hat, Debian).

A Records
If you buy your domain name from SoftLayer, it is automatically added to our nameservers, but if your domain was registered externally, you'll need to go through a few additional steps to ensure your domain resolves correctly on our servers. To include your externally registered domain in our DNS, first point it at our nameservers (ns1.softlayer.com, ns2.softlayer.com). Next, add a DNS zone, then add an A record corresponding to the hostname you chose earlier.

PTR Records
Many ISPs configure the servers that receive email to look up the IP address of the domain in a sender's email address (a reverse DNS check) and verify that the domain name matches the email server's hostname. You can look up the PTR record for your IP address yourself: in Terminal.app (Mac) or Command Prompt (Windows), type the "nslookup" command followed by the IP. If the PTR doesn't match up, you can change the PTR easily.
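
For example, using the IP from the DNS section above:

    # Forward lookup: what IP does the hostname resolve to?
    nslookup manage.softlayer.com

    # Reverse lookup: what hostname does the IP resolve to (the PTR record)?
    nslookup 66.228.118.53

If the second answer doesn't match the hostname your mail server announces, fix the PTR record before you send any mail.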


SSL Certificates

Getting an SSL certificate for your site is optional, but it has many benefits: a certificate assures your customers that they are communicating with your site securely, and SSL encrypts passwords and data sent over the network. Any website using SSL certificates should be assigned its own IP address. For more information, we have a great KnowledgeLayer article about planning ahead for an SSL certificate, and there's plenty of documentation on how to manage SSL certificates in cPanel and Plesk.

Move In!

Now that you've prepared your server and protected your data, you are ready to migrate your content to its new home. Be proactive about monitoring and managing your server once it's in production. These tips aren't meant to be a one-size-fits-all, "set it and forget it" solution; they're simply important aspects to consider when you get started with a new server. You probably noticed that I alluded to control panels quite a few times in this post, and that's for good reason: If you don't feel comfortable with all of the ins and outs of server administration, control panels are extremely valuable resources that do a lot of the heavy lifting for you.

If you have any questions about setting up your new server or you need any help with your SoftLayer account, remember that we're only a phone call away!

-Lyndell

December 17, 2012

Big Data at SoftLayer: The Importance of IOPS

The jet flow gates in the Hoover Dam can release up to 73,000 cubic feet — the equivalent of 546,040 gallons — of water per second at 120 miles per hour. Imagine replacing those jet flow gates with a single garden hose that pushes 25 gallons per minute (or 0.42 gallons per second). Things would get ugly pretty quickly. In the same way, a massive "big data" infrastructure can be crippled by insufficient IOPS.

IOPS — Input/Output Operations Per Second — measure computer storage in terms of the number of read and write operations it can perform in a second. IOPS are a primary concern for database environments where content is being written and queried constantly, and when we take those database environments to the extreme (big data), the importance of IOPS can't be overstated: If you aren't able to perform database reads and writes quickly in a big data environment, it doesn't matter how many gigabytes, terabytes or petabytes you have in your database ... You won't be able to efficiently access, add to or modify your data set.

As we worked with 10gen to create, test and tweak SoftLayer's MongoDB engineered servers, our primary focus centered on performance. Since the performance of massively scalable databases is dictated by the read and write operations to that database's data set, we invested significant resources into maximizing the IOPS for each engineered server ... And that involved a lot more than just swapping hard drives out of servers until we found a configuration that worked best. Yes, "Disk I/O" — the number of input/output operations a given disk can perform — plays a significant role in big data IOPS, but many other factors limit big data performance. How is performance impacted by network-attached storage? At what point will a given CPU become a bottleneck? How much RAM should be included in a base configuration to accommodate the load we expect our users to put on each tier of server? Are there operating system changes that can optimize the performance of a platform like MongoDB?

The resulting engineered servers are a testament to the blood, sweat and tears that were shed in the name of creating a reliable, high-performance big data environment. And I can prove it.

Most shared virtual instances — the scalable infrastructure many users employ for big data — use network-attached storage for their platform's storage. When data has to be queried over a network connection (rather than from a local disk), you introduce latency and more "moving parts" that have to work together. Disk I/O might be amazing on the enterprise SAN where your data lives, but because that data is not stored on-server with your processor or memory resources, performance can sporadically go from "Amazing" to "I Hate My Life" depending on network traffic. When I tested the IOPS for network-attached storage from a large competitor's virtual instances, I saw an average of around 400 IOPS per mount. It's difficult to say whether that's "not good enough" because every application will have different needs in terms of concurrent reads and writes, but it certainly could be better. We performed some internal testing of the IOPS for the hard drive configurations in our Medium and Large MongoDB engineered servers to give you an apples-to-apples comparison.

Before we get into the tests, here are the specs for the servers we're using:

Medium (MD) MongoDB Engineered Server
  • Dual 6-core Intel 5670 CPUs
  • CentOS 6 64-bit
  • 36GB RAM
  • 1Gb Network - Bonded

Large (LG) MongoDB Engineered Server
  • Dual 8-core Intel E5-2620 CPUs
  • CentOS 6 64-bit
  • 128GB RAM
  • 1Gb Network - Bonded

The numbers shown in the table below reflect the average number of IOPS we recorded with a 100% random read/write workload on each of these engineered servers. To measure these IOPS, we used a tool called fio with an 8k block size and an iodepth of 128 (a sketch of the invocation follows the first set of results). Remembering that the virtual instance using network-attached storage was able to get 400 IOPS per mount, let's look at how our "base" configurations perform:

Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 2937
Random Write IOPS - /var/lib/mongo/logs 1306
Random Read IOPS - /var/lib/mongo/data 1720
Random Write IOPS - /var/lib/mongo/data 772
Random Read IOPS - /var/lib/mongo/data/journal 19659
Random Write IOPS - /var/lib/mongo/data/journal 8869
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 30269
Random Write IOPS - /var/lib/mongo/logs 13124
Random Read IOPS - /var/lib/mongo/data 33757
Random Write IOPS - /var/lib/mongo/data 14168
Random Read IOPS - /var/lib/mongo/data/journal 19644
Random Write IOPS - /var/lib/mongo/data/journal 8882
   
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 4820
Random Write IOPS - /var/lib/mongo/logs 2080
Random Read IOPS - /var/lib/mongo/data 2461
Random Write IOPS - /var/lib/mongo/data 1099
Random Read IOPS - /var/lib/mongo/data/journal 19639
Random Write IOPS - /var/lib/mongo/data/journal 8772
 
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32403
Random Write IOPS - /var/lib/mongo/logs 13928
Random Read IOPS - /var/lib/mongo/data 34536
Random Write IOPS - /var/lib/mongo/data 15412
Random Read IOPS - /var/lib/mongo/data/journal 19578
Random Write IOPS - /var/lib/mongo/data/journal 8835
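
For reference, here's a sketch of the kind of fio invocation behind these numbers; it's illustrative rather than our exact test harness:

    # 100% random read test: 8k blocks, iodepth of 128
    fio --name=randread --directory=/var/lib/mongo/data \
        --rw=randread --bs=8k --iodepth=128 --ioengine=libaio \
        --direct=1 --size=4G --runtime=60 --group_reporting

    # Same parameters for the write side
    fio --name=randwrite --directory=/var/lib/mongo/data \
        --rw=randwrite --bs=8k --iodepth=128 --ioengine=libaio \
        --direct=1 --size=4G --runtime=60 --group_reporting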

Clearly, the 400 IOPS per mount results you'd see in SAN-based storage can't hold a candle to the performance of a physical disk, regardless of whether it's SAS or SSD. As you'd expect, the "Journal" reads and writes have roughly the same IOPS between all of the configurations because all four configurations use 2 x 64GB SSD drives in RAID1. In both configurations, SSD drives provide better Data mount read/write performance than the 15K SAS drives, and the results suggest that having more physical drives in a Data mount will provide higher average IOPS. To put that observation to the test, I maxed out the number of hard drives in both configurations (10 in the 2U MD server and 34 in the 4U LG server) and recorded the results:

Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 7175
Random Write IOPS - /var/lib/mongo/logs 3481
Random Read IOPS - /var/lib/mongo/data 6468
Random Write IOPS - /var/lib/mongo/data 1763
Random Read IOPS - /var/lib/mongo/data/journal 18383
Random Write IOPS - /var/lib/mongo/data/journal 8765
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32160
Random Write IOPS - /var/lib/mongo/logs 12181
Random Read IOPS - /var/lib/mongo/data 34642
Random Write IOPS - /var/lib/mongo/data 14545
Random Read IOPS - /var/lib/mongo/data/journal 19699
Random Write IOPS - /var/lib/mongo/data/journal 8764
   
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 17566
Random Write IOPS - /var/lib/mongo/logs 11918
Random Read IOPS - /var/lib/mongo/data 9978
Random Write IOPS - /var/lib/mongo/data 6526
Random Read IOPS - /var/lib/mongo/data/journal 18522
Random Write IOPS - /var/lib/mongo/data/journal 8722
 
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 34220
Random Write IOPS - /var/lib/mongo/logs 15388
Random Read IOPS - /var/lib/mongo/data 35998
Random Write IOPS - /var/lib/mongo/data 17120
Random Read IOPS - /var/lib/mongo/data/journal 17998
Random Write IOPS - /var/lib/mongo/data/journal 8822

It should come as no surprise that by adding more drives into the configuration, we get better IOPS, but you might be wondering why the results aren't "betterer" when it comes to the IOPS in the SSD drive configurations. While the IOPS numbers improve going from four to ten drives in the medium engineered server and six to thirty-four drives in the large engineered server, they don't increase as significantly as the IOPS differences in the SAS drives. This is what I meant when I explained that several factors contribute to and potentially limit IOPS performance. In this case, the limiting factor throttling the (ridiculously high) IOPS is the RAID card we are using in the servers. We've been working with our RAID card vendor to test a new card that would open a little more headroom for SSD IOPS, but so far that replacement card doesn't provide the consistency and reliability we need for these servers (which are just as important as speed).

There are probably a dozen other observations I could point out about how each result compares with the others (and why), but I'll stop here and open the floor for you. Do you notice anything interesting in the results? Does anything surprise you? What kind of IOPS performance have you seen from your server/cloud instance when running a tool like fio?

-Kelly

December 5, 2012

Breaking Down 'Big Data' - Database Models

Forrester defines big data as "techniques and technologies that make capturing value from data at an extreme scale economical." Gartner says, "Big data is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets." Big data demands extreme horizontal scale that traditional IT management can't handle, and it's not a challenge exclusive to the Facebooks, Twitters and Tumblrs of the world ... Just look at the Google search volume for "big data" over the past eight years:

Big Data Search Interest

Developers are collectively facing information overload. As storage has become more and more affordable, it's easier to justify collecting and saving more data. Users are more comfortable with creating and sharing content, and we're able to track, log and index metrics and activity that previously would have been deleted out of consideration for space constraints or cost. As the information age progresses, we are collecting more and more data at an ever-accelerating pace, and we're sharing that data at an incredible rate.

To understand the different facets of this increased usage and demand, Gartner came up with the three V's of big data that vary significantly from traditional data requirements: Volume, Velocity and Variety. Larger, more abundant pieces of data ("Volume") are coming at a much faster speed ("Velocity") in formats like media and walls of text that don't easily fit into a column-and-row database structure ("Variety"). Given those equally important factors, many of the biggest players in the IT world have been hard at work to create solutions that provide the scale and speed developers need when they build social, analytics, gaming, financial or medical apps with large data sets.

When we talk about scaling databases here, we're talking about scaling horizontally across multiple servers rather than scaling vertically by upgrading a single server — adding more RAM, increasing HDD capacity, etc. It's important to make that distinction because it leads to a unique challenge shared by all distributed computer systems: the CAP theorem. According to the CAP theorem, a distributed storage system must choose to sacrifice either consistency (everyone sees the same data) or availability (you can always read/write) while maintaining partition tolerance (the system continues to operate despite arbitrary message loss or the failure of part of the system).

Let's take a look at a few of the most common database models, what their strengths are, and how they handle the CAP theorem compromise of consistency vs. availability:

Relational Databases

What They Do: Stores data in rows/columns. Parent-child records can be joined remotely on the server. Provides speed over scale. Some capacity for vertical scaling, poor capacity for horizontal scaling. This type of database is where most people start.
Horizontal Scaling: In a relational database system, horizontal scaling is possible via replication — sharing data between redundant nodes to ensure consistency — and some people have success with sharding — horizontal partitioning of data — but those techniques add a lot of complexity.
CAP Balance: Prefer consistency over availability.
When to use: When you have highly structured data, and you know what you'll be storing. Great when production queries will be predictable.
Example Products: Oracle, SQLite, PostgreSQL, MySQL

Document-Oriented Databases

What They Do: Stores data in documents. Parent-child records can be stored in the same document and returned in a single fetch operation with no join. The server is aware of the fields stored within a document, can query on them, and return their properties selectively.
Horizontal Scaling: Horizontal scaling is provided via replication, or replication + sharding. Document-oriented databases also usually support relatively low-performance MapReduce for ad-hoc querying.
CAP Balance: Generally prefer consistency over availability.
When to Use: When your concept of a "record" has relatively bounded growth, and can store all of its related properties in a single doc.
Example Products: MongoDB, CouchDB, BigCouch, Cloudant

Key-Value Stores

What They Do: Stores an arbitrary value at a key. Most can perform simple operations on a single value. Typically, each property of a record must be fetched in multiple trips, with Redis being an exception. Very simple, and very fast.
Horizontal Scaling: Horizontal scale is provided via sharding.
CAP Balance: Generally prefer consistency over availability.
When to Use: Very simple schemas, caching of upstream query results, or extreme speed scenarios (like real-time counters)
Example Products: CouchBase, Redis, PostgreSQL HStore, LevelDB

BigTable-Inspired Databases

What They Do: Stores data in column-oriented stores inspired by Google's BigTable paper. CAP parameters are tunable, so these databases can be adjusted to prefer either consistency or availability. Both of the main examples are somewhat operationally intensive.
Horizontal Scaling: Good speed and very wide horizontal scale capabilities.
CAP Balance: Prefer consistency over availability.
When to Use: When you need consistency and write performance that scales past the capabilities of a single machine. HBase in particular has been used with around 1,000 nodes in production.
Example Products: HBase, Cassandra (inspired by both BigTable and Dynamo)

Dynamo-Inspired Databases

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency.
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

Each of the database models has strengths and weaknesses, and there are huge communities that support each of the open source examples I gave in each model. If your database is a bottleneck or you're not getting the flexibility and scalability you need to handle your application's volume, velocity and variety of data, start looking at some of these "big data" solutions.

Tried any of the above models and have feedback that differs from ours? Leave a comment below and tell us about it!

-@marcalanjones

February 28, 2012

14 Questions Every Business Should Ask About Backups

Unfortunately, having "book knowledge" (or in this case "blog knowledge") about backups and applying that knowledge faithfully and regularly are not necessarily one and the same. Regardless of how many times you hear it or read it, if you aren't actively protecting your data, YOU SHOULD BE.

Here are a few questions to help you determine whether your data is endangered:

  1. Is your data backed up?
  2. How often is your data backed up?
  3. How often do you test your backups?
  4. Is your data backed up externally from your server?
  5. Are your backups in another data center?
  6. Are your backups in another city?
  7. Are your backups stored with a different provider?
  8. Do you have local backups?
  9. Are your backups backed up?
  10. How many people in your organization know where your backups are and how to restore them?
  11. What's the greatest amount of data you might lose in the event of a server crash before your next backup?
  12. What is the business impact of that data being lost?
  13. If your server were to crash and the hard drives were unrecoverable, how long would it take you to restore all of your data?
  14. What is the business impact of your data being lost or inaccessible for the length of time you answered in the last question?

We can all agree that the idea of backups and data protection is a great one, but when it comes to investing in that idea, some folks change their tune. While each of the above questions has a "good" answer when it comes to keeping your data safe, your business might not need "good" answers to all of them for your data to be backed up sufficiently. You should understand the value of your data to your business and invest in its protection accordingly.

For example, a million-dollar business running on a single server will probably value its backups more highly than a hobbyist with a blog she contributes to once every year and a half. The million-dollar business needs more "good" answers than the hobbyist, so the business should invest more in the protection of its data than the hobbyist.

If you haven't taken time to quantify the business impact of losing your primary data (questions 11-14), sit down with a pencil and paper and take time to thoughtfully answer those questions for your business. Are any of those answers surprising to you? Do they make you want to reevaluate your approach to backups or your investment in protecting your data?

The funny thing about backups is that you don't need them until you NEED them, and when you NEED them, you'll usually want to kick yourself if you don't have them.

Don't end up kicking yourself.

-@khazard

P.S. SoftLayer has a ton of amazing backup solutions, but in the interest of making this post accessible and shareable, I won't go crazy linking to them throughout the post. The latest product release that got me thinking about this topic was the SoftLayer Object Storage launch, and if you're concerned about your answers to any of the above questions, object storage may be an economical way to easily get some more "good" answers.

February 21, 2012

Startup Series: Distil

As you may have read in one of my previous posts, SoftLayer partners with various startup accelerator programs around the world. This gives us the incredible opportunity to get up close and personal with some of the brightest entrepreneurs in the tech industry. Because SoftLayer grew out of a classic startup environment, we have a passion for helping new companies achieve their goals. From C-level execs all the way down the chain, we're committed to finding the best innovators out there and mentoring them on their way to success.

We're planning a pretty big public debut for the SoftLayer startup program in the coming months, but we want to start introducing you to some of the killer startup companies we're already working with. Today's incredible business: Distil.

Distil

Distil is currently enrolled in the TechStars Cloud Accelerator program, where SoftLayer CSO George Karidis, CTO Duke Skarda, and I serve as mentors. After meeting the guys at Distil, I couldn't wait to get them set up with us as well.

Here's a quick insight into the company from a quick Q&A with the brains of the operation, Rami Essaid, Founder and CEO of Distil:

Q: Tell me a little bit about Distil and how you got started.

A: Distil is the first content protection network that helps companies identify and block malicious bots from harvesting and stealing their data. We started after talking to online publishers about their security needs, and we quickly realized that digital publishers had no control over their content once they put it on the web. We started working to create the first platform aimed to help them protect and control their information.

Q: When was the moment you first recognized you had a big idea?

A: It happened after presenting our proof of concept to a couple of digital publishers; the enthusiastic feedback we received made us instantly realize that this was it.

Q: How did you build your company?

A: The company started as an after-work hobby. As the platform picked up momentum, we slowly started leaving our jobs to devote all of our time to Distil. We quickly raised seed capital to help fuel our growth.

Q: What are the keys to Distil's success?

A: The team I have at Distil is absolutely the reason for our success. Each person's hard work, energy, and dedication allow us to accomplish twice as much in half the time. This group of guys is the most intelligent and keen I have ever had the pleasure of working with.

Q: How would you describe the market for your product?

A: Distil is a technology solution to a problem that traditionally relied only on laws and litigation. Copyright infringement has been an issue since the World Wide Web began, but up until now, most companies have treated data theft reactively. We are disrupting that way of thinking and creating a new market: protecting data and content proactively, before it is ever stolen.

Q: How did you arrive at SoftLayer and how have we helped?

A: We were connected to SoftLayer through the TechStars Cloud Accelerator program. We were introduced to SoftLayer's leadership team, and they worked with us to improve our platform performance and tweak our designs to utilize both dedicated and cloud servers. By using this hybrid solution, we've been able to gain the power and speed of dedicated servers while still having the flexibility to burst and scale on demand.

Q: What advice would you give to other startups?

A: The best advice I can give to any startup is to make sure they're passionate about what they're doing. Startup life is not easy. You work 16-20 hours a day, seven days a week, have very little money, and are always worried someone else will beat you to the prize. Passion is the only reason you get up in the morning.

Learn more about Distil at distil.it.

In my short conversation with Rami, I could hear his passion. That's exactly what we're looking for in companies who join the SoftLayer startup program. We can't wait to see what the future holds for Distil.

If you enjoy reading about cool new startups, bookmark the Startups page here on the SoftLayer Blog or subscribe to the "Startups" RSS feed to meet some of the most badass startups in the world.

Calling All Startups!

Companies in our program receive mentoring, best practices advice, industry insight, and tangible resources including:

  • A $1,000 per month credit for dedicated hosting, cloud hosting or any kind of hybrid hosting setup
  • Advanced infrastructure help and advice
  • A dedicated Senior Account Representative
  • Marketing support

If you're interested in joining our program and getting the help you deserve, shoot me an email, and we'll help you start the application process.

-@PaulFord

February 14, 2012

Open Source, OpenStack and SoftLayer

The open-source model has significantly revolutionized not only the IT industry but the business world as well. In fact, it was one of the key "flatteners" Thomas Friedman covered in his tour de force on globalization — The World is Flat. The trend toward collaborating on online projects — including open-source software, blogs, and Wikipedia — remains one of "the most disruptive forces of all."

The success of open-source projects like Linux, Ruby on Rails, and Android reveals the strength and diversity of having developers around the world contributing and providing feedback on code. The community becomes more than the sum of its parts, driving innovation and constant improvement. The case has been made for open source in and of itself, but a debate still rages over the developing case for businesses contributing to open source. Why would a business dedicate resources to the development of something it can't sell?

The answer is simple and straightforward: Contributing to open source fosters a community that can inspire, create and fuel the innovation a business needs to keep providing its customers with even better products. It makes sense ... Having hundreds of developers with different skills and perspectives working on a project can push that project further faster. The end result is a product that benefits the open-source community and the business world. The destiny of the community or the product cannot be defined by a single vendor or business; it's the democratization of technology.

Open-Source Cloud Platforms
Today, there are several open-source cloud platforms vying for industry dominance. SoftLayer has always been a big proponent and supporter of open source, and we've been involved with the OpenStack project from the beginning. In fact, we just announced SoftLayer Object Storage, an offering based on OpenStack Object Storage (code-named Swift). We'll provide code and support for Swift in hopes that it continues to grow and improve. The basic idea behind Swift Object Storage is to create redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. I could go on and on about object storage, but I know Marc Jones has a blog specifically about SoftLayer Object Storage being published tomorrow, and I don't want to steal too much of his thunder.

We have to acknowledge and embrace the heterogeneous nature of the IT industry. Just as you might use multiple operating systems and hypervisors, we plan on working with a variety of open-source cloud platforms. Right now, we're looking into supporting initiatives like Eucalyptus, and we have our ear to the street, listening to what our customers are asking for. Our overarching goal is to provide our customers with much-needed technologies that are advancing the hosting industry, and one of the best ways to get there is to serve the needs of the open-source community.

As I write this blog post, I can't help but think of it in terms of a Lord of the Rings reference: "One ring to rule them all." The idea that "one ring" is all we need to focus on as a hosting provider just doesn't work when it comes to the open-source community ... It all comes down to enabling choice and flexibility. We'll keep investing in innovation wherever we can, and we'll let the market decide which ring will rule where.

What open-source projects are you working on now? How can SoftLayer get involved?

-Matt

November 30, 2011

Kred: Tech Partner Spotlight

This is a guest blog from the PeopleBrowsr team about Kred. Kred is the first social scoring system to provide people with a comprehensive, contextual score for their Influence and Outreach within interest-based communities.

Company Website: http://kred.ly/
Tech Partners Marketplace: http://www.softlayer.com/marketplace/Kred

We All Have Influence Somewhere

The social networking revolution provides an unprecedented opportunity to observe, filter and analyze conversations in real time. For marketers and anyone interested in human behavior, it's now possible to examine the collective consciousness for insights into consumer behavior and to detect and engage with the most influential people.

Increasingly, we find that the elements that determine "influence" in online networks are the same as they are in "real life" relationships: Trust and Generosity within small close networks of friends and subject matter experts. These in turn have become the foundations for Kred, a brand new way to understand anyone's Influence and Outreach across social media and within Communities formed around interests and affinities.


'We All Have Influence Somewhere,' so Kred sifts through billions of social posts from over 110 million people in real time to uncover who is most influential on any subject, keyword or hashtag. This is all summarized in Kredentials, which displays anyone's history on Twitter over the last three years with a single click, including their top communities, most-used words, most-clicked links and much more.


Here are just a few of the other ways Kred is an evolution of influence measurement:

Dual Scores for Influence and Outreach
Influence – scored on a 1-1000 scale – shows the likelihood that your posts provoke actions from others. Outreach demonstrates your generosity in ReTweeting and replying to others.

Community
Real influence comes from expertise and passion. Kred is calculated for everyone in Communities that naturally form around interests and affinities.

Complete Transparency
Visitors to Kred.ly can see how all of their social actions count towards their scores - and how their connections' actions affect them as well. Those who want a more thorough accounting of their score can take advantage of our Score Audit feature.

Offline Kred
Kred is the only influence measure to integrate offline achievements with online identity. Visitors can add their accomplishments - anything from academic honors to club memberships - by sending us a PDF from the 'Get More Kred' menu tab inside the Kred site. We will then hand score it and manually add points.

Kred is free for everyone at http://kred.ly and deeply integrated into Playground, PeopleBrowsr's social analytics platform. For those who wish to build custom applications off of our data mine of 1,000 days of social data, Kred can be accessed via our Playground API, our Kredentials API, or a standalone API.

Many key unique features of Kred – including score audits, privacy controls and real-time activity statements – are based on feedback from our community of friends and colleagues. What would you like to see in its next evolution?

Give Kred a try and let us know what you think via email: kred@peoplebrowsr.com or on Twitter: @kred.

- Shawn Roberts, PeopleBrowsr

This guest blog series highlights companies in SoftLayer's Technology Partners Marketplace.
These Partners have built their businesses on the SoftLayer Platform, and we're excited for them to tell their stories. New Partners will be added to the Marketplace each month, so stay tuned for many more to come.

October 4, 2011

An Introduction to Redis

I recently had the opportunity to get re-acquainted with Redis while evaluating solutions for a project on the Product Innovation team here at SoftLayer. I'd actually played with it a couple of times before, but this time it "clicked." Or my brain broke. Either way, I see a lot of potential for Redis now.

No one product is a perfect fit for all of your data storage needs, of course. There are such fundamental tradeoffs to be made in designing storage architectures that you should be immediately suspicious of any product that claims to fit every need.

The best solutions tend to be products that actually embrace these tradeoffs. Redis, for instance, has sacrificed a small amount of data durability in exchange for being awesome.

What is it?

Redis is a key/value store, but describing it that way is sort of like calling a helicopter a "vehicle." It's a technically correct description, but it leaves out some important stuff.

You can think of it like a sophisticated older brother of Memcached. It presents a flat keyspace, and you can set those keys to string values. Like Memcached, it can also perform remote atomic operations, like "incr" and "append." These are really handy, because you can modify remote data without fetching it, and you're assured that you're the only one performing that operation at that instant.

Redis takes this concept of remote commands on data and goes completely nuts with it. The database is aware of data structures like hashes, lists and sets in addition to simple string values. You can sort, union, intersect, slice and dice to your heart's content without fetching any data. Redis is a data structure server. You can treat it like remote memory, and this has an awesome immediate benefit for a programmer: your code and brain are already optimized for these data types.

But it's not just about making storage simpler. It's fast, too. Crazy fast. If you make intelligent use of its data structures, it's possible to serve a lot of traffic from relatively modest hardware. Redis 2.4 can easily handle ~50k list appends a second on my notebook. With batching, it can append 2 million items to a list on a remote host in about 1.28 seconds.
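
To make that concrete, here's the flavor of a session in redis-cli; the key names are arbitrary:

    # Atomic counter: no fetch-modify-write race
    redis> INCR pageviews
    (integer) 1

    # Push items onto a list and read a slice back, all server-side
    redis> RPUSH events "login" "click" "logout"
    (integer) 3
    redis> LRANGE events 0 -1
    1) "login"
    2) "click"
    3) "logout"

    # Intersect two sets without fetching either one
    redis> SADD admins "alice" "bob"
    (integer) 2
    redis> SADD online "bob" "carol"
    (integer) 2
    redis> SINTER admins online
    1) "bob"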

It allows the remote, atomic and performant manipulation of data structures. It took me a little while to realize exactly how useful that is.

What's wrong with it?

Nothing. Move along.

OK, it's a little short on durability. Redis uses memory as its primary store and periodically flushes to disk. A common configuration is to do so every second.

That sounds pretty reasonable. If a server goes down, you could lose a second of data. Keep in mind, however, how many operations Redis can perform in a second. If you're in a high-volume environment, that could be a lot of data. It's not for your financial transactions.

Its availability options are also relatively limited: currently, Redis supports only master/slave replication. Clustering support is planned for an upcoming release. It's looking pretty powerful, but it will take some real-world testing to know its performance impact.

These challenges should be taken into consideration, and it's probably clear if you're in a situation where the current tradeoffs aren't a good fit.

In my experience, a lot of developers seriously overestimate the consequences of their application losing small amounts of data. Also consider whether or not the chance of losing a second (or less) of data genuinely represents a bigger threat to your application than any other compromises you might have made.

More Information
You can check out the slightly aging docs or browse the impressively simple source. There are probably already bindings for your language of choice as well.

-Tim

September 30, 2011

What's Your KRED?

SoftLayer loves startups. The culture, the energy, the potential ... It's all good stuff. As you may remember from my 3 Bars 3 Questions interview and our Teens in Tech profile, one of the ways we support startups is through an incubator program that provides a phenomenal hosting credit and a lot of technology know-how to participating organizations.

In San Francisco, one of the flagship programs we're excited to be a part of is called PeopleBrowsr Labs, a startup accelerator geared toward technology companies in the area. As you sit in the PeopleBrowsr office, the brilliance in the air is almost palpable ... Young companies doing innovative things with everything they need to be successful at their disposal. One of the fringe benefits for participants in PeopleBrowsr Labs is that they're actually rubbing elbows with the PeopleBrowsr team as well ... Which is almost worth the price of admission.

In addition to the Labs sponsorship, SoftLayer is also the infrastructure provider for PeopleBrowsr and its unbelievable data mine of information. They've got every tweet that's been tweeted since early 2008, and they've been able to take that content and make sense of it in unique and interesting ways ... And that's why we stopped by for a visit this week. Last night, PeopleBrowsr officially launched Kred, a dynamic and innovative social influence measurement platform, to a LOT of fanfare (see: TechCrunch).

In the midst of the launch-day craziness, we grabbed Scott Milener, PeopleBrowsr SVP of business development, to have him explain a little about Kred, what differentiates it from the other social influence measurements and what it means for users interested in engaging more effectively with their social networks. Check it out:

With the clear success of the announcement, we want to send a shout out of congratulations to the PeopleBrowsr team. It looks like a phenomenal leap forward in understanding social engagement, and we know it's only the tip of the iceberg when it comes to what we'll see coming out of the PeopleBrowsr office in the near future.

If you feel a little jaded by the social influence measurements you've seen, Kred's transparency and community-centricity should be refreshing: http://kred.ly

-@PaulFord
