Technology Posts

January 6, 2016

Do You Speak SoftLayer Object Storage?

So you’ve made the decision to utilize object storage at SoftLayer. Great! But are you and your applications fluent in object storage? Do you know how to transfer data to SoftLayer object storage as well as modify and delete objects? How about when to use APIs and when to use storage gateways? If not, you’re not alone.

We’ve found that most IT professionals understand the difference between “traditional” (i.e., file and block) storage and object storage. They have difficulty, however, navigating the methods to interact with SoftLayer’s object storage service, which is based on OpenStack Swift. This is understandable, because traditional storage systems expose volumes and/or shares that can be mounted and consumed via the iSCSI, NFS, or SMB protocols.

That’s not the case with object storage, including the object storage service offered by SoftLayer. Data is accessed only via REST APIs and language bindings, third-party applications supporting SFTP, the SoftLayer customer portal, or storage gateways.

The solutions are outlined below, including guidance on when to use each access method. Figure 1 provides a high-level overview of the available options and their purpose.



Figure 1: Object storage data access methods

REST APIs and Language Bindings
The first and possibly most flexible method to access SoftLayer object storage is via REST APIs and language bindings. These APIs and bindings give you the ability to interact with SoftLayer object storage from the command line or programmatically. As a result, you can create scripts to upload files, download certain objects, and modify the metadata attached to an object. Additionally, the current support for PHP, Java, Ruby, and Python bindings gives application developers the flexibility to support SoftLayer object storage in their applications.
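
For example, a rough command-line sketch of a session against the OpenStack Swift API with curl might look like the following (the authentication endpoint, account name, container, and object names are illustrative placeholders; use the values shown for your account in the customer portal):

# curl -i -H "X-Auth-User: SLOSxxxxxx-2:myuser" -H "X-Auth-Key: my_api_key" https://dal05.objectstorage.softlayer.net/auth/v1.0

The response returns X-Storage-Url and X-Auth-Token headers. Assuming those values are exported as $STORAGE_URL and $TOKEN, the next calls create a container, upload an object, attach custom metadata to it, and delete it:

# curl -i -X PUT -H "X-Auth-Token: $TOKEN" $STORAGE_URL/mycontainer
# curl -i -X PUT -H "X-Auth-Token: $TOKEN" -T backup.tar.gz $STORAGE_URL/mycontainer/backup.tar.gz
# curl -i -X POST -H "X-Auth-Token: $TOKEN" -H "X-Object-Meta-Department: finance" $STORAGE_URL/mycontainer/backup.tar.gz
# curl -i -X DELETE -H "X-Auth-Token: $TOKEN" $STORAGE_URL/mycontainer/backup.tar.gz

The language bindings wrap these same calls in PHP, Java, Ruby, and Python objects, so anything you can do with curl you can also do programmatically.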

While this method is flexible in terms of capabilities, it does assume the user has knowledge of and experience with writing scripts, programs, and applications. REST APIs and language bindings also aren’t the best fit for IT organizations that want to integrate the backup, archive, and disaster recovery solutions already in their environment. These solutions typically require traditional storage mount points, which REST APIs and language bindings don’t provide.

Third-Party Applications
The second method is to use third-party applications that support SFTP. This method abstracts the use of REST APIs and gives users the ability to upload, download, and delete objects via a GUI. However, you won’t have the ability to modify metadata when using an SFTP client. Additionally, SoftLayer and OpenStack Swift place a 5GB limit on each uploaded object. If an object greater than 5GB needs to be uploaded, you have to follow the OpenStack method of creating large objects (splitting the file into segments and tying them together with a manifest) to ensure a successful and efficient upload. Unless you’re comfortable with this methodology, it’s strongly recommended that you use either the REST APIs or a storage gateway solution to move files over 5GB.
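
As a rough sketch of that large-object method (the container and file names below are placeholders), you split the file locally, upload each segment, and then upload a zero-byte manifest object whose X-Object-Manifest header points at the segment prefix; Swift then serves the segments as a single logical object:

# split -b 1G hugefile.tar.gz hugefile.tar.gz.part-
# curl -i -X PUT -H "X-Auth-Token: $TOKEN" -T hugefile.tar.gz.part-aa $STORAGE_URL/mycontainer_segments/hugefile.tar.gz/part-aa
(repeat the PUT for each remaining segment)
# curl -i -X PUT -H "X-Auth-Token: $TOKEN" -H "Content-Length: 0" -H "X-Object-Manifest: mycontainer_segments/hugefile.tar.gz/" $STORAGE_URL/mycontainer/hugefile.tar.gz

Command-line tools such as python-swiftclient can segment large uploads for you (e.g., swift upload with a --segment-size option), which is one reason the REST API and storage gateway routes are the more comfortable choices for objects over 5GB.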

SoftLayer Customer Portal
The third method to access SoftLayer object storage is to simply use the SoftLayer customer portal. Through the portal, you can add containers, add files to containers, delete files from containers, modify metadata, and enable CDN capabilities. As with the SFTP method, there is no limit on the number of files you can upload, but in the portal each file cannot exceed 20MB in size. Also, there is no bulk upload option within the customer portal; users must select and upload files one at a time. While using the portal is simple, it does have some limitations and is best for users who only want to upload a few files of 20MB or less.

Storage Gateways
The last method to access and utilize SoftLayer object storage is storage gateways. Storage gateways are unique among these methods: they expose traditional storage protocols like iSCSI, NFS, CIFS, and SMB and translate read/write/modify commands into REST API calls against the object storage service. As a result, these devices offer an easier path to consuming SoftLayer object storage for businesses looking to integrate their on-premises environment with the cloud. Some storage gateways can also compress, deduplicate, and encrypt data in flight and at rest. Storage gateways work best for organizations looking to integrate existing applications that require traditional storage access methods (like backup software) with object storage, or to securely transfer and store data in cloud object storage.

Summary
While there are many methods to access SoftLayer object storage, it’s important that you select the option that best meets your requirements for data access, security, and integration. For example, if you’re writing an application that requires object storage, you would most likely interact with it via the REST APIs or language bindings. If you simply need to connect existing applications in your environment to cloud object storage, a storage gateway would be the best option. In all cases, make sure the method you choose meets your requirements.

Table 1 lists sample requirements and shows whether each option meets them. Use it to help with your decision-making process:



Table 1: Decision making tool

Click here for more information about SoftLayer’s object storage service and click here for FAQs on object storage.

Click here for information about SoftLayer’s REST-APIs and language bindings.

-Daniel De Araujo & Naeem Altaf

December 28, 2015

Semantics: "Public," "Private," and "Hybrid" in Cloud Computing, Part II

Welcome back! In the second post in this two-part series, we’ll look at the third definition of “public” and “private,” and we’ll have that broader discussion about “hybrid”—and we’ll figure out where we go after the dust has cleared on the semantics. If you missed the first part of our series, take a moment to get up to speed here before you dive in.

Definition 3—Control: Bare Metal v. Virtual

A third school of thought in the “public v. private” conversation is actually an extension of Definition 2, but with an important distinction. In order for infrastructure to be “private,” no one else (not even the infrastructure provider) can have access to a given hardware node.

In Definition 2, a hardware node provisioned for single-tenancy would be considered private. That single-tenant environment could provide customers with control of the server at the bare metal level—or it could provide control at the operating system level on top of a provider-managed hypervisor. In Definition 3, the latter example would not be considered “private” because the infrastructure provider has some level of control over the server in the form of the virtualization hypervisor.

Under Definition 3, infrastructure provisioned with full control over bare metal hardware is “private,” while any provider-virtualized or shared environment would be considered “public.” With complete, uninterrupted control down to the bare metal, a user can monitor all access and activity on the infrastructure and secure it from any third-party usage.

Defining “public cloud” and “private cloud” using the bare metal versus virtual delineation is easy. If a user orders infrastructure resources from a provider, and those resources are delivered from a shared, virtualized environment, that infrastructure would be considered public cloud. If the user orders a number of bare metal servers and chooses to install and maintain his or her own virtualization layer across those bare metal servers, that environment would be a private cloud.

“Hybrid”

Mix and Match

Now that we see the different meanings “public” and “private” can have in cloud computing, the idea of a “hybrid” environment is a lot less confusing. In actuality, it really only has one definition: A hybrid environment is a combination of any variation of public and private infrastructure.

Using bare metal servers for your database and virtual servers for your Web tier? That’s a hybrid approach. Using your own data centers for some of your applications and scaling out into another provider’s data centers when needed? That’s hybrid, too. As soon as you start using multiple types of infrastructure, by definition, you’ve created a hybrid environment.

And Throw in the Kitchen Sink

Taking our simple definition of “hybrid” one step further, we find a few other variations of that term’s usage. Because the cloud stack is made up of several levels of services—Infrastructure as a Service, Platform as a Service, Software as a Service, Business Process as a Service—“hybrid” may be defined by incorporating various “aaS” offerings into a single environment.

Perhaps you need bare metal infrastructure to build an off-prem private cloud at the IaaS level—and you also want to incorporate a managed analytics service at the BPaaS level. Or maybe you want to keep all of your production data on-prem and do your sandbox development in a PaaS environment like Bluemix. At the end of the day, what you’re really doing is leveraging a “hybrid” model.

Where do we go from here?

Once we can agree that this underlying semantic problem exists, we should be able to start having better conversations:

  • Them: We’re considering a hybrid approach to hosting our next application.
  • You: Oh yeah? What platforms or tools are we going to use in that approach?
  • Them: We want to try and incorporate public and private cloud infrastructure.
  • You: That’s interesting. I know that there are a few different definitions of public and private when it comes to infrastructure…which do you mean?
  • Them: That’s a profound observation! Since we have our own data centers, we consider the infrastructure there to be our private cloud, and we’re going to use bare metal servers from SoftLayer as our public cloud.
  • You: Brilliant! Especially the fact that we’re using SoftLayer.

Your mileage may vary, but that’s the kind of discussion we can get behind.

And if your conversation partner balks at either of your questions, send them over to this blog post series.

-@khazard

December 18, 2015

Semantics: "Public," "Private," and "Hybrid" in Cloud Computing, Part I

What does the word “gift” mean to you? In English, it most often refers to a present or something given voluntarily. In German, it has a completely different meaning: “poison.” If a box marked “gift” is placed in front of an English-speaker, it’s safe to assume that he or she would interact with it very differently than a German-speaker would.

In the same way, simple words like “public,” “private,” and “hybrid” in cloud computing can mean very different things to different audiences. But unlike our “gift” example above (which would normally have some language or cultural context), it’s much more difficult for cloud computing audiences to decipher meaning when terms like “public cloud,” “private cloud,” and “hybrid cloud” are used.

We, as an industry, need to focus on semantics.

In this two-part series, we’ll look at three different definitions of “public” and “private” to set the stage for a broader discussion about “hybrid.”

“Public” v. “Private”

Definition 1—Location: On-premises v. Off-premises

For some audiences (and the enterprise market), whether an infrastructure is public or private is largely a question of location. Does a business own and maintain the data centers, servers, and networking gear it uses for its IT needs, or does the business use gear that’s owned and maintained by another party?

This definition of “public v. private” makes sense for an audience that happens to own and operate its own data centers. If a business has exclusive physical access to and ownership of its gear, the business considers that gear “private.” If another provider handles the physical access and ownership of the gear, the business considers that gear “public.”

We can extend this definition a step further to understand what this audience would consider to be a “private cloud.” Using this definition of “private,” a private cloud is an environment with an abstracted “cloud” management layer (a la OpenStack, CloudStack, or VMware) that runs in a company’s own data center. In contrast, this audience would consider a “public cloud” to be a similar environment that’s owned and maintained by another provider.

Enterprises are often more likely to use this definition because they’re often the only ones that can afford to build and run their own data centers. They use “public” and “private” to distinguish between their own facilities and outside facilities. This definition does not make sense for businesses that don’t have their own data center facilities.

Definition 2—Population: Single-tenant v. Multi-tenant

Businesses that don’t own their own data center facilities would not use Definition 1 to distinguish “public” and “private” infrastructure. If the infrastructure they use is wholly owned and physically maintained by another provider, these businesses are most interested in whether hardware resources are shared with any other customers: Do any other customers have data on or access to a given server’s hardware? If so, the infrastructure is public. If not, the infrastructure is private.

Using this definition, public and private infrastructure could be served from the same third-party-owned data center, and the infrastructure could even be in the same server rack. “Public” infrastructure just happens to provide multiple users with resources and access to a single hardware node. Note: Even though the hardware node is shared, each user can only access his or her own data and allotted resources.

On the flip side, if a user has exclusive access to a hardware node, a business using Definition 2 would consider the node to be private.

Using this definition of “public” and “private,” multiple users share resources at the server level in a “public cloud” environment—and only one user has access to resources at the server level in a “private cloud” environment. Depending on the environment configuration, a “private cloud” user may or may not have full control over the individual servers he or she is using.

This definition echoes back to Definition 1, but it is more granular. Businesses using Definition 2 believe that infrastructure is public or private based on single-tenancy or multi-tenancy at the hardware level, whereas businesses using Definition 1 consider infrastructure to be public or private based on whether the data center itself is single-tenant or multi-tenant.

Have we blown your minds yet? Stay tuned for Part II, where we’ll tackle bare metal servers, virtual servers, and control. We’ll also show you how clear hybrid environments really are, and we’ll figure out where the heck we go from here now that we’ve figured it all out.

-@khazard

December 17, 2015

Xen Hypervisor Maintenance - December 2015

Security of your assets on our cloud platform is very important to the SoftLayer team. Last week, our Security Operations Center – which provides real-time monitoring of suspicious activity (and participates in multiple security pre-disclosure lists) – alerted our engineering team to a potential vulnerability (advisory CVE-2015-8555 / XSA-165) in the Xen Hypervisor that, if left unremediated, could allow a malicious user to access data from another VSI guest sharing the same hardware node and hypervisor instance.

Upon learning of this vulnerability, SoftLayer issued a notification that included a per-data center schedule for applying critical maintenance to remediate it. The maintenance was performed over multiple days on a POD-by-POD basis, with individual VM instances offline for only minutes while they rebooted. The updates were completed successfully in all data centers in advance of the public announcement of the vulnerability.

Deployment techniques such as clustering and failover across data centers and PODs allow continuous operations during a planned or unplanned event, and SoftLayer is committed to working aggressively to further reduce the impact of such events on your deployment and operations teams.

We value your business and will continue to take actions that ensure your environment is secure and efficient to operate. If you have any questions or concerns, don't hesitate to reach out to SoftLayer support or your direct SoftLayer contacts.

-Sonny

November 4, 2015

Shared, scalable, and resilient storage without SAN

Storage area networks (SANs) are used most often in the enterprise world. In many enterprises, you will see racks filled with these large storage arrays. They are mainly used to provide a centralized storage platform, but with limited scalability. They require special training to operate; they are expensive to purchase, support, and expand; and if those devices fail, there is big trouble.

Some people might say SAN devices are a necessary evil. But are they really necessary? Aren’t there alternatives?

Most startups nowadays run their services on commodity hardware, with smart software to distribute their content across server farms globally. Current, well-established, and successful companies that run websites or apps like WhatsApp, Facebook, or LinkedIn continue to operate pretty much the same way they started. They need the ability to scale and perform at unpredictable rates all around the world, so they use commodity hardware combined with smart software. These companies need the features that SAN storage offers them—but with more scalability and global resiliency, without centralization, and without having to buy expensive hardware. But how do they give servers access to the same data, and how do they avoid data loss?

The answer is actually quite simple, although its technology is quite sophisticated: distributed storage.

In a world where virtualization has become a standard for most companies, where even applications and networking are being virtualized, virtualization giant VMware answers this question with Virtual SAN. It effectively eliminates the need for SAN hardware in a VMware environment (and it will also be available for purchase from SoftLayer before the end of the year). Other similar distributed products are GlusterFS (also offered in our QuantaStor solution), Ceph, Microsoft Windows DFS, Hadoop HDFS, document-oriented databases like MongoDB, and many more.

These solutions vary in maturity, however. Object storage is a great example of a newer type of storage that has come to market and doesn’t require SAN devices. With SoftLayer, you can run them all.

When you have bare metal servers set up as hypervisors or application servers, it’s likely those servers have a lot of drive bays, most of them unused. Filling them with hard drives and allowing the software to distribute your data across multiple servers in multiple locations, with two or three replicas, results in a big, safe, fast, distributed storage platform. Scaling such a platform is just a matter of adding more bare metal servers with even more hard drives and letting the software handle the rest.
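
As a rough sketch of what that looks like in practice with GlusterFS (the hostnames and brick paths below are placeholders), you peer the servers, create a replicated volume from a brick on each server, and mount the volume from any client:

# gluster peer probe server2.example.com
# gluster peer probe server3.example.com
# gluster volume create sharedvol replica 3 server1.example.com:/data/brick1/sharedvol server2.example.com:/data/brick1/sharedvol server3.example.com:/data/brick1/sharedvol
# gluster volume start sharedvol
# mount -t glusterfs server1.example.com:/sharedvol /mnt/shared

Every file written to /mnt/shared is kept as three copies, one per server, so losing a single server, or a drive bay full of disks, doesn’t mean losing data.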

Nowadays we are seeing more and more hardware solutions like SAN—or even networking—being replaced with smarter software on simpler and more affordable hardware. At SoftLayer, we offer month-to-month and hourly bare metal servers with up to 36 drive bays, providing a lot of room for storage. With 10Gbps global connectivity options, we offer fast, low-latency networking for syncing between servers and delivering data to the customer.

-Mathijs

October 29, 2015

How to measure the performance of striped block storage volumes

To piggyback on the performance specifications of our block and file storage offerings, SoftLayer provides a wide range of volume size and performance combinations for your storage needs. But what if your storage performance or size requirements are more specific than what is currently offered?

In this post, I’ll show you how to configure and validate a sample RAID 0 configuration with:

  1. The use of LVM on CentOS to create a RAID 0 array with 3 volumes
  2. The use of FIO to apply IO load to the array
  3. The ability to measure throughput of the array

Without going into the potential drawbacks of RAID 0, we should be able to observe the benefits of up to three times the throughput and size of any single volume. For example, if we needed a volume with 60GB and 240 IOPS, we could stripe three 20GB volumes provisioned at 4 IOPS/GB each. You can also extrapolate from this example to fit a range of performance and reliability requirements.

To start, we will provision 3x 20GB Endurance volumes at 4 IOPS/GB and make them accessible to our CentOS VM, but stop short of creating a file system; i.e., you should stop once you are able to list three volumes with:

# fdisk -l | grep /dev/mapper
Disk /dev/mapper/3600a09803830344f785d46426c37364a: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/mapper/3600a09803830344f785d46426c373648: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/mapper/3600a09803830344f785d46426c373649: 21.5 GB, 21474836480 bytes, 41943040 sectors

Then proceed to create the three-stripe volume with the following commands:

# pvcreate /dev/mapper/3600a09803830344f785d46426c37364a /dev/mapper/3600a09803830344f785d46426c373648 /dev/mapper/3600a09803830344f785d46426c373649
 
# vgcreate new_vol_group /dev/mapper/3600a09803830344f785d46426c37364a /dev/mapper/3600a09803830344f785d46426c373648 /dev/mapper/3600a09803830344f785d46426c373649
 
# lvcreate -i3 -I16 -l100%FREE -nstriped_logical_volume new_vol_group

This creates a logical volume with three stripes (-i), a 16KB stripe size (-I), and a size (-l) of 100 percent of the free space—roughly 60GB across the three volumes.

You can now create the file system on the new logical volume, create a mount point, and mount the volume:

# mkfs.ext3 /dev/new_vol_group/striped_logical_volume
# mkdir -p /mnt
# mount /dev/mapper/new_vol_group-striped_logical_volume /mnt

Now download, build, and run FIO:

# yum install -y gcc libaio-devel
# cd /tmp
# wget http://freecode.com/urls/3aa21b8c106cab742bf1f20d60629e3f
# tar -xvf 3aa21b8c106cab742bf1f20d60629e3f
# cd fio-2.1.10/
# make
# make install
# cd /mnt
# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=16k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=50

This executes the benchmark with 16KB blocks (--bs) and a random mixed workload (--readwrite=randrw) of 50 percent reads and 50 percent writes (--rwmixread=50), keeping 64 I/Os in flight (--iodepth=64) until the 1GB test file (--size=1G) has been fully processed.

Here is a snippet of output once completed:

read : io=51712KB, bw=1955.8KB/s, iops=122, runt= 26441msec
write: io=50688KB, bw=1917.3KB/s, iops=119, runt= 26441msec

This shows a measured throughput of 122 read + 119 write = ~240 IOPS, which matches what we provisioned: 3 x 20GB x 4 IOPS/GB = 3 x 80 IOPS = 240 IOPS.

Here is a table showing how the results would differ if we tuned the load with varying block sizes (--bs):

As you can see from the results, you may not observe the expected 3x throughput (IOPS) in every case, so please be mindful of your logical volume configuration (stripe size) versus your load profile (--bs). Please refer to our FAQ for further details on other possible limits.

-Nam

October 28, 2015

Ongoing Actions to Eliminate Spam Hosting

We are announcing a new policy, effective today, as part of our regular efforts to reduce the ability for spam to be sent from the SoftLayer network.

Starting October 28, 2015, bare metal servers and virtual servers provisioned on new accounts will not have the ability to send email directly via outbound connections through TCP port 25 (SMTP). Port 25 can be used as a conduit for distributing unsolicited bulk email.

In a follow-up phase, we will roll out this network policy change to customers who established accounts before October 28. (A separate communication with a timeline and implementation guidance will be sent to those customers.)

You can read the technical details on KnowledgeLayer.

SendGrid Services Available to Send and Track Emails

We have partnered with SendGrid™ since 2011 to provide email delivery services. We have arranged for SendGrid to provide SoftLayer customers with an account that allows sending up to 25,000 emails per month at no charge; it can be activated via the SoftLayer customer portal.

SendGrid allows you to use a SmartHost to relay your outbound mail while generating metrics, including tracking lists, bounce rates, open rates, and click-through rates. It also assists with newsletters and provides authentication. All of these services are designed to provide stronger email analytics so you can optimize your communications and eNurture programs. Full details on our SendGrid service, including free options, can be found here.

Use Your Email Service Through a Custom Email Port

You are welcome to use your own email service on a custom port. Follow the API or SMTP guidelines provided by your mail provider to configure your servers to use an email port other than TCP port 25 (most providers support the submission port, TCP 587). This is common practice and should not keep you from sending and measuring your communications.
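
As a hedged example for a Postfix server (the SendGrid hostname, port, and credentials below are placeholders; follow your provider’s current documentation for the exact values), relaying through a provider’s submission port looks roughly like this:

# postconf -e "relayhost = [smtp.sendgrid.net]:587"
# postconf -e "smtp_sasl_auth_enable = yes"
# postconf -e "smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd"
# postconf -e "smtp_sasl_security_options = noanonymous"
# postconf -e "smtp_tls_security_level = encrypt"
# echo "[smtp.sendgrid.net]:587 SENDGRID_USERNAME:SENDGRID_PASSWORD" > /etc/postfix/sasl_passwd
# postmap /etc/postfix/sasl_passwd
# postfix reload

With this in place, mail submitted on the server is relayed over an authenticated, encrypted connection on TCP port 587 instead of attempting direct delivery over port 25.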

Need an Exception?

If you are a new client and need the ability to send outbound SMTP email via TCP port 25, please open a support ticket in the customer portal and provide details about why you require an exception to this policy. Be sure to explain why the SendGrid email relaying solution does not fit your system or application needs. Our team specializes in assisting with most email relaying and blacklisting issues for recognized and reputable real-time blackhole lists (RBLs) and can evaluate your situation.

Dedicated to Your Success

We continuously work with established monitoring authorities and groups to eliminate fraudulent spammers and to block the usage of port 25 for email communications.

As we all know, spam is unsolicited bulk email. Our network architecture isolates devices so customers cannot see or share traffic across accounts. We follow ISO 27001. For federal accounts, we are aligned to the NIST 800-53 framework, and we maintain SOC 2 Type II reporting compliance for all data centers. We integrate three distinct network topologies for each physical or virtual server and offer security solutions for systems, applications, and data as well.

Thank you again for your commitment to SoftLayer as we continue to work hard to ensure a secure environment for you.

-Dani

October 7, 2015

Give me a MOOC with social proof

I’ve spent the last few weeks investigating the technologies needed to deliver e-learning, and it’s been a real eye-opener as to what’s hot.

I thought the only way to investigate was to try out a MOOC (Massive Open Online Course) for myself. After all, “eating your own dog food” is an essential skill for any entrepreneur.

Most of my learning in the last 20 years has been through the School of Life—I haven’t been in formal training for a while. Going back to school (or at least digital school) has been a fascinating experience. MOOCs, as offered by the likes of Coursera, iVersity and Udemy, represent a real step change in learning technology.

We all have areas of interest beyond technology; mine is global politics and development. (So keen am I on this subject that after a recent hackathon, I started running an employee advocacy program for the United Nations: the UN Social 500.) I enrolled in a Coursera course, Configuring the World, to learn how data drives political decisions. Perfect for a startup founder offering automated statistical release software!

The course covers eight weeks of lectures with reading material for each week and a quiz at the end of the week. You can watch the videos in normal or double-time on the web, or you can download them to your mobile device of choice to watch on-the-go.

Having the content accessible via mobile turned out to be essential for me. I thought I was going to block out the time at work (two hours every other day), but client needs always felt more pressing. In the end, I found that perching the phone above the sink whilst doing the washing up worked best (although I frequently have to dry my hands to press play on the next video).

As head of a company offering digital services, the mobile need has been a great lesson for me. You can’t always expect your users to either have Internet access or the time to sit at a desk. Mobile offers the experience offline on a smartphone and is utterly necessary if you want to retain your customers.

The second key I picked up from the course has been the value of interaction with others. It is in the discussion forums that you can review the lectures, ask questions, and extend the conversation beyond the confines of the curriculum.

I’m a social person, so you’d think I’d love the discussion forums. While helpful, they aren’t enough for me. They lack a killer feature, which is a social proof mechanism: a way of comparing my progress with other participants. On Twitter, I have a social proof mechanism: I can compare numbers of followers. That may be an overly simplified score (what’s the value of a follower anyway?), but at least it allows me to compare progress via some standard metric.

What I really craved was a similar mechanism in my course. Not because I particularly wanted to compete—I’m not aiming to be top of the class—but because I wanted to make sure I am keeping up. Are my quiz scores worse or better than the average? Am I watching enough of the lectures?

The traditional way for us technologists to solve this problem is to offer a dashboard. You know the type: a page of multiple bar graphs, dials, and gauges.

The trouble with these types of data dashboards is that it is pretty difficult for the user to figure out whether he or she needs to do anything or not. The insights are not exactly “actionable.” If one gauge has gone up and another has gone down, is that good or bad overall? Moreover, should I be panicking?

Dashboards of this nature also tend to be fairly passive. Unless you remember to check them regularly (and who does?), they are quickly forgotten. I would prefer to see a single composite score (much like what Nike+ does for running or Klout for social media) that has an embedded weighting method for the relative importance of each metric. For my MOOC example, watching videos is important, so that should be worth 60 percent of the score. Scoring well on the quiz also counts, which should be worth 30 percent of the score. That leaves 10 percent for the discussion forums.

Now that I have a score, it should be sent to me each week (no passivity here), and I’ll know whether I’ve done well or badly. Even better: show me how I compare versus others across the world or in my country on a leaderboard. Then I will really have a social proof mechanism to help guide my behaviour.

-Toby

Toby Beresford is CEO and founder of Rise.global, a social utility to share the score. Rise provides an automated statistical release service to create composite single scores and to distribute the results via web, social media, and email. Rise scores and leaderboards have been used across enterprises for multiple use cases including employee advocacy programs, partner management, e-learning, audience development, digital marketing, and digital sales enablement. Rise is a member of the SoftLayer Startup Catalyst program.

September 2, 2015

Backup and Restore in a Cloud and DevOps World

Virtualization has brought many improvements to the compute infrastructure, including snapshots and live migration1. When an infrastructure moves to the cloud, these options often become a client’s primary backup strategy. While snapshots and live migration are also part of a successful strategy, backing up on the cloud may need additional tools.

First, a basic question: Why do we take backups? They’re taken to recover from

  • The loss of an entire machine
  • Partially corrupted files
  • A complete data loss (either through hardware or human error)

While losing an entire machine is frightening, corrupted files or data loss are the more common reasons for data backups.

Snapshots are useful when the snapshot and restore occur in close proximity to each other, e.g., when you’re migrating middleware or an operating system and want to fall back quickly if something goes wrong. If you need to restore after extensive changes (hardware or data), a snapshot isn’t an adequate resource. The restore may require restoring to a new machine, selecting files to be restored, and moving data back to the original machine.

So if a snapshot isn’t the silver bullet for backing up in the cloud, what are the effective backup alternatives? The solution needs to handle a full system loss, partial data loss, or corruption, and ideally work for both virtualized and non-virtualized environments.

What to back up

There are three types of files that you’ll want to consider when backing up an active machine’s disks:

  • Binary files: Changed by operating system and middleware updates; can be easily stored and recovered.
  • Configuration files: Defined by how the binary files are connected and configured, and by what data is accessible to them.
  • Data files: Generated by users and unrecoverable if not backed up. Data files are the most precious part of the disk content and losing them may result in a financial impact on the client’s business.

Keep in mind when determining your backup strategy that each file type has a different change rate—data files change faster than configuration files, which are more fluid than binary files. So, what are your options for backing up and restoring each type of file?

Binary files
In the case of a system failure, DevOps advocates (see Phoenix Servers from Martin Fowler) propose getting a new machine, which all cloud providers can automatically provision, including middleware. Automated provisioning processes are available for both bare metal and virtual machines.

Note that most Open Source products only require an Internet connection and a single command line for installation, while commercial products can be provisioned through automation.

Configuration files
Cloud-centric operations have a distinct advantage over traditional operations when it comes to backing up configuration files. With traditional operations, each element is configured manually, which has several drawbacks such as being time-consuming and error-prone. Cloud-centric operations, or DevOps, treat each configuration as code, which allows an environment to be built from a source configuration via automated tools and procedures. Tools such as Chef, Puppet, Ansible, and SaltStack show their power with central configuration repositories that are used to drive the composition of an environment. A central repository works well with another component of automated provisioning—changing the IP address and hostname.

You have limited control of how the cloud will allocate resources, so you need an automated method to collect the information and apply it to all the machines being provisioned.

In a cloud context, it’s suboptimal to manage machines individually; instead, the machines have to be seen as part of a cluster of servers, managed via automation. Cluster automation is one of the core tenets of solutions like CoreOS’ Fleet and Apache Mesos. Resources are allocated and managed as a single entity via API, configuration repositories, and automation.

You can attain automation in small steps. Start by choosing an automation tool and begin converting your existing environment one file at a time. Soon, your entire configuration is centrally available, and recovering a machine or deploying a full environment becomes possible with a single automated process.
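
For instance (the repository URL and playbook name here are hypothetical), a freshly provisioned CentOS machine can pull and apply its own configuration from a central repository with a tool like ansible-pull:

# yum install -y epel-release
# yum install -y ansible git
# ansible-pull -U https://git.example.com/infrastructure/config.git site.yml

ansible-pull clones the repository and applies the named playbook to the local machine, so recovering a lost server is reduced to provisioning a fresh node and running a single command.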

In addition to being able to quickly provision new machines with your binary and configuration files, you are also able to create parallel environments, such as disaster recovery, test and development, and quality assurance. Using the same provisioning process for all of your environments assures consistent environments and early detection of potential production problems. Packages, binaries, and configuration files can be treated as data and stored in something similar to object stores, which are available in some form with all cloud solutions.

Data files
The final files to be backed up and restored are the data files. These files are the most important part of a backup and restore, and the hardest ones to replace. Part of the challenge is the volume of data as well as access to it. Data files are relatively easy to back up, the exception being files that are in transition, e.g., files being uploaded. Data file backups can be done with several tools, including synchronization tools or a full file backup solution. Another option is object stores, which are the natural repository for relatively static files and allow for a pay-as-you-go model.
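
As a simple sketch (the hostnames, paths, and container names are placeholders), a nightly job could push data files both to a second machine and to object storage:

# rsync -az --delete /var/www/uploads/ backup01.example.com:/backups/www-uploads/
# swift upload --changed backups /var/www/uploads

The first command synchronizes only the files that have changed to another host; the second uses the python-swiftclient command line (with its authentication environment variables already set) to copy the same files into an object storage container.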

Database content is a bit harder to back up. Even with instant snapshots on storage, backing up databases can be challenging. A snapshot at the storage level is an option, but it doesn’t allow for a partial database restore. A snapshot can also capture in-flight transactions that cause issues during a restore, which is why most database systems provide a mechanism for online backups. Online backups should be used in combination with tools for file backups.
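
For example (the database names and paths are placeholders), a consistent online backup of a MySQL or MariaDB instance using InnoDB tables can be taken without stopping the service:

# mysqldump --single-transaction --routines --all-databases | gzip > /backups/mysql-$(date +%F).sql.gz

The --single-transaction option gives a consistent snapshot of transactional tables without locking them for the duration of the dump, and the resulting file can then be treated like any other data file in your backup process.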

Something to remember about databases: many solutions end up accumulating data even after the data is no longer used. The data within an active database includes data currently being used as well as historical data. Having current and historical data allows for analytics on the same database, but it also increases the size of the database, making database-related operations harder. It may make sense to archive older data in other databases or flat files, which keeps the database volumes manageable.

Summary

To recap, because cloud provides rapid deployment of your operating system and convenient places to store data (such as object stores), it’s easy to factor cloud into your backup and recovery strategy. By compartmentalizing the content of your machines—binaries, configuration, and data—you can treat each part appropriately. Focus on automating the deployment of binaries and configuration; it allows easier delivery of an environment, including quality assurance, test, and disaster recovery. Finally, use traditional backup tools for backing up data files. Together, these practices make it possible to rapidly and repeatedly recover complete environments while controlling the amount of backed-up data that has to be managed.

-Thomas

1 Snapshots are not available on bare metal servers that have no virtualization capability.

August 12, 2015

Network Performance 101: What is latency, and why does it matter?

We’ve all been there. Waiting for a web page to load can be so frustrating that we end up just closing out. You might ask yourself, “Hey, I have high-speed Internet. Why is this happening to me?” Well, there are a lot of factors outside your control that … control page loads. And whether you have an online store, run big data solutions, or have your employees set up on a network accessing files around the world, you never want slow data transfer to cost you a sale or drag down employee productivity.

So why are some pages so much slower to load than others?
It could be that poorly written code or large images are slowing the load on the backend, but slow page loads can also be caused by network latency. This might sound elementary, but data is not just floating out there in some non-physical Internet space. In reality, data is stored on hard drives … somewhere. Network connectivity provides a path for that data to travel to end users around the world, and that connectivity can vary significantly—depending on how far it’s going, how many times the data has to hop between service providers, how much bandwidth is available along the way, the other data traveling across the same path, and a number of other variables.

The measurement of how quickly data travels between two connected points is called network latency. Network latency is an expression of the amount of time it takes a packet of data to get from one place to another.

Understanding Network Latency
Theoretically, data can travel at the speed of light across optical fiber network cables, but in practice, data typically travels slower than light due to the variables we referenced in the previous section. If a network connection doesn’t have any available bandwidth capacity, data might temporarily queue up and wait for its turn to travel across the line. If a service provider’s network doesn’t route a network path optimally, data could be sent hundreds or thousands of miles out of its way before reaching its destination. These kinds of delays and detours lead to higher network latency, which leads to slower page loads and download speeds.
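
You can see both effects for yourself with basic tools (the hostname below is just an example):

# ping -c 5 www.example.com
# traceroute www.example.com

ping reports the round-trip time of each packet in milliseconds, and traceroute lists every hop the packets take along the way; a long chain of hops or a detour through a distant city is usually where the extra milliseconds come from.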

We express network latency in milliseconds (that’s 1,000 milliseconds per second), and while a few thousandths of a second may not mean much to us as we’re living our daily lives, those milliseconds are often the deciding factors for whether we stay on a webpage or give up and try another site. As consumers of high-speed Internet, we like what we like, and we want what we want when we want it. In the financial sector, milliseconds can mean billions of dollars in gains or losses from trade transactions on a day-to-day basis.

Logical conclusion: Everyone wants the lowest network latency to the greatest number of users.

Common Approaches to Minimize Network Latency
If our shared goal is to minimize latency for our data, the most common approaches to addressing network latency involve limiting the number of potential variables that can impact the speed of data’s movement. While we don’t have complete control over how our data travels across the Internet, we can do a few things to keep our network latency in line:

  • Distribute data around the world: Users in different locations can pull data from a location that’s geographically close to them. Because the data is closer to the users, it is handed off fewer times, it has a shorter distance to travel, and inefficient routing is less likely to cause a significant performance impact.
  • Provision servers with high-capacity network ports: Huge volumes of data can travel to and from the server every second. If packets are delayed due to fully saturated ports, milliseconds of time pass, pages load slower, download speeds drop, and users get unhappy.
  • Understand how your providers route traffic: When you know how your data is transferred to users around the world, you can make better decisions about where you host your data.

How SoftLayer Minimizes Network Latency
To minimize latency, we took a unique approach to building our network. All of our data centers are connected to network points of presence. All of our network points of presence are connected to each other via our global backbone network. And by maintaining our own global backbone network, our network operations team is able to control network paths and data handoffs much more granularly than if we relied on other providers to move data between geographies.

SoftLayer Private Network

For example, if a user in Berlin wants to watch a cat video hosted on a SoftLayer server in Dallas, the packets of data that make up that cat video will travel across our backbone network (which is exclusively used by SoftLayer traffic) to Frankfurt, where the packets would be handed off to one of our peering or transit public network partners to get to the user in Berlin.

Without a global backbone network, the packets would be handed off to a peering or transit public network provider in Dallas, and that provider would route the packets across its network and/or hand the packets off to another provider at a network hop, and the packets would bounce their way to Germany. It’s entirely possible that the packets could get from Dallas to Berlin with the same network latency with or without the global backbone network, but without the global backbone network, there are a lot more variables.

In addition to building a global backbone network, we also segment public, private, and management traffic onto different network ports so that different types of traffic can be transferred without interfering with each other.

SoftLayer Private Network

But at the end of the day, all of that network planning and forethought doesn’t amount to a hill of beans if you can’t see the results for yourself. That’s why we put speed tests on our website so you can check out our network yourself (for more on speed tests, check out this blog post).

TL;DR: Network Latency
Your users want your data as quickly as you can get it to them. The time it takes for your data to get to them across the Internet is called network latency. The more control you (or your provider) have over your data’s network path, the more consistent (and lower) your network latency will be.

Stay tuned. Next month we will be discussing Network Performance 101: Security, where we’ll discuss all things cloud security—including answering your burning questions: Can other people see or access my data in a public cloud? Is my data more prone to hackers? And what safeguards does SoftLayer have in place to protect data?

-JRL
