Technology Posts

October 16, 2013

Tips and Tricks: Troubleshooting Email Issues

Working in support, one of the most common issues we troubleshoot is a customer's ability to receive email. Depending on email server, this can be a headache and a half to figure out, but more often than not, we're able to fix the problem with one of only a few simple solutions. Because the SoftLayer Blog audience loves technical tips and tricks, I thought I'd share a few easy steps that make pinpointing the root cause of email issues much easier.

Before you gear up to go into battle, check the that server is not out of disk space on /var and that it is not in a read only state. That precursory step may seem silly, but Occam's Razor often holds true in technical troubleshooting. Once you verify that those two common problems aren't causing your email problems, the next step is to determine whether the email issues are server-wide or isolated to one mail account/domain. To do that, the first thing you need to do is make sure that the IMAP and POP services are responding.

Check IMAP and POP Services

The universal approach to checking IMAP and POP services is to use telnet:

telnet <serverip> 110
telnet <serverip> 143

If either of those commands fail, you're able to pinpoint which service to check on your server.

For most variants of Linux, you can check both services with a single command: netstat -plan|egrep -i "110|143". The resulting output will show if the services are listening and which process is doing the listening. In Windows, you can run a similar command from a command prompt: netstat -anb|find "LISTEN"| findstr "110 143".

If the ports are listening, and you're able to connect to them over telnet, your next stop should be your server's error logs.

Check Error Logs

You want to look for any mail errors that might clue you into the root cause of your email issues. In Linux, you can check /var/log/maillog, and in Windows, you can filter eventvwr.msc for mail only. If there are errors, a simple search will highlight them quickly.

If there are no errors, it's time to dig into the mail queue directly.

Check the Mail Queue

Depending on the mail server you use, the commands here are going to vary. Here are a few examples of how we'd investigate the most common mail servers we encounter:

QMail

Display the mail queue: /var/qmail/bin/qmail-qread
Display the number of messages in the queue: /var/qmail/bin/qmail-qstat
Reference article: Gaining Control Over the QMail Queue

Sendmail

Display the mail queue: sendmail -bp or mailq
Display the number of messages in the queue: mailq –OmaxQueueRunSize=1
Reference article: Quick Sendmail Cheatsheet

Exim

Display the mail queue: exim -bp
Display the number of messages in the queue: exim -bpc
Reference article: Exim cheatsheet

MailEnable

MailEnable users can can check to see that messages are moving by opening the mail directory:
Program Files\MailEnable\Queues\SMTP\Inbound\Messages
Reference article: How to diagnose inbound message delivery delays

With these commands, you can filter through the email queues to see whether any of them are for the users or domains you're having problems with. If nothing obvious presents itself at that point, it's time for some active testing.

Active Testing

Send an email to your mailserver from an external mailserver (anything will do as long as it's not on the same server). Watch for logging of the email as it's delivered:
tail -f maillog
On busy mailservers you might add |grep youremailid or simply look for a new message in the directory where the email will be stored.

The your primary goal in troubleshooting your email issues in this way is to isolate the root cause of your problem so that you can fix it more quickly. SoftLayer customers have direct access to our support team to help you through this process, but it's always nice to keep a quick reference like this in your back pocket to be able to pinpoint the problem yourself.

-Bill

October 14, 2013

Product Spotlight: Vyatta Network Gateway Appliance

In the wake of our recent Vyatta network gateway appliance product launch, I thought I'd address some of the most common questions customers have asked me about the new offering. With inquiries spanning the spectrum from broad and general to detailed and specific, I might not be able to cover everything in this blog post, but at the very least, it should give a little more context for our new network gateway offering.

To begin, let's explore the simplest question I've been asked: "What is a network gateway?" A network gateway provides tools to manage traffic into and out of one or more VLANs (Virtual Local Area Networks). The network gateway serves a customer-configurable routing device that sits in front of designated VLANs. The servers in those VLANs route through the network gateway appliance as their first hop instead of Front-end Customer Routers (FCR) or Back-end Customer Routers (BCR). From an infrastructure perspective, SoftLayer's network gateway offering consists of a single server, and in the future, the offering will be expanded to multi-server configurations to support high availability needs and larger clustered configurations.

The general function of a network gateway may seem a little abstract, so let's look at a couple real world use cases to see how you can put that functionality to work in your own cloud environment.

Example 1: Complex Traffic Management
You have a multi-server cloud environment and a complex set of firewall rules that allow certain types of traffic to certain servers from specific addresses. Without a network gateway, you would need to configure multiple hardware and software firewalls throughout your topology and maintain multiple rules sets, but with the network gateway appliance, you streamline your configuration into a single point of control on both the public and private networks.

After you order a gateway appliance in the SoftLayer portal and configure which VLANs route through the appliance, the process of configuring the device is simple: You define your production, development and QA environments with distinct traffic rules, and the network gateway handles the traffic segmentation. If you wanted to create your own VPN to connect your hosted environment to your office or in-house data center, that configuration is quick and easy as well. The high-touch challenge of managing several sets of network rules across multiple devices is simplified and streamlined.

Example 2: Creating a Static NAT
You want to create a static NAT (Network Address Translation) so that you can direct traffic through a public IP address to an internal IP address. With the IPv4 address pool dwindling and new allocations being harder to come by, this configuration is becoming extremely popular to accommodate users who can't yet reach IPv6 addresses. This challenge would normally require a significant level of effort of even the most seasoned systems administrator, but with the gateway appliance, it's a painless process.

In addition to the IPv4 address-saving benefits, your static NAT adds a layer of protection for your internal web servers from the public network, and as we discussed in the first example, your gateway device also serves as a single configuration point for both inbound and outbound firewall rules.

If you have complex network-related needs, and you want granular control of the traffic to and from your servers, a gateway appliance might be the perfect tool for you. You get the control you want and save yourself a significant amount of time and effort configuring and tweaking your environment on-the-fly. You can terminate IPSec VPN tunnels, execute your own network address translation, and run diagnostic commands such as traffic monitoring (tcpdump) on your global environment. And in addition to that, your gateway serves as a single point of contact to configure sophisticated firewall rules!

If you want to learn more about the gateway appliance, check out KnowledgeLayer or contact our friendly sales team directly with your questions: sales@softlayer.com

-Ben

October 3, 2013

Improving Communications for Customer-Affecting Events

Service disruptions are never a good thing. Though SoftLayer invests extensively in design, equipment, and personnel training to reduce the risk of disruptions to our customers, in the technology world there are times where scheduled events or unplanned incidents are inevitable. During those times, we understand that restoring service is top priority, and almost as important is communicating to customers regarding the cause of the incident and the current status of our work to resolve it.

To date we've used a combination of tickets, emails, forum posts, portal "yellow" notifications, as well as RSS and Twitter feeds to provide status updates during service-affecting events. Many of these methods require customers to "come and get it," so we've been working on a more targeted, proactive approach to disseminating information.

I'm excited to report that our Development and Operations teams have collaborated on new functionality in the SoftLayer portal that will improve the way we share information with customers about unplanned infrastructure troubles or upcoming planned maintenances. With our new Event Communications toolset, we're able to pinpoint the accounts affected by an event and update users who opt-in to receive notifications about how these events may impact their services.

Notifications

As the development work is finalized, we plan to roll out a few phases of improvements. The first phase of implementation, which is ready today, enables email alerts for unplanned incidents, and any portal user account can opt-in to receive them. These emails provide details about the impact and current status of an unplanned incident in progress (UIP). In this phase, notifications can be sent for devices such as physical servers, CCIs and shared SLB VIPs, and we will be adding additional services over time.

In future phases of this project, we plan to include:

  • A new "Event" section of the Customer Portal which will allow customers to browse upcoming scheduled maintenances or current/recent unplanned incidents which may impact their services. In the past, we generated tickets for scheduled maintenances, so separating these event notifications will improve customer visibility.
  • Enhanced visibility for events in our mobile apps (phone/tablet).
  • Updates to affected services for a given event as customers add / change services.
  • Notification of newly added or newly updated events that have not been read by the user (similar email "inbox" functionality) in the portal.
  • Identification of any related current or recent events as a customer begins to open a ticket in the portal.
  • Reminders of upcoming scheduled maintenances along with progress updates to the event notification throughout the maintenance in some cases.
  • Improved ability to correlate specific incidents to customer service troubles.
  • Dissemination of RFO (reason-for-outage) statements to customers following a post-incident review of an unplanned service disruption.

Since we respect our customers' inboxes, these notifications will only be sent to user accounts that have opted in. If you'd like to receive them, simply log into the Customer Portal and navigate to "Notification Subscriptions" under the "Administration" menu (direct link). From that page, individual users can control event subscriptions, and portal logins that have administrative control over multiple users on the account can control the opt-in for themselves and their downstream users. For a more detailed walkthrough of the opt-in process, visit the KnowledgeLayer: "Update Subscription Settings for the Event Management System"

The Network Operations Center has already begun using this customer notification toolset for customer-affecting events, so we recommend that you opt-in as soon as possible to benefit from this new functionality.

-Dani

September 12, 2013

"Cloud First" or "Mobile First" - Which Development Strategy Comes First?

Company XYZ knows that the majority of its revenue will come from recurring subscriptions to its new SaaS service. To generate visibility and awareness of the SaaS offering, XYZ needs to develop a mobile presence to reach the offering's potential audience. Should XYZ focus on building a mobile presence first (since its timing is most critical), or should it prioritize the completion of the cloud service first (since its importance is most critical)? Do both have to be delivered simultaneously?

It's the theoretical equivalent of the "Which came first: The chicken or the egg?" causality dilemma for many technology companies today.

Several IBM customers have asked me recently about whether the implementation of a "cloud first" strategy or a "mobile first" strategy is most important, and it's a fantastic question. They know that cloud and mobile are not mutually exclusive, but their limited development resources demand that some sort of prioritization be in place. However, should this prioritization be done based on importance or urgency?

IBM MobileFirst

The answer is what you'd expect: It depends! If a company's cloud offering consists solely of back-end services (i.e. no requirement or desire to execute natively on a mobile device), then a cloud-first strategy is clearly needed, right? A mobile presence would only be effective in drawing customers to the back-end services if they are in place and work well. However, what if the cloud offering is targeting only mobile users? Not focusing on the mobile-first user experience could sabotage a great set of back-end services.

As this simple example illustrated, prioritizing one development strategy at the expense of the other strategy can have devastating consequences. In this "Is there an app for that?" generation, a lack of predictable responsiveness for improved quality of service and/or quality of experience can drive your customers to your competitors who are only a click away. Continuous delivery is an essential element of both "cloud first and "mobile first" development. The ability to get feedback quickly from users for new services (and more importantly incorporate that feedback quickly) allows a company to re-shape a service to turn existing users into advocates for the service as well as other adjacent or tiered services. "Cloud first" developers need a cloud service provider that can provide continuous delivery of predictable and superior compute, storage and network services that can be optimized for the type of workload and can adapt to changes in scale requirements. "Mobile first" developers need a mobile application development platform that can ensure the quality of the application's mobile user experience while allowing the mobile application to also leverage back-end services. To accommodate both types of developers, IBM established two "centers of gravity" to allow our customers to strike the right balance between their "cloud first" and "mobile first" development.

It should come as no surprise that the cornerstone of IBM's cloud first offering is SoftLayer. SoftLayer's APIs to its infrastructure services allow companies to optimize their application services based on the needs of application, and the SoftLayer network also optimizes delivery of the application services to the consumer of the service regardless of the location or the type of client access.

For developers looking to prioritize the delivery of services on mobile devices, we centered our MobileFirst initiative on Worklight. Worklight balances the native mobile application experience and integration with back-end services to streamline the development process for "mobile first" companies.

We are actively working on the convergence of our IBM Cloud First and Mobile First strategies via optimized integration of SoftLayer and Worklight services. IBM customers from small businesses through large enterprises will then be able to view "cloud first and "mobile first" as two sides of the same development strategy coin.

-Mac

Mac Devine is an IBM distinguished engineer, director of cloud innovation and CTO, IBM Cloud Services Division. Follow him on Twitter: @mac_devine.

August 14, 2013

Setting Up Your New Server

As a technical support specialist at SoftLayer, I work with new customers regularly, and after fielding a lot of the same kinds of questions about setting up a new server, I thought I'd put together a quick guide that could be a resource for other new customers who are interested in implementing a few best practices when it comes to setting up and securing a new server. This documentation is based on my personal server setup experience and on the experience I've had helping customers with their new servers, so it shouldn't be considered an exhaustive, authoritative guide ... just a helpful and informative one.

Protect Your Data

First and foremost, configure backups for your server. The server is worthless without your data. Data is your business. An old adage says, "It's better to have and not need, than to need and not have." Imagine what would happen to your business if you lost just some of your data. There's no excuse for neglecting backup when configuring your new server. SoftLayer does not backup your server, but SoftLayer offers several options for data protection and backup to fit any of your needs.

Control panels like cPanel and Plesk include backup functionality and can be configured to automatically backup regularly an FTP/NAS account. Configure backups now, before doing anything else. Before migrating or copying your data to the server. This first (nearly empty) backup will be quick. Test the backup by restoring the data. If your server has RAID, it important to remember that RAID is not backup!

For more tips about setting up and checking your backups, check out Risk Management: The Importance of Redundant Backups

Use Strong Passwords

I've seen some very week and vulnerable password on customers' servers. SoftLayer sets a random, complex password on every new server that is provisioned. Don't change it to a weak password using names, birthdays and other trivia that can be found or guessed easily. Remember, a strong password doesn't have to be a complicated one: xkcd: Password Strength

Write down your passwords: "If I write them down and then protect the piece of paper — or whatever it is I wrote them down on — there is nothing wrong with that. That allows us to remember more passwords and better passwords." "We're all good at securing small pieces of paper. I recommend that people write their passwords down on a small piece of paper, and keep it with their other valuable small pieces of paper: in their wallet." Just don't use any of these passwords.
I've gone electronic and use 1Password and discovered just how many passwords I deal with. With such strong, random passwords, you don't have to change your password frequently, but if you have to, you don't have to worry about remembering the new one or updating all of your notes. If passwords are too much of a hassle ...

Or Don't Use Passwords

One of the wonderful things of SSH/SFTP on Linux/FreeBSD is that SSH-keys obviate the problem of passwords. Mac and Linux/FreeBSD have an SSH-client installed by default! There are a lot of great SSH clients available for every medium you'll use to access your server. For Windows, I recommend PuTTY, and for iOS, Panic Prompt.

Firewall

Firewalls block network connections. Configuring a firewall manually can get very complicated, especially when involving protocols like FTP which opens random ports on either the client or the server. A quick way to deal with this is to use the system-config-securitylevel-tui tool. Or better, use a firewall front end such as APF or CSF. These tools also simplify blocking or unblocking IPs.

Firewall Allow Block Unblock
APF apf -a <IP> apf -d <IP> apf -u <IP>
CSF* csf -a <IP> csf -d <IP> csf -dr <IP>

*CSF has a handy search command: csf -g <IP>.

SoftLayer customers should be sure to allow SoftLayer IP ranges through the firewall so we can better support you when you have questions or need help. Beyond blocking and allowing IP addresses, it's also important to lock down the ports on your server. The only open ports on your system should be the ones you want to use. Here's a quick list of some of the most common ports:

cPanel ports

  • 2078 - webDisk
  • 2083 - cPanel control panel
  • 2087 - WHM control panel
  • 2096 - webmail

Other

  • 22 - SSH (secure shell - Linux)
  • 53 - DNS name servers
  • 3389 - RDP (Remote Desktop Protocol - Windows)
  • 8443 - Plesk control panel

Mail Ports

  • 25 - SMTP
  • 110 - POP3
  • 143 - IMAP
  • 465 - SMTPS
  • 993 - IMAPS
  • 995 - POP3S

Web Server Ports

  • 80 - HTTP
  • 443 - HTTPS

DNS

DNS is a naming system for computers and services on the Internet. Domain names like "softlayer.com" and "manage.softlayer.com" are easier to remember than IP address like 66.228.118.53 or even 2607:f0d0:1000:11:1::4 in IPv6. DNS looks up a domain's A record (AAAA record for IPv6), to retrieve its IP address. The opposite of an A record is a PTR record: PTR records resolve an IP address to a domain name.

Hostname
A hostname is the human-readable label you assign of your server to help you differentiate it from your other devices. A hostname should resolve to its server's main IP address, and the IP should resolve back to the hostname via a PTR record. This configuration is extremely important for email ... assuming you don't want all of your emails rejected as spam.

Avoid using "www" at the beginning of a hostname because it may conflict with a website on your server. The hostnames you choose don't have to be dry or boring. I've seen some pretty awesome hostname naming conventions over the years (Simpsons characters, Greek gods, superheros), so if you aren't going to go with a traditional naming convention, you can get creative and have some fun with it. A server's hostname can be changed in the customer portal and in the server's control panel. In cPanel, the hostname can be easily set in "Networking Setup". In Plesk, the hostname is set in "Server Preferences". Without a control panel, you can update the hostname from your operating system (ex. RedHat, Debian)

A Records
If you buy your domain name from SoftLayer, it is automatically added to our nameservers, but if your domain was registered externally, you'll need to go through a few additional steps to ensure your domain resolves correctly on our servers. To include your externally-registered domain on our DNS, you should first point it at our nameservers (ns1.softlayer.com, ns2.softlayer.com). Next, Add a DNS Zone, then add an A record corresponding to the hostname you chose earlier.

PTR Records
Many ISPs configure their servers that receive email to lookup the IP address of the domain in a sender's email address (a reverse DNS check) to see that the domain name matches the email server's host name. You can look up the PTR record for your IP address. In Terminal.app (Mac) or Command Prompt (Windows), type "nslookup" command followed by the IP. If the PTR doesn't match up, you can change the PTR easily.

NSLookup

SSL Certificates

Getting an SSL certificate for your site is optional, but it has many benefits. The certificates will assure your customers that they are looking at your site securely. SSL encrypts passwords and data sent over the network. Any website using SSL Certificates should be assigned its own IP address. For more information, we have a great KnowledgeLayer article about planning ahead for an SSL, and there's plenty of documentation on how to manage SSL certificates in cPanel and Plesk.

Move In!

Now that you've prepared your server and protected your data, you are ready to migrate your content to its new home. Be proactive about monitoring and managing your server once it's in production. These tips aren't meant to be a one-size-fits-all, "set it and forget it" solution; they're simply important aspects to consider when you get started with a new server. You probably noticed that I alluded to control panels quite a few times in this post, and that's for good reason: If you don't feel comfortable with all of the ins and outs of server administration, control panels are extremely valuable resources that do a lot of the heavy lifting for you.

If you have any questions about setting up your new server or you need any help with your SoftLayer account, remember that we're only a phone call away!

-Lyndell

August 1, 2013

The "Unified Field Theory" of Storage

This guest blog was contributed by William Rocca of OS NEXUS. OS NEXUS makes the Quantastor Software Defined Storage platform designed to tackle the storage challenges facing cloud computing, Big Data and high performance applications.

Over the last decade, the creation and popularization of SAN/NAS systems simplified the management of storage into a single appliance so businesses could efficiently share, secure and manage data centrally. Fast forward about 10 years in storage innovation, and we're now rapidly changing from a world of proprietary hardware sold by big-iron vendors to open-source, scale-out storage technologies from software-only vendors that make use of commodity off-the-shelf hardware. Some of the new technologies are derivatives of traditional SAN/NAS with better scalability while others are completely new. Object storage technologies such as OpenStack SWIFT have created a foundation for whole new types of applications, and big data technologies like MongoDB, Riak and Hadoop go even further to blur the lines between storage and compute. These innovations provide a means for developing next-generation applications that can collect and analyze mountains of data. This is the exciting frontier of open storage today.

This frontier looks a lot like the "Wild West." With ad-hoc solutions that have great utility but are complex to setup and maintain, many users are effectively solving one-off problems, but these solutions are often narrowly defined and specifically designed for a particular application. The question everyone starts asking is, "Can't we just evolve to having one protocol ... one technology that unites them all?"

If each of these data storing technologies have unique advantages for specific use cases or applications, the answer isn't to eliminate protocols. To borrow a well-known concept from Physics, the solution lies in a "Unified Field Theory of Storage" — weaving them together into a cohesive software platform that makes them simple to deploy, maintain and operate.

When you look at the latest generation of storage technologies, you'll notice a common thread: They're all highly-available, scale-out, open-source and serve as a platform for next-generation applications. While SAN/NAS storage is still the bread-and-butter enterprise storage platform today (and will be for some time to come) these older protocols often don't measure up to the needs of applications being developed today. They run into problems storing, processing and gleaning value out of the mountains of data we're all producing.

Thinking about these challenges, how do we make these next-generation open storage technologies easy to manage and turn-key to deploy? What kind of platform could bring them all together? In short, "What does the 'Unified Field Theory of Storage' look like?"

These are the questions we've been trying to answer for the last few years at OS NEXUS, and the result of our efforts is the QuantaStor Software Defined Storage platform. In its first versions, we focused on building a flexible foundation supporting the traditional SAN/NAS protocols but with the launch of QuantaStor v3 this year, we introduced the first scale-out version of QuantaStor and integrated the first next-gen open storage technology, Gluster, into the platform. In June, we launched support of ZFS on Linux (ZoL), and enhanced the platform with a number of advanced enterprise features, such as snapshots, compression, deduplication and end-to-end checksums.

This is just the start, though. In our quest to solve the "Unified Field Theory of Storage," we're turning our eyes to integrating platforms like OpenStack SWIFT and Hadoop in QuantaStor v4 later this year, and as these high-power technologies are streamlined under a single platform, end users will have the ability to select the type(s) of storage that best fit a given application without having to learn (or unlearn) specific technologies.

The "Unified Field Theory of Storage" is emerging, and we hope to make it downloadable. Visit OSNEXUS.com to keep an eye on our progress. If you want to incorporate QuantaStor into your environment, check out SoftLayer's preconfigured QuantaStor Mass Storage Server solution.

-William Rocca, OS NEXUS

July 29, 2013

A Brief History of Cloud Computing

Believe it or not, "cloud computing" concepts date back to the 1950s when large-scale mainframes were made available to schools and corporations. The mainframe's colossal hardware infrastructure was installed in what could literally be called a "server room" (since the room would generally only be able to hold a single mainframe), and multiple users were able to access the mainframe via "dumb terminals" – stations whose sole function was to facilitate access to the mainframes. Due to the cost of buying and maintaining mainframes, an organization wouldn't be able to afford a mainframe for each user, so it became practice to allow multiple users to share access to the same data storage layer and CPU power from any station. By enabling shared mainframe access, an organization would get a better return on its investment in this sophisticated piece of technology.

Mainframe Computer

A couple decades later in the 1970s, IBM released an operating system called VM that allowed admins on their System/370 mainframe systems to have multiple virtual systems, or "Virtual Machines" (VMs) on a single physical node. The VM operating system took the 1950s application of shared access of a mainframe to the next level by allowing multiple distinct compute environments to live in the same physical environment. Most of the basic functions of any virtualization software that you see nowadays can be traced back to this early VM OS: Every VM could run custom operating systems or guest operating systems that had their "own" memory, CPU, and hard drives along with CD-ROMs, keyboards and networking, despite the fact that all of those resources would be shared. "Virtualization" became a technology driver, and it became a huge catalyst for some of the biggest evolutions in communications and computing.

Mainframe Computer

In the 1990s, telecommunications companies that had historically only offered single dedicated point–to-point data connections started offering virtualized private network connections with the same service quality as their dedicated services at a reduced cost. Rather than building out physical infrastructure to allow for more users to have their own connections, telco companies were able to provide users with shared access to the same physical infrastructure. This change allowed the telcos to shift traffic as necessary to allow for better network balance and more control over bandwidth usage. Meanwhile, virtualization for PC-based systems started in earnest, and as the Internet became more accessible, the next logical step was to take virtualization online.

If you were in the market to buy servers ten or twenty years ago, you know that the costs of physical hardware, while not at the same level as the mainframes of the 1950s, were pretty outrageous. As more and more people expressed demand to get online, the costs had to come out of the stratosphere, and one of the ways that was made possible was by ... you guessed it ... virtualization. Servers were virtualized into shared hosting environments, Virtual Private Servers, and Virtual Dedicated Servers using the same types of functionality provided by the VM OS in the 1950s. As an example of what that looked like in practice, let's say your company required 13 physical systems to run your sites and applications. With virtualization, you can take those 13 distinct systems and split them up between two physical nodes. Obviously, this kind of environment saves on infrastructure costs and minimizes the amount of actual hardware you would need to meet your company's needs.

Virtualization

As the costs of server hardware slowly came down, more users were able to purchase their own dedicated servers, and they started running into a different kind of problem: One server isn't enough to provide the resources I need. The market shifted from a belief that "these servers are expensive, let's split them up" to "these servers are cheap, let's figure out how to combine them." Because of that shift, the most basic understanding of "cloud computing" was born online. By installing and configuring a piece of software called a hypervisor across multiple physical nodes, a system would present all of the environment's resources as though those resources were in a single physical node. To help visualize that environment, technologists used terms like "utility computing" and "cloud computing" since the sum of the parts seemed to become a nebulous blob of computing resources that you could then segment out as needed (like telcos did in the 90s). In these cloud computing environments, it became easy add resources to the "cloud": Just add another server to the rack and configure it to become part of the bigger system.

Clouds

As technologies and hypervisors got better at reliably sharing and delivering resources, many enterprising companies decided to start carving up the bigger environment to make the cloud's benefits to users who don't happen to have an abundance of physical servers available to create their own cloud computing infrastructure. Those users could order "cloud computing instances" (also known as "cloud servers") by ordering the resources they need from the larger pool of available cloud resources, and because the servers are already online, the process of "powering up" a new instance or server is almost instantaneous. Because little overhead is involved for the owner of the cloud computing environment when a new instance is ordered or cancelled (since it's all handled by the cloud's software), management of the environment is much easier. Most companies today operate with this idea of "the cloud" as the current definition, but SoftLayer isn't "most companies."

SoftLayer took the idea of a cloud computing environment and pulled it back one more step: Instead of installing software on a cluster of machines to allow for users to grab pieces, we built a platform that could automate all of the manual aspects of bringing a server online without a hypervisor on the server. We call this platform "IMS." What hypervisors and virtualization do for a group of servers, IMS does for an entire data center. As a result, you can order a bare metal server with all of the resources you need and without any unnecessary software installed, and that server will be delivered to you in a matter of hours. Without a hypervisor layer between your operating system and the bare metal hardware, your servers perform better. Because we automate almost everything in our data centers, you're able to spin up load balancers and firewalls and storage devices on demand and turn them off when you're done with them. Other providers have cloud-enabled servers. We have cloud-enabled data centers.

SoftLayer Pod

IBM and SoftLayer are leading the drive toward wider adoption of innovative cloud services, and we have ambitious goals for the future. If you think we've come a long way from the mainframes of the 1950s, you ain't seen nothin' yet.

-James

Categories: 
July 24, 2013

Deconstructing SoftLayer's Three-Tiered Network

When Sun Microsystems VP John Gage coined the phrase, "The network is the computer," the idea was more wishful thinking than it was profound. At the time, personal computers were just starting to show up in homes around the country, and most users were getting used to the notion that "The computer is the computer." In the '80s, the only people talking about networks were the ones selling network-related gear, and the idea of "the network" was a little nebulous and vaguely understood. Fast-forward a few decades, and Gage's assertion has proven to be prophetic ... and it happens to explain one of SoftLayer's biggest differentiators.

SoftLayer's hosting platform features an innovative, three-tier network architecture: Every server in a SoftLayer data center is physically connected to public, private and out-of-band management networks. This "network within a network" topology provides customers the ability to build out and manage their own global infrastructure without overly complex configurations or significant costs, but the benefits of this setup are often overlooked. To best understand why this network architecture is such a game-changer, let's examine each of the network layers individually.

SoftLayer Private Network

Public Network

When someone visits your website, they are accessing content from your server over the public network. This network connection is standard issue from every hosting provider since your content needs to be accessed by your users. When SoftLayer was founded in 2005, we were the first hosting provider to provide multiple network connections by default. At the time, some of our competitors offered one-off private network connections between servers in a rack or a single data center phase, but those competitors built their legacy infrastructures with an all-purpose public network connection. SoftLayer offers public network connection speeds up to 10Gbps, and every bare metal server you order from us includes free inbound bandwidth and 5TB of outbound bandwidth on the public network.

Private Network

When you want to move data from one server to another in any of SoftLayer's data centers, you can do so quickly and easily over the private network. Bandwidth between servers on the private network is unmetered and free, so you don't incur any costs when you transfer files from one server to another. Having a dedicated private network allows you to move content between servers and facilities without fighting against or getting in the way of the users accessing your server over the public network.

It should come as no surprise to learn that all private network traffic stays on SoftLayer's network exclusively when it travels between our facilities. The blue lines in this image show how the private network connects all of our data centers and points of presence:

SoftLayer Private Network

To fully replicate the functionality provided by the SoftLayer private network, competitors with legacy single-network architecture would have to essentially double their networking gear installation and establish safeguards to guarantee that customers can only access information from their own servers via the private network. Because that process is pretty daunting (and expensive), many of our competitors have opted for "virtual" segmentation that logically links servers to each other. The traffic between servers in those "virtual" private networks still travels over the public network, so they usually charge you for "private network" bandwidth at the public bandwidth rate.

Out-of-Band Management Network

When it comes to managing your server, you want an unencumbered network connection that will give you direct, secure access when you need it. Splitting out the public and private networks into distinct physical layers provides significant flexibility when it comes to delivering content where it needs to go, but we saw a need for one more unique network layer. If your server is targeted for a denial of service attack or a particular ISP fails to route traffic to your server correctly, you're effectively locked out of your server if you don't have another way to access it. Our management-specific network layer uses bandwidth providers that aren't included in our public/private bandwidth mix, so you're taking a different route to your server, and you're accessing the server through a dedicated port.

If you've seen pictures or video from a SoftLayer data center (or if you've competed in the Server Challenge), you probably noticed the three different colors of Ethernet cables connected at the back of every server rack, and each of those colors carries one of these types of network traffic exclusively. The pink/red cables carry public network traffic, the blue cables carry private network traffic, and the green cables carry out-of-band management network traffic. All thirteen of our data centers have the same colored cables in the same configuration doing the same jobs, so we're able to train our operations staff consistently between all thirteen of our data centers. That consistency enables us to provide quicker service when you need it, and it lessens the chance of human error on the data center floor.

The most powerful server on the market can be sidelined by a poorly designed, inefficient network. If "the network is the computer," the network should be a primary concern when you select your next hosting provider.

-@khazard

July 16, 2013

Riak Performance Analysis: Bare Metal v. Virtual

In December, I posted a MongoDB performance analysis that showed the quantitative benefits of using bare metal servers for MongoDB workloads. It should come as no surprise that in the wake of SoftLayer's Riak launch, we've got some similar data to share about running Riak on bare metal.

To run this test, we started by creating five-node clusters with Riak 1.3.1 on SoftLayer bare metal servers and on a popular competitor's public cloud instances. For the SoftLayer environment, we created these clusters using the Riak Solution Designer, so the nodes were all provisioned, configured and clustered for us automatically when we ordered them. For the public cloud virtual instance Riak cluster, each node was provisioned indvidually using a Riak image template and manually configured into a cluster after all had come online. To optimize for Riak performance, I made a few tweaks at the OS level of our servers (running CentOS 64-bit):

Noatime
Nodiratime
barrier=0
data=writeback
ulimit -n 65536

The common Noatime and Nodiratime settings eliminate the need for writes during reads to help performance and disk wear. The barrier and writeback settings are a little less common and may not be what you'd normally set. Although those settings present a very slight risk for loss of data on disk failure, remember that the Riak solution is deployed in five-node rings with data redundantly available across multiple nodes in the ring. With that in mind and considering each node also being deployed with a RAID10 storage array, you can see that the minor risk for data loss on the failure of a single disk in the entire solution would have no impact on the entire data set (as there are plenty of redundant copies for that data available). Given the minor risk involved, the performance increases of those two settings justify their use.

With all of the nodes tweaked and configured into clusters, we set up Basho's test harness — Basho Bench — to remotely simulate load on the deployments. Basho Bench allows you to create a configurable test plan for a Riak cluster by configuring a number of workers to utilize a driver type to generate load. It comes packaged as an Erlang application with a config file example that you can alter to create the specifics for the concurrency, data set size, and duration of your tests. The results can be viewed as CSV data, and there is an optional graphics package that allows you to generate the graphs that I am posting in this blog. A simplified graphic of our test environment would look like this:

Riak Test Environment

The following Basho Bench config is what we used for our testing:

{mode, max}.
{duration, 120}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator,{int_to_bin,{uniform_int,1000000}}}.
{value_generator,{exponential_bin,4098,50000}}.
{riakc_pb_ips, [{10,60,68,9},{10,40,117,89},{10,80,64,4},{10,80,64,8},{10,60,68,7}]}.
{riakc_pb_replies, 2}.
{operations, [{get, 10},{put, 1}]}.

To spell it out a little simpler:

Tests Performed

Data Set: 400GB
10:1 Query-to-Update Operations
8 Concurrent Client Connections
Test Duration: 2 Hours

You may notice that in the test cases that use SoftLayer "Medium" Servers, the virtual provider nodes are running 26 virtual compute units against our dual proc hex-core servers (12 cores total). In testing with Riak, memory is important to the operations than CPU resources, so we provisioned the virtual instances to align with the 36GB of memory in each of the "Medium" SoftLayer servers. In the public cloud environment, the higher level of RAM was restricted to packages with higher CPU, so while the CPU counts differ, the RAM amounts are as close to even as we could make them.

One final "housekeeping" note before we dive into the results: The graphs below are pulled directly from the optional graphics package that displays Basho Bench results. You'll notice that the scale on the left-hand side of graphs differs dramatically between the two environments, so a cursory look at the results might not tell the whole story. Click any of the graphs below for a larger version. At the end of each test case, we'll share a few observations about the operations per second and latency results from each test. When we talk about latency in the "key observation" sections, we'll talk about the 99th percentile line — 99% of the results had latency below this line. More simply you could say, "This is the highest latency we saw on this platform in this test." The primary reason we're focusing on this line is because it's much easier to read on the graphs than the mean/median lines in the bottom graphs.

Riak Test 1: "Small" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Small Riak Server Node
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
4 x 500GB SATAII – RAID10
1Gb Bonded Network
Virtual Provider Node
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
4 x 500GB Network Storage – RAID10
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

The SoftLayer environment showed much more consistency in operations per second with an average throughput around 450 Op/sec. The virtual environment throughput varied significantly between about 50 operations per second to more than 600 operations per second with the trend line fluctuating slightly between about 220 Op/sec and 350 Op/sec.

Comparing the latency of get and put requests, the 99th percentile of results in the SoftLayer environment stayed around 50ms for gets and under 200ms for puts while the same metric for the virtual environment hovered around 800ms in gets and 4000ms in puts. The scale of the graphs is drastically different, so if you aren't looking closely, you don't see how significantly the performance varies between the two.

Riak Test 2: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 300GB 15K SAS – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

Similar to the results of Test 1, the throughput numbers from the bare metal environment are more consistent (and are consistently higher) than the throughput results from the virtual instance environment. The SoftLayer environment performed between 1500 and 1750 operations per second on average while the virtual provider environment averaged around 1200 operations per second throughout the test.

The latency of get and put requests in Test 2 also paints a similar picture to Test 1. The 99th percentile of results in the SoftLayer environment stayed below 50ms and under 400ms for puts while the same metric for the virtual environment averaged about 250ms in gets and over 1000ms in puts. Latency in a big data application can be a killer, so the results from the virtual provider might be setting off alarm bells in your head.

Riak Test 3: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 128GB SSD – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

In Test 3, we're using the same specs in our virtual provider nodes, so the results for the virtual node environment are the same in Test 3 as they are in Test 2. In this Test, the SoftLayer environment substitutes SSD hard drives for the 15K SAS drives used in Test 2, and the throughput numbers show the impact of that improved I/O. The average throughput of the bare metal environment with SSDs is between 1750 and 2000 operations per second. Those numbers are slightly higher than the SoftLayer environment in Test 2, further distancing the bare metal results from the virtual provider results.

The latency of gets for the SoftLayer environment is very difficult to see in this graph because the latency was so low throughout the test. The 99th percentile of puts in the SoftLayer environment settled between 500ms and 625ms, which was a little higher than the bare metal results from Test 2 but still well below the latency from the virtual environment.

Summary

The results show that — similar to the majority of data-centric applications that we have tested — Riak has more consistent, better performing, and lower latency results when deployed onto bare metal instead of a cluster of public cloud instances. The stark differences in consistency of the results and the latency are noteworthy for developers looking to host their big data applications. We compared the 99th percentile of latency, but the mean/median results are worth checking out as well. Look at the mean and median results from the SoftLayer SSD Node environment: For gets, the mean latency was 2.5ms and the median was somewhere around 1ms. For puts, the mean was between 7.5ms and 11ms and the median was around 5ms. Those kinds of results are almost unbelievable (and that's why I've shared everything involved in completing this test so that you can try it yourself and see that there's no funny business going on).

It's commonly understood that local single-tenant resources that bare metal will always perform better than network storage resources, but by putting some concrete numbers on paper, the difference in performance is pretty amazing. Virtualizing on multi-tenant solutions with network attached storage often introduces latency issues, and performance will vary significantly depending on host load. These results may seem obvious, but sometimes the promise of quick and easy deployments on public cloud environments can lure even the sanest and most rational developer. Some applications are suited for public cloud, but big data isn't one of them. But when you have data-centric apps that require extreme I/O traffic to your storage medium, nothing can beat local high performance resources.

-Harold

April 30, 2013

Big Data at SoftLayer: Riak

Big data is only getting bigger. Late last year, SoftLayer teamed up with 10Gen to launch a high-performance MongoDB solution, and since then, many of our customers have been clamoring for us to support other big data platforms in the same way. By automating the provisioning process of a complex big data environment on bare metal infrastructure, we made life a lot easier for developers who demanded performance and on-demand scalability for their big data applications, and it's clear that our simple formula produced amazing results. As Marc mentioned when he started breaking down big data database models, document-oriented databases like MongoDB are phenomenal for certain use-cases, and in other situations, a key-value store might be a better fit. With that in mind, we called up our friends at Basho and started building a high-performance architecture specifically for Riak ... And I'm excited to announce that we're launching it today!

Riak is an open source, distributed database platform based on the principles enumerated in the DynamoDB paper. It uses a simple key/value model for object storage, and it was architected for high availability, fault tolerance, operational simplicity and scalability. A Riak cluster is composed of multiple nodes that are all connected, all communicating and sharing data automatically. If one node were to fail, the other nodes would automatically share the data that the failed node was storing and processing until the node is back up and running or a new node is added. See the diagram below for a simple illustration of how adding a node to a cluster works within Riak.

Riak Nodes

We will support both the open source and the Enterprise versions of Riak. The open source version is a great place to start. It has all of the database functionality of Riak Enterprise, but it is limited to a single cluster. The Enterprise version supports replication between clusters across data centers, giving you lots of architectural options. You can use replication to build highly available, live-live failover applications. You can also use it to distribute your application's data across regions, giving you a global platform that you can update anywhere in the world and know that those modifications will be available anywhere else. Riak Enterprise customers also receive 24×7 coverage, both from SoftLayer and Basho. This includes SoftLayer's one-hour guaranteed response for Severity 1 hardware issues and unlimited support available via our secure web portal, email and phone.

The business use-case for this flexibility is that if you need to scale up or down, nodes can be easily added or taken down as your requirements change. You can opt for a single-data center environment with a few nodes or you can broaden your architecture to a multi-data center deployment with a 40-node cluster. While these capabilities are inherent in Riak, they can be complicated to build and configure, so we spent countless hours working with Basho to streamline Riak deployment on the SoftLayer platform. The fruit of that labor can be found in our Riak Solution Designer:

Riak Solution Designer

The server configurations and packages in the Riak Solution Designer have been selected to deliver the performance, availability and stability that our customers expect from their bare metal and virtual cloud infrastructure at SoftLayer. With a few quick clicks, you can order a fully configured Riak environment, and it'll be provisioned and online for you in two to four hours. And everything you order is on a month-to-month contract.

Thanks to the hard work done by the SoftLayer development group and Basho's team, we're proud to be the first in the marketplace to offer a turn-key Riak solution on bare metal infrastructure. You don't need to sacrifice performance and agility for simplicity.

For more information, visit SoftLayer.com/Riak or contact our sales team.

-Duke

Subscribe to technology