Tips And Tricks Posts

February 5, 2016

Enable SSD caching on Bare Metal Server for 10X IOPS Improvements

Have you ever wondered how you could leverage the benefits of an SSD at the cost of cheap SATA hard drives?

SSDs provide extremely high IOPS for reads and writes, which makes them very tempting for IOPS-centric volumes. However, because SSD prices are significantly higher than those of SATA drives, IT managers are at a crossroads: go with SSDs and spend a fortune, or stay with SATA drives.

But there is a way to use SATA drives and still experience SSD-like performance by using intelligent caching techniques. If you have the right PCI RAID card installed in your bare metal server, you can take advantage of its SSD caching features.

When configuring a bare metal server, make sure it has sufficient drive bays (at least eight) and choose an LSI (AVAGO) MegaRAID card as the RAID card. You can select the appropriate RAID configuration for the OS and other workload data during the order process itself, so the server arrives with those arrays preconfigured. As an additional resource for a high speed cache device, order two or more SSDs; you can also add them to your server after deployment. These are the SSD caching drives that will be used to improve the overall performance of the cheap SATA drives from which the volumes are carved.

Install MSM for Easy Management of the RAID Card

Once the server is deployed, install the AVAGO MegaRAID Storage Manager (MSM) build that matches the OS installed on the server. (You can also manage the RAID controller remotely from a local machine by providing the IP address of the server where the controller is installed.)

You can download MegaRAID Storage Manager directly from the AVAGO website for the card installed in your machine. For the popular MegaRAID SAS 9361-8i card, download MSM from the AVAGO website here.

How to Create CacheCade (SSD Caching) Volumes and Attach Them to Virtual Drives

Follow these three steps to improve the IOPS of existing volumes on the bare metal server.

Step 1: Creating CacheCade Volumes

Once the SSDs are deployed on the bare metal server and the regular volumes are created, you can create a CacheCade volume to perform SSD caching. This is easily done by right-clicking the AVAGO controller and selecting the Create CacheCade - SSD Caching option.

Step 2: Choosing the right RAID Level and Write Policy for CacheCade Volumes

It is recommended to use a RAID 1 CacheCade volume, which eliminates a single point of failure at the SSD device level. Select the available SSDs on the system and choose RAID 1 as the RAID level, click Add to add all available disks, and then click Create Drive Group. Also, be sure to select Write Back as the write policy so that both reads and writes to the cached volume see the IO performance benefit.

Step 3: Enabling SSD Caching For Volumes

If the virtual drives were created without SSD caching enabled, this is the right time to enable it as shown below; you can selectively enable or disable SSD caching for the set of virtual drives that needs it.

Right click on the volume and select Enable SSD Caching.

Performance Comparison

We ran a simple comparison on a 3.6TB RAID 50 volume (two spans of three drives each) with and without SSD caching, using the IOmeter tool (available here). The workload was a 50/50 (read/write), 4KB, purely random IO workload run against the volume for about an hour.

Without SSD Caching – IOPS 970

With SSD Caching – IOPS 9000 (10X Improvement)

The result shows roughly a 10X improvement in IOPS; the actual benefit is workload dependent. It also depends on how often reads and writes repeatedly hit the same LBAs, since workloads that reuse the same blocks benefit most from the cache.
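If you prefer a scriptable way to generate a similar workload, fio (which also appears later in this series) can approximate the IOmeter profile above. This is only a sketch: it assumes a Linux host with the libaio engine available and the cached volume mounted at /mnt, so adjust the directory, size, and runtime for your environment.

# fio --name=cachetest --directory=/mnt --size=10G --direct=1 --ioengine=libaio --rw=randrw --rwmixread=50 --bs=4k --iodepth=64 --runtime=3600 --time_based

The --rw=randrw and --rwmixread=50 options reproduce the 50/50 random read/write mix, and --bs=4k matches the 4KB block size used in the comparison.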

This can give database applications and other IO-centric, IOPS-hungry workloads an instant boost in performance. Try it today at SoftLayer and see the difference!

-Subramanian 

 

February 3, 2016

Use TShark to see what traffic is passing through your gateway

Many of SoftLayer’s solutions make excellent use of the Brocade vRouter (Vyatta) dedicated security appliance. It’s a true network gateway, router, and firewall for your servers in a SoftLayer data center. It’s also an invaluable troubleshooting tool should you have a connectivity issue or just want to take a gander at your network traffic. Built into vRouter’s command line and available to you is a full-fledged, terminal-based Wireshark implementation: TShark.

TShark is fully implemented in vRouter. If you’re already familiar with TShark, you know you can call it from the terminal in either configuration or operational mode. You do this by prefacing the command with sudo, making the full command sudo tshark <flags>.

For those of us less versed in the intricacies of Wireshark and its command line cousin, here are a couple of useful examples to help you out.

One common flag I use in nearly every capture is -i (and as a side note for those coming from a Microsoft Windows background, the flags are case sensitive). -i specifies the interface on which to capture traffic and immediately helps cut down on the amount of information unrelated to the problem at hand. If you don’t set this flag, the capture defaults to “the first non-loopback interface,” or in the case of vRouter on SoftLayer, Bond0. Additionally, if you want to trace a packet and its reply, you can set -i any to watch or capture traffic across all the interfaces on the device.

The second flag that I nearly always use is -f, which defines a capture filter to match traffic against. Only traffic that matches this filter will be captured. The filter uses the standard Wireshark capture filter syntax. Again, if you’re familiar with Wireshark, you can go nuts; but here are a few of the common filters I frequently use to help you get started:

  • host 8.8.8.8 will match any traffic to or from the specified host. In this case, the venerable Google DNS servers. 
  • net 8.8.8.0/24 works just like host, but for the entire network specified, in case you don’t know the exact host address you are looking for.
  • dst and src are useful if you want to drill down to a specific flow or want to look at just the incoming or outgoing traffic. These filters are usually paired with a host or net to match against.
  • port lets you specify a port on which to capture traffic, just like host and net. Used by itself, port matches both the source and destination port. In the case of well-known services, you can also define the port by its common name, e.g., dns.

One final cool trick with the -f filter is the and operator and the negation not. They let you combine search terms and specifically exclude traffic in order to create a very finely tuned capture for your needs.
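For example, combining a few of these primitives to watch only DNS traffic to or from the Google DNS range mentioned above might look like the following (a sketch; substitute your own interface and network):

Vyatta# sudo tshark -i Bond0 -f 'net 8.8.8.0/24 and port 53'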

If you want to capture to a file to share with a team or to plug into more advanced analysis tools on another system, the -w flag is your friend. Without -w, the capture behaves like tcpdump and the output simply scrolls in your terminal session. If you want to load the file into Wireshark or another packet analyzer, make sure you also add the -F flag to specify the file format. Here is an example:

Vyatta# sudo tshark -i Bond0 -w testcap.pcap -F pcap -f 'src 10.128.3.5 and not port 80'

The command will capture on Bond0 and write the capture to a .pcap file called testcap.pcap in the root directory of the file system. It will match only traffic on Bond0 from 10.128.3.5 that is not source or destination port 80. While that is a bit of a mouthful to explain, it does capture a very well defined stream!

Here is one more example:

Vyatta# sudo tshark -i any -f 'host 10.145.23.4 and not port 22'

This command will capture traffic to the terminal that is to or from the specified IP (10.145.23.4) that is not SSH. I frequently use this filter, or one a lot like it, when I am SSHed into a host and want to get a more general idea of what it is doing on the network. I don’t care about ssh because I know the cause of that traffic (me!), but I want to know anything else that’s going to or from the host.

This is all very much the tip of the iceberg; you can find a lot more information in the TShark man page. Hopefully these tips help out next time you want to see just what traffic is passing through your gateway.

- Jeff 

 

January 27, 2016

Sales Primer for Non-Sales Startup Founders

The founder of one of the startups in our Global Entrepreneur Program reached out to me this week. He is ready to start selling his company’s product, but he's never done sales before.

Often, startups consist of a hacker and a hustler—where the tech person is the hacker and the non-tech person is the hustler. In the aforementioned company, there are three hackers. Despite the founder being deeply technical, he is the closest thing they have to a hustler. I'm sure he'll do fine getting in front of customers, but the fact remains that he's never done sales.

So where do you begin as a startup founder if you've never sold before?

Free vs. Paid
His business is B2B, focusing on car dealers. He's worried about facing a few problems, including working with business owners who don’t normally work with startups. He wants to give the product away for free to a few customers to get some momentum, but is worried that after giving it away, he won’t be able to convert them to paying customers.

Getting that first customer is incredibly important, but there needs to be a value exchange. Giving products away for free presents two challenges:

  1. By giving something away, you devalue your product in the eyes of the customer.
  2. The customer has no skin in the game—no incentive to use it or try to make it work.

Occasionally, founders have a very close relationship with a potential customer (e.g., a former manager or a trusted ex-colleague) where they can be assured the product will get used. In those cases, it might be appropriate to give it away, but only for a defined time.

The goal is sales. Paying customers reduce burn and show traction.

Price your product, go to market, and start conversations. Be willing to negotiate to get that first sale. If you do feel strongly about giving it away for free, put milestones and limitations in place for how and when that customer will convert to paid. For example, agree to a three-month free trial that becomes a paid fee in the fourth month. Or tie specific milestones to the payment, such as delivering new product features or achieving objectives for the client.

Build Credibility
When putting a new product in the market, especially one in an industry not enamored with startups and where phrases like “beta access” will net you funny looks, it helps to build credibility. This can be done incrementally. If you don't have customers, start with the conversations you’re having: “We’re currently in conversations with over a dozen companies.”

If you get asked about customers, don’t lie. Don’t even fudge it. I recommend being honest, and framing it by saying, “We’re deciding who we want to work with first. We want to find the right customer who is willing to work closely with us at the early stage. It’s the opportunity to have a deep impact on the future of the product. We're building this for you, after all.”

When you have interest and are in negotiations, you can then mention to other prospective customers that you’re in negotiations with several companies. Be respectful of the companies you’re in negotiations with; I wouldn't recommend mentioning names unless you have explicit permission to do so.

As you gain customers, get their permission to put them on your website. Get quotes from them about the product, and put those on your site and marketing materials. You can even put these in your sales contracts.

Following this method, you can build credibility in the market, show outside interest in your product, and maintain an ethical standing.

Get to No
A common phrase when I was first learning to sell was, “get to the ‘no’.” It has a double meaning: expect that someone is going to say “no” so be ready for it, and keep asking until you get a “no.” For example, if “Are you interested in my product?” gets you a “yes,” then ask, “Would you like to sign up today?”

When you get to no, the next step is to uncover why they said no. At this point, you’re not selling; you’re just trying to understand why the person you’re talking to is saying no. It could be they don't have the decision-making authority, they don't have the budget, they need to see more, or the product is missing something important. The point is, you don’t know, and your goal here is to get to the next step in their process. And you don’t know what that is unless you ask.

Interested in learning more? Dharmesh Shah, co-founder and CTO of Hubspot and creator of the community OnStartups, authored a post with 10 Ideas For Those Critical Early Startup Sales that is well worth reading.

As a founder, you’re the most passionate person about your business and therefore the most qualified to get out and sell. You don't have to be “salesy” to sell; you just need to get out and start conversations.

-Rich

January 22, 2016

Using Cyberduck to Access SoftLayer Object Storage

SoftLayer object storage provides a low cost option to store files in the cloud. There are three primary methods for managing files in SoftLayer object storage: via a web browser, using the object storage API, or using a third-party application. Here, we’ll focus on the third-party application method, demonstrating how to configure Cyberduck to perform file uploads and downloads. Cyberduck is a free and open source (GPL) software package that can be used to connect to FTP, SFTP, WebDAV, S3, or any OpenStack Swift-based object storage such as SoftLayer object storage.

Download and Install Cyberduck

You can download Cyberduck here, with clients for both Windows and Mac. After the installation is complete, download the profile for SoftLayer object storage here. Choose any of the download links under the Connecting section; preconfigured locations won’t matter as the settings will be modified later.

Once the profile has been downloaded, it needs to be modified to allow the hostname to be changed. Open the downloaded file (e.g., Softlayer (Amsterdam).cyberduckprofile) in a text editor. Locate the Hostname Configurable key (<key>Hostname Configurable</key>) and change the XML tag following it from <false/> to <true/>. Once this change has been made, there are two options for loading the configuration file: move the file to the profiles directory where Cyberduck is installed (on Windows this is C:\Program Files (x86)\Cyberduck\profiles by default), or simply double-click the profile and Cyberduck will add it for you.

Configure Cyberduck to Work with SoftLayer

Now that Cyberduck has been installed, it needs to be configured to connect to object storage in SoftLayer. You can do this by creating a bookmark in Cyberduck. With Cyberduck open, click on Bookmark in the main menu bar, then New Bookmark in the dropdown menu.

In the dropdown box at the top of the Bookmark window, select SoftLayer Object Storage (Name of Location). Depending on the profile that was downloaded, the location may be different. When the SoftLayer profile has been selected, the configurable options for that profile will be displayed. Enter a nickname that will identify the object storage location.

Next, depending on which data center will store the objects, the Server option in Cyberduck may need to be changed. To find out which server should be specified, open a web browser and log in to the SoftLayer portal. Once in the portal, click Storage, then Object Storage. Select the object storage account that will be used for this connection.

If no accounts exist, a new object storage account can be ordered using the Order Object Storage link located in the upper right-hand corner. After selecting the account, select the data center where the object storage will reside.

When the Object Storage page loads, there will be a View Credentials link under the object storage container dropdown box in the upper left section of the screen.

Clicking on that link will bring up a dialog box that contains the information necessary for creating a connection in Cyberduck. Because SoftLayer has both public and private networks, there are two authentication endpoints available. The setup for each endpoint is the same, but a VPN connection to the SoftLayer private network is necessary in order to use the private endpoint.

Here, we will be using the public endpoint. Copy the server address for the public endpoint from the credentials dialog and enter it into the Server text box in Cyberduck.

Next, select the username. It will be in the format:

object_storage_account_name:softlayer_user_name.

Then enter it into the Username text box. (Make note of the API Key, it will be used later.)
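If you'd like to sanity-check the credentials before configuring Cyberduck, you can authenticate directly against the object storage (Swift) API with curl. This is only a sketch: replace the placeholder URL with the public authentication endpoint shown in the View Credentials dialog and supply your own username and API key. A successful response includes X-Auth-Token and X-Storage-Url headers.

# curl -i -H "X-Auth-User: object_storage_account_name:softlayer_user_name" -H "X-Auth-Key: your_api_key" https://<public_authentication_endpoint>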

Once those options have been set (Nickname, Server, and Username), close the new bookmark window. In the main Cyberduck window, you should see the newly created bookmark listed. Double-click on it to connect to the SoftLayer object storage.

At this point, Cyberduck will prompt for the API key. Use the API key noted above and Cyberduck will connect to SoftLayer object storage. Uploading files can be accomplished by selecting the files and dragging them to the Cyberduck window. Downloading can be accomplished by selecting a file in Cyberduck and dragging it to the local folder where it will be downloaded.

-Bryan Bush

January 8, 2016

A guide to Direct Link connectivity

So you’ve got your infrastructure running on SoftLayer, but you find yourself wishing for a more direct way to connect your on-premises or co-located infrastructure to your SoftLayer cloud infrastructure—with higher bandwidth and lower latency. And you also think the Internet just isn’t good enough when we’re talking VPN tunnels and private networking connectivity. Does that sound like you?

What are my options?

SoftLayer offers three Direct Link products that are specifically for customers looking for the most efficient connection to their SoftLayer private network. A Direct Link enables you to connect to the SoftLayer private network backbone with low latency, at speeds up to 10Gbps, using fiber cross-connect patches directly into the SoftLayer private network. A Direct Link connects you to the SoftLayer private network within the same geographical location as the physical cross-connect. (An add-on is available that enables you to connect to any of your SoftLayer private networks on a global scale.)

Direct Link Network Service Provider


The Direct Link NSP option allows you to create a cross-connect using single-mode fiber from one of our PoP locations onto the SoftLayer private backbone. You use a Network Service Provider of your own preference to provide connectivity from your on-prem location to the SoftLayer PoP. This could be an “in-facility” cross-connect to your own equipment, MPLS, Metro WAN, or a fiber provider. The Direct Link NSP is the top-tier option we offer for private connectivity onto the SoftLayer private backbone.

Direct Link Cloud Exchange Provider


A cloud exchange provider is a carrier/network provider that is already connected to SoftLayer using multi-tenant, high capacity links. This allows you to purchase a virtual circuit at this provider and a Direct Link cloud exchange link at SoftLayer at reduced costs, because the physical connectivity from SoftLayer to the cloud exchange provider is already in place and shared amongst other customers.

Direct Link Colocation Provider


If your gear is co-located in a cabinet purchased via SoftLayer in a facility near or adjacent to a SoftLayer data center or pod, this option will work for you. Similar to the NSP option, this uses single-mode fiber, but there's no need to connect to a SoftLayer PoP location first; you can connect directly from your cabinet to the relevant SoftLayer data center.

How do you communicate over a Direct Link?

The SoftLayer Direct Link service is a routed Layer 3 service. Routing options are: routing using a SoftLayer assigned subnet, NAT, GRE or IPsec tunnels, VRF, and BGP.

Routing
SoftLayer assigns a subnet from the 172.x.x.x block, and those IPs are bound directly to the remote hosts that need to communicate with your SoftLayer infrastructure. You can either renumber your existing hosts on the remote networks or bind these addresses as secondary IPs and set up appropriate static routes on the hosts. You can then use the 172.x.x.x IP space to communicate with the 10.x.x.x IPs of your SoftLayer hosts as necessary. Routing via BGP is optional.
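On a Linux host at the remote site, binding a secondary IP from the assigned 172.16.x.x subnet and adding a static route toward the SoftLayer 10.0.0.0/8 network might look like the following sketch (the addresses, gateway, and interface name are placeholders; use the values assigned to your Direct Link):

# ip addr add 172.16.10.20/24 dev eth1
# ip route add 10.0.0.0/8 via 172.16.10.1 dev eth1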

NAT
With NAT, SoftLayer assigns you a block of IPs from the 172.16.0.0/12 range, and a device on your remote network NATs your traffic into that block, preventing IP conflicts with the SoftLayer 10.x.x.x range(s) already assigned.
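As a rough sketch, such a rule on a Linux router at the remote site could look like this with iptables (the NAT address and interface name are hypothetical):

# iptables -t nat -A POSTROUTING -d 10.0.0.0/8 -o eth1 -j SNAT --to-source 172.16.10.20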

GRE / IPsec Tunneling
You can create a GRE or IPsec tunnel between the remote network and your infrastructure here at SoftLayer. This allows you to use whatever IP space you want on the SoftLayer side and route back across the tunnel to the remote network. That said, this is a configuration that has to be managed and supported by you, independent of SoftLayer. Furthermore, this configuration could break connectivity to the SoftLayer services network if you use a 10.x.x.x block that SoftLayer has in use for services. This solution also requires that each host needing connectivity to both the SoftLayer services network and the remote network have two IPs assigned (one from the SoftLayer 10.x.x.x block and one from the remote network block) and static routes set up on the host to ensure traffic is routed appropriately. You cannot simply assign whatever IP space you want directly on SoftLayer hosts (BYOIP) and have it routable on the SoftLayer network; the only way to do this is as outlined above, and it is not supported by SoftLayer.
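As a rough illustration, a GRE tunnel endpoint on a Linux host could be brought up with iproute2 along these lines (all addresses here are hypothetical, and as noted above this is a configuration you manage yourself):

# ip tunnel add gre1 mode gre local 172.16.10.20 remote 10.20.30.40 ttl 255
# ip link set gre1 up
# ip addr add 192.168.100.1/30 dev gre1
# ip route add 192.168.200.0/24 dev gre1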

VRF
You can opt in to using a VRF (Virtual Routing and Forwarding) instance. This allows you to either use your own remote IP addresses or overlap with a large majority of the SoftLayer infrastructure; however, be aware that if you use the 10.x.x.x network, you still cannot overlap with your own hosts within SoftLayer or with the SoftLayer services network (10.0.0.0/14 and 10.200.0.0/14). You will not be able to use any of the following for your remote prefixes: 10.0.0.0/14, 10.200.0.0/14, 10.198.0.0/15, 169.254.0.0/16, 224.0.0.0/4, and any IP ranges assigned to your VLANs on the SoftLayer platform. When choosing the VRF option, the ability to use the SoftLayer VPN service for management of your servers is no longer available. Routing via BGP is optional.

FAQ

Will I need to provide my own cross-connect?
Yes, you will need to order your own cross-connect at your data center of choice—to be connected to the SoftLayer switch port described in the LOA (Letter of Authorization) provided.

What kind of cross-connects are supported?
We strictly use Single Mode Fiber (SMF). We do not accept MMF or Copper.

What is the default size of the remote 172.16.*.* subnet assigned?
Unless otherwise requested, Direct Link customers will be assigned a /24 (256 IPs) subnet.

Which IP block has been reserved for SoftLayer servers on the backend?
We've allocated the entire 10.0.0.0/8 block for use on the SoftLayer private network. Specifically, 10.0.0.0/14 has been earmarked for services. Here's the full list of service subnets: http://knowledgelayer.softlayer.com/faqs/196#154

Which IP block has been reserved for point-to-point SoftLayer XCR to customer router?
The 10.254.0.0/16 range. We normally allocate either a /30 or /31 subnet for the point-to-point connection (between our XCR and your equipment on the other end of the Direct Link).

Does Direct Link support jumbo frames?
Yes. Just like the private SoftLayer network, Direct Link supports jumbo frames up to an MTU (Maximum Transmission Unit) of 9000.
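On a Linux host, enabling jumbo frames on the interface facing the Direct Link is a one-line change (the interface name is a placeholder, and MTU 9000 must be configured end to end on the path for it to help):

# ip link set dev eth1 mtu 9000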

Pricing and locations

A list of available locations and pricing can be found at www.softlayer.com/direct-link.

-Mathijs Dubbe

December 30, 2015

Using Ansible on SoftLayer to Streamline Deployments

Many companies today are leveraging new tools to automate deployments and handle configuration management. Ansible is a great tool that offers flexibility when creating and managing your environments.

SoftLayer has components built within the Ansible codebase, which means continued support for new features as the Ansible project expands. You can conveniently pull your SoftLayer inventory and work with your chosen virtual servers using the Core Ansible Library along with the SoftLayer Inventory Module. Within your inventory list, your virtual servers are grouped by various traits, such as “all virtual servers with 32GB of RAM,” or “all virtual servers with a domain name of softlayer.com.” The inventory list provides different categorized groups that can be expanded upon. With the latest updates to the SoftLayer Inventory Module, you can now get a list of virtual servers by tags, as well as work with private virtual servers. You can then use each of the categories provided by the inventory list within your playbooks.

So, how can you work with the new categories (such as tags) if you don’t yet have any inventory or a deployed infrastructure within SoftLayer? You can use the new SoftLayer module that’s been added to the Ansible Extras Project. This module provides the ability to provision virtual servers within a playbook. All you have to do is supply the build detail information for your virtual server(s) within your playbook and go.

Let’s look at an example playbook. You’ll want to specify a hostname along with a domain name when defining the parameters for your virtual server(s). The hostname can have an incremental number appended to it if you’re provisioning more than one virtual server, e.g., Hostname-1, Hostname-2, and so on; just set the increment parameter to True. Incremental naming lets you uniquely name the virtual servers within your playbook, but it's optional for cases where you want similar hostnames. Notice that you can also specify tags for your virtual servers, which is handy when working with your inventory in future playbooks.

Following is a sample playbook for building Ubuntu virtual servers on SoftLayer:

---
- name: Build Tomcat Servers
  hosts: localhost
  gather_facts: False
  tasks:
  - name: Build Servers
    local_action:
      module: softlayer
      quantity: 2
      increment: True
      hostname: www
      domain: test.com
      datacenter: mex01
      tag: tomcat-test
      hourly: True
      private: False
      dedicated: False
      local_disk: True
      cpus: 1
      memory: 1024
      disks: [25]
      os_code: UBUNTU_LATEST
      ssh_keys: [12345]

By default, your playbook will pause until each of your virtual servers completes provisioning before moving on to the next plays within your playbook. You can set the wait parameter to False if you choose not to wait for the virtual servers to complete provisioning. That is helpful when you want to build many virtual servers but some have different characteristics, such as RAM or tag naming. You can also set the maximum time you want to wait on the virtual servers with the wait_timeout parameter, which takes an integer defining the number of seconds to wait.
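For instance, a build task that submits the provisioning requests and returns immediately might look like the following sketch, reusing the sample parameters above together with the wait option described in this post (with wait left at True, adding wait_timeout: 600 would instead cap the wait at 10 minutes):

  - name: Build Servers Without Waiting
    local_action:
      module: softlayer
      quantity: 2
      increment: True
      hostname: www
      domain: test.com
      datacenter: mex01
      tag: tomcat-test
      hourly: True
      cpus: 1
      memory: 1024
      disks: [25]
      os_code: UBUNTU_LATEST
      wait: False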

Once you’re finished using your virtual servers, canceling them is as easy as creating them. Just specify a new playbook step with a state of absent, as well as specifying the virtual server ID or tags to know which virtual servers to cancel.

The following example will cancel all virtual servers on the account with a tag of tomcat-test:

- name: Cancel Servers
  hosts: localhost
  gather_facts: False
  tasks:
  - name: Cancel by tag
    local_action:
      module: softlayer
      state: absent
      tag: tomcat-test

New features are being developed with the core inventory library to bring additional functionality to Ansible on SoftLayer. These new developments can be found by following the Core Ansible Project hosted on Github. You can also follow the Ansible Extras Project for updates to the SoftLayer module.

As of this blog post, the new SoftLayer module is still pending inclusion into the Ansible Extras Project. Click here to check out the current pull request for the latest code and samples.

-Matt

November 4, 2015

Shared, scalable, and resilient storage without SAN

Storage area networks (SAN) are used most often in the enterprise world. In many enterprises, you will see racks filled with these large storage arrays. They are mainly used to provide a centralized storage platform with limited scalability. They require special training to operate, are expensive to purchase, support, or expand, and if those devices fail, there is big trouble.

Some people might say SAN devices are a necessary evil. But are they really necessary? Aren’t there alternatives?

Most startups nowadays run their services on commodity hardware, with smart software to distribute their content across server farms globally. Established, successful companies that run websites or apps such as WhatsApp, Facebook, and LinkedIn continue to operate pretty much the same way they started. They need the ability to scale and perform at unpredictable rates all around the world, so they use commodity hardware combined with smart software. These types of companies need the features that SAN storage offers, but with more scalability and global resiliency, without being centralized, and without having to buy expensive hardware. But how do they provide servers access to the same data, and how do they avoid data loss?

The answer is actually quite simple, although its technology is quite sophisticated: distributed storage.

In a world where virtualization has become a standard for most companies, where even applications and networking are being virtualized, virtualization giant VMware answers this question with Virtual SAN. It effectively eliminates the need for SAN hardware in a VMware environment (and it will also be available for purchase from SoftLayer before the end of the year). Other similar distributed products are GlusterFS (also offered in our QuantaStor solution), Ceph, Microsoft Windows DFS, Hadoop HDFS, document-oriented databases like MongoDB, and many more.

These solutions do, however, vary in maturity. Object storage is a great example of a newer type of storage that has come to market and doesn't require SAN devices. With SoftLayer, you can run any (or all) of them.

When you have bare metal servers set up as hypervisors or application servers, it's likely you have a lot of drive bays within those servers, mostly unused. Stuffing them with hard drives and allowing the software to distribute your data across multiple servers in multiple locations, with two or three replicas, results in a big, safe, fast, distributed storage platform. Scaling such a platform is simply a matter of adding more bare metal servers with even more hard drives and letting the software handle the rest.
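As a rough sketch of what this looks like in practice, here is how a three-way replicated GlusterFS volume (one of the distributed options mentioned above) could be created across three bare metal servers; the hostnames and brick paths are hypothetical:

# gluster peer probe server2
# gluster peer probe server3
# gluster volume create sharedvol replica 3 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1
# gluster volume start sharedvol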

Nowadays we are seeing more and more hardware solutions like SAN—or even networking—being replaced with smarter software on simpler and more affordable hardware. At SoftLayer, we offer month-to-month and hourly bare metal servers with up to 36 drive bays, potentially providing a lot of room for storage. With 10Gbps global connectivity options, we offer fast, low latency networking for syncing between servers and delivering data to the customer.

-Mathijs

October 29, 2015

How to measure the performance of striped block storage volumes

To piggyback on the performance specifications of our block and file storage offerings, SoftLayer provides a wide range of volume size and performance combinations for your storage needs. But what if your storage performance or size requirements are more specific than what is currently offered?

In this post, I’ll show you how to configure and validate a sample RAID 0 configuration with:

  1. The use of LVM on CentOS to create a RAID 0 array with 3 volumes
  2. The use of FIO to apply IO load to the array
  3. The ability to measure throughput of the array

Without going into the potential drawbacks of RAID 0, we should be able to observe up to three times the throughput and size of any single volume. For example, if we needed a volume with 60GB and 240 IOPS, we should be able to stripe three 20GB volumes, each at 4 IOPS/GB. You can also extrapolate from this example to fit a range of performance and reliability requirements.

To start, we will provision 3x 20GB Endurance volumes at 4 IOPS/GB and make them accessible to our CentOS VM, but stop short of creating a file system; i.e., stop once you are able to list three volumes with:

# fdisk -l | grep /dev/mapper
Disk /dev/mapper/3600a09803830344f785d46426c37364a: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/mapper/3600a09803830344f785d46426c373648: 21.5 GB, 21474836480 bytes, 41943040 sectors
Disk /dev/mapper/3600a09803830344f785d46426c373649: 21.5 GB, 21474836480 bytes, 41943040 sectors

Then proceed to create the three-stripe volume with the following commands:

# pvcreate /dev/mapper/3600a09803830344f785d46426c37364a /dev/mapper/3600a09803830344f785d46426c373648 /dev/mapper/3600a09803830344f785d46426c373649
 
# vgcreate new_vol_group /dev/mapper/3600a09803830344f785d46426c37364a /dev/mapper/3600a09803830344f785d46426c373648 /dev/mapper/3600a09803830344f785d46426c373649
 
# lvcreate -i3 -I16 -l100%FREE -nstriped_logical_volume new_vol_group

This creates a logical volume with three stripes (-i), a stripe size (-I) of 16KB, and a volume size (-l) of 100 percent of the free space, or 60GB.
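Before creating the file system, you can double-check the stripe layout (segment type, number of stripes, and stripe size) with LVM's mapping display; this is just a verification step:

# lvdisplay -m /dev/new_vol_group/striped_logical_volume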

You can now create the file system on the new logical volume, create a mount point, and mount the volume:

# mkfs.ext3 /dev/new_vol_group/striped_logical_volume
# mkdir /mnt
# mount /dev/mapper/new_vol_group-striped_logical_volume /mnt

Now download, build, and run FIO:

# yum install -y gcc libaio-devel
# cd /tmp
# wget http://freecode.com/urls/3aa21b8c106cab742bf1f20d60629e3f
# tar -xvf 3aa21b8c106cab742bf1f20d60629e3f
# cd fio-2.1.10/
# make
# make install
# cd /mnt
# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=16k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=50

This executes the benchmark with 16KB blocks (--bs) and a random mixed workload (--readwrite=randrw) at 50 percent read and 50 percent write (--rwmixread=50). It runs with an IO queue depth of 64 (--iodepth=64) until the 1GB test file (--size=1G) has been fully processed.

Here is a snippet of output once completed:

read : io=51712KB, bw=1955.8KB/s, iops=122, runt= 26441msec
write: io=50688KB, bw=1917.3KB/s, iops=119, runt= 26441msec

This shows that throughput came in at 122r + 119w = ~240 IOPS, which is what we expect from the provisioned volumes: 3 x 20GB x 4 IOPS/GB = 3 x 80 IOPS = 240 IOPS.

Here is a table showing how results would differ if we tuned the load with varying block sizes (--bs):

As you can see from the results, you may not observe the expected 3x throughput (IOPS) in every case, so please be mindful of your logical volume configuration (stripe size) versus your load profile (--bs). Please refer to our FAQ for further details on other possible limits.

-Nam

October 21, 2015

The Dumbest Thing I’ve Ever Said

Last week, I attended the LAUNCH Scale conference and had the pleasure of attending the VIP dinner the night before the event began. We hosted the top 10 startups from the IBM SmartCamp worldwide competition for the dinner and throughout the events. Famed Internet entrepreneur Jason Calacanis joined us for the dinner and gave a quick pep talk to the teams. He mentioned that people come up to him and lament that they wished they’d gotten into the "Internet thing" earlier—and that he's been hearing this since 1999. His story reminded me of a similar personal experience.

In the fall semester of 1995, I was a junior at St. Bonaventure University, working in the computer lab. One day after helping a cute girl I had a crush on, she said to me, “You’re so good with computers, why aren’t you a computer science major?” Swelling with pride, I tried to sound impressive and intelligent as I definitively stated, “Windows 95 just came out, and pretty much everything that can be built with computers has been built.”

Yep. Windows 95. The pinnacle of software achievement.

It is easily the dumbest thing I've ever said—and perhaps up there as one of the dumbest things anyone has said. Ever.

But I hear corollaries to this fairly often, both in and outside the startup world. "There's no room for innovation there," or "You can't make money there," or "That sector is awful, don't bother." I'm guilty of a few of those statements myself—yet businesses find a way. We live in an age of unprecedented innovation. Just because one person didn't have the key to unlock it doesn't mean the door is closed.

Catch yourself before you fall into this loop of thinking. It might mean being the "Uber of X" or starting a business that's far ahead of its time. Think it's crazy to say everything that can be built has been built? I think it's just as crazy to say, "It's too late to get into ___ market."

For example, when markets grow in size, they also grow in complexity. The first mover in the space defines the market, catches the innovators and early adopters, and builds the bridge over the chasm to the early and late majority. (For more on this, read Crossing the Chasm by Geoffrey Moore.) When a market begins to service the majority, the needs of many are not being met, which leaves room for new entrants to build a business that addresses the segments dissatisfied with the current offerings or needing specialized versions.

The LAUNCH Scale event showcased dozens of startups and the innovation out there in the world always amazes me. I'd recommend it to any startup that has built something great, and now needs to scale. Still haven't built something yourself? Think you missed the opportunity to build and create? In 1995, I didn't think about how things would change in five, 10, even 20 years. Now it's 2015 and the startup world has been growing faster than any sector in history.

Think everything that could be built has been built? Think again. Want to build something? Do it. Build something. What are you waiting for? Go make a difference in the world.

-Rich

September 2, 2015

Backup and Restore in a Cloud and DevOps World

Virtualization has brought many improvements to compute infrastructure, including snapshots and live migration¹. When an infrastructure moves to the cloud, these options often become a client’s primary backup strategy. While snapshots and live migration are also part of a successful strategy, backing up on the cloud may need additional tools.

First, a basic question: Why do we take backups? They’re taken to recover from

  • The loss of an entire machine
  • Partially corrupted files
  • A complete data loss (either through hardware or human error)

While losing an entire machine is frightening, corrupted files or data loss are the more common reasons for data backups.

Snapshots are useful when the snapshot and restore occur in close proximity to each other, e.g., when you’re migrating middleware or an operating system and want to fall back quickly if something goes wrong. If you need to restore after extensive changes (hardware or data), a snapshot isn’t an adequate resource. The restore may require restoring to a new machine, selecting files to be restored, and moving data back to the original machine.

So if a snapshot isn’t the silver bullet for backing up in the cloud, what are the effective backup alternatives? The solution needs to handle a full system loss, partial data loss, or corruption, and ideally work for both virtualized and non-virtualized environments.

What to back up

There are three types of files that you’ll want to consider when backing up an active machine’s disks:

  • Binary files: Changed by operating system and middleware updates; can be easily stored and recovered.
  • Configuration files: Define how the binary files are connected and configured, and what data is accessible to them.
  • Data files: Generated by users and unrecoverable if not backed up. Data files are the most precious part of the disk content and losing them may result in a financial impact on the client’s business.

Keep in mind when determining your backup strategy that each file type has a different change rate—data files change faster than configuration files, which are more fluid than binary files. So, what are your options for backing up and restoring each type of file?

Binary files
In the case of a system failure, DevOps advocates (see Phoenix Servers from Martin Fowler) propose getting a new machine, which all cloud providers can automatically provision, including middleware. Automated provisioning processes are available for both bare metal and virtual machines.

Note that most Open Source products only require an Internet connection and a single command line for installation, while commercial products can be provisioned through automation.

Configuration files
Cloud-centric operations have a distinct advantage over traditional operations when it comes to backing up configuration files. With traditional operations, each element is configured manually, which has several drawbacks such as being time-consuming and error-prone. Cloud-centric operations, or DevOps, treat each configuration as code, which allows an environment to be built from a source configuration via automated tools and procedures. Tools such as Chef, Puppet, Ansible, and SaltStack show their power with central configuration repositories that are used to drive the composition of an environment. A central repository works well with another component of automated provisioning—changing the IP address and hostname.

You have limited control of how the cloud will allocate resources, so you need an automated method to collect the information and apply it to all the machines being provisioned.

In a cloud context, it’s suboptimal to manage machines individually; instead, the machines have to be seen as part of a cluster of servers, managed via automation. Cluster automation is one of the core tenets of solutions like CoreOS’ Fleet and Apache Mesos. Resources are allocated and managed as a single entity via API, configuration repositories, and automation.

You can attain automation in small steps. Start by choosing an automation tool and begin converting your existing environment one file at a time. Soon, your entire configuration is centrally available and recovering a machine or deploying a full environment is possible with a single automated process.

In addition to being able to quickly provision new machines with your binary and configuration files, you are also able to create parallel environments, such as disaster recovery, test and development, and quality assurance. Using the same provisioning process for all of your environments assures consistent environments and early detection of potential production problems. Packages, binaries, and configuration files can be treated as data and stored in something similar to object stores, which are available in some form with all cloud solutions.

Data files
The final files to be backed up and restored are the data files. These files are the most important part of a backup and restore and the hardest ones to replace. Part of the challenge is the volume of data as well as access to it. Data files are relatively easy to back up, the exception being files that are in transition, e.g., files being uploaded. Data file backups can be done with several tools, including synchronization tools or a full file backup solution. Another option is an object store, which is a natural repository for relatively static files and allows for a pay-as-you-go model.
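As a simple illustration of the synchronization approach, a nightly rsync of a data directory to a backup host could look like the following (the paths and hostname are placeholders):

# rsync -az --delete /var/data/ backup01:/backups/appserver/var/data/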

Database content is a bit harder to back up. Even with instant snapshots on storage, backing up databases can be challenging. A snapshot at the storage level is an option, but it doesn’t allow for a partial database restore. Also, a snapshot can capture in-flight transactions that cause issues during a restore, which is why most database systems provide a mechanism for online backups. The online backups should be leveraged in combination with tools for file backups.
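For example, a consistent online backup of a MySQL/InnoDB database (one common mechanism; adjust the command for your own database engine and backup destination) might be taken with:

# mysqldump --single-transaction --all-databases | gzip > /backups/mysql-$(date +%F).sql.gz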

Something to remember about databases: many solutions end up accumulating data even after the data is no longer used by users. The data within an active database includes data currently being used as well as historical data. Having current and historical data allows for data analytics on the same database, but it also increases the size of the database, making database-related operations harder. It may make sense to archive older data in either other databases or flat files, which keeps the database volumes manageable.

Summary

To recap, because cloud provides rapid deployment of your operating system and convenient places to store data (such as object stores), it’s easy to factor cloud into your backup and recovery strategy. By leveraging the containerization approach, you should split the content of your machines—binary, configuration, and data. Focus on automating the deployment of binaries and configuration; it allows easier delivery of an environment, including quality assurance, test, and disaster recovery. Finally, use traditional backup tools for backing up data files. These tools make it possible to rapidly and repeatedly recover complete environments while controlling the amount of backed up data that has to be managed.

-Thomas

¹ Snapshots are not available on bare metal servers that have no virtualization capability.
