Technology Posts

March 24, 2016

future.ready(): 7 Things to Check Off Your Big Data Development List

Frank Ketelaars, Big Data Technical Leader for Europe at IBM, offers a checklist that every developer should have pinned to their board when starting a big data project. Editor’s Note: Does your brain switch off when you hear industryspeak words like “innovation,” “transformation,” “leading edge,” “disruptive,” and “paradigm shift”? Go on, go ahead and admit it. Ours do, too. That’s why we’re launching the future.ready() series—consisting of blogs, podcasts, webinars, and Twitter chats— with content created by developers, for developers. Nothing fluffy, nothing buzzy. With the future.ready() series, we aim to equip you with tools and knowledge that you can use—not just talk and tweet about.

For the first edition, I’ve invited Frank Ketelaars, an expert in high volume data space, to walk us through seven things to check off when starting a big data development project.

-Michalina Kiera, SoftLayer EMEA senior marketing manager


This year, big data moves from a water cooler discussion to the to-do list. Gartner estimates that more than 75 percent of companies are investing or planning to invest in big data in the next two years.

I have worked on multiple high volume projects in industries that include banking, telecommunications, manufacturing, life sciences, and government, and in roles including architect, big data developer, and streaming analytics specialist. Based on my experience, here’s a checklist I put together that should give developers a good start. Did I miss anything? Join me on the Twitter chat or webinar to share your experience, ask questions, and discuss further. (See details below.)     

1. Team up with a person who has a budget and a problem you can solve.

For a successful big data project, you need to solve a business problem that’s keeping somebody awake at night. If there isn’t a business problem and a business owner—ideally one with a budget— your project won’t get implemented. Experimentation is important when learning any new technology. But before you invest a lot of time in your big data platform, find your sponsor. To do so, you’ll need to talk to everyone, including IT, business users, and management. Remember that the technical advantages of analytics at scale might not immediately translate into business value.

2. Get your systems ready to collect the data.

With additional data sources, such as devices, vehicles, and sensors connected to networks and generating data, the variety of information and transportation mechanisms has grown dramatically, posing new challenges for the collection and interpretation of data.

Big data often comes from sources outside the business. External data comes at you in a variety of formats (including XML, JSON, and binary), and using a variety of different APIs. In 2016, you might think that everyone is on REST and JSON, but think again: SOAP still exists! The variety of the data is the primary technical driver behind big data investments, according to a survey of 402 business and IT professionals by management consultancy NewVantage Partners[SM1] . From one day to the next, the API might change or a source might become unavailable.

Maybe one day we’ll see more standardization, but it won’t happen any time soon. For now, developers must plan to spend time checking for changes in APIs and data formats, and be ready to respond quickly to avoid service interruptions. And to expect the unexpected.

3. Make sure you have the right to use that data.

Governance is a business challenge, but it’s going to touch developers more than ever before—from the very start of the project. Much of the data they will be handling is unstructured, such as text records from a call center. That makes it hard to work out what’s confidential, what needs to be masked, and what can be shared freely with external developers. Data will need to be structured before it can be analyzed, but part of that process includes working out where the sensitive data is, and putting measures in place to ensure it is adequately protected throughout its lifecycle.

Developers need to work closely with the business to ensure that they can keep data safe, and provide end users with a guarantee that the right data is being analyzed and that its provenance can be trusted. Part of that process will be about finding somebody who will take ownership of the data and attest to its quality.

4. Pick the right tools and languages.

With no real standards in place yet, there are many different languages and tools used to collect, store, transport, and analyze big data. Languages include R, Python, Julia, Scala, and Go (plus the Java and C++ you might need to work with your existing systems). Technologies include Apache Pig, Hadoop, and Spark, which provide massive parallel processing on top of a file system without Hadoop. There’s a list of 10 popular big data tools here, another 12 here, and a round-up of 45 big data tools here. 451 Research has created a map that classifies data platforms according to the database type, implementation model, and technology. It’s a great resource, but its 18-color key shows how complex the landscape has become.

Not all of these tools and technologies will be right for you, but they hint at one way the developer’s core competency must change. Big data will require developers to be polyglots, conversant in perhaps five languages, who specialize in learning new tools and languages fast—not deep experts in one or two languages.

Nota bene: MapReduce and Pig are among the top highest paid technology skills in the US, and other big data skills are likely to be highly sought-after as the demand for them also grows. Scala is a relatively new functional programming language for data preparation and analysis, and I predict it will be in high demand in the near future.

5. Forget “off-the-shelf.” Experiment and set up a big data solution that fits your needs. 

You can think of big data analytics tools like Hadoop as a car. You want to go to the showroom, pay, get in, and drive away. Instead, you’re given the wheels, doors, windows, chassis, engine, steering wheel, and a big bag of nuts and bolts. It’s your job to assemble it.

As InfoWorld notes, DevOps tools can help to create manageable Hadoop solutions. But you’re still faced with a lot of pieces to combine, diverse workloads, and scheduling challenges.

When experimenting with concepts and technologies to solve a certain business problem, also think about successful deployment in the organization. The project does not stop after the proof.

6. Secure resources for changes and updates.

Apache Hadoop and Apache Spark are still evolving rapidly and it is inevitable that the behavior of components will change over time and some may get deprecated shortly after initial release. Implementing new releases will be painful, and developers will need to have an overview of the big data infrastructure to ensure that as components change, their big data projects continue to perform as expected.

The developer team must plan time for updates and deprecated features, and a coordinated approach will be essential for keeping on top of the change.

7. Use infrastructure that’s ready for CPU and I/O intensive workloads.

My preferred definition of big data (and there are many – Forbes found 12) is this: "Big data is when you can no longer afford to bring the data to the processing, and you have to do the processing where the data is."

In traditional database and analytics applications, you get the data, load it onto your reporting server, process it, and post the results to the database.

With big data, you have terabytes of data, which might reside in different places—and which might not even be yours to move. Getting it to the processor is impractical. Big data technologies like Hadoop are based on the concept of data locality—doing the processing where the data resides.

You can run Hadoop in a virtualized environment. Virtual servers don’t have local data, though, so the time taken to transport data between the SAN or other storage device and the server hurts the application’s performance. Noisy neighbors, unpredictable server speeds and contested network connections can have a significant impact on performance in a virtualized environment. As a result, it’s difficult to offer service level agreements (SLAs) to end users, which makes it hard for them to depend on your big data implementations.

The answer is to use bare metal servers on demand, which enable you to predict and guarantee the level of performance your application can achieve, so you can offer an SLA with confidence. Clusters can be set up quickly, so you can accelerate your project really fast. Because performance is predictable and consistent, it’s possible to offer SLAs to business owners that will encourage them to invest in the big data project and rely on it for making business decisions.

How can I learn more?

Join me in the Twitter chat and webinar (details below) to discuss how you’re addressing big data or have your questions answered by me and my guests.  

Add our Twitter chat to your calendar. It happens Thursday, March 31 at 1 p.m. CET. Use the hashtag #SLdevchat to share your views or post your questions to me.

Register for the webinar on Wednesday, Apr 20, at 5 p.m. to 6 p.m. CET.


About the author

Frank Ketelaars has been Big Data Technical Leader in Europe for IBM since August 2013. As an architect, big data developer, and streaming analytics specialist, he has worked on multiple high volume projects in banking, telecommunications, manufacturing, life sciences and government. He is a specialist in Hadoop and real-time analytical processing.


March 4, 2016

Adventures with Bluemix

Keeping up with the rapid evolution of web programming is frighteningly difficult—especially when you have a day job. To ensure I don’t get left behind, I like to build a small project every year or so with a collection of the most buzzworthy technologies I can find. Nothing particularly impressive, of course, but just a collection of buttons that do things. This year I am trying to get a good grasp on “as a Service,” which seems to be everywhere these days. Hopefully this adventure will prove educational.

Why use services when I can do it myself?

The main idea behind “as a Service” is that somewhere out there in the cloud, someone has figured out how to do a particular task really well. This someone is willing to provide you access to that for a small service fee—thereby letting you, the developer, focus as much time as possible on your code and not so much time worrying about optimal configurations of things that you need to work efficiently.

SoftLayer is an Infrastructure as a Service (IaaS) provider, which is what will be the home for my little application—due in large part because I already have a ton of experience running servers myself.

I’m a big fan of Python, so I’m going to start programing with the Pyramids framework as the base for my new application. Like the “as a Service” offerings, programming frameworks and libraries exist to help the developer focus on their code and leverage the expertise of others for the auxiliary components.

To make everything pretty, I am going to use Bootstrap.js, which is apparently the de facto front-end library these days.

For everything else I want to use, there will be an attached Bluemix service. For the uninitiated, Bluemix is a pretty awesome collection of tools for developing and deploying code. At its core, Bluemix uses Cloud Foundry to provision cloud resources and deploy code. For now, I’m going to deploy my own code, but what I’m really interested in are the add-on services that I can just drop into my application and get going. The first service I want to try out is going to be Cloudant nosql, which is a managed couchDB instance with a few added features like a pretty neat dashboard.

Welcome to Bluemix

Combining Bluemix services with SoftLayer servers

One of the great things about services in Bluemix is that they can be provisioned in a standalone deployment—meaning Bluemix services can be used by any computer with an Internet connection and therefore, so can my SoftLayer servers. Since Bluemix services are deployed on SoftLayer hardware (in general, but there are some exceptions), the latency between SoftLayer servers and Bluemix services should be minimal, which is nice.

Creating a Cloudant service in Bluemix is as easy as hitting the Create button in the console. Creating a simple web application in Pyramid took a bit longer, but the quick tutorial helped me learn about all the cool things the Pyramid project can do. I also got to skip all the mess with SQLAlchemy, since I’m storing all the data in Cloudant. All that’s required is a sane ID system (I am using uuid) and some json. No need to get bogged down with a rigid table structure since Cloudant is a document store. If I want to change the data format, I just need to upload a new copy of the data, and a new revision of that document will be automatically created.

After cobbling together a basic application that can publish and edit content, all I had to do to make everything look like it was designed intentionally was to add a few bootstrap classes to my templates. And then I had a ready to use website!


Although making a web application is still as intensive as it’s always been, at least using technology in an “as a Service” fashion helps cut down on all the tertiary technologies you need to become an expert on to get anything to work. Even though the application I created here was pretty simple, I hope to expand it to include some of the more interesting Bluemix services to see what kind of Frankenstein application I can manage to produce. There are currently 100 Bluemix services, so I think the hardest part is going to be figuring out which one to use next.


February 3, 2016

Use TShark to see what traffic is passing through your gateway

Many of SoftLayer’s solutions make excellent use of the Brocade vRouter (Vyatta) dedicated security appliance. It’s a true network gateway, router, and firewall for your servers in a SoftLayer data center. It’s also an invaluable trouble-shooting tool should you have a connectivity issue or just want to take a gander at your network traffic. Built into vRouter’s command line and available to you, is a full-fledged terminal-based Wireshark command line implementation—TShark.

TShark is fully implemented in vRouter. If you’re already familiar with using TShark, you know you can call it from the terminal in either configuration or operational mode.  You accomplish this by prefacing a command with sudo; making the full command sudo tshark – flags.

tshark graphic

For those of us less versed in the intricacies of Wireshark and its command line cousin, here are a couple of useful examples to help you out.

One common flag I use in nearly every capture is –i (and as a side note, for those coming from a Microsoft Windows background, the flags are case sensitive). -i is a specific interface on which to capture traffic and immediately helps to cut down on the amount of information unrelated to the problem at hand. If you don’t set this flag, the capture will default to “the first non-loopback address;” or in the case of vRouter on SoftLayer, Bond0. Additionally, if you want to trace a packet and reply, you can set –i any to watch or capture traffic through all the interfaces on the device.

The second flag that I nearly always use to define a capture filter is –f, which defines a filter to match traffic against. The only traffic that matches this pattern will be captured. The filter uses the standard Wireshark syntax. Again, if you’re familiar with Wireshark, you can go nuts; but here are a few of the common filters I frequently use to help you get started:

  • host will match any traffic to or from the specified host. In this case, the venerable Google DNS servers. 
  • net works just like host, but for the entire network specified, in case you don’t know the exact host address you are looking for.
  • dst and src are useful if you want to drill down to a specific flow or want to look at just the incoming or outgoing traffic. These filters are usually paired with a host or net to match against.
  • port lets you specify a port to capture traffic, like host and net. Used by itself, port will match both source and destination port. In the case of well-known services, you can also define the port by the common name, i.e., dns.  

One final cool trick with the –f filter is the and and the negation not. They let you combine search terms and specifically exclude traffic in order to create a very finely tuned capture for your needs.

If you want to capture to a file to share with a team or to plug into more advanced analysis tools on another system, the –w flag is your friend. Without -w, the file will behave like a tcpdump and the output will appear in your terminal session. If you want to load the file into Wireshark or another packet analyzer tool you should make sure to add the –F flag to specify the file format. Here is an example:

Vyatta# sudo tshark –i Bond0 –w testcap.pcap –F pcap –f ‘src and not port 80’

The command will capture on Bond0 and output the capture to a .pcap file called testcap.pcap in the root directory of the file system. It will match only traffic on bond0 from that is not source or destination port 22. While that is a bit of a mouthful to explain, it does capture a very well defined stream! 

Here is one more example:

Vyatta#sudo tshark –I any –f ‘host and not ssh’

This command will capture traffic to the terminal that is to or from the specified IP ( that is not SSH. I frequently use this filter, or one a lot like it, when I am SSHed into a host and want to get a more general idea of what it is doing on the network. I don’t care about ssh because I know the cause of that traffic (me!), but I want to know anything else that’s going to or from the host.

This is all very much the tip of the iceberg; you can find a lot more information at the TShark main page. Hopefully these tips help out next time you want to see just what traffic is passing through your gateway.

- Jeff 


January 22, 2016

Using Cyberduck to Access SoftLayer Object Storage

SoftLayer object storage provides a low cost option to store files in the cloud. There are three primary methods for managing files in SoftLayer object storage: via a web browser, using the object storage API, or using a third-party application. Here, we’ll focus on the third-party application method, demonstrating how to configure Cyberduck to perform file uploads and downloads. Cyberduck is a free and open source (GPL) software package that can be used to connect to FTP, SFTP, WebDAV, S3, or any OpenStack Swift-based object storage such as SoftLayer object storage.

Download and Install Cyberduck

You can download Cyberduck here, with clients for both Windows and Mac. After the installation is complete, download the profile for SoftLayer object storage here. Choose any of the download links under the Connecting section; preconfigured locations won’t matter as the settings will be modified later.

Once the profile has been downloaded, it needs to be modified to allow the hostname to be changed. Open the downloaded file (e.g. Softlayer (Amsterdam).cyberduckprofile) in a text editor. Locate the Hostname Configurable key (<key>Hostname Configurable</key>), and change the XML tag following that from <false/> to <true/>. Once this change has been made, there are two options to load the configuration file: Move the file to the profiles directory where Cyberduck is installed (on Windows this will be C:\Program Files (x86)\Cyberduck\profiles by default), or double-click on the profile, and Cyberduck will add the profile.

Configure Cyberduck to Work with SoftLayer

Now that Cyberduck has been installed, it needs to be configured to connect to object storage in SoftLayer. You can do this by creating a bookmark in Cyberduck. With Cyberduck open, click on Bookmark in the main menu bar, then New Bookmark in the dropdown menu.

In the dropdown box at the top of the Bookmark window, select SoftLayer Object Storage (Name of Location).

In the dropdown box at the top of the Bookmark window, select SoftLayer Object Storage (Name of Location). Depending on the profile that was downloaded, the location may be different. When the SoftLayer profile has been selected, the configurable options for that profile will be displayed. Enter a nickname that will identify the object storage location.

Next, depending on which data center will store the objects, the server option in Cyberduck may need to be changed. To find out which server should be specified, open a web browser and log into the SoftLayer portal. Once in the portal click on Storage then Object Storage. Select the object storage account that will be used for this connection.

If no accounts exist, a new object storage account can be ordered by using the Order Object
Storage link located in the upper right-hand corner. After selecting the account, select the data center where the object storage will reside.

When the Object Storage page loads, there will be a View Credentials link under the object storage container dropdown box in the upper left section of the screen.

Clicking on that link will bring up a dialog box that contains the information necessary for creating a connection in Cyberduck. Because SoftLayer has both public and private networks, there are two authentication endpoints available. The setup for each endpoint is the same, but a VPN connection to the SoftLayer private network is necessary in order to use the private endpoint.

Here, we will be using the public endpoints. Select the server address for the public endpoint (see the blue highlighted text) and enter it into the server text box in Cyberduck.

Next, select the username. It will be in the format:


Then enter it into the Username text box. (Make note of the API Key, it will be used later.)

Once those options have been set (Nickname, Server, and Username), close the new bookmark window. In the main Cyberduck window, you should see the newly created bookmark listed. Double-click on it to connect to the SoftLayer object storage.

At this point, Cyberduck will prompt for the API key. Use the API key noted above and Cyberduck will connect to SoftLayer object storage. Uploading files can be accomplished by selecting the files and dragging them to the Cyberduck window. Downloading can be accomplished by selecting a file in Cyberduck and dragging it to the local folder where it will be downloaded.

-Bryan Bush

January 8, 2016

A guide to Direct Link connectivity

So you’ve got your infrastructure running on SoftLayer, but you find yourself wishing for a more direct way to connect your on-premises or co-located infrastructure to your SoftLayer cloud infrastructure—with higher bandwidth and lower latency. And you also think the Internet just isn’t good enough when we’re talking VPN tunnels and private networking connectivity. Does that sound like you?

What are my options?

SoftLayer offers three Direct Link products that are specifically for customers looking for the most efficient connection to their SoftLayer private network. A Direct Link enables you to connect to the SoftLayer private network backbone with low latency speeds—up to 10Gbps using fiber cross-connect patches directly into the SoftLayer private network. A Direct Link is used to connect to a SoftLayer private network within the same geographical location of the physical cross-connect. (An add-on is available that enables you to connect to any of your SoftLayer private networks on a global scale.)

Direct Link Network Service Provider

The Direct Link NSP option allows you to create a cross-connect using single-mode fiber from one of our PoP locations onto the SoftLayer private backbone. You’ll have a Network Service Provider of your own preference that provides you with connectivity from your on-prem location to the SoftLayer PoP. This could be an “in-facility” cross-connect to your own equipment, MPLS, Metro WAN, or Fiber provider. The Direct Link NSP is the top-tier connectivity option we offer pertaining to private networking connectivity onto the SoftLayer private backbone.

Direct Link Cloud Exchange Provider

A cloud exchange provider is a carrier/network provider that is already connected to SoftLayer using multi-tenant, high capacity links. This allows you to purchase a virtual circuit at this provider and a Direct Link cloud exchange link at SoftLayer at reduced costs, because the physical connectivity from SoftLayer to the cloud exchange provider is already in place and shared amongst other customers.

Direct Link Colocation Provider

If your gear is co-located in a cabinet purchased via SoftLayer that’s in the same facility near or adjacent to a SoftLayer data center or POD, this option would work for you. Similar to the NSP option, this is a single-mode fiber but there’s no need to connect to a SoftLayer PoP location first—you can connect directly from your cabinet to the relevant SoftLayer data center.

How do you communicate over a Direct Link?

The SoftLayer Direct Link service is a routed Layer 3 service. Routing options are: routing using a SoftLayer assigned subnet, NAT, GRE or IPsec tunnels, VRF, and BGP.

We directly bind the 172.x.x.x IP block to your remote hosts that need to communicate with your SoftLayer infrastructure. You can either renumber your existing hosts on the remote networks or bind these as secondary IPs and setup appropriate static routes on the host. You can then use the 172.x.x.x IP space to communicate with the 10.x.x.x IP's of your SoftLayer hosts as necessary. Routing via BGP is optional.

With NAT, SoftLayer will assign you a block of IPs from the IP block to NAT into a device from your remote network to prevent IP conflicts with the SoftLayer 10.x.x.x IP range(s) assigned.

GRE / IPsec Tunneling
You can create a GRE or IPSEC tunnel between the remote network and your infrastructure here at SoftLayer. This allows you to use whatever IP space you want on the SoftLayer side and route back across the tunnel to the remote network. With that being said, this is a configuration that will have to be managed and supported by you, independent of SoftLayer. Furthermore, this configuration could break connectivity to the SoftLayer services network if you use a 10.x.x.x block that SoftLayer has in use for services. This solution will also require that each host needing connectivity to the SoftLayer services network and the remote network have two IPs assigned (one from the SL 10.x.x.x block, and one from the remote network block) and static routes setup on the host to ensure traffic is routed appropriately. You will not be able to assign whatever IP space you want directly on the SoftLayer hosts (BYOIP) and have it routable on the SoftLayer network inherently. The only way to do this is as outlined above and is not supported by SoftLayer.

You can opt-in to utilizing a VRF (Virtual Routing and Forwarding) instance. This allows the customer to either utilize their own remote IP addresses or overlap with a large majority of the SoftLayer infrastructure; however, you must be aware that if you utilize the 10.x.x.x network you still cannot overlap with your hosts within SoftLayer nor within the SoftLayer services network ( and You will not be able to set any of the following for your remote prefixes:,,,,, and any IP ranges assigned to your VLANs on the SoftLayer platform. When choosing the VRF option, the ability to use SoftLayer VPN services for management of your servers will no longer be possible. Routing via BGP is optional.



Will I need to provide my own cross-connect?
Yes, you will need to order your own cross-connect at your data center of choice—to be connected to the SoftLayer switch port described in the LOA (Letter of Authorization) provided.

What kind of cross-connects are supported?
We strictly use Single Mode Fiber (SMF). We do not accept MMF or Copper.

What is the default size of the remote 172.16.*.* subnet assigned?
Unless otherwise requested, Direct Link customers will be assigned a /24 (256 IPs) subnet.

Which IP block has been reserved for SoftLayer servers on the backend?
We've allocated the entire block for use on the SL private network. Specifically, has been ear-marked for services. Here’s the full list of service subnets:

Which IP block has been reserved for point-to-point SoftLayer XCR to customer router? range. We normally allocate either a /30 or /31 subnet for the point-to-point connection (between our XCR and their equipment on the other end of the Direct Link).

Does Direct Link support jumbo frames?
Yes, just like the private SoftLayer network Direct Link can support up to MTU (Maximum Transmission Unit) 9000-size jumbo frames.

Pricing and locations

A list of available locations and pricing can be found at

-Mathijs Dubbe

January 6, 2016

Do You Speak SoftLayer Object Storage?

So you’ve made the decision to utilize object storage at SoftLayer. Great! But are you and your applications fluent in object storage? Do you know how to transfer data to SoftLayer object storage as well as modify and delete objects? How about when to use APIs and when to use storage gateways? If not, you’re not alone.

We’ve found that most IT professionals understand the difference between “traditional” (i.e., file and block) storage and object storage. They have difficulty, however, navigating the methods to interact with SoftLayer’s object storage service that is based on OpenStack Swift. This is understandable because traditional storage systems expose volumes and or shares that can be mounted and consumed via iSCSI, NFS, or SMB protocols.

That’s not the case with object storage, including the object storage service offered by SoftLayer. Data is only accessed via the use of REST APIs and language bindings, third-party applications supporting SFTP, the SoftLayer customer portal, or via storage gateways.

The solutions are outlined below, including guidance on when to utilize each access method. Figure 1 provides a high level overview of the available options and their purpose.

Figure 1: Object storage data access methods

REST APIs and Language Bindings
The first and possibly most flexible method to access SoftLayer object storage is via REST APIs and language bindings. These APIs and bindings give you the ability to interact with SoftLayer object storage via command line or programmatically. As a result, you can create scripts to perform a file upload, download certain objects, and modify metadata related to the object. Additionally, the current support for PHP, Java, Ruby, and Python bindings give application developers the flexibility to support SoftLayer object storage in their applications.

While this method is flexible in terms of capabilities, it does assume the user has knowledge and experience writing scripts, programs, and applications. REST APIs and language bindings aren’t the best methods for IT organizations that want to integrate existing environment backup, archive, and disaster recovery solutions. These solutions typically require traditional storage mount points, which REST APIs and language bindings don’t provide.

Third-Party Applications
The second method is to use third-party applications that support SFTP. This method abstracts the use of REST APIs and gives users the ability to upload, download, and delete objects via a GUI. However, you won’t have the ability to modify metadata when using an SFTP client. Additionally, third-party applications have a 5GB upload limit placed on each object by SoftLayer and OpenStack Swift. If an object greater than 5GB needs to be uploaded, you have to follow the OpenStack method of creating large objects on object storage to assure successful and efficient object upload. Unless you’re comfortable with this methodology, it’s strongly recommended that you use either the REST APIs or storage gateway solutions to access files over 5GB.

SoftLayer Customer Portal
The third method to access SoftLayer object storage is to simply use the SoftLayer customer portal. By using the portal, you have the ability to add containers, add files to containers, delete files from containers, modify metadata, and enable CDN capabilities. As with the SFTP method of accessing the object store, you can upload an unlimited number of files as long as each file does not exceed 20MB in size. Also, there is no bulk upload option within the customer portal; users must select and upload on a per-file basis. While using the portal is simple, it does provide some limitations and is best for users only wanting to upload a few files that occupy 20MB or less.

Storage Gateways
The last method to access and utilize SoftLayer object storage is storage gateways. Unlike other methods, storage gateways are unique. They’re able to expose traditional storage protocols like iSCSI, NFS, CIFS, and SMB and translate the read/write/modify commands into REST API calls against the object storage service. As a result, these devices offer an easier path to consume SoftLayer object storage for businesses looking to integrate their on-premises environment with the cloud. Some storage gateways also have the ability to compress, deduplicate, and encrypt data in-flight and at-rest. Storage gateways work best with organizations looking to integrate existing applications requiring traditional storage access methods (like backup software) with object storage or to securely transfer and store data to cloud object storage.

While there are many methods to access SoftLayer object storage, it’s important that you select an option that best meets your requirements relating to data access, security, and integration. For example, if you’re writing an application that requires object storage, you would most likely choose to interact with object storage via REST APIs or use language bindings. Or, if you simply need to integrate existing applications in your environment to cloud object storage, storage gateway would be the best option. In all cases, make sure you can meet your requirements with the appropriate method.

Table 1 lists sample requirements and shows whether each option meets the requirements. Use it to help you with your decision making process:

Table 1: Decision making tool

Click here for more information about SoftLayer’s object storage service and click here for FAQs on object storage.

Click here for information about SoftLayer’s REST-APIs and language bindings.

-Daniel De Araujo & Naeem Altaf

December 28, 2015

Semantics: "Public," "Private," and "Hybrid" in Cloud Computing, Part II

Welcome back! In the second post in this two-part series, we’ll look at the third definition of “public” and “private,” and we’ll have that broader discussion about “hybrid”—and we’ll figure out where we go after the dust has cleared on the semantics. If you missed the first part of our series, take a moment to get up to speed here before you dive in.

Definition 3—Control: Bare Metal v. Virtual

A third school of thought in the “public v. private” conversation is actually an extension of Definition 2, but with an important distinction. In order for infrastructure to be “private,” no one else (not even the infrastructure provider) can have access to a given hardware node.

In Definition 2, a hardware node provisioned for single-tenancy would be considered private. That single-tenant environment could provide customers with control of the server at the bare metal level—or it could provide control at the operating system level on top of a provider-managed hypervisor. In Definition 3, the latter example would not be considered “private” because the infrastructure provider has some level of control over the server in the form of the virtualization hypervisor.

Under Definition 3, infrastructure provisioned with full control over bare metal hardware is “private,” while any provider-virtualized or shared environment would be considered “public.” With complete, uninterrupted control down to the bare metal, a user can monitor all access and activity on the infrastructure and secure it from any third-party usage.

Defining “public cloud” and “private cloud” using the bare metal versus virtual delineation is easy. If a user orders infrastructure resources from a provider, and those resources are delivered from a shared, virtualized environment, that infrastructure would be considered public cloud. If the user orders a number of bare metal servers and chooses to install and maintain his or her own virtualization layer across those bare metal servers, that environment would be a private cloud.


Mix and Match

Now that we see the different meanings “public” and “private” can have in cloud computing, the idea of a “hybrid” environment is a lot less confusing. In actuality, it really only has one definition: A hybrid environment is a combination of any variation of public and private infrastructure.

Using bare metal servers for your database and virtual servers for your Web tier? That’s a hybrid approach. Using your own data centers for some of your applications and scaling out into another provider’s data centers when needed? That’s hybrid, too. As soon as you start using multiple types of infrastructure, by definition, you’ve created a hybrid environment.

And Throw in the Kitchen Sink

Taking our simple definition of “hybrid” one step further, we find a few other variations of that term’s usage. Because the cloud stack is made up of several levels of services—Infrastructure as a Service, Platform as a Service, Software as a Service, Business Process as a Service—“hybrid” may be defined by incorporating various “aaS” offerings into a single environment.

Perhaps you need bare metal infrastructure to build an off-prem private cloud at the IaaS level—and you also want to incorporate a managed analytics service at the BPaaS level. Or maybe you want to keep all of your production data on-prem and do your sandbox development in a PaaS environment like Bluemix. At the end of the day, what you’re really doing is leveraging a “hybrid” model.

Where do we go from here?

Once we can agree that this underlying semantic problem exists, we should be able to start having better conversations:

  • Them: We’re considering a hybrid approach to hosting our next application.
  • You: Oh yeah? What platforms or tools are we going to use in that approach?
  • Them: We want to try and incorporate public and private cloud infrastructure.
  • You: That’s interesting. I know that there are a few different definitions of public and private when it comes to infrastructure…which do you mean?
  • Them: That’s a profound observation! Since we have our own data centers, we consider the infrastructure there to be our private cloud, and we’re going to use bare metal servers from SoftLayer as our public cloud.
  • You: Brilliant! Especially the fact that we’re using SoftLayer.

Your mileage may vary, but that’s the kind of discussion we can get behind.

And if your conversation partner balks at either of your questions, send them over to this blog post series.


December 18, 2015

Semantics: "Public, "Private," and "Hybrid" in Cloud Computing, Part I

What does the word “gift” mean to you? In English, it most often refers to a present or something given voluntarily. In German, it has a completely different meaning: “poison.” If a box marked “gift” is placed in front of an English-speaker, it’s safe to assume that he or she would interact with it very differently than a German-speaker would.

In the same way, simple words like “public,” “private,” and “hybrid” in cloud computing can mean very different things to different audiences. But unlike our “gift” example above (which would normally have some language or cultural context), it’s much more difficult for cloud computing audiences to decipher meaning when terms like “public cloud,” “private cloud,” and “hybrid cloud” are used.

We, as an industry, need to focus on semantics.

In this two-part series, we’ll look at three different definitions of “public” and “private” to set the stage for a broader discussion about “hybrid.”

“Public” v. “Private”

Definition 1—Location: On-premises v. Off-premises

For some audiences (and the enterprise market), whether an infrastructure is public or private is largely a question of location. Does a business own and maintain the data centers, servers, and networking gear it uses for its IT needs, or does the business use gear that’s owned and maintained by another party?

This definition of “public v. private” makes sense for an audience that happens to own and operate its own data centers. If a business has exclusive physical access to and ownership of its gear, the business considers that gear “private.” If another provider handles the physical access and ownership of the gear, the business considers that gear “public.”

We can extend this definition a step further to understand what this audience would consider to be a “private cloud.” Using this definition of “private,” a private cloud is an environment with an abstracted “cloud” management layer (a la OpenStack or CloudStack or VMWare) that runs in a company’s own data center. In contrast, this audience would consider a “public cloud” to be a similar environment that’s owned and maintained by another provider.

Enterprises are often more likely to use this definition because they’re often the only ones that can afford to build and run their own data centers. They use “public” and “private” to distinguish between their own facilities or outside facilities. This definition does not make sense for businesses that don’t have their own data center facilities.

Definition 2—Population: Single-tenant v. Multi-tenant

Businesses that don’t own their own data center facilities would not use Definition 1 to distinguish “public” and “private” infrastructure. If the infrastructure they use is wholly owned and physically maintained by another provider, these businesses are most interested in whether hardware resources are shared with any other customers: Do any other customers have data on or access to a given server’s hardware? If so, the infrastructure is public. If not, the infrastructure is private.

Using this definition, public and private infrastructure could be served from the same third-party-owned data center, and the infrastructure could even be in the same server rack. “Public” infrastructure just happens to provide multiple users with resources and access to a single hardware node. Note: Even though the hardware node is shared, each user can only access his or her own data and allotted resources.

On the flip side, if a user has exclusive access to a hardware node, a business using Definition 2 would consider the node to be private.

Using this definition of “public” and “private,” multiple users share resources at the server level in a “public cloud” environment—and only one user has access to resources at the server level in a “private cloud” environment. Depending on the environment configuration, a “private cloud” user may or may not have full control over the individual servers he or she is using.

This definition echoes back to Definition 1, but it is more granular. Businesses using Definition 2 believe that infrastructure is public or private based on single-tenancy or multi-tenancy at the hardware level, whereas businesses using Definition 1 consider infrastructure to be public or private based on whether the data center itself is single-tenant or multi-tenant.

Have we blown your minds yet? Stay tuned for Part II, where we’ll tackle bare metal servers, virtual servers, and control. We’ll also show you how clear hybrid environments really are, and we’ll figure out where the heck we go from here now that we’ve figured it all out.


December 17, 2015

Xen Hypervisor Maintenance - December 2015

Security of your assets on our cloud platform is very important to the SoftLayer team. Last week, our Security Operations Center – which provides real time monitoring of suspicious activity (including being part of multiple security pre-disclosure lists) – alerted our engineering team to a potential vulnerability (advisory CVE-2015-8555 / XSA-165) in the Xen Hypervisor that if left un-remediated could allow a malicious user to access data from another VSI guest sharing the same hardware node and hypervisor instance.

Upon learning of this vulnerability, SoftLayer issued a notification including a per-data center schedule for applying critical maintenance to remediate the vulnerability. Our schedule was performed over multiple days and on a POD-by-POD basis with individual VM instances being offline for minutes while they rebooted. The updates were completed successfully in all data centers in advance of the public announcement of this vulnerability.

While deployment techniques such as clustering and failover across data centers and PODs allows continuous operations during a planned or unplanned event, you should be aware that SoftLayer is committed to working aggressively to further reduce the impact of events on your deployment and operations teams.

We value your business and will continue to take actions that insure your environment is secure and efficient to operate. If you have any questions or concerns, don't hesitate to reach out to SoftLayer support or your direct SoftLayer contacts.


November 4, 2015

Shared, scalable, and resilient storage without SAN

Storage area networks (SAN) are used most often in the enterprise world. In many enterprises, you will see racks filled with these large storage arrays. They are mainly used to provide a centralized storage platform with limited scalability. They require special training to operate, are expensive to purchase, support, or expand, and if those devices fail, there is big trouble.

Some people might say SAN devices are a necessary evil. But are they really necessary? Aren’t there alternatives?

Most startups nowadays are running their services on commodity hardware, with smart software to distribute their content across server farms globally. Current, well established, and successful companies that run websites or apps like Whatsapp, Facebook, or LinkedIn continue to operate pretty much the same way they started. They need the ability to scale and perform at unpredictable rates all around the world, so they use commodity hardware combined with smart software. These types of companies need the features that SAN storage offers them—but with more scalable, global resiliency, and without being centralized or having to buy expensive hardware. But how do they provide server access to the same data, and how do they avoid data loss?

The answer is actually quite simple, although its technology is quite sophisticated: distributed storage.

In a world where virtualization has become a standard for most companies, where even applications and networking are being virtualized, virtualization giant VMware answers this question with Virtual SAN. It effectively eliminates the need for SAN hardware in a VMware environment (and it will also be available for purchase from SoftLayer before the end of the year). Other similar distributed products are GlusterFS (also offered in our QuantaStor solution), Ceph, Microsoft Windows DFS, Hadoop HDFS, document-oriented databases like MongoDB, and many more.

Many solutions, however, vary in maturity. Object storage is a great example of a new type of storage that has come to market, which doesn’t require SAN devices. With SoftLayer, you can and may run them all.

When you have bare metal servers set up as hypervisors or application servers, it’s likely you have a lot of drive bays within those servers, mostly unused. Stuffing them with hard drives and allowing the software to distribute your data across multiple servers in multiple locations with two or three replicas will result in a big, safe, fast, and distributed storage platform. For such a platform, scaling it would be just adding more bare metal servers with even more hard drives and letting the software handle the rest.

Nowadays we are seeing more and more hardware solutions like SAN—or even networking—being replaced with smarter software on simpler and more affordable hardware. At SoftLayer, we offer month-to-month and hourly bare metal servers with up to 36 drive bays, potentially providing a lot of room for storage. With 10Gbps global connectivity options, we offer fast, low latency networking for syncing between servers and delivering data to the customer.


Subscribe to technology