Technology Posts

July 5, 2016

Figuring Out the “Why” of IBM

When IBM acquired SoftLayer, I felt proud. I thought, “Now we can make a difference.” Why did I feel that way, and why didn’t I think we could make a difference where we were? What brought out these feelings about IBM?

As I expand my knowledge of programming, I often come across books that don’t pertain strictly to software development but still pique my interest. The most recent of those is Start with Why: How Great Leaders Inspire Everyone to Take Action by Simon Sinek, suggested in a recent talk by Mary Poppendieck about leading development. Start with Why is a book about product development, leadership, and life in general. It explains why we feel the way we do about certain companies and how we should move forward to generate that feeling about ourselves and the companies we believe in.

Who cares why?

In Start with Why, Sinek talks about several big companies, including Apple, Harley-Davidson, and Walmart. He writes that when developing a product, or even working in a company, it is very important to understand that company’s “why.” What makes the company tick? He says Apple has a clear message about this: “to start a revolution.” He claims Apple is clear about why it does what it does, and that it has formed a culture of people who care more about that message than about any one product it sells. The products, in turn, embody that message, as do Apple employees. This is why, when Apple decided to move into the phone, tablet, and music industries rather than focus only on computers and hardware, its customers moved with it. Although the differences between an Apple iPad and a Dell tablet might be small, Apple consumers like feeling that they are part of the Apple society, so they choose what they know and love, based on their gut instinct.

Think now about Harley-Davidson. Many of its customers have tattoos of the Harley-Davidson logo because they identify with the lifestyle that Harley-Davidson projects. The tattoo is a statement more about the person than the company. It says, “I am a Harley-Davidson type of person.” Mitsubishi or Kawasaki could have similar bikes, even of better quality and at cheaper prices, but that customer is choosing Harley-Davidson. They have made a lifetime commitment to a brand because they identify with the iconography and want to be a part of the society that is Harley-Davidson.

What is IBM’s “why”?

I applied the idea of “why” to my work and my company, bringing up the question, “What is IBM’s ‘why’?” In pursuit of this question, I searched “Why IBM?” on the IBM intranet. Luckily, there was a document meant for sales reps to help define IBM for new customers with the following on the first slide:

“IBM is a global information technology services company operating in over 170 countries. We bring innovative solutions to a diverse client base to help solve some of their toughest business challenges. In addition to being the world’s largest information technology and consulting services company, IBM is a global business and technology leader, innovating in research and development to shape the future of society at large.”

I dissected this blurb, pulling out the parts which describe IBM. I ended up with this:

  • IBM is large (the world’s largest)
  • IBM is global (diverse, international, in more than 170 countries)
  • IBM is business-oriented (solves business challenges)
  • IBM is a technology leader (innovative, focus on research and development)
  • IBM is shaping the future of society at large

Then I put it together into a single sentence:

“IBM is a large, global, business-oriented technology leader, shaping the future of society at large.”

That is when I realized that I was too focused on IBM’s “what,” so I removed everything that focused too heavily on the subject of the sentence (IBM) and focused my attention instead on the predicate. This left me with a single, easy sentence answering the questions “Why is IBM?”, “What is its function?”, and “What are we trying to do?”

“IBM is shaping the future of society at large.”

This is why IBMers get up in the morning. This is why we work hard. This is what we are hoping to accomplish in our own lives.

Simon Sinek states, “The 'why' comes from looking back.” Every person or company’s achievement should prove the “why”—so how do we prove IBM’s “why”? Let’s take a look at some of our victories in the past and present and compare.

In 1937, IBM’s tabulating equipment helped maintain employment records for 26 million people in support of the Social Security Act. In 1973-1974, IBM developed the Universal Product Code and released systems to support bar code scanning and automatic inventory maintenance. In a recent employee webcast, IBM’s senior vice president of Global Technology Services, Martin Jetter, communicated the idea, “We are the backbone of the world’s economy.” His supporting comments covered our footprint in the airline industry: “We manage the systems that support 25 percent of the total paid-passenger miles flown globally.” He also said, “Our services support 60 percent of daily card transactions in banking, 53 percent of mobile connections worldwide in telecom, and 61 percent of passenger vehicles produced in the auto industry.”

Lately, IBM brought attention to its revolutionary AI, better known as Watson, and is ushering in the idea of cognitive business analytics. In my opinion, these things prove that we are invested in shaping the future of a global society.

What does this mean about IBM? What does this mean about me?

I can’t speak for IBM as a whole, but I can talk about myself. I want to be a part of something bigger than myself; I want to contribute in a meaningful way, and understand what that contribution meant. I believe in a global society; we are all in this world together and I feel like there are more important issues that we can deal with other than our differences. I want to lead, or be a part of a team that leads; I strive to be successful. I am not OK with the status quo; I believe there is a better way. I have hope for the future. I don’t want to start a revolution. I want to be a part of something more pervasive, an underlying foundation that helps society thrive—not just changing society for the sake of change. I want to help lay a foundation that allows it to thrive and grow into something better. I believe that IBM identifies these goals, and projects this same message—a message that resonates with me at a very basic level. It sums up why I am proud to be an IBMer.

What about you?

“I am an IBMer” is not a sentiment that only employees need. In fact, it should go well beyond being employed at IBM. Our customers should feel the sentiment as well. Even people completely unaffiliated with IBM should be able to say, “I am an IBMer,” meaning that they believe in the same dream: the dream of a global society working together to meet global goals, a dream about the future of society at large.

What does IBM mean to you? Are you an IBMer too?

-Kevin Trachier

June 20, 2016

VMware on SoftLayer Just Got Even Easier

SoftLayer customers have been bringing VMware workloads and VMware add-ons to the infrastructure as a service (IaaS) platform for years. With the roll-out of per-processor monthly licensing and the automation of vSphere and vCenter deployment, the provisioning process has never been easier. 

Now SoftLayer has taken the next step by allowing customers to order and manage VMware add-ons with the same per-processor monthly pricing model. To celebrate, the sales engineering team has updated KnowledgeLayer and added a new section focused on VMware 6, including step-by-step guides for getting started on the platform. VMware vSphere 6 Getting Started, for example, details how to get vSphere servers up and running. It gives detailed instructions on how to create an environment from scratch, which VLANs and IP addresses customers should use, and the recommended network structure.

Let’s review what else is new.

SoftLayer has added the vCenter Server Appliance to the catalog to allow customers to fully scale their environments up on their own. We’ve also added instructions on how you can deploy vCenter as an appliance. For smaller environments, customers can still deploy vCenter as a Windows add-on and get up and running in under an hour.

To make the vCenter appliance and other add-ons possible, SoftLayer has enhanced the customer portal to allow customers to order and manage all VMware licensing add-ons in a simple panel. Customers use this system to order and manage licenses for vCenter Server Appliance, Virtual SAN, NSX-V, Site Recovery Manager, and vRealize Operations/Automation/Log Insight. Combined with speedy SoftLayer bare metal server provisioning times, customers can stand up or extend their VMware footprint across the globe in no time.

VMware NSX on SoftLayer is nothing new, but the capabilities of the latest version and the month-to-month pricing make it an option worth considering. Between the edge gateways and distributed networking enhancements, customers can build security and standardization into the platform that follows their workloads from server to server and site to site. Customers can span a private layer 2 domain across completely different locations by using a VXLAN overlay across a layer 3 routed network. This is particularly useful for disaster recovery and for bursting on-premises workloads out to SoftLayer. Customers also leverage NSX to isolate workloads in a multi-tenant environment without the need for additional VLANs from SoftLayer. VMware 6 NSX Getting Started is your first stop to learn about micro-segmentation and best practices with NSX at SoftLayer.
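
To make the overlay idea concrete, here is a minimal Python sketch of VXLAN framing as defined in RFC 7348: the original layer 2 frame is prefixed with an 8-byte header carrying a 24-bit VXLAN network identifier (VNI), and the result travels as a UDP payload (destination port 4789) across the routed layer 3 network. NSX does this inside the hypervisor, so nothing here is NSX configuration or an NSX API; the function names, the VNI 5001, and the sample frame are illustrative assumptions.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned UDP destination port for VXLAN
VNI_VALID_FLAG = 0x08   # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def encapsulate(vni: int, inner_ethernet_frame: bytes) -> bytes:
    """Return the UDP payload: VXLAN header followed by the layer 2 frame."""
    return vxlan_header(vni) + inner_ethernet_frame


def decapsulate(udp_payload: bytes):
    """Split a UDP payload back into (vni, inner frame)."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VNI_VALID_FLAG:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, udp_payload[8:]


# Hypothetical inner frame: broadcast destination MAC, a made-up source MAC,
# the IPv4 EtherType, and a placeholder payload.
frame = bytes.fromhex("ffffffffffff" "005056000001" "0800") + b"payload"
packet = encapsulate(5001, frame)
vni, inner = decapsulate(packet)
```

Because the VNI, not a VLAN tag, identifies the tenant segment, two sites only need IP reachability between them for the same layer 2 segment to exist at both ends.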

VMware Virtual SAN is our latest addition to the platform and provides customers with a great option for hosting mission-critical workloads on single-tenant infrastructure with software-defined storage (SDS). Customers can leverage common x86 compute available on SoftLayer to build reliable, high performance, and scalable dedicated storage pools. It was designed for performance (caching and local disk access), affordability (mixing solid state and capacity SATA drives), and supportability without the need for a storage architect. It is tightly integrated with vSphere administration and brings features like snapshots, linked clones, vSphere Replication, and vSphere APIs for data protection. 

If you have questions about VMware on the SoftLayer cloud, get in touch with our sales representatives on live chat or phone. They’ll be happy to help and can also coordinate a consultation with the SoftLayer sales engineering team if you need one. You may find some of your initial questions have already been answered in our VMware FAQ.

I’m also delighted to share some video tutorials our sales engineering team created, entitled, “Getting Started With VMware 6.0 (Parts 1, 2, 3, 4).” This series will give you examples of deploying VMware and get some of your initial questions answered.

With that said, why not start deploying your VMware solution—or expanding your current VMware workloads with feature rich add-ons? Now is the best time for you to take advantage of our promotion to spin up your VMware solution at SoftLayer. Ask a SoftLayer sales representative on live chat to get more details.

-Rick Ji

May 27, 2016

Data Security and Encryption in the Cloud

In Wikipedia’s words, encryption is the process of encoding messages or information in such a way that only authorized parties can read it. On a daily basis, I meet customers from various verticals. Whether it is health care, finance, government, technology, or any other public or privately held entity, they all have specific data security requirements. More importantly, the thought of moving to a public cloud brings its own set of challenges around data security. In fact, data security is the biggest hurdle when making the move from a traditional on-premises data center to a public cloud.

One of the ways to protect your data is by encryption. There are a few ways to encrypt data, and they all have their pros and cons. By the end of this post, you will hopefully have a better understanding of the options available to you and how to choose one that meets your data security requirements.

Data “At Rest” Encryption

At rest encryption refers to the encryption of data that is not moving. This data is usually stored on hardware such as local disk, SAN, NAS, or other portable storage devices. Regardless of how the data gets there, as long as it remains on that device and is not transferred or transmitted over a network, it is considered at rest data.

There are different methodologies to encrypt at rest data. Let’s look at the few most common ones:

Disk Encryption: This is a method where all data on a particular physical disk is encrypted. This can be done by using an SED (self-encrypting disk) or a third-party solution from vendors like Vormetric, SafeNet, PrimeFactors, and more. In a public cloud environment, your data will most likely be hosted on a multitenant SAN infrastructure, so key management and the public cloud vendor’s ability to offer dedicated, local, or SAN spindles become critical. Moreover, keep in mind that using this encryption methodology does not protect data when it leaves the disk. This method may also be more expensive and may add management overhead. On the other hand, disk encryption solutions are mostly operating system agnostic, allowing for more flexibility.

File Level Encryption: File level encryption is usually implemented by running a third-party application within the operating system to encrypt files and folders. In many cases, these solutions create a virtual or a logical disk where all files and folders residing in it are encrypted. Tools like VeraCrypt (TrueCrypt’s successor), BitLocker, and 7-Zip are a few examples of file encryption software. These are very easy to implement and support all major operating systems.  

Data “In Flight” Encryption

Encrypting data in flight involves encrypting the data stream at one point and decrypting it at another point. For example, if you replicate data across two data centers and want to ensure confidentiality of this exchange, you would use data in flight encryption to encrypt the data stream as it leaves the primary data center, then decrypt it at the other end of the cable at the secondary data center. Since the data exchange is very brief, the keys used to encrypt the frames or packets are no longer needed after the data is decrypted at the other end, so they are discarded and there is no need to manage them. The most common protocols used for in flight data encryption are IPsec VPN and TLS/SSL.
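
As a minimal sketch of the TLS option, the Python standard library can build a client-side context that verifies the remote certificate and refuses outdated protocol versions. The function name and the TLS 1.2 floor are assumptions chosen for illustration, not a SoftLayer requirement.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate."""
    # create_default_context() loads the system CA certificates and turns on
    # certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

To use it, wrap an ordinary TCP socket before sending any data, for example `context.wrap_socket(sock, server_hostname="replica.example.com")`, where the hostname is a placeholder. The session keys are negotiated during the handshake and discarded when the connection closes, which is why there are no long-lived data keys to manage.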

And there you have it. Hopefully by now you have a good understanding of the most common encryption options available to you. Just keep in mind that more often than not, at rest and in flight encryption are implemented in conjunction and complement each other. When choosing the right methodology, it is critical to understand the use case, application, and compliance requirements. You would also want to make sure that the software or the technology you choose adheres to the highest level of encryption standards, such as 3DES, RSA, AES, Blowfish, etc.

-Zeb Ahmed

May 19, 2016

Bringing the power of GPUs to cloud

The GPU was invented by NVIDIA back in 1999 as a way to quickly render computer graphics by offloading the computational burden from the CPU. A great deal has happened since then—GPUs are now enablers for leading edge deep learning, scientific research, design, and “fast data” querying startups that have ambitions of changing the world.

That’s because GPUs are very efficient at manipulating computer graphics, image processing, and other computationally intensive high performance computing (HPC) applications. Their highly parallel structure makes them more effective than general purpose CPUs for algorithms where the processing of large blocks of data is done in parallel. GPUs, capable of handling multiple calculations at the same time, also have a major performance advantage. This is the reason SoftLayer (now part of IBM Cloud) has brought these capabilities to a broader audience.

We support the NVIDIA Tesla Accelerated Computing Platform, which makes HPC capabilities more accessible to, and affordable for, everyone. Companies like Artomatix and MapD are using our NVIDIA GPU offerings to achieve unprecedented speed and performance, traditionally only achievable by building or renting an HPC lab.

By provisioning SoftLayer bare metal servers with cutting-edge NVIDIA GPU accelerators, any business can harness the processing power needed for HPC. This enables businesses to manage the most complex, compute-intensive workloads—from deep learning and big data analytics to video effects—using affordable, on-demand computing infrastructure.

Take a look at some of the groundbreaking results companies like MapD are experiencing using GPU-enabled technology running on IBM Cloud. They’re making big data exploration visually interactive and insightful by using NVIDIA Tesla K80 GPU accelerators running on SoftLayer bare metal servers.

SoftLayer has also added the NVIDIA Tesla M60 GPU to our arsenal. This GPU technology enables clients to deploy fewer, more powerful servers on our cloud while being able to churn through more jobs. Specifically, simulation run times can be cut from weeks or days to hours compared with a CPU-only server. Think of the performance of tools and applications like Amber for molecular dynamics, TeraChem for quantum chemistry, and Echelon for oil and gas.

The Tesla M60 also speeds up virtualized desktop applications. There is widespread support for running virtualized applications, from AutoCAD to Siemens NX, on a GPU server. This allows clients to centralize their infrastructure while providing access to the application, regardless of location. There are endless use cases with GPUs.

With this arsenal, we are one step closer to offering real supercomputing performance on a pay-as-you-go basis, which makes this new approach to tackling big data problems accessible to customers of all sizes. We are at an interesting inflection point in our industry, where GPU technology is opening the door for the next wave of breakthroughs across multiple industries.

-Jerry Gutierrez

April 26, 2016

Cloud. Ready-to-Wear.

It’s been five years since I started my journey with SoftLayer. And what a journey it has been—from being one of the first few folks in our Amsterdam office, to becoming part of the mega-family of IBMers; from one data center in Europe to six on this side of the pond and 40+ around the globe; from “Who is SoftLayer?” (or my favorite, “SoftPlayer”), to becoming a cloud environment fundamental for some of the biggest and boldest organizations worldwide.

But the most thrilling difference between 2016 and 2011 that I’ve been observing lately is the shift in how the market perceives cloud and in what matters to adopters, along with the technology itself becoming mainstream.

Organizations of all sizes (small, medium, and large), while still raising valid questions around the level of control and security, are more often talking about the challenges of managing combined on-prem and shared environments, the readiness of their legacy applications to migrate to cloud, and their staff’s competency to orchestrate the new architecture.

At Cloud Expo 2016 (the fifth one for the SoftLayer EMEA team), next to two tremendous keynotes given by Sebastian Krause, General Manager IBM Cloud Europe, and by Rashik Parmar, Lead IBM Cloud Advisor/Europe IBM Distinguished Engineer, we held a roundtable to discuss the connection between hybrid cloud and agile business. Moderated by Rashik Parmar, the discussion confirmed the market’s evolution: from recognizing cloud as technology still proving its value, to technology critical in gaining a competitive advantage in today’s dynamic economy.

Rashik’s guests had deep technology backgrounds and came from organizations of all sizes and flavors: banking, supply chain management, ISV, publishing, manufacturing, MSP, insurance, and digital entertainment, to name a few. Most of them already have live cloud deployments, or they have one ready to go into production this year.

When it came to the core factors underlying a move into the cloud, they unanimously listed gaining business agility and faster time-to-market. For a few minutes, there was a lively conversation among the panelists about cost and savings. They cited examples of poorly planned cloud implementations that were 20-30 percent more costly than keeping the legacy IT setup. Based on an example of a large Australian bank, Rashik urged companies to start the process of moving into cloud with a vigilant map of their own applications landscape before thinking about remodeling the architecture to accommodate cloud.

The next questions the panelists tackled pertained to the drivers behind building hybrid cloud environments, which included:

  • Starting with some workloads and building a business case based on their success; from there, expanding the solution organization-wide
  • Increasing the speed of market entry for new solutions and products
  • Retiring certain legacy applications on-prem, while deploying new ones on cloud
  • Meeting regulatory requirements that demand that some workloads or data remain on-prem

When asked to define “hybrid cloud,” Rashik addressed the highly ambiguous term by simply stating that it refers to any combination of software-defined environment and automation with traditional IT.

The delegates discussed the types of cloud—local, dedicated, and shared—and found it difficult to define who controls hybrid cloud, and who is accountable for what component when something goes wrong. There was a general agreement that many organizations still put physical security over the digital one, which is not entirely applicable in the world of cloud.

Rashik explored, from his experience, where most cases of migrating into cloud usually originate. He referred to usage patterns and how organizations become agile with hybrid IT. The delegates agreed that gaining an option of immediate burstability and removing the headache of optimal resource management, from hardware to internal talent, are especially important.

Rashik then addressed the inhibitors of moving into cloud—and here’s the part that inspired me to write this post. While mentions of security (data security and job security) and the control over the environment arose, the focus repeatedly shifted toward the challenges of applications being incompatible with cloud architecture, complicated applications landscape, and scarcity of IT professionals skilled in managing complex (hybrid) cloud environments.

This is a visible trend that demonstrates the market has left the cloud department store’s changing room and is ready not only to make the purchase, but is “ready to wear” the new technology, with a clear plan for where and when and an aim to achieve specific outcomes.

The conversation ended with energizing insights about API-driven innovation that enables developers to assemble a wide spectrum of functions, as opposed to being “just a coder.” Other topics included cognitive computing that bridges digital business with digital intelligence, and platforms such as blockchain that are gaining momentum.

To think that not so long ago, I had to explain to the average Cloud Expo delegate what “IaaS” stands for. We’ve come a long way.



April 5, 2016

When in doubt with firewalls, “How Do I?” it out

Spring is a great time to take stock and wipe off the cobwebs at home. Within the sales engineering department at SoftLayer, we thought it was a good idea to take a deeper look at our hardware firewall products and revamp our support documentation. Whether you’re using our shared hardware firewalls, a dedicated hardware firewall, or the FortiGate Security Appliance, we have lots of new information to share with you on KnowledgeLayer.

One aspect we’re highlighting is a series of articles entitled, “How Do I?” within the Firewalls KnowledgeLayer node.  A "How Do I?" provides you with a detailed explanation about how to use a SoftLayer service or tool with the customer portal or API.  

For example, perhaps your cloud admin has just won the lottery, and has left the company. And now you need to reorient yourself with your company’s security posture in the cloud. Your first step might be to read “How Do I View My Firewalls?” which provides step-by-step instructions about how to view and manage your hardware firewalls at SoftLayer within the customer portal. If you discover you've been relying on iptables instead of an actual firewall to secure your applications, don't panic—ordering and securing your infrastructure with hardware firewalls can be done in minutes. Be sure to disable any accounts and API keys you no longer need within the Account tab. If you're new to SoftLayer and our portal, take a look at our on-demand webinars and training video series.

Now that you’ve identified the types of firewalls you have protecting your infrastructure, feel free to drill into our updated articles that can help you out. If you’re running a dedicated hardware firewall and want to know how to manage it within the portal, this “How Do I?” article is for you. We’ve also tailored “How Do I?” entries for shared hardware firewalls and the FortiGate Security Appliance to help you beat the heat in no time. The SoftLayer customer portal also provides you with the ability to download firewall access logs in a CSV file. See for yourself how the Internet can truly be a hostile environment for a web-facing server. Every access attempt blocked by your firewall has saved your server from the work of processing software firewall rules, and keeps your application safer.
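
Once you have a log export in hand, a few lines of Python can summarize who is knocking on the door. The sketch below is illustrative only: the column names (`source_ip`, `destination_port`, `action`) and the sample rows are assumptions, so check the header row of your own CSV file and adjust accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; a real export may use different column names.
SAMPLE_LOG = """source_ip,destination_port,action
203.0.113.7,22,DENY
203.0.113.7,23,DENY
198.51.100.4,443,ALLOW
192.0.2.15,3389,DENY
"""


def count_denied_by_source(csv_file):
    """Count denied access attempts per source IP address."""
    denied = Counter()
    for row in csv.DictReader(csv_file):
        if row["action"].strip().upper() == "DENY":
            denied[row["source_ip"]] += 1
    return denied


denied = count_denied_by_source(io.StringIO(SAMPLE_LOG))
for ip, hits in denied.most_common():
    print(ip, hits)
```

To run it against a downloaded file, pass `open("firewall_log.csv", newline="")` (a placeholder file name) instead of the in-memory sample.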

We know that not all issues can be covered by how-to articles. To address that, we’ve also added a number of new entries to the Firewalls FAQ section. 

Keep the feedback coming! We’re here to help answer your sales-related technical questions. And be sure to check out our latest Sales Engineering Webinar: Creating a Digital Defense Plan with Firewalls. 

March 25, 2016

Be an Expert: Handle Drive Failures with Ease

Bare metal servers at SoftLayer employ best-in-class, industry-proven SAS, SATA, or SSD disks, which are extensively tested and qualified in-house by the data center technicians. They are reliable, enterprise-grade hardware. However, single-device failures cannot be ruled out. HDD or device failures can happen for various reasons, such as power surges, mechanical or internal failure, drive firmware bugs, overheating, and aging. Though every effort is made to mitigate these issues by selecting best-in-class hard drives and pre-testing devices before making them available to customers, one could still run into drive failures occasionally.

Is having RAID protection just good enough?

Drive failures on dedicated bare metal servers may cause data loss, downtime, and service interruptions if the servers are not deployed with an adequate risk mitigation plan. As a first line of defense, users choose to have RAID at various levels. This may seem sufficient but may have the following problems:

  • The volume associated with the failed drive becomes degraded. This can bring the virtual disk (VD) performance below an acceptable level. A degraded volume is also likely to disable write-back caching, which further degrades write performance.
  • There is always a chance of another disk failing in the meantime. Until a new disk is inserted and a rebuild is completed, a second disk failure could be catastrophic.

Today, a manual response to a disk failure may take quite some time: first the user has to be notified or become aware that a disk has failed, and then a technician has to replace the disk in the server. During this time, while the system is in a degraded state, the risk of a second disk failure looms large over the user.

To mitigate this risk, SoftLayer recommends that users always have a Global Hot Spare or Dedicated Hot Spare Disks wherever available on the bare metal servers. Users can choose one or more Hot Spare disks per server. This typically requires the user to earmark a drive slot for hot spares. When ordering bare metal servers, we recommend planning for empty drive slots to hold global hot spare drives.
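
To see why the length of the degraded window matters, here is a rough back-of-the-envelope sketch in Python. It assumes independent disks with a constant failure rate derived from an annualized failure rate (AFR). The 2 percent AFR, the seven surviving disks, and the 48-hour and 8-hour windows are made-up figures for illustration, not SoftLayer measurements.

```python
HOURS_PER_YEAR = 8760


def second_failure_probability(surviving_disks, afr, window_hours):
    """Probability that at least one surviving disk fails within the window.

    Assumes independent disks with a constant failure rate derived from the
    annualized failure rate (AFR). Disks in one array often share age and
    workload, so real-world risk may be higher than this simple model says.
    """
    survive_one = (1.0 - afr) ** (window_hours / HOURS_PER_YEAR)
    return 1.0 - survive_one ** surviving_disks


# Illustrative numbers only: 7 surviving disks with a 2% AFR.
manual = second_failure_probability(7, 0.02, window_hours=48)    # wait for a technician
hot_spare = second_failure_probability(7, 0.02, window_hours=8)  # rebuild starts at once
print(f"48-hour exposure: {manual:.4%}")
print(f" 8-hour exposure: {hot_spare:.4%}")
```

Whatever the real figures are for a given array, the exposure grows with the time spent degraded, which is the part a hot spare shortens.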

Adding Hot Spare on an LSI MegaRAID Adaptor

Users can use the WebBIOS utility or MegaRAID Storage Manager to add a Hot Spare drive.

It is easiest to configure using the MegaRAID Storage Manager software, available on the AVAGO website.

Once logged in, you’ll want to choose the Logical tab to view the unused disks under “Unconfigured Drives.” Right-clicking a drive and selecting “Assign Global Hot Spare” will make sure this drive is on standby for a failure in any of the RAID volumes configured in the system. You can also choose to have a Dedicated Hot Spare for specific volumes that are critical. Figure 1 shows how to add a Global Hot Spare using MSM. MegaRAID Storage Manager can also be used to access the server from a third-party machine or service laptop by providing the server IP address.

Figure 1 shows how to add a Global Hot Spare using MSM.

You can also use the WebBIOS interface to add Hot Spare drives. This is done by breaking into the card BIOS at the early stage of booting by using Ctrl+R to access the BIOS Configuration Utility. As a prerequisite for accessing the KVM screen to see the boot time messages, you’ll need to VPN into the SoftLayer network and use KVM under the “Actions” dropdown in the customer portal.

Once inside the WebBIOS screen, access the “PD Mgmt” tab and choose a free drive. Pressing F2 on the highlighted drive will display a menu for making the drive a Global Hot Spare. Figure 2 below provides more details for making a Hot Spare using the BIOS interface. We recommend using the virtual keyboard while navigating and issuing commands in the KVM viewer.

Figure 2 provides more details for making a Hot Spare using the BIOS interface.

Adding Hot Spare Through Adaptec Adaptor

Adaptec also provides the Adaptec Storage Manager and a BIOS option to add Global Hot Spares.

The Adaptec Storage Manager comes preinstalled on SoftLayer servers when a supported OS is chosen. It can also be downloaded for the specific Adaptec card from this link. After launching the Adaptec Storage Manager, users can select a specific available free drive and create a global hot spare drive as shown in Figure 3.

After launching the Adaptec Storage Manager, users can select a specific available free drive and create a global hot spare drive as shown in Figure 3.

Adaptec also provides a BIOS-based configuration utility that can be used to add a Hot Spare. To do this, you’ll need to break into the BIOS utility by pressing Ctrl+A early in the boot process. After that, select Global Hot Spares from the main menu to enter the drive selection page. Select a drive by pressing Insert, then press Enter to submit changes. Figure 4 below depicts the selection of a Global Hot Spare using the BIOS configuration utility.

Figure 4 depicts the selection of a Global Hot Spare using the BIOS configuration utility.

Using Hot Spares reduces the risk from further drive failures and shortens the time the system remains in a degraded state. We recommend that SoftLayer customers leverage these benefits on their bare metal servers to be better armed against drive failures.


March 24, 2016

future.ready(): 7 Things to Check Off Your Big Data Development List

Frank Ketelaars, Big Data Technical Leader for Europe at IBM, offers a checklist that every developer should have pinned to their board when starting a big data project.

Editor’s Note: Does your brain switch off when you hear industryspeak words like “innovation,” “transformation,” “leading edge,” “disruptive,” and “paradigm shift”? Go on, go ahead and admit it. Ours do, too. That’s why we’re launching the future.ready() series, consisting of blogs, podcasts, webinars, and Twitter chats, with content created by developers, for developers. Nothing fluffy, nothing buzzy. With the future.ready() series, we aim to equip you with tools and knowledge that you can use, not just talk and tweet about.

For the first edition, I’ve invited Frank Ketelaars, an expert in the high-volume data space, to walk us through seven things to check off when starting a big data development project.

-Michalina Kiera, SoftLayer EMEA senior marketing manager


This year, big data moves from a water cooler discussion to the to-do list. Gartner estimates that more than 75 percent of companies are investing or planning to invest in big data in the next two years.

I have worked on multiple high volume projects in industries that include banking, telecommunications, manufacturing, life sciences, and government, and in roles including architect, big data developer, and streaming analytics specialist. Based on my experience, here’s a checklist I put together that should give developers a good start. Did I miss anything? Join me on the Twitter chat or webinar to share your experience, ask questions, and discuss further. (See details below.)     

1. Team up with a person who has a budget and a problem you can solve.

For a successful big data project, you need to solve a business problem that’s keeping somebody awake at night. If there isn’t a business problem and a business owner (ideally one with a budget), your project won’t get implemented. Experimentation is important when learning any new technology. But before you invest a lot of time in your big data platform, find your sponsor. To do so, you’ll need to talk to everyone, including IT, business users, and management. Remember that the technical advantages of analytics at scale might not immediately translate into business value.

2. Get your systems ready to collect the data.

With additional data sources, such as devices, vehicles, and sensors connected to networks and generating data, the variety of information and transportation mechanisms has grown dramatically, posing new challenges for the collection and interpretation of data.

Big data often comes from sources outside the business. External data comes at you in a variety of formats (including XML, JSON, and binary) and through a variety of APIs. In 2016, you might think that everyone is on REST and JSON, but think again: SOAP still exists! The variety of the data is the primary technical driver behind big data investments, according to a survey of 402 business and IT professionals by management consultancy NewVantage Partners. From one day to the next, an API might change or a source might become unavailable.

Maybe one day we’ll see more standardization, but it won’t happen any time soon. For now, developers must plan to spend time checking for changes in APIs and data formats, and be ready to respond quickly to avoid service interruptions. And to expect the unexpected.
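As a rough illustration of the point above, here is a minimal Python sketch of a collector that accepts the same reading as either JSON or XML and fails loudly when a source changes its format. The field names (device_id, timestamp, value) and the sample payloads are hypothetical; a real feed will define its own schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical field names; a real feed defines its own schema.
REQUIRED_FIELDS = ("device_id", "timestamp", "value")


def parse_reading(payload: str) -> dict:
    """Parse a JSON or XML sensor reading into one common dict shape."""
    text = payload.strip()
    if text.startswith("<"):
        # XML: take the text of each direct child element of the root.
        root = ET.fromstring(text)
        record = {child.tag: (child.text or "").strip() for child in root}
    else:
        record = json.loads(text)
        if not isinstance(record, dict):
            raise ValueError("expected a JSON object")

    # Fail loudly if the source changed its format and dropped a field.
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"source format changed, missing fields: {missing}")

    return {
        "device_id": str(record["device_id"]),
        "timestamp": str(record["timestamp"]),
        "value": float(record["value"]),
    }


if __name__ == "__main__":
    as_json = '{"device_id": "pump-7", "timestamp": "2016-03-24T10:00:00Z", "value": 3.5}'
    as_xml = (
        "<reading><device_id>pump-7</device_id>"
        "<timestamp>2016-03-24T10:00:00Z</timestamp><value>3.5</value></reading>"
    )
    # Both formats should normalize to the same record.
    print(parse_reading(as_json) == parse_reading(as_xml))
```

Raising an error on a missing field is a deliberate choice here: a changed API is easier to deal with when it stops the pipeline than when it silently produces incomplete records.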

3. Make sure you have the right to use that data.

Governance is a business challenge, but it’s going to touch developers more than ever before—from the very start of the project. Much of the data they will be handling is unstructured, such as text records from a call center. That makes it hard to work out what’s confidential, what needs to be masked, and what can be shared freely with external developers. Data will need to be structured before it can be analyzed, but part of that process includes working out where the sensitive data is, and putting measures in place to ensure it is adequately protected throughout its lifecycle.

Developers need to work closely with the business to ensure that they can keep data safe, and provide end users with a guarantee that the right data is being analyzed and that its provenance can be trusted. Part of that process will be about finding somebody who will take ownership of the data and attest to its quality.
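To make the masking step concrete, here is a minimal Python sketch that replaces e-mail addresses and long digit runs in free text, such as a call center note. The two patterns and the sample note are assumptions chosen for illustration; pattern matching alone will miss many kinds of sensitive data, so treat this as a starting point to discuss with the data owner rather than a complete solution.

```python
import re

# Illustrative patterns only; real projects need rules agreed with the
# data owner, and regular expressions will miss many sensitive values.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d{8,}\b")  # e.g. account or phone numbers


def mask_text(text: str) -> str:
    """Replace e-mail addresses and long digit runs with fixed tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[NUMBER]", text)


if __name__ == "__main__":
    note = "Customer jan@example.com called about account 1234567890."
    print(mask_text(note))
```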

4. Pick the right tools and languages.

With no real standards in place yet, there are many different languages and tools used to collect, store, transport, and analyze big data. Languages include R, Python, Julia, Scala, and Go (plus the Java and C++ you might need to work with your existing systems). Technologies include Apache Pig, Hadoop, and Spark; Spark provides massively parallel processing on top of a file system and can run without Hadoop. There’s a list of 10 popular big data tools here, another 12 here, and a round-up of 45 big data tools here. 451 Research has created a map that classifies data platforms according to the database type, implementation model, and technology. It’s a great resource, but its 18-color key shows how complex the landscape has become.

Not all of these tools and technologies will be right for you, but they hint at one way the developer’s core competency must change. Big data will require developers to be polyglots, conversant in perhaps five languages, who specialize in learning new tools and languages fast—not deep experts in one or two languages.

Nota bene: MapReduce and Pig are among the highest-paid technology skills in the US, and other big data skills are likely to be highly sought after as demand for them grows. Scala is a relatively new functional programming language for data preparation and analysis, and I predict it will be in high demand in the near future.

5. Forget “off-the-shelf.” Experiment and set up a big data solution that fits your needs. 

You can think of big data analytics tools like Hadoop as a car. You want to go to the showroom, pay, get in, and drive away. Instead, you’re given the wheels, doors, windows, chassis, engine, steering wheel, and a big bag of nuts and bolts. It’s your job to assemble it.

As InfoWorld notes, DevOps tools can help to create manageable Hadoop solutions. But you’re still faced with a lot of pieces to combine, diverse workloads, and scheduling challenges.

When experimenting with concepts and technologies to solve a certain business problem, also think about successful deployment in the organization. The project does not stop after the proof of concept.

6. Secure resources for changes and updates.

Apache Hadoop and Apache Spark are still evolving rapidly and it is inevitable that the behavior of components will change over time and some may get deprecated shortly after initial release. Implementing new releases will be painful, and developers will need to have an overview of the big data infrastructure to ensure that as components change, their big data projects continue to perform as expected.

The developer team must plan time for updates and deprecated features, and a coordinated approach will be essential for keeping on top of the change.

7. Use infrastructure that’s ready for CPU and I/O intensive workloads.

My preferred definition of big data (and there are many – Forbes found 12) is this: "Big data is when you can no longer afford to bring the data to the processing, and you have to do the processing where the data is."

In traditional database and analytics applications, you get the data, load it onto your reporting server, process it, and post the results to the database.

With big data, you have terabytes of data, which might reside in different places—and which might not even be yours to move. Getting it to the processor is impractical. Big data technologies like Hadoop are based on the concept of data locality—doing the processing where the data resides.
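The idea of data locality can be sketched in a few lines of Python. In this toy example, each "node" counts its own records and only the small per-node summaries travel to a coordinator. The node names and records are made up, and everything runs in one process, so it shows only the shape of the approach, not how Hadoop itself is implemented.

```python
from collections import Counter

# Hypothetical cluster: each "node" holds its own slice of call records.
NODES = {
    "node-a": ["billing", "outage", "billing", "upgrade"],
    "node-b": ["outage", "outage", "billing"],
    "node-c": ["upgrade", "billing"],
}


def local_counts(records):
    """Runs where the data lives; returns only a small summary."""
    return Counter(records)


def combine(summaries):
    """Runs on the coordinator; merges the per-node summaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


if __name__ == "__main__":
    summaries = [local_counts(records) for records in NODES.values()]
    totals = combine(summaries)
    entries_moved = sum(len(summary) for summary in summaries)
    records_held = sum(len(records) for records in NODES.values())
    print(dict(totals))
    print(f"summary entries moved: {entries_moved}, raw records held: {records_held}")
```

With a handful of records the saving is trivial, but the summaries grow with the number of distinct categories rather than with the number of records, which is what makes the approach attractive at terabyte scale.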

You can run Hadoop in a virtualized environment. Virtual servers don’t have local data, though, so the time taken to transport data between the SAN or other storage device and the server hurts the application’s performance. Noisy neighbors, unpredictable server speeds and contested network connections can have a significant impact on performance in a virtualized environment. As a result, it’s difficult to offer service level agreements (SLAs) to end users, which makes it hard for them to depend on your big data implementations.

The answer is to use bare metal servers on demand, which let you predict and guarantee the level of performance your application can achieve. Clusters can be set up quickly, so your project can get moving fast. Because performance is predictable and consistent, you can offer SLAs with confidence, which encourages business owners to invest in the big data project and rely on it for making business decisions.

How can I learn more?

Join me in the Twitter chat and webinar (details below) to discuss how you’re addressing big data or have your questions answered by me and my guests.  

Add our Twitter chat to your calendar. It happens Thursday, March 31 at 1 p.m. CET. Use the hashtag #SLdevchat to share your views or post your questions to me.

Register for the webinar on Wednesday, April 20, from 5 p.m. to 6 p.m. CET.


About the author

Frank Ketelaars has been Big Data Technical Leader in Europe for IBM since August 2013. As an architect, big data developer, and streaming analytics specialist, he has worked on multiple high volume projects in banking, telecommunications, manufacturing, life sciences and government. He is a specialist in Hadoop and real-time analytical processing.


March 4, 2016

Adventures with Bluemix

Keeping up with the rapid evolution of web programming is frighteningly difficult—especially when you have a day job. To ensure I don’t get left behind, I like to build a small project every year or so with a collection of the most buzzworthy technologies I can find. Nothing particularly impressive, of course, but just a collection of buttons that do things. This year I am trying to get a good grasp on “as a Service,” which seems to be everywhere these days. Hopefully this adventure will prove educational.

Why use services when I can do it myself?

The main idea behind “as a Service” is that somewhere out there in the cloud, someone has figured out how to do a particular task really well. This someone is willing to provide you access to that for a small service fee—thereby letting you, the developer, focus as much time as possible on your code and not so much time worrying about optimal configurations of things that you need to work efficiently.

SoftLayer is an Infrastructure as a Service (IaaS) provider, and it will be the home for my little application, in large part because I already have a ton of experience running servers myself.

I’m a big fan of Python, so I’m going to start programming with the Pyramid framework as the base for my new application. Like the “as a Service” offerings, programming frameworks and libraries exist to help the developer focus on their code and leverage the expertise of others for the auxiliary components.

To make everything pretty, I am going to use Bootstrap, which is apparently the de facto front-end library these days.

For everything else I want to use, there will be an attached Bluemix service. For the uninitiated, Bluemix is a pretty awesome collection of tools for developing and deploying code. At its core, Bluemix uses Cloud Foundry to provision cloud resources and deploy code. For now, I’m going to deploy my own code, but what I’m really interested in are the add-on services that I can just drop into my application and get going. The first service I want to try out is going to be Cloudant NoSQL, which is a managed CouchDB instance with a few added features like a pretty neat dashboard.

Combining Bluemix services with SoftLayer servers

One of the great things about services in Bluemix is that they can be provisioned in a standalone deployment, meaning Bluemix services can be used by any computer with an Internet connection, including my SoftLayer servers. Since Bluemix services are deployed on SoftLayer hardware (in general, but there are some exceptions), the latency between SoftLayer servers and Bluemix services should be minimal, which is nice.

Creating a Cloudant service in Bluemix is as easy as hitting the Create button in the console. Creating a simple web application in Pyramid took a bit longer, but the quick tutorial helped me learn about all the cool things the Pyramid project can do. I also got to skip all the mess with SQLAlchemy, since I’m storing all the data in Cloudant. All that’s required is a sane ID system (I am using UUIDs) and some JSON. No need to get bogged down with a rigid table structure since Cloudant is a document store. If I want to change the data format, I just need to upload a new copy of the data, and a new revision of that document will be automatically created.
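For a sense of what “a sane ID system and some JSON” can look like, here is a minimal Python sketch that uses only the standard library and makes no network calls. The field names (title, body, tags) are hypothetical, and the revision string is a made-up placeholder; CouchDB-style stores return the real _rev value when a document is saved and expect it back on an update.

```python
import json
import uuid


def new_document(title: str, body: str) -> dict:
    """Build a JSON-serializable document with a generated ID."""
    return {
        "_id": uuid.uuid4().hex,  # CouchDB-style document ID field
        "title": title,
        "body": body,
    }


def revised(document: dict, current_rev: str, **changes) -> dict:
    """Return an updated copy that names the revision it replaces.

    CouchDB-style stores expect the current `_rev` on an update; the
    value used below is a made-up placeholder, not a real revision.
    """
    updated = {**document, **changes}
    updated["_rev"] = current_rev
    return updated


if __name__ == "__main__":
    post = new_document("Hello", "First post")
    # New fields can be added freely; there is no table schema to migrate.
    edited = revised(post, "1-placeholder", body="Edited post", tags=["bluemix"])
    print(json.dumps(edited, indent=2, sort_keys=True))
```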

After cobbling together a basic application that can publish and edit content, all I had to do to make everything look like it was designed intentionally was to add a few Bootstrap classes to my templates. And then I had a ready-to-use website!


Although making a web application is still as intensive as it’s always been, at least using technology in an “as a Service” fashion helps cut down on all the tertiary technologies you need to become an expert on to get anything to work. Even though the application I created here was pretty simple, I hope to expand it to include some of the more interesting Bluemix services to see what kind of Frankenstein application I can manage to produce. There are currently 100 Bluemix services, so I think the hardest part is going to be figuring out which one to use next.


February 3, 2016

Use TShark to see what traffic is passing through your gateway

Many of SoftLayer’s solutions make excellent use of the Brocade vRouter (Vyatta) dedicated security appliance. It’s a true network gateway, router, and firewall for your servers in a SoftLayer data center. It’s also an invaluable troubleshooting tool should you have a connectivity issue or just want to take a gander at your network traffic. Built into vRouter’s command line, and available to you, is a full-fledged terminal-based implementation of Wireshark: TShark.

TShark is fully implemented in vRouter. If you’re already familiar with using TShark, you know you can call it from the terminal in either configuration or operational mode. You do this by prefacing the command with sudo, making the full command sudo tshark followed by your flags.

For those of us less versed in the intricacies of Wireshark and its command line cousin, here are a couple of useful examples to help you out.

One common flag I use in nearly every capture is -i (and as a side note, for those coming from a Microsoft Windows background, the flags are case sensitive). The -i flag specifies the interface on which to capture traffic and immediately helps to cut down on the amount of information unrelated to the problem at hand. If you don’t set this flag, the capture will default to the first non-loopback interface or, in the case of vRouter on SoftLayer, bond0. Additionally, if you want to trace a packet and its reply, you can set -i any to watch or capture traffic through all the interfaces on the device.

The second flag that I nearly always use is -f, which defines a capture filter to match traffic against. Only traffic that matches this filter will be captured. The filter uses the standard Wireshark capture filter syntax (the libpcap filter syntax). Again, if you’re familiar with Wireshark, you can go nuts; but here are a few of the common filters I frequently use to help you get started:

  • host will match any traffic to or from the specified host. In this case, the venerable Google DNS servers. 
  • net works just like host, but for the entire network specified, in case you don’t know the exact host address you are looking for.
  • dst and src are useful if you want to drill down to a specific flow or want to look at just the incoming or outgoing traffic. These filters are usually paired with a host or net to match against.
  • port lets you specify a port to match, just as host and net do for addresses. Used by itself, port will match both source and destination port. In the case of well-known services, you can also define the port by its common service name, e.g., port domain for DNS.

One final cool trick with the -f filter is the and operator and the not negation. They let you combine search terms and specifically exclude traffic in order to create a very finely tuned capture for your needs.

If you want to capture to a file to share with a team or to plug into more advanced analysis tools on another system, the -w flag is your friend. Without -w, TShark behaves like tcpdump and the output appears in your terminal session. If you want to load the file into Wireshark or another packet analyzer tool, make sure to add the -F flag to specify the file format. Here is an example:

Vyatta# sudo tshark -i bond0 -w testcap.pcap -F pcap -f 'src and not port 80'

The command will capture on bond0 and write the capture to a .pcap file called testcap.pcap in the current working directory. It will match only traffic on bond0 from the specified source that is not to or from port 80. While that is a bit of a mouthful to explain, it does capture a very well-defined stream!
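If you build captures like this often, the flags and filter primitives described above can be assembled with a small script. The following Python sketch only builds and prints the command; it does not run TShark. The helper names are my own, and 192.0.2.10 is a documentation-only address standing in for whatever host you care about.

```python
import shlex


def capture_filter(*terms: str, exclude: tuple = ()) -> str:
    """Join filter primitives with 'and', negating each excluded term."""
    parts = list(terms) + [f"not {term}" for term in exclude]
    return " and ".join(parts)


def tshark_command(interface: str, filter_expr: str, outfile: str = "") -> list:
    """Build the argument list for a tshark capture."""
    args = ["sudo", "tshark", "-i", interface, "-f", filter_expr]
    if outfile:
        # -w writes packets to a file; -F pcap sets the file format.
        args += ["-w", outfile, "-F", "pcap"]
    return args


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a stand-in.
    expr = capture_filter("src host 192.0.2.10", exclude=("port 80",))
    print(shlex.join(tshark_command("any", expr, "testcap.pcap")))
```

Keeping the command as a list of arguments, rather than one long string, means the filter expression stays a single argument no matter how many spaces it contains.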

Here is one more example:

Vyatta# sudo tshark -i any -f 'host and not port ssh'

This command will capture traffic to the terminal that is to or from the specified IP and that is not SSH. I frequently use this filter, or one a lot like it, when I am SSHed into a host and want to get a more general idea of what it is doing on the network. I don’t care about SSH because I know the cause of that traffic (me!), but I want to know anything else that’s going to or from the host.

This is all very much the tip of the iceberg; you can find a lot more information on the TShark man page. Hopefully these tips help out next time you want to see just what traffic is passing through your gateway.

- Jeff 

