Posts Tagged 'Infrastructure'

October 14, 2014

Enterprise Customers See Benefits of Direct Link with GRE Tunnels

We’ve had an overwhelming response to our Direct Link product launch over the past few months, and with good reason. Customers can cross-connect into the SoftLayer global private network with a direct link in any of our 22 points of presence (POPs), providing fast, secure, and unmetered access to their SoftLayer infrastructure from their remote data center locations.

Many of our enterprise customers who’ve set up a Direct Link want to balance the simplicity of a layer three cross connection with their sophisticated routing and access control list (ACL) requirements. To achieve this balance, many are using GRE tunnels from their on-premises routers to their SoftLayer Vyatta Gateway Appliance.

In previous blogs about the Vyatta Gateway Appliance, we’ve described some typical use cases and highlighted the differences between the Vyatta OS and the Vyatta Appliance, so here we’ll focus specifically on using GRE tunnels.

What is GRE?
Generic Routing Encapsulation (GRE) is a protocol for encapsulating packets so that other protocols can be routed over IP networks (RFC 2784). Customers typically create two endpoints for the tunnel: one on their remote router and the other on their Vyatta Gateway Appliance at SoftLayer.
How does GRE work?
GRE encapsulates a payload (an inner packet that needs to be delivered to a destination network) within an outer IP packet. Between the two GRE endpoints, routers only look at the outer IP packet and forward it toward the endpoint, where the inner packet is parsed and routed to its ultimate destination.
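
To make the encapsulation concrete, here's a minimal sketch using the scapy packet library; it's purely an illustration of the packet structure (not a Vyatta configuration), and all addresses are placeholders:

```python
from scapy.all import GRE, ICMP, IP

# Inner packet: traffic between private subnets on either side of the tunnel.
inner = IP(src="10.10.10.5", dst="10.20.20.5") / ICMP()

# Outer packet: addressed between the two tunnel endpoints; transit routers
# only ever look at this header.
outer = IP(src="198.51.100.1", dst="203.0.113.1") / GRE()

packet = outer / inner
packet.show()  # outer IP header, then the GRE header, then the inner IP packet
```
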
Why use GRE tunnels?
If a customer has multiple subnets at SoftLayer that need to be reached, each subnet would require its own tunnel without GRE. Because GRE encapsulates traffic within an outer packet, customers can route other protocols through the tunnel and reach multiple subnets without building multiple tunnels: a GRE endpoint on the Vyatta parses the inner packets and routes them to the right destination, eliminating that challenge.

Many of our enterprise customers have complex rules governing what servers and networks can communicate with each other. They typically build ACLs on their routers to enforce those rules. Having a GRE endpoint on a Vyatta Gateway Appliance allows customers to route and manage internal packets based on specific rules so that security models stay intact.

GRE tunnels also let customers keep their existing addressing scheme: they can add IP addresses to their SoftLayer servers and access them directly, eliminating routing problems that could otherwise occur.

And because GRE tunnels can run inside a VPN tunnel, customers can wrap the GRE tunnel in IPsec for additional security.

Learn More on KnowledgeLayer

If you are considering Direct Link to achieve fast and unmetered access with the help of GRE tunnels and Vyatta Gateway Appliance but need more information, the SoftLayer KnowledgeLayer is continually updated with new information and best practices. Be sure to check out the entire section devoted to the Vyatta Gateway Appliance.

- Seth

September 9, 2014

Building a Secure Cloud-based Solution: Part I

When you begin a household project, you must first understand what you will need to complete the task. Before you begin, you check your basement or garage to make sure you have the tools to do the work. Building a secure cloud-based solution requires similar planning. You’re in luck—SoftLayer has all the tools needed, including a rapidly maturing set of security products and services to help you build, deploy, and manage your cloud solution. Over the next couple of months, we will take a look at how businesses leverage cloud technologies to deliver new value to their employees and customers, and we’ll discuss how SoftLayer provides the tools necessary to deliver your solutions securely.

Hurricane plan of action: Water: Check. Food: Check. Cloud: Check?

Let’s set the scene here: A hurricane is set to make landfall on the United States’ Gulf Coast, and the IT team at an insurance company must elastically scale its new claims application to accommodate the customers and field agents who will need it in the storm’s aftermath. The team needs to cover short-term computing needs and host additional images of the claims application long term, creating a hybrid cloud environment. The insurance company’s IT staff meets to discuss security requirements and identifies several high-level needs:

  1. Provide secure connectivity, authentication, access control, and audit capabilities for IT administrators and users.

    SoftLayer provides VPNs, multifactor authentication, audit control logs, API keys, and fine-grained access control. This allows insurance agents to securely access claim forms and supporting documentation and connect to the application via https, using the wide range of SSL certificates (Symantec, Geotrust, and more). Plus, agents can authenticate using identity and access management solutions such as IWS Go Cloud ID and IBM Security Access Manager.
  2. Ensure that stringent data security measures are enforced.

    Data cannot be shifted across borders, and data at rest or in use must be encrypted. SoftLayer leaves data where customers place it and will never transfer customers’ data. IBM Cloud Marketplace partners like Vormetric offer encryption solutions to ensure sensitive data at rest is not stored in clear text and that customers maintain complete control of the encryption keys. Additionally, the IT team in our example would be able to encrypt all sensitive PHI data in the database using data-in-use solutions from Eperi.
  3. Ensure multi-layered security for network zone segmentation.

    Users and administrators working with confidential insurance data need confidence that their network is securely partitioned. SoftLayer native and vendor solutions such as SoftLayer VLANs, the Vyatta Gateway, the FortiGate firewall, and Citrix NetScaler allow administrators to securely partition a network, creating segmentation according to organizational needs and providing the routing and filtering needed to isolate users, workloads, and domains.
  4. Enforce host security using anti-virus software, host intrusion prevention systems, and other solutions.

    The IT team can apply best-of-breed third-party solutions, such as Nessus Vulnerability Scanner, McAfee Antivirus, and McAfee Host Intrusion Protection. These capabilities give administrators the means to ensure that infrastructure is protected from malware and other host attacks, enhancing both system availability and performance.
  5. Define and enforce security policies for the hybrid cloud environment, and audit any policy changes.

    Administrators can manage overall policies for the combined public-private environment using IBM solutions like QRadar, Hosted Security Event and Log Management Service, and xForce Threat Analysis Service. Admins can use solutions from vendors like CloudPassage, Sumo Logic, and ObserveIT to automatically define policies around firewall rules, file integrity, security configuration, and access control, and to audit adherence to such policies.

The insurance company’s IT department already knew from SoftLayer’s reputation that it is one of the highest-performing cloud infrastructures available, with a wide range of integrated and automated cloud computing options delivered over a private network and an advanced management system. Now it knows from experience that SoftLayer also offers the security solutions needed to get the job done.

When business needs spike and companies need additional capacity, SoftLayer delivers quickly and securely. Stay tuned for Part 2, where we will talk about secure development and test activities.

- Rick Hamilton, IBM Cloud Offering Evangelist

February 6, 2014

Building a Bridge to the OpenStack API

OpenStack is experiencing explosive growth in the cloud market. With more than 200 companies contributing code to the source and new installations coming online every day, OpenStack is pushing hard to become a global standard for cloud computing. Dozens of useful tools and software products have been developed using the OpenStack API, so a growing community of administrators, developers and IT organizations have access to easy-to-use, powerful cloud resources. This kind of OpenStack integration is great for users on a full OpenStack cloud, but it introduces a challenge to providers and users on other cloud platforms: Should we consider deploying or moving to an OpenStack environment to take advantage of these tools?

If a cloud provider spends years developing a unique platform with a proprietary API, implementing native support for the OpenStack API or deploying a full OpenStack solution may be cost prohibitive, even with significant customer and market demand. The provider can either bite the bullet and implement OpenStack compatibility, hope that a third-party library like libcloud or fog is updated to support its API, or go it alone and develop an ecosystem of products around its own API.

Introducing Jumpgate

When we were faced with this situation at SoftLayer, we chose a fourth option. We wanted to make the process of creating an OpenStack-compatible API simpler and more modular, and that's how Jumpgate was born. Jumpgate is middleware that acts as a compatibility layer between the OpenStack API and a provider's proprietary API. Externally, it exposes endpoints that adhere to OpenStack's published and accepted API specification, which it then translates into the provider's API using a series of drivers. Think of it as a mechanism to enable passing from one realm/space into another — like the jumpgates featured in science fiction works.


How Jumpgate Works
Let's take a look at a high-level example: When you want to create a new virtual instance on OpenStack, you might use the Horizon dashboard or the Nova command line client. When you issue the request, the tool first makes a REST call to a Keystone endpoint for authentication, which returns an authorization token. The client then makes another REST call to a Nova endpoint, which manages the computing instances, to create the actual virtual instance. Nova may then make calls to other tools within the cluster for networking (Quantum), image information (Glance), block storage (Cinder), or more. In addition, your client may also send requests directly to some of these endpoints to query for status updates, information about available resources, and so on.
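
As a rough illustration of that flow, the sketch below performs the same two steps with plain REST calls via the requests library, using the Keystone v3 password flow; the endpoint URLs, credentials, and image/flavor IDs are placeholders:

```python
import requests

KEYSTONE = "https://keystone.example.com:5000"   # hypothetical identity endpoint
NOVA = "https://nova.example.com:8774/v2.1"      # hypothetical compute endpoint

# Step 1: authenticate against Keystone (v3 password flow); the token comes
# back in the X-Subject-Token response header.
auth_body = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {"user": {"name": "demo",
                                  "domain": {"id": "default"},
                                  "password": "secret"}},
        },
        "scope": {"project": {"name": "demo", "domain": {"id": "default"}}},
    }
}
token = requests.post(f"{KEYSTONE}/v3/auth/tokens",
                      json=auth_body).headers["X-Subject-Token"]

# Step 2: ask Nova to create the virtual instance, passing the token along.
server_body = {"server": {"name": "demo-instance",
                          "imageRef": "IMAGE_UUID",    # placeholder
                          "flavorRef": "FLAVOR_ID"}}   # placeholder
resp = requests.post(f"{NOVA}/servers",
                     headers={"X-Auth-Token": token},
                     json=server_body)
print(resp.status_code, resp.json())
```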

With Jumpgate, your tool first hits the Jumpgate middleware, which exposes a Keystone endpoint. Jumpgate takes the request, breaks it apart into its relevant pieces, then loads up your provider's appropriate API driver. Next, Jumpgate reformats your request into a form that the driver supports and sends it to the provider's API endpoint. Once the response comes back, Jumpgate again uses the driver to break apart the proprietary API response, reformats it into an OpenStack compatible JSON payload, and sends it back to your client. The result is that you interact with an OpenStack-compatible API, and your cloud provider processes those interactions on their own backend infrastructure.

Internally, Jumpgate is a lightweight middleware layer built in Python using the Falcon Framework. It provides endpoints for nearly every documented OpenStack API call and allows drivers to attach handlers to these endpoints. This modular approach lets providers implement only the endpoints that matter most, rolling out OpenStack API compatibility in stages rather than in one monumental effort. Since it sits alongside the provider's existing API, Jumpgate provides a new API interface without risking the stability already provided by the existing API. It's a value-add service that increases customer satisfaction without a huge increase in cost. Once full implementation is finished, a provider with a proprietary cloud platform can benefit from and offer all the tools that are developed to work with the OpenStack API.
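
As a rough sketch of that pattern (not Jumpgate's actual driver interface), here's how a Falcon resource might expose a Nova-style route and delegate to a hypothetical provider driver that performs the translation:

```python
import falcon


class SoftLayerComputeDriver:
    """Hypothetical driver that would translate Nova-style calls into a
    provider's proprietary API and reshape the responses into OpenStack JSON."""

    def list_instances(self, tenant_id):
        # A real driver would call the provider's API here.
        return {"servers": [{"id": "abc123", "name": "example-instance"}]}


class ServersResource:
    def __init__(self, driver):
        self.driver = driver

    def on_get(self, req, resp, tenant_id):
        # Respond with an OpenStack-compatible payload built by the driver.
        resp.media = self.driver.list_instances(tenant_id)


app = falcon.App()  # falcon.API() in older Falcon releases
app.add_route("/v2/{tenant_id}/servers", ServersResource(SoftLayerComputeDriver()))
```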

Jumpgate allows providers to test the proper OpenStack compatibility of their drivers by leveraging the OpenStack Tempest test suite. With these tests, developers run the full suite of calls used by OpenStack itself, highlighting edge cases or gaps in functionality. We've even included a helper script that allows Tempest to only run a subset of tests rather than the entire suite to assist with a staged rollout.

Current Development
Jumpgate is currently in an early alpha stage. We've built the compatibility framework itself and started on the SoftLayer drivers as a reference. So far, we've implemented key endpoints within Nova (computing instances), Keystone (identification and authorization), and Glance (image management) to get most of the basic functionality within Horizon (the web dashboard) working. We've heard that several groups outside SoftLayer are successfully using Jumpgate to drive products like Trove and Heat directly on SoftLayer, which is exciting and shows that we're well beyond the "proof of concept" stage. That being said, there's still a lot of work to be done.

We chose to develop Jumpgate in the open with a tool set that would be familiar to developers working with OpenStack. We're excited to debut this project for the broader OpenStack community, and we're accepting pull requests if you're interested in contributing. Making more clouds compatible with the OpenStack API is important and shouldn’t be an individual undertaking. If you're interested in learning more or contributing, head over to our in-flight project page on GitHub: SoftLayer Jumpgate. There, you'll find everything you need to get started along with the updates to our repository. We encourage everyone to contribute code or drivers ... or even just open issues with feature requests. The more community involvement we get, the better.

-Nathan

July 29, 2013

A Brief History of Cloud Computing

Believe it or not, "cloud computing" concepts date back to the 1950s when large-scale mainframes were made available to schools and corporations. The mainframe's colossal hardware infrastructure was installed in what could literally be called a "server room" (since the room would generally only be able to hold a single mainframe), and multiple users were able to access the mainframe via "dumb terminals" – stations whose sole function was to facilitate access to the mainframes. Due to the cost of buying and maintaining mainframes, an organization wouldn't be able to afford a mainframe for each user, so it became practice to allow multiple users to share access to the same data storage layer and CPU power from any station. By enabling shared mainframe access, an organization would get a better return on its investment in this sophisticated piece of technology.

[Image: Mainframe Computer]

A couple decades later in the 1970s, IBM released an operating system called VM that allowed admins on their System/370 mainframe systems to have multiple virtual systems, or "Virtual Machines" (VMs) on a single physical node. The VM operating system took the 1950s application of shared access of a mainframe to the next level by allowing multiple distinct compute environments to live in the same physical environment. Most of the basic functions of any virtualization software that you see nowadays can be traced back to this early VM OS: Every VM could run custom operating systems or guest operating systems that had their "own" memory, CPU, and hard drives along with CD-ROMs, keyboards and networking, despite the fact that all of those resources would be shared. "Virtualization" became a technology driver, and it became a huge catalyst for some of the biggest evolutions in communications and computing.

[Image: Mainframe Computer]

In the 1990s, telecommunications companies that had historically offered only dedicated point-to-point data connections started offering virtualized private network connections with the same service quality as their dedicated services at a reduced cost. Rather than building out physical infrastructure to allow more users to have their own connections, telcos were able to provide users with shared access to the same physical infrastructure. This change allowed the telcos to shift traffic as necessary to achieve better network balance and more control over bandwidth usage. Meanwhile, virtualization for PC-based systems started in earnest, and as the Internet became more accessible, the next logical step was to take virtualization online.

If you were in the market to buy servers ten or twenty years ago, you know that the costs of physical hardware, while not at the level of the mainframes of the 1950s, were pretty outrageous. As more and more people expressed demand to get online, those costs had to come out of the stratosphere, and one of the ways that was made possible was by ... you guessed it ... virtualization. Servers were virtualized into shared hosting environments, Virtual Private Servers, and Virtual Dedicated Servers using the same types of functionality provided by the VM OS in the 1970s. As an example of what that looked like in practice, let's say your company required 13 physical systems to run its sites and applications. With virtualization, you could take those 13 distinct systems and split them up between two physical nodes. Obviously, this kind of environment saves on infrastructure costs and minimizes the amount of actual hardware needed to meet your company's needs.

[Image: Virtualization]

As the costs of server hardware slowly came down, more users were able to purchase their own dedicated servers, and they started running into a different kind of problem: One server isn't enough to provide the resources I need. The market shifted from a belief that "these servers are expensive, let's split them up" to "these servers are cheap, let's figure out how to combine them." Because of that shift, the most basic understanding of "cloud computing" was born online. By installing and configuring a piece of software called a hypervisor across multiple physical nodes, a system could present all of the environment's resources as though they were in a single physical node. To help visualize that environment, technologists used terms like "utility computing" and "cloud computing," since the sum of the parts seemed to become a nebulous blob of computing resources that you could then segment out as needed (like telcos did in the 90s). In these cloud computing environments, it became easy to add resources to the "cloud": Just add another server to the rack and configure it to become part of the bigger system.

[Image: Clouds]

As technologies and hypervisors got better at reliably sharing and delivering resources, many enterprising companies decided to start carving up the bigger environment to make the cloud's benefits available to users who don't happen to have an abundance of physical servers with which to build their own cloud computing infrastructure. Those users could order "cloud computing instances" (also known as "cloud servers") by requesting the resources they need from the larger pool of available cloud resources, and because the servers are already online, the process of "powering up" a new instance or server is almost instantaneous. And because little overhead is involved for the owner of the cloud computing environment when a new instance is ordered or cancelled (it's all handled by the cloud's software), management of the environment is much easier. Most companies today operate with this idea of "the cloud" as the current definition, but SoftLayer isn't "most companies."

SoftLayer took the idea of a cloud computing environment and pulled it back one more step: Instead of installing software on a cluster of machines to allow for users to grab pieces, we built a platform that could automate all of the manual aspects of bringing a server online without a hypervisor on the server. We call this platform "IMS." What hypervisors and virtualization do for a group of servers, IMS does for an entire data center. As a result, you can order a bare metal server with all of the resources you need and without any unnecessary software installed, and that server will be delivered to you in a matter of hours. Without a hypervisor layer between your operating system and the bare metal hardware, your servers perform better. Because we automate almost everything in our data centers, you're able to spin up load balancers and firewalls and storage devices on demand and turn them off when you're done with them. Other providers have cloud-enabled servers. We have cloud-enabled data centers.

[Image: SoftLayer Pod]

IBM and SoftLayer are leading the drive toward wider adoption of innovative cloud services, and we have ambitious goals for the future. If you think we've come a long way from the mainframes of the 1950s, you ain't seen nothin' yet.

-James

July 9, 2013

When to Consider Riak for Your Big Data Architecture

In my Breaking Down 'Big Data' – Database Models post, I briefly covered the most common database models, their strengths, and how they handle the CAP theorem — how a distributed storage system balances demands of consistency and availability while maintaining partition tolerance. Here's what I said about Dynamo-inspired databases:

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

This type of key/value store architecture is very different from the document-oriented MongoDB solutions we launched at the end of last year, so we worked with Basho to prioritize development of high-performance Riak solutions on our global platform. Since you already know about MongoDB, let's take a few minutes to meet the new kid on the block.

Riak is a distributed database architected for availability, fault tolerance, operational simplicity and scalability. Riak is masterless, so each node in a Riak cluster is the same and contains a complete, independent copy of the Riak package. This design makes the Riak environment highly fault tolerant and scalable, and it also aids in replication — if a node goes down, you can still read, write and update data.
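
To give a feel for what working with a masterless key/value store looks like from an application's point of view, here's a minimal sketch using the Riak Python client; the host, bucket, keys, and data are placeholders:

```python
import riak

# Connect to one node of the cluster over protocol buffers; because Riak is
# masterless, any node can accept reads and writes.
client = riak.RiakClient(protocol='pbc', host='riak1.example.com', pb_port=8087)

profiles = client.bucket('player_profiles')

# Write a key; Riak persists it to several nodes before reporting success.
obj = profiles.new('user-1001', data={'name': 'Ada', 'rank': 42})
obj.store()

# Read it back from whichever replicas respond.
print(profiles.get('user-1001').data)
```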

As you approach the daunting prospect of choosing a big data architecture, there are a few simple questions you need to answer:

  1. How much data do/will I have?
  2. In what format am I storing my data?
  3. How important is my data?

Riak may be the choice for you if [1] you're working with more than three terabytes of data, [2] your data is stored in multiple data formats, and [3] your data must always be available. What does that kind of need look like in real life, though? Luckily, we've had a number of customers kick Riak's tires on SoftLayer bare metal servers, so I can share a few of the use cases we've seen that have benefited significantly from Riak's unique architecture.

Use Case 1 – Digital Media
An advertising company that serves over 10 billion ads per month must be able to quickly deliver its content to millions of end users around the world. Meeting that demand with relational databases would require a complex configuration of expensive, vertically scaled hardware, but with Riak it can be scaled out horizontally much more easily. In a matter of a few hours, the company is up and running with an ad-serving infrastructure that includes a back-end Riak cluster in Dallas, a replication cluster in Singapore, and an application tier on the front end with Web servers, load balancers and a CDN.

Use Case 2 – E-commerce
An e-commerce company needs 100-percent availability. If any part of a customer's experience fails, whether it be on the website or in the shopping cart, sales are lost. Riak's fault tolerance is a big draw for this kind of use case: Even if one node or component fails, the company's data is still accessible, and the customer's user experience is uninterrupted. The shopping cart structure is critical, and Riak is built to be available ... It's a perfect match.

As an additional safeguard, the company can take advantage of simple multi-datacenter replication in their Riak Enterprise environment to geographically disperse content closer to its customers (while also serving as an important tool for disaster recovery and backup).

Use Case 3 – Gaming
With customers like Broken Bulb and Peak Games, SoftLayer is no stranger to the gaming industry, so it should come as no surprise that we've seen interesting use cases for Riak from some of our gaming customers. When a game developer incorporated Riak into a new game to store player data like user profiles, statistics and rankings, the performance of the bare metal infrastructure blew him away. As a result, the game's infrastructure was redesigned to also pull gaming content like images, videos and sounds from the Riak database cluster. Since the environment is so easy to scale horizontally, the process on the infrastructure side took no time at all, and the multimedia content in the game is getting served as quickly as the player data.

Databases are common bottlenecks for many applications, but they don't have to be. Making the transition from scaling vertically (upgrading hardware, adding RAM, etc.) to scaling horizontally (spreading the work intelligently across multiple nodes) alleviates many of the pain points for a quickly growing database environment. Have you made that transition? If not, what's holding you back? Have you considered implementing Riak?

-@marcalanjones

February 8, 2013

Data Center Power-Up: Installing a 2-Megawatt Generator

When I was a kid, my living room often served as a "job site" where I managed a fleet of construction vehicles. Scaled-down versions of cranes, dump trucks, bulldozers and tractor-trailers littered the floor, and I oversaw the construction (and subsequent destruction) of some pretty monumental projects. Fast-forward a few years (or decades), and not much has changed except that the "heavy machinery" has gotten a lot heavier, and I'm a lot less inclined to "destruct." As SoftLayer's vice president of facilities, part of my job is to coordinate the early logistics of our data center expansions, and as it turns out, that responsibility often involves overseeing some of the big rigs that my parents tripped over in my youth.

The video below documents the installation of a new Cummins two-megawatt diesel generator for a pod in our DAL05 data center. You see the crane prepare for the work by installing counter-balance weights, and work starts with the team placing a utility transformer on its pad outside our generator yard. A truck pulls up with the generator base in tow, and you watch the base get positioned and lowered into place. The base looks so large because it also serves as the generator's 4,000 gallon "belly" fuel tank. After the base is installed, the generator is trucked in, and it is delicately picked up, moved, lined up and lowered onto its base. The last step you see is the generator housing being installed over the generator to protect it from the elements. At this point, the actual "installation" is far from over — we need to hook everything up and test it — but those steps don't involve the nostalgia-inducing heavy machinery you probably came to this post to see:

When we talk about the "megawatt" capacity of a generator, we're talking about the amount of power available when the generator is operating at full capacity. One megawatt is one million watts, so a two-megawatt generator could power 20,000 100-watt light bulbs at the same time. That power can be sustained for as long as the generator has fuel, and we have service level agreements that keep us at the front of the line for more fuel when we need it. Here are a few other interesting use cases that could be powered by a two-megawatt generator:

  • 1,000 Average Homes During Mild Weather
  • 400 Homes During Extreme Weather
  • 20 Fast Food Restaurants
  • 3 Large Retail Stores
  • 2.5 Grocery Stores
  • A SoftLayer Data Center Pod Full of Servers (Most Important Example!)
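
For the curious, the quick arithmetic behind those figures looks something like this; the per-home wattage is a rough assumption for illustration, not a measured value:

```python
generator_watts = 2_000_000            # two megawatts

print(generator_watts // 100)          # 20,000 hundred-watt light bulbs

mild_weather_home_watts = 2_000        # assumed average draw per home
print(generator_watts // mild_weather_home_watts)   # roughly 1,000 homes
```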

Every SoftLayer facility has an n+1 power architecture. If we need three generators to provide power for three data center pods in one location, we'll install four. This additional capacity allows us to balance the load on generators when they're in use, and we can take individual generators offline for maintenance without jeopardizing our ability to support the power load for all of the facility's data center pods.

Those of you who fondly remember Tonka trucks and CAT crane toys are the true target audience for this post, but even if you weren't big into construction toys when you were growing up, you'll probably still appreciate the work we put into safeguarding our facilities from a power perspective. You don't often see the "outside the data center" work that goes into putting a new SoftLayer data center pod online, so I thought I'd give you a glimpse. Are there any other topics from an operations or facilities perspective that you'd like to see?

-Robert

December 30, 2012

Risk Management: Event Logging to Protect Your Systems

The calls start rolling in at 2am on Sunday morning. Alerts start firing off. Your livelihood is in grave danger. It doesn't come with the fanfare of a blockbuster Hollywood thriller, but if a server hosting your critical business infrastructure is attacked, becomes compromised or fails, it might feel like the end of the world. In our Risk Management series, we've covered the basics of securing your servers, so the next consideration is what to do when that security is circumvented.

It seems silly to prepare for a failure in a security plan we spend time and effort creating, but if we stick our heads in the sand and tell ourselves that we're secure, we won't be prepared in the unlikely event of something happening. Every attempt to mitigate risks and stop threats in their tracks will be circumvented by the one failure, threat or disaster you didn't cover in your risk management plan. When that happens, accurate event logging will help you record what happened, respond to the event (if it's still in progress) and have the information available to properly safeguard against or prevent similar threats in the future.

Like any other facet of security, "event logging" can seem overwhelming and unforgiving if you're looking at hundreds of types of events to log, each with dozens of variations and options. Like we did when we looked at securing servers, let's focus our attention on a few key areas and build out what we need:

Which events should you log?
Look at your risk assessment and determine which systems are of the highest value or could cause the most trouble if interrupted. Those systems are likely to be what you prioritized when securing your servers, and they should also take precedence when it comes to event logging. You probably don't have unlimited compute and storage resources, so you have to determine which types of events are most valuable for you and how long you should keep records of them — it's critical to have your event logs on-hand when you need them, so logs should be retained online for a period of time and then backed up offline to be available for another period of time.

Your goal is to understand what's happening on your servers and why, so you know how to respond. The most common auditable events include successful and unsuccessful account log-on events, account management events, object access, policy changes, privileged functions, process tracking and system events. The most conservative approach involves logging more information/events and keeping those logs for longer than you think you need. From there, you can evaluate your logs periodically to determine whether the level of auditing/logging needs to be adjusted.

Where do you store the event logs?
Your event logs won't do you any good if they are stored in a space that is insufficient for the amount of data you need to collect. I recommend centralizing your logs in a secure environment that is both readily available and scalable. Beyond keeping the logs accessible when the server(s) they describe are not, aggregating and organizing your logs in a central location can be a powerful tool for building reports and analyzing trends. With that information, you'll be able to more clearly see deviations from normal activity and catch attacks (or attempted attacks) in progress.
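
As a small illustration of that kind of centralization, the sketch below sends a Python application's events to both a local rotating file and a central syslog collector; the hostname, port, and paths are placeholders:

```python
import logging
import logging.handlers

logger = logging.getLogger("webapp")
logger.setLevel(logging.INFO)

# Local copy, rotated so the disk never fills up.
local = logging.handlers.RotatingFileHandler(
    "/var/log/webapp/events.log", maxBytes=50_000_000, backupCount=10)

# Central copy, so records survive even if this server is compromised or offline.
central = logging.handlers.SysLogHandler(address=("logs.example.com", 514))

fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
for handler in (local, central):
    handler.setFormatter(fmt)
    logger.addHandler(handler)

logger.info("successful log-on for user alice from 203.0.113.7")
```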

How do you protect your event logs?
Attacks can come from both inside and out. To avoid intentional malicious activity by insiders, separation of duties should be enforced when planning logging. Learn from The X Files and "Trust no one." Someone who has been granted the 'keys to your castle' shouldn't also be able to disable the castle's security system or mess with the castle's logs. Your network engineer shouldn't have exclusive access to your router logs, and your sysadmin shouldn't be the only one looking at your web server logs.

Keep consistent time.
Make sure all of your servers use the same accurate time source so that all logs generated from those servers share consistent timestamps. Diagnosing an attack or incident is far more difficult if your web server's clock isn't synced with your database server's clock or if they're set to different time zones. You're putting a lot of time and effort into logging events, so you're shooting yourself in the foot if events across your servers don't line up cleanly.
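
One concrete piece of that (alongside pointing every server at the same NTP source) is forcing application logs to record UTC timestamps; here's a minimal Python logging sketch:

```python
import logging
import time

formatter = logging.Formatter(
    "%(asctime)sZ %(levelname)s %(message)s", datefmt="%Y-%m-%dT%H:%M:%S")
formatter.converter = time.gmtime   # timestamps in UTC instead of local time

handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("webapp")
logger.addHandler(handler)
logger.warning("clock check")
```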

Read your logs!
Logs won't do you any good if you're not looking at them. Know the red flags to look for in each of your logs, and set aside time to look for those flags regularly. Several SoftLayer customers — like Tech Partner Papertrail — have come up with innovative and effective log management platforms that streamline the process of aggregating, searching and analyzing log files.

It's important to reiterate that logging — like any other security endeavor — is not a 'one size fits all' model, but that shouldn't discourage you from getting started. If you aren't logging or you aren't actively monitoring your logs, any step you take is a step forward, and each step is worth the effort.

Thanks for reading, and stay secure, my friends!

-Matthew

December 4, 2012

Big Data at SoftLayer: MongoDB

In one day, Facebook's databases ingest more than 500 terabytes of data, Twitter processes 500 million Tweets and Tumblr users publish more than 75 million posts. With such an unprecedented volume of information, developers face significant challenges when it comes to building an application's architecture and choosing its infrastructure. As a result, demand has exploded for "big data" solutions — resources that make it possible to process, store, analyze, search and deliver data from large, complex data sets. In light of that demand, SoftLayer has been working in strategic partnership with 10gen — the creators of MongoDB — to develop a high-performance, on-demand, big data solution. Today, we're excited to announce the launch of specialized MongoDB servers at SoftLayer.

If you've configured an infrastructure to accommodate big data, you know how much of a pain it can be: You choose your hardware, you configure it to run NoSQL, you install an open source NoSQL project that you think will meet your needs, and you keep tweaking your environment to optimize its performance. Assuming you have the resources (and patience) to get everything running efficiently, you'll wind up with the horizontally scalable database infrastructure you need to handle the volume of content you and your users create and consume. SoftLayer and 10gen are making that process a whole lot easier.

Our new MongoDB solutions take the time and guesswork out of configuring a big data environment. We give you an easy-to-use system for designing and ordering everything you need. You can start with a single server or roll out multiple servers in a single replica set across multiple data centers, and in under two hours, an optimized MongoDB environment is provisioned and ready to be used. I stress that it's an "optimized" environment because that's been our key focus. We collaborated with 10gen engineers on hardware and software configurations that provide the most robust performance for MongoDB, and we incorporated many of their MongoDB best practices. The resulting "engineered servers" are big data powerhouses:

[Image: MongoDB Configs]

From each engineered server base configuration, you can customize your MongoDB server to meet your application's needs, and as you choose your upgrades from the base configuration, you'll see the thresholds at which you should consider upgrading other components. As your data set's size and the number of indexes in your database increase, you'll need additional RAM, CPU, and storage resources, but you won't need them in the same proportions — certain components become bottlenecks before others. Sure, you could upgrade all of the components in a given database server at the same rate, but if, say, you update everything when you only need to upgrade RAM, you'd be adding (and paying for) unnecessary CPU and storage capacity.

Using our new Solution Designer, it's very easy to graphically design a complex multi-site replica set. Once you finalize your locations and server configurations, you'll click "Order," and our automated provisioning system will kick into high gear. It deploys your server hardware, installs CentOS (with OS optimizations to provide MongoDB performance enhancements), installs MongoDB, installs MMS (MongoDB Monitoring Service) and configures the network connection on each server to cluster it with the other servers in your environment. A process that may have taken days of work and months of tweaking is completed in less than four hours. And because everything is standardized and automated, you run much less risk of human error.
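
As a rough illustration of what an application sees once such an environment exists, here's a minimal PyMongo sketch that connects to a three-member replica set; the hostnames, replica set name, and database are placeholders rather than values from the SoftLayer offering:

```python
from pymongo import MongoClient

# Connect to all members of the (hypothetical) replica set; the driver
# discovers the primary and fails over automatically if it changes.
client = MongoClient(
    "mongodb://mongo1.example.com,mongo2.example.com,mongo3.example.com/"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
)

db = client["appdata"]
db.posts.insert_one({"title": "hello", "tags": ["demo"]})   # written to the primary
print(db.posts.find_one({"title": "hello"}))                # read may hit a secondary
```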

[Image: MongoDB Configs]

One of the other massive benefits of working so closely with 10gen is that we've been able to integrate 10gen's MongoDB Cloud Subscriptions into our offering. Customers who opt for a MongoDB Cloud Subscription get additional MongoDB features (like SSL and SNMP support) and support direct from the MongoDB authority. As an added bonus, since the 10gen team has an intimate understanding of the SoftLayer environment, they'll be able to provide even better support to SoftLayer customers!

You shouldn't have to sacrifice agility for performance, and you shouldn't have to sacrifice performance for agility. Most of the "big data" offerings in the market today are built on virtual servers that can be provisioned quickly but offer meager performance levels relative to running the same database on bare metal infrastructure. To get the performance benefits of dedicated hardware, many users have chosen to build, roll out and tweak their own configurations. With our MongoDB offering, you get the on-demand availability and flexibility of a cloud infrastructure with the raw power and full control of dedicated hardware.

If you've been toying with the idea of rolling out your own big data infrastructure, life just got a lot better for you.

-Duke

November 14, 2012

Risk Management: Securing Your Servers

How do you secure your home when you leave? If you're like most people, you make sure to lock the door you leave from, and you head off to your destination. If Phil is right about "locks keeping honest people honest," simply locking your front door may not be enough. When my family moved into a new house recently, we evaluated its physical security and tried to determine possible avenues of attack (garage, doors, windows, etc.), tools that could be used (a stolen key, a brick, a crowbar, etc.) and ways to mitigate the risk of each kind of attack ... We were effectively creating a risk management plan.

Every risk has a different probability of occurrence, potential damage, and prevention cost, and the risk management process helps us balance the costs and benefits of various security methods. When it comes to securing a home, the most effective protection comes from using layers of different methods ... To prevent a home invasion, you might lock your door, train your dog to turn intruders into chew toys and have an alarm system installed. Even if an attacker can get a key to the house and bring some leftover steaks to appease the dog, the motion detectors for the alarm are going to have the police on their way quickly. (Or you could violate every HOA regulation known to man by digging a moat around the house, filling it with sharks with laser beams attached to their heads, and building a medieval drawbridge over the moat.)

I use the example of securing a house because it's usually a little more accessible than talking about "server security." Server security doesn't have to be overly complex or difficult to implement, but its stigma of complexity usually prevents systems administrators from incorporating even the simplest of security measures. Let's take a look at the easiest steps to begin securing your servers in the context of their home security parallels, and you'll see what I'm talking about.

Keep "Bad People" Out: Have secure password requirements.

Passwords are your keys and your locks — the controls you put into place that ensure that only the people who should have access get it. There's no "catch all" method of keeping the bad people out of your systems, but employing a variety of authentication and identification measures can greatly enhance the security of your systems. A first line of defense for server security would be to set password complexity and minimum/maximum password age requirements.
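
As a toy illustration of the kind of complexity rules that last sentence refers to, here's a minimal check in Python; in practice you would enforce this through PAM, Active Directory, or your identity provider rather than ad hoc code, and password age requirements would live there too:

```python
import re

def meets_policy(password, min_length=12):
    """Require minimum length plus upper, lower, digit, and symbol characters."""
    return all([
        len(password) >= min_length,
        re.search(r"[A-Z]", password),
        re.search(r"[a-z]", password),
        re.search(r"\d", password),
        re.search(r"[^A-Za-z0-9]", password),
    ])

print(meets_policy("Correct-Horse-Battery-9"))  # True
print(meets_policy("password1"))                # False
```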

If you want to add an additional layer of security at the authentication level, you can incorporate "Strong" or "Two-Factor" authentication. From there, you can learn about a dizzying array of authentication protocols (like TACACS+ and RADIUS) to centralize access control or you can use active directory groups to simplify the process of granting and/or restricting access to your systems. Each layer of authentication security has benefits and drawbacks, and most often, you'll want to weigh the security risk against your need for ease-of-use and availability as you plan your implementation.

Stay Current on your "Good People": When authorized users leave, make sure their access to your system leaves with them.

If you gave your neighbor a key to your tool shed while he finished his renovation, you need to take that key back when you tell him he can't borrow any more tools. If you don't, nothing stops him from walking over to the shed when you're not looking and taking more (all?) of your tools. I know it seems like a silly example, but that kind of lingering access is a big oversight when it comes to server security.

Employees are granted access to perform their duties (the principle of least privilege), and when they no longer require access, the "keys to the castle" should be revoked. Auditing who has access to what (whether it be for your systems or for your applications) should be continual.

You might have processes in place to grant and remove access, but it's also important to audit those privileges regularly to catch any breakdowns or oversights. The last thing you want is to have a disgruntled former employee wreak all sorts of havoc on your key systems, sell proprietary information or otherwise cost you revenue, fines, recovery efforts or lost reputation.

Catch Attackers: Monitor your systems closely and set up alerts if an intrusion is detected.

There is always a chance that bad people are going to keep looking for a way to get into your house. Maybe they'll walk around the house to try and open the doors and windows you don't use very often. Maybe they'll ring the doorbell and if no lights turn on, they'll break a window and get in that way.

You can never completely eliminate all risk. Security is a continual process, and eventually some determined, over-caffeinated hacker is going to find a way in. Thinking your security is impenetrable makes you vulnerable if by some stretch of the imagination, an attacker breaches your security (see: Trojan Horse). Continuous monitoring strategies can alert administrators if someone does things they shouldn't be doing. Think of it as a motion detector in your house ... "If someone gets in, I want to know where they are." When you implement monitoring, logging and alerting, you will also be able to recover more quickly from security breaches because every file accessed will be documented.

Minimize the Damage: Lock down your system if it is breached.

A burglar smashes through your living room window, runs directly to your DVD collection, and takes your limited edition "Saved by the Bell" series box set. What can you do to prevent him from running back into the house to grab the autographed poster of ALF off of your wall?

When you're monitoring your servers and you get alerted to malicious activity, you're already late to the game ... The damage has already started, and you need to minimize it. In a home security environment, that might involve an ear-piercing alarm or filling the moat around your house even higher so the sharks get a better angle to aim their laser beams. File integrity monitors and IDS software can mitigate damage in a security breach by reverting files when checksums don't match or stopping malicious behavior in its tracks.
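
To show the idea behind that checksum comparison, here's a minimal sketch of a do-it-yourself file integrity check; the watched paths are placeholders, and production environments typically use purpose-built tools such as AIDE, Tripwire, or OSSEC rather than a homegrown script:

```python
import hashlib
import json
from pathlib import Path

WATCHED = ["/etc/passwd", "/etc/ssh/sshd_config"]   # placeholder list of critical files
BASELINE = Path("/var/lib/integrity/baseline.json")

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def build_baseline():
    # Record known-good checksums once, while the system is trusted.
    BASELINE.parent.mkdir(parents=True, exist_ok=True)
    BASELINE.write_text(json.dumps({p: sha256(p) for p in WATCHED}))

def check():
    # Compare current checksums against the baseline and flag any drift.
    baseline = json.loads(BASELINE.read_text())
    for path, expected in baseline.items():
        if sha256(path) != expected:
            print(f"ALERT: checksum mismatch on {path}")  # hook your alerting here

if __name__ == "__main__":
    check() if BASELINE.exists() else build_baseline()
```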

These recommendations are only a few of the first-line layers of defense when it comes to server security. Even if you're only able to incorporate one or two of these tips into your environment, you should. When you look at server security in terms of a journey rather than a destination, you can celebrate the progress you make and look forward to the next steps down the road.

Now if you'll excuse me, I have to go to a meeting where I'm proposing moats, drawbridges, and sharks with laser beams on their heads to SamF for data center security ... Wish me luck!

-Matthew

October 16, 2012

An Introduction to Risk Management

Whether you're managing a SaaS solution for thousands of large clients around the world or you're running a small mail server for a few mom-and-pop businesses in your neighborhood, you're providing IT service for a fee — and your customers expect you to deliver. It's easy to get caught up in focusing your attention and energy on day-to-day operations, and in doing so, you might neglect some of the looming risks that threaten the continuity of your business. You need to prioritize risk assessment and management.

Just reading that you need to invest in "Risk Management" probably makes you shudder. Admittedly, when a business owner has to start quantifying and qualifying potential areas of business risk, the process can seem daunting and full of questions ... "What kinds of risks should I be concerned with?" "Once I find a potential risk, should I mitigate it? Avoid it? Accept it?" "How much do I need to spend on risk management?"

When it comes to risk management in hosting, the biggest topics are information security, backups and disaster recovery. While those general topics are common, each business's needs will differ greatly in each area. Because risk management isn't a very "cookie-cutter" process, it's intimidating. It's important to understand that protecting your business from risks isn't a destination ... it's a journey, and whatever you do, you'll be better off than you were before you did it.

Because there's not a "100% Complete" moment in the process of risk management, some people think it's futile — a gross waste of time and resources. History would suggest that risk management can save companies millions of dollars, and that's just when you look at failures. You don't see headlines when businesses effectively protect themselves from attempted hacks or when sites automatically fail over to a new server after a hardware failure.

It's unfortunate how often confidential customer data is unintentionally released by employees or breached by malicious attackers, especially because those incidents are often so easily preventable. When you understand the potential risk of your business's confidential data ending up in the wrong hands (whether those of malicious attackers or careless employees), you'll usually take action to avoid quantifiable losses like monetary fines and unquantifiable ones like damage to your reputation.

More and more, regulations are being put in place to hold companies accountable for protecting their sensitive information. In the healthcare industry, businesses have to meet the strict Health Insurance Portability and Accountability Act (HIPAA) regulations. Sites that accept credit card payments online are required to maintain Payment Card Industry (PCI) compliance. Data centers will spend hours (and hours and hours) achieving and maintaining their SSAE 16 certification. These rules and requirements are not arbitrarily designed to be restrictive (though they can feel that way sometimes) ... They are based on best practices that ultimately protect businesses in those industries from risks common across the respective industry.

Over the coming months, I'll discuss ways that you as a SoftLayer customer can mitigate and manage your risk. We'll talk about security and backup plans that will incrementally protect your business and your customers. While we won't get to the destination of 100% risk-mitigated operations, we'll get you walking down the path of continuous risk assessment, identification and mitigation.

Stay tuned!

-Matthew
