Development Posts

August 29, 2013

HTML5 Tips and Tricks - Local Storage

As I'm sure you've heard by now: HTML5 is all the rage. People are creating amazing games with canvases, media interactivity with embeds and mobile/responsive sites with viewports. We've come a long way since 1990s iFrames! In this blog, I wanted to introduce you to an HTML5 tool that you might find useful: Local Web Storage — quite possibly the holy grail of web development!

In the past (and still most of the present), web sites store information about a surfer's preferences/choices via cookies. With that information, a site can be customized for a specific user, and that customization makes for a much better user experience. For example, you might select your preferred language when you first visit a website, and when you return, you won't have to make that selection again. You see similar functionality at work when you select themes/colors on a site or when you enlist help from one of those "remember me" checkboxes next to where you log into an account. The functionality that cookies enable is extremely valuable, but it's often inefficient.

You might be aware of some of the drawbacks to using cookies (such as the 4KB size limit and privacy issues with unencrypted cookies), but I believe the most significant problem with cookies is overhead. Even if you limit your site to just a few small cookies per user, as your userbase grows into the thousands and tens of thousands, you'll notice that you're transferring a LOT of data over HTTP (and those bandwidth bills might start adding up). Cookies are stored on the user's computer, so every time that user visits your domain, the browser is transferring cookies to your server with every HTTP request. The file size for each of these transactions is tiny, but at scale, it can feel like death by a thousand cuts.

Enter HTML5 and local storage.

Rather than having to transmit data (cookies) to a remote web server, HTML5 allows a site to store information within the client web browser. The information you need to customize your user's experience doesn't have to travel from the user's hard drive to your server because the customization is stored in (and applied by) the user's browser. And because data in local storage isn't sent with every HTTP request like it is with cookies, the capacity of local storage is a whopping 5MB per domain (though I wouldn't recommend pushing that limit).

Let's check out how easy it is to use HTML5's local storage with JavaScript:

<script type="text/javascript">
    localStorage.setItem('preferredLanguage', 'EN');
</script>

Boom! We just set our first variable. Once that variable has been set in local storage for a given user, that user can close his or her browser and return later, and we can still retrieve the stored value on our site:

<script type="text/javascript">
    var preferredLanguage = localStorage.getItem('preferredLanguage');
</script>

After all of the lead-up in this post, you're probably surprised by the simplicity of the actual coding, but that simplicity is one of the biggest reasons HTML5 local storage is such an amazing tool to use. We set our user's preferred language in local storage and retrieved it from local storage with a few simple lines. Local storage doesn't have a built-in "expiration" mechanism the way cookies do, but if you want an entry to expire, you can script that behavior yourself by removing the entry when you say the time's up:

<script type="text/javascript">
    localStorage.removeItem('preferredLanguage');
</script>
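
To round that out, here's a minimal sketch of cookie-style expiration that stores a timestamp alongside the value and evicts both once your chosen lifetime has passed (the key names and the 30-day TTL are just for illustration):

<script type="text/javascript">
    // Save the value along with the time it was stored
    var ttlInMs = 30 * 24 * 60 * 60 * 1000; // 30 days
    localStorage.setItem('preferredLanguage', 'EN');
    localStorage.setItem('preferredLanguageSavedAt', Date.now());
 
    // On a later visit, remove the entry if the time's up
    var savedAt = Number(localStorage.getItem('preferredLanguageSavedAt'));
    if (savedAt && (Date.now() - savedAt) > ttlInMs) {
        localStorage.removeItem('preferredLanguage');
        localStorage.removeItem('preferredLanguageSavedAt');
    }
</script>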

If we stopped here, you'd have a solid fundamental understanding of how HTML5 local storage works, but I want to take you beyond the standard functionality of local storage. You see, local storage is intended primarily to store only strings, so if we wanted to store an object, we'd be out of luck ... until we realized that developers can find workarounds for everything!

Using some handy JSON, we can stringify and parse any object we want to store in local storage:

<script type="text/javascript">
    var user = {};
    user.name = 'CWolff';
    user.job = 'Software Engineer II';
    user.rating = 'Awesome';
 
    //If we were to stop here, the entry would only read as [object Object] when we try to retrieve it, so we stringify with JSON!
    localStorage.setItem('user', JSON.stringify(user));
 
    //Retrieve the object and assign it to a variable
    var getVar = JSON.parse(localStorage.getItem('user'));
 
    //We now have our object in a variable that we can play with, let's try it out
    alert(getVar.name);
    alert(getVar.job);
    alert(getVar.rating);
</script>

If you guys have read any of my other blogs, you know that I tend to write several blogs in a series before I move on to the next big topic, and this won't be an exception. Local storage is just the tip of the iceberg of what HTML5 can do, so buckle up and get ready to learn more about the crazy features and functionality of this next-generation language.

Try local storage for yourself ... And save yourself from the major headache of trying to figure out where all of your bandwidth is going!

-Cassandra

July 16, 2013

Riak Performance Analysis: Bare Metal v. Virtual

In December, I posted a MongoDB performance analysis that showed the quantitative benefits of using bare metal servers for MongoDB workloads. It should come as no surprise that in the wake of SoftLayer's Riak launch, we've got some similar data to share about running Riak on bare metal.

To run this test, we started by creating five-node clusters with Riak 1.3.1 on SoftLayer bare metal servers and on a popular competitor's public cloud instances. For the SoftLayer environment, we created these clusters using the Riak Solution Designer, so the nodes were all provisioned, configured and clustered for us automatically when we ordered them. For the public cloud virtual instance Riak cluster, each node was provisioned individually using a Riak image template and manually configured into a cluster after all had come online. To optimize for Riak performance, I made a few tweaks at the OS level of our servers (running CentOS 64-bit):

noatime
nodiratime
barrier=0
data=writeback
ulimit -n 65536

The common noatime and nodiratime settings eliminate the metadata writes that normally accompany reads, which helps performance and reduces disk wear. The barrier and writeback settings are a little less common and may not be what you'd normally set. Although those settings present a very slight risk of data loss on disk failure, remember that the Riak solution is deployed in five-node rings with data redundantly available across multiple nodes in the ring. With that in mind, and considering that each node is also deployed with a RAID10 storage array, the minor risk of losing data on the failure of a single disk would have no impact on the overall data set (as there are plenty of redundant copies of that data available). Given the minor risk involved, the performance increases of those two settings justify their use.
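
For reference, here's roughly how the first four of those settings might look as mount options in /etc/fstab (the device and mount point below are placeholders; adjust them for your own layout):

/dev/sda3    /var/lib/riak    ext4    noatime,nodiratime,barrier=0,data=writeback    0 0

The last setting, ulimit -n 65536, raises the open file descriptor limit, which matters because Riak can hold a large number of files open under load.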

With all of the nodes tweaked and configured into clusters, we set up Basho's test harness — Basho Bench — to remotely simulate load on the deployments. Basho Bench allows you to create a configurable test plan for a Riak cluster by configuring a number of workers to utilize a driver type to generate load. It comes packaged as an Erlang application with a config file example that you can alter to create the specifics for the concurrency, data set size, and duration of your tests. The results can be viewed as CSV data, and there is an optional graphics package that allows you to generate the graphs that I am posting in this blog. A simplified graphic of our test environment would look like this:

Riak Test Environment

The following Basho Bench config is what we used for our testing:

{mode, max}.
{duration, 120}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator,{int_to_bin,{uniform_int,1000000}}}.
{value_generator,{exponential_bin,4098,50000}}.
{riakc_pb_ips, [{10,60,68,9},{10,40,117,89},{10,80,64,4},{10,80,64,8},{10,60,68,7}]}.
{riakc_pb_replies, 2}.
{operations, [{get, 10},{put, 1}]}.

To spell it out a little more simply:

Tests Performed

Data Set: 400GB
10:1 Query-to-Update Operations
8 Concurrent Client Connections
Test Duration: 2 Hours

You may notice that in the test cases that use SoftLayer "Medium" Servers, the virtual provider nodes are running 26 virtual compute units against our dual-proc, hex-core servers (12 cores total). In testing with Riak, memory is more important to the operations than CPU resources, so we provisioned the virtual instances to align with the 36GB of memory in each of the "Medium" SoftLayer servers. In the public cloud environment, the higher level of RAM was restricted to packages with higher CPU counts, so while the CPU counts differ, the RAM amounts are as close to even as we could make them.

One final "housekeeping" note before we dive into the results: The graphs below are pulled directly from the optional graphics package that displays Basho Bench results. You'll notice that the scale on the left-hand side of graphs differs dramatically between the two environments, so a cursory look at the results might not tell the whole story. Click any of the graphs below for a larger version. At the end of each test case, we'll share a few observations about the operations per second and latency results from each test. When we talk about latency in the "key observation" sections, we'll talk about the 99th percentile line — 99% of the results had latency below this line. More simply you could say, "This is the highest latency we saw on this platform in this test." The primary reason we're focusing on this line is because it's much easier to read on the graphs than the mean/median lines in the bottom graphs.

Riak Test 1: "Small" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Small Riak Server Node
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
4 x 500GB SATAII – RAID10
1Gb Bonded Network
Virtual Provider Node
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
4 x 500GB Network Storage – RAID10
1Gb Network
 

Results

[Graphs: Riak performance results (operations per second and latency)]

Key Observations

The SoftLayer environment showed much more consistency in operations per second, with an average throughput around 450 Op/sec. The virtual environment's throughput varied significantly, from about 50 operations per second to more than 600 operations per second, with the trend line fluctuating between about 220 Op/sec and 350 Op/sec.

Comparing the latency of get and put requests, the 99th percentile of results in the SoftLayer environment stayed around 50ms for gets and under 200ms for puts, while the same metric for the virtual environment hovered around 800ms for gets and 4000ms for puts. The scale of the graphs is drastically different, so if you aren't looking closely, you won't see how significantly the performance varies between the two.

Riak Test 2: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 300GB 15K SAS – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

[Graphs: Riak performance results (operations per second and latency)]

Key Observations

Similar to the results of Test 1, the throughput numbers from the bare metal environment are more consistent (and are consistently higher) than the throughput results from the virtual instance environment. The SoftLayer environment performed between 1500 and 1750 operations per second on average while the virtual provider environment averaged around 1200 operations per second throughout the test.

The latency of get and put requests in Test 2 also paints a similar picture to Test 1. The 99th percentile of results in the SoftLayer environment stayed below 50ms for gets and under 400ms for puts, while the same metric for the virtual environment averaged about 250ms for gets and over 1000ms for puts. Latency in a big data application can be a killer, so the results from the virtual provider might be setting off alarm bells in your head.

Riak Test 3: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 128GB SSD – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

[Graphs: Riak performance results (operations per second and latency)]

Key Observations

In Test 3, we're using the same specs in our virtual provider nodes, so the results for the virtual node environment are the same in Test 3 as they were in Test 2. In this test, the SoftLayer environment substitutes SSDs for the 15K SAS drives used in Test 2, and the throughput numbers show the impact of that improved I/O. The average throughput of the bare metal environment with SSDs is between 1750 and 2000 operations per second. Those numbers are slightly higher than the SoftLayer environment in Test 2, further distancing the bare metal results from the virtual provider results.

The latency of gets for the SoftLayer environment is very difficult to see in this graph because the latency was so low throughout the test. The 99th percentile of puts in the SoftLayer environment settled between 500ms and 625ms, which was a little higher than the bare metal results from Test 2 but still well below the latency from the virtual environment.

Summary

The results show that — similar to the majority of data-centric applications that we have tested — Riak has more consistent, better performing, and lower latency results when deployed onto bare metal instead of a cluster of public cloud instances. The stark differences in consistency of the results and the latency are noteworthy for developers looking to host their big data applications. We compared the 99th percentile of latency, but the mean/median results are worth checking out as well. Look at the mean and median results from the SoftLayer SSD Node environment: For gets, the mean latency was 2.5ms and the median was somewhere around 1ms. For puts, the mean was between 7.5ms and 11ms and the median was around 5ms. Those kinds of results are almost unbelievable (and that's why I've shared everything involved in completing this test so that you can try it yourself and see that there's no funny business going on).

It's commonly understood that local, single-tenant resources like bare metal will always perform better than network storage resources, but putting some concrete numbers on paper shows just how amazing the difference in performance really is. Virtualizing on multi-tenant solutions with network-attached storage often introduces latency issues, and performance will vary significantly depending on host load. These results may seem obvious, but sometimes the promise of quick and easy deployments on public cloud environments can lure even the sanest and most rational developer. Some applications are suited for public cloud, but big data isn't one of them. When you have data-centric apps that require extreme I/O traffic to your storage medium, nothing beats local, high-performance resources.

-Harold

July 10, 2013

The Importance of Providing Startups a Sandbox

With the global economy in its current state, it's more important than ever to help inspired value-creators acquire the tools needed to realize their ideas, effect change in the world, and create impact — now. I've had the privilege of working with hundreds of young, innovative companies through Catalyst and our relationships with startup accelerators, incubators and competitions, and I've noticed that the best way for entrepreneurs to create change is to simply let them play! Stick them in a sandbox with a wide variety of free products and services that they can use however they want so that they may find the best method of transitioning from idea to action.

Any attention that entrepreneurs divert from their core business ideas is wasted attention, so the most successful startup accelerators build a bridge for entrepreneurs to the resources they need — from access to hosting services, investors, mentors and corporate partners to recommendations about summer interns and patent attorneys. That all sounds good in theory, and while it's extremely difficult to bring to reality, startup-focused organizations like MassChallenge make it look easy.

During a recent trip to Boston, I was chatting with Kara Shurmantine and Jibran Malek about what goes on behind the scenes to truly empower startups and entrepreneurs, and they gave me some insight. Startups' needs are constantly shifting, changing and evolving, so MassChallenge prioritizes providing a sandbox chock-full of the best tools and toys to help make life easier for their participants ... and that's where SoftLayer helps. With Kara and Jibran, I got in touch with a few MassChallenge winners to get some insight into their experience from the startup side.

Tish Scolnik, the CEO of Global Research Innovation & Technology (GRIT), described the MassChallenge experience perfectly: "You walk in and you have all these amazing opportunities in front of you, and then in a pretty low pressure environment you can decide what you need at a specific moment." Tish calls it a "buffet table" — an array of delectable opportunities, some combination of which will be the building blocks of a startup's growth curve. Getting SoftLayer products and services for free (along with a plethora of other valuable resources) has helped GRIT create a cutting-edge wheelchair for disabled people in developing countries.

The team from Neumitra, a Silver Winner of MassChallenge 2012, chose to use SoftLayer as an infrastructure partner, and we asked co-founder Rob Goldberg about his experience. He explained that his team valued the ability to choose tools that fit their ever-changing and evolving needs. Neumitra set out to battle stress — the stress you feel every day — and they've garnered significant attention while doing so. With a wearable watch, Neumitra's app tells you when your stress levels are too high and you need to take a break.

Jordan Fliegel, the founder and CEO of CoachUp, another MassChallenge winner, also benefited from playing around in the sandbox. This environment, he says, is constantly "giving to you and giving to you and giving to you without asking for anything in return other than that you work hard and create a company that makes a difference." The result? CoachUp employs 20 people, has recruited thousands of coaches, and has raised millions in funding — and is growing at breakneck speed.

If you give inspired individuals a chance and then give them not only the resources that they need, but also a diverse range of resources that they could need, you are guaranteed to help create global impact.

In short: Provide a sandbox. Change the world.

-@KelleyHilborn

July 9, 2013

When to Consider Riak for Your Big Data Architecture

In my Breaking Down 'Big Data' – Database Models post, I briefly covered the most common database models, their strengths, and how they handle the CAP theorem — how a distributed storage system balances the demands of consistency and availability while maintaining partition tolerance. Here's what I said about Dynamo-inspired databases:

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

This type of key/value store architecture is very different from the document-oriented MongoDB solutions we launched at the end of last year, so we worked with Basho to prioritize development of high-performance Riak solutions on our global platform. Since you already know about MongoDB, let's take a few minutes to meet the new kid on the block.

Riak is a distributed database architected for availability, fault tolerance, operational simplicity and scalability. Riak is masterless, so each node in a Riak cluster is the same and contains a complete, independent copy of the Riak package. This design makes the Riak environment highly fault tolerant and scalable, and it also aids in replication — if a node goes down, you can still read, write and update data.
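
To make the key/value model a little more concrete, here's a minimal sketch of storing and fetching a record with Basho's Python client. The host, port and bucket/key names below are placeholder assumptions for a local test node, and the exact API may vary slightly by client version:

import riak

# Connect to a single node over protocol buffers
# (host and port below are placeholders for your own cluster)
client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)

# Buckets act as namespaces for keys
bucket = client.bucket('user_profiles')

# Store a JSON-serializable value under a key ...
profile = bucket.new('user:1001', data={'name': 'CWolff', 'rating': 'Awesome'})
profile.store()

# ... and read it back from the cluster
fetched = bucket.get('user:1001')
print(fetched.data)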

As you approach the daunting prospect of choosing a big data architecture, there are a few simple questions you need to answer:

  1. How much data do/will I have?
  2. In what format am I storing my data?
  3. How important is my data?

Riak may be the choice for you if [1] you're working with more than three terabytes of data, [2] your data is stored in multiple data formats, and [3] your data must always be available. What does that kind of need look like in real life, though? Luckily, we've had a number of customers kick Riak's tires on SoftLayer bare metal servers, so I can share a few of the use cases we've seen that have benefited significantly from Riak's unique architecture.

Use Case 1 – Digital Media
An advertising company that serves over 10 billion ads per month must be able to quickly deliver its content to millions of end users around the world. Meeting that demand with relational databases would require a complex configuration of expensive, vertically scaled hardware, but it can be scaled out horizontally much more easily with Riak. In a matter of only a few hours, the company is up and running with an ad-serving infrastructure that includes a back-end Riak cluster in Dallas with a replication cluster in Singapore, along with an application tier on the front end with Web servers, load balancers and CDN.

Use Case 2 – E-commerce
An e-commerce company needs 100-percent availability. If any part of a customer's experience fails, whether it be on the website or in the shopping cart, sales are lost. Riak's fault tolerance is a big draw for this kind of use case: Even if one node or component fails, the company's data is still accessible, and the customer's user experience is uninterrupted. The shopping cart structure is critical, and Riak is built to be available ... It's a perfect match.

As an additional safeguard, the company can take advantage of simple multi-datacenter replication in their Riak Enterprise environment to geographically disperse content closer to its customers (while also serving as an important tool for disaster recovery and backup).

Use Case 3 – Gaming
With customers like Broken Bulb and Peak Games, SoftLayer is no stranger to the gaming industry, so it should come as no surprise that we've seen interesting use cases for Riak from some of our gaming customers. When a game developer incorporated Riak into a new game to store player data like user profiles, statistics and rankings, the performance of the bare metal infrastructure blew him away. As a result, the game's infrastructure was redesigned to also pull gaming content like images, videos and sounds from the Riak database cluster. Since the environment is so easy to scale horizontally, the process on the infrastructure side took no time at all, and the multimedia content in the game is getting served as quickly as the player data.

Databases are common bottlenecks for many applications, but they don't have to be. Making the transition from scaling vertically (upgrading hardware, adding RAM, etc.) to scaling horizontally (spreading the work intelligently across multiple nodes) alleviates many of the pain points for a quickly growing database environment. Have you made that transition? If not, what's holding you back? Have you considered implementing Riak?

-@marcalanjones

May 10, 2013

Understanding and Implementing Coding Standards

Coding standards provide a consistent framework for development within a project and across projects in an organization. A dozen programmers can complete a simple project in a dozen different ways by using unique coding methodologies and styles, so I like to think of coding standards as the "rules of the road" for developers.

When you're driving in a car, traffic is controlled by "standards" such as lanes, stoplights, yield signs and laws that set expectations around how you should drive. When you take a road trip to a different state, the stoplights might be hung horizontally instead of vertically or you'll see subtle variations in signage, but because you're familiar with the rules of the road, you're comfortable with the mechanics of driving in this new place. Coding standards help control development traffic and provide the consistency programmers need to work comfortably with a team across projects. The problem with allowing developers to apply their own unique coding styles to a project is the same as allowing drivers to drive as they wish ... Confusion about lane usage, safe passage through intersections and speed would result in collisions and bottlenecks.

Coding standards often seem restrictive or laborious when a development team starts considering their adoption, but they don't have to be ... They can be implemented methodically to improve the team's efficiency and consistency over time, and they can be as simple as establishing that all instantiations of an object must be referenced with a variable name that begins with a capital letter:

$User = new User();

While that example may seem overly simplistic, it actually makes life a lot easier for all of the developers on a given project. Regardless of who created that variable, every other developer can see the difference between a variable that holds data and one that instantiates an object. Think about the shapes of signs you encounter while driving ... You know what a stop sign looks like without reading the word "STOP" on it, so when you see a red octagon (in the United States, at least), you know what to do when you approach it in your car. In the same way, seeing a capitalized variable name immediately tells us about its function.

The example I gave of capitalizing instantiated objects is just that: an example. When it comes to coding standards, the most effective rules your team can incorporate are the ones that make the most sense to you. While there are a few best practices in terms of formatting and commenting in code, the most important characteristics of coding standards for a given team are consistency and clarity.

So how do you go about creating a coding standard? Most developers dislike doing unnecessary work, so the easiest way to create a coding standard is to use an already-existing one. Take a look at any libraries or frameworks you are using in your current project. Do they use any coding standards? Are those coding standards something you can live with or use as a starting point? You are free to make any changes you wish in order to best facilitate your team's needs, and you can even set how strictly specific coding standards must be adhered to. Take, for example, left-hand comparisons:

if ( $a == 12 ) {} // right-hand comparison
if ( 12 == $a ) {} // left-hand comparison

Both of these statements are valid but one may be preferred over the other. Consider the following statements:

if ( $a = 12 ) {} // supposed to be a right-hand comparison but is now an assignment
if ( 12 = $a ) {} // supposed to be a left-hand comparison but is now an assignment

The first statement will now evaluate to true because $a is assigned the value of 12, which will then cause the code within the if-statement to execute (not the desired result). The second statement will cause an error, making it obvious that a mistake in the code has occurred. Because our team couldn't come to a consensus, we decided to allow both of these standards ... Either of these two formats is acceptable and will pass code review, but they are the only two acceptable variants. Code that deviates from those two formats would fail code review and would not be allowed in the code base.

Coding standards play an important role in efficient development of a project when you have several programmers working on the same code. By adopting coding standards and following them, you'll avoid a free-for-all in your code base, and you'll be able to look at every line of code and know more about what that line is telling you than what the literal code is telling you ... just like seeing a red octagon posted on the side of the road at an intersection.

-@SoftLayerDevs

April 29, 2013

Web Development - Installing mod_security with OWASP

You want to secure your web application, but you don't know where to start. A number of open-source resources and modules exist, but that variety is more intimidating than it is liberating. If you're going to take the time to implement application security, you don't want to put your eggs in the wrong basket, so you wind up suffering from analysis paralysis as you compare all of the options. You want a powerful, flexible security solution that isn't overly complex, so to save you the headache of making the decision, I'll make it for you: Start with mod_security and OWASP.

ModSecurity (mod_security) is an open-source Apache module that acts as a web application firewall. It is used to help protect your server (and websites) from several methods of attack, the most common being brute force. You can think of mod_security as an invisible layer that separates users from the content on your server, quietly monitoring HTTP traffic and other interactions. It's easy to understand and simple to implement.

The challenge is that without some advanced configuration, mod_security isn't very functional, and that advanced configuration can get complex pretty quickly. You need to determine and set additional rules so that mod_security knows how to respond when approached with a potential threat. That's where the Open Web Application Security Project (OWASP) comes in. You can think of the OWASP core rule set as an enhanced ruleset that the mod_security module will follow to prevent attacks on your server.

The process of getting started with mod_security and OWASP might seem like a lot of work, but it's actually quite simple. Let's look at the installation and configuration process in a CentOS environment. First, we want to install the dependencies that mod_security needs:

## Install the GCC compiler and mod_security dependencies ##
$ sudo yum install gcc make
$ sudo yum install libxml2 libxml2-devel httpd-devel pcre-devel curl-devel

Now that we have the dependencies in place, let's install mod_security. Unfortunately, there is no yum package for mod_security because it is not a maintained package, so you'll have to install it directly from the source:

## Get mod_security from its source ##
$ cd /usr/src
$ git clone https://github.com/SpiderLabs/ModSecurity.git

Now that we have mod_security on our server, we'll install it:

## Install mod_security ##
$ cd ModSecurity
$ ./configure
$ make install

And we'll copy over the default mod_security configuration file into the necessary Apache directory:

## Copy configuration file ##
$ cp modsecurity.conf-recommended /etc/httpd/conf.d/modsecurity.conf

We've got mod_security installed now, so we need to tell Apache about it ... It's no use having mod_security installed if our server doesn't know it's supposed to be using it:

## Apache configuration for mod_security ##
$ vi /etc/httpd/conf/httpd.conf

In that Apache config file, we'll load our dependencies (BEFORE the mod_security module) and then the mod_security module itself:

## Load dependencies ##
LoadFile /usr/lib/libxml2.so
LoadFile /usr/lib/liblua5.1.so
## Load mod_security ##
LoadModule security2_module modules/mod_security2.so

We'll save our configuration changes and restart Apache:

## Restart Apache! ##
$ sudo /etc/init.d/httpd restart

As I mentioned at the top of this post, our installation of mod_security is good, but we want to enhance our ruleset with the help of OWASP. If you've made it this far, you won't have a problem following a similar process to install OWASP:

## OWASP ##
$ cd /etc/httpd/
$ git clone https://github.com/SpiderLabs/owasp-modsecurity-crs.git
$ mv owasp-modsecurity-crs modsecurity-crs

Just like with mod_security, we'll set up our configuration file:

## OWASP configuration file ##
$ cd modsecurity-crs
$ cp modsecurity_crs_10_setup.conf.example modsecurity_crs_10_config.conf

Now we have mod_security and the OWASP core ruleset ready to go! The last step we need to take is to update the Apache config file to set up our basic ruleset:

## Apache configuration ##
$ vi /etc/httpd/conf/httpd.conf

We'll add an IfModule and point it to our new OWASP rule set at the end of the file:

<IfModule security2_module>
    Include modsecurity-crs/modsecurity_crs_10_config.conf
    Include modsecurity-crs/base_rules/*.conf
</IfModule>

And to complete the installation, we save the config file and restart Apache:

## Restart Apache! ##
$ sudo /etc/init.d/httpd restart

And we've got mod_security installed with the OWASP core ruleset! With this default installation, we're leveraging the rules the OWASP open source community has come up with, and we have the flexibility to tweak and enhance those rules as our needs dictate. If you have any questions about this installation or you have any other technical blog topics you'd like to hear from us about, please let us know!

-Cassandra

April 16, 2013

iptables Tips and Tricks - Track Bandwidth with iptables

As I mentioned in my last post about CSF configuration in iptables, I'm working on a follow-up post about integrating CSF into cPanel, but I thought I'd inject a simple iptables use case for bandwidth tracking. You probably think about iptables in terms of firewalls and security, but it also includes a great diagnostic tool for counting bandwidth for individual rules or sets of rules. If you can block it, you can track it!

The best part about using iptables to track bandwidth is that the tracking is enabled by default. To see this feature in action, add "-v" to the command:

[root@server ~]$ iptables -vnL
Chain INPUT (policy ACCEPT 2495 packets, 104K bytes)

The output includes counters for both the policies and the rules. To track the rules, you can create a new chain for tracking bandwidth:

[root@server ~]$ iptables -N tracking
[root@server ~]$ iptables -vnL
...
Chain tracking (0 references)
 pkts bytes target prot opt in out source           destination

Then you need to set up new rules to match the traffic that you wish to track. In this scenario, let's look at inbound http traffic on port 80:

[root@server ~]$ iptables -I INPUT -p tcp --dport 80 -j tracking
[root@server ~]$ iptables -vnL
Chain INPUT (policy ACCEPT 35111 packets, 1490K bytes)
 pkts bytes target prot opt in out source           destination
    0   0 tracking    tcp  --  *  *   0.0.0.0/0        0.0.0.0/0       tcp dpt:80

Now let's generate some traffic and check it again:

[root@server ~]$ iptables -vnL
Chain INPUT (policy ACCEPT 35216 packets, 1500K bytes)
 pkts bytes target prot opt in out source           destination
  101  9013 tracking    tcp  --  *  *   0.0.0.0/0        0.0.0.0/0       tcp dpt:80

You can see the packet and byte counts for the traffic being tracked on INPUT — traffic to a destination port on your server. If you want to track the amount of data that the server is generating, you'd look for OUTPUT from the source port on your server:

[root@server ~]$ iptables -I OUTPUT -p tcp --sport 80 -j tracking
[root@server ~]$ iptables -vnL
...
Chain OUTPUT (policy ACCEPT 26149 packets, 174M bytes)
 pkts bytes target prot opt in out source           destination
  488 3367K tracking    tcp  --  *  *   0.0.0.0/0        0.0.0.0/0       tcp spt:80

Now that we know how the tracking chain works, we can add in a few different layers to get even more information. That way you can keep your INPUT and OUTPUT chains looking clean.

[root@server ~]$ iptables -N tracking
[root@server ~]$ iptables -N tracking2
[root@server ~]$ iptables -I INPUT -j tracking
[root@server ~]$ iptables -I OUTPUT -j tracking
[root@server ~]$ iptables -A tracking -p tcp --dport 80 -j tracking2
[root@server ~]$ iptables -A tracking -p tcp --sport 80 -j tracking2
[root@server ~]$ iptables -vnL
 
Chain INPUT (policy ACCEPT 96265 packets, 4131K bytes)
 pkts bytes target prot opt in out source           destination
 4002  184K tracking    all  --  *  *   0.0.0.0/0        0.0.0.0/0
 
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source           destination
 
Chain OUTPUT (policy ACCEPT 33751 packets, 231M bytes)
 pkts bytes target prot opt in out source           destination
 1399 9068K tracking    all  --  *  *   0.0.0.0/0        0.0.0.0/0
 
Chain tracking (2 references)
 pkts bytes target prot opt in out source           destination
 1208 59626 tracking2   tcp  --  *  *   0.0.0.0/0        0.0.0.0/0       tcp dpt:80
  224 1643K tracking2   tcp  --  *  *   0.0.0.0/0        0.0.0.0/0       tcp spt:80
 
Chain tracking2 (2 references)
 pkts bytes target prot opt in out source           destination

Keep in mind that every time a packet passes through one of your rules, it will eat CPU cycles. Diverting all your traffic through 100 rules that track bandwidth may not be the best idea, so it's important to have an efficient ruleset. If your server has eight processor cores and tons of overhead available, that concern might be inconsequential, but if you're running lean, you could conceivably run into issues.

The easiest way to think about making efficient rulesets is to think about eating the largest slice of pie first. Understand iptables rule processing and put the rules that get the most traffic higher in your list. Conversely, save the tiniest pieces of your pie for last. If you run all of your traffic past a rule that only applies to a tiny segment before you screen out larger segments, you're wasting processing power.
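
For example, if the overwhelming majority of your traffic is web traffic, you'd insert the port 80 rule at the top of the chain and append a rarely matched rule at the bottom (the second port here is just for illustration):

[root@server ~]$ iptables -I INPUT 1 -p tcp --dport 80 -j tracking
[root@server ~]$ iptables -A INPUT -p tcp --dport 2222 -j tracking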

Another thing to keep in mind is that you do not need to specify a target (in our examples above, we established tracking and tracking2 as our targets). If you're used to each rule having a specific purpose of either blocking, allowing, or diverting traffic, this simple tidbit might seem revolutionary. For example, we could use this rule:

[root@server ~]$ iptables -A INPUT

If that seems a little bare to you, don't worry ... It is! The output will show that it is a rule that tracks all traffic in the chain at that point. We're appending the rule to the end of the chain in this example ("-A"), but we could also insert it ("-I") at the top of the chain instead. This command could be helpful if you are using a number of different chains and you want to see the exact volume of packets that are filtered at any given point. Additionally, this strategy could show how much traffic a potential rule would filter before you run it on your production system. Because having several of these kinds of commands can get a little messy, it's also helpful to add comments to help sort things out:

[root@server ~]$ iptables -A INPUT -m comment --comment "track all data"
 
[root@server ~]$ iptables -vnL
Chain INPUT (policy ACCEPT 11M packets, 5280M bytes)
 pkts bytes target prot opt in out source           destination
   98  9352        all  --  *  *   0.0.0.0/0        0.0.0.0/0       /* track all data */

Nothing terribly complicated about using iptables to count bandwidth, right? If you have iptables rulesets and you want to get a glimpse at how your traffic is being affected, this little trick could be useful. You can rely on the information iptables gives you about your bandwidth usage, and you won't be the only one ... cPanel actually uses iptables to track bandwidth.

-Mark

April 12, 2013

Catalyst at SXSW 2013: Mentorship and Meaningfulness

In the Community Development group, our mission is simple: Create the industry's most substantially helpful startup program that assists participants in a MEANINGFUL way. Meaningfulness is a subjective goal, but when it comes to fueling new businesses, numbers and statistics can't tell the whole story. Sure, we could run Catalyst like some of the other startup programs in the infrastructure world and gauge our success off of the number of partners using the hosting credits we provide, but if we only focused on hosting credits, we'd be leaving a significant opportunity on the table.

SoftLayer is able to offer the entrepreneurial community so much more than cloud computing instances and powerful servers. As a startup ourselves not so long ago, our team knows all about the difficulties of being an entrepreneur, and now that we're able to give back to the startup community, we want to share battle stories and lessons learned. Mentorship is one of the most valuable commodities for entrepreneurs and business founders, and SoftLayer's mentors are in a unique position to provide feedback about everything from infrastructure planning to hiring your first employees to engaging with your board of advisors to negotiating better terms on a round of funding.

The Catalyst team engages in these kinds of discussions with our clients every day, and we've had some pretty remarkable success. When we better understand a client's business, we can provide better feedback and insight into the infrastructure that will help that business succeed. In other words, we build meaningful relationships with our Catalyst clients, and as a result, those clients are able to more efficiently leverage the hosting credits we provide them.

The distinction between Catalyst and other startup programs in the hosting industry has never been more apparent than after South by Southwest (SXSW) in Austin this year. I had the opportunity to meet with entrepreneurs, investors and industry experts who have been thirsting for a program like Catalyst for years, and when they hear about what we're doing, they know they've found their oasis. I had a chance to sit down with Paul Ford in the Catalyst Startup Lounge at SXSW to talk about the program and some of the insights and feedback we'd gotten at the show:

Paul was quick to point out that being a leader in the startup community has more impact when you provide the best technology and pair that with a team that can deliver for startups what they need: meaningful support.

Later, I had an impromptu coffee with a partner at one of the world's largest, most prestigious Silicon Valley-based venture capital firms — probably THE most respected venture capital firm in the world, actually. As we chatted about the firm's seed-funding practices, the investment partner told me, "There is no better insurance policy for an infrastructure company than what SoftLayer is doing to ensure success for its startup clients." And I thought that was a pretty telling insight.

That simple sentence drove home the point that success in a program like Catalyst is not guaranteed by a particular technology, no matter how innovative or industry-leading that technology may be. Success comes from creating value BEYOND that technology, and when I sat down with George Karidis, he shared a few insights into how the Catalyst vision came to be, along with how the program has evolved into what it is today:

Catalyst is special. The relationships we build with entrepreneurs are meaningful. We've committed to making the talented brainpower within our own walls accessible to the community. After SXSW, I knew I didn't have to compare what we were doing with what other programs are doing because that would be like comparing apples and some other fruit that doesn't do nearly as much for you as apples do.

I was told once on the campaign trail for President Clinton in '96 that as long as you have a rock-solid strategy, you cannot be beaten if you continue to execute on that strategy. Execute, execute, execute. If you waver and react to the competition, you're dead in the water. With that in mind, we're going to keep executing on our strategy of being available to our Catalyst clients and actively helping them solve their problems. The only question that remains is this:

How can we help you?

-@JoshuaKrammes

April 1, 2013

SoftLayer Mobile: Now a Universal iOS Application

Last month, we put SoftLayer Mobile HD out to pasture. That iPad-specific application performed amazingly, and we got a lot of great feedback from our customers, so we doubled down on our efforts to support iPad users by merging SoftLayer Mobile HD functionality with our standard SoftLayer Mobile app to provide a single, universal application for all iOS devices.

By merging our two iOS applications into a single, universal app, we can provide better feature parity, maintain coherent architecture and increase code reuse and maintainability because we're only working with a single feature-rich binary that provides a consistent user experience on the iPhone and the iPad at the same time. Obviously, this meant we had to retool much of the legacy iPhone-specific SoftLayer Mobile app in order to provide the same device-specific functionality we had for the iPad in SoftLayer Mobile HD, but I was surprised at how straightforward that process ended up being. I thought I'd share a few of the resources iOS includes that simplify the process of creating a universal iOS application.

iOS supports development of universal applications via device-specific resource loading and device-specific runtime checks, and we leveraged those tools based on particular situations in our code base.

Device-specific resource loading allows iOS to choose the appropriate resource for the device being used. For example, if we have two different versions of an image called SoftLayerOnBlack.png to fit either an iPhone or an iPad, we simply call one SoftLayerOnBlack~iphone.png and call the other one SoftLayerOnBlack~ipad.png. With those two images in our application bundle, we let the system choose which image to use with a simple line of code:

UIImage* image = [UIImage imageNamed: @"SoftLayerOnBlack.png"];

In addition to device-specific resource loading, iOS also includes device-specific runtime checks. With these runtime checks, we're able to create conditional code paths depending on the underlying device type:

if (UI_USER_INTERFACE_IDIOM() == UIUserInterfaceIdiomPad) {
    // The device is an iPad running iOS 3.2 or later.
} else {
    // The device is an iPhone or iPod touch.
}

These building blocks allow for a great deal of flexibility when it comes to creating a universal iOS application. Both techniques enable simple support based on what device is running the application, but they're used in subtly different ways. With those device-specific tools, developers are able to approach their universal applications in a couple of distinct ways:

Device-Dependent View Controller:
If we want users on the iPhone and iPad applications to have the same functionality but have the presentation tailored to their specific devices, we would create separate iPhone and iPad view controllers. For example, let's look at how our Object Storage browser appears on the iPhone and the iPad in SoftLayer Mobile:

[Screenshots: Object Storage module on the iPhone and the iPad]

We want to take advantage of the additional real estate the iPad provides, so at runtime, the appropriate view controller is selected based on the device's UI idiom. The technique looks a little like this:

@implementation SLMenuController
...
 
- (void) navigateToStorageModule: (id) sender {
    UIViewController<SLApplicationModule> *storageModule = nil;
    if (UI_USER_INTERFACE_IDIOM() == UIUserInterfaceIdiomPad) {
        storageModule = [SLStorageModule_iPad storageModule];
    } else {
        storageModule = [SLStorageModule storageModule];
    }
    [self navigateToModule: storageModule];
}
...
@end

"Universal" View Controller
In other situations, we didn't need the viewing experience to differ between the iPhone and the iPad, so we used a single view controller for all devices. We don't compromise the user experience or the presentation of data because the view controller either re-scales or reconfigures its layout at runtime based on screen size. Take a look at the "About" module on the iPhone and iPad:

[Screenshots: About module on the iPhone and the iPad]

The code for the universal view controller of the "About" module looks something like this:

@implementation SLAboutModuleNavigationViewController
...
 
- (id) init {
    self = [super init];
    if (self) {
        _navigationHidden = YES;
        _navigationWidth = [[UIScreen mainScreen] bounds].size.width * 0.5;
    }
    return self;
}
...
@end

There are plenty of other iOS features and tricks in the universal SoftLayer Mobile app. If you've got a SoftLayer account and an iOS device, download the app to try it out and let us know what you think. If you were a SoftLayer Mobile HD user, do you notice any significant changes in the new app from the legacy app?

-Pawel

P.S. If you're not on iOS but you still want some SoftLayer love on your mobile device, check out the other SoftLayer Mobile Apps on Android and Windows Phone.

March 22, 2013

Social Media for Brands: Monitor Twitter Search via Email

If you're responsible for monitoring Twitter for conversations about your brand, you're faced with a challenge: You need to know what people are saying about your brand at all times AND you don't want to live your entire life in front of Twitter Search.

Over the years, a number of social media applications have been released specifically for brand managers and social media teams, but most of those applications (especially the free/inexpensive ones) differentiate themselves only by the quality of their analytics and how real-time their data is reported. If that's what you need, you have plenty of fantastic options. Those differentiators don't really help you if you want to take a more passive role in monitoring Twitter search ... You still have to log into the application to see your fancy dashboards with all of the information. Why can't the data come to you?

About three weeks ago, Hazzy stopped by my desk and asked if I'd help build a tool that uses the Twitter Search API to collect brand keyword mentions and send an email alert with those mentions in digest form every 30 minutes. The social media team had been using Twilert for these types of alerts since February 2012, but over the last few months, messages have been delayed due to issues connecting to Twitter search ... It seems that the service is so popular that it hits Twitter's limits on API calls. An email digest scheduled to be sent every thirty minutes ends up going out ten hours late, and ten hours is an eternity in social media time. We needed something a little more timely and reliable, so I got to work on a simple "Twitter Monitor" script to find all mentions of our keyword(s) on Twitter, email those results in a simple digest format, and repeat the process every 30 minutes when new mentions are found.

With Bear's Python-Twitter library on GitHub, connecting to the Twitter API is a breeze. Why did we use Bear's library in particular? Just look at his profile picture. Yeah ... 'nuff said. So with that Python wrapper to the Twitter API in place, I just had to figure out how to use the tools Twitter provided to get the job done. For the most part, the process was very clear, and Twitter actually made querying the search service much easier than we expected. The Search API finds all mentions of whatever string of characters you designate, so instead of creating an elaborate Boolean search for "SoftLayer OR #SoftLayer OR @SoftLayer ..." or any number of combinations of arbitrary strings, we could simply search for "SoftLayer" and have all of those results included. If you want to see only @ replies or hashtags, you can limit your search to those alone, but because "SoftLayer" isn't a word that gets thrown around much without referencing us, we wanted to see every instance. This is the code we ended up working with for the search functionality:

def status_by_search(search):
    statuses = api.GetSearch(term=search)
    results = filter(lambda x: x.id > get_log_value(), statuses)
    returns = []
    if len(results) > 0:
        for result in results:
            returns.append(format_status(result))
 
        new_tweets(results)
        return returns, len(returns)
    else:
        exit()

If you walk through the script, you'll notice that we want to return only unseen Tweets to our email recipients. Shortly after we got the Twitter Monitor up and running, we noticed how easy it would be to get spammed with the same messages every time the script ran, so we had to filter our results accordingly. Twitter's API allows you to request Tweets with a Tweet ID greater than one that you specify; however, when I tried designating that "oldest" Tweet ID, we had mixed results ... Whether due to my ignorance or a fault in the implementation, we were getting fewer results than we should have. Tweet IDs are unique and numerically sequential, so they can be relied upon as much as a datetime (and they're far easier to work with, to boot), so I decided to use the highest Tweet ID from each batch of processed messages to filter the next set of results. The script stores that Tweet ID and uses a little bit of logic to determine which Tweets are newer than the last Tweet reported.

def new_tweets(results):
    if get_log_value() < max(result.id for result in results):
        set_log_value(max(result.id for result in results))
        return True
 
 
def get_log_value():
    with open('tweet.id', 'r') as f:
        return int(f.read())
 
 
def set_log_value(messageId):
    with open('tweet.id', 'w+') as f:
        f.write(str(messageId))

Once we culled out our new Tweets, we needed our script to email those results to our social media team. Luckily, we didn't have to reinvent the wheel here, and we added a few lines that enabled us to send an HTML-formatted email over any SMTP server. One of the downsides of the script is that the login credentials for your SMTP server are stored in plaintext, so if you can come up with an alternative that adds a layer of security to those credentials (or lets you send with different kinds of credentials), we'd love for you to share it.
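
The emailing piece boils down to Python's standard smtplib and email modules. The full version lives in the script itself, but a minimal sketch of the approach looks something like this (the function and parameter names here are just for illustration):

import smtplib
from email.mime.text import MIMEText

def send_digest(html_body, subject, mail_server, mail_port,
                username, password, from_addr, to_addrs):
    # Build an HTML-formatted message for the digest
    message = MIMEText(html_body, 'html')
    message['Subject'] = subject
    message['From'] = from_addr
    message['To'] = ', '.join(to_addrs)
 
    # Hand it off to the configured SMTP server
    server = smtplib.SMTP(mail_server, mail_port)
    server.login(username, password)  # plaintext credentials, as noted above
    server.sendmail(from_addr, to_addrs, message.as_string())
    server.quit()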

From that point, we could run the script manually from the server (or a laptop for that matter), and an email digest would be sent with new Tweets. Because we wanted to automate that process, I added a cron job that would run the script at the desired interval. As a bonus, if the script doesn't find any new Tweets since the last time it was run, it doesn't send an email, so you won't get spammed by "0 Results" messages overnight.

The script has been in action for a couple of weeks now, and it has gotten our social media team's seal of approval. We've added a few features here and there (like adding the number of Tweets in an email to the email's subject line), and I've enlisted the help of Kevin Landreth to clean up the code a little. Now, we're ready to share the SoftLayer Twitter Monitor script with the world via GitHub!

SoftLayer Twitter Monitor on GitHub

The script should work well right out of the box in any Python environment with the required libraries after a few simple configuration changes:

  • Get your Twitter Consumer Secret, Access Token and Access Secret from https://dev.twitter.com/
  • Copy/paste that information where noted in the script.
  • Update your search term(s).
  • Enter your mailserver address and port.
  • Enter your email account credentials if you aren't working with an open relay.
  • Set the self.from_ and self.to values to your preference.
  • Ensure all of the Python requirements are met.
  • Configure a cron job to run the script at your desired interval. For example, if you want to send emails every 10 minutes: */10 * * * * <path to python> <path to script> > /dev/null 2>&1

As soon as you add your information, you should be in business. You'll have an in-house Twitter Monitor that delivers a simple email digest of your new Twitter mentions at whatever interval you specify!

Like any good open source project, we want the community's feedback on how it can be improved or other features we could incorporate. This script uses the Search API, but we're also starting to play around with the Stream API and SoftLayer Message Queue to make some even cooler tools to automate brand monitoring on Twitter.

If you end up using the script and liking it, send SoftLayer a shout-out via Twitter and share it with your friends!

-@SoftLayerDevs
