Development Posts

November 11, 2013

Sysadmin Tips and Tricks - Using the ‘for’ Loop in Bash

Ever have a bunch of files to rename or a large set of files to move to different directories? Ever find yourself copy/pasting nearly identical commands a few hundred times to get a job done? A system administrator's life is full of tedious tasks that can be eliminated or simplified with the proper tools. That's right ... Those tedious tasks don't have to be executed manually! I'd like to introduce you to one of the simplest tools to automate time-consuming repetitive processes in Bash — the for loop.

Whether you have been programming for a few weeks or a few decades, you should be able to quickly pick up on how the for loop works and what it can do for you. To get started, let's take a look at a few simple examples of what the for loop looks like. For these exercises, it's always best to use a temporary directory while you're learning and practicing for loops. The command is very powerful, and we wouldn't want you to damage your system while you're still learning.

Here is our temporary directory:

rasto@lmlatham:~/temp$ ls -la
total 8
drwxr-xr-x 2 rasto rasto 4096 Oct 23 15:54 .
drwxr-xr-x 34 rasto rasto 4096 Oct 23 16:00 ..
rasto@lmlatham:~/temp$

We want to fill the directory with files, so let's use the for loop:

rasto@lmlatham:~/temp$ for cats_are_cool in {a..z}; do touch $cats_are_cool; done;
rasto@lmlatham:~/temp$

Note: This should be typed all in one line.

Here's the result:

rasto@lmlatham:~/temp$ ls -l
total 0
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 a
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 b
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 c
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 d
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 e
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 f
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 g
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 h
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 i
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 j
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 k
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 l
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 m
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 n
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 o
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 p
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 q
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 r
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 s
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 t
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 u
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 v
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 w
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 x
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 y
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 z
rasto@lmlatham:~/temp$

How did that simple command populate the directory with all of the letters in the alphabet? Let's break it down.

for cats_are_cool in {a..z}

The for is the command we are running, which is built into the Bash shell. cats_are_cool is a variable we are declaring. The name of the variable can be whatever you want it to be; people traditionally use something short like f, but the variable we're using is a little more fun. Hereafter, our variable will be referred to as $cats_are_cool (or $f if you used the more boring "f"). Aside: You may recognize this pattern from environment variables — you declare the variable without the $ sign and then use the $ sign to reference its value.

When our command is executed, the variable we declared will take on each of the values in {a..z} — every letter from a to z, one pass through the loop at a time. Next, we use a semicolon to indicate that we are done with the first phase of our for loop. The next part starts with do, which says: for each of a–z, do <some thing>. In this case, we are creating files by touching them via touch $cats_are_cool. The first time through the loop the command creates a, the second time through b, and so forth. We complete that command with a semicolon, then we declare that we are finished with the loop with done.
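If a one-liner feels cramped, the same loop can also be spread across multiple lines — this is just a reformatted version of the command above, nothing new:

for cats_are_cool in {a..z}
do
    touch $cats_are_cool
done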

This might be a great time to experiment with the command above, making small changes, if you wish. Let's do a little more. I just realized that I made a mistake. I meant to give the files a .txt extension. This is how we'd make that happen:

for dogs_are_ok_too in {a..z}; do mv $dogs_are_ok_too $dogs_are_ok_too.txt; done;

Note: It would be perfectly okay to re-use $cats_are_cool here. The variables are not persistent between executions.

As you can see, I updated the command so that a would be renamed a.txt, b would be renamed b.txt and so forth. Why would I want to do that manually, 26 times? If we check our directory, we see that everything was completed in that single command:

rasto@lmlatham:~/temp$ ls -l
total 0
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 a.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 b.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 c.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 d.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 e.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 f.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 g.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 h.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 i.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 j.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 k.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 l.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 m.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 n.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 o.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 p.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 q.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 r.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 s.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 t.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 u.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 v.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 w.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 x.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 y.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 z.txt
rasto@lmlatham:~/temp$

Now we have files, but we don't want them to be empty. Let's put some text in them:

for f in `ls`; do cat /etc/passwd > $f; done

Note the backticks around ls. In Bash, backticks mean, "execute this and return the results," so it's as if you executed ls and fed its output — the list of filenames — to the for loop. Then, for each of those filenames, cat /etc/passwd reads the password file and the > redirects that output into $f, so a.txt, b.txt and the rest each get their own copy. Still with me?
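As a quick, optional aside: since we know all of the files end in .txt, a shell glob would work just as well here and saves us from parsing the output of ls:

for f in *.txt; do cat /etc/passwd > $f; done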

So now I've got a bunch of files with copies of /etc/passwd in them. What if I never wanted files for a, g, or h? First, I'd get a list of just the files I want to get rid of:

rasto@lmlatham:~/temp$ ls | egrep 'a|g|h'
a.txt
g.txt
h.txt

Then I could plug that command into the for loop (using backticks again) and do the removal of those files:

for f in `ls | egrep 'a|g|h'`; do rm $f; done
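And if you already know exactly which letters you want gone, you can skip the egrep entirely and just feed the loop a short list — same idea, different source of values:

for f in a g h; do rm $f.txt; done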

I know these examples don't seem very complex, but they give you a great first look at the kind of functionality made possible by the for loop in Bash. Give it a whirl. Once you start smartly incorporating it in your day-to-day operations, you'll save yourself massive amounts of time ... Especially when you come across thousands or tens of thousands of very similar tasks.

Don't do work a computer should do!

-Lee

September 24, 2013

Four Rules for Better Code Documentation

Last month, Jeremy shared some valuable information regarding technical debt on SLDN. In his post, he discussed how omitting pertinent information when you're developing for a project can cause more work to build up in the future. One of the most common areas developers overlook when it comes to technical debt is documentation. This oversight comes in two forms: A complete omission of any documentation and inadequate information when documentation does exist. Simply documenting the functionality of your code is a great start, but the best way to close the information gap and avoid technical debt that stems from documentation (or lack thereof) is to follow four simple rules.

1. Know Your Audience

When we're talking about code, it's safe to say you'll have a fairly technical audience; however, it is important to note the level of understanding your audience has on the code itself. While they should be able to grasp common terms and development concepts, they may be unfamiliar with the functionality you are programming. Because of this, it's a good idea to provide a link to an internal, technical knowledgebase or wiki that will provide in-depth details on the functionality of the technology they'll be working with. We try to use a combination of internal and external references that we think will provide the most knowledge to developers who may be looking at our code. Here's an example of that from our Dns_Domain class:

 * @SLDNDocumentation Service Overview <<< EOT
 * SoftLayer customers have the option of hosting DNS domains on the SoftLayer
 * name servers. Individual domains hosted on the SoftLayer name servers are
 * handled through the SoftLayer_Dns_Domain service.
 *
 * Domain changes are applied automatically by our nameservers, but changes may
 * not be received by the other name servers on the Internet for 72 hours after
 * your change. The SoftLayer_Dns_Domain service does not apply to customers who
 * run their own nameservers on servers purchased from SoftLayer.
 *
 * SoftLayer provides secondary DNS hosting services if you wish to maintain DNS
 * records on your name server, but have records replicated on SoftLayer's name
 * servers. Use the [[SoftLayer_Dns_Secondary]] service to manage secondary DNS
 * zones and transfers.
 * EOT
 *
 * @SLDNDocumentation Service External Link http://en.wikipedia.org/wiki/Domain_name_system Domain Name System at Wikipedia
 * @SLDNDocumentation Service External Link http://tools.ietf.org/html/rfc1035 RFC1035: Domain Names - Implementation and Specification at ietf.org
 * @SLDNDocumentation Service See Also SoftLayer_Dns_Domain_ResourceRecord
 * @SLDNDocumentation Service See Also SoftLayer_Dns_Domain_Reverse
 * @SLDNDocumentation Service See Also SoftLayer_Dns_Secondary
 *

Enabling the user to learn more about a topic, product, or even a specific call alleviates the need for users to ask multiple questions regarding the "what" or "why" and will also minimize the need for you to explain more basic concepts regarding the technology supported by your code.

2. Be Consistent - Terminology

There are two main areas developers should focus on when it comes to consistency: Formatting and terminology.

Luckily, formatting is pretty simple. Most languages have a set of standards attached to them that extend to the Docblock, which is where the documentation portion of the code normally takes place. Docblocks can be used to provide an overview of the class, identify authors or product owners and provide additional reference to those using the code. The example below uses PHP's standards for documentation tagging and allows users to quickly identify the parameters and return value for the createObject method in the Dns_Domain class:

/**
 * @param string $objectType
 * @param object $templateObject
 *
 * @return SoftLayer_Dns_Domain
 */
public static function createObject($objectType = __CLASS__, $templateObject)

Keeping consistent when it comes to terminology is a bit more difficult; especially if there have been no standards in place before. As an example, we can look to one of the most common elements of hosting: the server. Some people call this a "box," a "physical instance" or simply "hardware." The server may be a name server, a mail server, a database server or a web server.

If your company has adopted a term, use that term. If they haven't, decide on a term with your coworkers and stick to it. It's important to be as specific as possible in your documentation to avoid any confusion, and when you adopt specific terms in your documentation, you'll also find that this consistency will carry over into conversations and meetings. As a result, training new team members on your code will go more smoothly, and it will be easier for other people to assist in maintaining your code's documentation.

Bonus: It's much easier to search and replace when you only have to search for one term.

3. Forget What You Know About Your Code ... But Only Temporarily

Regardless of the industry, people who write their own documentation tend to omit pertinent information about the topic. When I train technical writers, I use the peanut butter and jelly example: How would you explain the process of making a peanut butter and jelly sandwich? Many would-be instructors omit things that would result in a very poorly made sandwich ... if one could be made at all. If you don't tell the reader to get the jelly from the cupboard, how can they put jelly on the sandwich? It's important to ask yourself when writing, "Is there anything that I take for granted about this piece of code that other users might need or want to know?"

Think about a coding example where a method calls one or more methods automatically in order to do its job or a method acts like another method. In our API, the createObjects method uses the logic of the createObject method that we just discussed. While some developers may pick up on the connection based on the method's name, it is still important to reference the similarities so they can better understand exactly how the code works. We do this in two ways: First, we state that createObjects follows the logic of createObject in the overview. Second, we note that createObject is a related method. The code below shows exactly how we've implemented this:

     * @SLDNDocumentation Service Description Create multiple domains at once.
     *
     * @SLDNDocumentation Method Overview <<< EOT
     * Create multiple domains on the SoftLayer name servers. Each domain record
     * passed to ''createObjects'' follows the logic in the SoftLayer_Dns_Domain
     * ''createObject'' method.
     * EOT
     *
     * @SLDNDocumentation Method Associated Method SoftLayer_Dns_Domain::createObject

4. Peer Review

The last rule, and one that should not be skipped, is to always have a peer look over your documentation. There really isn't a lot of depth behind this one. In Development, we try to peer review documentation during the code review process. If new content is written during code changes or additions, developers can add content reviewers, who have the ability to add notes with revisions, suggestions and questions. Once all parties are satisfied with the outcome, we close out the review in the system and the content is updated in the next code release. With peer review of documentation, you'll catch typos, inconsistencies and gaps. It always helps to have a second set of eyes before your content hits its users.

Writing better documentation really is that easy: Know your audience, be consistent, don't take your knowledge for granted, and use the peer review process. I put these four rules into practice every day as a technical writer at SoftLayer, and they make my life so much easier. By following these rules, you'll have better documentation for your users and will hopefully eliminate some of that pesky technical debt.

Go, and create better documentation!

-Sarah

September 20, 2013

Building a Mobile App with jQuery Mobile: The Foundation

Based on conversations I've had in the past, at least half of the web developers I've met have admitted to cracking open an Objective-C book at some point in their careers with high hopes of learning mobile development ... After all, who wouldn't want to create "the next big thing" for a market growing so phenomenally every year? I count myself among that majority: I've been steadily learning Objective-C over the past year, dedicating a bit of time every day, and I feel like I still lack the skill set required to create an original, complex application. Wouldn't it be great if we web developers could finally get our shot in the App Store without having to unlearn and relearn the particulars of coding a mobile application?

Luckily for us, there is a way!

The rock stars over at jQuery have created a framework called jQuery Mobile that allows developers to create cross-platform, responsive applications on an HTML5-based jQuery foundation. The framework allows for touch and mouse event support, so you're able to publish across multiple platforms, including iOS, Android, Blackberry, Kindle, Nook and on and on and on. If you're able to create web applications with jQuery, you can now create an awesome cross-platform app. All you have to do is create an app as if it were a dynamic HTML5 web page, and jQuery takes care of the rest.

Let's go through a real-world example to show this functionality in action. The first thing we need to do is fill in the <head> content with all of our necessary jQuery libraries:

<!DOCTYPE html>
<html>
<head>
    <title>SoftLayer Hello World!</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="http://code.jquery.com/mobile/1.3.2/jquery.mobile-1.3.2.min.css" />
    <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
    <script src="http://code.jquery.com/mobile/1.3.2/jquery.mobile-1.3.2.min.js"></script>
</head>

Now let's create a framework for our simplistic app in the <body> section of our page:

<body>
    <div data-role="page">
        <div data-role="header">
            <h1>My App!</h1>
        </div>
 
        <div data-role="content">
            <p>This is my application! Pretty cool, huh?</p>
        </div>
 
        <div data-role="footer">
            <h1>Bottom Footer</h1>
        </div>
 
    </div>
</body>
</html>

Even novice web developers should recognize the structure above. You have a header, content and a footer just as you would in a regular web page, but we're letting jQuery apply some "native-like" styling to those sections with the data-role attributes. This is what our simple app looks like so far: jQuery Mobile App Screenshot #1

While it's not very fancy (yet), you see that the style is well suited to the iPhone I'm using to show it off. Let's spice it up a bit and add a navigation bar. Since we want the navigation to be a part of the header section of our app, let's add an unordered list there:

<div data-role="header">
    <h1>My App!</h1>
        <div data-role="navbar">
            <ul>
                <li><a href="#home" class="ui-btn-active" data-icon="home" data-theme="b">Home</a></li>
                <li><a href="#softlayer_cool_news" data-icon="grid" data-theme="b">SL Cool News!</a></li>
                <li><a href="#softlayer_cool_stuff" data-icon="star" data-theme="b">SL Cool Stuff!</a></li>
            </ul>
        </div>
    </div>

You'll notice again that it's not much different from regular HTML. We've created a navbar div with an unordered list of menu items we'd like to add to the header: Home, SL Cool News and SL Cool Stuff. Notice in the anchor tag of each that there's an attribute called data-icon which defines which graphical icon we want to represent the navigation item. Let's have a peek at what it looks like now: jQuery Mobile App Screenshot #2

Our app isn't doing a whole lot yet, but you can see from our screenshot that the pieces are starting to come together nicely. Because we're developing our mobile app as an HTML5 app first, we're able to make quick changes and see those changes in real time from our phone's browser. Once we get the functionality we want into our app, we can use a tool such as PhoneGap or Cordova to package it into a ready-to-use standalone iPhone app (provided you're enrolled in the Apple Developer Program, of course), or we can leave the app as-is for a very nifty mobile browser application.
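As a preview of where we could take this next, those navbar links point to #softlayer_cool_news and #softlayer_cool_stuff anchors that don't exist yet. In jQuery Mobile, each additional screen is simply another div with data-role="page" and a matching id (and our first page div would get id="home"), so a rough sketch of those pages — with placeholder content — might look something like this:

<div data-role="page" id="softlayer_cool_news">
    <div data-role="header"><h1>SL Cool News!</h1></div>
    <div data-role="content"><p>News goes here.</p></div>
</div>

<div data-role="page" id="softlayer_cool_stuff">
    <div data-role="header"><h1>SL Cool Stuff!</h1></div>
    <div data-role="content"><p>Cool stuff goes here.</p></div>
</div>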

In my next few blogs, I plan to expand on this topic by showing you some of the amazingly easy (and impressive) functionality available in jQuery Mobile. In the meantime, go grab a copy of jQuery Mobile and start playing around with it!

-Cassandra

September 16, 2013

Sysadmin Tips and Tricks - Using strace to Monitor System Calls

Linux admins often encounter rogue processes that die without explanation, go haywire without any meaningful log data or fail in other interesting ways without providing useful information that can help troubleshoot the problem. Have you ever wished you could just see what the program is trying to do behind the scenes? Well, you can — strace (system trace) is very often the answer. It is included with most distros' package managers, and the syntax should be pretty much identical on any Linux platform.

First, let's get rid of a misconception: strace is not a "debugger," and it isn't a programmer's tool. It's a system administrator's tool for monitoring system calls and signals. It doesn't involve any sophisticated configurations, and you don't have to learn any new commands ... In fact, the most common system calls you'll see in its output will look familiar from the commands you learned early on:

  • read
  • write
  • open
  • close
  • stat
  • fork
  • execute (execve)
  • chmod
  • chown

 

You simply "attach" strace to the process, and it will display all the system calls and signals resulting from that process. strace doesn't replace or change the command's built-in logic at all — the process makes its normal calls to the system, and strace shows you each of those calls along with its result and any errors encountered. And that's where the magic lies.

Let's look at an example to show that behavior in action. First, become root — you'll need root privileges for strace to attach to processes you don't own, and it keeps these examples simple. Second, make a simple text file called 'test.txt' with these two lines in it:

# cat test.txt
Hi I'm a text file
there are only these two lines in me.

Now, let's execute that cat command again, this time via strace:

$ strace cat test.txt 
execve("/bin/cat", ["cat", "test.txt"], [/* 22 vars */]) = 0
brk(0)  = 0x9b7b000
uname({sys="Linux", node="ip-208-109-127-49.ip.secureserver.net", ...}) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=30671, ...}) = 0
mmap2(NULL, 30671, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f35000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000_\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1594552, ...}) = 0
mmap2(NULL, 1320356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7df2000
mmap2(0xb7f2f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13c) = 0xb7f2f000
mmap2(0xb7f32000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f32000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7df1000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7df0000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7df1b80, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7f2f000, 8192, PROT_READ) = 0
mprotect(0xb7f57000, 4096, PROT_READ) = 0
munmap(0xb7f35000, 30671) = 0
brk(0)  = 0x9b7b000
brk(0x9b9c000) = 0x9b9c000
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
open("test.txt", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=57, ...}) = 0
read(3, "Hi I'm a text file\nthere are onl"..., 4096) = 57
write(1, "Hi I'm a text file\nthere are onl"..., 57Hi I'm a text file
there are only these two lines in me.
) = 57
read(3, "", 4096) = 0
close(3) = 0
close(1) = 0
exit_group(0) = ?

Now that return may look really arcane, but if you study it a little bit, you'll see that it includes lots of information that even an ordinary admin can easily understand. The first line returned includes the execve system call where we execute /bin/cat with the parameter of test.txt. After that, you'll see the cat binary attempt to open some system libraries, and the brk and mmap2 calls to allocate memory. That stuff isn't usually particularly useful in the context we're working in here, but it's important to understand what's going on. What we're most interested in here are the open calls:

open("test.txt", O_RDONLY|O_LARGEFILE) = 3

It looks like when we run cat test.txt, it opens "test.txt" — no surprise there. But imagine a situation where you don't know what files a given program is trying to open ... strace immediately makes life easier. In this particular example, you'll also notice the "= 3" at the end: that's the file descriptor the kernel assigned to the file, a temporary sort of "handle" for it within the strace output. If you see a "read" call with 3 as the first parameter after this, you know it's reading from that file:

read(3, "Hi I'm a text file\nthere are onl"..., 4096) = 57

Pretty interesting, huh? strace defaults to showing just the first 32 or so characters of a read, but the return value tells us that 57 bytes (including special characters like the newlines) were read from the file! After the text is read into memory, we see cat writing it to standard output, delivering the actual contents of the text file. That's a relatively simple example, but it helps us understand what's going on behind the scenes.
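If you'd like strace to show more of each string rather than truncating it, the -s flag raises that limit — for example, this prints up to 256 characters per string:

strace -s 256 cat test.txt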

Real World Example: Finding Log Files

Let's look at a real world example where we'll use strace for a specific purpose: You can't figure out where your Apache logs are being written, and you're too lazy to read the config file (or perhaps you can't find it). Wouldn't it be nice to follow everything Apache is doing when it starts up, including opening all its log files? Well you can:

strace -Ff -o output.txt -e open /etc/init.d/httpd restart

We are executing strace and telling it to follow forks (-Ff). This time we write the output to a file (-o output.txt) and only trace 'open' system calls (-e open) to keep some of the chaff out of the output, and then we execute '/etc/init.d/httpd restart'. This will create a file called "output.txt" which we can use to find references to our log files:

#cat output.txt | grep log
[pid 13595] open("/etc/httpd/modules/mod_log_config.so", O_RDONLY) = 4
[pid 13595] open("/etc/httpd/modules/mod_logio.so", O_RDONLY) = 4
[pid 13595] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 10
[pid 13595] open("/etc/httpd/logs/ssl_error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 11
[pid 13595] open("/etc/httpd/logs/access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 12
[pid 13595] open("/etc/httpd/logs/cm4msaa7.com", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 13
[pid 13595] open("/etc/httpd/logs/ssl_access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 14
[pid 13595] open("/etc/httpd/logs/ssl_request_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 15
[pid 13595] open("/etc/httpd/modules/mod_log_config.so", O_RDONLY) = 9
[pid 13595] open("/etc/httpd/modules/mod_logio.so", O_RDONLY) = 9
[pid 13596] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 10
[pid 13596] open("/etc/httpd/logs/ssl_error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 11
open("/etc/httpd/logs/access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 12
open("/etc/httpd/logs/cm4msaa7.com", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 13
open("/etc/httpd/logs/ssl_access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 14
open("/etc/httpd/logs/ssl_request_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 15

The log files jump out at you, don't they? Because we know that Apache will want to open its log files when it starts, all we have to do is follow the system calls it makes as it starts, and we'll find all of those files. Easy, right?

Real World Example: Locating Errors and Failures

Another valuable use of strace involves looking for errors. If a program fails when it makes a system call, you'll want to be able to pinpoint any errors that might have caused that failure as you troubleshoot. In all cases where a system call fails, strace will return a line with "= -1" in the output, followed by an explanation. Note: The space before -1 is very important, and you'll see why in a moment.

For this example, let's say Apache isn't starting for some reason, and the logs aren't telling us anything about why. Let's run strace:

strace -Ff -o output.txt -e open /etc/init.d/httpd start

Apache will attempt to start, and when it fails, we can grep our output.txt for '= -1' to see any system calls that failed:

$ cat output.txt | grep '= -1'
[pid 13748] open("/etc/selinux/config", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libnsl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/etc/gai.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

With experience, you'll come to understand which errors matter and which ones don't. Most often, the last error is the most significant. The first few lines show the program trying different library paths to see which are available, so they don't really matter in our pursuit of what's going wrong with Apache, and we can scan down to the last line:

[pid 13748] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

After a little investigation of that file, I see that some maniac has set the immutable attribute:

lsattr /etc/httpd/logs/error_log
----i-------- /etc/httpd/logs/error_log

Our error couldn't be found in the log file because Apache couldn't open it! You can imagine how long it might take to figure out this particular problem without strace, but with this useful tool, the cause can be found in minutes.
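For the record, the fix is as simple as clearing that attribute (as root) and starting Apache again:

chattr -i /etc/httpd/logs/error_log
/etc/init.d/httpd start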

Go and Try It!

All major Linux distros have strace available — just type strace at the command line for the basic usage. If the command is not found, install it via your distribution's package manager. Get in there and try it yourself!

For a fun first exercise, bring up a text editor in one terminal, then strace the editor process in another with the -p flag (strace -p <process_id>) since we want to look at an already-running process. When you go back and type in the text editor, the system calls will be shown in strace as you type ... You see what's happening in real time!
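For instance, attaching to an editor running as (hypothetical) PID 12345 and limiting the output to just reads and writes would look like this:

strace -p 12345 -e trace=read,write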

-Lee

August 29, 2013

HTML5 Tips and Tricks - Local Storage

As I'm sure you've heard by now: HTML5 is all the rage. People are creating amazing games with canvases, media interactivity with embeds and mobile/responsive sites with viewports. We've come a long way since 1990s iFrames! In this blog, I wanted to introduce you to an HTML5 tool that you might find useful: Local Web Storage — quite possibly the holy grail of web development!

In the past (and still most of the present), web sites store information about a surfer's preferences/choices via cookies. With that information, a site can be customized for a specific user, and that customization makes for a much better user experience. For example, you might select your preferred language when you first visit a website, and when you return, you won't have to make that selection again. You see similar functionality at work when you select themes/colors on a site or when you enlist help from one of those "remember me" checkboxes next to where you log into an account. The functionality that cookies enable is extremely valuable, but it's often inefficient.

You might be aware of some of the drawbacks to using cookies (such as the size limitation (4KB) and privacy issues with unencrypted cookies), but I believe the most significant problem with cookies is overhead. Even if you limit your site to just a few small cookies per user, as your userbase grows into the thousands and tens of thousands, you'll notice that you're transferring a LOT of data over HTTP (and those bandwidth bills might start adding up). Cookies are stored on the user's computer, so every time that user visits your domain, the browser is transferring cookies to your server with every HTTP request. The file size for each of these transactions is tiny, but at scale, it can feel like death by a thousand cuts.

Enter HTML5 and local storage.

Rather than having to transmit data (cookies) to a remote web server, HTML5 allows a site to store information within the client web browser. The information you need to customize your user's experience doesn't have to travel from the user's hard drive to your server because the customization is stored in (and applied by) the user's browser. And because data in local storage isn't sent with every HTTP request like it is with cookies, the capacity of local storage is a whopping 5MB per domain (though I wouldn't recommend pushing that limit).
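One small caveat before we dive in: a few older browsers don't support local storage at all, so it doesn't hurt to check before you rely on it. A minimal sketch of that check might look like this:

<script type="text/javascript">
    if (window.localStorage) {
        // local storage is available — safe to use setItem/getItem
    } else {
        // fall back to cookies or skip the customization
    }
</script>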

Let's check out how easy it is to use HTML5's local storage with JavaScript:

<script type="text/javascript">
    localStorage.setItem('preferredLanguage', 'EN');
</script>

Boom! We just set our first variable. Once that variable has been set in local storage for a given user, that user can close his or her browser and return to see the correct variable still selected when we retrieve it on our site:

<script type="text/javascript">
    localStorage.getItem('preferredLanguage');
</script>

After all of the lead-up in this post, you're probably surprised by the simplicity of the actual coding, but that's one of the biggest reasons HTML5 local storage is such an amazing tool to use. We set our user's preferred language in local storage and retrieved it from local storage with a few simple lines. If you want to set an "expiration" for a given variable in local storage the way you would for a cookie, you can script in an expiration check that removes an entry when you say the time's up:

<script type="text/javascript">
    localStorage.removeItem('preferredLanguage');
</script>
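The removeItem call is the building block; to make it time-based, one approach (just a sketch — the key names and the one-day window here are arbitrary) is to store a timestamp alongside the value and check it on page load:

<script type="text/javascript">
    // save the value and remember when we saved it
    localStorage.setItem('preferredLanguage', 'EN');
    localStorage.setItem('preferredLanguageSavedAt', Date.now());

    // later, on page load: drop the entry if it's more than a day old
    var savedAt = parseInt(localStorage.getItem('preferredLanguageSavedAt'), 10);
    if (savedAt && (Date.now() - savedAt) > 24 * 60 * 60 * 1000) {
        localStorage.removeItem('preferredLanguage');
        localStorage.removeItem('preferredLanguageSavedAt');
    }
</script>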

If we stopped here, you'd have a solid fundamental understanding of how HTML5 local storage works, but I want to take you beyond the standard functionality of local storage. You see, local storage is intended primarily to store only strings, so if we wanted to store an object, we'd be out of luck ... until we realized that developers can find workarounds for everything!

Using some handy JSON, we can stringify and parse any object we want to store in local storage:

<script type="text/javascript">
    var user = {};
    user.name = 'CWolff';
    user.job = 'Software Engineer II';
    user.rating = 'Awesome';
 
    //If we were to stop here, the entry would only read as [object Object] when we try to retrieve it, so we stringify with JSON!
    localStorage.setItem('user', JSON.stringify(user));
 
    //Retrieve the object and assign it to a variable
    var getVar = JSON.parse(localStorage.getItem('user'));
 
    //We now have our object in a variable that we can play with, let's try it out
    alert(getVar.name);
    alert(getVar.job);
    alert(getVar.rating);
</script>

If you guys have read any of my other blogs, you know that I tend to write several blogs in a series before I move on to the next big topic, and this won't be an exception. Local storage is just the tip of the iceberg of what HTML5 can do, so buckle up and get ready to learn more about the crazy features and functionality of this next-generation language.

Try local storage for yourself ... And save yourself from the major headache of trying to figure out where all of your bandwidth is going!

-Cassandra

July 16, 2013

Riak Performance Analysis: Bare Metal v. Virtual

In December, I posted a MongoDB performance analysis that showed the quantitative benefits of using bare metal servers for MongoDB workloads. It should come as no surprise that in the wake of SoftLayer's Riak launch, we've got some similar data to share about running Riak on bare metal.

To run this test, we started by creating five-node clusters with Riak 1.3.1 on SoftLayer bare metal servers and on a popular competitor's public cloud instances. For the SoftLayer environment, we created these clusters using the Riak Solution Designer, so the nodes were all provisioned, configured and clustered for us automatically when we ordered them. For the public cloud virtual instance Riak cluster, each node was provisioned individually using a Riak image template and manually configured into a cluster after all of the nodes had come online. To optimize for Riak performance, I made a few tweaks at the OS level of our servers (running CentOS 64-bit):

noatime
nodiratime
barrier=0
data=writeback
ulimit -n 65536

The common noatime and nodiratime mount options eliminate unnecessary writes during reads, which helps both performance and disk wear. The barrier and writeback settings are a little less common and may not be what you'd normally set. Although those settings present a very slight risk of data loss on disk failure, remember that the Riak solution is deployed in five-node rings with data redundantly available across multiple nodes in the ring. With that in mind — and considering that each node is also deployed with a RAID10 storage array — the minor risk of data loss from a single disk failure would have no impact on the overall data set (there are plenty of redundant copies of that data available). Given the minor risk involved, the performance increases from those two settings justify their use.
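For reference, here's roughly where those tweaks would live. The device, mount point and filesystem below are placeholders — adjust them for your own layout — and the open-files limit would typically go in /etc/security/limits.conf for the user running Riak:

# /etc/fstab entry for the data volume (placeholder device and mount point)
/dev/sda3   /var/lib/riak   ext4   noatime,nodiratime,barrier=0,data=writeback   0 0

# /etc/security/limits.conf entries for the riak user
riak   soft   nofile   65536
riak   hard   nofile   65536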

With all of the nodes tweaked and configured into clusters, we set up Basho's test harness — Basho Bench — to remotely simulate load on the deployments. Basho Bench allows you to create a configurable test plan for a Riak cluster by configuring a number of workers to utilize a driver type to generate load. It comes packaged as an Erlang application with a config file example that you can alter to create the specifics for the concurrency, data set size, and duration of your tests. The results can be viewed as CSV data, and there is an optional graphics package that allows you to generate the graphs that I am posting in this blog. A simplified graphic of our test environment would look like this:

Riak Test Environment

The following Basho Bench config is what we used for our testing:

{mode, max}.
{duration, 120}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator,{int_to_bin,{uniform_int,1000000}}}.
{value_generator,{exponential_bin,4098,50000}}.
{riakc_pb_ips, [{10,60,68,9},{10,40,117,89},{10,80,64,4},{10,80,64,8},{10,60,68,7}]}.
{riakc_pb_replies, 2}.
{operations, [{get, 10},{put, 1}]}.

To spell it out a little simpler:

Tests Performed

Data Set: 400GB
10:1 Query-to-Update Operations
8 Concurrent Client Connections
Test Duration: 2 Hours

You may notice that in the test cases that use SoftLayer "Medium" Servers, the virtual provider nodes are running 26 virtual compute units against our dual proc hex-core servers (12 cores total). In testing with Riak, memory is more important to the operations than CPU resources, so we provisioned the virtual instances to align with the 36GB of memory in each of the "Medium" SoftLayer servers. In the public cloud environment, the higher levels of RAM were only available in packages with higher CPU counts, so while the CPU counts differ, the RAM amounts are as close to even as we could make them.

One final "housekeeping" note before we dive into the results: The graphs below are pulled directly from the optional graphics package that displays Basho Bench results. You'll notice that the scale on the left-hand side of graphs differs dramatically between the two environments, so a cursory look at the results might not tell the whole story. Click any of the graphs below for a larger version. At the end of each test case, we'll share a few observations about the operations per second and latency results from each test. When we talk about latency in the "key observation" sections, we'll talk about the 99th percentile line — 99% of the results had latency below this line. More simply you could say, "This is the highest latency we saw on this platform in this test." The primary reason we're focusing on this line is because it's much easier to read on the graphs than the mean/median lines in the bottom graphs.

Riak Test 1: "Small" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Small Riak Server Node
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
4 x 500GB SATAII – RAID10
1Gb Bonded Network
Virtual Provider Node
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
4 x 500GB Network Storage – RAID10
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

The SoftLayer environment showed much more consistency in operations per second, with an average throughput around 450 Op/sec. The virtual environment throughput varied significantly — between about 50 operations per second and more than 600 operations per second — with the trend line fluctuating between about 220 Op/sec and 350 Op/sec.

Comparing the latency of get and put requests, the 99th percentile of results in the SoftLayer environment stayed around 50ms for gets and under 200ms for puts while the same metric for the virtual environment hovered around 800ms in gets and 4000ms in puts. The scale of the graphs is drastically different, so if you aren't looking closely, you don't see how significantly the performance varies between the two.

Riak Test 2: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 300GB 15K SAS – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

Similar to the results of Test 1, the throughput numbers from the bare metal environment are more consistent (and are consistently higher) than the throughput results from the virtual instance environment. The SoftLayer environment performed between 1500 and 1750 operations per second on average while the virtual provider environment averaged around 1200 operations per second throughout the test.

The latency of get and put requests in Test 2 also paints a similar picture to Test 1. The 99th percentile of results in the SoftLayer environment stayed below 50ms for gets and under 400ms for puts, while the same metric for the virtual environment averaged about 250ms for gets and over 1000ms for puts. Latency in a big data application can be a killer, so the results from the virtual provider might be setting off alarm bells in your head.

Riak Test 3: "Medium" Bare Metal 5-Node Cluster vs Virtual 5-Node Cluster

Servers

SoftLayer Medium Riak Server Node
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
4 x 128GB SSD – RAID10
1Gb Network – Bonded
Virtual Provider Node
26 Virtual Compute Units
64-bit CentOS
30GB RAM
4 x 300GB Network Storage
1Gb Network
 

Results

Riak Performance Analysis

Riak Performance Analysis

Key Observations

In Test 3, we're using the same specs in our virtual provider nodes, so the results for the virtual node environment are the same in Test 3 as they are in Test 2. In this test, the SoftLayer environment substitutes SSDs for the 15K SAS drives used in Test 2, and the throughput numbers show the impact of that improved I/O. The average throughput of the bare metal environment with SSDs is between 1750 and 2000 operations per second. Those numbers are slightly higher than the SoftLayer environment in Test 2, further distancing the bare metal results from the virtual provider results.

The latency of gets for the SoftLayer environment is very difficult to see in this graph because the latency was so low throughout the test. The 99th percentile of puts in the SoftLayer environment settled between 500ms and 625ms, which was a little higher than the bare metal results from Test 2 but still well below the latency from the virtual environment.

Summary

The results show that — similar to the majority of data-centric applications that we have tested — Riak has more consistent, better performing, and lower latency results when deployed onto bare metal instead of a cluster of public cloud instances. The stark differences in consistency of the results and the latency are noteworthy for developers looking to host their big data applications. We compared the 99th percentile of latency, but the mean/median results are worth checking out as well. Look at the mean and median results from the SoftLayer SSD Node environment: For gets, the mean latency was 2.5ms and the median was somewhere around 1ms. For puts, the mean was between 7.5ms and 11ms and the median was around 5ms. Those kinds of results are almost unbelievable (and that's why I've shared everything involved in completing this test so that you can try it yourself and see that there's no funny business going on).

It's commonly understood that the local, single-tenant resources of bare metal will always outperform shared network storage, but putting some concrete numbers on paper shows just how big the difference can be. Virtualizing on multi-tenant solutions with network attached storage often introduces latency issues, and performance will vary significantly depending on host load. These results may seem obvious, but sometimes the promise of quick and easy deployments on public cloud environments can lure even the sanest and most rational developer. Some applications are suited for public cloud, but big data isn't one of them: when you have data-centric apps that require extreme I/O traffic to your storage medium, nothing beats local high-performance resources.

-Harold

July 10, 2013

The Importance of Providing Startups a Sandbox

With the global economy in its current state, it's more important than ever to help inspired value-creators acquire the tools needed to realize their ideas, effect change in the world, and create impact — now. I've had the privilege of working with hundreds of young, innovative companies through Catalyst and our relationships with startup accelerators, incubators and competitions, and I've noticed that the best way for entrepreneurs to create change is to simply let them play! Stick them in a sandbox with a wide variety of free products and services that they can use however they want so that they may find the best method of transitioning from idea to action.

Any attention that entrepreneurs divert from their core business ideas is wasted attention, so the most successful startup accelerators build a bridge for entrepreneurs to the resources they need — from access to hosting service, investors, mentors, and corporate partners to recommendations about summer interns and patent attorneys. That all sounds good in theory, and while it's extremely difficult to bring to reality, startup-focused organizations like MassChallenge make it look easy.

During a recent trip to Boston, I was chatting with Kara Shurmantine and Jibran Malek about what goes on behind the scenes to truly empower startups and entrepreneurs, and they gave me some insight. Startups' needs are constantly shifting, changing and evolving, so MassChallenge prioritizes providing a sandbox chock-full of the best tools and toys to help make life easier for their participants ... and that's where SoftLayer helps. With Kara and Jibran, I got in touch with a few MassChallenge winners to get some insight into their experience from the startup side.

Tish Scolnik, the CEO of Global Research Innovation & Technology (GRIT), described the MassChallenge experience perfectly: "You walk in and you have all these amazing opportunities in front of you, and then in a pretty low pressure environment you can decide what you need at a specific moment." Tish calls it a "buffet table" — an array of delectable opportunities, some combination of which will be the building blocks of a startup's growth curve. Getting SoftLayer products and services for free (along with a plethora of other valuable resources) has helped GRIT create a cutting-edge wheelchair for disabled people in developing countries.

The team from Neumitra, a Silver Winner of MassChallenge 2012, chose to use SoftLayer as an infrastructure partner, and we asked co-founder Rob Goldberg about his experience. He explained that his team valued the ability to choose tools that fit their ever-changing and evolving needs. Neumitra set out to battle stress — the stress you feel every day — and they've garnered significant attention while doing so. With a wearable watch, Neumitra's app tells you when your stress levels are too high and you need to take a break.

Jordan Fliegal, the founder and CEO of CoachUp, another MassChallenge winner, also benefited from playing around in the sandbox. This environment, he says, is constantly "giving to you and giving to you and giving to you without asking for anything in return other than that you work hard and create a company that makes a difference." The result? CoachUp employs 20 people, has recruited thousands of judges, and has raised millions in funding — and is growing at breakneck speed.

If you give inspired individuals a chance and then give them not only the resources that they need, but also a diverse range of resources that they could need, you are guaranteed to help create global impact.

In short: Provide a sandbox. Change the world.

-@KelleyHilborn

July 9, 2013

When to Consider Riak for Your Big Data Architecture

In my Breaking Down 'Big Data' – Database Models post, I briefly covered the most common database models, their strengths, and how they handle the CAP theorem — that is, how a distributed storage system balances the demands of consistency and availability while maintaining partition tolerance. Here's what I said about Dynamo-inspired databases:

What They Do: Distributed key/value stores inspired by Amazon's Dynamo paper. A key written to a dynamo ring is persisted in several nodes at once before a successful write is reported. Riak also provides a native MapReduce implementation.
Horizontal Scaling: Dynamo-inspired databases usually provide for the best scale and extremely strong data durability.
CAP Balance: Prefer availability over consistency
When to Use: When the system must always be available for writes and effectively cannot lose data.
Example Products: Cassandra, Riak, BigCouch

This type of key/value store architecture is very different from the document-oriented MongoDB solutions we launched at the end of last year, so we worked with Basho to prioritize development of high-performance Riak solutions on our global platform. Since you already know about MongoDB, let's take a few minutes to meet the new kid on the block.

Riak is a distributed database architected for availability, fault tolerance, operational simplicity and scalability. Riak is masterless, so each node in a Riak cluster is the same and contains a complete, independent copy of the Riak package. This design makes the Riak environment highly fault tolerant and scalable, and it also aids in replication — if a node goes down, you can still read, write and update data.

As you approach the daunting prospect of choosing a big data architecture, there are a few simple questions you need to answer:

  1. How much data do/will I have?
  2. In what format am I storing my data?
  3. How important is my data?

Riak may be the choice for you if [1] you're working with more than three terabytes of data, [2] your data is stored in multiple data formats, and [3] your data must always be available. What does that kind of need look like in real life, though? Luckily, we've had a number of customers kick Riak's tires on SoftLayer bare metal servers, so I can share a few of the use cases we've seen that have benefited significantly from Riak's unique architecture.

Use Case 1 – Digital Media
An advertising company that serves over 10 billion ads per month must be able to quickly deliver its content to millions of end users around the world. Meeting that demand with relational databases would require a complex configuration of expensive, vertically scaled hardware, but it can be scaled out horizontally much easier with Riak. In a matter of only a few hours, the company is up and running with an ad-serving infrastructure that includes a back-end Riak cluster in Dallas with a replication cluster in Singapore along with an application tier on the front end with Web servers, load balancers and CDN.

Use Case 2 – E-commerce
An e-commerce company needs 100-percent availability. If any part of a customer's experience fails, whether it be on the website or in the shopping cart, sales are lost. Riak's fault tolerance is a big draw for this kind of use case: Even if one node or component fails, the company's data is still accessible, and the customer's user experience is uninterrupted. The shopping cart structure is critical, and Riak is built to be available ... It's a perfect match.

As an additional safeguard, the company can take advantage of simple multi-datacenter replication in their Riak Enterprise environment to geographically disperse content closer to its customers (while also serving as an important tool for disaster recovery and backup).

Use Case 3 – Gaming
With customers like Broken Bulb and Peak Games, SoftLayer is no stranger to the gaming industry, so it should come as no surprise that we've seen interesting use cases for Riak from some of our gaming customers. When a game developer incorporated Riak into a new game to store player data like user profiles, statistics and rankings, the performance of the bare metal infrastructure blew him away. As a result, the game's infrastructure was redesigned to also pull gaming content like images, videos and sounds from the Riak database cluster. Since the environment is so easy to scale horizontally, the process on the infrastructure side took no time at all, and the multimedia content in the game is getting served as quickly as the player data.

Databases are common bottlenecks for many applications, but they don't have to be. Making the transition from scaling vertically (upgrading hardware, adding RAM, etc.) to scaling horizontally (spreading the work intelligently across multiple nodes) alleviates many of the pain points for a quickly growing database environment. Have you made that transition? If not, what's holding you back? Have you considered implementing Riak?

-@marcalanjones

May 10, 2013

Understanding and Implementing Coding Standards

Coding standards provide a consistent framework for development within a project and across projects in an organization. A dozen programmers can complete a simple project in a dozen different ways by using unique coding methodologies and styles, so I like to think of coding standards as the "rules of the road" for developers.

When you're driving in a car, traffic is controlled by "standards" such as lanes, stoplights, yield signs and laws that set expectations around how you should drive. When you take a road trip to a different state, the stoplights might be hung horizontally instead of vertically or you'll see subtle variations in signage, but because you're familiar with the rules of the road, you're comfortable with the mechanics of driving in this new place. Coding standards help control development traffic and provide the consistency programmers need to work comfortably with a team across projects. The problem with allowing developers to apply their own unique coding styles to a project is the same as allowing drivers to drive as they wish ... Confusion about lane usage, safe passage through intersections and speed would result in collisions and bottlenecks.

Coding standards often seem restrictive or laborious when a development team starts considering their adoption, but they don't have to be ... They can be implemented methodically to improve the team's efficiency and consistency over time, and they can be as simple as establishing that all instantiations of an object must be referenced with a variable name that begins with a capital letter:

$User = new User();

While that example may seem overly simplistic, it actually makes life a lot easier for all of the developers on a given project. Regardless of who created that variable, every other developer can see the difference between a variable that holds data and one that references an instantiated object. Think about the shapes of signs you encounter while driving ... You know what a stop sign looks like without reading the word "STOP" on it, so when you see a red octagon (in the United States, at least), you know what to do when you approach it in your car. In the same way, a capitalized variable name immediately tells us something about its role.

The example I gave of capitalizing instantiated objects is just that: an example. When it comes to coding standards, the most effective rules your team can incorporate are the ones that make the most sense to you. While there are a few best practices in terms of formatting and commenting in code, the most important characteristics of coding standards for a given team are consistency and clarity.

So how do you go about creating a coding standard? Most developers dislike doing unnecessary work, so the easiest way to create a coding standard is to start from an existing one. Take a look at any libraries or frameworks you are using in your current project. Do they use any coding standards? Are those coding standards something you can live with or use as a starting point? You are free to make any changes you wish to an existing standard in order to best fit your team's needs, and you can even decide how strictly each rule must be adhered to. Take, for example, left-hand comparisons:

if ( $a == 12 ) {} // right-hand comparison
if ( 12 == $a ) {} // left-hand comparison

Both of these statements are valid, but one may be preferred over the other. Consider the following statements:

if ( $a = 12 ) {} // supposed to be a right-hand comparison but is now an assignment
if ( 12 = $a ) {} // supposed to be a left-hand comparison but is now an assignment

The first statement now evaluates to true because $a is assigned the value 12 (and the assignment expression evaluates to that value), so the code within the if-statement executes ... which is not the desired result. The second statement causes an error, making it obvious that a mistake has been made in the code. Because our team couldn't come to a consensus, we decided to allow both of these standards ... Either of the two formats is acceptable, and both will pass code review, but they are the only two acceptable variants. Code that deviates from those two formats would fail code review and would not be allowed in the code base.

Coding standards play an important role in the efficient development of a project when you have several programmers working on the same code. By adopting coding standards and following them, you'll avoid a free-for-all in your code base, and you'll be able to look at any line of code and learn more from it than what the literal code says ... just like seeing a red octagon posted on the side of the road at an intersection.

-@SoftLayerDevs

April 29, 2013

Web Development - Installing mod_security with OWASP

You want to secure your web application, but you don't know where to start. A number of open-source resources and modules exist, but that variety is more intimidating than it is liberating. If you're going to take the time to implement application security, you don't want to put your eggs in the wrong basket, so you wind up suffering from analysis paralysis as you compare all of the options. You want a powerful, flexible security solution that isn't overly complex, so to save you the headache of making the decision, I'll make it for you: Start with mod_security and OWASP.

ModSecurity (mod_security) is an open-source Apache module that acts as a web application firewall. It is used to help protect your server (and websites) from several methods of attack, the most common being brute-force attacks. You can think of mod_security as an invisible layer that separates users from the content on your server, quietly monitoring HTTP traffic and other interactions. It's easy to understand and simple to implement.

The challenge is that without some advanced configuration, mod_security isn't very functional, and that advanced configuration can get complex pretty quickly. You need to determine and set additional rules so that mod_security knows how to respond when approached with a potential threat. That's where the Open Web Application Security Project (OWASP) comes in. You can think of the OWASP ModSecurity Core Rule Set as an enhanced core ruleset that the mod_security module will follow to prevent attacks on your server.

The process of getting started with mod_security and OWASP might seem like a lot of work, but it's actually quite simple. Let's look at the installation and configuration process in a CentOS environment. First, we want to install the dependencies that mod_security needs:

## Install the GCC compiler and mod_security dependencies ##
$ sudo yum install gcc make
$ sudo yum install libxml2 libxml2-devel httpd-devel pcre-devel curl-devel

Now that we have the dependencies in place, let's install mod_security. Unfortunately, mod_security isn't available as a yum package in the default CentOS repositories, so you'll have to install it directly from the source:

## Get mod_security from its source ##
$ cd /usr/src
$ git clone https://github.com/SpiderLabs/ModSecurity.git

Now that we have mod_security on our server, we'll install it:

## Install mod_security ##
$ cd ModSecurity
## If the checkout doesn't include a configure script, generate one first with ./autogen.sh ##
$ ./configure
$ make
$ make install
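
If the build finishes cleanly, the compiled module should land in Apache's modules directory. A quick optional check (the path assumes a stock CentOS Apache layout) looks like this:

## Optional: confirm the module was installed ##
$ ls -l /etc/httpd/modules/mod_security2.so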

And we'll copy over the default mod_security configuration file into the necessary Apache directory:

## Copy configuration file ##
$ cp modsecurity.conf-recommended /etc/httpd/conf.d/modsecurity.conf
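
One thing worth knowing: the recommended configuration file typically ships with the rule engine in detection-only mode, which logs suspicious requests but doesn't block them. Once you've verified that legitimate traffic isn't being flagged, you can switch to blocking mode; check the exact directive in your copy of the file, but the change generally looks like this:

## Optional: switch from detection-only to blocking mode ##
$ vi /etc/httpd/conf.d/modsecurity.conf
## Change this ... ##
SecRuleEngine DetectionOnly
## ... to this ##
SecRuleEngine On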

We've got mod_security installed now, so we need to tell Apache about it ... It's no use having mod_security installed if our server doesn't know it's supposed to be using it:

## Apache configuration for mod_security ##
$ vi /etc/httpd/conf/httpd.conf

We'll need to edit our Apache config file to load our dependencies (BEFORE the mod_security module) and then the mod_security module itself:

## Load dependencies ##
LoadFile /usr/lib/libxml2.so
LoadFile /usr/lib/liblua5.1.so
## Load mod_security ##
LoadModule security2_module modules/mod_security2.so
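
One quick note: the library paths above assume a 32-bit system. On 64-bit CentOS installs the shared libraries typically live in /usr/lib64, so adjust the LoadFile lines to match whatever actually exists on your system, for example:

## 64-bit systems ##
LoadFile /usr/lib64/libxml2.so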

We'll save our configuration changes and restart Apache:

## Restart Apache! ##
$ sudo /etc/init.d/httpd restart
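
If you'd like to confirm that the module actually loaded, an optional sanity check is to list Apache's loaded modules and look for security2:

## Optional: confirm mod_security is loaded ##
$ httpd -M | grep security2
## Expect a line like: security2_module (shared) ##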

As I mentioned at the top of this post, our installation of mod_security is good, but we want to enhance our ruleset with the help of OWASP. If you've made it this far, you won't have a problem following a similar process to install OWASP:

## OWASP ##
$ cd /etc/httpd/
$ git clone https://github.com/SpiderLabs/owasp-modsecurity-crs.git
$ mv owasp-modsecurity-crs modsecurity-crs

Just like with mod_security, we'll set up our configuration file:

## OWASP configuration file ##
$ cd modsecurity-crs
$ cp modsecurity_crs_10_setup.conf.example modsecurity_crs_10_config.conf

Now we have mod_security and the OWASP core ruleset ready to go! The last step we need to take is to update the Apache config file to set up our basic ruleset:

## Apache configuration ##
$ vi /etc/httpd/conf/httpd.conf

We'll add an IfModule and point it to our new OWASP rule set at the end of the file:

<IfModule security2_module>
    Include modsecurity-crs/modsecurity_crs_10_config.conf
    Include modsecurity-crs/base_rules/*.conf
</IfModule>
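
Before we restart, it doesn't hurt to ask Apache to validate the new configuration (save the file first so the check picks up your changes):

## Optional: check the configuration syntax ##
$ apachectl configtest
## A healthy configuration reports "Syntax OK" ##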

And to complete the installation, we restart Apache:

## Restart Apache! ##
$ sudo /etc/init.d/httpd restart
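
If you want to see the new ruleset in action, a simple smoke test is to send a request with an obviously malicious-looking parameter and then check the audit log. The log path below is the default from modsecurity.conf; adjust it if your configuration points somewhere else:

## Send a request that should trip the XSS rules ##
$ curl -s "http://localhost/?q=<script>alert(1)</script>" > /dev/null
## Look for the flagged request in the audit log ##
$ sudo tail /var/log/modsec_audit.log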

And we've got mod_security installed with the OWASP core ruleset! With this default installation, we're leveraging the rules the OWASP open source community has come up with, and we have the flexibility to tweak and enhance those rules as our needs dictate. If you have any questions about this installation or you have any other technical blog topics you'd like to hear from us about, please let us know!

-Cassandra
