Posts Tagged 'Database'

December 20, 2012

MongoDB Performance Analysis: Bare Metal v. Virtual

Developers can be cynical. When "the next great thing in technology" is announced, I usually wait to see how it performs before I get too excited about it ... Show me how that "next great thing" compares apples-to-apples with the competition, and you'll get my attention. With the launch of MongoDB at SoftLayer, I'd guess a lot of developers outside of SoftLayer and 10gen have the same "wait and see" attitude about the new platform, so I put our new MongoDB engineered servers to the test.

When I shared MongoDB architectural best practices, I referenced a few of the significant optimizations our team worked with 10gen to incorporate into our engineered servers (cheat sheet). To illustrate the impact of these changes in MongoDB performance, we ran 10gen's recommended benchmarking harness (freely available for download and testing of your own environment) on our three tiers of engineered servers alongside equivalent shared virtual environments commonly deployed by the MongoDB community. We've made a pretty big deal about the performance impact of running MongoDB on optimized bare metal infrastructure, so it's time to put our money where our mouth is.

The Testing Environment

For each of the available SoftLayer MongoDB engineered servers, data sets of 512KB documents were preloaded onto single MongoDB instances. The data sets were sized relative to available memory so that some were larger than memory (2X) and some were smaller. Each test also altered the data set frequently enough during the run to prevent all of the data from being cached in memory.

Once the data sets were created, JMeter server instances with 4 cores and 16GB of RAM were used to drive 'benchrun' from the 10gen benchmarking harness. This diagram illustrates how we set up the testing environment:

[Diagram: MongoDB Performance Analysis Setup]

These JMeter servers function as the clients generating traffic on the MongoDB instances. Each client generated random query and update requests with a ratio of six queries per update (the update requests ensured that the data set could never be fully cached in memory, so reads from disk were always exercised). These tests were designed to create an extreme load on the servers from an exponentially increasing number of clients until the system resources became saturated, and we recorded the resulting performance of the MongoDB application.
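
For a rough idea of what that load generation looks like, here is a minimal sketch that drives a similar 6:1 read/update mix through the mongo shell's benchRun() helper. The host, namespace, field names and thread counts are placeholders; the real tests coordinated many JMeter clients rather than a single shell script.

#!/bin/bash
# Minimal benchRun load sketch (placeholders only; not the actual JMeter harness)
mongo --host 10.0.0.10 --port 27017 <<'EOF'
var ops = [];
// six random-key reads for every update, mirroring the 6:1 ratio above
for (var i = 0; i < 6; i++) {
    ops.push({ ns: "bench.docs", op: "findOne",
               query: { _id: { "#RAND_INT": [0, 1000000] } } });
}
ops.push({ ns: "bench.docs", op: "update",
           query:  { _id: { "#RAND_INT": [0, 1000000] } },
           update: { $inc: { counter: 1 } } });
// 32 concurrent client threads for 60 seconds
printjson(benchRun({ ops: ops, parallel: 32, seconds: 60, host: "10.0.0.10:27017" }));
EOF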

At the Medium (MD) and Large (LG) engineered server tiers, performance metrics were run separately for servers using 15K SAS hard drive data mounts and servers using SSD hard drive data mounts. If you missed the post comparing the IOPS statistics between different engineered server hard drive configurations, be sure to check it out.

Test Case 1: Small MongoDB Engineered Servers vs Shared Virtual Instance

Servers

Small (SM) MongoDB Engineered Server
Single 4-core Intel 1270 CPU
64-bit CentOS
8GB RAM
2 x 500GB SATAII - RAID1
1Gb Network
Virtual Provider Instance
4 Virtual Compute Units
64-bit CentOS
7.5GB RAM
2 x 500GB Network Storage - RAID1
1Gb Network
 

Tests Performed

Small Data Set (8GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 32
Test duration spanned 48 hours
[Chart: Average Read Operations per Second by Concurrent Client]
[Chart: Peak Read Operations per Second by Concurrent Client]
[Chart: Average Write Operations per Second by Concurrent Client]
[Chart: Peak Write Operations per Second by Concurrent Client]

Test Case 2: Medium MongoDB Engineered Servers vs Shared Virtual Instance

Servers (15K SAS Data Mount Comparison)

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
4 x 300GB 15K SAS - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
30GB RAM
2 x 64GB Network Storage - RAID1 (Journal Mount)
4 x 300GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (32GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
[Chart: Average Read Operations per Second by Concurrent Client]
[Chart: Peak Read Operations per Second by Concurrent Client]
[Chart: Average Write Operations per Second by Concurrent Client]
[Chart: Peak Write Operations per Second by Concurrent Client]

Servers (SSD Data Mount Comparison)

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
64-bit CentOS
36GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
4 x 400GB SSD - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
30GB RAM
2 x 64GB Network Storage - RAID1 (Journal Mount)
4 x 300GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (32GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
[Chart: Average Read Operations per Second by Concurrent Client]
[Chart: Peak Read Operations per Second by Concurrent Client]
[Chart: Average Write Operations per Second by Concurrent Client]
[Chart: Peak Write Operations per Second by Concurrent Client]

Test Case 3: Large MongoDB Engineered Servers vs Shared Virtual Instance

Servers (15K SAS Data Mount Comparison)

Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
64-bit CentOS
128GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
6 x 600GB 15K SAS - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
64GB RAM (Maximum available on this provider)
2 x 64GB Network Storage - RAID1 (Journal Mount)
6 x 600GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (64GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
[Chart: Average Read Operations per Second by Concurrent Client]
[Chart: Peak Read Operations per Second by Concurrent Client]
[Chart: Average Write Operations per Second by Concurrent Client]
[Chart: Peak Write Operations per Second by Concurrent Client]

Servers (SSD Data Mount Comparison)

Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
64-bit CentOS
128GB RAM
2 x 64GB SSD - RAID1 (Journal Mount)
6 x 400GB SSD - RAID10 (Data Mount)
1Gb Network - Bonded
Virtual Provider Instance
26 Virtual Compute Units
64-bit CentOS
64GB RAM (Maximum available on this provider)
2 x 64GB Network Storage - RAID1 (Journal Mount)
6 x 600GB Network Storage - RAID10 (Data Mount)
1Gb Network
 

Tests Performed

Small Data Set (64GB of 0.5MB documents)
200 iterations of 6:1 query-to-update operations
Concurrent client connections exponentially increased from 1 to 128
Test duration spanned 48 hours
[Chart: Average Read Operations per Second by Concurrent Client]
[Chart: Peak Read Operations per Second by Concurrent Client]
[Chart: Average Write Operations per Second by Concurrent Client]
[Chart: Peak Write Operations per Second by Concurrent Client]

Impressions from Performance Testing

The results speak for themselves. Running a MongoDB big data solution on a shared virtual environment has significant drawbacks when compared to running MongoDB on a single-tenant bare metal offering. Disk I/O is by far the most limiting resource for MongoDB, and relying on shared network-attached storage (with much lower disk I/O) makes this limitation very apparent. Beyond the average and peak statistics above, performance varied much more significantly in the virtual instance environment, so it's not as consistent and predictable as bare metal.

Highlights:

  • When a working data set is smaller than available memory, query performance increases.
  • The number of clients performing queries has an impact on query performance because more data is being actively cached at a rapid rate.
  • The addition of a separate Journal Mount volume significantly improves performance. Because the Small (SM) engineered server does not include a secondary mount for journals, whenever MongoDB began to journal, the disk I/O associated with journaling was disruptive to the query and update operations performed on the Data Mount.
  • The best deployments in terms of operations per second, stability and control were the configurations with a RAID10 SSD Data Mount and a RAID1 SSD Journal Mount. These configurations are available in both our Medium and Large offerings, and I'd highly recommend them.

-Harold

December 17, 2012

Big Data at SoftLayer: The Importance of IOPS

The jet flow gates in the Hoover Dam can release up to 73,000 cubic feet — the equivalent of 546,040 gallons — of water per second at 120 miles per hour. Imagine replacing those jet flow gates with a single garden hose that pushes 25 gallons per minute (or 0.42 gallons per second). Things would get ugly pretty quickly. In the same way, a massive "big data" infrastructure can be crippled by insufficient IOPS.

IOPS — Input/Output Operations Per Second — measure computer storage in terms of the number of read and write operations it can perform in a second. IOPS are a primary concern for database environments where content is being written and queried constantly, and when we take those database environments to the extreme (big data), the importance of IOPS can't be overstated: If you aren't able to perform database reads and writes quickly in a big data environment, it doesn't matter how many gigabytes, terabytes or petabytes you have in your database ... You won't be able to efficiently access, add to or modify your data set.

As we worked with 10gen to create, test and tweak SoftLayer's MongoDB engineered servers, our primary focus centered on performance. Since the performance of massively scalable databases is dictated by the read and write operations to that database's data set, we invested significant resources into maximizing the IOPS for each engineered server ... And that involved a lot more than just swapping hard drives out of servers until we found a configuration that worked best. Yes, "Disk I/O" — the number of input/output operations a given disk can perform — plays a significant role in big data IOPS, but many other factors limit big data performance. How is performance impacted by network-attached storage? At what point will a given CPU become a bottleneck? How much RAM should be included in a base configuration to accommodate the load we expect our users to put on each tier of server? Are there operating system changes that can optimize the performance of a platform like MongoDB?

The resulting engineered servers are a testament to the blood, sweat and tears that were shed in the name of creating a reliable, high-performance big data environment. And I can prove it.

Most shared virtual instances — the scalable infrastructure many users employ for big data — use network-attached storage for their platform's storage. When data has to be queried over a network connection (rather than from a local disk), you introduce latency and more "moving parts" that have to work together. Disk I/O might be amazing on the enterprise SAN where your data lives, but because that data is not stored on-server with your processor or memory resources, performance can sporadically go from "Amazing" to "I Hate My Life" depending on network traffic. When I tested the IOPS for network-attached storage from a large competitor's virtual instances, I saw an average of around 400 IOPS per mount. It's difficult to say whether that's "not good enough" because every application will have different needs in terms of concurrent reads and writes, but it certainly could be better. We performed some internal testing of the IOPS for the hard drive configurations in our Medium and Large MongoDB engineered servers to give you an apples-to-apples comparison.

Before we get into the tests, here are the specs for the servers we're using:

Medium (MD) MongoDB Engineered Server
Dual 6-core Intel 5670 CPUs
CentOS 6 64-bit
36GB RAM
1Gb Network - Bonded
Large (LG) MongoDB Engineered Server
Dual 8-core Intel E5-2620 CPUs
CentOS 6 64-bit
128GB RAM
1Gb Network - Bonded
 

The numbers shown in the table below reflect the average number of IOPS we recorded with a 100% random read/write workload on each of these engineered servers. To measure these IOPS, we used a tool called fio with an 8k block size and an iodepth of 128. Remembering that the virtual instance using network-attached storage was able to get 400 IOPS per mount, let's look at how our "base" configurations perform:

Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 2937
Random Write IOPS - /var/lib/mongo/logs 1306
Random Read IOPS - /var/lib/mongo/data 1720
Random Write IOPS - /var/lib/mongo/data 772
Random Read IOPS - /var/lib/mongo/data/journal 19659
Random Write IOPS - /var/lib/mongo/data/journal 8869
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 4 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 30269
Random Write IOPS - /var/lib/mongo/logs 13124
Random Read IOPS - /var/lib/mongo/data 33757
Random Write IOPS - /var/lib/mongo/data 14168
Random Read IOPS - /var/lib/mongo/data/journal 19644
Random Write IOPS - /var/lib/mongo/data/journal 8882
   
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 4820
Random Write IOPS - /var/lib/mongo/logs 2080
Random Read IOPS - /var/lib/mongo/data 2461
Random Write IOPS - /var/lib/mongo/data 1099
Random Read IOPS - /var/lib/mongo/data/journal 19639
Random Write IOPS - /var/lib/mongo/data/journal 8772
 
Large - 2 x 64GB SSD RAID1 (Journal) - 6 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32403
Random Write IOPS - /var/lib/mongo/logs 13928
Random Read IOPS - /var/lib/mongo/data 34536
Random Write IOPS - /var/lib/mongo/data 15412
Random Read IOPS - /var/lib/mongo/data/journal 19578
Random Write IOPS - /var/lib/mongo/data/journal 8835

Clearly, the 400 IOPS per mount results you'd see in SAN-based storage can't hold a candle to the performance of a physical disk, regardless of whether it's SAS or SSD. As you'd expect, the "Journal" reads and writes have roughly the same IOPS between all of the configurations because all four configurations use 2 x 64GB SSD drives in RAID1. In both configurations, SSD drives provide better Data mount read/write performance than the 15K SAS drives, and the results suggest that having more physical drives in a Data mount will provide higher average IOPS. To put that observation to the test, I maxed out the number of hard drives in both configurations (10 in the 2U MD server and 34 in the 4U LG server) and recorded the results:

Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 300GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 7175
Random Write IOPS - /var/lib/mongo/logs 3481
Random Read IOPS - /var/lib/mongo/data 6468
Random Write IOPS - /var/lib/mongo/data 1763
Random Read IOPS - /var/lib/mongo/data/journal 18383
Random Write IOPS - /var/lib/mongo/data/journal 8765
   
Medium - 2 x 64GB SSD RAID1 (Journal) - 10 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 32160
Random Write IOPS - /var/lib/mongo/logs 12181
Random Read IOPS - /var/lib/mongo/data 34642
Random Write IOPS - /var/lib/mongo/data 14545
Random Read IOPS - /var/lib/mongo/data/journal 19699
Random Write IOPS - /var/lib/mongo/data/journal 8764
   
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 600GB 15k SAS RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 17566
Random Write IOPS - /var/lib/mongo/logs 11918
Random Read IOPS - /var/lib/mongo/data 9978
Random Write IOPS - /var/lib/mongo/data 6526
Random Read IOPS - /var/lib/mongo/data/journal 18522
Random Write IOPS - /var/lib/mongo/data/journal 8722
 
Large - 2 x 64GB SSD RAID1 (Journal) - 34 x 400GB SSD RAID10 (Data)
Random Read IOPS - /var/lib/mongo/logs 34220
Random Write IOPS - /var/lib/mongo/logs 15388
Random Read IOPS - /var/lib/mongo/data 35998
Random Write IOPS - /var/lib/mongo/data 17120
Random Read IOPS - /var/lib/mongo/data/journal 17998
Random Write IOPS - /var/lib/mongo/data/journal 8822

It should come as no surprise that by adding more drives into the configuration, we get better IOPS, but you might be wondering why the results aren't "betterer" when it comes to the IOPS in the SSD drive configurations. While the IOPS numbers improve going from four to ten drives in the medium engineered server and six to thirty-four drives in the large engineered server, they don't increase as significantly as the IOPS differences in the SAS drives. This is what I meant when I explained that several factors contribute to and potentially limit IOPS performance. In this case, the limiting factor throttling the (ridiculously high) IOPS is the RAID card we are using in the servers. We've been working with our RAID card vendor to test a new card that will open a little more headroom for SSD IOPS, but that replacement card doesn't provide the consistency and reliability we need for these servers (which is just as important as speed).

There are probably a dozen other observations I could point out about how each result compares with the others (and why), but I'll stop here and open the floor for you. Do you notice anything interesting in the results? Does anything surprise you? What kind of IOPS performance have you seen from your server/cloud instance when running a tool like fio?
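
If you'd like to generate comparable numbers on your own hardware, a pair of fio jobs along these lines is a reasonable starting point. This is a sketch rather than our exact job file, so treat the target directory, file size and runtime as assumptions to adjust (and keep the test files well away from live data).

# random read pass: 8k blocks, iodepth of 128, direct I/O against the data mount
fio --name=randread --directory=/var/lib/mongo/data --rw=randread \
    --bs=8k --iodepth=128 --ioengine=libaio --direct=1 \
    --size=4G --runtime=120 --time_based --group_reporting
# random write pass with the same parameters
fio --name=randwrite --directory=/var/lib/mongo/data --rw=randwrite \
    --bs=8k --iodepth=128 --ioengine=libaio --direct=1 \
    --size=4G --runtime=120 --time_based --group_reporting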

-Kelly

December 4, 2012

Big Data at SoftLayer: MongoDB

In one day, Facebook's databases ingest more than 500 terabytes of data, Twitter processes 500 million Tweets and Tumblr users publish more than 75 million posts. With such an unprecedented volume of information, developers face significant challenges when it comes to building an application's architecture and choosing its infrastructure. As a result, demand has exploded for "big data" solutions — resources that make it possible to process, store, analyze, search and deliver data from large, complex data sets. In light of that demand, SoftLayer has been working in strategic partnership with 10gen — the creators of MongoDB — to develop a high-performance, on-demand, big data solution. Today, we're excited to announce the launch of specialized MongoDB servers at SoftLayer.

If you've configured an infrastructure to accommodate big data, you know how much of a pain it can be: You choose your hardware, you configure it to run NoSQL, you install an open source NoSQL project that you think will meet your needs, and you keep tweaking your environment to optimize its performance. Assuming you have the resources (and patience) to get everything running efficiently, you'll wind up with the horizontally scalable database infrastructure you need to handle the volume of content you and your users create and consume. SoftLayer and 10gen are making that process a whole lot easier.

Our new MongoDB solutions take the time and guesswork out of configuring a big data environment. We give you an easy-to-use system for designing and ordering everything you need. You can start with a single server or roll out multiple servers in a single replica set across multiple data centers, and in under two hours, an optimized MongoDB environment is provisioned and ready to be used. I stress that it's an "optimized" environment because that's been our key focus. We collaborated with 10gen engineers on hardware and software configurations that provide the most robust performance for MongoDB, and we incorporated many of their MongoDB best practices. The resulting "engineered servers" are big data powerhouses:

[Image: MongoDB Configs]

From each engineered server base configuration, you can customize your MongoDB server to meet your application's needs, and as you choose your upgrades from the base configuration, you'll see the thresholds at which you should consider upgrading other components. As your data set's size and the number of indexes in your database increase, you'll need additional RAM, CPU, and storage resources, but you won't need them in the same proportions — certain components become bottlenecks before others. Sure, you could upgrade all of the components in a given database server at the same rate, but if, say, you update everything when you only need to upgrade RAM, you'd be adding (and paying for) unnecessary CPU and storage capacity.

Using our new Solution Designer, it's very easy to graphically design a complex multi-site replica set. Once you finalize your locations and server configurations, you'll click "Order," and our automated provisioning system will kick into high gear. It deploys your server hardware, installs CentOS (with OS optimizations to provide MongoDB performance enhancements), installs MongoDB, installs MMS (MongoDB Monitoring Service) and configures the network connection on each server to cluster it with the other servers in your environment. A process that may have taken days of work and months of tweaking is completed in less than four hours. And because everything is standardized and automated, you run much less risk of human error.

[Image: MongoDB Configs]

One of the other massive benefits of working so closely with 10gen is that we've been able to integrate 10gen's MongoDB Cloud Subscriptions into our offering. Customers who opt for a MongoDB Cloud Subscription get additional MongoDB features (like SSL and SNMP support) and support direct from the MongoDB authority. As an added bonus, since the 10gen team has an intimate understanding of the SoftLayer environment, they'll be able to provide even better support to SoftLayer customers!

You shouldn't have to sacrifice agility for performance, and you shouldn't have to sacrifice performance for agility. Most of the "big data" offerings in the market today are built on virtual servers that can be provisioned quickly but offer meager performance levels relative to running the same database on bare metal infrastructure. To get the performance benefits of dedicated hardware, many users have chosen to build, roll out and tweak their own configurations. With our MongoDB offering, you get the on-demand availability and flexibility of a cloud infrastructure with the raw power and full control of dedicated hardware.

If you've been toying with the idea of rolling out your own big data infrastructure, life just got a lot better for you.

-Duke

March 27, 2012

Tips and Tricks - How to Secure WordPress

As a hobby, I dabble in WordPress, so I thought I'd share a few security features I use to secure my WordPress blogs as soon as they're installed. Nothing in this blog will be earth-shattering, but because security is such a priority, I have no doubt that it will be useful to many of our customers. Often, the answer to the question, "How much security do I need on my site?" is simply, "More," so even if you have a solid foundation of security, you might learn a new trick or two that you can incorporate into your next (or current) WordPress site.

Move wp-config.php

The first thing I do is change the location of my wp-config.php. By default, it's installed in the top-level WordPress directory. If the config file lives inside the web root, it can be reached through Apache, so I move it above the web root. Because you're changing the default location of a pretty significant file, you need to tell WordPress how to find it in wp-load.php. Let's say my WordPress runs out of /webroot on my host ... I'd need to make a change around Line 26:

if ( file_exists( ABSPATH . 'wp-config.php') ) {
 
        /** The config file resides in ABSPATH */
        require_once( ABSPATH . 'wp-config.php' );
 
} elseif ( file_exists( dirname(ABSPATH) . '/wp-config.php' ) && ! file_exists( dirname(ABSPATH) . '/wp-settings.php' ) ) {
 
        /** The config file resides one level above ABSPATH but is not part of another install*/
        require_once( dirname(ABSPATH) . '/wp-config.php' );

The code above is the default setup, and the code below is the version with my subtle update incorporated.

if ( file_exists( ABSPATH . '../wp-config.php') ) {
 
        /** The config file resides one level above ABSPATH */
        require_once( ABSPATH . '../wp-config.php' );
 
} elseif ( file_exists( dirname(ABSPATH) . '/../wp-config.php' ) && ! file_exists( dirname(ABSPATH) . '/wp-settings.php' ) ) {
 
        /** The config file resides one level above ABSPATH but is not part of another install*/
        require_once( dirname(ABSPATH) . '/../wp-config.php' );

All we're doing is telling the application that the wp-config.php file is one directory higher. By making this simple change, you ensure that only the application can see your wp-config.php script.
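
If it helps, the move itself is just a couple of shell commands. The paths, owner and group below assume WordPress lives in /webroot on a typical Apache host, so adjust them to your environment:

mv /webroot/wp-config.php /wp-config.php
chown root:apache /wp-config.php   # group should match the web server user so PHP can still read it
chmod 640 /wp-config.php           # readable by the application, not by the world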

Turn Down Access to /wp-admin

After I make that change, I want to turn down access to /wp-admin. I allow users to contribute on some of my blogs, but I don't want them to do so from /wp-admin; only users with admin rights should be able to access that panel. To limit access to /wp-admin, I recommend the plugin uCan Post. This plugin creates a page that allows users to write posts and submit them within your theme.

But won't a user just be able to navigate to http://site.com/wp-admin? Yes ... Until we add a simple function to our theme's functions.php file to limit that access. At the bottom of your functions.php file, add this:

############ Disable admin access for users ############

add_action('admin_init', 'no_more_dashboard');
function no_more_dashboard() {
    if ( !current_user_can('manage_options') && !( defined('DOING_AJAX') && DOING_AJAX ) ) {
        wp_redirect( site_url() );
        exit;
    }
}
 
###########################################################

Log in as a non-admin user, and you'll get redirected to the blog's home page if you try to access the admin panel. Voila!

Start Securing the WordPress Database

Before you go any further, you need to look at WordPress database security. This is the most important piece in my opinion, and it's not just because I'm a DBA. WordPress never needs all permissions. The only permissions WordPress needs to function are ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT and UPDATE.

If you run WordPress and MySQL on the same server, the permissions grant would look something like this:

GRANT ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, UPDATE ON <DATABASE>.* TO <USER>@'localhost' IDENTIFIED BY '<PASSWORD>';

If you have a separate database server, make sure the host of the webserver is allowed to connect to the database server:

GRANT ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, UPDATE ON <DATABASE>.* TO <USER>@'<ip of web server>' IDENTIFIED BY '<PASSWORD>';

The password you use should be random, and you should not need to change this. DO NOT USE THE SAME PASSWORD AS YOUR ADMIN ACCOUNT.
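
Once the grant is in place, a quick check from the web server will confirm that the account can connect and carries only the intended privileges (the host, user and database placeholders match the grants above):

mysql -h <database server ip> -u <USER> -p -e "SHOW GRANTS FOR CURRENT_USER();" <DATABASE>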

By taking those quick steps, we're able to go a long way to securing a default WordPress installation. There are other plugins out there that are great tools to enhance your blog's security, and once you've got the fundamental security updates in place, you might want to check some of them out. Login LockDown is designed to stop brute force login attempts, and Secure WordPress has some great additional features.

What else do you do to secure your WordPress sites?

-Lee

February 16, 2012

Cloudant: Tech Partner Spotlight

This is a guest blog from our featured Technology Partners Marketplace company, Cloudant. Cloudant enables you to build next-generation data-driven applications without having to worry about developing, managing, and scaling your data layer.

Company Website: https://cloudant.com/
Tech Partners Marketplace: http://www.softlayer.com/marketplace/cloudant

Cloudant: Data Layer for the Big Data Era

The recipe for big data app success: Start small. Iterate fast. Grow to epic proportions.

Unfortunately, most developers' databases come up short when they try to simultaneously "iterate fast" and "grow to epic proportions" — those two steps are most often at odds. I know ... I've been there. In a recent past life, I attacked petabyte-per-second data problems as a particle physicist at the Large Hadron Collider together with my colleagues and Cloudant co-founders, Alan Hoffman and Adam Kocoloski. Here are some lessons we learned the hard way:

  1. Scaling a database yourself is brutally hard (both application-level sharding and the master-slave model). It is harder with SQL than it is with NoSQL databases, but either way, the "scale it yourself" approach is loaded with unknowns, complications and operational expense.
  2. Horizontal scaling on commodity hardware is a must. We got very good at this and ended up embedding Apache CouchDB behind a horizontal scaling framework to scale arbitrarily and stay running 24x7 with a minimal operational load.
  3. The data layer must scale. It should be something that applications grow into, not out of.

That last point inspired Alan, Adam and me to co-found Cloudant.

What is Cloudant?
Cloudant is a scalable data layer (as a service) for Big Data apps. Built on CouchDB, JSON, and MapReduce, it lets developers focus on new features instead of the drudgery of growing or migrating databases. The Cloudant Data Layer is already big: It collects, stores, analyzes and distributes application data across a global network of secure, high-performance data centers, delivering low-latency and non-stop data access to users no matter where they're located. You get to focus on your code; we've got data scalability and availability covered for you.

Scaling Your App on Cloudant
Cloudant is designed to support fast app iteration by developers. It's based on the CouchDB NoSQL database where data is encapsulated and transferred as JSON documents. You don't need to design and redesign SQL data models or migrate databases in order to create new app features. You don't need to write object-relational mapping code either. The database resides behind an HTTP layer and provides a rich permission model, so you can access, secure and share your data via a RESTful API.
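
As an illustration of how little ceremony that involves, here are a few hypothetical calls against the API. The account name, credentials, database and document ID are placeholders, not a real Cloudant account:

ACCOUNT="example"             # placeholder account; yields https://example.cloudant.com
AUTH="$ACCOUNT:supersecret"   # placeholder credentials
# create a database
curl -s -X PUT "https://$ACCOUNT.cloudant.com/blog_posts" -u "$AUTH"
# store a JSON document
curl -s -X POST "https://$ACCOUNT.cloudant.com/blog_posts" -u "$AUTH" \
     -H "Content-Type: application/json" \
     -d '{"title": "Hello Cloudant", "tags": ["nosql", "couchdb"]}'
# read a document back by its _id
curl -s -X GET "https://$ACCOUNT.cloudant.com/blog_posts/<doc_id>" -u "$AUTH"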

Your app is a tenant within a multi-tenant data layer that is already big and scalable. You get a URL endpoint for your data layer, get data in and out of it via HTTP, and we scale and secure it around the globe. Global data distribution and intelligent routing minimize the latency between your users and their data, which can otherwise add hundreds of milliseconds per request (we've measured!). Additionally, Cloudant has an advanced system for prioritizing requests so that apps aren't affected by 'noisy neighbors' in a multi-tenant system. We also offer a single-tenant data layer to companies who want it — your very own white-labeled data cloud. As your data volume and IO requests rise (or fall), Cloudant scales automatically, and because your data is replicated to multiple locations, it's always available. Start small and grow to epic proportions? Check.

Other Data Management Gymnastics
The Cloudant Data Layer also makes it easy to add advanced functionality to your apps:

  • Replicate data (all of it or subsets) to data centers, computers or even mobile devices for local processing (great for analytics) or offline access (great for mobile users). Re-syncing is automatic, and kicking off a replication is a single API call (see the sketch after this list).
  • Perform advanced analytics with built-in MapReduce and full-text indexing and search.
  • Distribute your code with data — Cloudant can distribute and serve any kind of document, even HTML5 and other browser-based code, which makes it easy to scale your app and move processing from your back-end to the browser.
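
Here is what that replication call might look like, reusing the placeholder account and credentials from the sketch above; the source, target and continuous flag are purely illustrative:

curl -s -X POST "https://$ACCOUNT.cloudant.com/_replicate" -u "$AUTH" \
     -H "Content-Type: application/json" \
     -d '{"source": "blog_posts", "target": "https://example-eu.cloudant.com/blog_posts", "continuous": true}'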

Why We Run on SoftLayer
Given the nature of our service, people always ask us where we have our infrastructure, and we're quick to tell them we chose SoftLayer because we're fanatical about performance. We measured latencies for different data centers run by other cloud providers, and it's no contest: SoftLayer provides the lowest and most predictable latencies. Data centers that are thousands of miles apart perform almost as if they are on the same local area network. SoftLayer's rapidly expanding global presence allows Cloudant to replicate data globally throughout North America, Europe and Asia (with plans to continue that expansion as quickly as SoftLayer can build new facilities).

The other major draw to SoftLayer was the transparency they provide about our infrastructure. If you run a data layer, IO matters! SoftLayer provisions dedicated hardware for us (rather than just virtual machines), and they actually tell us exactly what hardware we are running on, so we can tweak our systems to get the most bang for our buck.

Get Started with Cloudant for Free
If you're interested to see what the Cloudant Data Layer could do for your app, sign up at cloudant.com to get your FREE global data presence created in an instant.

-Michael Miller, Cloudant

This guest blog series highlights companies in SoftLayer's Technology Partners Marketplace.
These Partners have built their businesses on the SoftLayer Platform, and we're excited for them to tell their stories. New Partners will be added to the Marketplace each month, so stay tuned for many more to come.

October 4, 2011

An Introduction to Redis

I recently had the opportunity to get re-acquainted with Redis while evaluating solutions for a project on the Product Innovation team here at SoftLayer. I'd actually played with it a couple of times before, but this time it "clicked." Or my brain broke. Either way, I see a lot of potential for Redis now.

No one product is a perfect fit for all of your data storage needs, of course. There are such fundamental tradeoffs to be made in designing storage architectures that you should be immediately suspicious of any product that claims to fit every need.

The best solutions tend to be products that actually embrace these tradeoffs. Redis, for instance, has sacrificed a small amount of data durability in exchange for being awesome.

What is it?

Redis is a key/value store, but describing it that way is sort of like calling a helicopter a "vehicle." It's a technically correct description, but it leaves out some important stuff.

You can think of it like a sophisticated older brother of Memcached. It presents a flat keyspace, and you can set those keys to string values. Like Memcached, it also lets you perform remote atomic operations, like "incr" and "append." These are really handy, because you have the ability to modify remote data without fetching it, and you have an assurance that you're the only one performing that operation at that instant.

Redis takes this concept of remote commands on data and goes completely nuts with it. The database is aware of data structures like hashes, lists and sets in addition to simple string values. You can sort, union, intersect, slice and dice to your heart's content without fetching any data. Redis is a data structure server. You can treat it like remote memory, and this has an awesome immediate benefit for a programmer: your code and brain are already optimized for these data types.
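
A quick, hypothetical tour from redis-cli shows what that looks like in practice. Every command below runs server-side, so there is no fetch-modify-write round trip; the key names are made up for illustration:

redis-cli SET page:home:title "Welcome"
redis-cli INCR hits:today                                # atomic counter
redis-cli APPEND motd "now with more redis; "            # atomic string append
redis-cli LPUSH recent:posts 1845                        # push onto a list
redis-cli LTRIM recent:posts 0 99                        # keep only the newest 100 entries
redis-cli LRANGE recent:posts 0 9                        # read the ten newest
redis-cli SADD tags:redis 42                             # add members to sets
redis-cli SADD tags:nosql 42 99
redis-cli SINTERSTORE tags:both tags:redis tags:nosql    # intersect two sets and store the result server-side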

But it's not just about making storage simpler. It's fast, too. Crazy fast. If you make intelligent use of its data structures, it's possible to serve a lot of traffic from relatively modest hardware. Redis 2.4 can easily handle ~50k list appends a second on my notebook. With batching, it can append 2 million items to a list on a remote host in about 1.28 seconds.

It allows the remote, atomic and performant manipulation of data structures. It took me a little while to realize exactly how useful that is.

What's wrong with it?

Nothing. Move along.

OK, it's a little short on durability. Redis uses memory as its primary store and periodically flushes to disk. A common configuration is to do so every second.
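
In configuration terms, that once-a-second flush typically corresponds to the append-only file with per-second fsync. A minimal sketch, assuming the stock config path for your distribution:

cat >> /etc/redis/redis.conf <<'EOF'
appendonly yes
appendfsync everysec
EOF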

That sounds pretty reasonable. If a server goes down, you could lose a second of data. Keep in mind, however, how many operations Redis can perform in a second. If you're in a high-volume environment, that could be a lot of data. It's not for your financial transactions.

It also supports relatively limited availability options. Currently, it only supports master/slave replication. Clustering support is planned for an upcoming release. It's looking pretty powerful, but it will take some real-world testing to know its performance impact.

These challenges should be taken into consideration, and it's probably clear if you're in a situation where the current tradeoffs aren't a good fit.

In my experience, a lot of developers seriously overestimate the consequences of their application losing small amounts of data. Also consider whether or not the chance of losing a second (or less) of data genuinely represents a bigger threat to your application than any other compromises you might have made.

More Information
You can check out the slightly aging docs or browse the impressively simple source. There are probably already bindings for your language of choice as well.

-Tim

September 25, 2011

Learning the Language of Hosting

It's been a little over a month since I started at SoftLayer ... And what a difference a month makes. In the course of applying for the Social Media Coordinator position I now hold, I was asked to write a few sample blogs. One was supposed to be about what SoftLayer does, and I answered it to the best of my abilities at the time. Looking back on my answer, I must admit I had no idea what I was getting into.

On the plus side, comparing what I know now with what I thought I knew then shows how much a person with zero background in hosting can learn in a short period of time. To give you an idea of where I came from, let's look at a few theoretical conversations:

Pre-SoftLayer

Friend: What does SoftLayer do?
Rachel: They are a hosting provider.
Friend: What is a hosting provider?
Rachel: It's sort of like an Internet landlord that rents data space to clients ... I think.

Present Day

Friend: What is it you do?
Rachel: I'm the Social Media Coordinator for SoftLayer Technologies.
Friend: What does SoftLayer do?
Rachel: SoftLayer is a hosting provider, though that's a generalization. We have data centers around the country and are expanding worldwide. The company offers dedicated, cloud and hybrid environments that allow us to handle companies' outsourced IT. We are infrastructure experts.

That would be a little bit of a cookie cutter explanation, but it gives a lot more context to the business, and it would probably soar above the head of my non-technical inquisitive friend.

During my first week on the job, I visited one of SoftLayer's data centers ... And that "data center" term turned out to be a little tricky for me to remember. For some reason, I always wanted to call the data center a "database center." It got to the point where Kevin challenged me to a piggy bank deal.

SoftLayer is raising money for the American Heart Association, and everyone has a little piggy bank at their desk. One of the piggy banks essentially became a "swear jar" ... except not for swearing. Every time I said "database center," I had to put a dollar in the piggy bank. The deal was extended when I was trying to remember that 1 byte (big B) = 8 bits (little b):

[Image: AHA Piggy Bank]

With money on the line, I'm happy to say that I haven't confused "database centers" or bits and bytes again ... And the piggy bank on the left-hand side of the picture above proves it!

Back to the DC (data center!) tour: I learned about how CRAC units are used to pull air underneath the floor and cool the "cold aisles" in the DC. I learned about the racks and how our network architecture provides private, public, and out-of-band management networks on the back end to customers in a way unique to SoftLayer. Most importantly, I learned the difference between managed, dedicated, cloud and hosting environments that incorporate all of those different kinds of hosting. This is a far cry from focusing on getting the terminology correct.

I'm still not an expert on all things SoftLayer, and I'm pretty sure I'll end up with my very own acronym dictionary, but I must admit that I absorbed more information in the past month than I thought possible. I have to thank my ninja sensei, Kevin, for taking the time to answer my questions. It felt like school again ... especially since there was a whiteboard in use!

Kevin, enjoy your empty piggy bank!

-Rachel

June 8, 2011

MySQL Slow? Check for Fragmentation.

Let's say you have a website and you notice that any calls to your MySQL database take longer to render. If you don't have a Database Administrator (DBA), this can be pretty frustrating. SoftLayer's Managed Hosting line of business employs some of the best DBAs in the country and is one of the only managed hosting providers that offers MySQL and MSSQL DBA services, and I don't just say that because I'm one of them ... We've got the certifications to prove it. :-)

Given my area of expertise, I wanted to share a few simple tips to help you tweak variables and improve the performance of your MySQL server. Since every application is different, this isn't necessarily a one-size-fits-all solution, but it'll at least give you a starting point for troubleshooting.

First: Get mysqltuner.pl. This is a fine script by Major Hayden that will give you some valuable information regarding the performance of your MySQL server.

Second: Look for fragmented tables. What are fragmented tables? If there are random insertions into or deletions from the indexes of a table, the indexes may become fragmented. Fragmentation means that the physical ordering of the index pages on the disk is not close to the index ordering of the records on the pages, or that there are many unused pages in the 64-page blocks that were allocated to the index. The symptoms of fragmented tables are that they can take more disk space than needed and that queries can return more slowly, with more disk I/O than necessary. InnoDB users need to check for fragmentation often because when InnoDB marks data as deleted, it never overwrites the blocks with new data ... It just marks them as unusable. As a result, the data size is artificially inflated and data retrieval is slowed.

Fortunately, there is a way to see your table fragmentation: run a query against information_schema to show all tables that are fragmented and how much reclaimable (free) space each one is carrying:

SELECT TABLE_SCHEMA, TABLE_NAME, CONCAT(ROUND(data_length / ( 1024 * 1024 ), 2), 'MB') DATA, CONCAT(ROUND(data_free / ( 1024 * 1024 ), 2), 'MB') FREE FROM information_schema.TABLES WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql') AND data_free > 0;

Fixing the fragmentation is easy, but there is a caveat: defragmenting a table locks it, so make sure you can afford the lock. To fix fragmented tables, you can simply run OPTIMIZE TABLE <table name>; to rebuild the table and all of its indexes, or you can rebuild the table by changing its engine with ALTER TABLE <table name> ENGINE = InnoDB;

I have written a simple bash script to go through, defragment and optimize your tables:

#!/bin/bash
 
MYSQL_LOGIN='-u<username> --password=<password>'
 
for db in $(echo "SHOW DATABASES;" | mysql $MYSQL_LOGIN | grep -v -e "Database" -e "information_schema")
do
        TABLES=$(echo "USE $db; SHOW TABLES;" | mysql $MYSQL_LOGIN |  grep -v Tables_in_)
        echo "Switching to database $db"
        for table in $TABLES
        do
                echo -n " * Optimizing table $table ... "
                echo "USE $db; OPTIMIZE TABLE $table" | mysql $MYSQL_LOGIN >/dev/null
                echo "done."
        done
done

You'd be surprised how much of an impact table fragmentation has on MySQL performance, and this is an easy way to quickly troubleshoot your database that "isn't as fast as it used to be." If you follow the above steps and still can't make sense of what's causing your database to lag, our Managed Hosting team is always here to work with you to get your servers back in shape ... And with the flexibility of month-to-month contract terms and the ability to add managed capabilities to specific pieces of your infrastructure, we have to earn your business every month with spectacular service.

-Lee

April 20, 2011

3 Bars | 3 Questions: SoftLayer Managed Hosting

I know you expected to see a video interview with Paul Ford the next time a 3 Bars | 3 Questions episode rolled across your desk, but I snuck past him for a chance in the spotlight this week. Kevin and I jumped on a quick video chat to talk about the Sales Engineering team, and because of our recent release of SoftLayer Managed Hosting, two of the three questions ended up being about that news:

You should be seeing a blog from Nathan in the next half hour or so with more detail about how we approached managed hosting, so you'll have all the background you need to springboard into that post after you watch this video.

If you've heard everything you need to hear about managed hosting and want to start the process of adding it to servers on your account, visit http://www.softlayer.com/solutions/managed-hosting/ or chat with a sales rep, and they can help you get squared away. If you're not sure whether it's a good fit, ask for a sales engineer to consult ... They're a great group with a pretty awesome manager. :-)

Paul, sorry for stealing your spot in the 3 Bars | 3 Questions rotation! I'm handing the baton back over to you to talk about TechWildcatters and the Technology Partners Marketplace in the next episode.

-Tam

April 12, 2010

How Much Is a Trillion?

Since the budget surpluses of the seemingly long-ago Clinton administration vanished, the government as managed by both major political parties has been on a spending spree. These folks have been throwing down trillions like rappers throw down Benjamins!

Speaking of Benjamins, how big is a pile of a trillion dollars made up of Benjamins?

I did something yesterday that gave me a whole new view of how large a number a trillion really is. I work with some big spreadsheets of data that I query from our database systems here at work. Using Excel 2007, I can analyze worksheets that are a million records long. I needed to use the VLOOKUP function in Excel to grab a field of data in one worksheet of a million records and move it into another worksheet of a million records. I run these types of jobs on a server (not my desktop machine), and this process brought my server to its knees.

This particular VLOOKUP job involved a trillion comparisons. For each of the million rows in worksheet #1, it had to search a specified range of one million fields in worksheet #2, find the exact match, and bring a specific field of data back into worksheet #1. A million rows searching a million fields = 1,000,000 x 1,000,000. And a million times a million equals a trillion.

Now, my server is not a wimpy machine. It has Nehalem processors with 16 total processing cores running at 2.93 GHz. It doesn't have a stupendous amount of RAM because Excel can only use so much RAM, but many functions in Excel, such as VLOOKUP, can utilize all the processing cores you can throw at them.

When I hit the return key at 5:12 PM yesterday to kick off this VLOOKUP, I noticed that it was taking a while, so I went to Task Manager and saw that all 16 cores were maxed out at 100% utilization. All sixteen cores remained maxed out until 5:47 PM, at which time the job finished successfully.

So, for all the techies out there, that's your representation of how big a trillion really is: It takes 16 processing cores running at 2.93 GHz, each maxed out for 35 straight minutes, to run a VLOOKUP involving a trillion comparisons.
