Development Posts

May 11, 2016

Adventures in Bluemix: Migrating to MQ Light

One of my pet projects at SoftLayer is looking at a small collection of fancy scripts that scan through all registered Internet domain names to see how many of them are hosted on SoftLayer’s infrastructure. There are a lot of fun little challenges involved, but one of the biggest challenges is managing the distribution of work so that this scan doesn’t take all year. Queuing services are great for task distribution, and for my initial implementation I decided to give running a RabbitMQ instance a try, since at the time it was the only queuing service I was familiar with. Overall, it took me about a week and one beefy server to go from “I need a queue,” to “I have a queue that is actually doing what I need it to.”

While what I had set up worked, looking back, there is a lot about RabbitMQ that I didn’t really have the time to figure out properly. Around the time I finished the first run of this project, Bluemix announced that its MQ Light service would allow connections from non-Bluemix resources. So when I got some free time, I decided to move the project to a Bluemix-hosted MQ Light queue and take some notes on how the migration went.

Project overview

To better understand how much work was involved, let me quickly explain how the whole “scanning through every registered domain for SoftLayer-hosted domains” thing works.

There are three main moving parts in the project:

  1. The Parser, which is responsible for reading through zone files (which are obtained from the various registrars), filtering out duplicates, and putting nicely formatted domains into a queue.
  2. The Resolver, which is responsible for taking the nicely formatted domains from queue #1, looking up each domain’s IP address, and putting the result into queue #2.
  3. The Checker, which takes the domains from queue #2, checks to see if the domains’ IPs belong to SoftLayer or not, and saves the result in a database.

Each queue entry is a package of about 500 domains, which is roughly 200KB of text data consisting of the domains and some metadata that I used to see how well everything was performing. There are around 160 million domains I need to review, and resolving a single domain can take anywhere from 0.001 seconds to four seconds, so being able to push domains quickly through the queues is very important.
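To give a feel for what those queue entries look like, here is a minimal sketch of how the Parser might batch domains into messages. The field names, batch contents, and metadata are illustrative, not the project’s actual format:

```python
import json

BATCH_SIZE = 500  # roughly 200KB of text per message, as described above


def batch_domains(domains, batch_size=BATCH_SIZE):
    """Yield JSON payloads containing `batch_size` domains plus a little metadata."""
    for i in range(0, len(domains), batch_size):
        chunk = domains[i:i + batch_size]
        yield json.dumps({
            'domains': chunk,     # the nicely formatted domain names
            'count': len(chunk),  # handy for sanity checks downstream
            'offset': i,          # illustrative metadata only
        })


if __name__ == '__main__':
    # Tiny example: turn a list of domains into queue-ready messages.
    sample = ['example.com', 'example.org', 'example.net']
    for message in batch_domains(sample, batch_size=2):
        print(message)
```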

Things to be aware of

Going into this migration, I made a lot of assumptions about how things worked that caused me grief. So if you are in a similar situation, here is what I wish someone had told me.

AMQP 1.0: MQ Light implements the AMQP 1.0 protocol, which is great, because it is the newest and greatest. As everyone knows, newer is usually better. The problem is that my application was using the python-pika library to connect to RabbitMQ, and both of those implement AMQP 0.9, which isn’t fully compatible with AMQP 1.0. The Python library I was using gave me a version error when trying to connect to MQ Light. This required a bit of refactoring of my code to get everything working properly. The core ideas are the same, but some of the specific API calls are slightly different.
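For reference, here is a sketch of the kind of pika-based publisher the original RabbitMQ setup implies. The queue name and connection details are placeholders, not the project’s real configuration:

```python
import json
import pika  # AMQP 0.9.1 client; this is what fails against MQ Light's AMQP 1.0 endpoint

# Placeholder broker and queue, not the project's real setup.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='domains_to_resolve', durable=True)

payload = json.dumps({'domains': ['example.com', 'example.org']})
channel.basic_publish(
    exchange='',
    routing_key='domains_to_resolve',
    body=payload,
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)
connection.close()
```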

Persistence: Messages sent to an MQ Light queue without active subscribers will be lost, which took me a while to figure out. The UI indicates when this happens, so this was likely just a problem of me not reading the documentation properly and assuming MQ Light worked like RabbitMQ.

Threads: The python-mqlight library uses threads fairly heavily, which is great for performance, but it makes programming a little more thought-intensive. Make sure you wait for the connection to initialize before sending any messages, and make sure all your messages have been sent before exiting.
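Here is a minimal sketch of that wait-then-drain pattern with python-mqlight. The call names follow the library’s published samples, but treat the exact signatures as assumptions and check the examples that ship with the library:

```python
# Wait for the client to start before sending, and wait for sends to finish before exiting.
# Service URL, credentials, and topic below are placeholders.
import threading
import mqlight

SERVICE = 'amqp://user:password@host.example.com:5672'  # placeholder credentials
TOPIC = 'domains/to_resolve'                            # placeholder topic

started = threading.Event()
all_sent = threading.Event()


def on_started(err):
    # Called on a worker thread once the client is connected (signature assumed).
    if err is None:
        started.set()


def on_sent(err, topic, data, options):
    # Called once the message has actually gone out (signature assumed).
    if err is None:
        all_sent.set()


client = mqlight.Client(service=SERVICE, client_id='parser', on_started=on_started)

started.wait(timeout=30)                    # don't send before the client is ready
client.send(TOPIC, 'hello', None, on_sent)  # send is asynchronous
all_sent.wait(timeout=30)                   # don't exit until the send completes
client.stop()
```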

Apache Proton: MQ Light is built on the Apache Qpid Proton project, and the python-mqlight library uses it under the hood as well.

Setting up MQ Light

Aside from those small issues I mentioned, MQ Light was really easy to set up and start using, especially when compared to running my own RabbitMQ instance.

  1. Set up the MQ Light Service in Bluemix.
  2. Install the python-mqlight library (or whatever library supports your language of choice). There are a variety of MQ Light Libraries.
  3. Try the send/receive examples (a minimal receive-side sketch follows this list).
  4. Write some code.
  5. Watch the messages come in, and profit.
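To give a feel for step 3, here is a minimal receive-side sketch, with the same caveat as before: the call names follow the python-mqlight samples, but verify the exact signatures against the library’s own examples. The service URL, credentials, and topic are placeholders:

```python
import threading
import mqlight

SERVICE = 'amqp://user:password@host.example.com:5672'  # placeholder credentials
TOPIC = 'domains/to_resolve'                            # placeholder topic

started = threading.Event()


def on_started(err):
    # Called once the client is connected (signature assumed).
    if err is None:
        started.set()


def on_message(message_type, data, delivery):
    # Called for each arriving message (signature assumed).
    print('received:', data)


client = mqlight.Client(service=SERVICE, client_id='resolver', on_started=on_started)
started.wait(timeout=30)
client.subscribe(TOPIC, on_message=on_message)
# Keep the process alive while messages arrive, then call client.stop() on shutdown.
```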

That’s all there is to it. As a developer, the ease with which I can set up services to try is one of the best things about Bluemix, with MQ Light making a great addition to its portfolio of services.

Some real numbers

After I refactored my code to be able to use either the pika or python-mqlight libraries interchangeably, I ran a sample set of data through each library to see what impact each had on overall performance, and I was pleasantly surprised by the results.

Doing a full run-through of all domains would take about seven hours, so I ran this test with only 10,364 domains. Below are the running times for each section, in seconds.

Local RabbitMQ

This server was running on a 4-core, 49GB RAM VSI.

  • Parser: 0.054s
  • Resolver: 90.485s
  • Checker: 0.0027s

Bluemix MQ Light

  • Parser: 1.593s
  • Resolver: 86.756s
  • Checker: 6.766s

Since I am using the free, shared tier of MQ Light, I was honestly expecting much worse results. Seeing only a few seconds of added runtime was a really big win for MQ Light.

Overall, I was very pleased working with MQ Light, and I highly suggest it as a starting place for anyone wanting to check out queuing services. It was easy to set up, free to try out, and pretty simple once I started to understand the basics.

-Chris

March 4, 2016

Adventures with Bluemix

Keeping up with the rapid evolution of web programming is frighteningly difficult—especially when you have a day job. To ensure I don’t get left behind, I like to build a small project every year or so with a collection of the most buzzworthy technologies I can find. Nothing particularly impressive, of course, but just a collection of buttons that do things. This year I am trying to get a good grasp on “as a Service,” which seems to be everywhere these days. Hopefully this adventure will prove educational.

Why use services when I can do it myself?

The main idea behind “as a Service” is that somewhere out there in the cloud, someone has figured out how to do a particular task really well. This someone is willing to provide you access to that for a small service fee—thereby letting you, the developer, focus as much time as possible on your code and not so much time worrying about optimal configurations of things that you need to work efficiently.

SoftLayer is an Infrastructure as a Service (IaaS) provider, and it will be the home for my little application—due in large part to the fact that I already have a ton of experience running servers myself.

I’m a big fan of Python, so I’m going to start programming with the Pyramid framework as the base for my new application. Like the “as a Service” offerings, programming frameworks and libraries exist to help developers focus on their code and leverage the expertise of others for the auxiliary components.

To make everything pretty, I am going to use Bootstrap.js, which is apparently the de facto front-end library these days.

For everything else I want to use, there will be an attached Bluemix service. For the uninitiated, Bluemix is a pretty awesome collection of tools for developing and deploying code. At its core, Bluemix uses Cloud Foundry to provision cloud resources and deploy code. For now, I’m going to deploy my own code, but what I’m really interested in are the add-on services that I can just drop into my application and get going. The first service I want to try out is Cloudant NoSQL DB, which is a managed CouchDB instance with a few added features, like a pretty neat dashboard.


Combining Bluemix services with SoftLayer servers

One of the great things about services in Bluemix is that they can be provisioned in a standalone deployment—meaning Bluemix services can be used by any computer with an Internet connection, and therefore by my SoftLayer servers. Since Bluemix services are deployed on SoftLayer hardware (in general, but there are some exceptions), the latency between SoftLayer servers and Bluemix services should be minimal, which is nice.

Creating a Cloudant service in Bluemix is as easy as hitting the Create button in the console. Creating a simple web application in Pyramid took a bit longer, but the quick tutorial helped me learn about all the cool things the Pyramid project can do. I also got to skip all the mess with SQLAlchemy, since I’m storing all the data in Cloudant. All that’s required is a sane ID system (I am using uuid) and some JSON. There’s no need to get bogged down with a rigid table structure, since Cloudant is a document store. If I want to change the data format, I just need to upload a new copy of the data, and a new revision of that document will be created automatically.
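To make that concrete, here is a minimal sketch of the “sane ID plus some JSON” approach against Cloudant’s CouchDB-compatible HTTP API. The account name, database name, and credentials are placeholders for the service credentials Bluemix generates:

```python
import json
import uuid
import requests

ACCOUNT = 'my-account'           # placeholder Cloudant account
DB = 'posts'                     # placeholder database name
AUTH = ('username', 'password')  # placeholder credentials from Bluemix

BASE = 'https://{0}.cloudant.com/{1}'.format(ACCOUNT, DB)

doc_id = str(uuid.uuid4())
doc = {'title': 'Hello Bluemix', 'body': 'Some content', 'tags': ['bluemix', 'cloudant']}

# Create the document. Cloudant returns the revision (_rev); send that _rev back
# on later updates and a new revision of the document is created automatically.
resp = requests.put(
    '{0}/{1}'.format(BASE, doc_id),
    auth=AUTH,
    data=json.dumps(doc),
    headers={'Content-Type': 'application/json'},
)
print(resp.json())  # e.g. {'ok': True, 'id': '...', 'rev': '1-...'}
```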

After cobbling together a basic application that can publish and edit content, all I had to do to make everything look like it was designed intentionally was add a few Bootstrap classes to my templates. And then I had a ready-to-use website!

Conclusion

Although making a web application is still as intensive as it’s always been, at least using technology in an “as a Service” fashion helps cut down on all the tertiary technologies you need to become an expert on to get anything to work. Even though the application I created here was pretty simple, I hope to expand it to include some of the more interesting Bluemix services to see what kind of Frankenstein application I can manage to produce. There are currently 100 Bluemix services, so I think the hardest part is going to be figuring out which one to use next.

-Chris

February 5, 2016

Enable SSD caching on Bare Metal Server for 10X IOPS Improvements

Have you ever wondered how you could leverage the benefits of an SSD at the cost of cheap SATA hard drives?

SSDs provide extremely high IOPS for reads and writes and are really tempting for creating IOPS-centric volumes. However, because SSD prices are significantly higher than SATA drives, IT managers are at a crossroads and must decide whether to go with SSDs and burn a fortune on them or stay with SATA drives.

But there is a way to use SATA drives and experience SSD performance using some intelligent caching techniques. If you have the right PCI RAID card installed on your bare metal server, you can take advantage of its SSD caching features.

When configuring a bare metal server, make sure it has sufficient drive bays (at least eight) and choose an LSI (AVAGO) MegaRAID card as the RAID card. You can select the appropriate RAID configuration for the OS and other workload data during the order process itself so that the RAID arrays come preconfigured. As an additional resource for a high-speed cache device, consider ordering at least two SSDs; you can also add them to your server after deployment. These are the SSD caching drives that will be used to improve the overall performance of the cheap SATA drives from which the volumes are carved.

Install MSM for Easy Management of the RAID Card

Once the server is deployed, consider installing AVAGO MegaRAID Storage Manager (MSM) for the OS that is installed on the server. (You can also manage the RAID controller remotely from a local machine by providing the IP address of the server where the controller is installed.)

You can download MegaRAID Storage Manager directly from the AVAGO website for the card installed in your machine. For the popular MegaRAID SAS 9361-8i card, download MSM from the AVAGO website.

How to Create CacheCade (SSD Caching) Volumes and Attach Them to Existing Volumes

Follow these three steps to improve the IOPS on the existing Volumes on the bare metal server.

Step 1: Creating CacheCade Volumes

Once the SSDs are deployed on the bare metal server and the regular volumes are created, users can create a CacheCade volume to perform SSD caching. This is easily done by right-clicking the AVAGO controller and selecting the Create CacheCade – SSD Caching option.


Step 2: Choosing the right RAID Level and Write Policy for CacheCade Volumes

It is recommended to use a RAID 1 CacheCade volume, which eliminates a single point of failure at the SSD device level. To do this, select the available SSDs on the system and choose RAID 1 as the RAID level, then click Add to add all available disks and Create Drive Group. Also, be sure to select Write Back as the Write Policy for increased I/O performance for both reads and writes to the volume being cached.


Step 3: Enabling SSD Caching For Volumes

If the virtual drives were created without SSD caching enabled, this is the right time to enable it, as shown below—you can selectively enable or disable SSD caching for the set of virtual drives that need it.

Right click on the volume and select Enable SSD Caching.


Performance Comparison

We tried a simple comparison on a 3.6TB RAID 50 volume (three drives with two spans) with and without SSD caching, using the IOmeter tool. The workload was a 50/50 read/write, 4KB, purely random I/O workload run for about an hour on the volumes.

Without SSD Caching – IOPS 970

With SSD Caching – IOPS 9000 (10X Improvement)

The results show a 10X improvement in IOPS, though the benefit is workload dependent. They also reflect how often the reads and writes repeatedly hit the same LBAs, which is what makes the cache effective.

This could certainly help database applications or other I/O-centric workloads that are hungry for IOPS get an instant boost in performance. Try this today at SoftLayer, and see the difference!

-Subramanian 

 

December 21, 2015

Introducing API release notes and examples library

The website to find out what new and exciting changes are happening on the SoftLayer platform is now softlayer.github.io. Specifically, this website highlights any changes to the customer portal, the API, and any supporting systems. Please continue to rely on tickets created on your account for information regarding any upcoming maintenance windows and other service-impacting events.

At SoftLayer, we follow agile development principles and release code in small but frequent iterations—usually about two every week. The changes featured in release notes on softlayer.github.io only cover what is publicly accessible. So while they may seem small, there are usually a greater number of behind-the-scenes changes happening.

Along with the release notes is a growing collection of useful example scripts showing how to actually use the API in a variety of popular languages. While the number of examples is currently small, we are constantly adding more as they come up, so keep checking back. We are generally inspired to add examples by the questions posted on Stack Overflow with the SoftLayer tag, so keep posting your questions there, too.
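To give a taste of what those examples look like, here is a minimal sketch (not one of the library’s actual scripts) that calls the API with the official SoftLayer Python client. The username and API key are placeholders for your own credentials:

```python
# pip install softlayer
import SoftLayer

# Placeholder credentials; you can also set SL_USERNAME and SL_API_KEY in the environment.
client = SoftLayer.create_client_from_env(username='set-me', api_key='set-me')

# Fetch basic account details via SoftLayer_Account::getObject.
account = client.call('Account', 'getObject')
print(account['companyName'])
```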

-Chris

November 19, 2015

SoftLayer and Koding join forces to power a Global Virtual Hackathon


This guest blog post was written by Cole Fox, director of partnerships at Koding.

Koding is excited to partner with SoftLayer on its upcoming Global Virtual Hackathon, happening December 12–13, 2015. The event builds on last year’s Hackathon, where more than 60,000 developers participated from all over the world. The winners took home over $35,000 in prizes! This year, we’ve upped the ante to make the event even larger than the last time: the winner will take home a $100,000 grand prize.

“We are working with Koding for this virtual hackathon as part of our commitment to promote open source technology and support the talented community of developers who are dispersed all over the globe,” said Sandy Carter, general manager of Cloud Ecosystem and Developers at IBM. “Cloud-based open source development platforms like Koding make it easier to get software projects started, and hackathons are a great place to show how these kinds of platforms make software development easier and more fun.”


Why a virtual hackathon?
Hackathons are awesome. They allow developers to solve problems in a very short amount of time. The challenge with traditional hackathons is that they require you to be physically present in a room. With more and more of our lives moving online, why be tied to a physical location to solve problems? Virtual hackathons allow talented individuals from all over the world to participate, collaborate, and showcase their skills, regardless of their physical location. Our Global Virtual Hackathon levels the playing field.

Who won last year?
Educational games, especially those that teach programming, were popular to build—and a few actually won! Want to see what the winners built? Click here to check out a fun yet effective game teaching students to program. Learn more about the team of developers and see their code here. Last year, nine winners across three categories took home a prize. To see a list of last year’s winners, see the blog post here.

Tips to be successful and win this year
Here’s some motivation for you: the grand prize is $100,000. (That’s seed capital for your startup idea!)

So how do you win? First and foremost, apply now! Then talk to some friends and maybe even team up. You can also use Koding to find teammates once you’re accepted. Teammates aren’t a requirement but can definitely make for a fun experience and improve your chances of making something amazing.

Once you’re in, get excited! And be sure to start thinking about what you want to build around this year’s themes.

And the 2015 themes are…
Ready to build something and take home $100,000? Here are this year’s themes:

  • Data Visualization
    Data is everywhere, but how can we make sense of it? Infographics and analytics can bring important information to light that wasn’t previously accessible when stuck in a spreadsheet or database. We challenge you to use some of the tools out there to help articulate some insights.
  • Enterprise Productivity
    The workplace can always be improved and companies are willing to pay a lot of money for great solutions. Build an application that helps employees do their jobs better and you could win big.
  • Educational Games
    Last year’s winning team, WunderBruders, created an educational game. But games aren’t just for children. Studies have shown that games not only improve motor skills, but they are also a great way to learn something new.

Wait a second. What is Koding anyway?
In short, Koding is a developer environment as a service. The Koding platform provides you with what you need to move your software development to the cloud. Koding’s cloud-based software development service provides businesses with the ability to formulate the most productive, collaborative, and efficient development workflows. Businesses, both small and large, face three common challenges: on-boarding new team members, workflow efficiency, and knowledge retention. These pain points impact companies across all industries, but for companies involved in software development, these are often the most expensive and critical problems that continue to remain unresolved. Koding was built to tackle these inefficiencies head on. Learn more about Koding for Teams.

Can I use my SoftLayer virtual servers with Koding?
Koding’s technical architecture is very flexible. If you have a SoftLayer virtual server, you can easily connect it to your Koding account. The feature is described in detail here.

Think you can hack it? APPLY NOW!

-Cole Fox

November 2, 2015

The multitenant problem solver is here: VMware NSX 6 on SoftLayer

We’re very excited to tell you about what’s coming down the pike here at SoftLayer: VMware NSX 6! This is something that I’ve personally been anticipating for a while now, because it solves so many of the issues we confront on a multitenant platform. Here’s a diagram to explain exactly how it works:

As you can see, it uses the SoftLayer network as the underlay network and fabric, with NSX as the overlay network to create the SDN (software-defined network).

What is it?
VMware NSX is a virtual networking and security software product built from VMware's vCloud Networking and Security (vCNS) and Nicira's Network Virtualization Platform (NVP). NSX software-defined networking is part of VMware's software-defined data center concept, which offers cloud computing on VMware virtualization technologies. VMware's stated goal with NSX is to provision virtual networking environments without command line interfaces or other direct administrator intervention. Network virtualization abstracts network operations from the underlying hardware onto a distributed virtualization layer, much like server virtualization does for processing power and operating systems.

VMware vCNS (formerly called vShield) virtualizes L4–L7 of the network. Nicira's NVP virtualizes the network fabric, L2 and L3. VMware says that NSX will expose logical firewalls, switches, routers, ports, and other networking elements to allow virtual networking among vendor-agnostic hypervisors, cloud management systems, and associated network hardware. It also will support external networking and security ecosystem services.

How does it work?
NSX network virtualization is an architecture that enables the full potential of a software-defined data center (SDDC), making it possible to create and run entire networks in parallel on top of existing network hardware. This results in faster deployment of workloads and greater agility in creating dynamic data centers.

This means you can create a flexible pool of network capacity that can be allocated, utilized, and repurposed on demand. You can decouple the network from underlying hardware and apply virtualization principles to network infrastructure. You’re able to deploy networks in software that are fully isolated from each other, as well as from other changes in the data center. NSX reproduces the entire networking environment in software, including L2, L3 and L4–L7 network services within each virtual network. NSX offers a distributed logical architecture for L2–L7 services, provisioning them programmatically when virtual machines are deployed and moving them with the virtual machines. With NSX, you already have the physical network resources you need for a next-generation data center.

What are some major features?
NSX brings an SDDC approach to network security. Its network virtualization capabilities enable the three key functions of micro-segmentation: isolation (no communication across unrelated networks), segmentation (controlled communication within a network), and security with advanced services (tight integration with leading third-party security solutions).

The key benefits of micro-segmentation include:

  1. Network security inside the data center: Fine-grained policies enable firewall controls and advanced security down to the level of the virtual NIC.
  2. Automated security for speed and agility in the data center: Security policies are automatically applied when a virtual machine spins up, moved when a virtual machine is migrated, and removed when a virtual machine is deprovisioned—eliminating the problem of stale firewall rules.
  3. Integration with the industry’s leading security products: NSX provides a platform for technology partners to bring their solutions to the SDDC. With NSX security tags, these solutions can adapt to constantly changing conditions in the data center for enhanced security.

As you can see, there are lots of great features and benefits for our customers.

You can find more great resources about NSX on SoftLayer here. Make sure to keep your eyes peeled for more great NSX news!

-Cheeku

October 20, 2015

What’s in a hypervisor? More than you think

Virtualization has always been a key tenet of enabling cloud-computing services. From the get-go, SoftLayer has offered a variety of options, including Citrix XenServer, Microsoft Hyper-V, and Parallels Cloud Server, just to name a few. It’s all about enabling choice.

But what about VMware—the company that practically pioneered virtualization, making it commonplace?

Well, we have some news to share. SoftLayer has always supported VMware ESX and ESXi—your basic, run-of-the-mill hypervisor—but now we’re enabling enterprise customers to run VMware vSphere on our bare metal servers.

This collaboration is significant for SoftLayer and IBM because it gives our customers tremendous flexibility and transparency when moving workloads into the public cloud. Enterprises already familiar with VMware can easily extend their existing on-premises VMware infrastructure into the IBM Cloud with simplified, monthly pricing. This makes transitioning into a hybrid model easier because it results in greater workload mobility and application continuity.

But the real magic happens when you couple our bare metal performance with VMware vSphere. Users can complete live workload migrations between data centers across continents. Users can easily move and implement enterprise applications and disaster recovery solutions across our global network of cloud data centers—with just a few clicks of a mouse. Take a look at this demo and judge for yourself.

What’s in a hypervisor? For some, it’s an on-ramp to the cloud and a way to make hybrid computing a reality. When you pair the flexibility of VMware with our bare metal servers, users get a combination that’s hard to beat.

We’re innovating to help companies make the transition to hybrid cloud, one hypervisor at a time. For more details, visit http://www.softlayer.com/virtualization-options.

-Jack Beech, VP of Business Development

September 2, 2015

Backup and Restore in a Cloud and DevOps World

Virtualization has brought many improvements to the compute infrastructure, including snapshots and live migration.[1] When an infrastructure moves to the cloud, these options often become a client’s primary backup strategy. While snapshots and live migration are also part of a successful strategy, backing up on the cloud may need additional tools.

First, a basic question: Why do we take backups? They’re taken to recover from

  • The loss of an entire machine
  • Partially corrupted files
  • A complete data loss (either through hardware or human error)

While losing an entire machine is frightening, corrupted files or data loss are the more common reasons for data backups.

Snapshots are useful when the snapshot and restore occur in close proximity to each other, e.g., when you’re migrating middleware or an operating system and want to fall back quickly if something goes wrong. If you need to restore after extensive changes (hardware or data), a snapshot isn’t an adequate resource. The restore may require restoring to a new machine, selecting files to be restored, and moving data back to the original machine.

So if a snapshot isn’t the silver bullet for backing up in the cloud, what are the effective backup alternatives? The solution needs to handle a full system loss, partial data loss, or corruption, and ideally work for both virtualized and non-virtualized environments.

What to back up

There are three types of files that you’ll want to consider when backing up an active machine’s disks:

  • Binary files: Changed by operating system and middleware updates; can be easily stored and recovered.
  • Configuration files: Define how the binary files are connected and configured, and what data is accessible to them.
  • Data files: Generated by users and unrecoverable if not backed up. Data files are the most precious part of the disk content and losing them may result in a financial impact on the client’s business.

Keep in mind when determining your backup strategy that each file type has a different change rate—data files change faster than configuration files, which are more fluid than binary files. So, what are your options for backing up and restoring each type of file?

Binary files
In the case of a system failure, DevOps advocates (see Phoenix Servers from Martin Fowler) propose getting a new machine, which all cloud providers can automatically provision, including middleware. Automated provisioning processes are available for both bare metal and virtual machines.

Note that most Open Source products only require an Internet connection and a single command line for installation, while commercial products can be provisioned through automation.

Configuration files
Cloud-centric operations have a distinct advantage over traditional operations when it comes to backing up configuration files. With traditional operations, each element is configured manually, which has several drawbacks such as being time-consuming and error-prone. Cloud-centric operations, or DevOps, treat each configuration as code, which allows an environment to be built from a source configuration via automated tools and procedures. Tools such as Chef, Puppet, Ansible, and SaltStack show their power with central configuration repositories that are used to drive the composition of an environment. A central repository works well with another component of automated provisioning—changing the IP address and hostname.

You have limited control of how the cloud will allocate resources, so you need an automated method to collect the information and apply it to all the machines being provisioned.

In a cloud context, it’s suboptimal to manage machines individually; instead, the machines have to be seen as part of a cluster of servers, managed via automation. Cluster automation is one of the core tenets of solutions like CoreOS’ Fleet and Apache Mesos. Resources are allocated and managed as a single entity via API, configuration repositories, and automation.

You can attain automation in small steps. Start by choosing an automation tool and begin converting your existing environment one file at a time. Soon, your entire configuration is centrally available and recovering a machine or deploying a full environment is possible with a single automated process.

In addition to being able to quickly provision new machines with your binary and configuration files, you are also able to create parallel environments, such as disaster recovery, test and development, and quality assurance. Using the same provisioning process for all of your environments assures consistent environments and early detection of potential production problems. Packages, binaries, and configuration files can be treated as data and stored in something similar to object stores, which are available in some form with all cloud solutions.

Data files
The final files to be backed up and restored are the data files. These files are the most important part of a backup and restore and the hardest ones to replace. Part of the challenge is the volume of data as well as access to it. Data files are relatively easy to back up; the exception is files that are in transition, e.g., files being uploaded. Data file backups can be done with several tools, including synchronization tools or a full file backup solution. Another option is an object store, which is the natural repository for relatively static files and allows for a pay-as-you-go model.
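As a rough illustration, here is a minimal sketch of pushing data files to an object store. It assumes an S3-compatible endpoint and the boto3 client; the endpoint, bucket, credentials, and directory are placeholders:

```python
import os
import boto3

# Placeholder endpoint and credentials for an S3-compatible object store.
s3 = boto3.client(
    's3',
    endpoint_url='https://objectstore.example.com',
    aws_access_key_id='set-me',
    aws_secret_access_key='set-me',
)

BUCKET = 'nightly-data-backups'  # placeholder bucket name


def backup_directory(path):
    """Upload every file under `path`, keyed by its path relative to `path`."""
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            key = os.path.relpath(full, path)
            s3.upload_file(full, BUCKET, key)


backup_directory('/var/data')  # placeholder directory
```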

Database content is a bit harder to back up. Even with instant snapshots on storage, backing up databases can be challenging. A snapshot at the storage level is an option, but it doesn’t allow for a partial database restore. Also, a snapshot can capture in-flight transactions that can cause issues during a restore, which is why most database systems provide a mechanism for online backups. The online backups should be leveraged in combination with tools for file backups.
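For example, here is a minimal sketch of the kind of online backup most database systems offer, using MySQL's mysqldump with --single-transaction for a consistent dump of InnoDB tables. The host, credentials, database name, and paths are placeholders:

```python
import datetime
import subprocess

STAMP = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
DUMP_FILE = '/backup/mydb-{0}.sql'.format(STAMP)  # placeholder backup path

with open(DUMP_FILE, 'w') as out:
    # --single-transaction takes a consistent snapshot without locking InnoDB tables.
    subprocess.check_call(
        ['mysqldump', '--single-transaction',
         '--host', 'db.example.com',       # placeholder host
         '--user', 'backup_user',          # placeholder user
         '--password=set-me',              # placeholder password
         'mydb'],                          # placeholder database
        stdout=out,
    )

print('wrote ' + DUMP_FILE)
```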

Something to remember about databases: many solutions end up accumulating data even after the data is no longer used by users. The data within an active database includes data currently being used and historical data. Having current and historical data allows for data analytics on the same database, but it also increases the size of the database, making database-related operations harder. It may make sense to archive older data in either other databases or flat files, which keeps the database volumes manageable.

Summary

To recap, because cloud provides rapid deployment of your operating system and convenient places to store data (such as object stores), it’s easy to factor cloud into your backup and recovery strategy. By leveraging the containerization approach, you should split the content of your machines—binary, configuration, and data. Focus on automating the deployment of binaries and configuration; it allows easier delivery of an environment, including quality assurance, test, and disaster recovery. Finally, use traditional backup tools for backing up data files. These tools make it possible to rapidly and repeatedly recover complete environments while controlling the amount of backed up data that has to be managed.

-Thomas

[1] Snapshots are not available on bare metal servers that have no virtualization capability.

July 14, 2015

Preventative Maintenance and Backups

Has your cPanel server ever gone down only to not come back online because the disk failed?

At SoftLayer, data migration is in the hands of our customers. That means you must save your data and move it to a new server yourself. Well, thanks to a lot of slow weekends, I’ve had time to write a bash script that automates the process for you. It’s been tested in a dev environment of my own, working with the data center to simulate the dreaded DRS (data retention service) that happens when a drive fails, and in a live environment to see what new curveballs could come up. In this three-part series, we’ll discuss how to do server preventative maintenance to prevent a total disaster, how to restore your backed-up data (if you have backups), and finally we’ll go over the script itself, which fully automates the process of backing up, moving, and restoring all of your cPanel data safely (if the prior two aren’t options for you).

Let’s start off with some preventative maintenance first and work on setting up backups in WHM itself.

First thing you’ll need to do is log into your WHM, and then go to Home >> Backup >> Backup Configuration. You will probably have an information box at the top that says “The legacy backups system is currently disabled;” that’s fine, let it stay disabled. The legacy backup system is going away soon anyway, and the newer system allows for more customization. If you haven’t clicked “Enable” under the Global Settings, now would be the time to do so, so that the rest of the page becomes visible. Now, you should be able to modify the rest of the backup configuration, so let’s start with the type.

In my personal opinion, compressed is the only way to go. Yes, it takes longer, but uses less disk space in the end. Uncompressed uses up too much space, but it’s faster. Incremental is also not a good choice, as it only allows for one backup and it does not allow for users to include additional destinations.

The next section is scheduling and retention; personally, I like my backups done daily with a five-day retention plan. Yes, it does use a bit more space, but it’s also the safest because you’ll have backups from literally the day prior in case something happens.

The next section, Files, is where you pick the users you want to back up along with what type of data you want to include. I prefer to leave the default settings in this section and only choose the users I want to back up. It’s your server though, so you’re free to enable or disable the various options as you see fit. I would definitely leave the option for backing up system files checked, though, as that is highly recommended.

The next section deals with databases, and again, this one’s up to you. Per Account is your bare minimum option and is still safe regardless. Entire MySQL directory will just blanket backup the entire MySQL directory instead. The last option encompasses the two prior options, which to me is a bit overkill as the Per Account Only option works well enough on its own.

Now let’s start the actual configuration of the backup service. From here, we’ll choose the backup directory as well as a few other options regarding the retention and additional destinations. The best practice here is to have a drive specifically for backups, and not just another partition or a folder, but a completely separate drive. Wherever you want the backups to reside, type that path in the box. I usually have a secondary drive mounted as /backup to put them in so the pre-filled option works fine for me. The option for mounting the drive as needed should be enabled if you have a separate mount point that is not always mounted. As for the additional destination part, that’s up to you if you want to make backups of your backups. This will allow you to keep backups of the backups offsite somewhere else just in case your server decides to divide by zero or some other random issue that causes everything to go down without being recoverable. Clicking the “Create New Destination” option will bring up a new section to fill in all the data relevant to what you chose.

Once you’ve done all of this, simply click “Save Configuration.” Now you’re done!

But let’s say you’re ready to make a full backup right now instead of waiting for it to run automatically. For this, we’ll need to log in to the server via SSH and run a command instead. Using whatever SSH tool you prefer (PuTTY for me), connect to your server using the root username and password that you used to log in to WHM. From there, we will run one simple command to back up everything: “/usr/local/cpanel/bin/backup --force”. This will force a full backup of every user that you selected earlier when you configured the backup in WHM.
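If you would rather trigger that forced backup from a script on your workstation than from an interactive SSH session, here is a minimal sketch using paramiko. The hostname and credentials are placeholders, and the command is the same one shown above:

```python
import paramiko

# Placeholder host and credentials; use the same root login you use for WHM.
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('server.example.com', username='root', password='set-me')

# Run the forced full backup and print whatever cPanel reports back.
stdin, stdout, stderr = ssh.exec_command('/usr/local/cpanel/bin/backup --force')
print(stdout.read().decode())
print(stderr.read().decode())
ssh.close()
```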

That’s pretty much it as far as preventative maintenance and backups go. Next time, we’ll go into how to restore all this content to a new drive in case something happens like someone accidentally deleting a database or a file that they really need back.

-Shawn

April 27, 2015

Good Documentation: A How-to Guide

As part of my job in Development Support, I write internal technical documentation for employee use only. My department is also the last line of support before a developer is called in for customer support issues, so we manage a lot of the troubleshooting documentation. Some of the documentation I write and use is designed for internal use for my position, but some of it is troubleshooting documents for other job positions within the company. I have a few guidelines that I use to improve the quality of my documentation. These are by no means definitive, but they’re some helpful tips that I’ve picked up over the years.

Readability

I’m sure everyone has met the frustration of reading a long-winded sentence that should have been three separate sentences. Keeping your sentences as short as possible helps ensure that your advice won’t go in one ear and out the other. If you can write things in a simpler way, you should do so. The goal of your documentation is to make your readers smarter.

Avoid phrasing things in a confusing way. A good example of this is how you employ parentheses. Sometimes it is necessary to use them to convey helpful tidbits to your readers. If you write something with parentheses in it and you can’t read it out loud without it sounding confusing, try to reword it or run it by someone else.

Good: It should have "limited connectivity" (the computer icon with the exclamation point) or "active" status (the green checkmark) and NOT "retired" (the red X).
Bad: It should have the icon “limited connectivity” (basically the computer icon with the exclamation point that appears in the list) (you can see the “limited connectivity” text if you hover over it) or “active” (the green checkmark) status and NOT the red “retired” X icon.

Ideally, you should use the same formatting for all of your documentation. At the very least, you should make your formatting consistent within your document. All of our transaction troubleshooting documentation at SoftLayer uses a standardized error formatting that is consistent and easy to read. Sometimes it might be necessary to break convention if it improves readability, but weigh the cost first. For example, collapsible menus can tidy up a long page, but they make it hard to search the entire page using Ctrl+F, which very often makes things more difficult.

And finally, if people continually have a slew of questions, it’s probably time to revise your documentation and make it clearer. If it’s too complex, break it down into simpler terms. Add more examples to help clarify things so that it makes sense to your end reader.

Simplicity

Use bullet points or numbered lists when listing things instead of a paragraph block. I mention this because good formatting saves man-hours. There’s a difference between one person having to search a document for five minutes, versus 100 people having to search a document for five minutes each. That’s over eight man-hours lost. Bullet points are much faster to skim through when you are looking for something specific in the middle of a page somewhere. Avoid the “TL;DR” effect and don’t send your readers a wall of text.

Avoid superfluous information. If you have extra information beyond what is necessary, it can have an adverse effect on your readers. Your document may be the first your readers have read on your topic, so don’t overload them with too much information.

Don’t create duplicate information. If your documentation source is electronic, keep your documentation from repeating information, and just link to it in a central location. If you have the same information in five different places, you’ll have to update it in five different places if something changes.

Break up longer documents into smaller, logical sections. Organize your information first. Figure out headings and main points. If your page seems too long, try to break it down into smaller sections. For example, you might want to separate a troubleshooting section from the product information section. If your troubleshooting section grows too large, consider moving it to its own page.

Thoroughness

Don’t make assumptions about what the users already know. If it wasn’t covered in your basic training when you were hired, consider adding it to the documentation. This is especially important when you are documenting things for your own job position. Don’t leave out important details just because you can remember them offhand. You’re doing yourself a favor as well. Six months from now, you may need to use your documentation and you may not remember those details.

Bad: SSH to the image server and delete the offending RGX folder.
Good: SSH to the image server (imageserver.mycompany.local), and run ls -al /dev/rgx_files/ | grep blah to find the offending RGX folder and then use rm -rf /dev/rgx_files/<folder> to delete it.

Make sure your documentation covers as much ground as possible. Cover every error and every possible scenario that you can think of. Collaborate with other people to identify any areas you may have missed.

Account for errors. Error messages often give very helpful information. The error might be as straightforward as “Error: You have entered an unsupported character: ‘$.’” Make sure to document the cause and fix for it in detail. If there are unsupported characters, it might be a good idea to provide a list of unsupported characters.

If something is confusing, provide a good example. It’s usually pretty easy to identify the pain points—the things you struggle with are probably going to be difficult for your readers as well. Sometimes things can be explained better in an example than they can in a lengthy paragraph. If you were documenting a command, it might be worthwhile to provide a good example first and then break it down and explain it in detail. Images can also be very helpful in getting your point across. In documenting user interfaces, an image can be a much better choice than words. Draw red boxes or arrows to guide the reader on the procedure.

-Mark
