Author Archive: Subramanian Parameswaran

March 25, 2016

Be an Expert: Handle Drive Failures with Ease

Bare metal servers at SoftLayer use best-in-class, industry-proven SAS, SATA, and SSD disks, which data center technicians extensively test and qualify in-house. They are reliable, enterprise-grade hardware. Even so, the failure of a single device cannot be ruled out. Drives can fail for many reasons, including power surges, mechanical or internal faults, drive firmware bugs, overheating, and aging. Although every effort is made to mitigate these issues by selecting best-in-class drives and pre-testing devices before making them available to customers, you can still run into an occasional drive failure.

Is RAID protection alone good enough?

Drive failures on dedicated bare metal servers may cause data loss, downtime, and service interruptions if the servers are not deployed with an adequate risk mitigation plan. As a first line of defense, users choose RAID at various levels. This may seem sufficient, but it can still leave the following problems:

  • The volume associated with the failed drive becomes degraded, which can bring virtual drive (VD) performance below an acceptable level. A degraded volume is also likely to have write-back caching disabled, which degrades write performance further.
  • There is always a chance of another disk failing in the meantime. Until a new disk is inserted and the rebuild completes, a second disk failure could be catastrophic.

Today, a manual response to a disk failure can leave a long gap between the moment the user is notified, or otherwise becomes aware, that a disk has failed and the moment a technician replaces the disk in the server. Throughout that time the system runs in a degraded state, with the threat of a second disk failure looming over the user.
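To put a rough number on that exposure, the sketch below estimates the chance that at least one surviving drive fails before the rebuild completes. It assumes independent drives with a constant failure rate derived from an annualized failure rate (AFR). The 2% AFR, the 8-drive array, and the two window lengths are illustrative assumptions, not SoftLayer figures.

```python
import math

HOURS_PER_YEAR = 8760


def second_failure_probability(surviving_drives, afr, window_hours):
    """Chance that at least one surviving drive fails within the window.

    Assumes independent drives with a constant (exponential) failure rate,
    which is a simplification: drives from one batch often fail together.
    """
    # Convert the annualized failure rate into a per-hour hazard rate.
    hourly_rate = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-surviving_drives * hourly_rate * window_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 8-drive array (7 survivors), 2% AFR.
    # 48 hours: waiting for notification plus a manual disk swap and rebuild.
    manual = second_failure_probability(7, 0.02, window_hours=48)
    # 6 hours: a hot spare lets the rebuild start immediately.
    hot_spare = second_failure_probability(7, 0.02, window_hours=6)
    print(f"manual replacement: {manual:.4%}")
    print(f"with hot spare:     {hot_spare:.4%}")
```

Under this model the risk grows almost linearly with the length of the degraded window, which is why shortening that window matters more than any other single factor.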

To mitigate this risk, SoftLayer recommends that users always have a Global Hot Spare or Dedicated Hot Spare disk wherever available on bare metal servers. Users can choose one or more hot spare disks per server. This typically requires earmarking a drive slot for each hot spare, so when ordering a bare metal server it is worth planning for empty drive slots to hold global hot spare drives.

Adding a Hot Spare on an LSI MegaRAID Adaptor

Users can use the WebBIOS utility or MegaRAID Storage Manager to add a Hot Spare drive.

It is easiest to configure using the MegaRAID Storage Manager (MSM) software, available on the AVAGO website.

Once logged in, choose the Logical tab to view the unused disks under “Unconfigured Drives.” Right-clicking a drive and selecting “Assign Global Hot Spare” puts that drive on standby for a drive failure in any of the RAID volumes configured in the system. You can also assign a Dedicated Hot Spare to specific volumes that are critical. Figure 1 shows how to add a Global Hot Spare using MSM. MegaRAID Storage Manager can also be used to access the server from a third-party machine or service laptop by providing the server IP address.

Figure 1 shows how to add a Global Hot Spare using MSM.

You can also use the WebBIOS interface to add Hot Spare drives. To do this, break into the card BIOS early in the boot process by pressing Ctrl+R to access the BIOS Configuration Utility. To see the boot-time messages you need access to the KVM screen, so first VPN into the SoftLayer network and then use KVM under the “Actions” dropdown in the customer portal.

Once inside the WebBIOS screen, open the “PD Mgmt” tab and choose a free drive. Pressing F2 on the highlighted drive displays a menu for making the drive a Global Hot Spare. Figure 2 below provides more details for making a Hot Spare using the BIOS interface. We recommend using the virtual keyboard while navigating and issuing commands in the KVM viewer.

Figure 2 provides more details for making a Hot Spare using the BIOS interface.
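For servers managed without a GUI, the same assignment can usually be scripted. The sketch below only builds a command line for AVAGO's StorCLI utility, which this post does not cover. The `storcli` binary name and the controller, enclosure, and slot numbers are assumptions to check against your own system before running anything.

```python
def hot_spare_command(controller, enclosure, slot, drive_groups=None):
    """Build a StorCLI argument list that marks one drive as a hot spare.

    With no drive_groups the spare is global; with a list of drive group
    numbers it is dedicated to those groups only.
    """
    cmd = ["storcli", f"/c{controller}/e{enclosure}/s{slot}", "add", "hotsparedrive"]
    if drive_groups:
        cmd.append("dgs=" + ",".join(str(group) for group in drive_groups))
    return cmd


if __name__ == "__main__":
    # Hypothetical IDs: controller 0, enclosure 252, slot 4.
    print(" ".join(hot_spare_command(0, 252, 4)))                    # global
    print(" ".join(hot_spare_command(0, 252, 4, drive_groups=[0])))  # dedicated
```

Printing the command first makes it easy to review. Once the IDs are confirmed, the same list can be passed to `subprocess.run`.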

Adding a Hot Spare Through an Adaptec Adaptor

Adaptec also provides the Adaptec Storage Manager and a BIOS option to add Global Hot Spares.

The Adaptec Storage Manager comes preinstalled on SoftLayer servers when the chosen OS is supported. It can also be downloaded for the specific Adaptec card from this link. After launching the Adaptec Storage Manager, users can select a specific available free drive and create a global hot spare drive as shown in Figure 3.

After launching the Adaptec Storage Manager, users can select a specific available free drive and create a global hot spare drive as shown in Figure 3.

Adaptec also provides a BIOS-based configuration utility that can be used to add a Hot Spare. To do this, break into the BIOS utility by pressing Ctrl+A early in the boot process. Then select Global Hot Spares from the main menu to enter the drive selection page. Select a drive by pressing Insert, then press Enter to submit the changes. Figure 4 below depicts the selection of a Global Hot Spare using the BIOS configuration utility.

Figure 4 depicts the selection of a Global Hot Spare using the BIOS configuration utility.
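A scripted equivalent is also possible on Adaptec controllers. The sketch below builds a command line for Adaptec's ARCCONF utility, which is not mentioned in this post. The `arcconf` binary name and the controller, channel, and device numbers are assumptions, so confirm them against your own controller before running the command.

```python
def adaptec_hot_spare_command(controller, channel, device, logical_drives=None):
    """Build an ARCCONF argument list that sets one drive to hot spare (hsp).

    With no logical_drives the spare is global; otherwise it is assigned
    to the listed logical drive numbers.
    """
    cmd = ["arcconf", "setstate", str(controller),
           "device", str(channel), str(device), "hsp"]
    if logical_drives:
        cmd.append("logicaldrive")
        cmd.extend(str(ld) for ld in logical_drives)
    return cmd


if __name__ == "__main__":
    # Hypothetical IDs: controller 1, channel 0, device 5.
    print(" ".join(adaptec_hot_spare_command(1, 0, 5)))
    print(" ".join(adaptec_hot_spare_command(1, 0, 5, logical_drives=[0])))
```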

Using Hot Spares reduces the risk posed by further drive failures and shortens the time the system remains in a degraded state. We recommend that SoftLayer customers leverage these benefits on their bare metal servers to be better armed against drive failures.

-Subramanian

February 5, 2016

Enable SSD caching on Bare Metal Server for 10X IOPS Improvements

Have you ever wondered how you could leverage the benefits of an SSD at the cost of cheap SATA hard drives?

SSDs provide extremely high IOPS for reads and writes and are really tempting for creating volumes that are IOPS centric. However, because SSD prices are significantly higher than SATA drive prices, IT managers are at a crossroads and must decide whether to spend a fortune on SSDs or stay with SATA drives.

But there is a way to use SATA drives and experience SSD performance using some intelligent caching techniques. If you have the right PCI RAID card installed on your bare metal servers, you can leverage the benefits of certain SSD caching features.

When configuring a bare metal server, choose one with sufficient drive bays (at least eight) and select an LSI (AVAGO) MegaRAID card as the RAID card. You can select the appropriate RAID configuration for the OS and other workload data during the order process so that the server arrives with the RAID volumes preconfigured. As an additional resource for a high-speed cache device, consider ordering two or more SSDs; you can add these to your server even after deployment. These are the SSD caching drives used to improve the overall performance of the volumes carved out of the cheap SATA drives.

Install MSM for Easy Management of the RAID Card

Once the server is deployed, consider installing AVAGO MegaRAID Storage Manager (MSM) for the OS that has been installed on the server. (You can also manage the RAID controller remotely from a local machine by providing the IP of the server where the controller is installed.)

Users can download MegaRAID Storage Manager directly from the AVAGO website for the card installed in the machine. For the popular MegaRAID SAS 9361-8i card, download MSM from the AVAGO website here.

How to Create CacheCade - SSD Caching Volumes and Attach to the Volume Drives

Follow these three steps to improve the IOPS of the existing volumes on the bare metal server.

Step 1: Creating CacheCade Volumes

Once SSDs are deployed on the bare metal server and regular volumes are created, users can create CacheCade volumes to perform SSD caching. This is done by right-clicking the AVAGO controller and selecting the Create CacheCade - SSD Caching option.

Create Cachecade

Step 2: Choosing the right RAID Level and Write Policy for CacheCade Volumes

It is recommended to use a RAID 1 SSD CacheCade volume, which eliminates a single point of failure at the SSD device level. To set this up, select the available SSDs on the system and choose RAID 1 as the RAID level. Click Add to add all available disks, then click Create Drive Group. Also, be sure to select Write Back as the Write Policy to increase I/O performance for both reads and writes to the volume being cached.

RAID Level and Write Policy for CacheCade Volumes

Step 3: Enabling SSD Caching For Volumes

If the Virtual Drives were created without SSD caching enabled, this is the time to enable it, as shown below. You can selectively enable or disable caching for the set of Virtual Drives that need it.

Right click on the volume and select Enable SSD Caching.

Enable SSD Caching

Performance Comparison

We ran a simple comparison on a 3.6TB RAID 50 volume (2 spans of 3 drives each) with and without SSD caching, using the IOmeter tool (available here). The workload was a 50/50 read/write, 4 KB, purely random I/O workload run against the volume for about an hour.
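As a quick check on that layout, the sketch below computes RAID 50 usable capacity, where each RAID 5 span gives up one drive's worth of space to parity. The drive size is not stated in this post; the 0.9 TB value is inferred from the 3.6TB volume and should be read as an assumption.

```python
def raid50_usable_tb(drives_per_span, spans, drive_tb):
    """Usable capacity of a RAID 50 volume in TB.

    Each span is a RAID 5 set, so one drive per span holds parity.
    """
    if drives_per_span < 3:
        raise ValueError("a RAID 5 span needs at least 3 drives")
    if spans < 2:
        raise ValueError("RAID 50 needs at least 2 spans")
    data_drives = (drives_per_span - 1) * spans
    return data_drives * drive_tb


if __name__ == "__main__":
    # 2 spans of 3 drives leaves 4 data drives; 0.9 TB each is an inferred size.
    usable = raid50_usable_tb(drives_per_span=3, spans=2, drive_tb=0.9)
    print(f"usable capacity: {usable:.1f} TB from 6 drives")
```

With six drives, two of them go to parity, so this layout tolerates one drive failure per span.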

Without SSD Caching – IOPS 970


With SSD Caching – IOPS 9000 (10X Improvement)


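The result shows a roughly 10X IOPS improvement (970 to 9,000), though the benefit is workload dependent. In particular, it depends on how often reads and writes repeat on the same LBAs, because only repeated accesses can be served from the SSD cache.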
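One way to see why the gain depends on repeated LBAs is a simple blended service-time model, sketched below. It treats every request as either a cache hit or a disk access and ignores queue depth and write-back flushing, so it is an illustration rather than a description of how CacheCade works internally. The 970 and 9,000 IOPS figures come from the test above; the 50,000 IOPS figure for the SSD cache is a hypothetical value.

```python
def blended_iops(hit_ratio, cache_iops, disk_iops):
    """Average IOPS when a fraction of requests is served by the cache.

    Service times are averaged, so the slow disk misses dominate quickly.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    avg_service_time = hit_ratio / cache_iops + (1.0 - hit_ratio) / disk_iops
    return 1.0 / avg_service_time


def required_hit_ratio(target_iops, cache_iops, disk_iops):
    """Invert blended_iops: the hit ratio needed to reach target_iops."""
    return (1.0 / disk_iops - 1.0 / target_iops) / (1.0 / disk_iops - 1.0 / cache_iops)


if __name__ == "__main__":
    DISK_IOPS = 970       # measured without SSD caching (from this post)
    TARGET_IOPS = 9000    # measured with SSD caching (from this post)
    CACHE_IOPS = 50000    # hypothetical SSD cache capability

    needed = required_hit_ratio(TARGET_IOPS, CACHE_IOPS, DISK_IOPS)
    print(f"hit ratio needed for {TARGET_IOPS} IOPS: {needed:.1%}")
    for ratio in (0.0, 0.5, 0.9, 0.99):
        print(f"hit ratio {ratio:.0%}: {blended_iops(ratio, CACHE_IOPS, DISK_IOPS):,.0f} IOPS")
```

In this model, a workload that rarely revisits the same blocks stays close to the uncached figure, which matches the caution that results are workload dependent.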

This could certainly help database applications and other I/O-centric workloads that are hungry for IOPS get an instant boost in performance. Try this today at SoftLayer, and see the difference!

-Subramanian 

 
