March 25, 2016

Be an Expert: Handle Drive Failures with Ease

Bare metal servers at SoftLayer employ best-in-class, industry-proven SAS, SATA, or SSD disks, which are extensively tested and qualified in-house by the data center technicians. They are reliable, enterprise-grade hardware. However, the possibility of a single device failing cannot be ruled out. HDD or device failures can happen for various reasons, such as a power surge, mechanical or internal failure, drive firmware bugs, overheating, or aging. Although every effort is made to mitigate these issues by selecting best-in-class hard drives and pre-testing devices before making them available to customers, drive failures can still occur occasionally.

Is having RAID protection good enough?

Drive failures on dedicated bare metal servers may cause data loss, downtime, and service interruptions if the servers are not deployed with an adequate risk mitigation plan. As a first line of defense, users choose to have RAID at various levels. This may seem sufficient, but it can have the following problems:

  • The volume associated with the failed drive becomes degraded. This brings the virtual drive (VD) performance below an acceptable level. A degraded volume is also likely to disable write-back caching, which further degrades write performance.
  • There is always a chance of another disk failing in the meantime. Until a new disk is inserted and a rebuild is completed, a second disk failure could be catastrophic.

Today, a manual response to a disk failure can take quite some time: the user must first be notified or become aware that a disk has failed, and then a technician must be brought in to replace the disk in the server. During this time, the system remains in a degraded state and the risk of a second disk failure looms over the user.
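To make the exposure window concrete, here is a rough sketch of how the chance of a second failure grows with the time spent in a degraded state. It assumes drives fail independently at a constant rate derived from an annualized failure rate (AFR); the function name, the 2% AFR, and the hour figures are all hypothetical values chosen for illustration, not SoftLayer data.

```python
import math

HOURS_PER_YEAR = 8760


def second_failure_probability(remaining_drives, afr, exposure_hours):
    """Chance that at least one surviving drive fails during the exposure window.

    Assumes independent drives with a constant failure rate (exponential
    model). Real drives in the same array often share age and workload,
    so treat the result as a rough lower bound.
    """
    if not 0 <= afr < 1:
        raise ValueError("afr must be in the range [0, 1)")
    # Convert the annualized failure rate into a per-hour failure rate.
    hourly_rate = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-remaining_drives * hourly_rate * exposure_hours)


# Hypothetical scenario: 3 surviving drives, 2% AFR, 8-hour rebuild.
# Manual response: 48 hours to notice and replace the drive, then rebuild.
manual = second_failure_probability(3, 0.02, 48 + 8)
# Hot spare: the rebuild can start right away, so only the rebuild time counts.
hot_spare = second_failure_probability(3, 0.02, 8)

print(f"manual replacement: {manual:.6f}")
print(f"with hot spare:     {hot_spare:.6f}")
```

Under this simple model, the exposure scales almost linearly with the hours spent degraded, which is why shortening that window matters.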

To mitigate this risk, SoftLayer recommends that users always have a Global Hot Spare or Dedicated Hot Spare disk wherever available on the bare metal servers. Users can choose one or more hot spare disks per server. This typically requires the user to earmark a drive slot for each hot spare. When ordering bare metal servers, we recommend planning for empty drive slots that can hold global hot spare drives.
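The slot planning described above can be sketched as a quick budget check before placing an order. This is only an illustration: the function name, the bay counts, and the volume layout are hypothetical, and the minimum drive counts are the commonly cited ones for each RAID level, which may differ on a given controller.

```python
from typing import List, Tuple

# Commonly cited minimum drive counts per RAID level (assumption: your
# controller may allow or require different values).
MIN_DRIVES = {"RAID1": 2, "RAID5": 3, "RAID6": 4, "RAID10": 4}


def free_bays_after_order(total_bays: int,
                          volumes: List[Tuple[str, int]],
                          hot_spares: int) -> int:
    """Return the bays left over after data drives and hot spares are placed.

    volumes is a list of (raid_level, drive_count) pairs.
    Raises ValueError if a volume is too small or the chassis is too full.
    """
    used = 0
    for level, count in volumes:
        minimum = MIN_DRIVES[level]
        if count < minimum:
            raise ValueError(f"{level} needs at least {minimum} drives")
        used += count
    remaining = total_bays - used - hot_spares
    if remaining < 0:
        raise ValueError("not enough drive bays for the volumes and hot spares")
    return remaining


# Hypothetical 8-bay server: a 2-drive RAID1, a 4-drive RAID5, and 1 hot spare.
print(free_bays_after_order(8, [("RAID1", 2), ("RAID5", 4)], 1))
```

If the check raises an error, either a larger chassis or a smaller volume layout is needed to keep a slot free for the spare.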

Adding a Hot Spare on an LSI MegaRAID Adaptor

Users can use the WebBIOS utility or MegaRAID Storage Manager to add a Hot Spare drive.

It is easiest to configure using the MegaRAID Storage Manager (MSM) software, available on the AVAGO website.

Once logged in, you’ll want to choose the Logical tab to view the unused disks under “Unconfigured Drives.” Right-clicking a drive and selecting “Assign Global Hot Spare” makes that drive a standby for a drive failure in any of the RAID volumes configured in the system. You can also choose to have a Dedicated Hot Spare for specific volumes that are critical. Figure 1 shows how to add a Global Hot Spare using MSM. MegaRAID Storage Manager can also be used to access the server from a third-party machine or service laptop by providing the server IP address.

Figure 1 shows how to add a Global Hot Spare using MSM.
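The difference between the two spare types can be sketched as a small selection routine. This is an illustrative model only, not controller firmware: the `Spare` class, the `pick_spare` function, the volume names, and the rule that a dedicated spare is preferred over a global one are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spare:
    slot: int
    dedicated_to: Optional[str] = None  # None means a global hot spare
    in_use: bool = False


def pick_spare(spares: List[Spare], degraded_volume: str) -> Optional[Spare]:
    """Choose a spare for a degraded volume (assumed preference order).

    1. A free spare dedicated to this volume.
    2. Otherwise, any free global spare.
    Returns None when no suitable spare is available.
    """
    for spare in spares:
        if not spare.in_use and spare.dedicated_to == degraded_volume:
            return spare
    for spare in spares:
        if not spare.in_use and spare.dedicated_to is None:
            return spare
    return None


# Hypothetical layout: slot 4 is global, slot 5 is dedicated to volume "VD1".
spares = [Spare(slot=4), Spare(slot=5, dedicated_to="VD1")]

chosen = pick_spare(spares, "VD1")   # picks the dedicated spare
chosen.in_use = True                 # the rebuild consumes it
fallback = pick_spare(spares, "VD1") # a later failure falls back to the global spare
```

A global spare covers every volume on the controller, while a dedicated spare is reserved for the volumes it was assigned to, so critical volumes do not have to compete for it.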

You can also use the WebBIOS interface to add Hot Spare drives. This is done by breaking into the card BIOS early in the boot process, using Ctrl+R to access the BIOS Configuration Utility. To see the boot-time messages on the KVM screen, you’ll first need to VPN into the SoftLayer network and open KVM from the “Actions” dropdown in the customer portal.

Once inside the WebBIOS screen, access the “PD Mgmt” tab and choose a free drive. Pressing F2 on the highlighted drive will display a menu for making the drive a Global Hot Spare. Figure 2 below provides more details for making a Hot Spare using the BIOS interface. We recommend using the virtual keyboard while navigating and issuing commands in the KVM viewer.

Figure 2 provides more details for making a Hot Spare using the BIOS interface.

Adding a Hot Spare Through an Adaptec Adaptor

Adaptec also provides the Adaptec Storage Manager and a BIOS option to add Global Hot Spares.

The Adaptec Storage Manager comes preinstalled on SoftLayer servers for supported operating systems. It can also be downloaded for the specific Adaptec card from this link. After launching the Adaptec Storage Manager, users can select a specific available free drive and create a global hot spare drive as shown in Figure 3.

After launching the Adaptec Storage Manager, users can select a specific available free drive and create a global hot spare drive as shown in Figure 3.

Adaptec also provides a BIOS-based configuration utility that can be used to add a Hot Spare. To do this, you’ll need to break into the BIOS utility by pressing Ctrl+A early in the boot process. After that, select Global Hot Spares from the main menu to enter the drive selection page. Select a drive by pressing Insert, then press Enter to submit the changes. Figure 4 below depicts the selection of a Global Hot Spare using the BIOS configuration utility.

Figure 4 depicts the selection of a Global Hot Spare using the BIOS configuration utility.

Using Hot Spares reduces the risk posed by further drive failures and shortens the time the system remains in a degraded state. We recommend that SoftLayer customers leverage these benefits on their bare metal servers to be better armed against drive failures.

-Subramanian
