Managing RAID Groups and Drives

Creating a RAID Group

VPSA RAID Groups define the level of protection against disk failure for the Pools and Volumes that contain the user’s data. Choose the RAID level and the number and type of drives carefully, because these choices affect the performance of your data. RAID Groups always span drives from different Storage Nodes, so a RAID Group is resilient both to a single drive failure and to a complete Storage Node failure.

To create a RAID Group, first select the Drives entity in the Main Navigation Panel (Left Panel), and then click the Create RAID Group button in the Center Panel.

Define the following attributes in the “Create RAID Group” dialog box:

  • Enter the RAID Group name (you will later add it to a Pool so you may want to provide a meaningful name that describes the target usage of the Pool).

Note

Object names can be up to 128 characters long and can contain letters, digits, dashes “-” and underscores “_”.

  • Select Protection Type. Refer to the table below for a description of the various RAID levels.

  • Select whether to allocate a drive as a Hot Spare for this RAID Group. Adding a spare drive to RAID-1 groups is recommended. For details about managing Hot Spares, see Understanding Hot Spare Drives.

  • Select the drives that will participate in the RAID Group. As noted in the table below, for RAID-1 a minimum of 2 drives is required.

  • For maximum redundancy, drives MUST be selected from different Storage Nodes; the VPSA prevents you from doing otherwise.

  • It is possible, but not recommended, to mix drives of different types in a single RAID Group.

Note

  • RAID-5 is no longer supported.

  • From zStorage version 23.09, the system does not allow the creation of RAID-6 RAID Groups or, correspondingly, of RAID-60 Storage Pools. VPSAs that currently use RAID-60 Storage Pools can be upgraded to version 23.09 and will still support storage expansions. For more information, contact https://support.zadarastorage.com.
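
The rules in the dialog above (name format, drive count for RAID-1, one drive per Storage Node) can be sketched as a pre-check. This is a minimal illustration, not the VPSA's own validation: the `Drive` class, the `validate_raid1_request` function, and the restriction to ASCII letters and non-empty names are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List

# Naming rule from the note above: up to 128 characters made of letters,
# digits, "-" and "_". ASCII letters and a minimum length of 1 are assumed.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,128}")


@dataclass(frozen=True)
class Drive:
    name: str          # hypothetical drive identifier
    storage_node: str  # hypothetical Storage Node identifier


def validate_raid1_request(name: str, drives: List[Drive]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("invalid name")
    if len(drives) not in (2, 3):
        problems.append("RAID-1 needs 2 or 3 drives")
    nodes = [d.storage_node for d in drives]
    if len(set(nodes)) != len(nodes):
        problems.append("drives must come from different Storage Nodes")
    return problems
```

For example, `validate_raid1_request("db-pool_01", [Drive("d1", "sn1"), Drive("d2", "sn2")])` returns an empty list, while two drives on the same Storage Node produce a problem entry.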

Understanding RAID levels

The following RAID levels are supported:

RAID level

Description

RAID-1 (Mirroring)

RAID-1 mirrors the contents of one hard drive in the group onto another. If either hard drive fails, the other hard drive takes over and normal system operations are not interrupted. RAID-1, or Drive Mirroring, creates fault tolerance by storing duplicate sets of data on a minimum of two hard drives, and offers an excellent combination of data protection and performance. A RAID-1 group must contain 2 or 3 drives. RAID-1 and RAID-10 are the most costly fault tolerance methods because they require at least 50 percent of the total combined drive capacity to store the redundant data.

RAID-10 (Mirroring and Striping)

RAID-10, or Drive Mirroring and Striping, is achieved in a VPSA by creating RAID-1 RAID Groups and striping them together at the Pool level. RAID-10 first mirrors each drive in the array to another, and then stripes the data across the mirrored pairs. If a physical drive fails, the mirror drive takes over and normal system operations are not interrupted. RAID-10 can withstand multiple simultaneous drive failures, as long as the failed drives are not mirrored to each other. RAID-10 creates fault tolerance by storing duplicate sets of data on a minimum of four hard drives and offers the best combination of data protection and performance. RAID-10 is the most costly fault tolerance method because it requires at least 50 percent of the total combined drive capacity to store the redundant data.
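
The capacity cost and failure tolerance described in the table can be worked through with a short sketch. It assumes that all drives in a RAID-1 group have the same size and that striping at the Pool level simply adds up the usable capacity of the groups; the function names are hypothetical.

```python
from typing import List, Set


def raid1_usable_gb(drive_gb: float, mirror_number: int) -> float:
    """Usable capacity of one RAID-1 group: one drive's worth, however many copies."""
    if mirror_number not in (2, 3):
        raise ValueError("a RAID-1 group has 2 or 3 drives")
    return drive_gb


def redundancy_overhead(mirror_number: int) -> float:
    """Fraction of the raw capacity that stores redundant copies."""
    return (mirror_number - 1) / mirror_number


def raid10_usable_gb(group_capacities_gb: List[float]) -> float:
    """Striping RAID-1 groups together: usable capacities add up (assumption)."""
    return sum(group_capacities_gb)


def raid10_survives(groups: List[List[str]], failed: Set[str]) -> bool:
    """Data survives while every mirror group keeps at least one healthy drive."""
    return all(any(d not in failed for d in group) for group in groups)
```

With two-way mirrors the overhead is 0.5 (50 percent); with three-way mirrors it rises to two thirds. For two mirrored pairs `[["a", "b"], ["c", "d"]]`, losing `a` and `c` is survivable, while losing `a` and `b` is not.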

Deleting a RAID Group

The VPSA administrator can delete a specific RAID Group that is no longer needed by clicking the Delete option in the RAID Group’s top option menu.

Note

A RAID Group can be deleted only if it is not allocated to a Storage Pool. The system blocks deletion of allocated RAID Groups.

Viewing RAID Group properties

The RAID Group’s details (properties and metering) are shown in the South Panel tabs:

Properties

Each RAID Group includes the following properties:

Property

Description

ID

An internally assigned unique ID.

Name

User-assigned name. Can be modified at any time.

Comment

Free-text user comment. Can be used for labels, reminders, or any other purpose.

Protection

Selected RAID level - RAID-1.

Capacity

Total protected and usable capacity of the RAID Group.

Available Capacity

The RAID Group’s usable capacity that is not allocated to any Pool.

Mirror Number

Number of mirror copies for RAID-1.

Status

  • Normal – All drives are in sync.

  • Resyncing X% – The RAID is in an initial rebuild process.

  • Degraded – One of the drives has failed.

  • Degraded Resyncing X% – The RAID is resyncing data following a drive recovery or replacement.

  • Repairing X% – Media Scan is in progress.

  • Repairing Paused – Media Scan is paused.

  • Failed – The array has lost too many drives and cannot serve Server IOs.

Created

Date & time when the object was created.

Modified

Date & time when the object was last modified.
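
A script that monitors RAID Groups may need to split a Status value into its state and progress. The sketch below assumes the status arrives as text in the form shown in the table, for example "Degraded Resyncing 42%"; the exact string format and the names `RaidStatus` and `parse_status` are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

# States from the Status property above. Longer names come first so that
# "Degraded Resyncing" is not read as plain "Degraded".
STATES = (
    "Degraded Resyncing",
    "Repairing Paused",
    "Normal",
    "Resyncing",
    "Degraded",
    "Repairing",
    "Failed",
)
_STATUS_RE = re.compile(
    r"^(?P<state>" + "|".join(STATES) + r")(?:\s+(?P<pct>\d{1,3})%)?$"
)


class RaidStatus(NamedTuple):
    state: str
    percent: Optional[int]  # None when the status carries no progress value


def parse_status(text: str) -> RaidStatus:
    """Split a status such as 'Resyncing 42%' into its state and percentage."""
    match = _STATUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized RAID Group status: {text!r}")
    pct = match.group("pct")
    return RaidStatus(match.group("state"), int(pct) if pct is not None else None)
```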

Drives

Lists the disk Drives participating in the selected RAID Group. The following information is displayed per drive:

  • Name

  • Capacity (in GB)

  • Location (Storage Node)

  • Type (SAS/SATA/SSD/TBD)

  • Status (Normal/Failed/TBD)

  • Hot Spare (Yes/No)

Metering

The Metering Charts provide live metering of the IO workload associated with the selected RAID Group.

The charts display the metering data as it was captured in the past 20 intervals. An interval length can be one of the following: 1 second, 10 seconds, 1 minute, 10 minutes, or 1 hour. The Auto button lets you see continuously updating live metering info.
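
Because a chart always holds 20 intervals, the time span it covers depends only on the interval length. The following sketch works out that span; the dictionary keys and the `chart_window` function are illustrative names, not part of the VPSA.

```python
from datetime import timedelta

# Interval lengths offered by the metering charts.
INTERVALS = {
    "1 second": timedelta(seconds=1),
    "10 seconds": timedelta(seconds=10),
    "1 minute": timedelta(minutes=1),
    "10 minutes": timedelta(minutes=10),
    "1 hour": timedelta(hours=1),
}
INTERVALS_PER_CHART = 20


def chart_window(interval: str) -> timedelta:
    """Total time span shown by a chart at the given interval length."""
    return INTERVALS_PER_CHART * INTERVALS[interval]
```

At the 1 second interval a chart covers the last 20 seconds; at the 1 hour interval it covers the last 20 hours.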

Note

The metering info of the RAID Group doesn’t include RAID-generated IOs, such as when doing a rebuild.

The following charts are displayed:

Chart

Description

IOPs

The number of read and write commands issued to the RAID Group per second.

Bandwidth (MB/s)

Total throughput (in MB) of read and write SCSI commands issued to the RAID Group per second.

IO Time (ms)

Average response time of all read and write SCSI commands issued to the RAID Group per selected interval.

Logs

Displays all event logs associated with this RAID Group.


Understanding Hot Spare Drives

When creating a RAID Group you can decide whether you’d like to allocate hot spare drives to the RAID Group or not. You can change this selection at any time by clicking the Add Spare or Remove Spare buttons on a selected RAID Group in the VPSA GUI > RAID Groups page.

Allocating a hot spare drive for a RAID Group allows for immediate and automated drive replacement, with no human intervention, once the VPSA determines that the drive has failed.

If you choose not to allocate a hot spare drive to your RAID Group, you can still replace a failed drive with any available drive that is not used in any other RAID Group within the VPSA. You can perform this process manually or automate it via the VPSA REST APIs. To replace the drive manually, select the failed drive, click the Replace button, and select the available drive to use as the replacement. For more details, see Replacing a Drive.


Managing RAID Group Sync Speed

RAID Group Sync Speed allows you to control the rate at which data is synchronized during a RAID rebuild, both on a newly created RAID Group and following a drive replacement.

Setting the Sync Speed is a tradeoff between completing the RAID rebuild as quickly as possible, in order to return to full redundancy, and providing good response time and throughput for application IOs. Therefore, the VPSA allows you to control two parameters that affect the Sync Speed:

  • “Max Speed During Host I/Os” – Controls the RAID sync speed when there are Server IOs. Set it low if the Server’s IOs are the priority. Set it high if you want to prioritize the RAID rebuild process.

    • Default value: 10 MB/s

    • Range: 1 - 500 MB/s

  • “Max Speed w/o Host I/Os” – Controls the sync speed when there are no Server IOs. You would typically set it to the maximum value (500 MB/s), unless it consumes too much of the VPSA’s resources (depending on the Engine type) and thereby impacts the performance of other RAID Groups that do have active Server IOs.

You can set and modify Sync Speed at any time, and it can vary between RAID groups. The Sync Speed also applies to Media Scan (see below).
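
To get a feel for the tradeoff, the sketch below checks the two settings against the documented 1 - 500 MB/s range and estimates how long a resync would take at a given speed. It assumes that both parameters share the same range, that 1 GB equals 1000 MB, and that the sync runs at a constant rate, so the result is a lower bound only. `SyncSpeed` and `rebuild_hours` are hypothetical names.

```python
from dataclasses import dataclass

MIN_SPEED_MBPS = 1
MAX_SPEED_MBPS = 500


@dataclass(frozen=True)
class SyncSpeed:
    during_host_io_mbps: int = 10    # documented default
    without_host_io_mbps: int = 500  # typical setting; default here is an assumption

    def __post_init__(self) -> None:
        for value in (self.during_host_io_mbps, self.without_host_io_mbps):
            if not MIN_SPEED_MBPS <= value <= MAX_SPEED_MBPS:
                raise ValueError(f"sync speed {value} MB/s is outside 1 - 500 MB/s")


def rebuild_hours(drive_gb: float, speed_mbps: float) -> float:
    """Lower bound on resync time, assuming 1 GB = 1000 MB and a constant rate."""
    return drive_gb * 1000 / speed_mbps / 3600
```

Under these assumptions, resyncing a 3600 GB drive takes at least 2 hours at 500 MB/s and at least 100 hours at the 10 MB/s default that applies while Servers are issuing IOs.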

Replacing a Drive

Press the Replace button on the Drives page to replace a drive. When selecting the replacement drive you must choose a drive that will not break the RAID Group redundancy (i.e. you cannot have two or more drives from the same Storage Node in a RAID Group). If you select a drive that has a different type or larger size than the other drives in the RAID Group, you will see a warning, but you can continue the operation.

You can replace a drive in any RAID Group whether the drive is healthy (Normal) or unhealthy (Failed).

You cannot replace a drive if the RAID Group is in a Resyncing state.
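
If you automate drive replacement, the rules in this section can be applied before a replacement is requested. The sketch below is plain selection logic and makes no REST calls; the `Drive` fields and the `check_replacement` function are hypothetical, and only the plain Resyncing state is treated as blocking because the text does not say whether Degraded Resyncing blocks as well.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Drive:
    name: str
    storage_node: str
    drive_type: str                   # e.g. "SAS", "SATA", "SSD"
    capacity_gb: int
    raid_group: Optional[str] = None  # None means the drive is not in any RAID Group


def check_replacement(
    group_status: str, members: List[Drive], outgoing: Drive, candidate: Drive
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for replacing `outgoing` with `candidate`."""
    errors, warnings = [], []
    if group_status.startswith("Resyncing"):
        errors.append("RAID Group is resyncing")
    if candidate.raid_group is not None:
        errors.append("candidate already belongs to a RAID Group")
    # Drives that stay in the group after the replacement.
    remaining = [d for d in members if d.name != outgoing.name]
    if any(d.storage_node == candidate.storage_node for d in remaining):
        errors.append("candidate shares a Storage Node with another member")
    if any(d.drive_type != candidate.drive_type for d in remaining):
        warnings.append("drive type differs from the other drives")
    if any(candidate.capacity_gb > d.capacity_gb for d in remaining):
        warnings.append("candidate is larger than the other drives")
    return errors, warnings
```

Errors correspond to replacements the section says are not allowed; warnings correspond to the cases where the operation can continue after a warning.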


Shredding a Drive

Shredding is the process of erasing the data on a drive for security and privacy reasons by overwriting the entire drive with random data at least three times. Typically you will shred a drive before returning it to the Zadara Cloud or before deleting your VPSA.

You can only perform Shred on drives in Available status (i.e. not in a RAID group).

The Shredding progress appears in the drive status as “Shredding X%”.

You cannot remove a drive from a VPSA while it is being shredded. You need to either cancel the operation by pressing the Cancel Shred button, or wait until shredding is completed.

Caution

Shredding is irreversible!
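
The multi-pass overwrite described above can be illustrated on an ordinary file-like object. This is a sketch of the general technique only, not the VPSA's shredding implementation; the `shred` function, its chunk size, and the way progress is reported are assumptions.

```python
import os
from typing import BinaryIO, Iterator


def shred(device: BinaryIO, size: int, passes: int = 3,
          chunk: int = 1 << 20) -> Iterator[int]:
    """Overwrite `size` bytes with random data `passes` times.

    Yields the overall progress as a whole percentage after each chunk.
    """
    if passes < 3:
        raise ValueError("shredding overwrites the drive at least three times")
    total = size * passes
    written = 0
    for _ in range(passes):
        device.seek(0)  # every pass starts again from the beginning
        remaining = size
        while remaining:
            n = min(chunk, remaining)
            device.write(os.urandom(n))
            remaining -= n
            written += n
            yield written * 100 // total
        device.flush()
```

Running it over an in-memory buffer, for example `list(shred(io.BytesIO(bytes(4096)), 4096, chunk=1024))`, produces a rising series of percentages that ends at 100, similar in spirit to the “Shredding X%” status.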


Viewing Drive properties

You can view the following properties and metering information in the Drives Details South Panel tabs:

Properties

Each drive displays the following Properties:

Property

Description

ID

An internally assigned unique ID.

Name

Drive name.

Capacity

Drive Capacity in MB.

Storage Node

The name of the Storage Node where the drive is physically located.

Type

SATA, SAS, or SSD

Status

The drive’s status reflects the drive health as sensed by the Storage Node and by the VPSA RAID logic:

  • Available – The drive is healthy and free.

  • Normal – The drive is healthy and belongs to a RAID Group.

  • Absent – No access to the drive.

  • Failed – The Storage Node has reported failure accessing the drive.

  • Faulty – The VPSA RAID object has failed writing to or reading from this drive.

  • Recover Pending – The RAID Group has failed and the drive is awaiting recovery.

  • Shredding – The drive is being shredded.

RAID Group

Name of the RAID group that contains this drive.

Protection Zone

Displays the Protection Zone of the drive.

Usage

In-use or Available

Added

The date and time when the drive was added to the VPSA.

Modified

The date and time when the Drive object was last modified.

Metering

The Metering Charts provide live metering of the IO workload associated with the selected Drive.

The charts display the metering data as it was captured in the past 20 intervals. An interval length can be set to one of the following: 1 second, 10 seconds, 1 minute, 10 minutes, or 1 hour. The Auto button lets you see continuously updating live metering info (refreshed every 3 sec).

The following charts are displayed:

Chart

Description

IOPs

The number of read and write SCSI commands issued to the Drive, per second.

Bandwidth (MB/s)

Total throughput (in MB) of read and write SCSI commands issued to the Drive, per second.

IO Time (ms)

Average response time of all read and write SCSI commands issued to the Drive, per selected interval.

Logs

Displays all event logs associated with this Drive.