RAID Array Failure – Disk Failure
Multi-disk systems usually employ a RAID configuration, of which there are several variants. The variants offer different trade-offs in read/write speed, capacity and hardware redundancy.
The disk set appears to the operating system as a single logical volume. This is achieved by a RAID controller, which connects to each disk physically and presents the set as a logical volume according to the RAID variant specified.
Certain RAID configurations can tolerate one disk falling offline. Once the failure is detected, the failed disk should be replaced so that the replacement disk can be ‘rebuilt’ automatically, returning the system to a state of redundancy.
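As an illustration of how a replacement disk can be rebuilt, the sketch below assumes a single-parity scheme (as used in RAID 5), where the parity block is the XOR of the data blocks in a stripe. Under that assumption, any one missing block is the XOR of the surviving blocks. The function names are hypothetical and the blocks are tiny in-memory values, not real disk reads.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a single-parity stripe.

    Assumes parity = XOR of all data blocks, so the missing block equals
    the XOR of every surviving block (data and parity alike).
    """
    return xor_blocks(surviving_blocks)


# Illustrative stripe: two data blocks plus their parity block.
d0 = b"\x01\x02"
d1 = b"\x10\x20"
parity = xor_blocks([d0, d1])

# Pretend the disk holding d1 has failed; rebuild it from d0 and parity.
recovered = rebuild_missing([d0, parity])
```

A real controller performs the same operation stripe by stripe across the whole replacement disk.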
A common problem occurs where one disk falls offline, is not dealt with, and a second disk subsequently falls offline, making the RAID inaccessible. The cause of a disk failure could be electronic, service area related, or mechanical.
In this case, the objective is to make each disk operational and then to identify which disk was last to fall offline. It is the ‘last fallen offline’ disk, combined with the functioning disks in the array, that forms the basis of the most accurate data rebuild. In some cases, it may only be possible to deal successfully with the penultimate failure, in which case the recovered data may be somewhat less than current. For this reason, it is critical for organisations to recognise a disk in offline status at the earliest opportunity.
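One way to see the effect of a disk that fell offline early is a parity scan across images of all member disks. The sketch below assumes a single-parity array in which the XOR of all members in a stripe is zero when the stripe is consistent. Stripes written after a disk dropped out will fail this check once that stale disk is included. The check alone does not say which member is stale; it only shows where the set disagrees. The function name and in-memory images are hypothetical.

```python
def inconsistent_stripes(images, chunk_size):
    """Return the indices of stripes whose members do not XOR to zero.

    images: one bytes object per member disk (data and parity together).
    chunk_size: bytes each disk contributes to a stripe (an assumed value).
    """
    bad = []
    stripes = min(len(img) for img in images) // chunk_size
    for s in range(stripes):
        lo, hi = s * chunk_size, (s + 1) * chunk_size
        acc = bytearray(chunk_size)
        for img in images:
            for i, byte in enumerate(img[lo:hi]):
                acc[i] ^= byte
        if any(acc):  # a non-zero remainder means the stripe is inconsistent
            bad.append(s)
    return bad


# Illustrative three-disk set, two stripes of two bytes per disk.
consistent = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\x11\x22\x33\x44"]

# Same set, but the first disk holds old contents in its second stripe.
with_stale = [b"\x01\x02\xff\x04", b"\x10\x20\x30\x40", b"\x11\x22\x33\x44"]
```

A large run of inconsistent stripes suggests that at least one image predates later writes, which is the situation that makes the order of failure matter.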
In an extreme case, more than one disk may have suffered a hardware-level failure. It is then necessary to repair the underlying drive faults so that the minimum number of disks required for a RAID rebuild is available.
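The required minimum depends on the RAID level. The sketch below covers only the standard levels 0, 1, 5 and 6 and uses a hypothetical helper name; nested levels such as RAID 10 depend on which specific disks survive and are left out.

```python
def min_disks_required(level, total_disks):
    """Minimum number of readable member disks needed to recover the data.

    Covers standard RAID levels only:
      RAID 0: striping with no redundancy, so every disk is needed.
      RAID 1: mirroring, so any single copy is enough.
      RAID 5: single parity, so one disk may be missing.
      RAID 6: dual parity, so two disks may be missing.
    """
    if level == 0:
        return total_disks
    if level == 1:
        return 1
    if level == 5:
        return total_disks - 1
    if level == 6:
        return total_disks - 2
    raise ValueError(f"RAID level {level} is not handled by this sketch")


# Example: a four-disk RAID 5 set with two failed disks needs one repaired.
needed = min_disks_required(5, 4)
```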
Once the build parameters are understood, the array can be rebuilt from the disk images created during the recovery process. The structures are then verified and tested prior to full server data recovery.
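To show what build parameters mean in practice, the sketch below reassembles the logical data from disk images under one assumed set of parameters: RAID 5 with a left-symmetric parity rotation (the default layout in Linux software RAID), a known disk order and a known chunk size. Hardware controllers may use other layouts, so the rotation formula here is an assumption to be confirmed for each case, and the function name is hypothetical.

```python
def destripe_raid5_left_symmetric(images, chunk_size):
    """Rebuild the logical byte stream from RAID 5 member images.

    images: member images in array order, one bytes object per disk.
    chunk_size: bytes per chunk on each disk.
    Assumes left-symmetric layout: parity starts on the last disk and
    moves one disk to the left each stripe; data chunks begin on the disk
    after the parity disk and wrap around.
    """
    n = len(images)
    stripes = len(images[0]) // chunk_size
    out = bytearray()
    for s in range(stripes):
        parity_disk = (n - 1) - (s % n)
        offset = s * chunk_size
        for k in range(n - 1):
            disk = (parity_disk + 1 + k) % n
            out += images[disk][offset:offset + chunk_size]  # skip parity
    return bytes(out)


# Illustrative three-disk set; "??" marks parity chunks, which are skipped.
images = [b"AADD??", b"BB??EE", b"??CCFF"]
volume = destripe_raid5_left_symmetric(images, chunk_size=2)
```

With the wrong disk order, chunk size or rotation, the output is scrambled, which is why the structures are verified before full recovery proceeds.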
The data retrieved during the server data recovery process is usually saved to an external hard drive, from where it can be easily imported into the rebuilt server environment.