RAID Controller Failure
Multi-disk systems usually employ a RAID configuration, of which there are several variants. Different variants offer advantages in read/write speed, capacity and hardware redundancy.
With RAID, the disk set appears to the operating system as a single logical volume. This is achieved by a RAID controller, which connects to each disk physically and presents the set as one logical volume according to the RAID variant specified.
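As an illustration of how a controller can map one logical volume onto several physical disks, here is a minimal sketch of the address arithmetic for a simple striped layout (RAID 0). The function name, the stripe size and the disk count are assumptions made for the example. Real controllers differ in layout details and often store their own metadata on each disk.

```python
def locate_block(logical_offset, num_disks, stripe_size):
    """Map a byte offset in a striped logical volume to (disk index, byte offset on that disk).

    Assumes a plain RAID 0 layout with no controller metadata at the start of each disk.
    """
    stripe_index = logical_offset // stripe_size  # which stripe unit, counted across the whole volume
    within_unit = logical_offset % stripe_size    # position inside that stripe unit
    disk = stripe_index % num_disks               # stripe units are dealt out to the disks in turn
    row = stripe_index // num_disks               # how many full rows of units precede this one
    return disk, row * stripe_size + within_unit


# Hypothetical array: 3 disks, 4-byte stripe units (tiny, to keep the numbers readable).
print(locate_block(0, 3, 4))   # (0, 0)
print(locate_block(4, 3, 4))   # (1, 0)
print(locate_block(13, 3, 4))  # (0, 5)
```

The operating system only ever sees the logical offset. The translation to a disk and a physical offset happens entirely inside the controller.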
Certain RAID configurations can tolerate one disk ‘falling offline’. Once the failure is detected, the failed disk should be replaced so that its contents can be ‘rebuilt’ automatically onto the replacement, returning the system to a state of redundancy.
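The rebuild described above relies on redundant information held on the surviving disks. In parity-based variants such as RAID 5, that information is an XOR of the data blocks. The sketch below shows the principle on three tiny, made-up data blocks. It is not a model of any particular controller, and the helper name is an assumption.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] falls offline.
# XOR-ing the surviving data blocks with the parity block recovers its contents.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Because XOR can recover only one unknown block per stripe, a single-parity array tolerates one missing disk but not two.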
A common problem occurs where a once-off write error in the RAID controller causes corrupt information to be written to the disk array, making the RAID volume inaccessible. This type of failure has nothing to do with the physical condition of the disks. Instead, the corruption prevents the logical volume from being presented to the operating system. Usually, the majority of the underlying data is unaffected, although the structure of the data (spread over all of the disks in the array) needs to be determined manually at byte level. Once the build parameters, such as the disk order and stripe size, are understood, the array can be rebuilt from the disk images created during the recovery process. The structures are then verified and tested prior to full server data recovery.
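To give a feel for what determining the build parameters involves, the following sketch tries candidate disk orders and stripe sizes against a set of disk images until the reassembled volume passes a validity check. It assumes a plain striped (RAID 0) layout with no parity and no controller metadata. The function names and the `looks_valid` check are hypothetical. In practice the check would look for recognisable file system structures, and parity rotation would also need to be worked out for variants such as RAID 5.

```python
from itertools import permutations


def destripe(images, order, stripe_size):
    """Interleave stripe units from the disk images in the given disk order (RAID 0 layout)."""
    out = bytearray()
    length = min(len(image) for image in images)
    for row_start in range(0, length, stripe_size):
        for disk in order:
            out += images[disk][row_start:row_start + stripe_size]
    return bytes(out)


def find_parameters(images, stripe_sizes, looks_valid):
    """Return the first (disk order, stripe size) whose reassembled volume passes looks_valid."""
    for stripe_size in stripe_sizes:
        for order in permutations(range(len(images))):
            if looks_valid(destripe(images, order, stripe_size)):
                return order, stripe_size
    return None


# Synthetic example: a 24-byte "volume" striped in 4-byte units over three disks,
# with the images deliberately listed in the wrong order.
volume = b"0123456789abcdefghijklmn"
images = [b"4567ghij", b"89abklmn", b"0123cdef"]

# Stand-in validity check (an assumption): the volume should begin with a known signature.
result = find_parameters(images, [2, 4, 8], lambda v: v.startswith(b"0123456789ab"))
print(result)  # ((2, 0, 1), 4)
```

Working on disk images rather than the original disks means each candidate layout can be tested without any risk of further writes to the array.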
The data retrieved during the server data recovery process is usually saved to an external hard drive, from where it can be easily imported into the rebuilt server environment.