Step 1: Evaluate the chances of success
To facilitate the discussion below, we introduce the following terms:
- A functional disk is one that is in working order and still holds its original contents.
- A partially corrupt disk is one that is in working order but whose contents have been partially overwritten.
- A completely corrupt disk is one whose contents have been completely overwritten.
- A broken disk is one that is physically defective and no longer detected by the computer hardware.
- A permanently broken disk is a broken disk that cannot be repaired.
In general, RAID 0 recovery requires all disks to be functional, because RAID 0 stores no redundant data. A RAID 5, on
the other hand, contains redundant data and can be recovered with one disk missing.
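The redundancy in RAID 5 is XOR parity: in every stripe, one block holds the XOR of the data blocks, so any single missing block equals the XOR of the blocks that remain. The sketch below illustrates this on one stripe of a 4-disk array. It is a simplified model that assumes equal-sized blocks and ignores parity rotation, stripe size, and on-disk metadata, all of which real controllers add; the function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by one parity block (XOR of all data)."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: list[bytes | None]) -> list[bytes]:
    """Rebuild a stripe in which at most one block (data or parity) is missing.

    A missing block is represented as None (simplified model, not a real
    controller format).
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # One parity equation cannot determine two unknown blocks.
        raise ValueError("single XOR parity cannot rebuild more than one missing block")
    if not missing:
        return list(stripe)
    present = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(present)  # XOR of the survivors
    return rebuilt


# 4-disk stripe: three data blocks plus one parity block.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
damaged = list(stripe)
damaged[1] = None  # the disk holding the second data block is lost
print(reconstruct(damaged)[1])  # b'BBBB'
```

Losing any one of the four blocks is recoverable, but losing two in the same stripe raises an error. This mirrors why a RAID 5 survives one unusable disk but not two.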
A large majority of broken disks can be repaired, usually for a few thousand dollars. Therefore, a RAID failure
preceded by a disk failure is usually recoverable.
Disk corruption, on the other hand, may lead to permanent data loss because it is often irreparable.
Disk corruption is by far the most common cause of unrecoverable RAID 5 failures. It is usually the result of failed attempts
by the user to rebuild a RAID 5 using incorrect procedures or parameters. A failed attempt, if run to completion, can cause the
equivalent of one corrupted disk. A RAID 5 is still recoverable with one corrupted disk; however, a second failed attempt may
damage the RAID beyond repair.
The following table shows the most likely outcome of recovering a failed 4-disk RAID 5. Note that complete recovery requires
at least three functional disks.
4-disk RAID 5 prognosis based on disk conditions
Functional | Partially corrupt | Completely corrupt | Broken | Permanently broken | Most likely outcome |
4 | 0 | 0 | 0 | 0 | Complete recovery |
3 | 0 | 1 | 0 | 0 | Complete recovery |
3 | 0 | 0 | 0 | 1 | Complete recovery |
2 | 2 | 0 | 0 | 0 | Partial recovery |
2 | 1 | 1 | 0 | 0 | Partial recovery |
2 | 0 | 2 | 0 | 0 | No recovery |
2 | 0 | 0 | 0 | 2 | No recovery |
2 | 0 | 1 | 0 | 1 | No recovery |
2 | 0 | 1 | 1 | 0 | Complete recovery if the broken disk can be fixed |
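The table can be read as a small set of rules: count the disks that are usable once any repairable broken disks are fixed, and compare that count with the number a RAID 5 needs (all disks but one). The sketch below is one hypothetical encoding of those rules that reproduces every row of the table. The rule for partial recovery (counting partially corrupt disks toward the threshold) is an interpretation of the table, not a stated procedure, and the names `DiskCounts` and `prognosis` are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DiskCounts:
    functional: int = 0
    partially_corrupt: int = 0
    completely_corrupt: int = 0
    broken: int = 0               # assumed repairable
    permanently_broken: int = 0


def prognosis(d: DiskCounts, total_disks: int = 4) -> str:
    """Most likely outcome for a RAID 5, which tolerates one unusable disk."""
    counted = (d.functional + d.partially_corrupt + d.completely_corrupt
               + d.broken + d.permanently_broken)
    if counted != total_disks:
        raise ValueError(f"expected {total_disks} disks, got {counted}")
    needed = total_disks - 1                      # RAID 5 survives one lost disk
    usable_if_repaired = d.functional + d.broken  # broken disks assumed fixable
    if usable_if_repaired >= needed:
        if d.broken > 0 and d.functional < needed:
            # Success depends on actually repairing the broken disk(s).
            return "Complete recovery if the broken disk can be fixed"
        return "Complete recovery"
    if usable_if_repaired + d.partially_corrupt >= needed:
        # Stripes where at most one disk was overwritten can still be rebuilt.
        return "Partial recovery"
    return "No recovery"
```

For example, `prognosis(DiskCounts(functional=2, completely_corrupt=1, permanently_broken=1))` returns `"No recovery"`, matching the corresponding table row.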
We discuss a few typical scenarios below.
Scenario 1: A 4-disk hardware RAID 5 was configured as drive D:. After a power surge, it
was no longer detected by Windows. All disks seem to be physically working.
Prognosis: The RAID controller may have failed. Since all disks are functional, recovery will be complete.
Scenario 2: A 4-disk hardware RAID 5 was configured as drive D:. One disk physically failed and was replaced
with a new disk. The RAID was then rebuilt using a vendor-supplied utility. The rebuild failed and data is no longer accessible.
Prognosis: If the rebuild procedure was followed correctly, the three remaining disks would not
have been overwritten. Because three disks are functional, recovery will be complete.
Scenario 3: This is similar to Scenario 2. However, the technician has incorrectly rebuilt the RAID many times
using different parameters.
Prognosis: All disks may have been corrupted beyond repair.
Scenario 4: A 4-disk hardware RAID 5 plus one hot spare (for a total of five disks) was configured as drive D:.
The RAID controller failed and was replaced with a new controller.
The technician did not know about the hot spare and mistakenly rebuilt the RAID as a 5-disk RAID 5.
Prognosis: The incorrect rebuild completely corrupted the equivalent of one disk.
However, with the equivalent of three functional disks, recovery will be complete.
Scenario 5: A 4-disk hardware RAID 5 plus one hot spare (for a total of five disks) was configured as drive D:.
One data disk failed and was replaced
with a new disk. The technician did not know about
the hot spare and mistakenly rebuilt the RAID as a 5-disk RAID 5. The rebuild ran to completion.
Prognosis: This is similar to Scenario 4 but with an additional broken disk. With only two functional disks, data
is not recoverable unless the broken disk can be repaired.
Scenario 6: A 4-disk RAID 5 experienced a glitch several months ago that caused one disk to be dropped
from the array.
The RAID continued to operate in degraded mode with three disks. The problem was noticed, but no action was taken. Now another
disk has just suffered an irreparable mechanical failure.
Prognosis: The dropped disk can be considered a completely corrupt disk because its contents are
out of date. Because only two disks are functional, data is not recoverable.
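Why a stale disk counts as completely corrupt can be seen on a single stripe. While the array runs degraded, changes to blocks on the missing disk are recorded only in parity, so the dropped disk's old contents no longer agree with the parity. The sketch below uses the same simplified XOR model as a real RAID 5 (no parity rotation, one stripe, made-up block contents) and a hypothetical helper name; it shows that rebuilding with the stale disk silently returns wrong data instead of failing.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_with_stale_member() -> bytes:
    """Rebuild block 1 after disk 1 fails, using an out-of-date disk 2."""
    # Healthy 4-disk stripe: three data blocks plus one parity block.
    a, b, c = b"AAAA", b"BBBB", b"CCCC"
    disks = [a, b, c, xor_blocks(a, b, c)]

    # Disk 2 is dropped from the array; it keeps its old block "CCCC".
    stale_disk2 = disks[2]

    # Degraded mode: new data for block 2 arrives. Disk 2 is offline, so the
    # change is recorded only in the parity block on disk 3.
    new_c = b"cccc"
    disks[3] = xor_blocks(disks[0], disks[1], new_c)

    # Months later disk 1 fails. Rebuilding block 1 from disk 0, the stale
    # disk 2 and parity mixes old and new data.
    return xor_blocks(disks[0], stale_disk2, disks[3])


print(rebuild_with_stale_member())  # b'bbbb', not the original b'BBBB'
```

The result differs from the original block by the XOR of the old and new contents of block 2. With disk 1 gone, both block 1 and the current block 2 are unknown while parity supplies only one equation, so neither can be recovered.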
Scenario 7: A 4-disk hardware RAID 5 was configured as drive C:. The RAID operates normally but drive C: has
been reformatted and Windows reinstalled.
Prognosis: The RAID has not failed, so the discussion above does not apply and RAID recovery is not necessary.
The user should follow the recovery procedures for ordinary disks.