Hardware and RAID configuration:
This was an 8x4TB QNAP RAID-5 NAS holding two thin-provisioned LVM volumes, each with a capacity of about 13 TB.
Problem:
One drive failed. During multiple rebuild attempts, the customer ran into further complications, and eventually the data became inaccessible.
Diagnosis:
The LVM headers were still intact. They contained the configuration of two thin-provisioned logical volumes with capacities of 12.7 TB and 12.6 TB. The thin LVM metadata was stored near the end of the drives.
The metadata contained two B-trees, one for each logical volume.
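The on-disk metadata format used by the QNAP firmware is not specified here. The following is a minimal sketch that assumes the Linux device-mapper thin-provisioning layout. In that layout, persistent-data B-tree nodes carry a 32-byte header, followed by the keys array and then the values array, all little-endian 64-bit integers. Each leaf value packs the pool data block number and a 24-bit time stamp as `(data_block << 24) | time`. Under that assumption, one leaf node could be decoded as shown below. The names `parse_leaf`, `build_leaf`, and `unpack_block_time` are hypothetical, and checksum verification is omitted.

```python
import struct

# ASSUMPTION: Linux dm persistent-data B-tree node layout; QNAP firmware may differ.
# Header fields: csum, flags, blocknr, nr_entries, max_entries, value_size, padding.
NODE_HEADER = struct.Struct("<IIQIIII")  # 32 bytes, little-endian, unaligned
LEAF_NODE = 1 << 1  # leaf flag (an internal node would have flag value 1)


def unpack_block_time(value):
    """Split a 64-bit leaf value into (pool data block, 24-bit time stamp)."""
    return value >> 24, value & 0xFFFFFF


def parse_leaf(block):
    """Return [(virtual_block, data_block, time), ...], or None if not a leaf."""
    _csum, flags, _blocknr, nr, max_entries, value_size, _pad = NODE_HEADER.unpack_from(block, 0)
    if not flags & LEAF_NODE or value_size != 8 or nr > max_entries:
        return None
    keys_off = NODE_HEADER.size              # keys[max_entries] follow the header
    vals_off = keys_off + 8 * max_entries    # values[max_entries] follow the keys
    if vals_off + 8 * max_entries > len(block):
        return None
    keys = struct.unpack_from(f"<{nr}Q", block, keys_off)
    vals = struct.unpack_from(f"<{nr}Q", block, vals_off)
    return [(k, *unpack_block_time(v)) for k, v in zip(keys, vals)]


def build_leaf(entries, max_entries=4, block_size=4096):
    """Build a synthetic leaf node from (vblock, dblock, stamp) tuples (test helper)."""
    buf = bytearray(block_size)
    NODE_HEADER.pack_into(buf, 0, 0, LEAF_NODE, 0, len(entries), max_entries, 8, 0)
    for i, (vblock, dblock, stamp) in enumerate(entries):
        struct.pack_into("<Q", buf, NODE_HEADER.size + 8 * i, vblock)
        struct.pack_into("<Q", buf, NODE_HEADER.size + 8 * (max_entries + i), (dblock << 24) | stamp)
    return bytes(buf)
```

`build_leaf` only produces a synthetic node for testing. On a real image, the input would be a metadata block read from the end of the drives.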
Solution:
There are three layers of virtualization:
- RAID logical storage space to physical drive space.
- LVM logical space to RAID space.
- Logical volume space to LVM space.
Each layer must be "devirtualized" with a table that maps its virtual offsets to the logical or physical offsets of the layer below.
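To illustrate how the three tables compose, here is a minimal sketch. It translates a byte offset in a thin volume down to a drive index and an offset on that drive. The sketch rests on three assumptions:

- The thin mapping has already been recovered into a dictionary.
- The LVM segments are linear and given as `(lv_start, length, raid_start)` tuples.
- The array uses the left-symmetric RAID-5 layout that Linux md uses by default.

The chunk sizes, the member data offset, and all function names are hypothetical parameters, not values from this case.

```python
from bisect import bisect_right


def thin_to_pool(offset, thin_map, chunk_size):
    """Thin volume -> pool data area. thin_map: {virtual_block: data_block}."""
    vblock, within = divmod(offset, chunk_size)
    dblock = thin_map.get(vblock)
    if dblock is None:
        return None  # never written, or its B-tree leaf was lost
    return dblock * chunk_size + within


def pool_to_raid(offset, segments):
    """LVM space -> RAID space. segments: sorted (lv_start, length, raid_start) tuples."""
    i = bisect_right([s[0] for s in segments], offset) - 1
    if i < 0:
        return None
    lv_start, length, raid_start = segments[i]
    if offset >= lv_start + length:
        return None
    return raid_start + (offset - lv_start)


def raid5_to_disk(offset, n_disks, chunk_size, data_offset=0):
    """RAID-5 space -> (disk index, byte offset on disk), ASSUMED left-symmetric layout."""
    chunk_no, within = divmod(offset, chunk_size)
    stripe, data_idx = divmod(chunk_no, n_disks - 1)   # n-1 data chunks per stripe
    parity_disk = (n_disks - 1) - (stripe % n_disks)    # parity moves from last disk toward first
    disk = (parity_disk + 1 + data_idx) % n_disks       # data starts after parity and wraps around
    return disk, data_offset + stripe * chunk_size + within


def devirtualize(offset, thin_map, thin_chunk, segments, n_disks, raid_chunk, data_offset=0):
    """Compose the three layers; return None if any layer has no mapping."""
    pool_off = thin_to_pool(offset, thin_map, thin_chunk)
    if pool_off is None:
        return None
    raid_off = pool_to_raid(pool_off, segments)
    if raid_off is None:
        return None
    return raid5_to_disk(raid_off, n_disks, raid_chunk, data_offset)
```

Returning `None` for unmapped thin blocks matters here: with a partially lost B-tree, many virtual blocks simply have no known location. On a degraded array, a chunk that lands on the missing drive would also have to be rebuilt by XOR-ing the corresponding chunks of the surviving drives.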
Unfortunately, the B-tree structures were partially lost, so the trees could not be traversed top-down. Instead, they were partially reconstructed bottom-up from the surviving leaf nodes.
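Gathering leaf nodes bottom-up amounts to scanning the metadata area block by block and keeping anything that looks like a leaf. The following sketch makes two assumptions: it uses the device-mapper node layout, and it uses a hypothetical 4 KiB block size. Each block must pass sanity checks on its header fields and on strictly increasing keys.

```python
import struct

# ASSUMPTIONS: dm persistent-data node header and a 4 KiB metadata block size.
NODE_HEADER = struct.Struct("<IIQIIII")  # csum, flags, blocknr, nr_entries, max_entries, value_size, padding
LEAF_NODE = 1 << 1
BLOCK_SIZE = 4096


def looks_like_leaf(block):
    """Cheap plausibility checks on a candidate metadata block."""
    _csum, flags, _blocknr, nr, max_entries, value_size, _pad = NODE_HEADER.unpack_from(block, 0)
    if flags != LEAF_NODE or value_size != 8:
        return False
    if not 0 < nr <= max_entries:
        return False
    if NODE_HEADER.size + 16 * max_entries > len(block):  # keys and values must fit
        return False
    keys = struct.unpack_from(f"<{nr}Q", block, NODE_HEADER.size)
    return all(a < b for a, b in zip(keys, keys[1:]))  # keys strictly increasing


def scan_for_leaves(image, block_size=BLOCK_SIZE):
    """Yield (offset, first_key, last_key) for each block that passes the checks."""
    for off in range(0, len(image) - block_size + 1, block_size):
        block = image[off:off + block_size]
        if looks_like_leaf(block):
            nr = NODE_HEADER.unpack_from(block, 0)[3]
            first, = struct.unpack_from("<Q", block, NODE_HEADER.size)
            last, = struct.unpack_from("<Q", block, NODE_HEADER.size + 8 * (nr - 1))
            yield off, first, last
```

The checks are deliberately loose. Old copies of leaves left behind by copy-on-write metadata updates also pass them, and so do leaves of a top-level tree, so the candidates still have to be filtered. Verifying each node's checksum would tighten the test. `image` can be a `bytes` object or an `mmap` of a drive image.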
The most difficult task was to determine which of the two B-trees a particular leaf node belonged to. We spent two weeks writing a program that used LVM space allocation patterns and metadata patterns to determine B-tree affiliation. The program exceeded all expectations: it assigned each leaf node to the correct B-tree.
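The program described above relied on allocation and metadata patterns specific to this case, and the sketch below is not that program. It is a simpler, hypothetical illustration of one structural constraint that can help. Within a single B-tree, the key ranges of different leaves never overlap, so two leaves whose ranges do overlap must belong to different trees. Building a conflict graph from overlapping ranges and 2-coloring it with breadth-first search propagates this constraint. The sketch assumes that stale copies of leaves have already been removed. Otherwise, two versions of the same leaf would wrongly be forced into different trees.

```python
from collections import deque


def ranges_overlap(a, b):
    """True if inclusive key ranges a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def two_color_leaves(leaves):
    """leaves: list of (first_key, last_key).

    Returns (colors, components): overlapping leaves get different colors.
    Raises ValueError if the overlaps cannot be split between two trees.
    """
    n = len(leaves)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # O(n^2); fine for a sketch
            if ranges_overlap(leaves[i], leaves[j]):
                adj[i].append(j)
                adj[j].append(i)

    colors = [None] * n
    components = [None] * n
    comp_id = 0
    for start in range(n):
        if colors[start] is not None:
            continue
        colors[start] = 0
        components[start] = comp_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colors[v] is None:
                    colors[v] = 1 - colors[u]
                    components[v] = comp_id
                    queue.append(v)
                elif colors[v] == colors[u]:
                    raise ValueError(f"leaves {u} and {v} conflict: not 2-colorable")
        comp_id += 1
    return colors, components
```

Colors are only meaningful within a connected component. Two things still have to be decided by other evidence, such as allocation patterns: which color corresponds to which volume, and where isolated leaves belong.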
We quickly determined the RAID settings and devirtualized the RAID layer. Then we were able to reconstruct the first volume. The second volume was more difficult: the initial recovery rate was just over 20%, so we had to apply file carving techniques to improve the results.
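File carving ignores the lost mappings and scans the raw data area for known file signatures. The following is a minimal sketch for JPEG files. It assumes each file is stored contiguously in the scanned space and ends at the first end-of-image marker within a hypothetical `max_size` limit.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(image, max_size=20 * 1024 * 1024):
    """Yield (start, end) byte ranges of candidate JPEG files in a bytes-like image."""
    pos = image.find(JPEG_SOI)
    while pos != -1:
        end = image.find(JPEG_EOI, pos + len(JPEG_SOI), pos + max_size)
        if end != -1:
            yield pos, end + len(JPEG_EOI)
            pos = image.find(JPEG_SOI, end + len(JPEG_EOI))  # continue after this file
        else:
            pos = image.find(JPEG_SOI, pos + 1)  # no end marker in range: skip this header
```

This naive version cuts a file short when the JPEG contains an embedded thumbnail, because the thumbnail's own end-of-image marker comes first. It also cannot reassemble fragmented files. Production carvers parse the file structure instead of relying on the first matching marker.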
Results:
The recovery rate for the first volume was about 80%. The metadata of the second volume was much more corrupted; after multiple iterations of file carving, we achieved a 50% recovery rate for it.
More information:
Read the in-depth technical paper "Recovering thin-provisioned LVM volumes".