Hardware and Volume configuration:
A customer had a Netgear NAS/SAN with ten drives. Two of the drives suffered mechanical problems and brought down the system. The customer believed the array was a RAID-50 (a RAID 0 striped across RAID 5 sets). Our analysis showed that it was actually a ZFS storage pool consisting of three RAIDZ1 vdevs. Besides the two damaged drives, one of the drives did not contain any relevant data. The status of the ZPool was as follows:
Vol1
  raidz1-0: DEGRADED
    Missing UNAVAIL
    9E 88 69 A0 B4 C1 16 EE ACCESSABLE
    4A 50 2D 8B 61 BD B7 96 ACCESSABLE
  raidz1-1: NOT RECOVERABLE (insufficient replicas)
    56 6F 3F 0D 7B FE 6D D2 ACCESSABLE
    Missing UNAVAIL
    Missing UNAVAIL
  raidz1-2: GOOD
    5F 14 EC A8 77 86 36 9B ACCESSABLE
    C1 D4 89 81 DC E9 F9 14 ACCESSABLE
    7C F6 4E 99 D7 2D 86 24 ACCESSABLE
    57 0C 7C CA 4D 29 74 B8 ACCESSABLE
The ZPool provided storage for a number of virtual volumes (VHDX) that held the customer's Microsoft Exchange databases, as well as backups of users' emails and other documents.
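The status listing above follows directly from the single-parity rule of RAIDZ1: a vdev survives the loss of one member but not two. A minimal sketch of that rule, assuming a hypothetical `Vdev` class and status strings that simply mirror the listing (this is an illustration, not the tool used in the analysis):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vdev:
    """Hypothetical model of one RAIDZ1 vdev for illustration only."""
    name: str
    members: List[bool]  # True = member drive readable, False = missing


def raidz1_status(vdev: Vdev) -> str:
    """Single parity: one missing member is tolerated, two or more are not."""
    missing = vdev.members.count(False)
    if missing == 0:
        return "GOOD"
    if missing == 1:
        return "DEGRADED"
    return "NOT RECOVERABLE"


# Member availability copied from the status listing above.
pool = [
    Vdev("raidz1-0", [False, True, True]),
    Vdev("raidz1-1", [True, False, False]),
    Vdev("raidz1-2", [True, True, True, True]),
]

statuses = {v.name: raidz1_status(v) for v in pool}
for name, status in statuses.items():
    print(f"{name}: {status}")
```

Because a ZFS pool stripes data across its top-level vdevs, losing raidz1-1 makes every block stored only on that vdev unreadable, even though the other two vdevs can still be read.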
Problem:
Because one of the vdevs (raidz1-1 in the list above) was unrecoverable, around 30% of the storage space was no longer accessible. Large files such as databases would almost certainly have gaps in their data streams, so at best this would be a partial recovery.
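To see why gaps in large files were almost certain, consider a deliberately simplified model: assume each block of a file lands on the lost vdev independently with probability 0.3 (the share of inaccessible space quoted above). Both the independence assumption and the function name `prob_file_intact` are ours, for illustration only; real ZFS allocation is not uniform.

```python
def prob_file_intact(n_blocks: int, missing_fraction: float = 0.3) -> float:
    """Chance that none of a file's blocks sit on the lost vdev.

    Simplifying assumption: every block is placed independently, and the
    lost vdev receives `missing_fraction` of all blocks.
    """
    return (1.0 - missing_fraction) ** n_blocks


# A small document may span a handful of blocks; a database spans thousands.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} blocks: P(intact) = {prob_file_intact(n):.3e}")
```

Under this model a single-block file survives 70% of the time, while a file of even a hundred blocks has essentially no chance of being fully intact.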
Solution:
A few lucky breaks helped us solve this case. Apparently, ZFS prefers to store data on vdevs with more drives and/or faster drives, and shies away from vdevs running in degraded mode. In this ZPool, raidz1-2 had been added at a later time. Not only did it have more drives, but its drives were also larger and faster. As a newly added vdev it also had the most free space, which biases new writes toward it. Furthermore, the other two vdevs had been running in degraded mode for some time. As a result, most of the recent updates were stored on this last vdev. Because ZFS keeps metadata in two or three copies (a block pointer can hold up to three DVAs, normally placed on different vdevs), we could read 100% of the metadata and rebuild the complete folder structure. At the file data level, around 40% of the older data was missing, but the newer data was almost 100% recoverable.
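The role of the redundant DVAs can be sketched as follows. A ZFS block pointer holds up to three DVAs (data virtual addresses), each naming a vdev and an offset; a block is readable if any one of them points at a vdev that can still be read. The `DVA` tuple, the function name, the vdev numbering (0, 1, 2 for raidz1-0, raidz1-1, raidz1-2), and the offsets below are illustrative assumptions, not values from the case.

```python
from typing import Iterable, NamedTuple, Optional, Set


class DVA(NamedTuple):
    """Simplified stand-in for a ZFS data virtual address."""
    vdev: int    # index of the top-level vdev holding this copy
    offset: int  # made-up offset, for illustration only


def first_readable_dva(dvas: Iterable[DVA], available_vdevs: Set[int]) -> Optional[DVA]:
    """Return the first copy that lives on a readable vdev, or None."""
    for dva in dvas:
        if dva.vdev in available_vdevs:
            return dva
    return None


# raidz1-0 (degraded) and raidz1-2 (good) are readable; raidz1-1 is lost.
available = {0, 2}

# Metadata: two copies on different vdevs, so one copy survives.
metadata_bp = [DVA(1, 0x4000), DVA(2, 0x9000)]
# Ordinary file data: a single copy, here on the lost vdev.
data_bp = [DVA(1, 0x8000)]

print(first_readable_dva(metadata_bp, available))
print(first_readable_dva(data_bp, available))
```

This is why the folder structure could be rebuilt completely even though file data with a single copy on raidz1-1 could not be read.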
Result:
We were able to recover most of the newer backup files and documents. Older files were partially recovered. The Exchange databases were recovered with the gaps filled in with zeros. The customer was able to extract useful data from the databases using third-party software.
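Filling gaps with zeros keeps every surviving block at its original offset, which matters for page-based files such as databases. A minimal sketch of the idea, assuming fixed-size blocks and a hypothetical `rebuild_with_zero_fill` helper (not the actual recovery tool):

```python
from typing import List, Optional, Tuple


def rebuild_with_zero_fill(
    blocks: List[Optional[bytes]], block_size: int
) -> Tuple[bytes, List[int]]:
    """Concatenate blocks in order, writing zeros where a block is unreadable.

    `blocks` holds the file's blocks in logical order; None marks a block
    that could not be read. Each readable block is assumed to be exactly
    `block_size` bytes long. Returns the rebuilt data and the gap indices.
    """
    out = bytearray()
    gaps = []
    for index, block in enumerate(blocks):
        if block is None:
            out += bytes(block_size)  # zero-filled placeholder
            gaps.append(index)
        else:
            out += block
    return bytes(out), gaps


data, gaps = rebuild_with_zero_fill([b"AAAA", None, b"CCCC"], block_size=4)
print(len(data), gaps)
```

Recording the gap positions alongside the rebuilt file makes it possible to tell later which regions are placeholders rather than genuine data.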