Channel: Intel Communities : Discussion List - Chipsets

RAID5 failure during migration


So I don't know if anyone will be able to help, because this is a pretty unusual situation, but I thought it was worth describing.

 

I have (had) a three-drive RAID5 array on the ICH10R chipset.  I wanted to expand the array by adding a fourth drive shortly after upgrading to a new motherboard.  I used the Rapid Storage Technology interface to add the drive, and the data migration started and was progressing well (26% complete) when I went to bed for the evening.

 

Well, the next morning I awoke to find the (brand new) machine in some failed power-management state that it would not wake up from.  I performed a hard reset, and the BIOS reported that the RAID array was in the middle of a migration.   Hooray, I thought.... There's hope yet that the right thing will happen.  Only it turned out that the previous hang was some kind of failed hibernation, and Windows wouldn't boot correctly this time either.  Another hard reset and...  You guessed it -- all four drives in the array were marked failed.

 

After confirming that Windows would boot correctly, I shut down gracefully and went back into the BIOS.  Apparently the ICH10R decides that since all four drives have failed simultaneously, maybe there's not really anything wrong, and so it asks whether I would like to try to "recover" my array (Y/N).  Having little to lose at this point, I pressed "Y" three times in a row and watched as it added drives 2, 3, and 4 back into the array.  Now my array is spontaneously marked "degraded", which is definitely a step up from "failed".

 

After I booted into Windows, it detected the array, and CHKDSK offered to "fix" my corrupted volume...  Ha!  Not falling for that one.  I politely declined. :-)

 

At this point, I read some reviews, did some more research, and concluded that my best bet was to try a read-only recovery program to see what could be recovered from the current configuration.  I chose R-Studio based on several positive recommendations and a reasonable price.  When I pointed the software at the failed array, it immediately detected practically all of my files and happily recovered them all to an external drive (over about 24 hours).  Again, I breathed a deep sigh of relief.

 

But sadly, it was premature.  It turns out that some of the recovered files are randomly scrambled, while other files are fine.  It's a little hard to tell for certain what the pattern is, but it appears that older files are OK and newer files are corrupt.  There are a few exceptions, suggesting that the actual explanation is related to the block order on the drive: which files were completely migrated to the fourth drive, and which were not.  I figure the migration was probably about 33% complete when the machine decided to demonstrate its inability to hibernate correctly.
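To illustrate why one contiguous region of files would come back scrambled, here is a toy Python model of an interrupted migration. This is only a sketch under made-up assumptions (no parity rotation, arbitrary block counts, and a hypothetical `layout` mapping; it is not the real ICH10R on-disk format): logical blocks below the migration watermark live in the new 4-drive geometry, the rest still live in the old 3-drive one, so any tool that assumes a single geometry reads one region correctly and the other scrambled.

```python
# Toy model of an interrupted RAID5 capacity migration. Hypothetical
# geometry only: the real ICH10R metadata, stripe size, and parity
# rotation are NOT modeled here.

def layout(n_drives, logical_block):
    """Map a logical data block to a (drive, offset) pair in a simple
    striped layout (parity rotation omitted for clarity)."""
    data_per_stripe = n_drives - 1  # one block per stripe holds parity
    stripe, pos = divmod(logical_block, data_per_stripe)
    return (pos, stripe)

TOTAL = 12       # made-up number of logical data blocks
WATERMARK = 4    # migration had rewritten blocks 0..3 into the 4-drive layout

# Where each logical block actually lives after the interrupted migration:
actual = {b: layout(4 if b < WATERMARK else 3, b) for b in range(TOTAL)}

# A recovery pass that assumes the old 3-drive geometry everywhere reads
# correct data only where the two layouts happen to agree:
ok = [b for b in range(TOTAL) if layout(3, b) == actual[b]]
print(ok)  # mostly the un-migrated tail, plus a couple of coincidental matches
```

The same model read with the 4-drive geometry would invert the result, which is the intuition behind redoing the recovery against the other set of drives.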

 

Obviously, I'd like to get the other third (or two-thirds) of the files back as well, if possible.  I figure that my best bet is to get the array rebuilt in a degraded state using drives 1, 2, and 3 instead of 2, 3, and 4.  Then I can repeat the recovery that worked before, and hopefully the corrupt files will be fine this time, while the files that were previously recovered correctly will now be corrupt...  The question before you, fine reader, is whether I should:

 

1. Unplug the fourth drive, failing the entire array again, in the hope that the BIOS will offer to "recover" again using drives 1, 2, and 3.

 

2. Mark all four drives as "normal" and attempt to create a virtual array from drives 1, 2, and 3.

 

3. Delete the array and recreate it as a new array using the first three drives.

 

I see little harm in trying option 1 first.  I figure I can always mark the drives as "normal" and fall back to option 2.  The third option sounds dangerous to me.  Does anyone have enough experience with failed arrays to suggest whether any of these approaches is more or less wise than the others?

 

I fully understand the principles of RAID arrays, but I honestly have no experience with what actually happens when you start intentionally failing more than the allowed number of drives in an array.  The official Intel documentation says that all data on a "failed" array is irrecoverably lost, but having four out of four good drives obviously leaves me in a situation that is rarely discussed.
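For anyone following along, the redundancy principle at stake is just XOR: the parity block in each RAID5 stripe is the XOR of the data blocks, so any one missing member can be rebuilt from the survivors. A minimal Python sketch (toy block values, not actual on-disk data):

```python
# Minimal demonstration of RAID5 redundancy: parity is the XOR of the
# data blocks in a stripe, so any single missing member is recoverable.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data blocks in one stripe
parity = xor_blocks([d0, d1, d2])        # written to the parity member

# If the drive holding d1 drops out, XOR of the survivors rebuilds it:
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1
```

This is why four physically healthy drives leave real hope: nothing is mathematically lost, only the metadata describing which geometry each region is in.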

 

Thanks for any comments, feedback or opinions!

