Week 32, 2014 – Well, That Escalated Quickly…
In the middle of the night on July 31st, the motherboard in my workstation burst into flames. Things just sorta went downhill from there…
That is not hyperbole, by the way. A slight exaggeration, maybe – it was more of a candle than a conflagration – but in any case, I had to ditch my plans to spend all of my disposable income for August on more screenprinting supplies (which would have included a proper t-shirt platen, a poster platen, and some larger screens) and fix my computer instead.
Well, with all my money spent on what is hopefully everything I need to fix my computer – I actually still don’t know whether the CPU and RAM are okay – my 12-terabyte RAID5 array, where all of my media and everything I’ve worked so hard on for my entire adult life is stored, decided to suffer a multi-disk failure.
Fifteen years of work, gone. Poof! All of my writings, including two series that I’ve been developing since the turn of the millennium, something like 30,000 photographs, all of my product designs, electronic circuits, CAD drawings, artwork… Gone. I don’t know what to do; I’m stuck. Without the RAID array, all my work is gone… I don’t even have anywhere to put NEW work. I’m not even sure I can put an OS on my machine when I fix it, because all our Windows license keys were in a file on the RAID array… I guess poor people just aren’t supposed to even try being productive.
If I can just fix this one drive, I can get the RAID back… but I’ve seen the prices for data recovery, and they’re more than the rent on my apartment. I don’t think that would even cover this situation, because the RAID controller needs the disk itself, or an exact clone of it, to recognize it as a member… not just the ‘disk contents’. And paying for a full RAID recovery is totally out of the question; that would cost more than I get in several months put together! I may have to resort to swapping the platters myself, which I should be fully capable of doing, given the right tools.
Drive Problem Details:
The array is RAID level 5 running on a HighPoint RocketRAID 2340, and it consists of seven 2TB HDDs (five WD Greens and two much newer Seagates). Of those, disk 1-2 (array#-disk#) had been dropped due to the typical Green timeout bullshit (Greens lack TLER, so a drive can stall on internal error recovery long enough for the controller to give up on it), and then timed out at the end of its rebuild. Usually the rebuild succeeds on the second try, but apparently it wasn’t taking this time, and during rebuild attempt #3, disk 1-5 failed outright.
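Why losing a second disk mid-rebuild is fatal comes down to how RAID 5 parity works: each stripe stores one parity chunk that is the XOR of its data chunks, so any single missing chunk can be recomputed from the survivors, but two missing chunks leave one equation with two unknowns. A minimal Python sketch of the idea, assuming a single stripe with parity on the last disk; real controllers (presumably including the RocketRAID) rotate parity across the members and use their own chunk sizes, so this is an illustration, not the controller’s actual layout:

```python
from functools import reduce

def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def make_stripe(data_chunks):
    """One simplified RAID 5 stripe: the data chunks followed by their XOR parity."""
    return list(data_chunks) + [xor_chunks(data_chunks)]

def rebuild(stripe, missing):
    """Recompute the chunk at the single index in `missing` from the surviving chunks."""
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk per stripe")
    lost = missing[0]
    survivors = [chunk for i, chunk in enumerate(stripe) if i != lost]
    repaired = list(stripe)
    repaired[lost] = xor_chunks(survivors)
    return repaired

# Seven "disks": six 4-byte data chunks plus one parity chunk.
stripe = make_stripe([bytes([n] * 4) for n in range(1, 7)])
degraded = list(stripe)
degraded[4] = None                    # one member gone
print(rebuild(degraded, [4]) == stripe)  # True
```

With two members gone, `rebuild` refuses: XORing the survivors only yields the XOR of the two lost chunks, not either one on its own.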
When I put 1-5 in an external dock on another computer, it showed up as 134217728.00 GB of ‘Unallocated’ space. (Yes, that’s 128 pebibytes, or about 144 petabytes.) I’ve been led to understand that this absurd figure is simply the 48-bit LBA address space limit, and usually means that the drive can’t read its first sector. After a few more tries, the disk stopped showing up in Disk Management at all when I turned the dock on.
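The arithmetic behind that figure, as a quick Python check, assuming 512-byte logical sectors (the WD20EARS is an Advanced Format drive that presents 512-byte logical sectors): 2^48 sectors × 512 bytes = 2^57 bytes, which is exactly 134,217,728 GiB. Windows Disk Management labels binary gibibytes as "GB", which is why that exact number appears:

```python
# Assumption: 512-byte logical sectors, as this drive model reports them.
SECTOR_BYTES = 512
LBA48_SECTORS = 2 ** 48  # number of distinct sector addresses a 48-bit LBA can express

def lba48_limit_bytes(sector_bytes=SECTOR_BYTES):
    """Largest capacity addressable with 48-bit LBA at the given sector size."""
    return LBA48_SECTORS * sector_bytes

total = lba48_limit_bytes()
print(f"{total:,} bytes")                    # 144,115,188,075,855,872 bytes
print(f"{total / 2**30:.2f} GiB")            # 134217728.00 GiB (shown as "GB" by Windows)
print(f"{total / 2**50:.0f} PiB")            # 128 PiB
print(f"{total / 10**15:.1f} PB (decimal)")  # 144.1 PB (decimal)
```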
After I swapped in the PCB from a drive of identical model and stepping (WD20EARS-00MVWB0), the disk showed up again, but with the same figure. It doesn’t make any strange noises; it certainly seems to spin up, and I think I hear the heads unpark, but then I don’t hear it begin to read. So I surmise that the fault is in the head assembly. It is, after all, a WD Green that has been in continuous use in a RAID for almost four years.
The most straightforward way to recover 1-5 would seem to be transplanting the platters (using the appropriate tools from eBay) into a new-old-stock (NOS) drive of the same model and stepping… which also seem abundant on eBay. With luck, this will get the RAID array back long enough for me to successfully rebuild a drive, then replace disk 1-5.
Well, at least I can install Windows. I found the official Microsoft download link for the ISO (I always lose the discs I burn when I install…), and I was able to get my Win7 Ultimate N key off my primary machine’s HDD with a key recovery tool. Added the ‘drive problem details’ and ‘next step’ sections. Maybe I can get some freebie professional advice?
IT’S ALIIIIVE! HighPoint support provided me with a DOS tool that let me arbitrarily construct arrays without actually affecting the disk contents (as well as back up the original array info). With it, I was able to get the array back up and running by sticking a spare 2TB drive in the 1-5 slot to pad it out to full size. (HighPoint support later praised my resourcefulness, because I shouldn’t have been able to recover the array without all the member disks.) It was short 2TB of data, of course, so I did something clever to find any corrupt files and directories:
robocopy "K:" "C:\TMP" /E /CREATE /R:0 /TEE /LOG:"C:\corruptionlog.txt"
That copied the directory tree of everything it could read on the array (some 1.4 million files in 200k directories) to C:\TMP as 0-byte files (/E, /CREATE), with zero retries on failures (/R:0), while writing all console output to a log file as well (/TEE, /LOG). It took just over an hour. Scanning the log file then turned up every file or folder that came up ‘Corrupt/Inaccessible’.
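A minimal Python sketch of that scan, assuming robocopy error lines look like `<date> <time> ERROR <code> (0x<hex>) <action> <path>` (check this against your own log before relying on it). Windows error 1392 is ERROR_FILE_CORRUPT, "The file or directory is corrupted and unreadable", which is one likely culprit, but the post doesn’t say which codes actually showed up, so the code filter is optional:

```python
import re

# Assumed shape of a robocopy error line (verify against a real log), e.g.
# 2014/08/02 03:14:15 ERROR 1392 (0x00000570) Copying File K:\Photos\img.jpg
ERROR_LINE = re.compile(r"ERROR (\d+) \(0x[0-9A-Fa-f]+\) (.+?)\s*$")

def find_errors(log_lines, codes=None):
    """Return (code, 'action path') pairs for robocopy ERROR lines.

    If `codes` is given, keep only errors whose numeric code is in it."""
    hits = []
    for line in log_lines:
        match = ERROR_LINE.search(line)
        if match:
            code = int(match.group(1))
            if codes is None or code in codes:
                hits.append((code, match.group(2)))
    return hits
```

Called as `find_errors(open(r"C:\corruptionlog.txt", errors="replace"), codes={1392})`, it lists only the "corrupted and unreadable" hits; `errors="replace"` keeps odd characters in file names from aborting the read, since robocopy writes its log in the console code page unless told otherwise.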
With the exception of a single Inventor project (which consisted almost entirely of parts from other projects), I didn’t lose any work. In fact, the damage was almost entirely confined to some items from the past three weeks and to unsorted torrent/newsgroup download directories. (Those last two alone added up to over a terabyte…)
Another nice thing about using robocopy like that: when I eventually take a crack at restoring 1-5 to working order and reintroduce it to the array, I’ll be able to run it again and compare the two logs to get a list of the files that weren’t readable without the missing disk.
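That comparison could be sketched in Python as a set difference, assuming robocopy error lines look like `<date> <time> ERROR <code> (0x<hex>) <action> <path>` and a hypothetical second log such as C:\corruptionlog2.txt from the rerun. Only the text after the error code is compared, because the timestamps will differ between runs:

```python
import re

# Assumed shape of a robocopy error line; check it against a real log.
ERROR_LINE = re.compile(r"ERROR \d+ \(0x[0-9A-Fa-f]+\) (.+?)\s*$")

def error_entries(log_lines):
    """Set of 'action path' strings taken from a robocopy log's ERROR lines."""
    entries = set()
    for line in log_lines:
        match = ERROR_LINE.search(line)
        if match:
            entries.add(match.group(1))
    return entries

def recovered_by_disk(log_without, log_with):
    """Entries that failed without the disk but no longer fail with it present."""
    return sorted(error_entries(log_without) - error_entries(log_with))
```

Anything still failing in both runs stays out of the result, so what’s left is exactly what reintroducing disk 1-5 brought back.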
So, after almost 72 hours without sleep, I finally got the array back up and running and can begin recovery! (Though I took a little nap instead.) Which, of course, means I need to come up with money to buy some WD Reds to make a new – and actually reliable – array, and probably upgrade to a RAID card that supports RAID6, just in case. Not that I have any idea where that money will come from…
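The trade-off behind that RAID6 upgrade, sketched under the assumption that a new array would again use seven 2TB disks (a hypothetical configuration; no drive count for the WD Reds is given): RAID5 gives up one disk’s worth of space to parity and survives one failure, while RAID6 gives up two and survives two. The RAID5 line reproduces the 12 terabytes of the current array:

```python
# Disks' worth of capacity given over to parity, and the minimum member count,
# for each RAID level considered here.
PARITY_DISKS = {5: 1, 6: 2}
MIN_DISKS = {5: 3, 6: 4}

def usable_tb(disks, disk_tb, level):
    """Nominal usable capacity of one array of equal-size disks (before filesystem overhead)."""
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID{level} needs at least {MIN_DISKS[level]} disks")
    return (disks - PARITY_DISKS[level]) * disk_tb

for level in (5, 6):
    print(f"RAID{level}: {usable_tb(7, 2, level)} TB usable, "
          f"survives {PARITY_DISKS[level]} failed disk(s)")
# RAID5: 12 TB usable, survives 1 failed disk(s)
# RAID6: 10 TB usable, survives 2 failed disk(s)
```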
But in the meantime, I’m backing up everything of importance to some spare 1.5TB drives I have, and then backing up the photos and work to 25GB BD-R discs.
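For the BD-R step, one way to decide which files go on which disc is first-fit decreasing bin packing. This is a hypothetical helper, not something from the post; it assumes roughly 24 GB of usable space per 25GB disc (headroom for filesystem overhead) and that no single file is larger than a disc:

```python
import os

# Assumption: budget ~24 GB per "25GB" single-layer BD-R, leaving headroom for
# filesystem overhead; adjust to what your burning software actually reports.
DISC_CAPACITY = 24 * 10**9

def sizes_under(root):
    """Map every file path under root to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[path] = os.path.getsize(path)
    return sizes

def pack_discs(file_sizes, capacity=DISC_CAPACITY):
    """First-fit decreasing: place each file, largest first, on the first disc
    with room for it, opening a new disc when none fits. Returns a list of
    discs, each a list of file names. A heuristic, not guaranteed optimal."""
    discs = []  # each entry is [free_bytes, [names]]
    for name, size in sorted(file_sizes.items(), key=lambda item: item[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) will not fit on one disc")
        for disc in discs:
            if disc[0] >= size:
                disc[0] -= size
                disc[1].append(name)
                break
        else:
            discs.append([capacity - size, [name]])
    return [names for _free, names in discs]
```

For example, `pack_discs(sizes_under(r"D:\Photos"))` (hypothetical path) returns one list of paths per disc. First-fit decreasing isn’t guaranteed to use the minimum number of discs, but it is a well-known heuristic that usually comes close.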
Computer works! The new motherboard works like a charm. It took some bit-wrangling to get the old Windows install working as a temporary solution, but I plan to do a reinstall once everything has calmed down. (I still have a goddamned headache from all the caffeine I ingested to keep my ass up for three days… I’m getting too old for this shit.)