I was thinking for some time about a fitting subject line for this post. Turns out: there is no fitting subject line. “Argh!” is the most fitting I could come up with. “But why, Chris?” you might ask. “Server goodness”, I would respond.
As you all know, the new server came with faulty Hitachi hard disks that like to drop out of the RAID whenever they are put under I/O stress. So Dell sent replacement disks to address the issue. They sent Hitachi disks again, but I was told to try them anyway. Turns out, the problem persists with those as well.
Yesterday I sent a new batch of logfiles documenting the persistent error to Dell. After analyzing them, they came up with a solution: “We need to replace all disks with another brand…” – I had heard that line before – “… and also replace the RAID controller” – that one I had not.
The replacement disks and RAID controller are on their merry way to me. My task is to back up the entire server again, swap in all the new hard drives, and replace the RAID controller. Followed by an even happier complete system rebuild. What does this mean for …
– you: Downtime tomorrow. Approx noon (WEST) to evening’ish.
– me: Lots and lots of work.
The irony is that the new server, with its dual power supplies and RAID10, was supposed to eliminate downtime entirely; instead, we have never had this much downtime before.
Update 25.06.2013, 21:41: Server is operational again, and will stay up this time! 😉