Well, yesterday I wrote about a looping RAID1 rebuild:
Usually I would wait for the rebuild to finish and then ask the hosting provider to replace the second disk (sdb), but as the medium error just triggers another rebuild, this won’t work for obvious reasons. So the next plan is to fail the drive sdb and then get a replacement disk. Hopefully the first disk won’t fail as well… but well, Murphy, backup, stuff… sigh
Speaking of Murphy… it happened, of course, as it had to happen. I failed one of the drives in one of the RAIDs, did a pvmove to move all data from one RAID1 device to another and… then Murphy came along! The machine hung after I had gone to bed late at night.
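For the record, the two steps I mean (fail the drive, then pvmove) look roughly like the sketch below. It is only a dry run that prints the commands instead of executing them. The device names (/dev/md1, /dev/md2, /dev/sdb2) and the helper name plan_evacuation are made up for illustration and are not the real layout of that machine. It also assumes both RAID1 devices are physical volumes in the same volume group, which pvmove requires.

```python
import shlex


def plan_evacuation(array, member, source_pv, target_pv):
    """Return the commands (as argv lists) for the two steps, in order.

    All arguments are hypothetical device paths; nothing is executed here.
    """
    return [
        # Step 1: mark the bad member as faulty in its md array.
        ["mdadm", "--manage", array, "--fail", member],
        # Step 2: move all allocated extents from one PV to the other.
        ["pvmove", source_pv, target_pv],
    ]


if __name__ == "__main__":
    # Example device names only; adjust to the actual layout before use.
    for argv in plan_evacuation("/dev/md1", "/dev/sdb2", "/dev/md1", "/dev/md2"):
        print(shlex.join(argv))
```

Printing the plan first and reading it twice is cheap insurance when the disks are already flaky.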
The next morning greeted me with an instant message from my coworker, saying that the disks had stopped working and that he had rebooted the machine. From the rescue system one could see SMART errors on both disks, so replacing one disk and then the other wouldn’t have worked. Sadly, the hosting provider requested € 69.- for a temporary third disk. So we went straight to backing up the data from the still-working RAID and LVM, then requested the replacement disks and restored the machine from scratch.
Now the machine is back and working normally again. *pheeew*