LVM on RAID5 broken – how to fix?

Some time ago one of my disks in my Software RAID5 failed. No big problem, as I had two spare A08U-C2412 available to replace that single 1 TB SATA disk. I can’t remember the details, but something went wrong and I ended up with a non-booting system. I think I tried to add the HW-RAID as a physical volume to the LVM, with the idea of migrating the SW-RAID to the HW-RAID, doing mirroring or some such. Anyway: I booted into my rescue system, which sits on a RAID1 partition on those disks, but LVM didn’t come up anymore, because the SW-RAID5 wasn’t recognized during boot. So I re-created the md device and discovered that my PV was gone as well. =:-0
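
For reference, this is roughly how I poked at the array from the rescue system (a sketch from memory; the device names /dev/sda3 and /dev/md3 are just examples):

hahn-rescue:~# cat /proc/mdstat                  # is md3 assembled at all?
hahn-rescue:~# mdadm --examine /dev/sda3         # per-disk superblock, including the metadata version
hahn-rescue:~# mdadm --detail /dev/md3           # state of the (re-created) array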

No big deal, I thought, because I have a backup of that machine on another host. I restored /etc/lvm and tried a vgcfgrestore after re-creating the PV with pvcreate. At first I didn’t use the old UUID, so vgcfgrestore complained. After creating the PV with the proper UUID, LVM did recognize the PV, VG and LVs. Unfortunately I can’t mount any of the LVs. Something seems to be broken:

hahn-rescue:~# mount /dev/vg/sys /mnt
mount: you must specify the filesystem type

Feb 19 07:50:02 hahn-rescue kernel: [748288.740949] XFS (dm-0): bad magic number
Feb 19 07:50:02 hahn-rescue kernel: [748288.741009] XFS (dm-0): SB validate failed
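
For completeness, the PV/VG restore itself went roughly like this (a sketch; the UUID is a placeholder and the backup file path is an example, the real values come from the restored /etc/lvm):

hahn-rescue:~# pvcreate --uuid "<old-PV-UUID>" --restorefile /etc/lvm/backup/vg /dev/md3
hahn-rescue:~# vgcfgrestore -f /etc/lvm/backup/vg vg
hahn-rescue:~# vgchange -ay vg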

Running a gpart scan on my SW-RAID5 gave me some results:

hahn-rescue:~# gpart /dev/md3

Begin scan...
Possible partition(SGI XFS filesystem), size(20470mb), offset(5120mb)
Possible partition(SGI XFS filesystem), size(51175mb), offset(28160mb)
Possible partition(SGI XFS filesystem), size(1048476mb), offset(117760mb)
Possible partition(SGI XFS filesystem), size(204787mb), offset(1168640mb)
Possible partition(SGI XFS filesystem), size(204787mb), offset(1418240mb)
Possible partition(SGI XFS filesystem), size(1048476mb), offset(1626112mb)

*** Fatal error: dev(/dev/md3): seek failure.
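
For the record, a single reported offset can be inspected read-only without touching the array; a sketch for the first hit at 5120 MB (converted to bytes for losetup, loop device number is an example):

hahn-rescue:~# losetup -r -o $((5120*1024*1024)) /dev/loop1 /dev/md3
hahn-rescue:~# xfs_db -r -c "sb 0" -c "print" /dev/loop1    # does a sane superblock show up?
hahn-rescue:~# losetup -d /dev/loop1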

This is not the complete list of LVs, as a comparison with the output of lvs shows:

hahn-rescue:~# lvs
  LV                 VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  storage1           lv   -wi-ao--   1.00t                                          
  AmigaSeagateElite3 vg   -wi-a---   3.00g                                          
  audio              vg   -wi-a---  70.00g                                          
  backup             vg   -wi-a---   1.00t                                          
  data               vg   -wi-a---  50.00g                                          
  hochzeit           vg   -wi-a---  40.00g                                          
  home               vg   -wi-a---   5.00g                                          
  pics               vg   -wi-a--- 200.00g                                          
  sys                vg   -wi-a---  20.00g                                          
  video              vg   -wi-a--- 100.00g                                          
  windata            vg   -wi-a--- 100.00g   

Please notice that /dev/lv/storage1 is my HW-RAID, where I stored images of the /dev/vg LVs to run xfs_repair and such on. Anyway, the sizes of the XFS partitions recognized by gpart are mostly correct, but some are missing, and xfs_repair can’t do anything useful with the backup images on storage1: everything ends up in lost+found, because the blocks seem to be mixed up somehow.
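
The images on storage1 were made and checked roughly like this (a sketch; the mount point /mnt/storage1 and the image file name are just examples):

hahn-rescue:~# dd if=/dev/vg/sys of=/mnt/storage1/sys.img bs=1M
hahn-rescue:~# xfs_repair -n /mnt/storage1/sys.img      # dry run first
hahn-rescue:~# xfs_repair /mnt/storage1/sys.img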

What I figured out is that my old RAID5 device used metadata format 1.2, whereas the new one uses format 0.9. My best guess is now to re-create the RAID5 device with format 1.2, do a vgcfgrestore on it and have (hopefully!) a working LVM with working LVs back that I can then mount again. If there’s anything else I might be able to try, dear Lazyweb, please tell me. Please see the attached config files/tarballs for a complete overview.
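
A sketch of what I have in mind (the chunk size, number of devices and disk order here are examples; the real values have to come from my old configs and notes):

hahn-rescue:~# mdadm --stop /dev/md3
hahn-rescue:~# mdadm --create /dev/md3 --metadata=1.2 --level=5 --raid-devices=4 --chunk=64 --assume-clean /dev/sd[abcd]3

followed by the same pvcreate/vgcfgrestore steps as above. --assume-clean keeps mdadm from starting a resync over the existing data.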

Side note: except for AmigaSeagateElite3, which is a dd image of an old Amiga SCSI disk, I should have a fairly complete backup at my second backup location, so there’s not much lost, but it would be a real timesaver if I could recover the lost LVs. Both systems are behind DSL/cable with a limit of 10 Mbps upstream; it would take weeks to transfer the data, and sending a USB disk would be faster.
