Alma 9.0 RAID 1 installation unable to boot with a failed/removed drive

As a longtime CentOS 6/7 user, I’m trying to install Alma 9.0 on a slightly older motherboard.

My goal is to have two drives in RAID 1 (mirroring) and be able to
boot automatically from either drive alone into a running system with
degraded RAID arrays. Then, in case of a drive failure, I just install
another drive, clone the partitioning, and rebuild the RAID 1 arrays.
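
(For reference, the replacement procedure I have in mind is roughly the following sketch, using sgdisk to clone the GPT and mdadm to add the new partitions back. It assumes sdb is the freshly installed drive and uses the partition-to-array mapping from my layout below; adjust as needed.)

$ sudo sgdisk --replicate=/dev/sdb /dev/sda      # copy sda's partition table onto the new sdb
$ sudo sgdisk --randomize-guids /dev/sdb         # give the new disk and partitions fresh GUIDs
$ sudo mdadm --manage /dev/md1 --add /dev/sdb2   # /boot
$ sudo mdadm --manage /dev/md2 --add /dev/sdb3   # /
$ sudo mdadm --manage /dev/md0 --add /dev/sdb6   # /home
$ sudo grub2-install /dev/sdb                    # reinstall grub on the new drive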

The system boots and runs nicely, but I’m trying to test that it will
boot properly with one failed drive by alternately disconnecting one
drive’s SATA cable, and then the other’s.

With either drive’s SATA cable disconnected, the system will get
through the GRUB menu and load the AlmaLinux kernel, but then the fun
begins: the initrd waits out a long timeout, then asks for the root
password and drops into dracut emergency mode.

Is it correct to expect an Alma 9 system with a failed or removed RAID
member to boot properly into a running system with all RAID volumes
degraded? I seem to recall this is how things worked in CentOS 6 and
7.

Lots more details:

Advantech AIM-584 industrial micro-ATX motherboard; BIOS boot only (no EFI)
Intel i5-4590S CPU
two 6 TB SATA drives

I re-used the GPT partitioning and mdadm RAID setup from a previous
install of a recent Fedora release, which looks like this:

$ sudo fdisk -l /dev/sda

Disk /dev/sda: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD6003FFBX-6
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4E1B36B5-BA01-3E4A-8403-9A56E7F1AEE7
 
Device         Start         End     Sectors  Size Type
/dev/sda1       2048        6143        4096    2M BIOS boot
/dev/sda2    4196352     8390655     4194304    2G Linux RAID
/dev/sda3    8390656   218105855   209715200  100G Linux RAID
/dev/sda4  218105856   427821055   209715200  100G Linux RAID
/dev/sda5  427821056   532678655   104857600   50G Linux RAID
/dev/sda6  532678656 11721043967 11188365312  5.2T Linux RAID

$ sudo fdisk -l /dev/sdb

Disk /dev/sdb: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD6003FFBX-6
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4E1B36B5-BA01-3E4A-8403-9A56E7F1AEE7
 
Device         Start         End     Sectors  Size Type
/dev/sdb1       2048        6143        4096    2M BIOS boot
/dev/sdb2    4196352     8390655     4194304    2G Linux RAID
/dev/sdb3    8390656   218105855   209715200  100G Linux RAID
/dev/sdb4  218105856   427821055   209715200  100G Linux RAID
/dev/sdb5  427821056   532678655   104857600   50G Linux RAID
/dev/sdb6  532678656 11721043967 11188365312  5.2T Linux RAID

$ sudo lsblk -l -o +UUID

NAME  MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS UUID
sda     8:0    0  5.5T  0 disk              
sda1    8:1    0    2M  0 part              
sda2    8:2    0    2G  0 part              910b4fa4-c2a0-a6d6-4877-9d82bcc126ff
sda3    8:3    0  100G  0 part              b65bd4f9-dd9c-eec3-c38a-fc80d0dde46a
sda4    8:4    0  100G  0 part              746bd41e-bce4-fb78-221a-a4018b91fe69
sda5    8:5    0   50G  0 part              79402428-ab74-b67c-e93e-57fa3095fa81
sda6    8:6    0  5.2T  0 part              44915d0b-d025-58c7-8295-cbc587bb917f
sdb     8:16   0  5.5T  0 disk              
sdb1    8:17   0    2M  0 part              
sdb2    8:18   0    2G  0 part              910b4fa4-c2a0-a6d6-4877-9d82bcc126ff
sdb3    8:19   0  100G  0 part              b65bd4f9-dd9c-eec3-c38a-fc80d0dde46a
sdb4    8:20   0  100G  0 part              746bd41e-bce4-fb78-221a-a4018b91fe69
sdb5    8:21   0   50G  0 part              79402428-ab74-b67c-e93e-57fa3095fa81
sdb6    8:22   0  5.2T  0 part              44915d0b-d025-58c7-8295-cbc587bb917f
md0     9:0    0  5.2T  0 raid1 /home       5efdab1f-6bbf-4d76-bec6-bab2f2a02190
md1     9:1    0    2G  0 raid1 /boot       e0d6518f-60dd-48dd-aba5-450923e2ce0f
md2     9:2    0 99.9G  0 raid1 /           8870ab9b-c86f-423c-9934-73bcb9822bca
md5     9:5    0   50G  0 raid1             4e19527a-fc5b-4d72-a15f-b639a1a1bf1d
md127   9:127  0 99.9G  0 raid1             813f4a49-a94e-474e-84e5-fddc0a509361

While installing Alma 9 with the usual Anaconda installer, I picked the
mount points above from the custom-partitioning menu, much as I’ve
done in the past when reinstalling, for example, CentOS 7 over CentOS 6 while preserving /home.

I’ve done the traditional redundant-boot thing of placing a BIOS boot partition on both drives
and installing GRUB onto both drives:

$ sudo /sbin/grub2-install /dev/sda
$ sudo /sbin/grub2-install /dev/sdb

Again, all is normal with both drives in the system.

With either drive’s SATA cable disconnected, the system will get
through the GRUB menu and load the Linux kernel, but the dracut initrd
waits out a long timeout, then asks for the root password and enters
emergency mode.

The error messages are about being unable to assemble the raid arrays.

A normal boot has these volumes mounted (the spare partitions are ignored), as shown by “df”:

/dev/md2        102559672  3148376   94155360   4% /
/dev/md1          2022248   262048    1639112  14% /boot
/dev/md0       5548807328 44030456 5225057960   1% /home

which is exactly what I expect, seeing as /etc/fstab contains:

UUID=8870ab9b-c86f-423c-9934-73bcb9822bca /                       ext4    defaults        1 1
UUID=e0d6518f-60dd-48dd-aba5-450923e2ce0f /boot                   ext4    defaults        1 2
UUID=5efdab1f-6bbf-4d76-bec6-bab2f2a02190 /home                   ext4    defaults        1 2

/etc/mdadm.conf on the normal root partition contains:

# cat /etc/mdadm.conf

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/0 level=raid1 num-devices=2 UUID=44915d0b:d02558c7:8295cbc5:87bb917f
ARRAY /dev/md/1 level=raid1 num-devices=2 UUID=910b4fa4:c2a0a6d6:48779d82:bcc126ff
ARRAY /dev/md/2 level=raid1 num-devices=2 UUID=b65bd4f9:dd9ceec3:c38afc80:d0dde46a
ARRAY /dev/md/5 level=raid1 num-devices=2 UUID=79402428:ab74b67c:e93e57fa:3095fa81
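
One thing I still need to verify is whether the initramfs carries its own (possibly stale) copy of mdadm.conf. If it does, I assume it can be inspected and regenerated with the standard dracut tools, roughly:

$ sudo lsinitrd -f etc/mdadm.conf /boot/initramfs-$(uname -r).img    # show the copy baked into the initramfs
$ sudo dracut -f /boot/initramfs-$(uname -r).img $(uname -r)         # rebuild the initramfs from the current /etc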

From the dracut emergency shell, I can get the large /home raid1 array on md0 to assemble and become mountable with:
mdadm --assemble --scan --verbose
but I’m unable to get / and /boot assembled; those md devices stay “inactive” in /proc/mdstat:

Personalities : [raid1] 
md0 : active raid1 sda6[1]
      5594050560 blocks super 1.2 [2/1] [_U]
      bitmap: 1/42 pages [4KB], 65536KB chunk

md2 : inactive sda3[1](S)
      104791040 blocks super 1.2
       
md1 : inactive sda2[1](S)
      2094080 blocks super 1.2
       
md5 : inactive sda5[1](S)
      52395008 blocks super 1.2

Part of the output from running “mdadm --assemble --scan --verbose” a second time is:

mdadm: looking for devices for /dev/md/2
mdadm: no recogniseable superblock on /dev/md/0
mdadm: no recogniseable superblock on /dev/sdb2
mdadm: no recogniseable superblock on /dev/sdb1
mdadm: Cannot assemble mbr metadata on /dev/sdb
mdadm: /dev/sda6 has wrong uuid.
mdadm: /dev/sda5 has wrong uuid.
mdadm: /dev/sda4 has wrong uuid.
mdadm: /dev/sda3 is busy - skipping
mdadm: /dev/sda2 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda1
mdadm: Cannot assemble mbr metadata on /dev/sda

“/dev/sda2 has wrong uuid.” is suspicious; that partition should be part of /dev/md1 (/boot).
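
To chase down the “wrong uuid” message, my next step is to compare the array UUID actually recorded in each member’s superblock against the ARRAY lines in mdadm.conf, roughly (from the emergency shell, where I’m already root):

# mdadm --examine /dev/sda2 | grep -i uuid
# mdadm --examine /dev/sda3 | grep -i uuid
# cat /etc/mdadm.conf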

Other things possibly of interest… when booted successfully with both drives:

# cat /etc/default/grub

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=4e19527a-fc5b-4d72-a15f-b639a1a1bf1d rd.md.uuid=b65bd4f9:dd9ceec3:c38afc80:d0dde46a rd.md.uuid=910b4fa4:c2a0a6d6:48779d82:bcc126ff rd.md.uuid=79402428:ab74b67c:e93e57fa:3095fa81 8250.nr_uarts=12"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

# cat /proc/cmdline
BOOT_IMAGE=(mduuid/910b4fa4c2a0a6d648779d82bcc126ff)/vmlinuz-5.14.0-70.26.1.el9_0.x86_64 root=UUID=8870ab9b-c86f-423c-9934-73bcb9822bca ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=4e19527a-fc5b-4d72-a15f-b639a1a1bf1d rd.md.uuid=b65bd4f9:dd9ceec3:c38afc80:d0dde46a rd.md.uuid=910b4fa4:c2a0a6d6:48779d82:bcc126ff rd.md.uuid=79402428:ab74b67c:e93e57fa:3095fa81 8250.nr_uarts=12

I’d like to attach the rdsosreport.txt found in dracut emergency mode, but the upload doesn’t seem to accept .txt files, so I guess I can paste it into a follow-up.

Thanks for any suggestions!

My next step might be to wipe most of the partitions and reinstall,
painfully redoing similar partitioning by hand in the installation GUI.
But I’m hoping that instead there’s a chance to learn something by
fixing this almost-working install. It might even turn out to be a bug
worth reporting.

I recently tried something similar with RHEL 8 and couldn’t get it to boot if I removed /dev/sda.

I could remove /dev/sdb and put it back again after a reboot and all sorts, just couldn’t get it to boot if /dev/sda was gone.

This was an EFI system, so it was complicated even further by the fact that you can’t just do a simple grub2-install /dev/sdb.

I’m not convinced it’s even supposed to work with /boot on software RAID 1, as it wouldn’t mirror the MBR.

This definitely seems wrong: one should be able to recover from a failed drive somehow, without using another distribution as a rescue system to add a replacement drive to each RAID array.

Anyway, I’ve had some success by changing the rd.md parameters on the kernel command line.
After replacing the explicit rd.md.uuid= entries with:
rd.md=1 rd.md.conf=1 rd.auto=1 rd.retry=30 rd.timeout=200
and booting with one drive or the other disconnected, I get a short pause,
and then the system comes up with degraded RAID arrays.

Testing this way seems somewhat hazardous to one’s RAID arrays: when I power off, reconnect the second drive, and boot up, the system comes up, but with a hodgepodge of array states:
two of the RAID 1 arrays fully assembled with both members,
some degraded, using only their /dev/sda partition,
some degraded, using only their /dev/sdb partition.
So far I’ve been able to patch things up with mdadm --re-add, as shown below. The small arrays do a full resync that takes several minutes; the 5 TB one does an incremental resync in only a few minutes. But I don’t think I want to do this test too many times.
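
Roughly, the patch-up commands looked like this (the exact partitions to re-add depend on which half of each array went missing):

$ cat /proc/mdstat                                  # see which member each array is missing
$ sudo mdadm --manage /dev/md1 --re-add /dev/sdb2   # /boot
$ sudo mdadm --manage /dev/md2 --re-add /dev/sdb3   # /
$ sudo mdadm --manage /dev/md0 --re-add /dev/sdb6   # /home
$ cat /proc/mdstat                                  # watch the resync progress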

I created the alternative kernel command lines by adding files for each experiment under /boot/loader/entries/.
But I don’t think those will get updated when kernel upgrades come along,
so next I’ll change GRUB_CMDLINE_LINUX=
in /etc/default/grub.
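
I assume that change will look something like the following (the rd.* values are the ones from my experiment above), followed by regenerating the grub config; since the existing BLS entries may not pick up /etc/default/grub changes, grubby can push the arguments into them as well:

# in /etc/default/grub, drop the rd.md.uuid= entries:
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=4e19527a-fc5b-4d72-a15f-b639a1a1bf1d rd.md=1 rd.md.conf=1 rd.auto=1 rd.retry=30 rd.timeout=200 8250.nr_uarts=12"

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
$ sudo grubby --update-kernel=ALL --args="rd.md=1 rd.md.conf=1 rd.auto=1 rd.retry=30 rd.timeout=200"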

Bottom line: there are mysteries in the rd.* kernel parameters and dracut that I don’t fully understand yet.

Yes, using device names instead of UUIDs is explicitly mentioned in some Red Hat docs as “A Bad Thing”, but I can’t see how you could ever boot from sdb if sda is hardcoded in various places.

I’ve given up for now, as there’s not much call for bare metal anymore and mdraid sucks in a VM, but I’ll be following your progress just out of interest!