GRUB error after upgrade from CentOS 7.9 to AlmaLinux 8.5

Hello,

During a test upgrade of one of the VMs we ship our software on, I ran into an issue where, just before the last reboot in the upgrade from CentOS 7.9 to AlmaLinux 8.5 (i.e. after the reboot from the ELevate initramfs), the machine does not boot. This is the error message from GRUB:

error: symbol 'grub_calloc' not found

Now I did some googling, and it seems this is caused by a mismatch: the GRUB modules in /boot get updated during the upgrade, but the core image embedded on the disk does not, so the old core image cannot find the grub_calloc symbol that the newer modules reference.

I did have two inhibitors to deal with prior to the upgrade:

  • Had to stop NFS and unmount /proc/fs/nfsd
  • Had to blacklist two deprecated drivers (mptctl, mptbase), which I did in /etc/default/grub by adding module_blacklist=mptctl,mptbase to GRUB_CMDLINE_LINUX (sketch right after this list)
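
For reference, the change looked roughly like this; the <existing options> placeholder stands for whatever was already on that line on the VM:

# /etc/default/grub (excerpt)
GRUB_CMDLINE_LINUX="<existing options> module_blacklist=mptctl,mptbase"

# regenerate the BIOS grub.cfg so the new kernel command line is used
grub2-mkconfig -o /boot/grub2/grub.cfg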

Considering that, I booted from the AlmaLinux ISO to rescue the system and reinstalled GRUB on sda, which solved the problem: the machine booted and finished the upgrade to AlmaLinux 8.5.
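
In case it helps anyone, the rescue steps were roughly these. I am assuming here that the rescue environment mounts the installed system under /mnt/sysroot; adjust the path if yours mounts it elsewhere:

# boot the AlmaLinux ISO in rescue mode, let it mount the installed system, then:
chroot /mnt/sysroot
grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg
exit
reboot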

So I have a way of recovering; now I am wondering how I can prevent this from happening altogether.

There is a permanent entry in the leapp report stating that the GRUB core will not be updated on legacy (BIOS) systems, and I believe this is the issue I am encountering. I found an accepted solution to that issue on the Red Hat forums suggesting that if you have GRUB2 installed (which I do on that VM), this should not be an issue.

Risk Factor: high
Title: GRUB core will be updated during upgrade
Summary: On legacy (BIOS) systems, GRUB core (located in the gap between the MBR and the first partition) does not get automatically updated when GRUB is upgraded.
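
If I read that summary correctly, the prevention would be to reinstall the GRUB core image by hand so it matches the upgraded modules, and on this layout it would have to be done on both underlying disks of the /boot array (sda and sdb in the lsblk output below), since either disk may be the one the BIOS boots from. I have not verified at which point in the ELevate process this is safe to run, so treat it as a sketch:

grub2-install /dev/sda
grub2-install /dev/sdb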

I do not know if it changes things, but there is a RAID array configured on the VM:

sda                    8:0    0   120G  0 disk
├─sda1                 8:1    0   500M  0 part
│ └─md0                9:0    0   499M  0 raid1 /boot
└─sda2                 8:2    0 119.5G  0 part
  └─md1                9:1    0 119.5G  0 raid1
    ├─rootvg-lv_root 253:0    0    40G  0 lvm   /
    ├─rootvg-lv_swap 253:1    0   7.9G  0 lvm   [SWAP]
    └─rootvg-lv_home 253:2    0  71.6G  0 lvm   /home
sdb                    8:16   0   120G  0 disk
├─sdb1                 8:17   0   500M  0 part
│ └─md0                9:0    0   499M  0 raid1 /boot
└─sdb2                 8:18   0 119.5G  0 part
  └─md1                9:1    0 119.5G  0 raid1
    ├─rootvg-lv_root 253:0    0    40G  0 lvm   /
    ├─rootvg-lv_swap 253:1    0   7.9G  0 lvm   [SWAP]
    └─rootvg-lv_home 253:2    0  71.6G  0 lvm   /home
sr0                   11:0    1  1024M  0 rom
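
Both disks hold a copy of /boot, confirmed with:

mdadm --detail /dev/md0

which is presumably why reinstalling GRUB on sda alone got the machine booting again, but would still leave sdb without an updated core image if the BIOS ever falls back to it.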

I would really like to know whether I can prevent this from happening, as I expect to perform the upgrade on a lot of machines with a similar configuration.

Thanks!

I was able to work around this issue by breaking the RAID, booting from only a single drive, running ELevate against it, and then rebuilding the RAID when done.
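
A rough sketch of that workaround with the device names from the lsblk output above (sdb is the member I removed); adjust for your own layout:

# drop the second disk's partitions out of both arrays
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2

# ... run the ELevate upgrade against the remaining member (sda) ...

# once the upgrade is done, re-add the partitions and let the arrays resync
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2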