iDRAC error on boot of Alma 9.3

Hi,

I’m getting the following iDRAC errors on two of my just-refreshed Dell R640s (fresh install of Alma 9) during bootup of the OS. However, no errors show on the iDRAC until about halfway through the OS boot process.

An OEM diagnostic event occurred. Fri Jan 05 2024 09:41:47
A fatal error was detected on a component at bus 2 device 0 function 0. Fri Jan 05 2024 09:41:46
An OEM diagnostic event occurred. Fri Jan 05 2024 09:41:46
An OEM diagnostic event occurred. Fri Jan 05 2024 09:41:46
A fatal error was detected on a component at bus 2 device 0 function 0. Fri Jan 05 2024 09:41:46
An OEM diagnostic event occurred.

I suspect the error is related to the Embedded AHCI 1 device, as this is labelled as device 0 in the hardware inventory.
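
One way to check that guess is to turn the iDRAC's "bus 2 device 0 function 0" into a PCI address and ask lspci what actually sits there. This is only a minimal sketch: it assumes the iDRAC reports the numbers in decimal and that the device is in PCI domain 0000, and the helper name to_bdf is made up for illustration.

```shell
#!/bin/sh
# to_bdf BUS DEV FUNC: print a PCI address as domain:bus:device.function.
# Assumes decimal input and PCI domain 0000 (both are assumptions).
to_bdf() {
    printf '0000:%02x:%02x.%x\n' "$1" "$2" "$3"
}

addr=$(to_bdf 2 0 0)
echo "$addr"

# On the affected host, show what is at that address (needs pciutils):
#   lspci -nn -s "$addr"
```

If lspci reports something other than the AHCI controller at that address, the suspicion above would need revisiting.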

I tried blacklisting the acpi_power_meter module, and while that does remove the following error in the OS, I still get the iDRAC errors.

Jan 03 15:18:43 vmstorage2 kernel: ACPI Error: No handler for Region [SYSI] (000000009d3caf94) [IPMI] (20221020/evregion-130)
Jan 03 15:18:43 vmstorage2 kernel: ACPI Error: Region IPMI (ID=7) has no handler (20221020/exfldio-261)
Jan 03 15:18:43 vmstorage2 kernel: ACPI Error: Aborting method _SB.PMI0._GHL due to previous error (AE_NOT_EXIST) (20221020/psparse-529)
Jan 03 15:18:43 vmstorage2 kernel: ACPI Error: Aborting method _SB.PMI0._PMC due to previous error (AE_NOT_EXIST) (20221020/psparse-529)
Jan 03 15:18:43 vmstorage2 kernel: ACPI: _SB.PMI0: _PMC evaluation failed: AE_NOT_EXIST
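
For anyone wanting to try the same thing, the blacklisting mentioned above can be done roughly like this. It is a sketch only: the file name is arbitrary and the write_blacklist helper is hypothetical. All it does is write a standard modprobe.d "blacklist" line into whatever directory it is given.

```shell
#!/bin/sh
# write_blacklist DIR MODULE: drop a modprobe.d-style blacklist file into DIR.
# (Hypothetical helper; the file name it picks is arbitrary.)
write_blacklist() {
    dir=$1
    module=$2
    printf 'blacklist %s\n' "$module" > "$dir/$module-blacklist.conf"
}

# Real use (as root), followed by a reboot or unloading the module:
#   write_blacklist /etc/modprobe.d acpi_power_meter
#   modprobe -r acpi_power_meter
# Demo against a scratch directory so nothing on the system is touched:
scratch=$(mktemp -d)
write_blacklist "$scratch" acpi_power_meter
cat "$scratch/acpi_power_meter-blacklist.conf"
```

As noted in the post, this only silences the ACPI messages in the OS log; it did not stop the iDRAC events.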

Does anyone have any ideas? I’m planning on a rebuild to Alma 8 to see if this clears the problem.

Nigel

We have a bunch of R640s and R650s running 8.x and have never seen this issue. Hoping your test with 8.9 resolves this.

So Dell support really had no suggestions at all? Hmm…

I’m presuming the iDRAC 9s are already running the 7.00.00.00 firmware.

Although we don’t install it by default on our Linux servers, have you tried installing the Dell iDRAC Service Module v5.3.0.0? Might be worth a go before totally reloading with 8.9…

Good luck!

Hi,

Just to update: upgrading to iDRAC 7.00 made no difference, but Alma 8 sorted the issue. Looking at the errors in journalctl on boot, the one that sticks out as present in Alma 9 but not in Alma 8 is the following:

Jan 23 18:45:22 vmstorage2-bt kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
Jan 23 18:45:22 vmstorage2-bt kernel: pstore: Registered erst as persistent store backend
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: event severity: corrected
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: Error 0, type: corrected
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: section_type: PCIe error
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: port_type: 7, PCIe to PCI/PCI-X bridge
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: version: 3.0
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: command: 0x0007, status: 0x0010
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: device_id: 0000:02:00.0
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: slot: 0
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: secondary_bus: 0x03
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: vendor_id: 0x1556, device_id: 0xbe00
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: class_code: 060400
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x001b
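
The record above names PCI address 0000:02:00.0, which lines up with the iDRAC's "bus 2 device 0 function 0", and class_code 060400 is a PCI-to-PCI bridge. A rough way to pull the address and vendor:device pair out of such a record so they can be fed to lspci is sketched below. The hwerr_ids name is made up for illustration, and it assumes the log lines look like the ones pasted here.

```shell
#!/bin/sh
# hwerr_ids: read kernel log lines on stdin and print "ADDRESS VENDOR:DEVICE"
# for each APEI PCIe error record (hypothetical helper, format assumed as above).
hwerr_ids() {
    awk '
        # Line carrying both IDs, e.g. "vendor_id: 0x1556, device_id: 0xbe00"
        /Hardware Error/ && /vendor_id:/ {
            v = $(NF-2); d = $NF
            gsub(/0x|,/, "", v); sub(/0x/, "", d)
            print addr, v ":" d
            next
        }
        # Line carrying the PCI address, e.g. "device_id: 0000:02:00.0"
        /Hardware Error/ && /device_id:/ { addr = $NF }
    '
}

# Demo with two lines copied from the log above:
hwerr_ids <<'EOF'
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: device_id: 0000:02:00.0
Jan 23 18:45:22 vmstorage2-bt kernel: {1}[Hardware Error]: vendor_id: 0x1556, device_id: 0xbe00
EOF
```

On a live system the input would come from something like "journalctl -k -b | hwerr_ids", and the result can be checked with "lspci -nn -s 0000:02:00.0" or "lspci -nn -d 1556:be00".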

Going to say this is solved, but obviously there is still some underlying issue with Alma 9 and iDRAC on R640s. A problem to be solved on my next build, or maybe Alma 10 will sort it :wink:

Nigel

I have access to R640 and R440. Both have AlmaLinux 9. Both give six lines:

# dmesg -t | grep -i error
ERST: Error Record Serialization Table (ERST) support is initialized.
ACPI Error: No handler for Region [SYSI] (000000003cf717a4) [IPMI] (20221020/evregion-130)
ACPI Error: Region IPMI (ID=7) has no handler (20221020/exfldio-261)
ACPI Error: Aborting method \_SB.PMI0._GHL due to previous error (AE_NOT_EXIST) (20221020/psparse-529)
ACPI Error: Aborting method \_SB.PMI0._PMC due to previous error (AE_NOT_EXIST) (20221020/psparse-529)

Neither has any packages from Dell repos installed.