kernel error amd_pmu_v2_handle_irq


I came across a report of a bug fix in the 6.27 kernel version. The specific error I am facing is “kernel error amd_pmu_v2_handle_irq.” Here are the details of my server and kernel versions:

[root@newpanelhv85 ~]# uname -a
Linux newpanelhv85 5.14.0-284.18.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 29 17:06:27 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@newpanelhv85 ~]# cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

I keep encountering the following error in the dmesg output:

Jul 17 15:16:28 newpanelhv85 kernel: ------------[ cut here ]------------
Jul 17 15:16:28 newpanelhv85 kernel: WARNING: CPU: 195 PID: 84434 at arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x2ca/0x2e0
Jul 17 15:16:28 newpanelhv85 kernel: Modules linked in: tls dm_mod vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat iptable_filter ip_tables bridge stp llc nfnetlink_cttimeout nfnetlink openvswitch nf_conncount rfkill nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl wmi_bmof pcspkr ipmi_ssif sunrpc vfat irdma fat ast drm_shmem_helper i40e drm_kms_helper acpi_ipmi syscopyarea ib_uverbs sysfillrect ses sysimgblt ipmi_si fb_sys_fops cdc_ether enclosure ib_core ipmi_devintf usbnet scsi_transport_sas mii k10temp i2c_piix4 ipmi_msghandler joydev acpi_cpufreq drm fuse xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sd_mod t10_pi sr_mod cdrom sg ahci libahci ice igb libata i2c_algo_bit megaraid_sas ccp gnss wmi sp5100_tco dca uas usb_storage
Jul 17 15:16:28 newpanelhv85 kernel: CPU: 195 PID: 84434 Comm: CPU 6/KVM Kdump: loaded Tainted: G        W        --------  ---  5.14.0-284.18.1.el9_2.x86_64 #1
Jul 17 15:16:28 newpanelhv85 kernel: Hardware name: ASUSTeK COMPUTER INC. RS700A-E12-RS12U/K14PP-D24 Series, BIOS 1002 05/24/2023
Jul 17 15:16:28 newpanelhv85 kernel: RIP: 0010:amd_pmu_v2_handle_irq+0x2ca/0x2e0
Jul 17 15:16:28 newpanelhv85 kernel: Code: c2 b9 01 03 00 c0 48 c1 ea 20 0f 30 66 90 e9 18 ff ff ff 31 d2 48 89 c6 bf 01 03 00 c0 e8 2e 2e 5a 00 e9 04 ff ff ff 45 31 e4 <0f> 0b e9 bf fe ff ff e8 da f8 b1 00 66 2e 0f 1f 84 00 00 00 00 00

I found a post online
https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg506355.html
suggesting that version 6.27 includes a bug fix for this issue. However, the error persists even after trying versions 6.3 and 6.4. It is worth noting that this problem occurs only on AlmaLinux and not on Rocky Linux 9.

I would appreciate any assistance with this matter.

What are you referring to when you say “version 6.27” or “versions 6.3 and 6.4” ?

I was referring to kernel versions.
However, I am now running 6.4.12 and still get this error:
Linux newpanelhv85 6.4.12-1.el8.elrepo.x86_64

------------[ cut here ]------------
[407523.577780] WARNING: CPU: 433 PID: 2589535 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x2c2/0x2d0
[407523.577793] Modules linked in: fuse(E) wireguard(E) libchacha20poly1305(E) chacha_x86_64(E) poly1305_x86_64(E) curve25519_x86_64(E) libcurve25519_generic(E) libchacha(E) ip6_udp_tunnel(E) udp_tunnel(E) dm_mod(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) nft_compat(E) nft_chain_nat(E) nf_tables(E) bridge(E) stp(E) llc(E) nfnetlink_cttimeout(E) nfnetlink(E) openvswitch(E) nf_conncount(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) sunrpc(E) vfat(E) fat(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) kvm(E) ast(E) drm_shmem_helper(E) ipmi_ssif(E) irqbypass(E) irdma(E) drm_kms_helper(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) i40e(E) ghash_clmulni_intel(E) drm(E) sha512_ssse3(E) ib_uverbs(E) cdc_ether(E) usbnet(E) ses(E) rapl(E) syscopyarea(E) joydev(E) acpi_ipmi(E) wmi_bmof(E) mii(E) pcspkr(E) enclosure(E) acpi_cpufreq(E) sp5100_tco(E)


(1) I see that this is a warning, not an error. Do you have any issue with running the system?

(2) You are running a kernel from ELRepo. As noted in kernel-ml, any bug should be reported to kernel.org.

(3) You say the problem is not seen in Rocky Linux. Did you install the same elrepo kernel(s) there?
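
To help answer (1) and (3), it may be useful to count how often the warning fires on each host and kernel. Below is a minimal sketch, not a definitive tool: the function name `summarize_warnings` and the regular expression are my own assumptions, written against the two `WARNING:` lines quoted in this thread. Feed it the text of `dmesg` or the journal.

```python
import re

# Matches the "WARNING: CPU: N PID: N at file:line func+0x..." header line
# as it appears in the two traces quoted in this thread (assumed format).
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)

def summarize_warnings(log_text):
    """Count warning headers per (source file, line number, function)."""
    counts = {}
    for match in WARN_RE.finditer(log_text):
        key = (match.group("file"), int(match.group("line")), match.group("func"))
        counts[key] = counts.get(key, 0) + 1
    return counts

if __name__ == "__main__":
    # The two header lines below are copied from the traces in this thread.
    sample = (
        "Jul 17 15:16:28 newpanelhv85 kernel: WARNING: CPU: 195 PID: 84434 at "
        "arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x2ca/0x2e0\n"
        "[407523.577780] WARNING: CPU: 433 PID: 2589535 at "
        "arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x2c2/0x2d0\n"
    )
    for (path, line, func), count in sorted(summarize_warnings(sample).items()):
        print(f"{count:5d}  {path}:{line}  {func}")
```

Running the same script against the logs from the Rocky Linux host, with the same kernel installed, would show whether the warning really is absent there.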