AlmaLinux 8: unstable link with BCM57414

I have 7 servers with the BCM57414 card (Broadcom SFP28 25G):
2 Dell PowerEdge R7515
5 HPE DL380

The network connection is created like this:
nmcli con add type bond con-name bond0 ifname bond0 bond.options "mode=802.3ad,lacp_rate=fast,miimon=100" connection.autoconnect-slaves yes connection.autoconnect yes

nmcli con del ens3f0np0
nmcli con del ens3f1np1

nmcli con add type vlan ifname backbone con-name backbone dev bond0 id 10 ipv4.method manual ipv4.addresses x.x.x.x/24 ipv4.gateway x.x.x.x ipv4.dns 8.8.8.8 ipv6.method ignore connection.autoconnect yes

nmcli con add type ethernet ifname ens3f0np0 con-name ens3f0np0 slave-type bond master bond0 connection.autoconnect no

nmcli con add type ethernet ifname ens3f1np1 con-name ens3f1np1 slave-type bond master bond0 connection.autoconnect no
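
To confirm that the bond really came up with the options passed to nmcli above, the values can be read back from sysfs. This is only a minimal sketch, assuming the usual bonding sysfs layout under /sys/class/net/<bond>/bonding. The function name bond_options and the SYSFS_NET override (there only so the function can be pointed at a test directory) are made up for illustration.

```shell
#!/bin/sh
# Print the bonding options that were set via nmcli, as the kernel reports them.
# $1 is the bond name. SYSFS_NET is a hypothetical override for testing;
# by default the real sysfs path is used.
bond_options() {
    dir="${SYSFS_NET:-/sys/class/net}/$1/bonding"
    for opt in mode lacp_rate miimon; do
        if [ -r "$dir/$opt" ]; then
            echo "$opt=$(cat "$dir/$opt")"
        else
            echo "$opt=unreadable"
        fi
    done
}
```

Called as `bond_options bond0`, this should list the mode, the LACP rate and the miimon interval, which can then be compared with the bond.options string used when creating the connection.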

The switch is a Dell PowerSwitch S5224F-ON.

On 2 of the machines the link is stable (one Dell and one HPE). After swapping cards between machines, the same 2 machines stay stable (no port flapping), so the problem seems to follow the server rather than the card.

Tried installing MS Windows Server on one of the problematic servers, and LACP works as expected there.

Also tried without LACP, but the port flapping still happens.

Any tips on how to find out what’s going on?
All machines are patched to the latest AlmaLinux 8.7.
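
One way to put numbers on the flapping is the per-interface carrier_changes counter in sysfs, which the kernel increments every time the link goes up or down. The following is a rough sketch, not a definitive tool. The function name carrier_changes and the SYSFS_NET override (used only to point the function at a test directory) are assumptions for illustration.

```shell
#!/bin/sh
# Print "<interface> <count>" for each interface given as an argument, where
# count is the number of carrier up/down transitions the kernel has recorded.
# SYSFS_NET is a hypothetical override for testing; default is the real sysfs.
carrier_changes() {
    root="${SYSFS_NET:-/sys/class/net}"
    for ifc in "$@"; do
        f="$root/$ifc/carrier_changes"
        if [ -r "$f" ]; then
            echo "$ifc $(cat "$f")"
        else
            echo "$ifc unknown"
        fi
    done
}
```

Running `carrier_changes ens3f0np0 ens3f1np1` on a stable and on a flapping server a few minutes apart should show whether the counters keep climbing only on the problematic machines.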

################

lspci -vv | grep 57414 -A20

08:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
Subsystem: Broadcom Inc. and subsidiaries Device 4141
Physical Slot: 3
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
NUMA node: 0
Region 0: Memory at 92d10000 (64-bit, prefetchable) [size=64K]
Region 2: Memory at 92c00000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at 92d22000 (64-bit, prefetchable) [size=8K]
Expansion ROM at 93300000 [virtual] [disabled] [size=256K]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Product Name: Broadcom Adv. Dual 25Gb Ethernet
Read-only fields:
[PN] Part number: BCM957414
[MN] Manufacture ID: 1028
[V0] Vendor specific: FFV22.21.07.80
[V1] Vendor specific: DSV1028VPDR.VER2.1
[V2] Vendor specific: NPY2
[V3] Vendor specific: PMTD
[V4] Vendor specific: NMVBroadcom Corp
[V5] Vendor specific: DTINIC
[V6] Vendor specific: DCM1001FFFFFF1202FFFFFF1403FFFFFF1604FFFFFF1805FFFFFF1A06FFFFFF1C07FFFFFF1E08FFFFFF2101FFFFFF2302FFFFFF2503FFFFFF2704FFFFFF2905FFFFFF2B06FFFFFF2D07FFFFFF2F08FFFFFF
[V7] Vendor specific: L1D0
[RV] Reserved: checksum good, 85 byte(s) reserved
End
Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [a0] MSI-X: Enable+ Count=74 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=000004a0
Capabilities: [ac] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-

ll /sys/bus/pci/devices/0000:08:00.0/

total 0
-r--r--r--. 1 root root 4096 Jan 23 12:40 aer_dev_correctable
-r--r--r--. 1 root root 4096 Jan 23 12:40 aer_dev_fatal
-r--r--r--. 1 root root 4096 Jan 23 12:40 aer_dev_nonfatal
-r--r--r--. 1 root root 4096 Jan 23 12:40 ari_enabled
-rw-r--r--. 1 root root 4096 Jan 23 12:40 broken_parity_status
-r--r--r--. 1 root root 4096 Jan 23 12:40 class
-rw-r--r--. 1 root root 4096 Jan 23 12:40 config
-r--r--r--. 1 root root 4096 Jan 23 12:40 consistent_dma_mask_bits
-r--r--r--. 1 root root 4096 Jan 23 12:40 current_link_speed
-r--r--r--. 1 root root 4096 Jan 23 12:40 current_link_width
-rw-r--r--. 1 root root 4096 Jan 23 12:40 d3cold_allowed
-r--r--r--. 1 root root 4096 Jan 23 12:40 device
-r--r--r--. 1 root root 4096 Jan 23 12:40 dma_mask_bits
lrwxrwxrwx. 1 root root 0 Jan 23 12:40 driver -> ../../../../bus/pci/drivers/bnxt_en
-rw-r--r--. 1 root root 4096 Jan 23 12:40 driver_override
-rw-r--r--. 1 root root 4096 Jan 23 12:40 enable
lrwxrwxrwx. 1 root root 0 Jan 23 12:40 firmware_node -> ../../../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:5d/device:5e
drwxr-xr-x. 3 root root 0 Jan 23 12:40 hwmon
-r--r--r--. 1 root root 4096 Jan 23 12:40 irq
drwxr-xr-x. 2 root root 0 Jan 23 12:40 link
-r--r--r--. 1 root root 4096 Jan 23 12:40 local_cpulist
-r--r--r--. 1 root root 4096 Jan 23 12:40 local_cpus
-r--r--r--. 1 root root 4096 Jan 23 12:40 max_link_speed
-r--r--r--. 1 root root 4096 Jan 23 12:40 max_link_width
-r--r--r--. 1 root root 4096 Jan 23 12:40 modalias
-rw-r--r--. 1 root root 4096 Jan 23 12:40 msi_bus
drwxr-xr-x. 2 root root 0 Jan 23 12:40 msi_irqs
drwxr-xr-x. 3 root root 0 Jan 23 12:40 net
-rw-r--r--. 1 root root 4096 Jan 23 12:40 numa_node
-r--r--r--. 1 root root 4096 Jan 23 12:40 pools
drwxr-xr-x. 2 root root 0 Jan 23 12:40 power
-r--r--r--. 1 root root 4096 Jan 23 12:40 power_state
--w--w----. 1 root root 4096 Jan 23 12:40 remove
--w-------. 1 root root 4096 Jan 23 12:40 rescan
--w-------. 1 root root 4096 Jan 23 12:40 reset
-rw-r--r--. 1 root root 4096 Jan 23 12:40 reset_method
-r--r--r--. 1 root root 4096 Jan 23 12:40 resource
-rw-------. 1 root root 65536 Jan 23 12:40 resource0
-rw-------. 1 root root 65536 Jan 23 12:40 resource0_wc
-rw-------. 1 root root 1048576 Jan 23 12:40 resource2
-rw-------. 1 root root 1048576 Jan 23 12:40 resource2_wc
-rw-------. 1 root root 8192 Jan 23 12:40 resource4
-rw-------. 1 root root 8192 Jan 23 12:40 resource4_wc
-r--r--r--. 1 root root 4096 Jan 23 12:40 revision
-rw-------. 1 root root 262144 Jan 23 12:40 rom
lrwxrwxrwx. 1 root root 0 Jan 23 12:40 subsystem -> ../../../../bus/pci
-r--r--r--. 1 root root 4096 Jan 23 12:40 subsystem_device
-r--r--r--. 1 root root 4096 Jan 23 12:40 subsystem_vendor
-rw-r--r--. 1 root root 4096 Jan 23 12:40 uevent
-r--r--r--. 1 root root 4096 Jan 23 12:40 vendor
-rw-------. 1 root root 0 Jan 23 12:40 vpd
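
The current_link_speed, current_link_width, max_link_speed and max_link_width files in the listing above make it possible to check whether the card negotiated its full PCIe link in a given slot. Whether that has anything to do with the flapping is only a guess, but it is a cheap comparison between the stable and the unstable servers. A minimal sketch follows. The function name check_pcie_link is made up for illustration.

```shell
#!/bin/sh
# Compare the negotiated PCIe link speed and width with the device maximum.
# $1 is the device's sysfs directory, e.g. /sys/bus/pci/devices/0000:08:00.0
check_pcie_link() {
    dev=$1
    cur_speed=$(cat "$dev/current_link_speed")
    max_speed=$(cat "$dev/max_link_speed")
    cur_width=$(cat "$dev/current_link_width")
    max_width=$(cat "$dev/max_link_width")
    if [ "$cur_speed" = "$max_speed" ] && [ "$cur_width" = "$max_width" ]; then
        echo "OK: $cur_speed x$cur_width"
    else
        echo "DEGRADED: $cur_speed x$cur_width (max $max_speed x$max_width)"
    fi
}
```

For the card shown here the call would be `check_pcie_link /sys/bus/pci/devices/0000:08:00.0`.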

cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: XX:XX:XX:XX:XX:XX
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 2
Actor Key: 21
Partner Key: 32
Partner Mac Address: XX:XX:XX:XX:XX:XX

Slave Interface: ens3f0np0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: XX:XX:XX:XX:XX:XX
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: b0:26:28:cd:90:f0
port key: 21
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 4096
system mac address: XX:XX:XX:XX:XX:XX
oper key: 32
port priority: 32768
port number: 33
port state: 61

Slave Interface: ens3f1np1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: XX:XX:XX:XX:XX:XX
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: XX:XX:XX:XX:XX:XX
port key: 21
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 4096
system mac address: XX:XX:XX:XX:XX:XX
oper key: 32
port priority: 32768
port number: 65
port state: 61
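
The "port state" values above are bit fields defined by the LACP standard (bit 0 activity, bit 1 timeout, bit 2 aggregation, bit 3 synchronization, bit 4 collecting, bit 5 distributing, bit 6 defaulted, bit 7 expired). A small sketch to decode them follows. The function name decode_lacp_state and the short flag names it prints are my own choice, not something the bonding driver outputs.

```shell
#!/bin/sh
# Decode an LACP port state byte, as printed in /proc/net/bonding/<bond>,
# into the names of the bits that are set (lowest bit first).
decode_lacp_state() {
    state=$1
    flags=""
    i=0
    for name in active short_timeout aggregatable in_sync \
                collecting distributing defaulted expired; do
        # Test bit i of the state value.
        if [ $(( ($state >> $i) & 1 )) -eq 1 ]; then
            flags="${flags:+$flags }$name"
        fi
        i=$((i + 1))
    done
    echo "$flags"
}
```

`decode_lacp_state 63` (the actor, i.e. this host) gives "active short_timeout aggregatable in_sync collecting distributing", while `decode_lacp_state 61` (the partner, i.e. the switch) gives the same list without short_timeout. So in this capture both ports are in sync, collecting and distributing, and the only difference is that the switch side advertises the long timeout while the host is set to lacp_rate=fast. That may or may not be related to the flapping.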