AlmaLinux 9.1 lxc cloud image doesn't bring main interface up on startup

I first reported this problem on the Linux Containers Forum (thread "Almalinux/9/cloud container is created with broken networking"), but maybe you will have a better idea of what is going on. I would really like to use AlmaLinux as an LXC guest, but so far it doesn’t work for me. The image comes from https://uk.lxd.images.canonical.com/.

I discovered that running:

ip link set dev eth0 up
ip route add default via 169.254.0.1 dev eth0 proto static onlink 
ip route add 169.254.0.1 dev eth0 scope link

fixes the network, but only until the next reboot.
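
To avoid retyping the three commands after every boot, they can be wrapped in a small guard that only acts when eth0 lacks the UP flag. This is a minimal sketch, not a real fix: the names iface_flag_up and fix_eth0 are made up here, the script has to be run as root inside the container, and something still has to call it on each boot.

```shell
#!/bin/sh
# Sketch only: iface_flag_up and fix_eth0 are hypothetical helper names,
# not part of iproute2 or NetworkManager.

# Reads one `ip -o link show` line on stdin and succeeds only if the flag
# list between < and > contains UP as a whole flag (LOWER_UP does not count).
iface_flag_up() {
    sed -n 's/^[^<]*<\([^>]*\)>.*/\1/p' | tr ',' '\n' | grep -qx 'UP'
}

# Re-applies the manual workaround, but only when eth0 is administratively down.
fix_eth0() {
    if ip -o link show dev eth0 | iface_flag_up; then
        return 0
    fi
    ip link set dev eth0 up
    ip route add default via 169.254.0.1 dev eth0 proto static onlink
    ip route add 169.254.0.1 dev eth0 scope link
}
```

Calling fix_eth0 applies the workaround once; calling it again while the link is up does nothing.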

The NetworkManager log:

Feb 20 16:37:36 alma systemd[1]: Starting Network Manager...
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8360] NetworkManager (version 1.40.0-1.el9) is starting... (boot:2245da45-f1df-4a4f-a511-4fcf1426ef1b)
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8363] Read config: /etc/NetworkManager/NetworkManager.conf (etc: 99-cloud-init.conf)
Feb 20 16:37:36 alma systemd[1]: Started Network Manager.
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8400] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8415] manager[0x55e310fe5090]: monitoring kernel firmware directory '/lib/firmware'.
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8429] hostname: hostname: using hostnamed
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8429] hostname: static hostname changed from (none) to "alma"
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8432] dns-mgr: init: dns=none,systemd-resolved rc-manager=unmanaged
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8444] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8445] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8445] manager: Networking is enabled by state file
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8459] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.40.0-1.el9/libnm-settings-plugin-ifcfg-rh.so")
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8478] settings: Loaded settings plugin: keyfile (internal)
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8490] dhcp: init: Using DHCP client 'internal'
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8491] device (lo): carrier: link connected
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8493] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8498] manager: (eth0): new Veth device (/org/freedesktop/NetworkManager/Devices/2)
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8506] device (eth0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8509] device (eth0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8513] device (eth0): Activation: starting connection 'eth0' (53f5332f-f73f-4d0b-92b3-14d8b9d2b392)
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8516] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8518] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8519] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.8521] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.9105] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.9106] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.9108] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 20 16:37:36 alma NetworkManager[132]: <info>  [1676911056.9109] device (eth0): Activation: successful, device activated.
Feb 20 16:37:42 alma NetworkManager[132]: <info>  [1676911062.8574] manager: startup complete

ip a after boot:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
35: eth0@if36: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:b2:dc:a2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 188[redacted]/32 scope global eth0

cloud-init.log:

Contents of /etc/sysconfig/network-scripts/ifcfg-eth0:

# Created by cloud-init on instance boot automatically, do not edit.
#
AUTOCONNECT_PRIORITY=999
BOOTPROTO=none
DEFROUTE=yes
DEVICE=eth0
DNS1=8.8.8.8
DNS2=8.8.4.4
GATEWAY=169.254.0.1
IPADDR=188[redacted]
NETMASK=255.255.255.255
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Does anyone have an idea what’s going on here? Is there anything else I can do to further debug the exact reason why the interface is always down on boot?
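
One thing that may be worth checking: every eth0 state change in the NetworkManager log carries sys-iface-state: 'external' together with the reason 'connection-assumed'. As far as I understand NetworkManager, that marks a device it considers configured from outside, so it may leave the link state alone; that reading is an assumption on my part, not something I have confirmed. The filter below is a small sketch (nm_iface_states is a made-up name) that pulls those values out of the journal so they can be compared between a broken and a working boot.

```shell
# Sketch of a log filter; nm_iface_states is a hypothetical helper name.
# Reads NetworkManager journal lines on stdin and prints each distinct
# sys-iface-state value reported in state changes of the given device.
nm_iface_states() {
    grep -F "device ($1): state change:" |
        sed -n "s/.*sys-iface-state: '\([^']*\)'.*/\1/p" |
        sort -u
}

# Example (inside the container):
#   journalctl -b -u NetworkManager | nm_iface_states eth0
```

If the output differs between a boot where the network works and one where it doesn't, that would at least narrow down where to look.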

I found the solution!

It turns out that if you add bootcmd commands that bring the interface up to your cloud-init user-data, the issue goes away!

My theory is that because the interface is initially down, the cloud-init network setup fails in some odd way and leaves NetworkManager in a broken state. But if we ensure that eth0 is up during the cloud-init run, everything completes fine and NetworkManager works as expected: eth0 comes up automatically after each reboot and the network is available!

Edit: Actually, bootcmd runs on every boot, so this may just be masking the issue rather than fixing it, but at least it is a working workaround.

I have no idea what exactly is at fault here (AlmaLinux, cloud-init, NetworkManager or LXD), but at least the network works now.

This is an example LXD profile for AlmaLinux 9.1 with which networking actually works:

      - name: alma
        description: "Almalinux testing profile."
        config:
          user.user-data: |
            #cloud-config
            bootcmd:
              - nmcli c up eth0
              - nmcli d mod eth0 ipv4.gateway 169.254.0.1
          user.network-config: |
            version: 2
            ethernets:
              eth0:
                dhcp4: no
                dhcp6: no
                addresses:
                - [something]/32
                nameservers:
                  addresses:
                  - 8.8.8.8
                  - 8.8.4.4
                  search: []
                routes:
                - to: 0.0.0.0/0
                  via: 169.254.0.1
                  on-link: true
        devices:
          eth0:
            type: nic
            ipv4.address: [something]
            nictype: routed
            parent: enp1s0f0
            host_name: veth-alma
          root:
            type: disk
            path: /
            pool: default
            size: 20GB

I hope it will help someone!