I have an Alma 9.3 VM host. I set up and run the VMs on the host through the Cockpit Machines interface. The host uses ZFS to pool some large drives. I am sharing one dataset from the ZFS pool as a shared directory with an Alma 9.3 VM, again through the Cockpit Machines interface.
The VM that mounts the shared dataset hangs at random. I believe this happens because the shared directory stops responding for a while. To speed up recovery, I added a watchdog to the VM that reboots it when the file system becomes unresponsive. The VM now reboots about two to five times a day on most days. I haven’t lost any data, but I’d like to get the VM stable.
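To make "unresponsive" concrete, a check of this kind could be sketched roughly as below. This is only an illustration, not the actual watchdog configured on the VM; the function name `probe_directory`, the mount path `/mnt/shared`, and the 5-second timeout are all assumptions. The listing runs in a daemon thread so that a stalled mount cannot block the caller.

```python
import os
import threading
import time


def probe_directory(path, timeout_s=5.0):
    """Try to list `path`; return (ok, elapsed_seconds).

    ok is False if the listing raises OSError or does not finish
    within timeout_s (for example, because the mount has stalled).
    """
    result = {}

    def worker():
        try:
            os.listdir(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    start = time.monotonic()
    # Daemon thread: a call stuck on a hung mount will not keep the
    # process alive or block the caller beyond the timeout.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    elapsed = time.monotonic() - start
    return result.get("ok", False), elapsed


if __name__ == "__main__":
    # "/mnt/shared" is a hypothetical mount point for the shared dataset.
    ok, elapsed = probe_directory("/mnt/shared")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} ok={ok} elapsed={elapsed:.2f}s")
```

Running something like this once a minute and keeping the timestamps of slow or failed probes might make it easier to line the stalls up against activity on the host.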
I don’t see anything in the host or VM logs that explains why this might be happening. I replaced and expanded the RAM on the host in case faulty memory was the cause.
Any ideas about how to troubleshoot this? I am at a loss.