This has happened at multiple telecom giants. In their PCRF implementations, Pacemaker nodes failed with "too many open files",
and the Pacemaker log filled up within a minute and consumed the whole disk, approx. 90 GB.
The only error printed is “Could not accept client connection: Too many open files in the system.”
Note: This is not an issue with the file limit. Pacemaker spawns a very large number of processes and fills the log file in less than a minute.
Note: This issue is not in Pacemaker itself but in kernel code (please see the strace output).
Output of ulimit -a:
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63192
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63192
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
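The limits listed above are per-process/per-user values. The wording of the error ("in the system") suggests, as an assumption on my part, that the system-wide file table may be the one running out (ENFILE on Linux) rather than the per-process limit (EMFILE). A minimal Python sketch for checking system-wide file handle usage from /proc/sys/fs/file-nr is below; the function names parse_file_nr and usage_ratio are hypothetical helpers, not part of Pacemaker or any library.

```python
from pathlib import Path


def parse_file_nr(text):
    """Parse the three fields of /proc/sys/fs/file-nr.

    The fields are: allocated file handles, allocated-but-unused handles,
    and the system-wide maximum (fs.file-max).
    """
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def usage_ratio(stats):
    """Fraction of the system-wide file handle limit currently allocated."""
    return stats["allocated"] / stats["max"] if stats["max"] else 0.0


if __name__ == "__main__":
    proc_file = Path("/proc/sys/fs/file-nr")
    if proc_file.exists():  # only present on Linux
        stats = parse_file_nr(proc_file.read_text())
        print(stats, f"{usage_ratio(stats):.1%} of fs.file-max in use")
```

Running this periodically (for example from cron) on an affected node could show whether the allocated count climbs toward fs.file-max shortly before the log flood starts.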
As the system was in bad shape, we could only extract the top and bottom of the log file, and it contains the same log lines throughout.
We have logrotate configured, but it never gets triggered because the log file fills up in less than a minute.
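Since the log grows to tens of gigabytes before rotation can run, the top and bottom of the file can be pulled without reading the whole thing. The following is only a sketch of how that could be done in Python; head_lines and tail_lines are hypothetical helper names, and the 20-line default is an arbitrary choice.

```python
import os
from itertools import islice


def head_lines(path, count=20):
    """Return the first `count` lines without reading the rest of the file."""
    with open(path, "rb") as handle:
        return [
            line.decode("utf-8", "replace").rstrip("\r\n")
            for line in islice(handle, count)
        ]


def tail_lines(path, count=20, block_size=64 * 1024):
    """Return the last `count` lines by reading blocks backwards from the end."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Keep reading earlier blocks until more than `count` newlines are
        # buffered, so the last `count` lines are guaranteed to be complete.
        while position > 0 and data.count(b"\n") <= count:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    return data.decode("utf-8", "replace").splitlines()[-count:]
```

For example, head_lines("/var/log/pacemaker/pacemaker.log") and tail_lines("/var/log/pacemaker/pacemaker.log") would give the two snippets, assuming that is where the log lives on the affected node.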
We are facing this issue on multiple releases; the focus is on CentOS 8.1.
System 1: CentOS Linux release 8.1.1911 (Core)
uname output: Linux dc211-Installer 4.18.0-193.14.2.el8_2.x86_64 #1 SMP Sun Jul 26 03:54:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
pacemaker-2.0.2-3.el8_1.2.rpm
pacemaker-libs-2.0.2-3.el8_1.2.rpm
pacemaker-cluster-libs-2.0.2-3.el8_1.2.rpm
pacemaker-schemas-2.0.2-3.el8_1.2.rpm
pacemaker-cli-2.0.2-3.el8_1.2.rpm
FYI: the same issue is also happening on another Linux distribution, with pacemaker-2.1.0-8.
System 2: AlmaLinux release 8.5 (Arctic Sphynx)
uname output: Linux dc222-Installer 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 08:57:35 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
pacemaker-libs-2.1.0-8.el8.rpm
pacemaker-2.1.0-8.el8.rpm
pacemaker-schemas-2.1.0-8.el8.rpm
pacemaker-cli-2.1.0-8.el8.rpm
pacemaker-cluster-libs-2.1.0-8.el8.rpm
In one case, we could log in to the system and capture snippets from the top and bottom of the pacemaker.log file.
Please let me know if anything is needed from my side to resolve the issue.
Thanks
Iqbal Singh Aulakh
What is the business impact? Please also provide timeframe information.
System crash, business loss, nationwide outage.
Where are you experiencing the behavior? What environment?
Multiple customers have reported this issue.
When does the behavior occur? Frequency? Repeatedly? At certain times?
Sep 15 04:16