If you’re unfamiliar with LibreNMS, it’s an easy-to-use and fairly lightweight network monitoring system that relies mainly on trusty old protocols such as SNMP and ICMP. I am especially fond of its simple but feature-rich UI, with lots of great-looking graphs and device health overviews.
My main use for LibreNMS used to be monitoring my home lab networking equipment. Lately, I’ve also been using it to monitor Windows and Linux servers, and I’ve started using it at work, where we needed a simple, agentless monitoring setup for virtualized servers. One key benefit of LibreNMS is that it could be thrown out any day at “zero cost”, should we switch to another system or somehow render it obsolete.
After getting all the servers up and running with graphing and some basic alert rules, I soon found that LibreNMS triggered quite a few false-positive alerts. It became apparent that the system is aimed more at monitoring networking equipment than at servers (and services), but after browsing the web, I quickly found solutions to most issues.
I’m still working out a few more quirks, so this is currently a work in progress. But as of now, here are some of the key changes I have made to the main configuration file. Simply append any or all of these to the config file for them to take effect (typically /opt/librenms/config.php):
Sometimes LibreNMS would trigger “Device up/down” alerts when encountering high latency ping responses, which happen from time to time in a large server environment. These parameter changes seem to have resolved this issue:
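$config['fping_options']['retries'] = 5;     # fping -r: retries after the first attempt
$config['fping_options']['timeout'] = 1000;  # fping -t: wait for a reply, in milliseconds
$config['fping_options']['count'] = 4;       # fping -c: packets sent to each target
$config['fping_options']['millisec'] = 1000; # fping -p: gap between packets to one target, in milliseconds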
The SNMP services of some Windows servers can be quite slow. In some cases this might cause SNMP checks to time out and report devices as down, and also make storage usage graphs show 0% usage.
Extending the SNMP timeout parameter solves this issue:
$config['snmp']['timeout'] = 10;
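A longer timeout also lengthens the worst case for a server that really is down, since each SNMP query may be retried before it is given up on. Here is a minimal sketch of balancing the two, assuming the `retries` key under `$config['snmp']` is available in your LibreNMS version. It is not part of the tweak above, and the value 3 is purely illustrative:

```php
# Seconds to wait for each SNMP reply (the setting from above).
$config['snmp']['timeout'] = 10;

# ASSUMPTION: illustrative value, not part of the original tweak.
# Number of retries made after the first attempt gets no reply.
$config['snmp']['retries'] = 3;

# Rough worst case per query, if every attempt waits the full timeout:
# timeout * (1 + retries) = 10 * (1 + 3) = 40 seconds.
```

If polling starts to run long after raising the timeout, lowering the retry count is one knob worth trying.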
I add servers by their IP address, but this makes the device list quite hard to read unless every user knows every server IP address by heart. This option makes the UI show each device’s sysName instead of its IP address:
$config['force_ip_to_sysname'] = true;
I’m mainly interested in monitoring CPU utilization, memory consumption and disk usage. These settings hide entries I don’t need from the header menu:
$config['int_customers'] = 0; # Hide Customer Port Parsing
$config['int_transit'] = 0;   # Hide Transit Types
$config['int_peering'] = 0;   # Hide Peering Types
$config['int_core'] = 0;      # Hide Core Port Types
$config['int_l2tp'] = 0;      # Hide L2TP Port Types
If you’ve got suggestions as to other tweaks which improve server OS monitoring in LibreNMS, feel free to leave a comment below.