Introduction

After being an Observium user for several years, I recently made the switch to LibreNMS, an open source fork of Observium. If you’re unfamiliar with LibreNMS, it’s an easy-to-use and fairly lightweight network monitoring system that relies mainly on trusty old protocols such as SNMP and ICMP. I am especially fond of its simple but feature-rich UI, with lots of great-looking graphs and device health overviews.

My main use for Observium used to be monitoring my home lab networking equipment. But lately I’ve been working with LibreNMS a lot at work as well, since we decided to use it for monitoring a large number of virtualized Windows and Linux servers. One key benefit of LibreNMS is that it could be thrown out any day at “zero cost”, should we switch to another system or should it somehow become obsolete.

Instructions

After getting all the servers up and running with graphing and some basic alert rules, I soon found that LibreNMS would trigger quite a few false positive alerts. It became apparent that the system is aimed more at monitoring networking equipment than servers (and services). But after doing some research, I quickly found solutions to most issues. I’m still working out a few more quirks, so this is currently a work in progress.

For now, here are some of the key changes I have made to the main configuration file. Simply append any or all of these to the config file (typically /opt/librenms/config.php) for them to take effect:

Sometimes LibreNMS would trigger “Device up/down” alerts when encountering high latency ping responses, which happen from time to time in a large server environment. These parameter changes seem to have resolved this issue:

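$config['fping_options']['retries'] = 5; # Extra attempts after the first ping (fping -r)
$config['fping_options']['timeout'] = 1000; # Wait for a reply, in milliseconds (fping -t)
$config['fping_options']['count'] = 4; # Pings sent to each host (fping -c)
$config['fping_options']['millisec'] = 1000; # Gap between pings to one host, in milliseconds (fping -p)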

The SNMP services of some Windows servers can be quite slow. In some cases this might cause SNMP checks to time out and report devices as down, and also make storage usage graphs show 0% usage. Extending the SNMP timeout parameter solves this issue:

$config['snmp']['timeout'] = 10;
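
The timeout also interacts with the SNMP retry count, so a long timeout can stretch polling runs when a server does not answer at all. The fragment below is only a rough sketch: it assumes your LibreNMS version supports the $config['snmp']['retries'] option and that each retry waits up to the full timeout. The retry value is purely illustrative.

```php
$config['snmp']['timeout'] = 10; # Seconds to wait for each SNMP response
$config['snmp']['retries'] = 2;  # Assumed option, illustrative value: retries after the first attempt
# Worst case per request under these assumptions: 10 s x (2 + 1 attempts) = 30 s
```

If polling starts taking noticeably longer after raising the timeout, lowering the retry count is one knob to try.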

I add servers by IP address, but this makes the device list quite hard to read unless every user knows every server’s IP address by heart. This option displays the sysName in the UI instead of the IP address:

$config['force_ip_to_sysname'] = true;

I’m mainly interested in monitoring CPU utilization, memory consumption and disk usage. These settings hide unneeded entries from the header menu:

$config['int_customers'] = 0; # Hide Customer Port Parsing 
$config['int_transit'] = 0; # Hide Transit Types 
$config['int_peering'] = 0; # Hide Peering Types 
$config['int_core'] = 0; # Hide Core Port Types 
$config['int_l2tp'] = 0; # Hide L2TP Port Types
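
Since every setting here is a plain PHP statement, a small typo in config.php can stop LibreNMS from loading its configuration. A minimal illustration of the two pitfalls most likely when copying lines from a web page (the wrong variants are kept commented out so the fragment stays valid):

```php
# Correct: straight single quotes around each key, and a semicolon at the end.
$config['snmp']['timeout'] = 10;

# Wrong (commented out): typographic quotes are not string delimiters in PHP,
# so the intended key is not set.
# $config[‘snmp’][‘timeout’] = 10;

# Wrong (commented out): without the trailing semicolon, PHP normally raises
# a parse error at the next statement and config.php fails to load.
# $config['snmp']['timeout'] = 10
```

Running php -l /opt/librenms/config.php after editing should report syntax errors such as a missing semicolon before the poller or web UI trips over them.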

If you’ve got suggestions for other tweaks that improve server OS monitoring in LibreNMS, feel free to leave a comment below.


Comments:

Please note: These comments are exports from an older platform, and are no longer active.

anthony - Jul 4, 2017

This was a great collection of ‘quick fixes’ for small gripes when using this to monitor mostly servers. Thanks for aggregating this all into one place.

Winny - May 6, 2017

Great post, thanks for sharing. I’m looking forward to more LibreNMS articles.

sindre - May 3, 2017

This is a great tip. I have made some changes to the alert templates as well, I might include this in another post.

Krumm - May 2, 2017

$config['force_ip_to_sysname'] = true; If you are going to force sysnames, you may also want to set up your alert template to include the sysname (%sysName\r\n). That way your alerts will have the sysname and not the IP address.

cgunzelman - Aug 4, 2018

$config['snmp']['timeout'] = 10 caused my LibreNMS to stop. Maybe that directive was changed. Might want to fix or remove that one.

sindre - Sep 1, 2018

Did you include a semicolon at the end? I still have the parameter set in my running setup.