Installing RRFW in a large enterprise or carrier network requires special planning and design measures in order to ensure reliable and efficient operation.
Hardware planning is of great importance for large RRFW installations. It is vital to understand the potential bottlenecks and performance limits before purchasing the hardware.
First of all, you need to estimate the number of devices that you are going to monitor, with some room for future growth. It is good practice to first model the situation on a test server, and then project the results to a bigger number of network devices. The utilities that will help you assess the requirements are configinfo and schedulerinfo.
The resources for planning are the server CPU, RAM, and disks. While CPU and RAM are of great importance, it is the disk subsystem that often becomes the bottleneck.
For large installations, CPU power is one of the critical resources.
One of the CPU-intensive processes is the XML configuration compiler. A configuration for a few hundred nodes may take a few dozen minutes to compile. Some complicated configurations may require a few hours to recompile the whole datasource tree. Here CPU power literally translates into your own time when testing configuration changes or troubleshooting a problem.
The SNMP collector is quite moderate in CPU usage; still, when the number of SNMP variables reaches tens of thousands, CPU power becomes an important resource to pay attention to. In addition, the collector process initialization can be quite CPU-intensive. This happens every time the collector process starts, or when the configuration has been recompiled.
The empirical estimate made by Christian Schnidrig is that one SNMP counter collected every 5 minutes occupies approximately 1.0e-5 of an Intel Xeon 2.8GHz CPU, including the OS overhead. For example, RRFW collectors running on 60'000 counters would keep the server busy at an average of 60%.
The collector needs RAM to store all the counters' information, and of course it is undesirable to swap. In addition, the more RAM you have available for disk cache, the faster your collector can update the data files.
Each update of an RRD file consists of a number of operations: open the file, read the header, seek to the needed offset, and then write. With enough disk cache, the read operations may be served entirely from RAM, which significantly speeds up the collector running cycle.
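As an illustration, the same kind of operation can be reproduced with the rrdtool command-line utility (RRFW itself drives RRDtool through its Perl bindings; the datasource and archive definitions below are arbitrary examples):

# create a sample RRD file with a 5-minute step (definitions are arbitrary)
rrdtool create sample.rrd --step 300 \
    DS:ifInOctets:COUNTER:600:U:U \
    RRA:AVERAGE:0.5:1:2016
# one such update happens per RRD file in every collector cycle:
# open the file, read the header, seek, write the new value
rrdtool update sample.rrd N:1234567

Multiplied by tens of thousands of files every five minutes, these small reads and writes are what makes the disk cache so important.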
According to Christian Schnidrig's empirical estimations, 30 KB of RAM per counter should be enough to hold all the necessary data, including the disk cache. For example, for 60'000 counters this gives 1'757 MB, thus 2 GB of server RAM should be enough.
In addition, Apache with mod_perl occupies 20-30 MB of RAM per process, so a few hundred extra megabytes of RAM would be good to have.
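To keep that figure predictable, the number of Apache child processes can be capped. For example, with the Apache prefork model the following httpd.conf settings (the values here are purely illustrative) would keep mod_perl to roughly 15 resident processes, i.e. around 300-450 MB:

StartServers       5
MinSpareServers    3
MaxSpareServers   10
MaxClients        15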
It is not recommended to use IDE disks: they are not designed for continuous and intensive use. As experienced by Christian Schnidrig, IDE disks do not last long under such load.
It is recommended to reduce the number of RRD files by grouping the datasources. This dramatically reduces the number of read and write operations during the update process.
As noted by Rodrigo Cunha, reducing the filesystem read-ahead size may significantly optimise the disk cache usage. The RRD update process reads only a short header at the beginning of the RRD file, and the rest of the read-ahead data is never reused. On Linux, the following command would set the read-ahead size to 4 KB, which equals the i386 page size:
/sbin/hdparm -a 4 /dev/sda
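Before and after changing the setting, you may want to check the current value. On Linux, for example, either of the following commands reports it (the device name is just an example):

/sbin/hdparm -a /dev/sda
/sbin/blockdev --getra /dev/sda

Note that blockdev reports the value in 512-byte sectors, and that the setting is not persistent across reboots, so the command is usually added to a system startup script.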
For servers with tens of thousands of RRD files, it is recommended to use hashed data directories. The data directories then form a structure of 256 directories, with the hash function based on hostnames. See the RRFW SNMP Discovery User Guide for more details.
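For illustration only, the following shell sketch shows the idea of mapping a hostname onto one of 256 subdirectories (the hash function actually used by RRFW may differ, and the data directory path is hypothetical):

HOSTNAME=router1.example.net
HASH=$(printf '%02x' $(( $(printf '%s' "$HOSTNAME" | cksum | cut -d' ' -f1) % 256 )))
echo "data files for $HOSTNAME go to /srv/rrfw/collector_rrd/$HASH/"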
Spreading the data files over several physical disks also helps.
Depending on the number of trees and processes that run on a single server, you might need to increase the maximum number of filehandles that may be opened at the same time, both system-wide and per process. See the manuals for your operating system for more details.
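The exact commands and configuration files differ between operating systems. On Linux, for instance, the following show and raise the limits (the values are only examples):

ulimit -n                     # current per-process limit
ulimit -n 4096                # raise it for the current shell and its children
cat /proc/sys/fs/file-max     # current system-wide limit
sysctl -w fs.file-max=65536   # raise the system-wide limit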
When using many collector and/or HTTP processes, it is important to increase the size of the BerkeleyDB lock region. The command
db_stat -h var/db -c
would show you the current number of locks and lockers, and their maximum quantities during the database history. These parameters can be tuned by creating the file DB_CONFIG in the database home directory, which usually resides in RRFW_PREFIX/var/db/. The following settings work fine with about 20 collector processes and 5 HTTP daemon processes:
set_lk_max_lockers 6000
set_lk_max_locks 3000
After updating DB_CONFIG, stop all RRFW processes, including the Apache server, then run
db_recover -h var/db
Then start the processes again. Further information is available at:
http://www.sleepycat.com/docs/ref/env/db_config.html
http://www.sleepycat.com/docs/ref/lock/max.html
http://www.sleepycat.com/docs/api_c/env_set_lk_max_lockers.html
For large datasource trees, XML compilation may take dozens of minutes, if not hours. Other processes are not suspended during the compilation; they keep using the previous configuration version.
For debugging and testing, it is recommended to create a new tree, separate from large production trees. That would save you a lot of time and would allow you to see the result of changes quickly.
The RRFW collector has a very flexible scheduling mechanism. Each data source has its own pair of scheduler parameters: period and timeoffset. The period is usually set to the default of 300 seconds. Time is divided into even intervals; for the default 5-minute period, each hour's intervals start at 00, 05, 10, 15, etc. minutes. The timeoffset determines the moment within each interval when the data source should be collected. The default timeoffset is 10 seconds, which means that the collector process tries to collect the values at 00:00:10, 00:05:10, ..., 23:55:10 every day.
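As a small illustration of this arithmetic, the next collection moment for a datasource can be derived from the current time as follows (a shell sketch using the default values mentioned above; RRFW performs the equivalent calculation internally):

PERIOD=300
TIMEOFFSET=10
NOW=$(date +%s)
NEXT=$(( ( NOW / PERIOD + 1 ) * PERIOD + TIMEOFFSET ))
date -d "@$NEXT"              # GNU date: prints the next scheduled collection time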
Data sources with the same period and timeoffset values are grouped together. The SNMP collector works asynchronously, and it tries to send as many SNMP packets at the same time as possible. Due to the asynchronous architecture, the collector is able to perform thousands of queries at the same time with very small delay. Within the same collector process, a large number of datasources configured with the same schedule is usually not a problem.
If you configure several datasource trees, all with the same period and timeoffset values, each collector process will start flooding SNMP packets onto the network at the same moment. This may lead to packet loss and collector timeouts. In addition, all collector processes will try to update the RRD files concurrently, causing overall performance degradation. Therefore it is better to assign different timeoffset values to different trees. This may be achieved by manually specifying the collector-timeoffset parameter in the discovery configuration files.
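For example, a discovery input file for one of the trees could carry a fragment like the following (only the parameter itself is shown; the value of 40 seconds and the surrounding file structure are illustrative, see the RRFW SNMP Discovery User Guide for the exact syntax):

<param name="collector-timeoffset" value="40"/>

Another tree would then get a different value, such as 100 or 160, so that their collection bursts never coincide.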
In large installations, the collector schedules need thorough planning and tuning to ensure maximum performance and to minimize the load on the network devices' CPUs. The schedulerinfo utility is designed to help you with this planning. It shows two types of reports: the configuration report gives you an idea of how many datasources are queried at which moments in time; the runtime report gives you realtime statistics of the collector schedules, including the average and maximum running cycle, and statistics on missed or delayed cycles.
There is a feature that eases the load in large installations. With dispersed timeoffsets enabled, the timeoffset for each datasource is evenly assigned to one of the allowed values, based on the name of the host and the name of the interface. By default, these values are: 2, 40, 80, ..., 200. With thousands of datasources, this feature smooths the CPU and disk load on the RRFW server and avoids CPU usage peaks on network devices with a large number of SNMP variables per device. It is recommended to analyse the current scheduler statistics before using this feature. If you run several large datasource trees, don't forget to plan and analyse the schedules for the whole system, not just for one tree.
The following setup allows you to distribute the load among several physical servers.
Several RRFW backend servers run the collectors and store the RRD files in local storage, shared via NFS. The frontend server runs the Web interface, and possibly some monitor processes, accessing the data files over NFS.
It is possible to organize the directory structure so that each data file is seen at the same path on every server. Then you can keep identical RRFW configurations on all servers and launch the collector process on only one of them. The XML configuration files may be shared via NFS too.
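As a sketch of such a layout, assuming a backend host called backend1 exporting its data directory and the frontend mounting it at the identical path (the host names, paths, and mount options here are hypothetical):

# /etc/exports on backend1
/srv/rrfw/collector_rrd   frontend(ro,no_subtree_check)

# /etc/fstab on the frontend server, mounting at the same path
backend1:/srv/rrfw/collector_rrd   /srv/rrfw/collector_rrd   nfs   ro,soft   0 0

Read-only access is usually sufficient for the Web interface; adjust the options to your needs.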
Be aware that the BerkeleyDB database home directory cannot be NFS-mounted. See the following link for more details: http://www.sleepycat.com/docs/ref/env/remote.html
Backend servers may run near the limits of their system capacity: 70-80% CPU usage should not be a problem. For the frontend machine, it is preferable that at least 50% of the average CPU time is idle.
Copyright (c) 2004 Stanislav Sinyagin <ssinyagin@yahoo.com>
Copyright (c) 2004 Christian Schnidrig <christian.schnidrig@bluewin.ch>