# NetSage Flow Processing Pipeline Installation Guide
This document covers installing the NetSage Flow Processing Pipeline on a new machine. The steps below should be followed in order unless you are sure you know what you are doing. This document assumes a Red Hat Linux environment or one of its derivatives.
## Data sources

The Processing pipeline needs data to ingest in order to do anything. There are two types of data that can be consumed:
- sflow or netflow
- tstat
At least one of these must be set up on a sensor to provide the incoming flow data.
Sflow and netflow data should be sent to ports on the pipeline host where nfcapd and/or sfcapd are ready to receive it.
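For reference, collectors can be started along these lines (a minimal sketch; the ports and data directories are placeholders that should match the flow-path values you configure for the Importer below):

```sh
# Placeholders: adjust ports and directories to your environment.
# Collect netflow on UDP port 9999:
nfcapd -D -p 9999 -l /path/to/netflow-data
# Collect sflow on UDP port 9998:
sfcapd -D -p 9998 -l /path/to/sflow-data
```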
Tstat data should be sent directly to the logstash input RabbitMQ queue (the same one that the Importer writes to, if it is used). From there, the data will be processed the same as sflow/netflow data.
## Installing the Prerequisites

### Installing nfdump

The NetFlow Importer uses the nfdump tools to process sflow and netflow data. If you are only collecting tstat data, you do not need nfdump.
nfdump is not listed as a dependency of the Pipeline RPM package, since in many cases people are running special builds of nfdump, but make sure you install it before you try running the Netflow Importer. If in doubt, `yum install nfdump` should work. Flow data exported by some routers requires a newer version of nfdump than the one in the CentOS repos; in these cases, it may be necessary to manually compile and install the latest nfdump.
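If the stock package is adequate for your routers, installation is simply:

```sh
sudo yum install nfdump
# confirm the version that was installed
nfdump -V
```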
### Installing RabbitMQ

The pipeline requires a RabbitMQ server. Typically, this runs on the same server as the pipeline itself, but if need be, you can separate them (for this reason, the Rabbit server is not automatically installed with the pipeline package).
Typically, the default configuration will work. Perform any desired Rabbit configuration, then start RabbitMQ:
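A minimal sketch, assuming RabbitMQ is installed from the package repos on the pipeline host:

```sh
sudo yum install rabbitmq-server
sudo systemctl enable rabbitmq-server
sudo systemctl start rabbitmq-server
```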
### Installing Logstash

See the Logstash documentation. We are currently using version 7.10.
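One common way to install it (a sketch using Elastic's standard 7.x yum repo; the Logstash documentation is authoritative):

```sh
# Import Elastic's signing key and add the 7.x repo, then install Logstash.
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
cat <<'EOF' | sudo tee /etc/yum.repos.d/logstash-7.x.repo
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install logstash
```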
### Installing the EPEL repo

Some of our dependencies come from the EPEL repo. To install this:
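On CentOS this is typically just:

```sh
sudo yum install epel-release
```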
### Installing the GlobalNOC Open Source repo

The Pipeline package (and its dependencies that are not in EPEL) is available in the GlobalNOC Open Source Repo.
For Red Hat/CentOS 6, create /etc/yum.repos.d/grnoc6.repo with the following content.
For Red Hat/CentOS 7, create /etc/yum.repos.d/grnoc7.repo with the following content.
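The repo file contents are not reproduced here; as a shape reference only, a yum repo definition looks like the following (the baseurl and gpgkey values are placeholders, not the real GlobalNOC values):

```
# Placeholder values -- use the baseurl and gpgkey provided by GlobalNOC.
[grnoc7]
name=GlobalNOC Open Source Repo - el7
baseurl=<GlobalNOC repo URL for el7>
gpgcheck=1
gpgkey=<GlobalNOC repo GPG key URL>
enabled=1
```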
The first time you install packages from the repo, you will have to accept the GlobalNOC repo key.
## Installing the Pipeline (Importer and Logstash configs)

Install it like this:
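For example (the package name shown is an assumption; check the GlobalNOC repo listing for the exact name):

```sh
# Package name is an assumption -- confirm it against the GlobalNOC repo.
sudo yum install netsage-pipeline
```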
Pipeline components:
- Flow Filter - GlobalNOC uses this for CENIC data to filter out some flows. Not needed otherwise.
- Netsage Netflow Importer - required to read nfcapd files from sflow and netflow sensors. (If using tstat flow sensors only, this is not needed.)
- Logstash - be sure the number of logstash pipeline workers in /etc/logstash/logstash.yml is set to 1 or flow stitching/aggregation will not work right!
- Logstash configs - these are executed in alphabetical order. See the Logstash doc. At a minimum, the input, output, and aggregation configs have parameters that you will need to update or confirm.
Nothing will automatically start after installation as we need to move on to configuration.
## Importer Configuration

Configuration files of interest are:
- /etc/grnoc/netsage/deidentifier/netsage_shared.xml - Shared config file allowing configuration of collections, and Rabbit connection information
- /etc/grnoc/netsage/deidentifier/netsage_netflow_importer.xml - other settings
- /etc/grnoc/netsage/deidentifier/logging.conf - logging config
- /etc/grnoc/netsage/deidentifier/logging-debug.conf - logging config with debug enabled
### Setting up the shared config file

/etc/grnoc/netsage/deidentifier/netsage_shared.xml
There used to be many perl-based pipeline components and daemons. At this point, only the importer is left, the rest having been replaced by logstash. The shared config file, which was formerly used by all the perl components, is read before reading the individual importer config file.
The most important part of the shared configuration file is the definition of collections. Each sflow or netflow sensor will have its own collection stanza. Here is one such stanza, a netflow example. Instance and router-address can be left commented out.
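A sketch of what such a stanza may look like (element names and values here are illustrative; the comments in the installed netsage_shared.xml are authoritative):

```xml
<collection>
    <!-- Directory where nfcapd/sfcapd write files for this sensor -->
    <flow-path>/path/to/netflow-data/</flow-path>
    <!-- Sensor name that will be attached to the flows -->
    <sensor>Example Sensor Name</sensor>
    <!-- netflow or sflow -->
    <flow-type>netflow</flow-type>
    <!-- Usually left commented out -->
    <!-- <instance>1</instance> -->
    <!-- <router-address>10.0.0.1</router-address> -->
</collection>
```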
Having multiple collections in one importer can sometimes cause issues for aggregation, since the importer loops through the collections one at a time, which adds to the time between flows and can affect timeouts. If this becomes a problem, you can instead set up multiple Importers with differently named shared and importer config files and separate init.d files.
There is also RabbitMQ connection information in the shared config, though queue names are set in the Importer config. (The Importer does not read from a rabbit queue, but other old components did, so both input and output are set.)
Ideally, flows should be deidentified before they leave the host on which the data is stored. If flows that have not been deidentified need to be pushed to another node for some reason, the Rabbit connection must be encrypted with SSL.
If you're running a default RabbitMQ config, which is open only to 'localhost' as guest/guest, you won't need to change anything here.
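As an illustration only (field names are assumptions; keep whatever structure the installed file uses and just fill in your values), the connection settings amount to a host, port, and credentials for the input and output connections:

```xml
<!-- Illustrative only -- actual tag names may differ. -->
<rabbit_input>
    <host>127.0.0.1</host>
    <port>5672</port>
    <username>guest</username>
    <password>guest</password>
</rabbit_input>
<rabbit_output>
    <host>127.0.0.1</host>
    <port>5672</port>
    <username>guest</username>
    <password>guest</password>
</rabbit_output>
```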
### Setting up the Importer config file

/etc/grnoc/netsage/deidentifier/netsage_netflow_importer.xml

This file has a few more settings specific to the Importer component that you may want to adjust; a sketch follows the list below.
- Rabbit_output has the name of the output queue. This should be the same as that of the logstash input queue.
- (The Importer does not actually use an input rabbit queue, so we add a "fake" one here.)
- Min-bytes is a threshold applied to flows aggregated within one nfcapd file. Flows smaller than this will be discarded.
- Min-file-age is used to be sure files are complete before being read.
- Cull-enable and cull-ttl can be used to have nfcapd files older than some number of days automatically deleted.
- Pid-file is where the pid file should be written. Be sure this matches what is used in the init.d file.
- Keep num-processes set to 1.
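A sketch of the corresponding settings (tag names and values are illustrative assumptions; follow the comments in the installed file):

```xml
<!-- Illustrative only -- actual tag names, defaults, and units may differ. -->
<rabbit_output>
    <!-- must match the logstash input queue name -->
    <queue>netsage_deidentifier_raw</queue>
</rabbit_output>
<min-bytes>100000</min-bytes>
<min-file-age>10m</min-file-age>
<cull-enable>0</cull-enable>
<cull-ttl>3</cull-ttl>
<pid-file>/var/run/netsage-netflow-importer-daemon.pid</pid-file>
<num-processes>1</num-processes>
```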
## Logstash Setup Notes

Standard logstash filter config files are provided with this package. Most should be used as-is, but the input and output configs may be modified for your use.
The aggregation filter also has settings that may need to be changed: check the two timeouts and the aggregation maps path.
When upgrading, these logstash configs will not be overwritten. Be sure any changes get copied into the production configs.
FOR FLOW STITCHING/AGGREGATION - IMPORTANT! Flow stitching (i.e., aggregation) will NOT work properly with more than ONE logstash pipeline worker! Be sure to set "pipeline.workers: 1" in /etc/logstash/logstash.yml and/or /etc/logstash/pipelines.yml. When running logstash on the command line, use "-w 1".
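For example, in /etc/logstash/logstash.yml:

```yaml
# Required for flow stitching/aggregation to work correctly
pipeline.workers: 1
```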
## Start Logstash

It will take a couple of minutes to start. Log files are normally /var/log/messages and /var/log/logstash/logstash-plain.log.
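For example:

```sh
sudo systemctl start logstash
# follow the log while it comes up
sudo tail -f /var/log/logstash/logstash-plain.log
```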
When logstash is stopped, any flows currently "in the aggregator" will be written out to /tmp/logstash-aggregation-maps (or the path/file set in 40-aggregation.conf). These will be read in and deleted when logstash is started again.
## Start the Importer

Typically, the daemons are started and stopped via init script (CentOS 6) or systemd (CentOS 7). They can also be run manually (see the examples after the list of flags). The daemons all support these flags:
- `--config [file]` - specify which config file to read
- `--sharedconfig [file]` - specify which shared config file to read
- `--logging [file]` - the logging config
- `--nofork` - run in the foreground (do not daemonize)
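For example (the service and daemon names here are assumptions; check what the package actually installed):

```sh
# Via systemd on CentOS 7 (service name is an assumption):
sudo systemctl start netsage-netflow-importer

# Or run manually in the foreground for troubleshooting (daemon name is an assumption):
netsage-netflow-importer-daemon --nofork \
    --config /etc/grnoc/netsage/deidentifier/netsage_netflow_importer.xml \
    --sharedconfig /etc/grnoc/netsage/deidentifier/netsage_shared.xml \
    --logging /etc/grnoc/netsage/deidentifier/logging-debug.conf
```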
The Importer will create a daemon process and a worker process. When stopping the service, the worker process might take a few minutes to quit. If it does not quit, kill it by hand.
## Cron jobs

Sample cron files are provided. Please review and uncomment their contents. These periodically download MaxMind, CAIDA, and Science Registry files, and also restart logstash. Logstash needs to be restarted in order for any updated files to be read in.
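For instance, the restart entry in one of the cron files might look something like this (schedule and paths are placeholders; use whatever the provided samples specify):

```
# /etc/cron.d style entry -- example only; day/time are placeholders.
# Restart logstash weekly so refreshed MaxMind/CAIDA/Science Registry files are re-read.
0 4 * * 0  root  /bin/systemctl restart logstash
```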