Version: 1.2.7

Docker Default Installation Guide

In this deployment guide, you will learn how to deploy docker-based sflow and netflow collectors (see https://github.com/netsage-project/docker-nfdump-collector) and a basic docker flow processing pipeline. The collectors will save incoming flow data to disk, while the pipeline Importer will read it and the pipeline Logstash filters will process it. Without any modification, 1 sflow collector, 1 netflow collector, and a flow processing pipeline will run. If you need only 1 collector, this guide will show you how to disable the unnecessary one. If you need 2 or more collectors of the same type, please read "Docker Advanced" after reading through this guide.

First#

If you haven't already, install Docker/compose and clone this project from github (https://github.com/netsage-project/netsage-pipeline.git).
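
For example, assuming you clone into a directory named after the repository (the docker and docker-compose installation steps are distro-specific and not shown here):

git clone https://github.com/netsage-project/netsage-pipeline.git
cd netsage-pipeline

The commands in the rest of this guide are run from inside this project directory.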

Docker-compose.override.yml#

The pattern for running the Pipeline, with docker-based collectors, is defined in the docker-compose.override_example.yml. Copy this to docker-compose.override.yml.

cp docker-compose.override_example.yml docker-compose.override.yml

By default this will bring up a single netflow collector and a single sflow collector. For most people this is more than enough. If you're sticking to the defaults, you don't need to make any further changes to docker-compose.override.yml.

note

You may need to remove all the comments in the override file, as they may conflict with the parsing done by docker-compose.
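
If you prefer to strip whole-line comments in bulk rather than by hand, something like the following sed one-liner should work (GNU sed shown; it removes whole-line comments only, so review the file afterwards):

sed -i '/^\s*#/d' docker-compose.override.yml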

note

If you are only interested in netflow OR sflow data, you should remove the section for the collector that is not used.

This file also specifies port numbers and directories for nfcapd files. By default, the sflow collector will listen for UDP traffic on localhost:9998, the netflow collector will listen on port 9999, and data will be written to /data/input_data/. Each collector is namespaced by its type, so the sflow collector will write data to /data/input_data/sflow/ and the netflow collector to /data/input_data/netflow/.
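
As a rough illustration of what those settings look like in the override file (service names, image tags, and exact options vary between releases, so treat this as a sketch and check your copy of docker-compose.override.yml for the real values):

services:
  sflow-collector:
    ports:
      - "9998:9998/udp"                  # sflow arrives on UDP port 9998
    volumes:
      - /data/input_data/sflow:/data     # nfcapd files are written here
  netflow-collector:
    ports:
      - "9999:9999/udp"                  # netflow arrives on UDP port 9999
    volumes:
      - /data/input_data/netflow:/data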

Environment File#

Please copy env.example to .env

cp env.example .env

then edit the .env file to set the sensor names

sflowSensorName="my sflow sensor name"
netflowSensorName="my netflow sensor name"

Simply change the names to unique identifiers and you're good to go. (Use quotes if the names have spaces.)

note

These names uniquely identify the source of the data. In elasticsearch, they are saved in the meta.sensor_id field and can be used in visualizations. Choose names that are meaningful and unique. For example, your sensor names might be "RNDNet Sflow" and "RNDNet Netflow", or "rtr.one.rndnet.edu" and "rtr.two.rndnet.edu". Whatever makes sense in your situation.

  • If you don't set a sensor name, the default docker hostname, which changes each time you run the pipeline, will be used.
  • If you have only one collector, comment out the line for the one you are not using.
  • If you have more than one of the same type of collector, see the "Docker Advanced" documentation.
  • If you're not using a netflow or an sflow collector (you are getting only tstat data), then simply disregard the env settings and don't start up either collector.

Other settings of note in this file include the following. You will not necessarily need to change these, but be aware of them.

rabbit_output_host: this defines where the final data will land after going through the pipeline. Use the default value rabbit for the local RabbitMQ server running in its docker container. Enter a hostname to send the data to a remote RabbitMQ server (and also set the correct username, password, and queue key/name).
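
For instance, the relevant lines in .env might look roughly like this (the remote-host line is a hypothetical placeholder; the companion username, password, and queue variables are listed in env.example):

# default: the local dockerized RabbitMQ
rabbit_output_host=rabbit
# or, to send to a remote RabbitMQ server (also set its username, password, and queue key/name):
#rabbit_output_host=rabbit.example.org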

The Logstash Aggregation Filter settings are exposed in case you wish to use different values. (See comments in the *-aggregation.conf file.) This config stitches together long-lasting flows that are seen in multiple nfcapd files, matching by the 5-tuple (source and destination IPs, ports, and protocol) plus sensor name.

aggregation_maps_path: the name of the file to which logstash will write in-progress aggregation data when logstash shuts down. When logstash starts up again, it will read this file back in and resume aggregating. The filename is configurable for complex situations, but the path must be under /data/.

inactivity_timeout: if more than inactivity_timeout seconds have passed between the 'start' of a flow and the 'start' of the LAST matching flow, OR if no matching flow has come in for inactivity_timeout seconds on the clock, assume the flow has ended.

note

Nfcapd files are typically written every 5 minutes. Netsage uses an inactivity_timeout of 630 sec = 10.5 min for 5-min files, and 960 sec = 16 min for 15-min files. (For 5-min files, this allows one 5-min gap, or a period during which the number of bits transferred doesn't meet the cutoff.)

max_flow_timeout: If a long-lasting flow is still aggregating when this timeout is reached, arbitrarily cut it off and start a new flow. The default is 24 hours.
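
Put together, the aggregation-related settings in .env might look something like the excerpt below. The maps filename is illustrative and the exact variable names should be confirmed against env.example; the numeric values are the defaults described above.

aggregation_maps_path=/data/logstash-aggregation-maps   # must be under /data/
inactivity_timeout=630                                   # 10.5 min, for 5-minute nfcapd files
max_flow_timeout=86400                                   # 24 hours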

Pipeline Version#

Once you've created docker-compose.override.yml and finished adjusting it for any customizations, you're ready to select your version.

git fetch
git checkout "tag name"
./scripts/docker_select_version.sh

Replace "tag name" with the version you intend to use, e.g., "v1.2.5". Select the same version when prompted by docker_select_version.sh.

Running the Collectors#

After selecting the version to run, you can start the two flow collectors by running the following line. If you only need one of the collectors, remove the other from this command.

docker-compose up -d sflow-collector netflow-collector
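
To confirm the collectors came up and are receiving data, you can check their status and watch for nfcapd files appearing in the data directories described above (a new file roughly every 5 minutes):

docker-compose ps                                        # both collectors should show as "Up"
ls -l /data/input_data/sflow/ /data/input_data/netflow/  # nfcapd files should start appearing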

Running the Pipeline#

Start up the pipeline (importers and logstash) using:

docker-compose up -d

You can check the logs for each of the containers by running

docker-compose logs
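
To follow the output of a single container instead, name the service (the service name used here is an example; list the real ones with docker-compose ps):

docker-compose logs -f logstash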

Shut down the pipeline using:

docker-compose down

Data sources#

The data processing pipeline needs data to ingest in order to do anything, of course. There are three types of data that can be consumed.

  • sflow
  • netflow
  • tstat

At least one of these must be set up on a sensor to provide the incoming flow data.

Sflow and netflow data should be exported to the pipeline host where nfcapd and/or sfcapd collectors are ready to receive it.
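
If flows don't seem to be arriving, one quick sanity check is to watch the collector ports on the pipeline host (9998 for sflow and 9999 for netflow, as configured above), for example:

sudo tcpdump -i any -c 10 udp port 9998   # sflow packets from your exporter
sudo tcpdump -i any -c 10 udp port 9999   # netflow packets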

Tstat data should be sent directly to the logstash input RabbitMQ queue (the same one that the Importer writes to, if it is used). From there, the data will be processed the same as sflow/netflow data. (See the Docker Advanced guide.)

Upgrading#

Update Source Code

To do a Pipeline upgrade, just reset and pull changes, including the new release, from github. Your non-example env and override files will not be overwritten, but check the new example files to see if there are any updates to copy in.

git reset --hard
git pull origin master

Docker and Collectors

Since the collectors live outside of version control, please check the docker-compose.override_example.yml to see if nfdump needs to be updated (e.g., image: netsage/nfdump-collector:1.6.18). Also check the compose file version (e.g., version: "3.7") to see if you'll need to upgrade docker.
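
A quick way to compare is to look at the image and version lines in the new example file alongside the versions you have installed, for example:

grep -E "image:|version:" docker-compose.override_example.yml
docker --version
docker-compose --version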

Select Release Version

Run these two commands to select the new release you want to run. In the first, replace "tag_value" with the version to run (e.g., v1.2.8). When asked by the second, select the same version as the tag you checked out.

git checkout "tag_value"
./scripts/docker_select_version.sh

Update docker containers

This applies to both development and release versions.

docker-compose pull
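
The newly pulled images take effect the next time the containers are (re)created, e.g. with:

docker-compose up -d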