# Docker Installation Guide
In this deployment guide, you will learn how to deploy a basic Netsage setup that includes one sflow and/or one netflow collector. If you have more than one collector of either type, or other special situations, see the Docker Advanced guide.
The Docker containers included in the installation are:
- rabbit (the local RabbitMQ server)
- sflow-collector (receives sflow data and writes nfcapd files)
- netflow-collector (receives netflow data and writes nfcapd files)
- importer (reads nfcapd files and puts flows into a local rabbit queue)
- logstash (logstash pipeline that processes flows and sends them to their final destination, by default a local rabbit queue)
- ofelia (cron-like downloading of files used by the logstash pipeline)
The code and configs for the importer and logstash pipeline can be viewed in the netsage-project/netsage-pipeline github repo. See netsage-project/docker-nfdump-collector for code related to the collectors.
## 1. Set up Data Sources
The data processing pipeline needs data to ingest in order to do anything, of course. There are three types of data that can be consumed.
- sflow
- netflow
- tstat
At least one of these must be set up on a sensor (flow exporter/router) to provide the incoming flow data. You can do this step later, but it will be helpful to have it working first.
Sflow and netflow data should be exported to the pipeline host where there are collectors (nfcapd and/or sfcapd processes) ready to receive it (see below). To use the default settings, send sflow to port 9998 and netflow to port 9999. On the pipeline host, allow incoming traffic from the flow exporters, of course.
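For example, on a host using firewalld (an assumption; adapt this to whatever firewall your pipeline host actually runs), opening the default collector ports might look like this sketch:

```sh
# Open the default sflow and netflow collector ports to incoming UDP traffic.
# Assumes firewalld; with iptables/ufw the commands differ, and you may want to
# restrict the rules to your flow exporters' source addresses.
sudo firewall-cmd --permanent --add-port=9998/udp   # sflow
sudo firewall-cmd --permanent --add-port=9999/udp   # netflow
sudo firewall-cmd --reload
```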
Tstat data should be sent directly to the logstash input rabbit queue "netsage_deidentifier_raw" on the pipeline host. No collector is needed for tstat data. See the netsage-project/tstat-transport repo. From there, logstash will grab the data and process it the same way as it processes sflow/netflow data. (See the Docker Advanced guide.)
## 2. Clone the Netsage Pipeline Project
If you haven't already, install Docker and Docker Compose and clone this project.
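A typical clone looks like this, assuming the netsage-project/netsage-pipeline GitHub repo mentioned above is the one you want:

```sh
# Clone the pipeline repo and move into it
git clone https://github.com/netsage-project/netsage-pipeline.git
cd netsage-pipeline
```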
(If you are upgrading to a new release, see the Upgrade section below!)
Then check out the right version of the code.
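For example, leaving the placeholder as-is until you pick a release:

```sh
# Check out the release you want to run; replace {tag} with an actual release tag
git checkout {tag}
```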
Replace "{tag}" with the release version you intend to use, e.g., "v1.2.8". ("Master" is the development version and is not intended for general use!)
Running git status will confirm which branch you are on, e.g., master or v1.2.8.
## 3. Create docker-compose.override.yml
Information in the docker-compose.yml file tells Docker which containers (processes) to run and sets various parameters for them.
Settings in the docker-compose.override.yml file will overrule and add to those. Note that docker-compose.yml should not be edited, since upgrades will replace it. Put all customizations in the override file, since override files will not be overwritten.
Collector settings may need to be edited by the user, so the information that Docker uses to run the collectors is specified (only) in the override file. Therefore, docker-compose.override_example.yml must always be copied to docker-compose.override.yml.
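For example, from the top-level directory of the cloned repo (assuming the example file keeps this name in your release):

```sh
# Copy the example override file; put all customizations in the copy
cp docker-compose.override_example.yml docker-compose.override.yml
```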
By default, Docker will bring up a single netflow collector and a single sflow collector. If this matches your case, you don't need to make any changes to docker-compose.override.yml.
note
If you only have one collector, you should remove or comment out the section for the collector that is not used, so it doesn't run and just create empty files.
This file also specifies port numbers and directories for nfcapd files. By default, the sflow collector will listen for udp traffic on localhost:9998, the netflow collector will listen on port 9999, and data will be written to /data/input_data/. Each collector is namespaced by its type, so the sflow collector will write data to /data/input_data/sflow/ and the netflow collector will write data to /data/input_data/netflow/. Change these only if required.
You can ignore the other lines in this file for now.
note
If you run into issues, try removing all the comments in the override file, as they may conflict with the parsing done by docker-compose.
## 4. Create Environment File
Please copy env.example to .env, then edit the .env file to set the sensor names.
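A minimal sketch of this step, assuming you are still in the top-level netsage-pipeline directory and using nano as the editor:

```sh
# Create your local environment file from the provided example
cp env.example .env
# Edit it to set the sensor names (any editor will do)
nano .env
```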
Simply change the names to unique identifiers (with spaces or not, no quotes) and you're good to go.
note
These names uniquely identify the source of the data and will be shown in the Grafana dashboards. In Elasticsearch, they are saved in the meta.sensor_id field. Choose names that are meaningful and unique.
For example, your sensor names might be "RNDNet New York Sflow" and "RNDNet Boston Netflow" or "RNDNet New York - London 1" and "RNDNet New York - London 2". Whatever makes sense in your situation.
- If you don't set a sensor name, the default docker hostname, which changes each time you run the pipeline, will be used.
- If you have only one collector, remove or comment out the line for the one you are not using.
- If you have more than one of the same type of collector, see the "Docker Advanced" documentation.
Other settings of note in this file include the following. You will not necessarily need to change these, but be aware of them.
rabbit_output_host: this defines where the final data will land after going through the pipeline. By default, the last rabbit queue will be on rabbit, i.e., the local RabbitMQ server running in its Docker container. Enter a hostname to send the data to a remote RabbitMQ server (along with the correct username, password, and queue key/name). (For NetSage, another logstash pipeline on a remote server moves flows from this final rabbit queue into Elasticsearch.)
The following Logstash Aggregation Filter settings are exposed in case you wish to use different values. (See comments in the *-aggregation.conf file.) The aggregation filter stitches together long-lasting flows that are seen in multiple nfcapd files, matching by the 5-tuple (source and destination IPs, ports, and protocol) plus sensor name.
Aggregation_maps_path: the name of the file to which logstash will write in-progress aggregation data when logstash shuts down. When logstash starts up again, it will read this file in and resume aggregating. The filename is configurable for complex situations, but it must be located under /data/.
Inactivity_timeout: If more than inactivity_timeout seconds have passed between the 'start' of a flow and the 'start' of the LAST matching flow, OR if no matching flow has come in for inactivity_timeout seconds on the clock, assume the flow has ended.
note
Nfcapd files are typically written every 5 minutes. Netsage uses an inactivity_timeout of 630 sec (10.5 min) for 5-min files and 960 sec (16 min) for 15-min files. (For 5-min files, this allows one 5-minute gap, or one period during which the number of bits transferred doesn't meet the cutoff.)
max_flow_timeout: If a long-lasting flow is still aggregating when this timeout is reached, arbitrarily cut it off and start a new flow. The default is 24 hours.
## 5. Choose Pipeline Version
Once you've created the docker-compose.override.yml file and finished adjusting it for any customizations, you're ready to select which version Docker should run.
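The version is selected with a helper script shipped in the repo; the exact name below is an assumption, so check the scripts/ directory of your checkout if it differs:

```sh
# Run the version-selection helper (script name assumed; see the scripts/ directory)
./scripts/docker_select_version.sh
```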
When prompted, select the same version you checked out earlier. This script will replace the version numbers of docker images in the docker-compose files with the correct values.
## Running the Collectors
After selecting the version to run, you could start the two flow collectors by themselves by running the following line. If you only need one of the collectors, remove the other from this command.
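A sketch of that command, assuming the compose service names match those listed at the top of this guide (sflow-collector and netflow-collector):

```sh
# Start only the two flow collectors, in the background
docker-compose up -d sflow-collector netflow-collector
```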
(Or see the next section for how to start all the containers, including the collectors.)
If the collector(s) are running properly, you should see nfcapd files in subdirectories of data/input_data/, and they should have sizes of more than a few hundred bytes. (See Troubleshooting if you have problems.)
## Running the Collectors and Pipeline
Start up the pipeline (all containers) using the command below.
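A sketch with the classic docker-compose CLI (substitute "docker compose" if your installation uses the newer plugin form):

```sh
# Start all containers (rabbit, collectors, importer, logstash, ofelia) in the background
docker-compose up -d
```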
This will also restart any containers/processes that have died. "-d" runs containers in the background.
You can see the status of the containers and whether any have died (exited) using the command below.
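For example, with the same docker-compose CLI:

```sh
# List each container and its current state
docker-compose ps
```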
To check the logs for each of the containers, run the command below.
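For example, for all containers or for a single one such as logstash:

```sh
# Show logs for all containers
docker-compose logs
# Or for just one container, e.g. logstash
docker-compose logs logstash
```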
Add -f or, e.g., -f logstash to see new log messages as they arrive. --timestamps, --tail, and --since are also useful; look up details in the Docker documentation.
To shut down the pipeline (all containers), use the command below.
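For example:

```sh
# Stop and remove all of the pipeline's containers
docker-compose down
```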