Version: 1.2.6

Pipeline Logstash


The Logstash config files are in /etc/logstash/conf.d/.

Logstash Sequence#

The following steps are defined for logstash:


Reads flows from a rabbitmq queue. (".disabled" can be removed from other input configs to get flows from other sources.)


Drops flows to or from private IP addresses; adds @ingest_time (mainly for developers); converts any timestamps in milliseconds to seconds; drops events with timestamps more than a year in the past or more than 10 seconds in the future; does some data type conversions.
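The checks in this step can be sketched in Python as follows. This is only an illustration, not the pipeline's actual Logstash code: the field names (src_ip, dst_ip, start, end), the MS_THRESHOLD cutoff used to recognize millisecond timestamps, and the use of Python's is_private (which covers more than the RFC 1918 ranges) are all assumptions.

```python
import ipaddress
import time

MS_THRESHOLD = 1e11          # hypothetical cutoff: larger values are treated as milliseconds
MAX_AGE_S = 365 * 24 * 3600  # "more than a year in the past"
MAX_FUTURE_S = 10            # "10 sec in the future"


def to_seconds(ts):
    """Convert an epoch timestamp to seconds if it looks like milliseconds."""
    return ts / 1000.0 if ts > MS_THRESHOLD else float(ts)


def keep_flow(flow, now=None):
    """Return True if the flow passes the preliminary checks, False if it is dropped."""
    now = time.time() if now is None else now
    # Drop flows to or from private addresses (is_private is broader than RFC 1918).
    for key in ("src_ip", "dst_ip"):
        if ipaddress.ip_address(flow[key]).is_private:
            return False
    # Drop flows whose timestamps fall outside the accepted window.
    for key in ("start", "end"):
        ts = to_seconds(flow[key])
        if ts < now - MAX_AGE_S or ts > now + MAX_FUTURE_S:
            return False
    return True
```

Passing now explicitly makes the time window easy to test; in normal use it defaults to the current time.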


Adds a unique id based on the 5-tuple of the flow (src and dst IPs and ports, and protocol) plus the sensor name. This ends up being called meta.id.
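One way to derive such an id is to hash the 5-tuple together with the sensor name. The sketch below assumes SHA-256, a "|" separator, and a particular field order; none of these choices are confirmed by this page.

```python
import hashlib


def flow_id(src_ip, dst_ip, src_port, dst_port, protocol, sensor):
    """Return a deterministic hex id for a flow (assumed scheme: SHA-256 of the joined fields)."""
    parts = (src_ip, dst_ip, src_port, dst_port, protocol, sensor)
    key = "|".join(str(p) for p in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because the id depends only on the 5-tuple and the sensor, pieces of the same flow that arrive in different files get the same id, which is what makes later stitching possible.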


Stitches together flows from different nfcapd files into longer flows, matching them up by meta.id and using a specified inactivity_timeout to decide when to start a new flow.

Notes: By default, 5-minute nfcapd files are assumed, and if less than 10.5 min have passed between the start of the current flow and the start of the last matching one, stitch the two together.

Your Logstash pipeline must have only one worker (pipeline.workers set to 1), or aggregation will not work!
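The stitching rule described in the notes can be sketched in Python as below. It is a simplified model, not the Logstash aggregate configuration the pipeline actually uses: the dictionary keys (id, start, end, num_bits, num_packets) are assumptions, and the input is processed in one batch sorted by start time.

```python
INACTIVITY_TIMEOUT_S = 10.5 * 60  # 10.5 minutes, the default described above


def stitch(flows, timeout=INACTIVITY_TIMEOUT_S):
    """Merge flow pieces that share an id and start within `timeout` of the previous piece."""
    open_flows = {}  # flow id -> stitched flow that can still accept pieces
    finished = []
    for piece in sorted(flows, key=lambda f: f["start"]):
        current = open_flows.get(piece["id"])
        if current is not None and piece["start"] - current["last_start"] < timeout:
            # Close enough to the last matching piece: extend the stitched flow.
            current["end"] = max(current["end"], piece["end"])
            current["num_bits"] += piece["num_bits"]
            current["num_packets"] += piece["num_packets"]
            current["stitched_flows"] += 1
            current["last_start"] = piece["start"]
        else:
            # Too long since the last matching piece (or no match): start a new flow.
            if current is not None:
                finished.append(current)
            open_flows[piece["id"]] = dict(piece, last_start=piece["start"], stitched_flows=1)
    finished.extend(open_flows.values())
    return finished
```

The shared open_flows map is also a reasonable way to picture why a single worker is required: pieces of one flow handled by different workers would not see each other's state.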


If the destination IP is in the multicast range, sets the destination Organization, Country, and Continent to "Multicast"; queries the MaxMind GeoLite2-City database by IP to get src and dst Countries, Continents, Latitudes, and Longitudes.
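The multicast part of this step can be illustrated with Python's standard ipaddress module. The flat field names below (dst_ip, dst_organization, dst_country_name, dst_continent) are hypothetical stand-ins for the pipeline's fields, and the MaxMind lookup itself is left out.

```python
import ipaddress


def tag_multicast(flow):
    """Label the destination as "Multicast" when its IP is in the multicast range."""
    if ipaddress.ip_address(flow["dst_ip"]).is_multicast:
        flow["dst_organization"] = "Multicast"
        flow["dst_country_name"] = "Multicast"
        flow["dst_continent"] = "Multicast"
    return flow
```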


Normally, flows come in with source and destination ASNs. If there is no ASN in the input event, or the input ASN is 0, 4294967295, 23456, or a private ASN, tries to get an ASN by IP from the MaxMind ASN database. Sets the ASN to -1 if it is unavailable for any reason.
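The fallback logic can be sketched as follows. The private ranges used here are the ones reserved by RFC 6996 (64512 to 65534 and 4200000000 to 4294967294); whether the pipeline uses exactly these ranges is an assumption. lookup_asn_by_ip is a hypothetical callable standing in for the MaxMind ASN database query.

```python
UNUSABLE_ASNS = {0, 23456, 4294967295}  # values listed in the step description


def is_private_asn(asn):
    """True for ASNs in the RFC 6996 private-use ranges (assumed definition of "private")."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def resolve_asn(flow_asn, ip, lookup_asn_by_ip):
    """Keep a usable ASN from the flow; otherwise look it up by IP; return -1 on failure."""
    if flow_asn is None or flow_asn in UNUSABLE_ASNS or is_private_asn(flow_asn):
        try:
            flow_asn = lookup_asn_by_ip(ip)  # hypothetical stand-in for the database lookup
        except Exception:
            flow_asn = None
    return -1 if flow_asn is None else flow_asn
```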


Uses the ASN determined previously to get the organization name from the prepared CAIDA lookup file.


Searches (optional) lookup tables by IP to obtain member or customer organization names and overwrites the Organization determined previously. This allows entities that don't own their own ASs to be listed as the src or dst Organization.

Notes: These lookup tables are not stored in GitHub.


Uses a fake geoip database containing Science Registry information to tag the flows with source and destination science disciplines and roles, organizations and locations, etc.; removes scireg fields we don't need to save to Elasticsearch.

Notes: The Science Registry fake geoip database can be downloaded via wget in a cron job.


Replaces the last octet of IPv4 addresses and the last 4 hextets of IPv6 addresses with x's in order to deidentify them.
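A minimal Python sketch of this masking is shown below. The IPv4 result matches the 171.64.68.x style used in the field examples on this page; the IPv6 output format (fully expanded hextets followed by four x's) is an assumption.

```python
import ipaddress


def deidentify(ip):
    """Mask the last octet of an IPv4 address or the last 4 hextets of an IPv6 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        octets = str(addr).split(".")
        return ".".join(octets[:3] + ["x"])
    # Expand "::" shorthand so there are always 8 hextets to work with.
    hextets = addr.exploded.split(":")
    return ":".join(hextets[:4] + ["x"] * 4)
```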

Removes information about Australian organizations (or, with modification, any country that has privacy rules that require us not to identify organizations). If the ASN is one of those listed, completely replaces the IP with x's, sets the location to central Australia, sets all organizations to "AARNet", and removes all Projects.


Copies Science Registry organization and location values, if they exist, to the preferred_organization and preferred_location fields.


Sets additional quick and easy fields. Currently we have:

sensor_group = TACC, AMPATH, etc. (based on matching sensor names to regexes)
sensor_type = Circuit, Archive, Exchange Point, or Regional Network (based on matching sensor names to regexes)
country_scope = Domestic, International, or Mixed (based on src and dst countries, where Domestic = US, Puerto Rico, or Guam)
is_network_testing = yes, no (yes if discipline = CS.Network Testing and Monitoring or port = 5001, 5101, or 5201)
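Two of these fields are simple enough to sketch in Python. Several details are assumptions rather than facts from this page: the country name strings, the rule that "International" means neither endpoint is domestic (with "Mixed" for one of each), and that the port check applies to either the source or the destination port.

```python
DOMESTIC = {"United States", "Puerto Rico", "Guam"}  # assumed GeoIP country names
TEST_PORTS = {5001, 5101, 5201}
TEST_DISCIPLINE = "CS.Network Testing and Monitoring"


def country_scope(src_country, dst_country):
    """Classify a flow as Domestic, International, or Mixed from its endpoint countries."""
    src_domestic = src_country in DOMESTIC
    dst_domestic = dst_country in DOMESTIC
    if src_domestic and dst_domestic:
        return "Domestic"
    if not src_domestic and not dst_domestic:
        return "International"
    return "Mixed"


def is_network_testing(src_port, dst_port, disciplines=()):
    """Return "yes" if a discipline or port marks the flow as network testing, else "no"."""
    if TEST_DISCIPLINE in disciplines:
        return "yes"
    return "yes" if {src_port, dst_port} & TEST_PORTS else "no"
```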


Does small miscellaneous tasks at the end, such as renaming, removing, or converting fields.


Adds @exit_time and @processing_time (these are mainly for developers)


Sends results to a rabbitmq queue (".disabled" can be removed from other output configs to send flows to other places)

Final Stage#

In our case, OmniSOC manages the last stage. Their Logstash instance reads flows from the netsage_archive_input queue and sends them to Elasticsearch. The indices are named like om-ns-netsage- (or om-ns-ilight-, etc.).

This can be easily replicated with the following configuration, though you'll need one for each feed/index.

Naturally, the hosts for RabbitMQ and Elasticsearch will need to be updated accordingly.

input {
  rabbitmq {
    host => 'localhost'
    user => 'guest'
    password => "${rabbitmq_pass}"
    exchange => ''
    key => 'XXXXXXX'
    queue => 'netsage'
    durable => true
    subscription_retry_interval_seconds => 5
    connection_timeout => 10000
  }
}

filter {
  # Use the RabbitMQ message timestamp, when present, as the event time.
  if [@metadata][rabbitmq_properties][timestamp] {
    date {
      match => ["[@metadata][rabbitmq_properties][timestamp]", "UNIX"]
    }
  }
}

output {
  elasticsearch {
    # Set hosts to your Elasticsearch nodes; the entries are omitted here.
    hosts => []
    user => "logstash"
    password => "${logstash_elasticsearch_password}"
    cacert => "/etc/logstash/ca.crt"
    index => "om-ns-netsage"
    template_overwrite => true
    failure_type_logging_whitelist => []
    action => "index"
    #ssl_certificate_verification => false
  }
}

Once the data is published in Elasticsearch, you can use the Grafana dashboard to visualize it.

Elasticsearch Fields#

ES fields#

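| name | example | description |
| --- | --- | --- |
| _index | om-ns-netsage-2020.06. | equivalent to an sql table |
| _type | _doc | set by ES |
| _id | HRkcm3IByJ9fEnbnCpaY | document id, set by ES |
| _score | 1 | set by ES query |
| @version | 1 | set by ES |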

Developer fields#

| name | example | description |
| --- | --- | --- |
| type | flow | Always "flow" for us. Other types may be "macy", etc. |
| @ingest_time | 2020-06-09T21:51:57.059Z | Essentially the time the flow went into the logstash pipeline (10-preliminaries.conf for tstat flows) or the time stitching of the flow commenced (40-aggregation.conf for others) |
| @timestamp | Jun 9, 2020 @ 18:03:21.703 | The time the flow went into the logstash pipeline for tstat flows, or the time stitching finished and the event was pushed for other flows. |
| @exit_time | Jun 9, 2020 @ 18:03:25.369 | The time the flow exited the pipeline (99-outputs.conf) |
| @processing_time | 688.31 | @exit_time minus @ingest_time. Useful for seeing how long stitching took. |
| stitched_flows | 1 | Number of flows stitched together to make this final one. 0 for tstat flows, which are always complete. 1 if no flows were stitched together. |

Flow fields#

| name | example | description |
| --- | --- | --- |
| start | Jun 9, 2020 @ 17:39:53.808 | Start time of the flow (first packet seen) |
| end | Jun 9, 2020 @ 17:39:57.699 | End time of the flow (last packet seen) |
| meta.protocol | tcp | Protocol used |
| meta.id | a17c4f05420d7ded9eb151ccd293a633 ff226d1752b24e0f4139a87a8b26d779 | Assigned flow id |
| meta.flow_type | sflow | Sflow, Netflow, or Tstat |
| meta.sensor_id | snvl2-pw-sw-1-mgmt-2.cenic.net | Sensor name (set in importer config, may not always be a hostname) |
| meta.sensor_group | CENIC | Assigned sensor group |
| meta.sensor_type | Regional Network | Assigned sensor type |
| meta.country_scope | Domestic | Domestic, International, or Mixed, depending on countries of src and dst |

Source Fields (Destination Fields similarly)#

| name | example | description |
| --- | --- | --- |
| meta.src_ip | 171.64.68.x | deidentified IP address |
| meta.src_port | 80 | port used |
| meta.src_asn | 32 | ASN of the IP from geoip ASN database or the ASN from the flow header |
| meta.src_location.lat | 37.423 | latitude of IP from geoip database |
| meta.src_location.lon | -122.164 | longitude of IP from geoip database |
| meta.src_country_name | United States | country of IP from geoip database |
| meta.src_continent | North America | continent of IP from geoip database |
| meta.src_organization | Stanford University | organization that owns the AS of the IP from geoip ASN database |

Source Science Registry Fields (Destination Fields similarly)#

| name | example | description |
| --- | --- | --- |
| meta.scireg.src.resource | Stanford - ImageNet | Resource name from SciReg |
| meta.scireg.src.resource_abbr | - | Resource abbreviation (if any) |
| meta.scireg.src.discipline | CS. Intelligent Systems | The science discipline that uses the resource (i.e., host). Note that the src may not have the same discipline as the dst. |
| meta.scireg.src.role | Storage | Role that the host plays |
| meta.scireg.src.org_name | Stanford University | The organization that manages and/or uses the resource, as listed in the Science Registry |
| meta.scireg.src.org_abbr | Stanford | A shorter name for the organization. May not be the official abbreviation. |
| meta.scireg.src.projects | . | Can be an array of projects [we may change this field name soon] |
| meta.scireg.src.latitude | 37.4178 | Resource's location, as listed in the Science Registry |

"Preferred" fields#

| name | example | description |
| --- | --- | --- |
| meta.src_preferred_org | Stanford University | If the IP was found in the Science Registry, these are the SciReg values. |
| meta.src_preferred_location.lat | 37.417800 | Otherwise, they are the geoip values. |


| name | example | description |
| --- | --- | --- |
| values.num_bits | 939,458,560 | Sum of the number of bits in all the stitched flows |
| values.num_packets | 77,824 | Sum of the number of packets in all the stitched flows |
| values.duration | 3.891 | Calculated as end minus start |
| values.bits_per_second | 241,443,988 | Calculated as num_bits divided by duration |
| values.packets_per_second | 20,001 | Calculated as num_packets divided by duration |
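The derived values can be reproduced with a few lines of Python. This sketch makes two assumptions that this page does not confirm: rates are truncated to integers (which happens to reproduce the example values above), and a zero or negative duration yields rates of 0 rather than an error.

```python
def flow_rates(start, end, num_bits, num_packets):
    """Compute duration and per-second rates from flow totals (times in seconds)."""
    duration = round(end - start, 3)  # millisecond precision, as in the examples
    if duration <= 0:
        # Assumed behavior: avoid dividing by zero for instantaneous flows.
        return {"duration": duration, "bits_per_second": 0, "packets_per_second": 0}
    return {
        "duration": duration,
        "bits_per_second": int(num_bits / duration),
        "packets_per_second": int(num_packets / duration),
    }
```

For the example flow, start and end are 17:39:53.808 and 17:39:57.699, so calling flow_rates(53.808, 57.699, 939458560, 77824) gives a duration of 3.891 seconds.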

Tstat Values#
