README.md
author Mahlon E. Smith <mahlon@martini.nu>
Tue, 26 Jun 2018 09:47:23 -0700
changeset 15 ed87882bb7f0
Lowercase all hostnames before sending to the database.


Netdata-TSRelay
===============

What's this?
------------

This program accepts JSON streams from
[Netdata](https://my-netdata.io/) clients and writes the metrics to a
PostgreSQL table. It is intended for [Timescale](http://timescale.com)
backed tables, though that's not technically a requirement.
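
As a rough illustration of the job described above, here is a Python
sketch that folds a netdata JSON stream into one row per host and
timestamp. It is not the relay's actual implementation (that is written
in Nim). The field names (`hostname`, `chart_id`, `id`, `value`,
`timestamp`) are assumed to match what netdata's JSON backend emits, and
both the `chart_id.id` key format and the `rows_from_stream` helper are
hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def rows_from_stream(lines):
    """Group netdata JSON samples into (time, host, metrics-json) rows."""
    grouped = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sample = json.loads(line)
        # Hostnames are lowercased before they reach the database.
        host = sample["hostname"].lower()
        # Assumed key format: "<chart_id>.<dimension id>".
        metric = f'{sample["chart_id"]}.{sample["id"]}'
        grouped[(sample["timestamp"], host)][metric] = sample["value"]

    return [
        (datetime.fromtimestamp(ts, tz=timezone.utc), host,
         json.dumps(metrics, sort_keys=True))
        for (ts, host), metrics in sorted(grouped.items())
    ]


# Two samples for the same host and second collapse into a single row.
stream = [
    '{"hostname": "Web01", "chart_id": "system.cpu", "id": "user",'
    ' "value": 1.5, "timestamp": 1530031643}',
    '{"hostname": "Web01", "chart_id": "system.cpu", "id": "system",'
    ' "value": 0.5, "timestamp": 1530031643}',
]
for row in rows_from_stream(stream):
    print(row)
```

Each tuple lines up with the `time`, `host` and `metrics` columns of the
table created under "Database setup".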


Installation
------------

You'll need a working [Nim](http://nim-lang.org) build environment and
PostgreSQL development headers to compile the binary.

Simply run `make` to build it.  Put it wherever you please.


Configuration
-------------

A few prerequisites need to be in place before the relay will run
successfully.

### Database setup

You'll need to create the destination table.

```sql
CREATE TABLE netdata (
	time timestamptz default now() not null,
	host text not null,
	metrics jsonb default '{}'::jsonb not null
);
```

Index it based on how you intend to query the data, including JSON
functional indexing, etc.  See PostgreSQL documentation for details.

Strongly encouraged:  Promote this table to a Timescale "hypertable".
See the [Timescale](http://timescale.com) docs for that, but a quick
example to partition automatically at weekly boundaries would look
something like this, if you're running v0.9.0 or later:

```sql
SELECT create_hypertable( 'netdata', 'time', migrate_data => true, chunk_time_interval => '1 week'::interval );
```

Timescale also has some great examples and advice for efficient [JSON
indexing](http://docs.timescale.com/v0.8/using-timescaledb/schema-management#json)
and queries.


### Netdata

You'll likely want to pare down what netdata is sending.  Here's an
example configuration for `netdata.conf` -- season this to taste (which
charts to send, and how often.)

```
[backend]
    hostname           = your-hostname
    enabled            = yes
    type               = json
    data source        = average
    destination        = machine-where-netdata-tsrelay-lives:14866
    prefix             = n
    update every       = 60
    buffer on failures = 5
    send charts matching = !cpu.cpu* !ipv6* !users* nfs.rpc net.* net_drops.* net_packets.* !system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*
```
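
The order of the `send charts matching` patterns matters. The Python
sketch below approximates how such a list is evaluated, assuming
netdata's simple-pattern rules: patterns are tried left to right, the
first match wins, `*` is a wildcard, and a leading `!` makes the match
negative. It uses `fnmatch` as a stand-in matcher, and `chart_is_sent`
is a hypothetical helper, not part of netdata.

```python
from fnmatch import fnmatchcase


def chart_is_sent(chart, patterns):
    """Return True if the first pattern matching `chart` is a positive one."""
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        glob = pattern[1:] if negative else pattern
        if fnmatchcase(chart, glob):
            return not negative
    return False  # no pattern matched, so the chart is not sent


PATTERNS = (
    "!cpu.cpu* !ipv6* !users* nfs.rpc net.* net_drops.* net_packets.* "
    "!system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*"
)

for chart in ("system.cpu", "system.interrupts", "cpu.cpu0", "apps.cpu"):
    print(chart, chart_is_sent(chart, PATTERNS))
```

This is why `!system.interrupts*` is listed before `system.*`: swap the
two and the broader positive pattern would match first.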


Running the Relay
-----------------

### Options

  * [-q|--quiet]:    Quiet mode.  No output at all. Ignored if -d is supplied.
  * [-d|--debug]:    Debug mode.  Show incoming data.
  * [--dbopts]:      PostgreSQL connection information.  (See below for more details.)
  * [-h|--help]:     Display quick help text.
  * [--listen-addr]: A specific IP address to listen on.  Defaults to **INADDR_ANY**.
  * [--listen-port]: The port to listen on for netdata JSON streams.
                     Default is **14866**.
  * [-T|--dbtable]:  Change the table name to insert into.  Defaults to **netdata**.
  * [-t|--timeout]:  Maximum time in milliseconds to wait for data.  On slow
                     connections, you may need to raise this from the default **500** ms.
  * [-v|--version]:  Show version.


**Notes**

Nim option parsing might be slightly different from what you're used to.
Flags that require arguments must include an '=' or ':' character
between the flag and its value.

  * --timeout=1000  *valid*
  * --timeout:1000  *valid*
  * -t:1000  *valid*
  * --timeout 1000  *invalid*
  * -t 1000  *invalid*
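
To make the rule concrete, here is a small Python sketch that mimics it.
It is not Nim's option parser, and `split_flag` is a hypothetical
helper: a value only attaches to a flag when it follows an '=' or ':'
inside the same argument.

```python
def split_flag(token):
    """Split '--timeout=1000' or '-t:1000' into ('timeout', '1000').

    Returns (name, None) when no '=' or ':' separator is present.
    """
    name = token.lstrip("-")
    for i, ch in enumerate(name):
        if ch in "=:":
            # Split on the first separator only, so values may contain '='.
            return name[:i], name[i + 1:]
    return name, None


for argv in (["--timeout=1000"], ["-t:1000"], ["--timeout", "1000"]):
    print(argv, "->", [split_flag(tok) for tok in argv])
```

In the last case the flag ends up with no value, and `1000` is seen as a
separate argument.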

All database connection options are passed as a single key/value string
to the *dbopts* flag.  The default is:

	"host=localhost dbname=netdata application_name=netdata-tsrelay"

... which uses the default PostgreSQL port, and connects as the running
user.

Reference the [PostgreSQL
Documentation](https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS)
for all available options (including how to store passwords in a
separate file, enable SSL mode, etc.)
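
If it helps to see which settings a given string pins down and which
are left to libpq, the following Python sketch splits a key/value string
into a dict. It only handles simple cases (whitespace-separated pairs,
optional single quotes around values) and is not libpq's own parser;
`parse_dbopts` is a hypothetical helper.

```python
import shlex

DEFAULT_DBOPTS = "host=localhost dbname=netdata application_name=netdata-tsrelay"


def parse_dbopts(dbopts):
    """Split a key/value connection string into a dict (simple cases only)."""
    opts = {}
    for pair in shlex.split(dbopts):
        key, _, value = pair.partition("=")
        opts[key] = value
    return opts


opts = parse_dbopts(DEFAULT_DBOPTS)
for key in ("host", "port", "user", "dbname"):
    # Keys missing from the string fall back to libpq's defaults.
    print(key, "=", opts.get(key, "(libpq default)"))
```

With the default string, `port` and `user` are absent, which is why the
relay connects on the default PostgreSQL port as the running user.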


### Daemonizing

Use a tool of your choice to run this at system
startup in the background.  My personal preference is
[daemontools](https://cr.yp.to/daemontools.html), but I won't judge you
if you use something else.

Here's an example using the simple
[daemon](https://www.freebsd.org/cgi/man.cgi?query=daemon&apropos=0&sektion=8&manpath=FreeBSD+11.0-RELEASE+and+Ports&arch=default&format=html) wrapper tool:

	# daemon \
		-o /var/log/netdata_tsrelay.log \
		-p /var/run/netdata_tsrelay.pid \
		-u nobody -cr \
		/usr/local/bin/netdata_tsrelay \
			--dbopts="dbname=metrics user=metrics host=db-master port=6432 application_name=netdata-tsrelay"

### Scaling

Though performant by default, if you're going to be storing a LOT of
data (or have a lot of netdata clients), here are some suggestions for
getting the most bang for your buck:

  * Use the [pgbouncer](https://pgbouncer.github.io/) connection
    pooler.
  * DNS round robin the hostname where **netdata_tsrelay** lives across
    *N* hosts -- you can horizontally scale without any gotchas.
  * Edit your **netdata.conf** file to only send the metrics you are
    interested in.
  * Decrease the frequency at which netdata sends its data. (When in
    "average" mode, it averages over that time automatically.)
  * Use [Timescale](http://timescale.com) hypertables.
  * Add database indexes specific to how you intend to consume the data.
  * Use the PostgreSQL
    [JSON Operators](https://www.postgresql.org/docs/current/static/functions-json.html#FUNCTIONS-JSONB-OP-TABLE),
    which take advantage of GIN indexing.
  * Put convenience SQL VIEWs around the data you're fetching later, for
    easier graph building with [Grafana](https://grafana.com/) (or whatever.)