README.md
changeset 8 e0b7c95a154f
parent 6 71578fe8e9ec
child 9 10bf910a379c
--- a/README.md	Wed Aug 30 13:55:02 2017 -0700
+++ b/README.md	Wed Apr 04 11:00:35 2018 -0700
@@ -1,4 +1,6 @@
-# Arborist-SNMP
+
+Arborist-SNMP
+=============
 
 home
 : http://bitbucket.org/mahlon/Arborist-SNMP
@@ -7,79 +9,234 @@
 : http://code.martini.nu/Arborist-SNMP
 
 
-## Description
+Description
+-----------
 
 Arborist is a monitoring toolkit that follows the UNIX philosophy
 of small parts and loose coupling for stability, reliability, and
 customizability.
 
-This adds SNMP support to Arborist's monitoring, for things such as:
+This adds SNMP support to Arborist's monitoring, specifically
+for OIDs involving:
 
  - Disk space capacity
- - System load
- - Free memory
- - Swap in use
+ - System CPU utilization
+ - Memory and swap usage
  - Running process checks
 
-
-## Prerequisites
-
-* Ruby 2.2 or better
+It tries to provide sane defaults, while allowing fine-grained settings
+per resource node.  Both Windows and UCD-SNMP systems are supported.
 
 
-## Installation
+Prerequisites
+-------------
+
+  * Ruby 2.3 or better
+  * Net-SNMP libraries
+
+
+Installation
+------------
 
     $ gem install arborist-snmp
 
 
-## Usage
+Configuration
+-------------
+
+Global configuration overrides can be added to the Arborist config file,
+under the `snmp` key.
+
+The defaults are as follows:
 
-In this example, we've created a resource node under an existing host, like so:
+	arborist:
+	  snmp:
+	    timeout: 2
+	    retries: 1
+	    community: public
+	    version: 2c
+	    port: 161
+	    batchsize: 25
+	    cpu:
+	      warn_at: 80
+	    disk:
+	      warn_at: 90
+	      include: ~
+	      exclude:
+	      - "^/dev(/.+)?$"
+	      - "^/net(/.+)?$"
+	      - "^/proc$"
+	      - "^/run$"
+	      - "^/sys/"
+	    memory:
+	      physical_warn_at: ~
+	      swap_warn_at: 60
+	    processes:
+	      check: []
+
+
+The `warn_at` keys express used capacity as a percentage, i.e., "Warn me
+when a disk mount point is at 90 percent utilization."
+
+
+### Library Options
+
+  * **timeout**: How long to wait for an SNMP response, in seconds.
+  * **retries**: If an error occurs during SNMP communication, try again this many times before giving up.
+  * **community**: The SNMP community name for reading data.
+  * **version**: The SNMP protocol version.  1 and 2c are supported.
+  * **port**: The UDP port SNMP is listening on.
+  * **batchsize**: How many hosts to gather SNMP data on simultaneously.
+
+
+### Category Options and Behavior
+
+#### CPU
+
+  * **warn_at**: Set the node to a `warning` state when utilization is at or over this percentage.
+
+Utilization takes into account CPU core counts, and uses the 5 minute
+load average to calculate a percentage of current CPU use.
+
+Two properties are set on the node. `cpu` contains the detected CPU count
+and current utilization. `load` contains the 1, 5, and 15 minute load
+averages of the machine.
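
As a rough illustration of the calculation described above, here is a minimal sketch. It assumes utilization is the 5 minute load average divided by the core count, expressed as a percentage; the library's actual formula may differ. The method names `cpu_utilization` and `cpu_state` are hypothetical and not part of Arborist-SNMP.

```ruby
# Hypothetical helpers, not the library's API.  Assumes utilization is
# the 5 minute load average spread across the detected cores.
def cpu_utilization( load_5min, core_count )
	raise ArgumentError, "core_count must be positive" unless core_count > 0
	return ( load_5min.to_f / core_count * 100 ).round( 1 )
end

# Apply the documented "at or over" rule for `warn_at`.
def cpu_state( utilization, warn_at: 80 )
	return utilization >= warn_at ? :warning : :ok
end

cpu_utilization( 2.0, 4 )   # => 50.0
```

Under this reading, a 4 core machine with a 5 minute load of 3.2 would sit exactly at the default `warn_at` of 80.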
 
-	Arborist::Host( 'example' ) do
-		description "Example host"
-		address     '10.6.0.169'
-		resource 'load', description: 'machine load'
-		resource 'disk' do
-			include: [ '/', '/mnt' ]
-		end
+
+#### Disk
+
+  * **warn_at**: Set the node to a `warning` state when disk capacity is at or over this percentage.
+                 You can also set this to a Hash, keyed on mount name, if you want differing
+                 warning values per mount point.  A mount point that is at 100% capacity will
+                 be explicitly set to `down`, as the resource it represents has been exhausted.
+  * **include**: String or Array of Strings.  If present, only matching mount points are
+                 considered while performing checks.  These are treated as regular expressions.
+  * **exclude**: String or Array of Strings.  If present, matching mount points are removed
+                 from evaluation.  These are treated as regular expressions.
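
The disk options above can be sketched roughly as follows. This is an illustration of the documented rules, not the library's implementation: the names `select_mounts` and `disk_state` are hypothetical, and treating a mount that is missing from a `warn_at` Hash as "never warn" is an assumption.

```ruby
# The default `exclude` patterns from the configuration section.
DEFAULT_EXCLUDES = [ '^/dev(/.+)?$', '^/net(/.+)?$', '^/proc$', '^/run$', '^/sys/' ]

# Accept a String, an Array of Strings, or nil, and return Regexps.
def patterns( value )
	Array( value ).map {|pattern| Regexp.new(pattern) }
end

# Hypothetical helper: keep mounts that match `includes` (when given)
# and do not match any `excludes` pattern.
def select_mounts( mounts, includes: nil, excludes: DEFAULT_EXCLUDES )
	wanted   = patterns( includes )
	unwanted = patterns( excludes )
	mounts.select do |mount|
		( wanted.empty? || wanted.any? {|re| mount =~ re } ) &&
			unwanted.none? {|re| mount =~ re }
	end
end

# Hypothetical helper: `warn_at` is a percentage, or a Hash of
# percentages keyed on mount name.  Assumes a mount absent from the
# Hash never warns.  A full mount is always `down`.
def disk_state( mount, percent_used, warn_at: 90 )
	return :down if percent_used >= 100
	limit = warn_at.is_a?( Hash ) ? warn_at[ mount ] : warn_at
	return :warning if limit && percent_used >= limit
	return :ok
end

select_mounts( %w[/ /dev /proc /var] )   # => ["/", "/var"]
```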
+
+
+#### Memory
+
+  * **physical_warn_at**: Set the node to a `warning` state when RAM utilization is at or over this percentage.
+  * **swap_warn_at**: Set the node to a `warning` state when swap utilization is at or over this percentage.
+
+Warnings are only set for swap by default, since that is usually a
+better indication of an impending problem.
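
A small sketch of how the two memory thresholds might combine, assuming an unset (`~`, i.e. `nil`) threshold simply disables that check. The `memory_state` method is hypothetical and only illustrates the documented defaults.

```ruby
# Hypothetical helper, not the library's API.  Both arguments are
# percentages in use.  A nil threshold is assumed to mean "never warn".
def memory_state( physical_used, swap_used, physical_warn_at: nil, swap_warn_at: 60 )
	over = []
	over << :physical if physical_warn_at && physical_used >= physical_warn_at
	over << :swap     if swap_warn_at && swap_used >= swap_warn_at
	return over.empty? ? :ok : :warning
end

memory_state( 99, 10 )   # => :ok, RAM is not checked by default
memory_state( 50, 60 )   # => :warning, swap is at the default threshold
```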
+
+
+#### Processes
+
+  * **check**: String or Array of Strings.  A list of processes to check if running.  These are
+               treated as regular expressions, and include process arguments.
+
+If any process in the list is not found in the process table, the
+resource is set to a `down` state.
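
The matching rule can be pictured with the sketch below. It assumes the process table is available as an Array of full command lines (name plus arguments); how the library actually gathers and stores that table over SNMP is not shown here, and both method names are hypothetical.

```ruby
# Hypothetical helper: return the `check` patterns that match no entry
# in the process table.  Each entry is a command line with arguments.
def missing_processes( process_table, check )
	Array( check ).reject do |pattern|
		regexp = Regexp.new( pattern )
		process_table.any? {|cmdline| cmdline =~ regexp }
	end
end

# Any missing process puts the resource in a `down` state.
def process_state( process_table, check )
	return missing_processes( process_table, check ).empty? ? :ok : :down
end

table = [ '/usr/sbin/sshd -D', 'important --production --verbose' ]
process_state( table, 'important --production' )   # => :ok
missing_processes( table, ['sshd', 'nginx'] )       # => ["nginx"]
```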
+
+
+Examples
+--------
+
+In the simplest form, using default behaviors and settings, here's an
+example Monitor configuration:
+
+	require 'arborist/snmp'
+
+	Arborist::Monitor 'cpu load check', :cpu do
+		every 1.minute
+		match type: 'resource', category: 'cpu'
+		exec( Arborist::Monitor::SNMP::CPU )
+	end
+
+	Arborist::Monitor 'partition capacity', :disk do
+		every 1.minute
+		match type: 'resource', category: 'disk'
+		exec( Arborist::Monitor::SNMP::Disk )
+	end
+
+	Arborist::Monitor 'process checks', :proc do
+		every 1.minute
+		match type: 'resource', category: 'process'
+		exec( Arborist::Monitor::SNMP::Process )
+	end
+
+	Arborist::Monitor 'memory', :memory do
+		every 1.minute
+		match type: 'resource', category: 'memory'
+		exec( Arborist::Monitor::SNMP::Memory )
 	end
 
 
-From a monitor file, require this library, and create an snmp instance.
-You can reuse a single instance, or create individual ones per monitor.
+Additionally, if you'd like these SNMP monitors to rely on the SNMP
+service itself, you can add a UDP check for that.
 
-	require 'arborist/monitor/snmp'
-
-	Arborist::Monitor '5 minute load average check' do
+	Arborist::Monitor 'udp service checks', :udp do
 		every 30.seconds
-		match type: 'resource', category: 'load'
-		include_down true
-		use :addresses
-
-		snmp = Arborist::Monitor::SNMP::Load( error_at: 10 )
-		exec( snmp )
+		match type: 'service', protocol: 'udp'
+		exec( Arborist::Monitor::Socket::UDP )
 	end
 
-	Arborist::Monitor 'mount capacity check' do
-		every 30.seconds
-		match type: 'resource', category: 'disk'
-		include_down true
-		use :addresses, :config
+
+And a default node declaration:
 
-		exec( Arborist::Monitor::SNMP::Disk )
+	Arborist::Host 'example' do
+		description 'An example host'
+		address 'demo.example.com'
+
+		resource 'cpu'
+		resource 'memory'
+		resource 'disk'
 	end
 
 
-Please see the rdoc for all the mode types and error_at options.  Per
-node "config" vars override global defaults when instantiating the
-monitor.
+
+All configuration can be overridden from the defaults using the `config`
+pragma, per node.  Here's a more elaborate example that performs the following:
+
+  * All SNMP monitored resources are quieted if the SNMP service itself is unavailable.
+  * Only monitor specific disk partitions, warning at different capacities.
+  * Ensure the 'important' process is running with the '--production' flag.
+  * Warn at 95% memory utilization OR 10% swap.
+
+
+	Arborist::Host 'example' do
+		description 'An example host'
+		address 'demo.example.com'
+
+		service 'snmp', protocol: 'udp'
+
+		resource 'cpu', description: 'machine cpu load' do
+			depends_on 'example-snmp'
+		end
+
+		resource 'memory', description: 'machine ram and swap' do
+			depends_on 'example-snmp'
+			config physical_warn_at: 95, swap_warn_at: 10
+		end
+
+		resource 'disk', description: 'partition capacity' do
+			depends_on 'example-snmp'
+			config \
+				include: [
+					'^/tmp',
+					'^/var'
+				],
+				warn_at: {
+						'/tmp' => 50,
+						'/var' => 80
+				}
+		end
+
+		resource 'process' do
+			depends_on 'example-snmp'
+			config check: 'important --production'
+		end
+	end
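
One way to picture the override behavior in the example above is a merge of the per-node `config` values over that category's defaults. This is a sketch under that assumption only; `DEFAULTS` and `effective_config` are hypothetical names, and the library may resolve settings differently.

```ruby
# A subset of the documented defaults, keyed by resource category.
DEFAULTS = {
	cpu:    { warn_at: 80 },
	memory: { physical_warn_at: nil, swap_warn_at: 60 }
}

# Hypothetical helper: per-node values win over the category defaults,
# and keys the node does not mention keep their default values.
def effective_config( category, node_config = {} )
	return DEFAULTS.fetch( category ).merge( node_config )
end

effective_config( :memory, physical_warn_at: 95, swap_warn_at: 10 )
# => {:physical_warn_at=>95, :swap_warn_at=>10}
```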
 
 
 
 ## License
 
-Copyright (c) 2016, Michael Granger and Mahlon E. Smith
+Copyright (c) 2016-2018 Michael Granger and Mahlon E. Smith
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without