README.md
changeset 8 e0b7c95a154f
parent 6 71578fe8e9ec
child 9 10bf910a379c
7:4548e58c8c66 8:e0b7c95a154f
     1 # Arborist-SNMP
     1 
       
     2 Arborist-SNMP
       
     3 =============
     2 
     4 
     3 home
     5 home
     4 : http://bitbucket.org/mahlon/Arborist-SNMP
     6 : http://bitbucket.org/mahlon/Arborist-SNMP
     5 
     7 
     6 code
     8 code
     7 : http://code.martini.nu/Arborist-SNMP
     9 : http://code.martini.nu/Arborist-SNMP
     8 
    10 
     9 
    11 
    10 ## Description
    12 Description
       
    13 -----------
    11 
    14 
    12 Arborist is a monitoring toolkit that follows the UNIX philosophy
    15 Arborist is a monitoring toolkit that follows the UNIX philosophy
    13 of small parts and loose coupling for stability, reliability, and
    16 of small parts and loose coupling for stability, reliability, and
    14 customizability.
    17 customizability.
    15 
    18 
    16 This adds SNMP support to Arborist's monitoring, for things such as:
    19 This adds various SNMP support to Arborist's monitoring, specifically
       
     20 for OIDs involving:
    17 
    21 
    18  - Disk space capacity
    22  - Disk space capacity
    19  - System load
    23  - System CPU utilization
    20  - Free memory
    24  - Memory and swap usage
    21  - Swap in use
       
    22  - Running process checks
    25  - Running process checks
    23 
    26 
    24 
     27 It tries to provide sane defaults, while allowing fine-grained settings
    25 ## Prerequisites
    28 per resource node.  Both Windows and UCD-SNMP systems are supported.
    26 
    29 
    27 * Ruby 2.2 or better
    30 
    28 
    31 Prerequisites
    29 
    32 -------------
    30 ## Installation
    33 
       
    34   * Ruby 2.3 or better
       
    35   * Net-SNMP libraries
       
    36 
       
    37 
       
    38 Installation
       
    39 ------------
    31 
    40 
    32     $ gem install arborist-snmp
    41     $ gem install arborist-snmp
    33 
    42 
    34 
    43 
    35 ## Usage
    44 Configuration
    36 
    45 -------------
    37 In this example, we've created a resource node under an existing host, like so:
    46 
    38 
    47 Global configuration overrides can be added to the Arborist config file,
    39 	Arborist::Host( 'example' ) do
    48 under the `snmp` key.
    40 		description "Example host"
    49 
    41 		address     '10.6.0.169'
    50 The defaults are as follows:
    42 		resource 'load', description: 'machine load'
    51 
    43 		resource 'disk' do
    52 	arborist:
    44 			include: [ '/', '/mnt' ]
    53 	  snmp:
    45 		end
    54 		timeout: 2
    46 	end
    55 		retries: 1
    47 
    56 		community: public
    48 
    57 		version: 2c
    49 From a monitor file, require this library, and create an snmp instance.
    58 		port: 161
    50 You can reuse a single instance, or create individual ones per monitor.
    59 		batchsize: 25
    51 
    60 		cpu:
    52 	require 'arborist/monitor/snmp'
    61 		  warn_at: 80
    53 
    62 		disk:
    54 	Arborist::Monitor '5 minute load average check' do
    63 		  warn_at: 90
       
    64 		  include: ~
       
    65 		  exclude:
       
    66 		  - "^/dev(/.+)?$"
       
    67 		  - "^/net(/.+)?$"
       
    68 		  - "^/proc$"
       
    69 		  - "^/run$"
       
    70 		  - "^/sys/"
       
    71 		memory:
       
    72 		  physical_warn_at: ~
       
    73 		  swap_warn_at: 60
       
    74 		processes:
       
    75 		  check: []
       
    76 
       
    77 
       
     78 The `warn_at` keys express used capacity as a percentage, e.g. "Warn me
       
    79 when a disk mount point is at 90 percent utilization."
       
    80 
       
    81 
       
    82 ### Library Options
       
    83 
       
     84   * **timeout**: How long to wait for an SNMP response, in seconds.
       
    85   * **retries**: If an error occurs during SNMP communication, try again this many times before giving up.
       
    86   * **community**: The SNMP community name for reading data.
       
    87   * **version**: The SNMP protocol version.  1 and 2c are supported.
       
    88   * **port**: The UDP port SNMP is listening on.
       
    89   * **batchsize**: How many hosts to gather SNMP data on simultaneously.
       
    90 
       
    91 
       
    92 ### Category Options and Behavior
       
    93 
       
    94 #### CPU
       
    95 
       
    96   * **warn_at**: Set the node to a `warning` state when utilization is at or over this percentage.
       
    97 
       
    98 Utilization takes into account CPU core counts, and uses the 5 minute
       
    99 load average to calculate a percentage of current CPU use.
       
   100 
       
    101 Two properties are set on the node. `cpu` contains the detected CPU count
       
   102 and current utilization. `load` contains the 1, 5, and 15 minute load
       
   103 averages of the machine.
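
As a rough illustration of the CPU paragraph above, here is a minimal sketch of one plausible way to turn a 5 minute load average and a core count into a percentage. The method names `cpu_utilization` and `cpu_state` are hypothetical and are not taken from the arborist-snmp source; the library's actual calculation may differ.

```ruby
# Hypothetical helpers, for illustration only.  Not the arborist-snmp API.

# Assumes utilization is the 5 minute load average divided by the
# number of cores, expressed as a percentage.
def cpu_utilization( load5, cpu_count )
  ( load5.to_f / cpu_count * 100 ).round( 1 )
end

# Returns :warning when utilization is at or over the warn_at
# percentage (80 is the documented default), :ok otherwise.
def cpu_state( utilization, warn_at: 80 )
  utilization >= warn_at ? :warning : :ok
end

usage = cpu_utilization( 2.0, 4 )
puts "utilization: #{usage}% (#{cpu_state( usage )})"
```

Under this reading, a load average equal to the core count corresponds to 100 percent utilization.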
       
   104 
       
   105 
       
   106 #### Disk
       
   107 
       
   108   * **warn_at**: Set the node to a `warning` state when disk capacity is at or over this amount.
       
   109                  You can also set this to a Hash, keyed on mount name, if you want differing
       
   110                  warning values per mount point.  A mount point that is at 100% capacity will
       
    111                  be explicitly set to `down`, as the resource it represents has been exhausted.
       
   112   * **include**: String or Array of Strings.  If present, only matching mount points are
       
   113                  considered while performing checks.  These are treated as regular expressions.
       
    114   * **exclude**: String or Array of Strings.  If present, matching mount points are removed
       
   115                  from evaluation.  These are treated as regular expressions.
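
The disk options above can be sketched in plain Ruby. This is an illustration under stated assumptions, not the library's implementation: `mounts_to_check` and `disk_state` are hypothetical names, and the behavior for a mount missing from a `warn_at` Hash (no warning) is a guess.

```ruby
# Hypothetical helpers, for illustration only.  Not the arborist-snmp API.

# Keep mounts that match any include pattern (when include is given),
# then drop mounts that match any exclude pattern.  Both options accept
# a String or an Array of Strings, treated as regular expressions.
def mounts_to_check( mounts, include: nil, exclude: nil )
  include_rx = Array( include ).map {|pattern| Regexp.new( pattern ) }
  exclude_rx = Array( exclude ).map {|pattern| Regexp.new( pattern ) }

  mounts.select do |mount|
    included = include_rx.empty? || include_rx.any? {|rx| mount =~ rx }
    included && exclude_rx.none? {|rx| mount =~ rx }
  end
end

# warn_at is either a single percentage or a Hash keyed on mount name.
# A completely full mount is :down regardless of the threshold.
def disk_state( mount, percent_used, warn_at )
  threshold = warn_at.is_a?( Hash ) ? warn_at[ mount ] : warn_at
  return :down if percent_used >= 100
  return :warning if threshold && percent_used >= threshold
  :ok
end

mounts = %w[ / /dev /dev/shm /proc /var /tmp ]
p mounts_to_check( mounts, exclude: [ '^/dev(/.+)?$', '^/proc$' ] )
p disk_state( '/tmp', 55, '/tmp' => 50, '/var' => 80 )
```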
       
   116 
       
   117 
       
   118 #### Memory
       
   119 
       
   120   * **physical_warn_at**: Set the node to a `warning` state when RAM utilization is at or over this percentage.
       
   121   * **swap_warn_at**: Set the node to a `warning` state when swap utilization is at or over this percentage.
       
   122 
       
    123 Warnings are only set for swap by default, since that is usually a
       
   124 better indication of an impending problem.
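
A minimal sketch of how the two memory thresholds might combine, assuming an unset threshold (`~` in the YAML defaults, `nil` in Ruby) disables that check. The method name `memory_warnings` is hypothetical and not part of arborist-snmp.

```ruby
# Hypothetical helper, for illustration only.  Not the arborist-snmp API.

# Returns the list of memory types at or over their warning threshold.
# The keyword defaults mirror the documented ones: no physical
# threshold, and swap at 60 percent.
def memory_warnings( physical_pct, swap_pct, physical_warn_at: nil, swap_warn_at: 60 )
  warnings = []
  warnings << :physical if physical_warn_at && physical_pct >= physical_warn_at
  warnings << :swap if swap_warn_at && swap_pct >= swap_warn_at
  warnings
end

# With the defaults, nearly full RAM alone does not warn.
p memory_warnings( 99, 10 )
# With both thresholds set, either condition can trigger a warning.
p memory_warnings( 96, 12, physical_warn_at: 95, swap_warn_at: 10 )
```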
       
   125 
       
   126 
       
   127 #### Processes
       
   128 
       
   129   * **check**: String or Array of Strings.  A list of processes to check if running.  These are
       
   130                treated as regular expressions, and include process arguments.
       
   131 
       
   132 If any process in the list is not found in the process table, the
       
   133 resource is set to a `down` state.
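
The process check described above amounts to matching each pattern against the full command lines in the process table. The sketch below assumes the table is available as an Array of Strings (command plus arguments); `missing_processes` is a hypothetical name, not the library's API.

```ruby
# Hypothetical helper, for illustration only.  Not the arborist-snmp API.

# Returns the patterns from `check` that match no entry in the process
# table.  `check` may be a String or an Array of Strings, each treated
# as a regular expression against "command arguments".
def missing_processes( process_table, check )
  Array( check ).reject do |pattern|
    rx = Regexp.new( pattern )
    process_table.any? {|cmdline| cmdline =~ rx }
  end
end

table = [ '/usr/sbin/sshd -D', 'important --production --verbose' ]
missing = missing_processes( table, [ 'sshd', 'nginx' ] )
state = missing.empty? ? :ok : :down
puts "state: #{state}, missing: #{missing.inspect}"
```

Because the patterns are regular expressions, characters such as `.` or `+` in a process name need escaping if they are meant literally.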
       
   134 
       
   135 
       
   136 Examples
       
   137 --------
       
   138 
       
   139 In the simplest form, using default behaviors and settings, here's an
       
   140 example Monitor configuration:
       
   141 
       
   142 	require 'arborist/snmp'
       
   143 
       
   144 	Arborist::Monitor 'cpu load check', :cpu do
       
   145 		every 1.minute
       
   146 		match type: 'resource', category: 'cpu'
       
   147 		exec( Arborist::Monitor::SNMP::CPU )
       
   148 	end
       
   149 
       
   150 	Arborist::Monitor 'partition capacity', :disk do
       
   151 		every 1.minute
       
   152 		match type: 'resource', category: 'disk'
       
   153 		exec( Arborist::Monitor::SNMP::Disk )
       
   154 	end
       
   155 
       
   156 	Arborist::Monitor 'process checks', :proc do
       
   157 		every 1.minute
       
   158 		match type: 'resource', category: 'process'
       
   159 		exec( Arborist::Monitor::SNMP::Process )
       
   160 	end
       
   161 
       
   162 	Arborist::Monitor 'memory', :memory do
       
   163 		every 1.minute
       
   164 		match type: 'resource', category: 'memory'
       
   165 		exec( Arborist::Monitor::SNMP::Memory )
       
   166 	end
       
   167 
       
   168 
       
   169 Additionally, if you'd like these SNMP monitors to rely on the SNMP
       
   170 service itself, you can add a UDP check for that.
       
   171 
       
   172 	Arborist::Monitor 'udp service checks', :udp do
    55 		every 30.seconds
   173 		every 30.seconds
    56 		match type: 'resource', category: 'load'
   174 		match type: 'service', protocol: 'udp'
    57 		include_down true
   175 		exec( Arborist::Monitor::Socket::UDP )
    58 		use :addresses
   176 	end
    59 
   177 
    60 		snmp = Arborist::Monitor::SNMP::Load( error_at: 10 )
   178 
    61 		exec( snmp )
   179 And a default node declaration:
    62 	end
   180 
    63 
   181 	Arborist::Host 'example' do
    64 	Arborist::Monitor 'mount capacity check' do
   182 		description 'An example host'
    65 		every 30.seconds
   183 		address 'demo.example.com'
    66 		match type: 'resource', category: 'disk'
   184 
    67 		include_down true
   185 		resource 'cpu'
    68 		use :addresses, :config
   186 		resource 'memory'
    69 
   187 		resource 'disk'
    70 		exec( Arborist::Monitor::SNMP::Disk )
   188 	end
    71 	end
   189 
    72 
   190 
    73 
   191 
    74 Please see the rdoc for all the mode types and error_at options.  Per
   192 All configuration can be overridden from the defaults using the `config`
    75 node "config" vars override global defaults when instantiating the
   193 pragma, per node.  Here's a more elaborate example that performs the following:
    76 monitor.
   194 
       
   195   * All SNMP monitored resources are quieted if the SNMP service itself is unavailable.
       
    196   * Only monitor specific disk partitions, warning at different capacities.
       
    197   * Ensure the 'important' process is running with the '--production' flag.
       
    198   * Warn at 95% memory utilization OR 10% swap.
       
   199 
       
   200 
       
   201 	Arborist::Host 'example' do
       
   202 		description 'An example host'
       
   203 		address 'demo.example.com'
       
   204 
       
   205 		service 'snmp', protocol: 'udp'
       
   206 
       
   207 		resource 'cpu', description: 'machine cpu load' do
       
   208 			depends_on 'example-snmp'
       
   209 		end
       
   210 
       
   211 		resource 'memory', description: 'machine ram and swap' do
       
   212 			depends_on 'example-snmp'
       
   213 			config physical_warn_at: 95, swap_warn_at: 10
       
   214 		end
       
   215 
       
   216 		resource 'disk', description: 'partition capacity' do
       
   217 			depends_on 'example-snmp'
       
   218 			config \
       
   219 				include: [
       
   220 					'^/tmp',
       
   221 					'^/var'
       
   222 				],
       
   223 				warn_at: {
       
   224 						'/tmp' => 50,
       
   225 						'/var' => 80
       
   226 				}
       
   227 		end
       
   228 
       
   229 		resource 'process' do
       
   230 			depends_on 'example-snmp'
       
   231 			config check: 'important --production'
       
   232 		end
       
   233 	end
    77 
   234 
    78 
   235 
    79 
   236 
    80 ## License
   237 ## License
    81 
   238 
    82 Copyright (c) 2016, Michael Granger and Mahlon E. Smith
   239 Copyright (c) 2016-2018 Michael Granger and Mahlon E. Smith
    83 All rights reserved.
   240 All rights reserved.
    84 
   241 
    85 Redistribution and use in source and binary forms, with or without
   242 Redistribution and use in source and binary forms, with or without
    86 modification, are permitted provided that the following conditions are met:
   243 modification, are permitted provided that the following conditions are met:
    87 
   244