Volta
=====

What is volta?
--------------

Volta is a high-performance, low-resource URI rewriter for use with the
Squid caching proxy server (http://www.squid-cache.org/).  With it, you
can dynamically alter URI requests that pass through Squid based on
various criteria.

It uses a state machine to parse URIs and rules, and a constant database
to store and access those rules.  It can then perform conditional
rewrites either internally or by evaluating Lua scripts.


Why is it called "volta"?
-------------------------

It's a type of old Italian music written in triple time.  Quick!


How fast is it?
---------------

On a 2 GHz Xeon 5130, it can process a million Squid requests against
10,000 rules in less than 8 seconds, using about 800 KB of RAM.  On a
1.8 GHz Intel E4300, the same run takes 3 seconds.

Your mileage may vary, but for all intents and purposes the answer
is "definitely fast enough."


Configuring squid
-----------------

You must enable URL rewriting in the squid.conf file:

	url_rewrite_program /usr/local/bin/volta

... and that's it.  You may need some additional customization, such as
telling volta where its database is stored on disk:

	url_rewrite_program /usr/local/bin/volta -f /var/db/squid/volta.db

Busy servers:

Volta is lightweight enough that you can simply increase the number of
rewriter children, but it also supports Squid's rewrite_concurrency
format if you find that more efficient for your environment.  Adjust to
taste.

	url_rewrite_children 5 startup=1 idle=2 concurrency=50
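
With concurrency enabled, Squid puts a channel ID in front of each request line it hands to the rewriter and expects the same ID at the start of the matching reply, so answers can come back in any order. The Python sketch below models only that framing; it is not volta's code. The sample request line assumes Squid's classic rewriter input (URL, client address/FQDN, ident, method), and treating an empty reply as "no rewrite" assumes the classic, pre-3.4 reply format.

```python
def split_channel(line, concurrent):
    """Split one helper input line into (channel_id, request).

    With concurrency on, the first space-separated token is taken as the
    channel ID; without it there is no ID and the whole line is the request.
    """
    line = line.rstrip("\n")
    if not concurrent:
        return None, line
    channel, _, request = line.partition(" ")
    return channel, request


def frame_reply(channel, reply):
    """Prefix a reply with the channel ID it answers (if any).

    An empty reply stands for "no rewrite" (assumed classic reply format);
    it still has to carry its channel ID so Squid can match it up.
    """
    if channel is None:
        return reply
    return f"{channel} {reply}" if reply else channel


# Hypothetical request on channel 7 that the rules leave untouched.
chan, request = split_channel("7 http://example.com/ 10.0.0.5/- - GET\n", concurrent=True)
print(frame_reply(chan, ""))  # prints: 7
```

A rewrite on the same channel would be framed as `frame_reply("7", "302:https://www.example.com/")`, which gives `"7 302:https://www.example.com/"`.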
       

Using volta
-----------

See the INSTALL file for instructions on how to compile volta.

Volta reads its rewrite rules from a local database.  You can write the
rules in a text editor, then convert them to the database like so:

	% volta -c rules.txt

You'll be left with a "volta.db" file in the current directory.  Put it
wherever you please, and use the -f flag to point to it.


Rule file syntax
----------------

Volta's rule syntax is designed to be easy for both humans and machines
to parse.  Blank lines are skipped, as is any line that starts with the
'#' character, so you can keep the ASCII version of your rules well
documented and in version control.  There is no practical limit on the
number of rules in this database.

When compiling the ruleset into the database format, volta detects
malformed rules and stops if there are any problems, leaving your
original database intact.  You can change the ruleset at any time while
volta is running, and the new rules will take effect within about 10
seconds.  No need to restart squid!
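
The skipping and validation rules above can be modeled with a short parser. This Python sketch is an illustration, not volta's compiler: it assumes every rule line has exactly three fields and that a '#' may follow leading whitespace. Because it parses the whole file before anything would be written, a bad line aborts the run without touching an existing database, mirroring the behavior described above.

```python
def parse_rules(text):
    """Parse rule-file text into (host, path, target) tuples.

    Blank lines and lines starting with '#' are skipped; every other
    line must have exactly three whitespace-separated fields (assumed).
    Raises ValueError on the first malformed line, so a caller can
    refuse to write a new database and keep the old one.
    """
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()  # any run of spaces/tabs separates fields
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        rules.append(tuple(fields))
    return rules
```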
       
There are two types of rules: positive matches and negative matches.
Positive matches cause the rewrite; negative matches intentionally allow
the original request to pass.  Rules are evaluated top-down, and the
first match wins.  Fields are separated by any amount of whitespace
(spaces or tabs).
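
To make the top-down, first-match-wins order concrete, here is a Python sketch that evaluates parsed rules against a request. The host and path tests inside `matches()` are simplified stand-ins (assumptions, not volta's exact logic); the point is the loop: the first rule that matches decides, and a matching negative rule ('-') stops the search and lets the original request through.

```python
import re


def matches(rule_host, rule_path, host, path):
    """Simplified host/path test (an assumption, not volta's exact logic)."""
    rh, h = rule_host.lower(), host.lower()
    host_ok = rule_host == "*" or h == rh or h.endswith("." + rh)
    path_ok = rule_path == "*" or re.search(rule_path, path, re.IGNORECASE) is not None
    return host_ok and path_ok


def first_match(rules, host, path):
    """Return the target of the first matching rule, or None to pass through.

    Rules are checked top-down.  A matching negative rule (target '-')
    ends the search just like a positive one, but leaves the request alone.
    """
    for rule_host, rule_path, target in rules:
        if matches(rule_host, rule_path, host, path):
            return None if target == "-" else target
    return None  # no rule matched: the request passes through unchanged


# The two blog rules from the Examples section, in file order.
blog_rules = [
    ("martini.nu", "/blog/2011", "-"),
    ("martini.nu", "/blog", "301:martini.nu/content-archived.html"),
]
```

With the blog rules in this order, a 2011 post hits the negative rule first and passes through, while older posts fall through to the redirect. Swap the two lines and the broader '/blog' rule would catch the 2011 posts as well.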
       

### Positive matches:

**First field**: *the hostname to match*

You can use an exact hostname (www.example.com), or the domain
(example.com) if you want to match every host under it.  You can also
use a single '*' to match every request; this bypasses much of what
makes volta quick, but it is included for completeness.  You may have
any number of rules per hostname.  Hostnames are compared
case-insensitively.
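
A Python sketch of the hostname test follows. It assumes a rule matches either the exact host or any host ending in '.' plus the rule, so 'example.com' covers 'www.example.com' but not 'notexample.com'; volta's own lookup may differ.

```python
def host_matches(rule_host, request_host):
    """Sketch of the first-field test, under stated assumptions.

    '*' matches any host.  Otherwise the comparison is case-insensitive,
    and a rule matches the exact host or any host ending in '.' + rule
    (an assumed reading of the domain form).
    """
    if rule_host == "*":
        return True
    rule = rule_host.lower()
    host = request_host.lower()
    return host == rule or host.endswith("." + rule)
```

Matching on a '.' boundary, rather than a plain suffix, is what keeps 'example.com' from matching an unrelated host such as 'notexample.com'.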
       
**Second field**: *the path to match*

This can be an exact match ('/path/to/something.html'), a regular
expression ('\.(jpg|gif|png)$'), or a single '*' to match any path.
Regular expressions are matched case-insensitively.  There is currently
no internal support for captures, though you can use a Lua rule (see
below) for more complex processing.
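
The path test can be sketched as below. Whether volta tries an exact comparison before the regular expression, and whether the expression is anchored, are assumptions here; an unanchored search is what lets a rule like '/blog' in the Examples section cover everything under /blog.

```python
import re


def path_matches(rule_path, request_path):
    """Sketch of the second-field test.

    '*' matches any path.  A pattern equal to the path matches even if it
    contains regex metacharacters; otherwise the pattern is searched for
    anywhere in the path as a case-insensitive regular expression.
    """
    if rule_path == "*" or rule_path == request_path:
        return True
    return re.search(rule_path, request_path, re.IGNORECASE) is not None
```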
       
**Third field**: *the redirect code and URL to rewrite to*

Any parts of the URL that you omit are automatically filled in from
the original request; the exception is the hostname, which is
required.  If you omit a redirect code, the URL rewrite is transparent
to the client.  You can attach a 301: or 302: prefix to send a
permanent or temporary redirect, respectively, instead.

If you require more complex processing than volta provides internally,
you can also specify a path to a Lua script (prefixed with 'lua:').
See the 'Lua Rules' section of this README for more information.
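
Filling omitted URL parts from the original request might look like the following Python sketch. It is an illustration under assumptions: only the scheme, path, query and fragment are inherited (port handling is left out), a target without '://' is read as starting with the hostname, and 'lua:' targets are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_target(target, original):
    """Build the reply for a positive rule's third field (a sketch).

    An optional '301:' or '302:' prefix is kept in front of the result.
    Parts missing from the target are copied from the original request URL.
    Ports and 'lua:' targets are deliberately not handled here.
    """
    code = ""
    for prefix in ("301:", "302:"):
        if target.startswith(prefix):
            code, target = prefix, target[len(prefix):]
            break

    # Without a scheme, read the target as starting with the hostname.
    if "://" not in target:
        target = "//" + target

    t = urlsplit(target)
    o = urlsplit(original)
    if not t.netloc:
        raise ValueError("the rewrite target must include a hostname")

    url = urlunsplit((
        t.scheme or o.scheme,      # e.g. http -> https for the Google example
        t.netloc,                  # required, never inherited
        t.path or o.path,
        t.query or o.query,
        t.fragment or o.fragment,
    ))
    return code + url
```

This reproduces the Google example from the Examples section: `apply_target("302:https://www.google.com", "http://www.google.com/search?q=test")` returns `"302:https://www.google.com/search?q=test"`.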
       

### Negative matches:

**First field**: *the hostname to match*

See above -- all the same rules apply.

**Second field**: *the path to match*

See above -- all the same rules apply.

**Third field**: *the 'negative' marker*

This is simply the '-' character, which signals to volta that this is
a negative matching rule.


You can easily test your rules by running volta on the command line and
pasting URLs into it.  Boost the debug level (-d4) if you run into any
issues.


Examples
--------

Rewrite all requests to Google to the SSL version:

	google.com * 302:https://www.google.com

This will redirect the request "http://www.google.com/search?q=test" to
"https://www.google.com/search?q=test".


Transparently alter all uploaded images on imgur to be my face:  :)

	i.imgur.com \.(gif|png|jpg)$ http://www.martini.nu/images/mahlon.jpg


Expand a local, unqualified hostname to an FQDN (useful alongside the
'dns_defnames' squid setting to enforce browser proxy behaviors):

	local-example * local-example.company.com


Cause all blog content except for 2011 posts to permanently redirect to
an archival page:

	martini.nu /blog/2011 -
	martini.nu /blog 301:martini.nu/content-archived.html


Send all requests to reddit/r/WTF/* through a Lua script for further processing:

	reddit.com /r/wtf lua:/path/to/a/lua-script.lua


Turn off rewriting for a specific network segment or IP address:

Squid has this ability built in; see the 'url_rewrite_access' setting.
Alternatively, do the checks in Lua.


Lua Rules
---------

Volta has an embedded Lua interpreter that you can use to perform all
kinds of conditional rewrites.  Read more about the syntax of the Lua
language here: http://www.lua.org/manual/5.1/

### Loading a script

To use a Lua script, prefix the rewrite target of a volta rule with
'lua:'.  The rest of the target is then treated as a path to the script.
(You can find an example in the Examples section of this README.)

You can specify a path to either a plain-text Lua source file or Lua
bytecode.  (If speed is at an absolute premium, I'm seeing around a 25%
performance increase by using Lua bytecode files.)

You can use different scripts for different rules, or use the same
script across any number of separate rules.

There is no need to restart squid when modifying Lua rules.  Changes are
seen immediately.


### Environment

* Global variable declarations are disabled, so scripts can't accidentally stomp on each other.  All variables must be declared with the 'local' keyword.
* There is a global table called 'shared' that you may use to share data between separate scripts, or to remember things between rule evaluations.
* The details of the request can be found in a table, appropriately named 'request'.  The HTTP scheme, host, path, port, method, client_ip, and domain are all available from the request table by default.
* Calling Lua's print() function emits debug information to stderr.  Use a debug level of 2 or higher to see it.
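
For readers who want to see what the 'request' table holds, here is a Python model that assembles the same fields from a URL, client address and method. The field names come from the list above; defaulting the port from the scheme and taking 'domain' as the last two labels of the host are assumptions, not documented volta behavior.

```python
from urllib.parse import urlsplit

# Assumed fallbacks when the URL carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_request(url, client_ip, method):
    """Assemble a dict shaped like the Lua 'request' table (a model).

    Field names follow the README; how volta derives 'port' and
    'domain' is assumed here, not taken from its source.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    labels = host.split(".")
    return {
        "scheme": parts.scheme,
        "host": host,
        "path": parts.path or "/",
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "method": method,
        "client_ip": client_ip,
        # Assumption: 'domain' is the last two labels of the host.
        "domain": ".".join(labels[-2:]),
    }
```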
       

### Return value

The script's return value is sent to squid unmodified.  It should be
the URL to rewrite the request to, with an optional redirect code
prefix (301: or 302:).

Omitting a return value, or returning 'nil', has the same effect as a
negative rule match: the original request is allowed through without
any rewrite.
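
The contract above can be modeled in a few lines of Python. Representing a missing or nil return as `None`, and encoding "no rewrite" as an empty reply line, are assumptions based on Squid's classic rewriter protocol; `split_redirect()` is a hypothetical helper that only shows how an optional 301:/302: prefix separates from the URL.

```python
def reply_for(script_result):
    """Map a Lua script's return value (modeled as a Python value) to a reply.

    None stands for a missing return or an explicit nil: no rewrite, which
    this sketch encodes as an empty reply line.  Anything else is passed
    through unmodified, exactly as returned.
    """
    return "" if script_result is None else str(script_result)


def split_redirect(reply):
    """Hypothetical helper: separate an optional 301:/302: prefix from the URL."""
    for code in ("301", "302"):
        if reply.startswith(code + ":"):
            return int(code), reply[len(code) + 1:]
    return None, reply
```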
       

An extremely simple Lua rule script can be found in the 'examples'
directory distributed with volta.