README
       
Volta
=====

What is volta?
--------------

Volta is a high-performance, low-resource URI rewriter for use with the
Squid caching proxy server (http://www.squid-cache.org/).  With it, you
can dynamically alter URI requests that pass through Squid based on
various criteria.

It uses a state machine to parse URIs and rules, and a constant database
to store and access those rules.  It can then perform conditional
rewrites either internally or by evaluating Lua scripts.
       
Why is it called "volta"?
-------------------------

It's a type of old Italian music written in triple-time.  Quick!
       
How fast is it?
---------------

On a 2GHz Xeon 5130, it can process a million squid requests against
10000 rules in less than 8 seconds, using about 800k of RAM.  On a
1.8GHz Intel E4300, it can do it in 3 seconds.

Your mileage may vary, but for all intents and purposes the answer
is "definitely fast enough."
       
Configuring squid
-----------------

You must enable url rewriting from within the squid.conf file.

    url_rewrite_program /usr/local/bin/volta

... and that's it.  You may need some additional customization, like where
the volta database is stored on disk:

    url_rewrite_program /usr/local/bin/volta -f /var/db/squid/volta.db

Busy servers:

While Volta is lightweight enough that you can simply increase the number
of rewriter children, it also supports Squid's rewrite_concurrency format
if you find that to be more efficient for your environment.  Adjust to
taste.

    url_rewrite_children 5 startup=1 idle=2 concurrency=50
       
Using volta
-----------

See the INSTALL file for instructions on how to compile volta.

Volta reads its rewrite rules from a local database.  You can create the
rules in a text editor, then convert them to the database like so:

    % volta -c rules.txt

You'll be left with a "volta.db" file in the current directory.  Put it
wherever you please, and use the -f flag to point to it.
       
Rule file syntax
----------------

Volta's rule syntax is designed to be easy to parse by humans and
machines.  Blank lines are skipped, as is any line that starts with the
'#' character, so you can keep the ascii version of your rules well
documented and in version control.  There is no practical limit on the
number of rules in this database.

When compiling the ruleset into the database format, volta detects
malformed rules and stops if there are any problems, leaving your
original database intact.  You can change the ruleset at any time while
volta is running, and the new rules will take effect within about 10
seconds.  No need to restart squid!

There are two types of rules -- positive matches, and negative matches.
Positive matches cause the rewrite; negative matches intentionally allow
the original request to pass.  Rules are evaluated in a consistent order:
top-down, first match wins.  Fields are separated by any amount of
whitespace (spaces or tabs).
       
### Positive matches:

    First field: the hostname to match.

      You can use an exact hostname (www.example.com), or just the domain
      (example.com) if you want to match everything under it.  You can
      also use a single '*' to match every request.  This essentially
      bypasses a lot of what makes volta quick, but it is included for
      completeness.  You may have an unlimited number of rules per
      hostname.  Hostnames are compared without case sensitivity.


    Second field: the path to match.

      This can be an exact match ('/path/to/something.html'), a regular
      expression ('\.(jpg|gif|png)$'), or a single '*' to match any
      path.  Regular expressions are matched without case sensitivity.
      There is currently no internal support for captures, though you can
      use a Lua rule (see below) for more complex processing.


    Third field: the redirect code and url to rewrite to.

      Any pieces of a url that are omitted are automatically replaced
      with the original request's element -- the exception is a hostname,
      which is required.  If you omit a redirect code, the URL rewrite is
      transparent to the client.  You can instead attach a 301: or 302:
      prefix to send a permanent or temporary redirect, respectively.

      If you require more complex processing than what volta provides
      internally, you can also specify a path to a Lua script (prefixed
      with 'lua:').  See the 'Lua rules' section of this README for more
      information.
       
### Negative matches:

    First field: the hostname to match.

      See above -- all the same rules apply.


    Second field: the path to match.

      See above -- all the same rules apply.


    Third field: the 'negative' marker.

      This is simply the '-' character, which signals to volta that this
      is a negative matching rule.


You can easily test your rules by running volta on the command line and
pasting URLs into it.  Boost the debug level (-d4) if you're having any issues.
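
Putting the field descriptions above together, a small rules file might
look like the sketch below.  The hostnames and paths are placeholders
chosen for illustration, and comments are kept on their own lines since
only whole-line '#' comments are described above.

```
# Rules are checked top-down; the first match wins.

# Negative rule: let the status page through untouched.
www.example.com    /status.html        -

# Exact path match, sent to the client as a temporary redirect.
www.example.com    /old.html           302:http://www.example.com/new.html

# Regular expression on the path, rewritten transparently to another
# host.  The omitted path is filled in from the original request.
example.com        \.(jpg|gif|png)$    http://static.example.com
```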
       
Examples
--------

Rewrite all requests to Google to the SSL version:

    google.com * 302:https://www.google.com

This will redirect the request "http://www.google.com/search?q=test" to
"https://www.google.com/search?q=test".


Transparently alter all uploaded images on imgur to be my face:  :)

    i.imgur.com \.(gif|png|jpg)$ http://www.martini.nu/images/mahlon.jpg


Expand a local, non-qualified hostname to a FQDN (useful alongside the
'dns_defnames' squid setting to enforce browser proxy behaviors):

    local-example * local-example.company.com


Cause all blog content except for 2011 posts to permanently redirect to
an archival page:

    martini.nu /blog/2011 -
    martini.nu /blog 301:martini.nu/content-archived.html


Send all requests to reddit/r/WTF/* through a Lua script for further
processing:

    reddit.com /r/wtf lua:/path/to/a/lua-script.lua


Turn off rewriting for a specific network segment or IP address:

Squid has this ability built in -- see the 'url_rewrite_access' setting.
Alternatively, do the checks in Lua.
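
For the Squid-side approach, a squid.conf fragment along these lines
could exempt one subnet from rewriting.  The ACL name 'no_rewrite' and
the address range are made-up placeholders; check the directives against
the documentation for your Squid version before relying on this.

```
# Clients in this (hypothetical) subnet are never passed to the rewriter.
acl no_rewrite src 10.0.5.0/24
url_rewrite_access deny no_rewrite
url_rewrite_access allow all
```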
       
Lua Rules
---------

Volta has an embedded Lua interpreter that you can use to perform all
kinds of conditional rewrites.  Read more about the syntax of the Lua
language here: http://www.lua.org/manual/5.1/

### Loading a script

To use a Lua script, prefix the rewrite target of a volta rule with
'lua:'.  The rest of the target is then treated as a path to the script.
(You can find an example in the Examples section of this README.)

You can specify a path to either an ascii file, or Lua bytecode.  (If
speed is at an absolute premium, I'm seeing around a 25% performance
increase by using Lua bytecode files.)

You can use different scripts for different rules, or use the same
script across any number of separate rules.

There is no need to restart squid when modifying Lua rules.  Changes are
seen immediately.
       
### Environment

* Global variable declarations are disabled, so scripts can't accidentally stomp on each other.  All variables must be declared with the 'local' keyword.
* There is a global table called 'shared' you may use if you want to share data between separate scripts, or remember things in-between rule evaluations.
* The details of the request can be found in a table, appropriately named 'request'.  The scheme, host, path, port, method, client_ip, and domain fields are all available by default from the request table.
* Calling Lua's print() function emits debug information to stderr.  Use a debug level of 2 or higher to see it.
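
As a rough illustration of the environment described above, here is a
minimal sketch of a rule script that uses the 'request' and 'shared'
tables and print().  The function name 'handle', the 'hits' key, the
10.0.5.x policy, and the assumption that request.client_ip is a
dotted-quad string are all made up for this example; they are not taken
from volta itself.

```lua
-- Everything is declared 'local', since global declarations are disabled.
local function handle(req, store)
    -- Use the shared table to remember a per-client request count
    -- between rule evaluations ('hits' is a hypothetical key).
    store.hits = store.hits or {}
    local ip = req.client_ip
    store.hits[ip] = (store.hits[ip] or 0) + 1

    -- print() is debug output; volta sends it to stderr at -d2 or higher.
    print(string.format("%s %s%s (request #%d from %s)",
        req.method, req.host, req.path, store.hits[ip], ip))

    -- Hypothetical policy: never rewrite requests from 10.0.5.x clients.
    if ip:find("^10%.0%.5%.") then
        return nil
    end

    -- Everyone else is temporarily redirected to the https version.
    return "302:https://" .. req.host .. req.path
end

-- volta supplies 'request' and 'shared'.  The guard only exists so the
-- file can also be loaded by a plain Lua interpreter for testing.
if request then return handle(request, shared) end
```

Keeping the logic in a local function that takes the tables as arguments
makes it easy to exercise the script outside of volta with hand-built
tables.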
       
### Return value

The return value of the script is sent unmodified to squid.  It should
be the URL the request is rewritten to, with an optional redirect code
prefix (301 or 302).

Omitting a return value, or returning 'nil', has the same effect as a negative
rule match -- the original request is allowed through without any rewrite.
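
The three kinds of return value can be sketched in one small script.
The hostnames, paths, and the function name 'target' are hypothetical,
and the redirect prefix is written in the same '301:' form that the rule
file uses.

```lua
local function target(req)
    if req.path:find("^/old/") then
        -- Code prefix: the client receives a permanent redirect.
        return "301:http://www.example.com/new/"
    elseif req.path:find("%.png$") then
        -- No prefix: the rewrite is transparent to the client.
        return "http://static.example.com" .. req.path
    end
    -- nil (or no return at all): the request passes through unchanged.
    return nil
end

-- volta supplies 'request'.  The guard only exists so the file can also
-- be loaded by a plain Lua interpreter for testing.
if request then return target(request) end
```

Note that Lua patterns are case sensitive, unlike the regular
expressions in volta's own rules.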
       
An extremely simple Lua rule script can be found in the 'examples'
directory, distributed with volta.