README
author Mahlon E. Smith <mahlon@laika.com>
Wed, 09 Nov 2011 16:07:25 -0800
Tag for initial release (semver)


Volta
=====

What is volta?
--------------

Volta is a high-performance, low-resource URI rewriter for use with the
Squid caching proxy server (http://www.squid-cache.org/).  With it, you
can dynamically alter URI requests that pass through Squid based on
various criteria.

It uses a state machine to parse URIs and rules, and a constant database
to store and access those rules.


Why is it called "volta"?
-------------------------

It's a type of old Italian music written in triple-time.  Quick!


How fast is it?
---------------

On a 2GHz Xeon 5130, it can process a million Squid requests against
10000 rules in less than 8 seconds, using about 800k of RAM.  On a
1.8GHz Intel E4300, it can do it in 3 seconds.

Your mileage may vary, but for almost all intents and purposes the
answer is "definitely fast enough."


Configuring squid
-----------------

You must enable URL rewriting from within the squid.conf file.

	url_rewrite_program /usr/local/bin/volta

... and that's it.  You may need some additional customization, like where
the volta database is stored on disk:

	url_rewrite_program /usr/local/bin/volta -f /var/db/squid/volta.db

Busy servers:

Make sure url_rewrite_concurrency is disabled, since volta is single
threaded.  Instead, just add more volta children.  They are lightweight,
so load 'em up.  A proxy at my $DAYJOB is in use by around 450 people,
and we get by nicely with 10 volta children.

	url_rewrite_concurrency 0
	url_rewrite_children 10


Using volta
-----------

See the INSTALL file for instructions on how to compile volta.

Volta reads its rewrite rules from a local database.  You can create the
rules in a text editor, then convert them to the database like so:

	% volta -c rules.txt

You'll be left with a "volta.db" file in the current directory.  Put it
wherever you please, and use the -f flag to point to it.


Rule file syntax
----------------

Volta's rule syntax is designed to be easy to parse by humans and
machines.  Blank lines are skipped, as is any line that starts with the
'#' character, so you can keep the ASCII version of your rules well
documented and in version control.

When compiling the ruleset into the database format, volta detects
malformed rules and stops if there are any problems, leaving your
original database intact.  You can change the ruleset at any time while
volta is running, and the new rules will take effect within about 10
seconds.  No need to restart squid!

There are two types of rules -- positive matches, and negative matches.
Positive matches cause the rewrite, negative matches allow the original
request to pass.  Rules are evaluated in file order, top-down, and the
first match wins.  Fields are separated by any amount of whitespace
(spaces or tabs).
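
The file format described above can be sketched in a few lines of
Python.  This is only an illustration of the syntax, not volta's actual
parser (volta itself uses a state machine).  The function name
`parse_rules` is hypothetical, and the choice to reject any line without
exactly three fields is an assumption about what "malformed" means.

```python
def parse_rules(text):
    """Parse rule text into (hostname, path, target) tuples, in file order.

    Hypothetical helper for illustration; not part of volta.
    """
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and lines starting with '#' are skipped.
        if not stripped or line.startswith("#"):
            continue
        # Fields are separated by any amount of spaces or tabs.
        fields = stripped.split()
        if len(fields) != 3:
            # Assumption: anything other than three fields is malformed.
            raise ValueError(
                f"line {lineno}: expected 3 fields, got {len(fields)}"
            )
        hostname, path, target = fields
        # Hostnames are compared without case sensitivity, so normalize.
        rules.append((hostname.lower(), path, target))
    return rules
```

Because the list keeps file order, a later "first match wins" lookup can
simply walk it from the top.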


### Positive matches:

    First field: the hostname to match.

      You can use an exact hostname (www.example.com), or the parent
      domain (example.com) if you want to match every host beneath it.
      You can also use a single '*' to match every request.  This
      essentially bypasses a lot of what makes volta quick, but it is
      included for completeness.  You may have an unlimited number of
      rules per hostname.  Hostnames are compared without case
      sensitivity.


    Second field: the path to match.

      This can be an exact match ('/path/to/something.html'), a regular
      expression ('\.(jpg|gif|png)$'), or a single '*' to match any
      path.  Regular expressions are matched without case sensitivity.
      There is currently no support for capturing, though this may be
      added in a future release.


    Third field: the redirect code and URL to rewrite to.

      Any pieces of a URL that are omitted are automatically replaced
      with the corresponding element of the original request.  The
      exception is the hostname, which is required.  If you omit a
      redirect code, the URL rewrite is transparent to the client.  You
      can attach a 301: or 302: prefix to send the client a permanent
      or temporary redirect, respectively, instead.
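
To make the third field concrete, here is a rough Python sketch of how
omitted pieces of a target could be filled in from the original request.
It is not volta's implementation.  The names `TARGET_RE` and
`apply_target` are hypothetical, and carrying the original query string
over when the target has none is an assumption based on the description
above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Optional "301:"/"302:" prefix, optional scheme, required host, then
# whatever path and query the rule supplies.  Hypothetical pattern.
TARGET_RE = re.compile(
    r"^(?:(?P<code>301|302):)?"
    r"(?:(?P<scheme>[a-z][a-z0-9+.-]*)://)?"
    r"(?P<host>[^/?#]+)"
    r"(?P<rest>.*)$",
    re.IGNORECASE,
)

def apply_target(request_url, target):
    """Return (redirect_code_or_None, rewritten_url)."""
    m = TARGET_RE.match(target)
    if m is None:
        raise ValueError(f"target needs a hostname: {target!r}")
    orig = urlsplit(request_url)
    new = urlsplit("//" + m.group("host") + m.group("rest"))
    # Anything the target omits comes from the original request.
    scheme = m.group("scheme") or orig.scheme
    path = new.path or orig.path
    query = new.query or orig.query
    code = int(m.group("code")) if m.group("code") else None
    return code, urlunsplit((scheme, new.netloc, path, query, ""))
```

For example, `apply_target("http://www.google.com/search?q=test",
"302:https://www.google.com")` yields a 302 code and the URL
"https://www.google.com/search?q=test".  A code of `None` stands for a
rewrite that is transparent to the client.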


### Negative matches:

    First field: the hostname to match.

	  See above -- all the same rules apply.


    Second field: the path to match.

	  See above -- all the same rules apply.


    Third field: the 'negative' marker.

      This is simply the '-' character, which signals to volta that this
      is a negative matching rule.
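
The way positive and negative rules interact can be sketched as follows.
This is an illustrative Python model only, not how volta looks up rules
internally (it uses a constant database).  The function names are
hypothetical, and two behaviors are assumptions: a hostname rule also
matches every host beneath it, and every path other than '*' is treated
as a case-insensitive regular expression search.

```python
import re

def host_matches(rule_host, request_host):
    """'*' matches everything; otherwise match the host or its subdomains."""
    rule_host = rule_host.lower()
    request_host = request_host.lower()
    if rule_host == "*":
        return True
    return request_host == rule_host or request_host.endswith("." + rule_host)

def path_matches(rule_path, request_path):
    """'*' matches any path; otherwise do a case-insensitive regex search."""
    if rule_path == "*":
        return True
    return re.search(rule_path, request_path, re.IGNORECASE) is not None

def first_match(rules, host, path):
    """Return the rewrite target, or None to let the request pass.

    `rules` is a list of (hostname, path, target) tuples in file order.
    """
    for rule_host, rule_path, target in rules:
        if host_matches(rule_host, host) and path_matches(rule_path, path):
            # A '-' target is a negative match: stop and rewrite nothing.
            return None if target == "-" else target
    return None
```

Because the first match wins, a negative rule only protects a request
when it appears above the positive rule that would otherwise catch it.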


You can easily test your rules by running volta on the command line, and
pasting URLs into it.   Boost the debug level (-d4) if you're having any issues.


Examples
--------

Rewrite all requests to Google to the SSL version:

    google.com * 302:https://www.google.com

This will redirect the request "http://www.google.com/search?q=test" to
"https://www.google.com/search?q=test".


Transparently alter all uploaded images on imgur to be my face:  :)

	i.imgur.com \.(gif|png|jpg)$ http://www.martini.nu/images/mahlon.jpg


Expand a local, unqualified hostname to a FQDN (useful alongside the
'dns_defnames' squid setting to enforce browser proxy behaviors):

	local-example * local-example.company.com


Cause all blog content except for 2011 posts to permanently redirect to
an archival page:

	martini.nu /blog/2011 -
	martini.nu /blog 301:martini.nu/content-archived.html


Turn off rewriting for a specific network segment or IP address:

Squid has this ability built in -- see the 'url_rewrite_access' setting.