README
author Mahlon E. Smith <mahlon@laika.com>
Wed, 09 Nov 2011 16:40:38 -0800
Merge Michael's <ged@faeriemud.org> README changes in. Thanks for proofreading!


Volta
=====

What is volta?
--------------

Volta is a high-performance, low-resource URI rewriter for use with the
Squid caching proxy server (http://www.squid-cache.org/).  With it, you
can dynamically alter URI requests that pass through Squid based on
various criteria.

It uses a state machine to parse URIs and rules, and a constant database
to store and access those rules.


Why is it called "volta"?
-------------------------

It's a type of old Italian music written in triple-time.  Quick!


How fast is it?
---------------

On a 2GHz Xeon 5130, it can process a million squid requests against
10000 rules in less than 8 seconds, using about 800k of RAM.  On a
1.8GHz Intel E4300, it can do it in 3 seconds.

Your mileage may vary, but for most intents and purposes the answer
is "definitely fast enough."


Configuring squid
-----------------

You must enable URL rewriting from within the squid.conf file:

    url_rewrite_program /usr/local/bin/volta

... and that's it.  You may need some additional customization, like where
the volta database is stored on disk:

    url_rewrite_program /usr/local/bin/volta -f /var/db/squid/volta.db

Busy servers:

Make sure url_rewrite_concurrency is disabled, because volta is single
threaded.  Instead, just add more volta children.  They are lightweight,
so load 'em up.  A proxy at my $DAYJOB is in use by around 450 people,
and we get by nicely with 10 volta children.

    url_rewrite_concurrency 0
    url_rewrite_children 10


Using volta
-----------

See the INSTALL file for instructions on how to compile volta.

Volta reads its rewrite rules from a local database.  You can create the
rules in a text editor, then convert them to the database like so:

    % volta -c rules.txt

You'll be left with a "volta.db" file in the current directory.  Put it
wherever you please, and use the -f flag to point to it.


Rule file syntax
----------------

Volta's rule syntax is designed to be easy to parse by humans and
machines.  Blank lines are skipped, as is any line that starts with the
'#' character, so you can keep the ascii version of your rules well
documented and in version control.

When compiling the ruleset into the database format, volta detects
malformed rules and stops if there are any problems, leaving your
original database intact. You can change the ruleset and recompile the
database at any time while volta is running, and the new rules will take
effect within about 10 seconds. No need to restart squid!

There are two types of rules -- positive matches, and negative matches.
Positive matches cause the rewrite, negative matches allow the original
request to pass.  Rules are evaluated in order from the top down, and
the first match wins.  Fields are separated by any amount of whitespace
(spaces or tabs).
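
To make the format above concrete, here is a minimal Python sketch of a
reader for the rule file.  It is not volta's own parser (volta uses a
state machine for this), and the `Rule` and `parse_rules` names are
invented for illustration.  It assumes every rule has exactly three
fields and that a comment line may have leading whitespace.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    host: str               # hostname, domain, or '*'
    path: str               # exact path, regular expression, or '*'
    negative: bool          # True when the third field is '-'
    code: Optional[int]     # 301, 302, or None for a transparent rewrite
    target: Optional[str]   # rewrite URL, None for negative rules


def parse_rules(text: str) -> List[Rule]:
    """Parse rule file text, raising ValueError on the first malformed line."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and '#' comments are skipped.
        if not stripped or stripped.startswith("#"):
            continue
        # Fields are separated by any amount of spaces or tabs.
        fields = stripped.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        host, path, action = fields
        host = host.lower()  # hostnames are compared without case sensitivity
        if action == "-":
            rules.append(Rule(host, path, True, None, None))
            continue
        code = None
        for prefix in ("301:", "302:"):
            if action.startswith(prefix):
                code = int(prefix[:-1])
                action = action[len(prefix):]
                break
        rules.append(Rule(host, path, False, code, action))
    return rules
```

Raising on the first bad line mirrors the "stop if there are any
problems" behavior described above, so a broken rule file never yields a
partial rule list.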


### Positive matches:

    First field: the hostname to match.

      You can use an exact hostname (www.example.com), or a parent domain
      (example.com) if you want to match everything under it.  You can
      also use a single '*' to match every request.  This essentially
      bypasses a lot of what makes volta quick, but it is included for
      completeness.  You may have an unlimited number of rules per
      hostname.  Hostnames are compared without case sensitivity.


    Second field: the path to match.

      This can be an exact match ('/path/to/something.html'), a regular
      expression ('\.(jpg|gif|png)$'), or a single '*' to match any
      path. Regular expressions are matched without case sensitivity. There
      is currently no support for capturing, though this may be added in a
      future release.


    Third field: the redirect code and url to rewrite to.

      Any pieces of a url that are omitted are automatically replaced with
      the original request's element -- the exception is a hostname, which
      is required. If you omit a redirect code, the URL rewrite is
      transparent to the client. You can attach a 301: or 302: prefix to
      cause a permanent or temporary (respectively) redirect response to be
      sent, instead.
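
The "omitted pieces are filled in from the original request" behavior can
be sketched in Python as below.  This is an illustration only, not volta's
implementation; `merge_target` is a made-up name, and the sketch assumes
the scheme, path, and query string are each filled in independently.  It
expects the 301: or 302: prefix to have been stripped already.

```python
from urllib.parse import urlsplit, urlunsplit


def merge_target(original: str, target: str) -> str:
    """Fill pieces missing from the rewrite target using the original URL."""
    # Targets such as "martini.nu/page.html" carry no scheme.  Prefix "//"
    # so urlsplit() treats the first component as the host, not as a path.
    if "://" not in target:
        target = "//" + target
    old, new = urlsplit(original), urlsplit(target)
    if not new.netloc:
        # The hostname is the one piece that may not be omitted.
        raise ValueError("rewrite target must include a hostname")
    return urlunsplit((
        new.scheme or old.scheme,   # keep the original scheme if omitted
        new.netloc,
        new.path or old.path,       # keep the original path if omitted
        new.query or old.query,     # assumption: query is carried over too
        "",
    ))
```

For example, `merge_target("http://www.google.com/search?q=test",
"https://www.google.com")` yields
`https://www.google.com/search?q=test`.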


### Negative matches:

    First field: the hostname to match.

      See above -- all the same rules apply.


    Second field: the path to match.

      See above -- all the same rules apply.


    Third field: the 'negative' marker.

      This is simply the '-' character, which signals to volta that this
      is a negative matching rule.


You can easily test your rules by running volta on the command line, and
pasting URLs into it.   Boost the debug level (-d4) if you're having any issues.
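
If you would rather feed volta the same kind of line Squid sends instead
of a bare URL, a small Python helper can generate them.  This sketch
assumes the classic non-concurrent helper input format from the Squid
documentation (`URL client_ip/fqdn user method`) and that volta accepts
it on stdin; the function names and default values are placeholders.

```python
import sys


def squid_request_line(url, client_ip="127.0.0.1", fqdn="-", user="-", method="GET"):
    """Build one helper input line in Squid's classic rewriter format."""
    return f"{url} {client_ip}/{fqdn} {user} {method}\n"


def request_lines(urls):
    """Build one input line per URL, ready to be written to a helper's stdin."""
    return "".join(squid_request_line(u) for u in urls)


if __name__ == "__main__":
    # Emit one request line for each URL given on the command line.
    sys.stdout.write(request_lines(sys.argv[1:]))
```

Saved under a name of your choosing, the output can be piped into volta
running with -d4 to watch how each request is handled.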


Examples
--------

Rewrite all requests to Google to the SSL version:

    google.com * 302:https://www.google.com

    This will redirect the request "http://www.google.com/search?q=test" to
    "https://www.google.com/search?q=test".


Transparently alter all uploaded images on imgur to be my face:  :)

    i.imgur.com \.(gif|png|jpg)$ http://www.martini.nu/images/mahlon.jpg


Expand a local, unqualified hostname to a FQDN (useful alongside the
'dns_defnames' squid setting to enforce browser proxy behaviors):

    local-example * local-example.company.com


Cause all blog content except for 2011 posts to permanently redirect to
an archival page:

    martini.nu /blog/2011 -
    martini.nu /blog 301:martini.nu/content-archived.html
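
The blog example depends on "top-down, first match wins".  The Python
sketch below walks through that evaluation order.  It is a plain linear
scan, not volta's database lookup, and all names in it are invented for
illustration.  It assumes that a path field is applied as an unanchored,
case-insensitive regular expression, and that a domain rule matches both
the domain itself and any host beneath it.

```python
import re
from urllib.parse import urlsplit

# The two rules from the example above, in file order.
RULES = [
    ("martini.nu", "/blog/2011", "-"),
    ("martini.nu", "/blog", "301:martini.nu/content-archived.html"),
]


def host_matches(rule_host, host):
    """Match '*', the exact host, or any host under the rule's domain."""
    rule_host, host = rule_host.lower(), host.lower()
    return rule_host == "*" or host == rule_host or host.endswith("." + rule_host)


def path_matches(rule_path, path):
    """Match '*' or a case-insensitive regular expression (assumed unanchored)."""
    return rule_path == "*" or re.search(rule_path, path, re.IGNORECASE) is not None


def decide(url, rules=RULES):
    """Return the action of the first matching rule, or None to pass through."""
    parts = urlsplit(url)
    for rule_host, rule_path, action in rules:
        if host_matches(rule_host, parts.hostname or "") and path_matches(rule_path, parts.path):
            # A negative rule ends the search and leaves the request alone.
            return None if action == "-" else action
    return None
```

A 2011 post hits the negative rule first and passes through untouched.
Any other /blog path falls to the second rule and gets the 301.  Swapping
the two rules would send the 2011 posts to the archive page as well.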