apache-mod_replace-0.1.0-10mdv2010.0.i586.rpm

mod_replace
===========

HTTP Header / Body text replacement for Apache 2.0.x Webservers.

 ------------------------------------------------------------------------------

1) What is mod_replace?
2) How does it work?
3) Configuration
4) Frequently Asked Questions (FAQ)
5) Copyright / License
6) Thanks
7) Contact

 ------------------------------------------------------------------------------

1) What is mod_replace?
   --------------------

mod_replace is a simple Apache 2.0.x filter module that was originally
developed based on mod_ext_filter.c. Its initial purpose was to support an
Apache-based reverse proxy with mod_rewrite. Absolute URLs contained in the
HTTP body could not be handled with mod_rewrite, so a slow mod_perl solution
was used to rewrite the body content.

The C-based mod_replace in its original version provided a much faster approach
to this problem. Since then, HTTP header replacement (e.g. for cookie
adjustments) has been added.

So far, mod_replace has only been used together with mod_proxy to provide an
improved reverse proxy experience. It greatly helps to sanitize ill-behaved
web servers / applications. Examples: absolute links within web pages, absolute
links in HTTP headers which aren't controlled by mod_proxy (e.g. Set-Cookie).

2) How does it work?
   -----------------

There are currently three distinct mechanisms for pattern replacement within
mod_replace. Those are:

- HTTP response body replacements
  > The most powerful replacements, since subpatterns are supported
  > The most commonly used feature of mod_replace
  > Based upon Apache's filter mechanism (output filter)

- HTTP response header replacements
  > Suitable for rewriting HTTP header information of server responses
  > No support for referencing subpatterns in the replacement (still to come)
  > Based upon Apache's filter mechanism (output filter)

- HTTP request header replacements
  > Suitable for rewriting HTTP header information of client requests before
    they reach the server (e.g. on a reverse proxy)
  > NOT based upon Apache's filter mechanism

When an HTTP response is routed through an HTTP body filter, there are a couple
of things you should be aware of:

 - The filter has to assemble the whole response before it starts processing
   the content. Apache uses buckets to store parts of the data. If the filter
   processed each bucket on its own, it would miss any pattern that extends
   across a bucket boundary. Thus all buckets have to be concatenated into a
   single data structure. This takes time and resources!
   
 - Multiple patterns for the same filter definition are kept in a linked
   list. This means that the patterns are processed sequentially. The
   pattern defined first is the first one processed by the filter.
   Once this run is complete, the next pattern processes the already altered
   data. This means that if you define multiple patterns, each page has to be
   processed multiple times. Try to solve your problem with as few patterns as
   possible.

 - Once all patterns are processed the data is passed to the next filter. If
   this is another HTTP body filter, the whole story is repeated.
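
The bucket and ordering points above can be illustrated with a small Python
sketch. This is not mod_replace's code: the function names are hypothetical,
plain strings stand in for Apache buckets, and Python's re module stands in
for PCRE.

```python
import re

def replace_per_bucket(buckets, pattern, replacement):
    """Naive approach: run the pattern on each bucket separately."""
    return "".join(re.sub(pattern, replacement, b) for b in buckets)

def replace_whole(buckets, pattern, replacement):
    """Concatenate all buckets first, then run the pattern once."""
    return re.sub(pattern, replacement, "".join(buckets))

def apply_patterns(body, patterns):
    """Apply (pattern, replacement) pairs in definition order.
    Each pass sees the output of the previous one."""
    for pattern, replacement in patterns:
        body = re.sub(pattern, replacement, body)
    return body

# A link that happens to be split across two buckets.
buckets = ['<a href="http://origin.', 'server/index.html">home</a>']
pattern = r"http://origin\.server/"

per_bucket = replace_per_bucket(buckets, pattern, "http://revproxy/")
whole = replace_whole(buckets, pattern, "http://revproxy/")

# Order matters: the second pattern matches text produced by the first.
chained = apply_patterns("http://a/", [(r"//a/", "//b/"), (r"//b/", "//c/")])
```

In this sketch per_bucket leaves the link untouched because neither bucket
contains the full pattern, while whole rewrites it. The chained call shows why
pattern order matters and why every additional pattern costs another pass over
the data.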

With an HTTP response header filter, the process is almost the same as above.
The patterns are matched sequentially against the data and the necessary
replacements take place before the next pattern is processed.

One special feature of this filter is that it doesn't stop looking for
matching headers once it has found one. Stopping would make sense if every
HTTP header occurred only once per response, but there is one common
situation where the same HTTP header occurs multiple times, each with
different content: Set-Cookie.
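
A minimal Python sketch of this behaviour, assuming headers are held as a list
of (name, value) pairs. The function name rewrite_header and the sample
headers are made up for illustration, and Python's re module stands in for
PCRE.

```python
import re

def rewrite_header(headers, name, pattern, replacement):
    """Rewrite the value of every header called `name`.
    The loop does not stop at the first match; header names are
    compared case-insensitively."""
    result = []
    for header_name, value in headers:
        if header_name.lower() == name.lower():
            value = re.sub(pattern, replacement, value)
        result.append((header_name, value))
    return result

headers = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "UID=0815; domain=server.com; path=/"),
    ("Set-Cookie", "LANG=en; domain=.server.com; path=/"),
]

rewritten = rewrite_header(
    headers, "Set-Cookie", r" domain=[.]?server\.com", " domain=revproxy.com"
)
```

Both Set-Cookie entries end up with domain=revproxy.com, while the
Content-Type header is passed through unchanged.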

The HTTP request header filter is quite different from the other filters,
because it doesn't use the same mechanism within Apache. If it used the
filter mechanism (e.g. as an input filter), any request that is also routed
through mod_proxy (using Apache as a reverse proxy) would first be processed
by mod_proxy and then by mod_replace. Any modifications made to the HTTP
header at that point would be ignored completely, since mod_proxy has already
created the request to the origin server and the modifications by mod_replace
are simply discarded.

The mechanism used by mod_replace for modifying the request header allows
you to alter the HTTP header before mod_proxy processes the request. The same
rules apply to the patterns: multiple patterns are kept in a linked list and
are processed sequentially. Note: There is only one "filter" for all
patterns. You don't need to create a named definition and you don't have to set
the output filter. But you won't be able to specify additional parameters.

3) Configuration
   -------------

- Configuring an HTTP body filter:

Syntax:

  ReplaceFilterDefine <name> [<options> ...]

    <name>      The name of the filter definition. Used to distinguish multiple
                filters (not patterns) and to selectively activate filters.
    <options>   Configuration options for this filter definition.

                CaseIgnore      Pattern matching is case insensitive. Don't set
                                this option if you want your patterns to be
                                matched case sensitively!
                intype=<mime>   Restricts pattern matching to HTTP responses
                                with the specified MIME type (e.g. text/html).
                                Be careful if you use this option with HTTP
                                header patterns.

  ReplacePattern <name> <pattern> <string>

    <name>      The name of the filter definition which this pattern is added
                to. Be sure to define a filter by using the ReplaceFilterDefine
                command.
    <pattern>   A PCRE (Perl Compatible Regular Expressions) pattern. This
                pattern is matched against the HTTP body coming from the
                server. You may use subpatterns and reference them (up to 9) in
                the replacement string. See the examples for more information.
    <string>    The string that is inserted as a replacement if the pattern
                matches. You may reference up to 9 subpatterns from the
                original pattern (\0 - \9). See the examples.

  SetOutputFilter <name>[;<name>]

    <name>      The name of a filter definition that needs to be activated. If
                there are multiple definitions, you have to put semicolons 
                between the names.

Examples:

  ReplaceFilterDefine revproxy CaseIgnore intype=text/html
  ReplacePattern revproxy "(http|https)://origin.server/" "\1://revproxy/"
  SetOutputFilter revproxy
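
To see what the pattern and the \1 reference in this example do, here is a
rough Python equivalent. It is only an approximation: Python's re module is
not PCRE, re.IGNORECASE stands in for the CaseIgnore option, and the sample
body is made up.

```python
import re

# Pattern and replacement string taken from the configuration example.
pattern = re.compile(r"(http|https)://origin.server/", re.IGNORECASE)
replacement = r"\1://revproxy/"

body = (
    '<a href="https://origin.server/a">A</a> '
    '<img src="HTTP://ORIGIN.SERVER/b.png">'
)

# \1 is replaced by whatever the first subpattern matched (the scheme).
result = pattern.sub(replacement, body)

# The unescaped "." matches any character, not only a literal dot.
loose_match = pattern.search("http://originXserver/") is not None
```

The scheme matched by the first subpattern is carried over into the
replacement, so http and https links are both rewritten by a single pattern.
Because the dot in "origin.server" is not escaped, the pattern also matches
names such as "originXserver"; write "origin\.server" if that matters.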

- Configuring an HTTP response header filter:

Syntax:

  ReplaceFilterDefine <name> [<options> ...]

    <name>      The name of the filter definition. Used to distinguish multiple
                filters (not patterns) and to selectively activate filters.
    <options>   Configuration options for this filter definition.

                CaseIgnore      Pattern matching is case insensitive. Don't set
                                this option if you want your patterns to be
                                matched case sensitive!
                intype=<mime>   Narrows the pattern matching to HTTP responses
                                with the specified MIME type (eg. text/html).
                                Be careful if you use this option with HTTP
                                header patterns.

  HeaderReplacePattern <name> <header> <pattern> <string>

    <name>      The name of the filter definition which this pattern is added
                to. Be sure to define a filter by using the ReplaceFilterDefine
                command.
    <header>    The HTTP header that is to be altered. Note: you cannot
                alter the header field name, only its content. E.g. you can
                alter the domain name of a Set-Cookie header, but not change an
                - obviously wrong - "SetKookie" to "Set-Cookie".
    <pattern>   A PCRE (Perl Compatible Regular Expressions) pattern. This
                pattern is matched against the content of the specified HTTP
                header coming from the server. You can use subpatterns here,
                but you are not able to reference them in the replacement
                string (not implemented).
    <string>    The string that is inserted as a replacement if the pattern
                matches.
                
  SetOutputFilter <name>[;<name>]

    <name>      The name of a filter definition that needs to be activated. If
                there are multiple definitions, you have to put semicolons
                between the names.

Examples:

  ReplaceFilterDefine revproxy CaseIgnore
  HeaderReplacePattern revproxy Set-Cookie \
    " domain=[.]?server.com" \
    " domain=revproxy.com"
  SetOutputFilter revproxy

  -> HTTP header before OutputFilter:
     Date: Wed, 07 Apr 2004 13:08:01 GMT
     Server: Apache/1.3.29
     Vary: Accept-Encoding,User-agent
     Set-Cookie: UID=0815; domain=server.com; path=/
     Connection: close
     Content-Type: text/html; charset=iso-8859-1

  -> HTTP header after OutputFilter:
     Date: Wed, 07 Apr 2004 13:08:01 GMT
     Server: Apache/1.3.29
     Vary: Accept-Encoding,User-agent
     Set-Cookie: UID=0815; domain=revproxy.com; path=/
     Connection: close
     Content-Type: text/html; charset=iso-8859-1

- Configuring an HTTP request header filter:

Syntax:

  RequestHeaderPattern <header> <pattern> <string>

    <header>    The HTTP request header that is to be altered. Note: you cannot
                alter the header field name, only its content. E.g. you can
                alter a value within a Cookie header, but not change an
                - obviously wrong - "Kookie" to "Cookie".
    <pattern>   A PCRE (Perl Compatible Regular Expressions) pattern. This
                pattern is matched against the content of the specified HTTP
                header coming from the client. You can use subpatterns here,
                but you are not able to reference them in the replacement
                string (not implemented).
    <string>    The string that is inserted as a replacement if the pattern
                matches.

Examples:

  RequestHeaderPattern Cookie " UID=0815" " UID=007"

4) Frequently Asked Questions (FAQ)
   --------------------------------

Q: Does it run on my platform?
A: I don't know. It has been developed under Linux/i386 and Solaris 8/Sparc and
   runs well on these platforms. It uses NO platform-specific code (AFAIK), so
   it should run on all platforms supported by Apache.
   Reports of (un-)successful deployment of mod_replace are always welcome.

Q: The body replacement works when I use a simple telnet connection to the
   server, but not with a browser. What is wrong?
A: Your browser most likely sends an "Accept-Encoding: gzip" header with the
   request and the server compresses the response (e.g. with mod_gzip). Either
   unset the request header entry (mod_headers is great for this) or turn off
   the compression.
   
   Add the following line:
     RequestHeader unset Accept-Encoding
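
   Why compression defeats the pattern can be shown with a short Python
   sketch. It only models the situation described above; the sample body is
   made up and Python's gzip and re modules stand in for the server-side
   compression and PCRE.

```python
import gzip
import re

pattern = re.compile(rb"http://origin\.server/")
body = b'<a href="http://origin.server/index.html">home</a>\n' * 20

# What a body filter would see if the response is already compressed.
compressed = gzip.compress(body)

found_in_compressed = pattern.search(compressed) is not None
found_in_plain = pattern.search(gzip.decompress(compressed)) is not None
```

   The URL is present in the plain body but not as a literal byte sequence in
   the compressed stream, so a pattern applied to compressed data has nothing
   to match.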

Q: The replacement works OK, but sometimes images or binaries are garbled.
A: Any data that comes from the server is most likely to be passed through your
   filter. Narrow the intype to text/html.
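
   A small Python sketch of the effect of narrowing the type. The function
   filter_body and the way it compares MIME types (ignoring parameters such
   as charset) are assumptions made for illustration, not a description of
   how mod_replace implements intype.

```python
import re

def filter_body(content_type, body, pattern, replacement, intype=None):
    """Apply the replacement only if no intype is set or the response's
    MIME type equals it; otherwise pass the body through untouched."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if intype is not None and mime != intype.lower():
        return body
    return re.sub(pattern, replacement, body)

pattern = rb"http://origin\.server/"
replacement = b"http://revproxy/"

# Binary data that happens to contain the pattern's byte sequence.
png = b"\x89PNG\r\n\x1a\n" + b"\x00\x10http://origin.server/\xff\xfe"
html = b'<a href="http://origin.server/">home</a>'

unfiltered = filter_body("image/png", png, pattern, replacement)
narrowed = filter_body("image/png", png, pattern, replacement,
                       intype="text/html")
page = filter_body("text/html; charset=iso-8859-1", html, pattern,
                   replacement, intype="text/html")
```

   Without a type restriction the binary data is altered and its length
   changes, which is the kind of damage that shows up as garbled images. With
   intype=text/html only the HTML page is rewritten.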

Q: Will there be an Apache 1.3.x version?
A: No! mod_replace has been developed using Apache 2. The differences between
   both versions are most likely too big to make it work. If anyone is
   interested in porting the module, he/she is more than welcome.

5) Copyright / License
   -------------------

   see file: LICENSE

6) Thanks
   ------

Thanks go to science + computing ag (http://www.science-computing.de) which
sponsored most of the development for this software!

7) Contact
   -------

If you have any comments, feature requests, patches, security fixes, etc., feel
free to send me an email:

  s.tesch (at) science-computing (dot) de