<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html lang="en">
<head>
<title>mod_replace: Documentation</title>
<meta name="description" content="mod_replace documentation">
<meta name="keywords" content="mod_replace docs documentation">
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body background="#ffffff">
<h1>mod_replace: Documentation</h1>
<h2>What is <i>mod_replace</i>?</h2>
<p> <i>mod_replace</i> is a simple Apache 2.0.x filter module that was originally developed from <i>mod_ext_filter</i>. Its initial purpose was to support an Apache-based reverse proxy that uses <i>mod_rewrite</i>. Absolute URLs contained in the HTTP body could not be handled by <i>mod_rewrite</i>, so a slow <i>mod_perl</i> solution was used to rewrite the body content. </p>
<p> The C-based <i>mod_replace</i> in its original version provided a much faster approach to this problem. Since then, HTTP header replacement (e.g. for cookie adjustments) has been added. </p>
<p> So far, <i>mod_replace</i> has only been used together with <i>mod_proxy</i> to provide an improved reverse proxy experience. It greatly helps to sanitize ill-behaved web servers and applications. Examples: absolute links within web pages, and absolute links in HTTP headers that aren't controlled by <i>mod_proxy</i> (e.g. Set-Cookie). </p>
<h2>How does it work?</h2>
<p> There are currently three distinct mechanisms for pattern replacement within <i>mod_replace</i>.
They are:
<ul>
<li>HTTP response body replacements
<ul>
<li>The most powerful replacements, since there's support for subpatterns</li>
<li>Most commonly used feature of <i>mod_replace</i></li>
<li>Based upon Apache's filter mechanism (Output Filter)</li>
</ul>
</li>
<li>HTTP response header replacements
<ul>
<li>Suitable for rewriting HTTP header information of server responses</li>
<li>No support for subpatterns (still to come)</li>
<li>Based upon Apache's filter mechanism (Output Filter)</li>
</ul>
</li>
<li>HTTP request header replacements
<ul>
<li>Suitable for rewriting HTTP header information of client requests before they reach the server (e.g. on a reverse proxy)</li>
<li><b>NOT</b> based upon Apache's filter mechanism</li>
</ul>
</li>
</ul>
</p>
<p> When an HTTP response is routed through an HTTP body filter, there are a couple of things you should be aware of:
<ul>
<li>The filter has to assemble the whole response before it starts processing the content. Apache uses buckets to store parts of the data. If only individual buckets were processed, a pattern that extends across a bucket boundary would cause trouble. Thus all buckets have to be concatenated into a single data structure. This takes time and resources!</li>
<li>Multiple patterns for the same filter definition are chained in a linked list. This means that the patterns are processed sequentially. The pattern defined first is the first one processed by the filter. Once this run is complete, the next pattern processes the already altered data. This means that, if you define multiple patterns, each page has to be processed multiple times. Try to solve your problem with as few patterns as possible.</li>
<li>Once all patterns are processed, the data is passed to the next filter. If this is another HTTP body filter, the whole story is repeated.</li>
</ul>
</p>
<p> In other words, you can define multiple patterns for a single filter definition.
To do so, create another <i>ReplacePattern</i> with the same filter name as the previous one (see examples below). </p>
<p> With an HTTP response header filter, the process is almost the same as above. The patterns are sequentially matched against the data and the necessary replacements take place before the next pattern is processed. </p>
<p> One special feature of this filter is that it doesn't stop looking for matching headers once it has found one. Stopping might seem sensible, since a given HTTP header normally occurs only once per response. However, there is one common situation in which the same HTTP header occurs multiple times, each time with different content: Set-Cookie. </p>
<p> The HTTP request header filter is quite different from the other filters, because it doesn't use the same mechanism within Apache. If it used the filter mechanism (e.g. as an input filter), any request that is also routed through <i>mod_proxy</i> (using Apache as a reverse proxy) would first be processed by <i>mod_proxy</i> and only then by <i>mod_replace</i>. Any modifications applied to the HTTP header would then be ignored by <i>mod_proxy</i>, since it would already have created the request to the origin server, and the modifications made by <i>mod_replace</i> would simply be discarded. </p>
<p> The mechanism used by <i>mod_replace</i> for modifying the request header allows you to alter the HTTP header before <i>mod_proxy</i> processes the request. The same rules apply to the patterns: multiple patterns are chained in a linked list and are processed sequentially. Note: There is only one "filter" for all patterns. You don't need to create a named definition and you don't have to set the output filter. The drawback is that you cannot specify additional parameters.
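</p>
<p> As a rough illustration of the request header mechanism described above, the following sketch chains two request header patterns in front of a reverse proxy. It assumes that <i>mod_proxy</i> is loaded. The host names <tt>origin.server</tt> and <tt>revproxy</tt>, the <tt>Referer</tt> rule and the cookie values are placeholders chosen for this example, not settings required by <i>mod_replace</i>. </p>
<pre>
# Assumed reverse proxy setup (mod_proxy directives, placeholder host name)
ProxyPass        / http://origin.server/
ProxyPassReverse / http://origin.server/

# No ReplaceFilterDefine and no SetOutputFilter are needed here.
# The patterns are processed in the order in which they are defined.
RequestHeaderPattern Referer "http://revproxy/" "http://origin.server/"
RequestHeaderPattern Cookie " UID=0815" " UID=007"
</pre>
<p> If the module behaves as described above, both headers are rewritten before <i>mod_proxy</i> builds the request to the origin server.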
</p>
<h2>Configuration</h2>
<h3>Configuring an HTTP body filter</h3>
<h4>Syntax</h4>
<p> <tt>ReplaceFilterDefine &lt;name&gt; [&lt;options&gt; ...]</tt> </p>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;name&gt;</tt></td> <td>The name of the filter definition. Used to distinguish multiple filters (not patterns) and to selectively activate filters.</td> </tr>
<tr> <td><tt>&lt;options&gt;</tt></td> <td>Configuration options for this filter definition.<br><br>
<table>
<tr> <td><tt>CaseIgnore</tt></td> <td>Pattern matching is case insensitive. Don't set this option if you want your patterns to be matched case sensitively!</td> </tr>
<tr> <td><tt>intype=&lt;mime&gt;</tt></td> <td>Restricts the pattern matching to HTTP responses with the specified MIME type (e.g. text/html). Be careful if you use this option with HTTP header patterns.</td> </tr>
</table></td> </tr>
</table>
<p> <tt>ReplacePattern &lt;name&gt; &lt;pattern&gt; &lt;string&gt;</tt> </p>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;name&gt;</tt></td> <td>The name of the filter definition to which this pattern is added. Be sure to define the filter first, using the ReplaceFilterDefine command.</td> </tr>
<tr> <td><tt>&lt;pattern&gt;</tt></td> <td>A PCRE (Perl-compatible regular expression) pattern. This pattern is matched against the HTTP body coming from the server. You may use subpatterns and reference them (up to 9) in the replacement string. See the examples for more information.</td> </tr>
<tr> <td><tt>&lt;string&gt;</tt></td> <td>The string that is inserted as a replacement if a pattern matches. You may specify up to 9 subpatterns from the original pattern (\0 - \9). See the examples.</td> </tr>
</table>
<p> <tt>SetOutputFilter &lt;name&gt;[;&lt;name&gt;]</tt> </p>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;name&gt;</tt></td> <td>The name of a filter definition that needs to be activated.
If there are multiple definitions, separate the names with semicolons.</td> </tr>
</table>
<h4>Examples</h4>
<pre>
ReplaceFilterDefine revproxy CaseIgnore intype=text/html
ReplacePattern revproxy "(http|https)://origin.server/" "\1://revproxy/"
SetOutputFilter revproxy
</pre>
<pre>
ReplaceFilterDefine multiple CaseIgnore intype=text/html
ReplacePattern multiple "(http|https)://origin.server/" "\1://revproxy/"
ReplacePattern multiple "ftp://origin.server" "ftp://public.server/pub"
SetOutputFilter multiple
</pre>
<h3>Configuring an HTTP header filter</h3>
<h4>Syntax</h4>
<pre>
ReplaceFilterDefine &lt;name&gt; [&lt;options&gt; ...]
</pre>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;name&gt;</tt></td> <td>The name of the filter definition. Used to distinguish multiple filters (not patterns) and to selectively activate filters.</td> </tr>
<tr> <td><tt>&lt;options&gt;</tt></td> <td>Configuration options for this filter definition.
<table>
<tr> <td><tt>CaseIgnore</tt></td> <td>Pattern matching is case insensitive. Don't set this option if you want your patterns to be matched case sensitively!</td> </tr>
<tr> <td><tt>intype=&lt;mime&gt;</tt></td> <td>Restricts the pattern matching to HTTP responses with the specified MIME type (e.g. text/html). Be careful if you use this option with HTTP header patterns.</td> </tr>
</table></td> </tr>
</table>
<pre>
HeaderReplacePattern &lt;name&gt; &lt;header&gt; &lt;pattern&gt; &lt;string&gt;
</pre>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;name&gt;</tt></td> <td>The name of the filter definition. Used to distinguish multiple filters (not patterns) and to selectively activate filters.</td> </tr>
<tr> <td><tt>&lt;header&gt;</tt></td> <td>This is the HTTP header that is to be altered. Note: you cannot alter the header field name, only its content. For example,
you can alter the domain name of a Set-Cookie header, but not change an - obviously wrong - "SetKookie" to "Set-Cookie".</td> </tr>
<tr> <td><tt>&lt;pattern&gt;</tt></td> <td>A PCRE (Perl-compatible regular expression) pattern. This pattern is matched against the content of the specified HTTP header coming from the server. You can use subpatterns here, but you cannot reference them in the replacement string (not implemented).</td> </tr>
<tr> <td><tt>&lt;string&gt;</tt></td> <td>The string that is inserted as a replacement if a pattern matches.</td> </tr>
</table>
<pre>
SetOutputFilter &lt;name&gt;[;&lt;name&gt;]
</pre>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;name&gt;</tt></td> <td>The name of a filter definition that needs to be activated. If there are multiple definitions, separate the names with semicolons.</td> </tr>
</table>
<h4>Examples</h4>
<pre>
ReplaceFilterDefine revproxy CaseIgnore
HeaderReplacePattern revproxy Set-Cookie \
    " domain=[.]?server.com" \
    " domain=revproxy.com"
SetOutputFilter revproxy
</pre>
<table bgcolor="#e0e0e0">
<tr> <th>HTTP header before OutputFilter</th> <th>HTTP header after OutputFilter</th> </tr>
<tr>
<td>
<pre>
Date: Wed, 07 Apr 2004 13:08:01 GMT
Server: Apache/1.3.29
Vary: Accept-Encoding,User-agent
Set-Cookie: UID=0815; domain=server.com; path=/
Connection: close
Content-Type: text/html; charset=iso-8859-1
</pre>
</td>
<td>
<pre>
Date: Wed, 07 Apr 2004 13:08:01 GMT
Server: Apache/1.3.29
Vary: Accept-Encoding,User-agent
Set-Cookie: UID=0815; domain=revproxy.com; path=/
Connection: close
Content-Type: text/html; charset=iso-8859-1
</pre>
</td>
</tr>
</table>
<h3>Configuring an HTTP request header filter</h3>
<h4>Syntax</h4>
<pre>
RequestHeaderPattern &lt;header&gt; &lt;pattern&gt; &lt;string&gt;
</pre>
<table border=1>
<tr> <th>Option</th> <th>Description</th> </tr>
<tr> <td><tt>&lt;header&gt;</tt></td> <td>This is the HTTP header that is to be altered. Note: you cannot alter the header field name, only its content. For example,
you can alter the domain name of a Set-Cookie header, but not change an - obviously wrong - "SetKookie" to "Set-Cookie".</td> </tr>
<tr> <td><tt>&lt;pattern&gt;</tt></td> <td>A PCRE (Perl-compatible regular expression) pattern. This pattern is matched against the content of the specified HTTP header in the client's request. You can use subpatterns here, but you cannot reference them in the replacement string (not implemented).</td> </tr>
<tr> <td><tt>&lt;string&gt;</tt></td> <td>The string that is inserted as a replacement if a pattern matches.</td> </tr>
</table>
<h4>Examples</h4>
<pre>
RequestHeaderPattern Cookie " UID=0815" " UID=007"
</pre>
</body>
</html>