<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
 <title>Caching mod_gzip compressed data using proxy servers</title>
 <meta name="author"      content="Michael Schr&ouml;pl" />
 <meta name="description" content="A discussion of the problems of a proxy server attempting a correct handling of caching compressed HTTP content as well as the required cooperation by the HTTP server" />
 <meta name="keywords"    content="browsers, HTTP, encoding, gzip, compression, cache" />
 <style type="text/css">
body{font-family:sans-serif;margin:0px 30px 0px 30px;}
h1{font-size:22px;margin-top:20px;}
h2{font-size:18px;margin-top:14px;}
small{font-size:80%;}
td{vertical-align:top;}
tt{font-weight:bold;}
code,tt{font-family:"Courier New",monospace;}
h1,h2{margin-bottom:1px;}
p,td{margin-top:3px;margin-bottom:3px;}
p,ul,ol,li{font-size:17px;line-height:22px;}
ul,ol,li{margin-top:0px;margin-bottom:0px;}
img{border-width:0;}

#nav{position:absolute;top:30px;left:0px;font-size:14px;width:170px;font-weight:bold;margin:2px 2px 2px 30px;}
#nav[id]{position:fixed;}
#nav img{margin:5px;}
#nav p, #nav a:hover, #nav a{display:block;padding:3px;margin:2px;width:150px;font-size:15px;line-height:18px;}
#content{position:absolute;left:220px;right:30px;}
#mail{text-align:right;}
#icon{width:190px;float:left;}
#mail,#icon{margin-top:30px;}

@media screen {
body{color:#000;background-color:#f8ebd9;}
h1{color:#666;}
h2{color:#840;}
code{color:#333;}
em{color:#900;}
tt{color:#909;}
h1,h2,code,em,tt{background-color:inherit;}
.new13192a{color:inherit;background-color:#ffd;}
.new13261a{color:inherit;background-color:#eff;}
.bugfix{color:#fff;background-color:#f00;font-weight:bold;padding:0px 4px;}
#nav a{color:#530;background-color:transparent;}
#nav a{text-decoration:none;}
#nav p, #nav a:hover{color:#000;background-color:#fff;}
#nav p {border:1px #660 solid;}
#nav a {border:1px #666 dotted;}
}

@media print {
#icon,#nav{display:none;}
#content{position:absolute;left:0px;right:0px;}
}
 </style>
</head>

<body>

<div id="nav">

<img src="mod_gzip_logo.gif" height="47" width="102" alt="mod_gzip logo" />


<a title="mod_gzip - what's that, anyway?" href="index.htm">mod_gzip</a>



<a title="Compression of HTTP content using Content-Encoding" href="encoding.htm">Content-Encoding</a>



<a title="Which browsers can handle 'Content-Encoding: gzip'?" href="browser.htm">Browsers</a>



<a title="How do Firewalls handle 'Content-Encoding:'?" href="firewalls.htm">Firewalls</a>



<a title="An example configuration for mod_gzip" href="config.htm">Configuration</a>



<a title="Complete description of mod_gzip status codes" href="status.htm">Status Codes</a>



<a title="Possible enhancements in future versions of mod_gzip" href="enhancements.htm">Enhancements</a>



<p>Caching</p>



<a title="Version history and change log for mod_gzip" href="versions.htm">Versions</a>



<a title="Other resources about mod_gzip" href="links.htm">Links</a>


</div>

<div id="content">

<h1>Caching mod_gzip compressed data using proxy servers</h1>

<h2><a name="negotiation"></a>Compression via negotiation</h2>
<p>Using a configurable compression function like <tt>mod_gzip</tt> always amounts to some kind of content negotiation, i.&nbsp;e. serving different content for the same requested URL, depending on specific information inside the HTTP headers.</p>
<p>On the other hand, HTTP allows the <strong>temporary storage</strong> of responses to HTTP requests in caches, especially when using proxy servers. If now</p>
<ol>
 <li>an HTTP client sends a request,</li>
 <li>the corresponding response is served in compressed form and stored by some proxy and</li>
 <li>subsequently another HTTP client submits a request for the URL in question,</li>
</ol>
<p>then the proxy server - not in possession of further information - has a problem:</p>
<ul>
 <li>Is it entitled to serve the cached content to this second HTTP client as well, or</li>
 <li>must it forward the request to the HTTP server?</li>
</ul>
<p>Only the HTTP server can ultimately determine <small>(based upon its configuration containing the corresponding filter rules)</small> whether the second HTTP client may receive compressed response data as well.</p>
<p>By the way, this is <em>not</em> an effect of compression alone but a general problem of caching HTTP data whose content cannot be identified unambiguously by a URL inside a proxy server's cache or in similar storage-equipped servers along the transport route. This includes negotiation procedures of any type as well as the submission of additional information in the HTTP headers, such as authentication data or cookies.</p>

<h2><a name="performance"></a>Performance requirements</h2>
<p>Of course one can try to avoid the problem by explicitly <em>forbidding</em> all proxy servers on the way between client and server to cache the corresponding response's data <small>(by using the corresponding HTTP headers <code>Expires:</code> and <code>Pragma:</code> in HTTP/1.0 and <code>Cache-Control:</code> in HTTP/1.1)</small>.</p>
<p>But the goal of compression is to speed up the data transfer <small>(by reducing the data volume)</small> - and caching data serves the same goal <small>(by reducing accesses to the HTTP server)</small>. One performance optimization should not make another one unusable, especially as these two don't replace each other but can effectively <strong>complement each other</strong> in the case in question.</p>

<h2><a name="parameters"></a>Information about negotiation parameters</h2>
<p>The HTTP specification defines the <strong><code>Vary</code> HTTP header</strong>, with which the HTTP server can inform the proxy server about</p>
<ul>
 <li>whether the response was the <em>unique</em> result of a URL request or</li>
 <li>whether other attributes of a request for the same URL could lead to <em>different</em> results.</li>
</ul>
<p>Its value may contain a list of <strong>names of other HTTP headers</strong> whose content has been relevant for serving this very response for a request. Thus the HTTP server can even inform the proxy server about <em>which</em> HTTP headers have influenced the decision about the served content.</p>
<p>When a proxy server forwards a request to an HTTP server and wants to store the response inside its cache later, it should still be in possession of the HTTP headers of the original request when the HTTP server's response arrives.</p>
<p>Now if the HTTP server marks a conditional content of a response by the corresponding <code>Vary</code> header then</p>
<ul>
 <li>the proxy server must store inside its cache not only this response's content but all the relevant HTTP header information <small>(whose names were enumerated in the <code>Vary</code> HTTP header's value list of the response)</small> from the request as well, and</li>
 <li>it must not serve this cached content as response to further requests unless the information of the corresponding HTTP headers of such a subsequent request at least 'matches' those of the original request, i. e. is <em>semantically identical to the original request's values for each one of these headers</em>.</li>
</ul>
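<p>As a rough illustration of this mechanism <small>(the host name, URL and header values are hypothetical and not taken from any real server)</small>, a request and the response returned through a caching proxy might look like this:</p>
<pre>GET /index.html HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding</pre>
<p>In this sketch the proxy would store the value <code>gzip</code> of the request's <code>Accept-Encoding</code> header together with the response. A later request for the same URL that carries no <code>Accept-Encoding</code> header does not match the stored value, so the proxy would have to forward it to the HTTP server instead of serving the cached compressed content.</p>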

<h2><a name="restrictions"></a>Resulting restrictions for the HTTP server</h2>
<p>The previous explanations have shown how a proxy server can handle the conditional delivery of HTTP responses <small>(being the result of a Content Negotiation)</small> correctly and with maximum utilization of its caching effect at the same time - provided that</p>
<ul>
 <li>the HTTP server provides the proxy server with sufficient information about the negotiation parameters and</li>
 <li>the proxy server is in possession of the corresponding information in case of a subsequent request to the same URL by another HTTP client.</li>
</ul>
<p>But the latter condition restricts the degrees of freedom of the negotiation process. If the proxy server must decide whether it may serve its cache content <em>exclusively</em> based upon information within an HTTP request, then the negotiation rules of the HTTP server must not refer to anything other than HTTP header contents!</p>
<p>Unfortunately this precondition is not fulfilled by <tt>mod_gzip</tt>: of the six classes of filter rules provided,</p>
<ul>
 <li>two 'legal' ones <small>(<code>reqheader</code> and <code>uri</code>)</small> exclusively refer to HTTP header contents but</li>
 <li>four other 'illegal' ones <small>(<code>rspheader</code>, <code>handler</code>, <code>file</code> and <code>mime</code>)</small> refer to information that will be <em>available only during evaluation of the request by the HTTP server</em>.</li>
</ul>
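<p>To make the distinction more tangible, here is a small sketch with rules from each of the six classes. The patterns are made-up examples in the style of the module's sample configuration, not a recommended setup:</p>
<pre># 'legal': can be decided from the request alone
mod_gzip_item_include uri       \.html$
mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0[678]"

# 'illegal': known only while the server evaluates the request
mod_gzip_item_include file      \.html$
mod_gzip_item_include handler   ^cgi-script$
mod_gzip_item_include mime      ^text/html$
mod_gzip_item_exclude rspheader "Content-Type:image/*"</pre>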
<p>So if a <tt>mod_gzip</tt> enhanced server uses one of these 'illegal' filter rules, the proxy server can <em>no longer</em> correctly decide about the applicability of its cache content for responding to further requests.</p>
<p>Nor would it help the proxy server much if <tt>mod_gzip</tt> notified it that it is evidently overtaxed <small>(by supplying a complete list of the filter rule classes significant for this request within some <code>Vary:</code> header, if <em>that</em> were even legal)</small>. All the proxy server could do is treat the occurrence of one of these four 'illegal' filter rule classes as a criterion for not caching the response's content.</p>
<p>This alone would not be that bad - as long as the HTTP server limits itself to using nothing but 'legal' rules, it would be able to cooperate optimally with a proxy server.</p>
<p>But unfortunately doing so is impossible with <code>mod_gzip 1.3.19.1a</code>.</p>
<p>The embedding of <code>mod_gzip 1.3.19.1a</code> into the Apache 1.3 architecture is done in a relatively complex way:</p>
<ul>
 <li>In processing phase 1 <tt>mod_gzip</tt> checks whether it should be interested at all in handling this request's results and prepares for it - based upon the rules of four classes <small>(<code>reqheader</code>, <code>uri</code>, <code>file</code> and <code>handler</code>, i.&nbsp;e. two 'legal' and two 'illegal' rule classes)</small>.</li>
 <li>In processing phase 2 <tt>mod_gzip</tt> checks whether it now should actually compress the <small>(now available)</small> response content - based upon the rules of two classes <small>(<code>rspheader</code> and <code>mime</code>, both 'illegal' rule classes)</small>.</li>
</ul>
<p>For a request to be approved for compression, at least one <code>include</code> rule from <em>each of the two phases</em> must be fulfilled <small>(and no <code>exclude</code> rule may be fulfilled)</small>.</p>
<p>But as <em>both</em> <code>include</code> rule classes from phase 2 are 'illegal', each list of relevant filter rule classes for a successful compression in the current <tt>mod_gzip</tt> implementation <em>must</em> cover at least one 'illegal' rule class.</p>
<p>Thus it is impossible to provide a proxy server with information it can use for deciding about the applicability of some cache content - the submitted information will <em>always</em> exceed what the proxy server can understand.</p>

<div class="new13192a">

<h2><a name="vary-1.3.19.2a"></a><code>Vary</code> headers in <tt>mod_gzip</tt> 1.3.19.2a and up</h2>
<p>Starting with version 1.3.19.2a, <tt>mod_gzip</tt> sends <code>Vary:</code> headers - for <em>each and every</em> request where the module has been involved at least once <small>(regardless whether compressed data has been served or not)</small>.</p>
<p>At the current state of analysis, <em>each</em> request handled by <tt>mod_gzip</tt> <small>(regardless whether or not the response has actually been served in compressed form)</small> is potentially a negotiation:</p>
<ul>
 <li>at least about the <code>Accept-Encoding</code> HTTP header, and</li>
 <li>possibly about other HTTP headers as well <small>(namely all those that occur within filter rules of the <code>reqheader</code> class)</small></li>
</ul>
<p>As of now, <tt>mod_gzip</tt> is <em>not yet</em> able to generate the best possible, i.&nbsp;e. the minimum set of <code>Vary:</code> headers required - for this it would be necessary to rewrite the rule evaluation procedure of <tt>mod_gzip</tt> completely.</p>
<p>As a first step, since <a href="versions.htm#v1.3.19.2a">Version 1.3.19.2a</a> the module sends a <code>Vary:</code> header that contains</p>
<ul>
 <li>the value <code>Accept-Encoding</code> as well as</li>
 <li>the names of <em>every</em> header being used within any <code>reqheader</code> rules,</li>
</ul>
<p>because each one of these rules might make the difference for the result of the negotiation, and in each of these cases the result would depend on the values of the received HTTP headers. In certain cases this may be way too much <small>(and then massively hinder the efficient caching of content)</small>, but at least it is something to begin with.</p>
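<p>A sketch of what this means in practice, assuming a configuration that contains a single, hypothetical <code>reqheader</code> rule on the <code>User-Agent</code> header:</p>
<pre>mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0[678]"</pre>
<p>With such a rule, every response the module has been involved in would then name both headers. The exact formatting of the value as sent by the module is an assumption here:</p>
<pre>Vary: Accept-Encoding, User-Agent</pre>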

<div class="new13261a">

<p><a name="vary-1.3.26.1a"></a>As an improvement to this strategy, <a href="versions.htm#v1.3.26.1a"><tt>mod_gzip</tt> 1.3.26.1a</a> sends <em>no</em> <code>Vary:</code> header if the compression of the request in question has been declined because of a <a href="config.htm#filters"><code>mod_gzip_item_exclude</code> rule</a> of the</p>
<ul>
 <li><code>file</code>,</li>
 <li><code>uri</code> or</li>
 <li><code>handler</code></li>
</ul>
<p>type - as the evaluation of this rule cannot have been dependent on the received HTTP headers, and therefore in these cases actually no negotiation <small>(about dimensions that might contain different values for different HTTP requests)</small> has taken place at all.</p>
<p>If you want no <code>Vary:</code> headers to be sent for files that you are sure will never be served in compressed form because of <em>other</em> configuration rules, you would have to <em>turn off</em> <tt>mod_gzip</tt> for these.</p>

</div>

<p>An example for not sending <code>Vary:</code> headers for GIF images that might be cached by some proxy like Squid 2.4 might look like this:</p>
<pre>&lt;FilesMatch \.gif$&gt;
 mod_gzip_on No
&lt;/FilesMatch&gt;</pre>
<p>For versions to come the following tasks remain open:</p>
<ul>
 <li>Recognizing <em>in all possible cases</em> that the reaction to the current request can <em>never</em> cause compressed data to be served because some <code>mod_gzip_item_exclude</code> rule independent from the request's attributes is firing.</li>
 <li>Recognizing that some negotiation has taken place that <em>cannot</em> be described by a list of HTTP header names - in this case <code>Vary: *</code> ought to be sent <small>(and the documentation for <tt>mod_gzip</tt> should explicitly point out that these directives be used only if absolutely required as using them will have a negative effect on the work of caching proxies)</small>.</li>
 <li>Double-checking whether constellations are possible where only some subset of header names from all <code>reqheader</code> rules is required in a <code>Vary:</code> header - the fewer names there are, the fewer variants have to be stored in the proxy cache in parallel.</li>
</ul>

</div>

<h2><a name="vary-wildcard"></a>Negotiation about other dimensions than HTTP headers</h2>
<p>In very special cases, i.&nbsp;e. when using certain configuration directives, some negotiation is done by <tt>mod_gzip</tt> about dimensions that cannot even be expressed in terms of HTTP header names. This applies to the directives</p>
<ul>
 <li><a href="config.htm#requirements"><code>mod_gzip_min_http</code></a> <small>(minimum HTTP version required)</small> as well as</li>
 <li><a href="config.htm#requirements"><code>mod_gzip_handle_methods</code></a> <small>(HTTP methods to be handled)</small></li>
</ul>
<p>In both cases <tt>mod_gzip</tt> cannot explain to a proxy what has been done by naming HTTP headers. The appropriate reaction according to the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44"><img class="linkicon" height="15" width="16" alt="arrow" title="external" src="extern.gif" />HTTP/1.1 specification</a> is sending the <code>Vary: *</code> HTTP header.</p>

<div class="new13261a">

<p><a href="versions.htm#v1.3.26.1a"><tt>mod_gzip</tt> 1.3.26.1a</a> sends a <code>Vary: *</code> header if the <code>mod_gzip_min_http</code> directive has been used.</p>
<p>As for the <code>mod_gzip_handle_methods</code> directive, it is not yet entirely clear whether two HTTP requests for the same URI but using different HTTP methods actually ask for the same HTTP entity. The answer will decide whether a <code>Vary: *</code> header has to be sent when using this directive as well; this issue remains to be solved in forthcoming releases.</p>

</div>

<p>But as in this case a proxy server cannot understand the type of negotiation performed, it isn't entitled to store responses bearing this mark inside some cache.</p>
<p>Thus <strong>using one of these directives completely disables the proxy caching of each and every response being sent by this HTTP server</strong>, whether in compressed or in uncompressed form. Therefore we advise you not to use any of these directives any more.</p>
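<p>For illustration, these are the kinds of configuration lines meant here. The values are examples only, with <code>1000</code> assumed to denote HTTP/1.0 as in the module's sample configuration:</p>
<pre>mod_gzip_min_http       1000
mod_gzip_handle_methods GET POST</pre>
<p>According to the description above, with <tt>mod_gzip</tt> 1.3.26.1a the first of these lines would cause responses to carry the following header, which leaves a proxy unable to reuse them from its cache without asking the HTTP server:</p>
<pre>Vary: *</pre>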

<h2><a name="useragent"></a>The UserAgent as a special case</h2>
<p>Storing variants of different negotiation parameters in parallel in a proxy cache may be reasonable if only a few possible values may actually occur - as in the case of <code>Content-Encoding</code>. If there is a large number of possible values then storing variants in parallel is no longer feasible.</p>
<p>Exactly this applies to the <code>User-Agent</code> header identifying the HTTP client. Each sub-version of a browser sends a complex UserAgent string that contains not only the name and version of the browser but further information <small>(national language, operating system name and version etc.)</small>. There are hundreds of known UserAgent strings - and beyond this a number of mechanisms to manipulate the UserAgent string. Some browsers <small>(like Opera)</small> even allow the user to explicitly select the content of the UserAgent string in order to pose as a different browser <small>(because many technically incompetent web page creators build their site based on the name of a browser and unnecessarily exclude some browsers from it, or just because users don't want to reveal details about the computer equipment they use, for the sake of their privacy)</small>.</p>
<p>However reasonable it may be in some cases to serve compressed web pages depending on the identity of an HTTP client <small>(for example with respect to the numerous bugs of Netscape 4)</small>, the downside of using the UserAgent string as the basis of an HTTP negotiation is twofold: the content of this HTTP header varies too much to draw reliable conclusions from it, and it takes too many different values for any caching proxy to ever keep the results of requests for the same URL in parallel for all these negotiation variants.</p>

<div class="new13261a">

<p>From version 1.3.19.2a on, <tt>mod_gzip</tt> sends a <code>Vary:</code> header naming the HTTP header <code>User-Agent:</code> as a parameter of the negotiation if a corresponding directive has been used in the configuration. But the probability that a subsequent request contains an <em>exactly identical</em> <code>User-Agent:</code> value <small>(so that this client may receive the already stored content)</small> is very low.</p>
<p>Actually, the HTTP server <em>would</em> treat even large sets of UserAgents <small>(that are assumed to be functionally equivalent due to its configuration)</small> identically during negotiation - but the <code>Vary:</code> header doesn't allow the HTTP server to tell the caching proxy which parts of the UserAgent strings were evaluated by the HTTP server as significant content during negotiation. The proxy server can only get to know that the UserAgent <em>has</em> played <em>some</em> role - and being aware of this, the proxy <em>must</em> treat individual UserAgents as being <em>different</em> even if the HTTP server would not act like this.</p>
<p>So <strong>using filter rules evaluating the <code>User-Agent</code> HTTP header will effectively disable any caching for responses created this way</strong>. The user of <tt>mod_gzip</tt> should be absolutely aware of this effect - and therefore use other filter methods <small>(having a smaller number of possible different values)</small> if at all possible, to provide the same type of differentiation between these HTTP clients.</p>
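<p>As a made-up illustration <small>(both strings are hypothetical examples)</small>, the two request headers below might well be handled identically by the server's filter rules, yet a proxy that has received a <code>Vary:</code> header naming <code>User-Agent</code> must treat them as two distinct variants:</p>
<pre>User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.1) Gecko/20020823
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.0.1) Gecko/20020823</pre>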

</div>

<div id="icon">
 <a href="http://validator.w3.org/check/referer"><img alt="" title="valid XHTML 1.1" height="31" width="88" src="valid-xhtml11.png" /></a><a href="http://jigsaw.w3.org/css-validator/check/referer"><img alt="" title="valid CSS" height="31" width="88" src="valid-css.png" /></a>
</div>


<p id="mail">(<a href="mailto:michael.schroepl&#x40;gmx.de?subject=mod_gzip">Michael Schr&ouml;pl</a>, 2002-09-30)</p>

</div>

</body>
</html>