Normally, Apache logs every request in its access log. This can distort your page view statistics if you use a tool such as Webalizer or AWStats that builds its statistics from Apache’s access log, for example if you get lots of visits from search engine spiders or from a certain IP address (e.g. your own), or if each of your pages includes another page from your web site (e.g. in an iframe), which would double your page views. This short guide shows how to use Apache’s SetEnvIf directive to prevent Apache from logging such requests.
This document comes without warranty of any kind! I do not issue any guarantee that this will work for you!
1 Using SetEnvIf
The SetEnvIf directive can be used in the following contexts in your Apache configuration: in the global Apache configuration (if the directive should be valid for the whole server), in vhost configurations (if the directive should be valid only for that specific vhost), between <Directory …></Directory> (if the directive should be valid only for a certain directory and its subdirectories), and in .htaccess files (AllowOverride FileInfo must be set).
With SetEnvIf, you can prevent requests from getting logged based on the following criteria (among others – see http://httpd.apache.org/docs/2.0/mod/mod_setenvif.html for more details):
- Remote_Host: the hostname (if available) of the client making the request.
- Remote_Addr: the IP address of the client making the request.
- Server_Addr: the IP address of the server on which the request was received (only with versions later than 2.0.43).
- Request_Method: the name of the method being used (GET, POST, etc.).
- Request_Protocol: the name and version of the protocol with which the request was made (e.g., “HTTP/0.9”, “HTTP/1.1”, etc.).
- Request_URI: the resource requested on the HTTP request line – generally the portion of the URL following the scheme and host portion without the query string.
The SetEnvIf directive has the following form:
SetEnvIf attribute regex env-variable
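where attribute is one of the criteria I’ve just mentioned (or the name of an HTTP request header field such as User-Agent or Referer), regex is a Perl compatible regular expression, and env-variable is the name of the environment variable that is set for the request if the regex matches the attribute.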
Now let’s assume that Monit is requesting the file /monit/token once a minute to check if Apache is still running. Obviously we don’t want to log these requests because they are not from a real user. Therefore we use the following SetEnvIf directive:
SetEnvIf Request_URI "^/monit/token$" dontlog
^ means that the Request_URI must begin with /monit/token, $ means that it must also end with /monit/token (so only /monit/token matches this regular expression). If we used “^/monit/token”, any URL beginning with /monit/token would match the regular expression, e.g. /monit/token/example.html; “/monit/token$” would match any URL ending in /monit/token, e.g. /example/monit/token.
Now we have an iframe in /iframe/iframe.html that we don’t want to log either. This is what we’d use:
SetEnvIf Request_URI "^/iframe/iframe\.html$" dontlog
Now we must tell Apache not to log any request labelled with dontlog. Find the CustomLog directive in your Apache configuration, e.g.
CustomLog /var/log/apache2/access.log combined
or
CustomLog "|/usr/bin/cronolog --symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined
and add env=!dontlog to the end of the line:
CustomLog /var/log/apache2/access.log combined env=!dontlog
or
CustomLog "|/usr/bin/cronolog --symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined env=!dontlog
Restart Apache afterwards. From now on, it will no longer log any request that is labelled with dontlog.
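Put together, the pieces from this section might look like the following in a single vhost. This is only a sketch: the ServerName and DocumentRoot values are placeholders made up for illustration, and it assumes that the combined log format is already defined in your global configuration.

```apache
<VirtualHost *:80>
    # Placeholder values (assumptions), adjust them to your site
    ServerName www.example.com
    DocumentRoot /var/www/example

    # Label the requests that should not show up in the access log
    SetEnvIf Request_URI "^/monit/token$" dontlog
    SetEnvIf Request_URI "^/iframe/iframe\.html$" dontlog

    # Log every request except those labelled with dontlog
    CustomLog /var/log/apache2/access.log combined env=!dontlog
</VirtualHost>
```

Because the directives are placed inside the vhost, they apply only to requests handled by that vhost.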
Here are some further examples:
To prevent all requests made with a certain browser, e.g. Internet Explorer, from getting logged, you could use:
SetEnvIf User-Agent "MSIE" dontlog
To not log requests from any client whose hostname ends in bla.example.com, use:
SetEnvIf Remote_Host "bla\.example\.com$" dontlog
To not log requests from any client whose hostname begins with example, use:
SetEnvIf Remote_Host "^example" dontlog
To not log requests from a certain IP address, use something like:
SetEnvIf Remote_Addr "^192\.168\.0\.154$" dontlog
If you don’t want requests of your robots.txt to get logged, use:
SetEnvIf Request_URI "^/robots\.txt$" dontlog
Apart from SetEnvIf, which is case-sensitive, you can use SetEnvIfNoCase which is case-insensitive.
For example, in order not to log certain search engine spiders, you could use:
SetEnvIfNoCase User-Agent "Slurp/cat" dontlog
SetEnvIfNoCase User-Agent "Ask Jeeves/Teoma" dontlog
SetEnvIfNoCase User-Agent "Googlebot" dontlog
SetEnvIfNoCase Remote_Host "fastsearch\.net$" dontlog
Or to not log requests for files with certain extensions, use something like this:
SetEnvIfNoCase Request_URI "\.(gif|jpg|png|css|js|ico|eot)$" dontlog
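If a few of these files should still be logged, a label can be removed again: according to the mod_setenvif documentation, prefixing the variable name with ! unsets it. The following sketch assumes a hypothetical /downloads/ directory whose files should stay in the log. Since SetEnvIf directives are considered in the order in which they appear, the exception has to come after the general rule.

```apache
# General rule: do not log requests for common static files
SetEnvIfNoCase Request_URI "\.(gif|jpg|png|css|js|ico|eot)$" dontlog

# Exception (hypothetical path): remove the label again for /downloads/
SetEnvIf Request_URI "^/downloads/" !dontlog
```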
To not log certain referrals (e.g. from your own domain), use something like:
SetEnvIfNoCase Referer "www\.mydomain\.com" dontlog
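Before relying on such rules, it can help to see which requests they actually catch. One possible approach, sketched here with a made-up file name (skipped.log), is to add a second CustomLog directive that uses env=dontlog without the exclamation mark, so that it records only the labelled requests:

```apache
# Regular access log: everything that is NOT labelled with dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog

# Hypothetical second log: only the requests labelled with dontlog
CustomLog /var/log/apache2/skipped.log combined env=dontlog
```

Once the rules catch what you expect, the second line can be removed again.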
- Apache Module mod_setenvif: http://httpd.apache.org/docs/2.0/mod/mod_setenvif.html