This document is intended for system administrators who are interested
in limiting the logging data they retain.

As EFF noted in our Online Service Provider Best Data Practices paper,
it is essential for an online service provider (OSP) to formulate a
data retention policy appropriate to its needs.  In many cases,
operational requirements will not call for the retention of data beyond
a few days or weeks; in other cases, there is no need to log certain
information at all.  For example, many web publishers could get all the
statistical information they need without logging visitors' IP addresses.
Since having more information about users than necessary can be a
liability rather than an advantage, we recommend that publishers in this
situation configure their web servers not to log IP addresses at all.
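The following minimal Python sketch shows that typical page-view statistics do not depend on the visitor's address. It assumes Apache-style Common Log Format lines and drops the leading client-address field before counting requests per path. The function names are hypothetical. This illustrates the point only. The recommended fix is still to configure the server so that it never records the address.

```python
from collections import Counter

def strip_client_ip(line: str) -> str:
    """Drop the first whitespace-separated field (the client host/IP)
    of a Common Log Format line and return the remainder unchanged."""
    _host, _sep, rest = line.partition(" ")
    return rest

def hits_per_path(lines):
    """Count requests per URL path using only the IP-free remainder."""
    counts = Counter()
    for line in lines:
        rest = strip_client_ip(line)
        try:
            # The request is the first double-quoted field: "GET /path HTTP/1.0"
            request = rest.split('"')[1]
            _method, path, _proto = request.split(" ", 2)
        except (IndexError, ValueError):
            continue  # skip malformed lines
        counts[path] += 1
    return counts
```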

However, many system administrators don't know exactly what logs they
have until they have looked into the question.  Often, logging was
enabled by default (or by previous system administrators), so your
systems may be keeping logs you never intended.

A significant portion of popular server software defaults to a policy of
logging all events or transactions and retaining those logs indefinitely.
Few organizations have an operational requirement for this sort of
logging, and logging defaults are unlikely to coincide exactly with a
carefully considered logging policy for your organization.  Therefore,
to enforce your logging and data retention policy after you have
formulated it, you will probably need to use technical means to find
and delete logs, as well as to change software configurations or set
up log-rotation scripts.
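Finding and deleting logs that are older than a policy allows can be sketched in Python as follows. The sketch assumes logs are recognizable by a filename suffix and that modification time approximates the age of the newest data in the file. The function names and the `.log` default are illustrative only. Rotated copies such as `messages.1` would need additional suffixes.

```python
import os
import time

def expired_logs(directory, max_age_days, suffixes=(".log",), now=None):
    """Return paths under `directory` whose names end with one of `suffixes`
    and whose modification time is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    expired = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(root, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    expired.append(path)
            except OSError:
                continue  # file vanished or is unreadable
    return sorted(expired)

def delete_expired_logs(directory, max_age_days, dry_run=True):
    """Delete expired log files; with dry_run=True, only list them."""
    victims = expired_logs(directory, max_age_days)
    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims
```

Running with `dry_run=True` first lets you review the list before anything is removed.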

Some operating systems come with preinstalled log-rotation software.
However, the log-rotation software provided by an operating system
vendor is normally -- at best -- able to recognize and rotate logs
created by vendor-provided software.  If you have installed third-party
application software, or software you have written or compiled yourself,
it may keep logs completely outside the notice of log rotators.

Here are some of the questions system administrators can ask themselves
to ensure that their data retention policies are followed as
faithfully as possible.


Does my operating system have a log rotation utility such as logrotate?

Is the log rotation utility enabled and functioning?  Does it run
automatically at predetermined intervals?
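One quick check for the first two questions is whether a `logrotate` binary exists and whether anything appears to schedule it. On many Linux systems it is run from cron or a systemd timer. The paths listed below are assumptions that vary by distribution. Finding a hook does not prove that the scheduler is actually enabled or that the most recent run succeeded.

```python
import os
import shutil

# Assumed locations of common logrotate scheduling hooks; adjust for your
# distribution (some systems use other cron paths or timers).
SCHEDULE_HINTS = (
    "/etc/cron.daily/logrotate",
    "/lib/systemd/system/logrotate.timer",
    "/usr/lib/systemd/system/logrotate.timer",
)

def logrotate_status(which=shutil.which, exists=os.path.exists):
    """Report whether logrotate is on PATH and which known scheduling hooks
    exist. `which` and `exists` are injectable for testing."""
    binary = which("logrotate")
    hooks = [p for p in SCHEDULE_HINTS if exists(p)]
    return {"binary": binary, "scheduled_by": hooks}
```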

Does the configuration of the log rotation software match my logging and
data retention policy?
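To compare a logrotate-style configuration with a retention limit, you can estimate each log's maximum data age. Take the rotation period and multiply it by (number of kept copies + 1), where the extra period accounts for the live file. The Python sketch below parses only the `daily`/`weekly`/`monthly`/`yearly` and `rotate N` directives, and assumes a default of zero kept copies when `rotate` is absent. It ignores `include`, size-based rotation, and quoted paths, so treat its output as a rough upper bound.

```python
# Approximate days per logrotate frequency keyword.
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 31, "yearly": 366}

def retention_days(config_text):
    """Map each path named in a stanza header to an estimated maximum data
    age in days, (rotate + 1) * period, or None if no frequency is known."""
    results = {}
    global_opts = {}
    stanza = None  # options of the stanza currently being parsed
    paths = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):               # start of a per-log stanza
            paths = line[:-1].split()
            stanza = dict(global_opts)       # inherit global settings
            continue
        if line == "}" and stanza is not None:
            period = stanza.get("period")
            rotate = stanza.get("rotate", 0)  # assumed default: no old copies
            for p in paths:
                results[p] = None if period is None else (rotate + 1) * period
            stanza = None
            continue
        opts = global_opts if stanza is None else stanza
        words = line.split()
        if words[0] in PERIOD_DAYS:
            opts["period"] = PERIOD_DAYS[words[0]]
        elif words[0] == "rotate" and len(words) > 1 and words[1].isdigit():
            opts["rotate"] = int(words[1])
    return results

def over_policy(config_text, max_days):
    """Paths whose estimated retention exceeds max_days or is unknown."""
    return sorted(p for p, days in retention_days(config_text).items()
                  if days is None or days > max_days)
```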

Do I have any third-party application software or user-developed software
that keeps logs?  If so, is the log rotation software aware of them?
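Given a list of known log paths, for example from a search like logfinder's, you can check which ones no rotation stanza covers. The Python sketch below matches each path against the patterns in logrotate-style stanza headers. It uses `fnmatch.fnmatchcase`, whose `*` also matches `/`, so it is somewhat looser than shell globbing. The function names are hypothetical.

```python
import fnmatch

def rotated_patterns(config_text):
    """Collect the path patterns named in logrotate-style stanza headers."""
    patterns = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.endswith("{"):
            patterns.extend(line[:-1].split())
    return patterns

def uncovered_logs(log_paths, config_text):
    """Return the log paths that no stanza pattern matches."""
    patterns = rotated_patterns(config_text)
    return [p for p in log_paths
            if not any(fnmatch.fnmatchcase(p, pat) for pat in patterns)]
```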

Are there any logs that might exist in an unexpected place, such as a
user's home directory?  (For example, Unix sites that use procmail for
e-mail delivery often have ~/.procmail/log files on a per-user basis,
in parallel to and often redundant with systemwide e-mail log files.
Similarly, a site with multiple virtually-hosted web sites may have
separate site-by-site web transaction logging -- or logs from user-created
CGI scripts -- within individual user home directories.  These logs can
be difficult to observe with a utility such as lsof, because they are
usually not held open by the software that creates them, and may be
updated relatively infrequently.  Therefore, merely looking for open
files or recently updated files may not unearth these sorts of logs.)
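Logs like these are not held open and may rarely change, so a name-based walk of home directories can complement content- and activity-based searches. The Python sketch below flags every file whose name looks like a log, such as `log`, `access_log`, `error.log`, or `cgi.log.1`, regardless of its age or whether any process has it open. The filename pattern is an assumption. It will miss logs with unusual names and may flag some non-logs.

```python
import os
import re

# Assumed name heuristic: "log"/"logs" as a separate name component,
# e.g. "log", "access_log", "error.log", "cgi.log.1", but not "catalog".
LOG_NAME = re.compile(r"(^|[._-])logs?($|[._-]|\d)", re.IGNORECASE)

def find_named_logs(roots):
    """Walk each root (e.g. "/home") and return files whose *name* looks
    like a log, ignoring size, modification time, and open state."""
    hits = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if LOG_NAME.search(name):
                    hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```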

Do I have application software that logs into a relational database
table, such as an Oracle or MySQL database?  (For extremely large logs,
or logs that are intended to be routinely machine-readable, logging
into a database is more likely than logging into a text file.)  If so,
are the records in the table allowed to persist forever, or are they
periodically purged?
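Periodic purging of a log table usually means a scheduled `DELETE` with a timestamp cutoff. The Python sketch below uses the standard-library `sqlite3` module as a stand-in for Oracle or MySQL. It assumes a hypothetical table with a Unix-timestamp column. Table and column names are interpolated into the SQL, so pass only trusted identifiers.

```python
import sqlite3
import time

def purge_old_rows(conn, table, ts_column, max_age_days, now=None):
    """Delete rows whose Unix-timestamp column is older than max_age_days
    and return the number of rows removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    cur = conn.execute(
        f'DELETE FROM "{table}" WHERE "{ts_column}" < ?', (cutoff,)
    )
    conn.commit()
    return cur.rowcount
```

In practice you would run an equivalent statement from cron or from the database's own job scheduler.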

Do I have applications that are configured to log over a network to a
remote machine, using a facility such as syslog's loghost feature?
(This is especially common in clusters and in centrally-administered
networks.)  If so, what is that machine doing with the log data it
receives over the network?
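To find out where logs are being sent, you can look for forwarding rules in the syslog configuration. In classic `syslog.conf` syntax, an action of `@host` forwards over UDP. rsyslog also accepts `@@host` for TCP. The Python sketch below recognizes only those two forms, with an optional `:port`. It does not understand rsyslog's newer `action(type="omfwd" ...)` syntax, so treat an empty result as inconclusive.

```python
import re

# selector, whitespace, then "@" (UDP) or "@@" (rsyslog TCP) and a target.
FORWARD = re.compile(r"^(\S+)\s+(@@?)(\S+)")

def remote_targets(config_text):
    """Return (selector, protocol, target) for each forwarding rule."""
    targets = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = FORWARD.match(line)
        if m:
            selector, at, dest = m.groups()
            targets.append((selector, "tcp" if at == "@@" else "udp", dest))
    return targets
```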

Do I have logs in binary formats (such as Unix wtmp/utmp or the Windows
registry) that might be difficult to recognize as logs on sight?
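Binary logs such as wtmp are normally read with dedicated tools (`last`, `who`), not by eye. A text search will pass over them, so it helps to flag binary files for separate review. The Python sketch below applies a common heuristic: a file counts as binary if its first few kilobytes contain a NUL byte or a large share of non-printable bytes. The 4096-byte sample and 30% threshold are arbitrary assumptions. Non-ASCII text such as UTF-8 may also be flagged.

```python
import string

PRINTABLE = set(bytes(string.printable, "ascii"))

def looks_binary(path, sample_size=4096, threshold=0.30):
    """Heuristic: binary if the first `sample_size` bytes contain NUL or
    more than `threshold` of them are outside ASCII printable/whitespace."""
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    if not chunk:
        return False
    if b"\x00" in chunk:
        return True
    odd = sum(1 for b in chunk if b not in PRINTABLE)
    return odd / len(chunk) > threshold
```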

If my data retention policy calls for secure deletion of log files, is
my log rotation software or other software that implements the policy
using an appropriate secure deletion utility?  (Files that are deleted
but not overwritten might be recoverable, in whole or in part, with
undeletion tools.  Some experts also recommend overwriting files
multiple times to reduce the chance that usable information remains on
magnetic media even after a single overwrite.)
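The multiple-overwrite approach can be sketched in Python as shown below. The file is overwritten in place with random bytes several times, each pass is flushed to disk, and the file is then unlinked. This is a best-effort illustration, not a guarantee. Journaling or copy-on-write filesystems, SSDs, backups, and snapshots can all keep old data elsewhere. An established tool such as GNU `shred` may be preferable.

```python
import os

def overwrite_and_delete(path, passes=3, chunk_size=1 << 16):
    """Overwrite a file's current contents with random bytes `passes` times,
    fsync after each pass, then remove it. Best effort only; see caveats."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push this pass to the device
    os.remove(path)
```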



We have created a program called logfinder as a sample means of locating
files that might be logs on an existing system.  logfinder uses regular
expressions to find local files with "log-like" contents; you can
customize those expressions if necessary to meet your needs.
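The following Python sketch illustrates the general idea of regular-expression log detection. The patterns here are hypothetical examples and are NOT logfinder's actual expressions. They look for syslog, Common Log Format, and ISO-8601-style timestamps and for dotted-quad IPv4 addresses. A file's score is the fraction of its non-empty lines that match any pattern.

```python
import re

# Hypothetical "log-like" patterns, not logfinder's own.
LOG_PATTERNS = [
    # syslog timestamp, e.g. "Oct 10 13:55:36"
    re.compile(r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
               r" +\d{1,2} \d{2}:\d{2}:\d{2}\b"),
    # Common Log Format timestamp, e.g. "[10/Oct/2000:13:55:36 -0700]"
    re.compile(r"\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]"),
    # ISO-8601-style timestamp, e.g. "2009-10-10 13:55:36"
    re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"),
    # dotted-quad IPv4 address
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
]

def log_score(lines):
    """Fraction of non-empty lines matching at least one pattern."""
    nonempty = [line for line in lines if line.strip()]
    if not nonempty:
        return 0.0
    hits = sum(1 for line in nonempty
               if any(p.search(line) for p in LOG_PATTERNS))
    return hits / len(nonempty)
```

Customizing detection then means editing `LOG_PATTERNS`, for example to add a timestamp format that one of your applications uses.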

logfinder requires Python 2 or greater and finds logs in text files on
a POSIX-like system.  (It might also find some log-like data in binary
files if the binary files represent that data in textual form.)

logfinder can, if the lsof program is installed and when run with
appropriate privileges, detect open files systemwide that grow larger
over time.  It can also search for text that may indicate logging
activity within a given directory hierarchy, or systemwide.  As we
suggest above, a program like logfinder can find some, but not all,
kinds of logging activity.  For example, logfinder will generally
not identify logs in binary (non-text) formats or logs kept inside
databases.  Therefore, using a program like logfinder is usually a
supplement to, not a replacement for, answering questions like those
given above.

logfinder should be run as root.  If logfinder is invoked without any
arguments, it will examine open files systemwide to see whether they
grow larger, and then indicate whether files that appear to be growing
contain log-like text.  (This requires lsof to be installed, and lsof's
ability to report open files accurately may depend on your operating
system.  So far, we've had success with Linux and Mac OS X, and some
difficulty with FreeBSD and OpenBSD.)
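The growth check can be approximated in Python by listing open files, recording their sizes, waiting, and comparing. The sketch below is an approximation and not logfinder's code. It assumes lsof's field output mode (`-Fn`), in which file names appear on lines beginning with `n`. A growing file is only a candidate, since databases and caches grow too, so its contents still need checking.

```python
import os
import subprocess

def open_regular_files():
    """List regular files currently open, parsed from `lsof -Fn` output.
    Requires lsof; run as root for a systemwide view."""
    out = subprocess.run(["lsof", "-Fn"], capture_output=True, text=True).stdout
    paths = {line[1:] for line in out.splitlines() if line.startswith("n/")}
    return sorted(p for p in paths if os.path.isfile(p))

def snapshot(paths):
    """Map each still-existing path to its current size in bytes."""
    sizes = {}
    for p in paths:
        try:
            sizes[p] = os.path.getsize(p)
        except OSError:
            pass  # file disappeared or is inaccessible
    return sizes

def grew(before, after):
    """Paths present in both snapshots whose size increased."""
    return sorted(p for p, size in after.items()
                  if p in before and size > before[p])
```

A typical run takes `before = snapshot(open_regular_files())`, sleeps for a minute or so, and then calls `grew(before, snapshot(before))`.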

If logfinder is given one or more directory names as arguments, it will
search for log-like text in files in those directories.
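A directory search of this kind can be sketched in Python as shown below. It reads up to the first 200 non-empty lines of each regular file and reports the file if at least half of those lines contain a timestamp. The single timestamp pattern, the line limit, and the threshold are assumptions and are not logfinder's settings. FIFOs, devices, and symbolic links are skipped so that the walk does not block or count a file twice.

```python
import os
import re

# Assumed heuristic: syslog-style ("Oct 10 13:55:36") or ISO-8601-style
# ("2009-10-10 13:55:36") timestamps mark a line as log-like.
TIMESTAMP = re.compile(
    r"\b(?:[A-Z][a-z]{2} +\d{1,2} |\d{4}-\d{2}-\d{2}[ T])\d{2}:\d{2}:\d{2}"
)

def find_loglike_files(top, max_lines=200, threshold=0.5):
    """Return regular files under `top` where at least `threshold` of the
    first `max_lines` non-empty lines contain a timestamp."""
    hits = []
    for dirpath, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, FIFOs, devices, sockets
            lines = []
            try:
                with open(path, "r", errors="replace") as f:
                    for line in f:
                        if line.strip():
                            lines.append(line)
                            if len(lines) >= max_lines:
                                break
            except OSError:
                continue
            matched = sum(1 for line in lines if TIMESTAMP.search(line))
            if lines and matched / len(lines) >= threshold:
                hits.append(path)
    return sorted(hits)
```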


Online service providers seeking additional information about their
legal rights and obligations, and about formulating a data-retention
policy, should consult EFF's OSP site at

http://www.eff.org/osp/

As a general resource on logging and data retention, we highly recommend
the Log Analysis web site at

http://www.loganalysis.org/

Among the useful resources collected there are pages on logfile rotation
(including scheduled log deletion and trimming)

http://www.loganalysis.org/sections/rotation-tools/index.html

and a set of general log analysis tools

http://www.loganalysis.org/sections/parsing/generic-log-parsers/index.html

While many of these tools are useful principally for retaining or analyzing
data, rather than for discarding it, understanding logs and knowing what
you have and what can be done with it can help any system administrator
in formulating and implementing logging policies.

EFF thanks Ben Laurie for helping us think about log recognition and
writing a prototype log-searching program.

We welcome your comments or enhancements; you can send them to
Seth Schoen <schoen@eff.org>.