SPECIAL THANKS TO John Seifarth FOR HIS SUPPORT.


What is dspam-scripts ?
-----------------------

This package holds the 'dspam-learn' bash script, which makes dspam
learning much easier by looking at mailbox directories rather than relying
on a forwarding method. This is much like what you could set up with
SpamAssassin and sa-learn.

I had terrible experiences installing dspam, because I wanted to set it up
in a way it does not seem to be designed for: I wanted to use it only
through procmail recipes. This appears to me much more logical and less
messy than the default installation.

Actually, dspam can mark mail through procmail recipes pretty much as
SpamAssassin does: by adding a special line to the header of the mail that
holds its conclusion (is it spam or not?).

But the learning part of dspam involves (in the official dspam
documentation) forwarding mails to other mailboxes. This creates lots of
mailboxes, which I didn't want, as it felt quite messy.

What I had understood of spam filtering and learning was really clear with
SpamAssassin, but dspam managed to break all the simple concepts and
introduce new, complicated ones (such as the quarantine box, or the
spam/false-positive distinction, which is trickier than the spam/non-spam
distinction) where there seems to be no need for them.


Do I need dspam-scripts ?
-------------------------

Only in very special cases:

Required:

  - You want to use dspam for spam detection.
  - You are using procmail as MDA.
  - You have mail "directories" in which mail is supposed to be classified.
  - You have (or are willing to create) mail "dirs" such as:
      - spam boxes: whose purpose is to contain SPAM
      - ham boxes: which contain non-SPAM mail
  - You want to trigger dspam learning depending on where mails are
    classified in your mail directory structure, letting the user teach
    dspam by moving mail between directories.
    NOTE: THIS MEANS THE USER MUST HAVE IMAP ACCESS TO HIS MAIL ACCOUNT.
  - It is known to work with the Courier IMAP / procmail / SpamAssassin /
    dspam combination.

Not required but possible:

  - You prefer to invoke dspam from procmail recipes.
  - You have SpamAssassin, or another automatic spam detector, that moves
    mail around in your IMAP dirs, and you want dspam to learn from it.
  - You would like the learning phase not to alter (delete or move) mail...
  - Or on the contrary, you would like mail to be deleted after the
    learning phase.


What does dspam-scripts do ?
----------------------------

dspam-scripts holds the dspam-learn script, and that's all for now. :)

The dspam-learn script wraps the dspam binary to automagically learn what
is spam and what isn't from where you actually filed your mail in your
mail dirs.

It ensures that you won't feed dspam the same SPAM twice by storing the MD5
of each spam already fed to dspam.

It calls dspam with the correct arguments depending on whether the mail was
previously marked correctly by dspam or not.

This means that if dspam does not catch all your SPAM, you only have to
move the missed spam into a "SPAM" directory in your IMAP structure...

Conversely, if dspam wrongly marked an email as spam, you have to move the
mail into the proper directory (one that contains no spam) to feed it as a
"false-positive".

You could also "copy" the mail you want to feed to dspam into 2 IMAP dirs:
one for SPAM and one for false-positives. But this seems more confusing to
me.


How does it work ?
------------------

It simply parses all the mail in the directories specified in the
configuration file.

When it finds a mail, it checks that it wasn't already fed to dspam by
looking for its MD5 in its list.

It also checks that dspam hasn't already marked the mail as spam (if it
must be taught as spam) or as "Innocent" (if it must be taught as
innocent).

If there is no mark, it sends the mail as a "corpus" mail. If it is marked,
it sends it as a classification error with the "--spam" or
"--false-positive" argument.

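The checks described above can be sketched in shell roughly as follows. This
is only an illustration under assumptions, not the actual dspam-learn code:
the function names (dspam_mark, learn_action, mail_md5, already_fed,
remember_fed) are hypothetical, the mark is assumed to be the value of the
X-DSPAM-Result header, a mail that dspam already marked correctly is assumed
to need no teaching, and md5sum (GNU coreutils) is assumed to be available.
The sketch only prints the chosen action; it never calls dspam.

```shell
# Hypothetical sketch of the decision logic; NOT the real dspam-learn code.

# Print the X-DSPAM-Result value found in the header of a mail file,
# or nothing when the header is absent. Scanning stops at the first
# blank line, so a matching line in the body is ignored.
dspam_mark() {
    sed -n '/^$/q; s/^X-DSPAM-Result: *//p' "$1" | head -n 1
}

# Decide what to do with a mail, given the kind of box it sits in
# ("spam" or "ham") and the mark returned by dspam_mark.
learn_action() {
    case "$1:$2" in
        spam:Spam|ham:Innocent) echo "skip" ;;    # assumption: dspam was right
        spam:""|ham:"")         echo "corpus" ;;  # never seen by dspam
        spam:Innocent)          echo "--spam" ;;  # missed spam
        ham:Spam)               echo "--false-positive" ;;
        *)                      echo "unknown" ;; # any other box or mark
    esac
}

# Print the MD5 of a mail file.
mail_md5() {
    md5sum < "$1" | cut -d' ' -f1
}

# Succeed if the mail's MD5 is already recorded in the list file.
already_fed() {
    [ -f "$2" ] && grep -qxF "$(mail_md5 "$1")" "$2"
}

# Record the mail's MD5 in the list file (to be called after teaching).
remember_fed() {
    mail_md5 "$1" >> "$2"
}
```

For example, learn_action ham "$(dspam_mark some-mail)" prints
--false-positive for a mail that sits in a ham box but was marked as Spam.
Recording the MD5 only after a successful teaching run keeps a failed run
from hiding the mail from the next one.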
You can safely launch dspam-learn several times. The MD5 list ensures that
you won't teach the same mail twice.


How do I use it ?
-----------------

You must install it correctly (this involves setting up a correct config
file); see the installation section that follows this one.

Then you have to launch it:

    # dspam-learn

That's all. It uses the configuration file to fetch its information, which
helps if you want to run it as a cron job.

Calling dspam-learn feeds dspam with all messages that weren't already fed,
and thus improves dspam's experience with the mail found where the
configuration file tells it to look for ham (non-spam) and spam.

You will notice that it makes heavy use of pretty ASCII colors, which are
actually not pretty at all in mail output (such as cron could send you).
You can deactivate ASCII colors by setting the environment variable
'ascii_color' to "no", for example:

    # export ascii_color=no
    # dspam-learn

Or shorter:

    # ascii_color=no dspam-learn

That fits neatly in your cron job.


How do I install it ?
---------------------

This is a GNU-style package, so a simple:

    # ./configure && make && make install

should do the trick. It installs a single dspam-learn script.

Next, take a look at src/sample/dspam-learn.rc in the source package; it is
a well-commented template for creating a correct configuration file.

The configuration file is expected to be found at
"~/.dspam/dspam-learn.rc".

Note: this can be tweaked depending on your configuration. You might even
be able to use a single general "dspam-learn.rc" somewhere else. Just look
at the corresponding section.

So you could run:

    $ cp src/sample/dspam-learn.rc ~/.dspam/dspam-learn.rc

Note: this command assumes that your current working directory is the
package source directory, and that you are logged in as the destination
user that will use dspam.
Then edit ~/.dspam/dspam-learn.rc.

When finished, you can run dspam-learn for the first time:

    $ dspam-learn

If you have a lot of mail to teach to dspam, this could take time.


What procmail rules should I use ?
----------------------------------

With this config, you should make all your mail pass through dspam without
interfering with delivery, so dspam should be called at the top of your
procmailrc:

    :0fw: dspam.lock
    * < 256000
    | dspam --user username --stdout

You should carefully read about the configuration of dspam and about the
--deliver-spam and --deliver-fp options at runtime. They might be of use,
as you want both innocent mail AND spam to be delivered.

When you feel that dspam has gained good experience (by checking in the
headers that it marks Innocent and Spam correctly, or by launching
dspam-learn and looking at its output), you can add a rule to delete spam
or, as in this example, to move spam detected by dspam to a special dir:

    :0 H:
    * $ ^(X-DSPAM-Result: Spam)
    ${MAILDIR}/.SPAM.dspam/

This is for the maildir format (mails are separate files in a folder).

Or:

    :0 H:
    * $ ^(X-DSPAM-Result: Spam)
    ${MAILDIR}/spam

This is for the mbox format (mails are concatenated in the same file).

Note: the MAILDIR variable must have been defined beforehand to use these
rules.


Can I use SpamAssassin AND Dspam ?
----------------------------------

Yes, of course. In fact, this seems a good way to teach dspam in the
beginning, and SpamAssassin uses a totally different way of detecting spam
(with the exception of its Bayesian system).

I use SpamAssassin to automatically move mail rated above 8.0 points to my
spam dir. When my cron job launches dspam-learn, these mails are checked
and learned if dspam didn't spot them as spam.

I think this is a great combination. I actually have less than one spam a
day getting through the two filters, out of 100-200 spam a day. Spam that
gets through is very special: usually viruses (labeled "your file") or
empty mails.

I HAVEN'T LOOKED INTO WHETHER SPAMASSASSIN AND DSPAM COULD MISLEAD EACH
OTHER WITH THEIR MARKS. WHAT IS CERTAIN IS THAT THE TWO WORK REALLY WELL
TOGETHER ON MY CURRENT SYSTEM. THIS COULD POSSIBLY BE EXPLAINED BY THE
"LEARNING" ALGORITHMS OF DSPAM, AND THE COMBINATION COULD EVEN PRODUCE
BETTER RESULTS BY JOINING THE QUALITIES OF BOTH FILTERS.


How must I set up dspam / procmail for a proper installation ?
--------------------------------------------------------------

Go to:

    http://splodge.fluff.org/docs/dspam-for-sa-users

It covers dspam/procmail integration for a non-IMAP setup, but a great part
of the info found there also applies to an IMAP config.


The ascii-colors display annoys me ! Can I remove it ?
------------------------------------------------------

Yep, this is new in version 0.0.2. You can just set the shell variable
'ascii_color' to 'no'. So this would be a correct call:

    ascii_color=no dspam-learn

This is highly recommended if the output is to be mailed, as it can be when
called by a cron job, or if you want a clean log when redirecting the
output to a logfile.


I have a question, can I mail you ?
-----------------------------------

Of course, I'll try to reply quickly. Here's my email: <vaab@free.fr>


I found a bug, or want to modify the script... what should I do ?
-----------------------------------------------------------------

Contact me at <vaab@free.fr>.


I have installed vlfs-shlib, is dspam-learn using them ?
--------------------------------------------------------

Yes. The libraries are included statically by default, but if you have
installed vlfs-shlib, you could run:

    # shlib d dspam-learn

This will greatly reduce the size of the script and improve its
readability.


What is this vlfs-shlib all about ?
-----------------------------------

These are shell libraries I use quite often. Look for the package
"vlfs-shlib"; there's some info there.

YOU DO NOT NEED vlfs-shlib TO USE/INSTALL dspam-scripts.
The libraries used by dspam-learn are included by default in the shell
script.

How do I modify the default location of the config file ?
---------------------------------------------------------

You can easily change the default location of the configuration file at
run-time by giving its path in the environment variable DSPAMLEARN_RC.
For example:

    # export DSPAMLEARN_RC="/etc/mail/dspam-learn.rc"
    # dspam-learn

or shorter:

    # DSPAMLEARN_RC="/etc/mail/dspam-learn.rc" dspam-learn

This makes it possible to use one general config file.

You can also modify the defaults in the bash script. (I'll think about a
configure-time option for the next releases.)

Finally, you can specify several locations separated by spaces in
DSPAMLEARN_RC; the first file found takes precedence. So you could run:

    # export DSPAMLEARN_RC="~/.dspam/dspam-learn.rc /etc/mail/dspam-learn.rc"
    # dspam-learn

or shorter:

    # DSPAMLEARN_RC="~/.dspam/dspam-learn.rc /etc/mail/dspam-learn.rc" dspam-learn

Note: all the configuration files are read if present, in the order they
are listed in DSPAMLEARN_RC. When an option is defined several times, only
the first definition takes effect. (This rule does not apply to the
'hambox' and 'spambox' options, for which all the values found are
concatenated.)

Hint: you could set DSPAMLEARN_RC to a global file and a local file. The
global file holds the defaults, and the local file leaves the user free to
redefine some variables locally. In this case, you have to list the local
file first (e.g. ~/.dspam-learn.rc) and the server-wide config file last
(e.g. /etc/mail/dspam-learn.rc). This would do:

    # DSPAMLEARN_RC="~/.dspam-learn.rc /etc/mail/dspam-learn.rc" dspam-learn
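
The lookup and precedence rules above can be sketched in shell as follows.
This is a hedged illustration, not the code dspam-learn actually uses: the
function names rc_files and merge_options are hypothetical, and the
"name=value" line format is an assumption made for the example (see
src/sample/dspam-learn.rc for the real syntax). Only 'hambox' and 'spambox'
are option names taken from this document.

```shell
# Hypothetical sketch; the real dspam-learn may resolve its files differently.

# Print, one per line, the config files that exist among the
# space-separated entries of the first argument, in the order listed.
# A leading "~/" is expanded by hand, because the shell does not expand
# a tilde that was written inside double quotes.
rc_files() {
    for entry in $1; do                  # unquoted on purpose: split on spaces
        case "$entry" in
            "~/"*) entry="$HOME/${entry#??}" ;;   # drop the 2 chars "~/"
        esac
        if [ -f "$entry" ]; then
            echo "$entry"
        fi
    done
}

# Merge ASSUMED "name=value" lines from the files given as arguments.
# For ordinary options the first definition wins; the values of hambox
# and spambox are concatenated, separated by spaces.
merge_options() {
    [ $# -gt 0 ] || return 0             # no file: nothing to merge
    awk -F= '
        /^[ \t]*(#|$)/ { next }          # skip comments and blank lines
        {
            name = $1
            value = substr($0, length($1) + 2)
            if (!(name in seen)) {
                names[++n] = name        # remember first-seen order
                seen[name] = value
            } else if (name == "hambox" || name == "spambox") {
                seen[name] = seen[name] " " value
            }
        }
        END { for (i = 1; i <= n; i++) print names[i] "=" seen[names[i]] }
    ' "$@"
}
```

With a local file listed before a server-wide one, merge_options $(rc_files
"$DSPAMLEARN_RC") would print the local value of any option defined in
both, while spam and ham boxes from both files are kept. The unquoted
command substitution assumes paths without spaces, which matches the
space-separated format of DSPAMLEARN_RC.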