

Warning: these guides are mostly obsolete; please have a look at the new documentation.


Configuring Bayesian Mail Filter
 
http://pbraun.nethence.com/unix/mail/procmail.html
http://pbraun.nethence.com/unix/mail/procmail-bmf.html
http://pbraun.nethence.com/unix/mail/procmail-qsf.html
 
Installation
Make sure Berkeley DB and its development files are available. On Red Hat systems,
rpm -q db4 db4-devel
 
Fetch, compile and install,
wget -O bmf-0.9.4.tar.gz http://sourceforge.net/projects/bmf/files/latest/download
tar xzf bmf-0.9.4.tar.gz
cd bmf-0.9.4/
./configure --help
./configure --without-mysql
make all
make install
 
Procmail configuration
Assuming you are using procmail (I am; adapt to your own setup if needed),
cd ~/
vi .procmailrc
At the top, before all other filter rules, add,
#
# Bayesian Mail Filter
#
:0 fw
| bmf -p
 
:0:
* ^X-Spam-Status: Yes
spam
Note. bmf removes any existing spam status headers and adds its own.
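To see what the second recipe will act on, one can inspect the header of a message that has already been through bmf. The following is only a minimal sketch, assuming a single message saved in a file; the function name spam_status is hypothetical and is not part of bmf or procmail.

```shell
# Hypothetical helper: print the value of the X-Spam-Status header
# found in the header section of a single message file.
# Scanning stops at the first blank line, so the body is never matched.
spam_status() {
    awk '
        /^$/ { exit }                      # blank line: end of headers
        /^X-Spam-Status:/ {
            sub(/^X-Spam-Status:[ \t]*/, "")
            print
            exit
        }
    ' "$1"
}
```

For example, after saving the output of bmf -p to a file named tagged, running spam_status tagged prints whatever follows the header name; the recipe above files the message into spam only when that value starts with Yes. Unlike procmail, this sketch matches the header name case-sensitively.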
 
Crontab for learning
Use your IMAP client to sort messages into two mboxes, e.g. _bmf.learn and _bmf.unlearn, which teach BMF what is spam and what is not, respectively. Move spam that got through to the former; when a few false positives show up in the .spam folder at the beginning, move them to the latter.
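Before training, it can help to know how many messages are waiting in each mbox. Here is a rough sketch, assuming well-formed mbox files in which every message starts with a "From " separator line; mbox_count is a hypothetical name, not a bmf command.

```shell
# Hypothetical helper: count the messages in an mbox file by counting
# its "From " separator lines. Prints 0 for a missing or empty file.
mbox_count() {
    if [ -s "$1" ]; then
        grep -c '^From ' "$1"
    else
        echo 0
    fi
}
```

Typical use would be mbox_count _bmf.learn and mbox_count _bmf.unlearn from inside the mail directory. The count is off if a message body contains an unescaped line starting with "From ", which mbox writers normally escape as ">From ".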
 
You can then configure this script,
cd ~/
mkdir -p bin/
cd bin/
vi cron.bmf
like,
#!/bin/ksh
# proceeding as much as possible, no set -e
MAILDIR=/var/spool/virtual/example.net/user.imap 
 
learn() {
print learning what is spam...\\c
bmf -s < _bmf.learn && print \ done
 
print reprocessing the _bmf.learn mbox...\\c
reprocess-mbox-via-procmail _bmf.learn && print \ done
}
 
unlearn() {
print unlearning false positives...\\c
bmf -n < _bmf.unlearn && print \ done
#bmf -N < _bmf.unlearn
 
print reprocessing the _bmf.unlearn mbox...\\c
reprocess-mbox-via-procmail _bmf.unlearn && print \ done
}
 
cd "$MAILDIR" || exit 1
# run each step only when its mbox is non-empty
if test -s _bmf.learn; then learn; else print ok _bmf.learn is empty; fi
if test -s _bmf.unlearn; then unlearn; else print ok _bmf.unlearn is empty; fi
Note. Change the MAILDIR variable to match your mail storage.
Make the script executable,
chmod +x cron.bmf
and run it every night with, e.g., this crontab,
SHELL=/bin/ksh
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/USERNAME/bin
HOME=/home/USERNAME
MAILTO=root
LANG=en_US.UTF-8
59 2 * * * cron.bmf
Note. Change USERNAME accordingly.
Note. The mail storage has to be traditional Unix mbox; with anything else you won't be able to train bmf with several emails at once.
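If it is unclear which kind of storage a folder uses, a quick check along these lines may help. This is only a sketch under simple assumptions (a Maildir has cur, new and tmp subdirectories; an mbox is a regular file that is empty or starts with a "From " line); mail_storage_kind is a hypothetical name.

```shell
# Hypothetical helper: guess whether a path is a Maildir or an mbox.
# Prints "maildir", "mbox" or "unknown".
mail_storage_kind() {
    if [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]; then
        echo maildir
    elif [ -f "$1" ] && { [ ! -s "$1" ] || head -n 1 "$1" | grep -q '^From '; }; then
        # an empty regular file is treated as an empty mbox
        echo mbox
    else
        echo unknown
    fi
}
```

Running mail_storage_kind on the folder used for learning tells you whether the cron script above applies or whether the per-message approach for Maildir folders is needed.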
 
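For Maildir folders, feed the messages to the filter one at a time. The script below was written for QSF; for bmf, use bmf -s and bmf -n in place of qsf -m and qsf -M,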
vi reprocess_maildir.learn
like,
#!/bin/ksh
#
# Daily QSF learn & unlearn
#
set -e
 
fbayes() {
[[ -z $1 ]] && print \$1 missing && exit 1
[[ -z $2 ]] && print \$2 missing && exit 1
 
# feed every message to the given rc file, redeliver it, then delete it
find "$1/cur" "$1/new" -type f | while read -r msg; do
print "Processing $msg...\\c"
/usr/local/bin/procmail -m "$2" < "$msg"
/usr/local/bin/procmail < "$msg"
rm -f "$msg"
print \ Done
done
}
 
cd $HOME/Maildir/
 
print Learning
fbayes .spam_learn $HOME/.procmailrc.learn
print ''
 
print Unlearning
fbayes .spam_unlearn $HOME/.procmailrc.unlearn
print ''
 
.procmailrc.learn being,
SHELL=/bin/ksh
DROPPRIVS=yes
VERBOSE=no
ORGMAIL=$HOME/Maildir/
MAILDIR=$HOME/Maildir
DEFAULT=$ORGMAIL
SYSYEAR=`date +%Y`
LOGFILE=$HOME/.procmailrc.log.$SYSYEAR
 
#
# QSF learn
#
:0
| qsf -m
 
.procmailrc.unlearn being,
SHELL=/bin/ksh
DROPPRIVS=yes
VERBOSE=no
ORGMAIL=$HOME/Maildir/
MAILDIR=$HOME/Maildir
DEFAULT=$ORGMAIL
SYSYEAR=`date +%Y`
LOGFILE=$HOME/.procmailrc.log.$SYSYEAR
 
#
# QSF unlearn
#
:0
| qsf -M
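Because the Maildir script above deletes each message after processing it, a dry run that only lists what would be picked up can be reassuring. Here is a minimal sketch, assuming the same cur and new layout the script walks; maildir_pending is a hypothetical name.

```shell
# Hypothetical helper: list the message files that a learn or unlearn
# pass would process in the given Maildir folder (cur and new only).
maildir_pending() {
    find "$1/cur" "$1/new" -type f 2>/dev/null | sort
}
```

For example, maildir_pending "$HOME/Maildir/.spam_learn" | wc -l shows how many messages are queued for learning without touching them.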
 
References
Bayesian (http://acme.com/mail_filtering/bayesian_frameset.html)
bmf: Bayesian Mail Filter (http://jblevins.org/log/bmf)
bmf training from cron (http://comments.gmane.org/gmane.mail.bmf.user/38)
Filtering spam with bmf, procmail and mutt (http://e.molioner.dk/guides/bmfprocmailmutt)
Flail Spam Mitigation Setup (http://flail.org/spam.html)
 
Note. The following benchmark leaves out BMF:
The Grumpy Editor's guide to bayesian spam filters (http://lwn.net/Articles/172491/)
A grumpy editor's bayesian followup (http://lwn.net/Articles/173910/)
 
Original papers
A Plan for Spam (http://paulgraham.com/spam.html) 08.2002
Better Bayesian Filtering (http://paulgraham.com/better.html) 01.2003
 
