Spammers versus Anti-Spammers

I got an interesting spam today that looks like an attempt to reduce the effectiveness of the Bayesian statistics-based spam filters that have become the rage recently. The message contains white text on a white background, so it appears blank, but the words show up if you select them. Once visible, they seem to be random words. Most of them are harmless, but some are words that are probably common in spam. You lose either way: marking the message as spam adds non-spam words to the filter database, potentially increasing the rate of false positives, while leaving it as non-spam lowers the score of the spam words it contains, potentially increasing the rate of false negatives.
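To make the dilemma concrete, here is a minimal sketch of a toy word-counting filter in Python. It is not how SpamAssassin, Bogofilter, or any real filter is implemented; the class `TinyBayes`, the helper `build_filter`, the smoothing formula, and the sample words are all made up for illustration. It only shows how training on a mixed message pulls per-word scores toward the middle, whichever way you classify it.

```python
from collections import Counter


class TinyBayes:
    """Toy per-word spam score (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam = Counter()  # number of spam messages containing each word
        self.ham = Counter()   # number of non-spam messages containing each word

    def train(self, words, is_spam):
        # Count each distinct word once per message.
        target = self.spam if is_spam else self.ham
        target.update(set(words))

    def spam_score(self, word):
        # Smoothed fraction of this word's sightings that were in spam.
        s, h = self.spam[word], self.ham[word]
        return (s + 1) / (s + h + 2)


def build_filter():
    # A small, clean training history: three ham and three spam messages.
    f = TinyBayes()
    for _ in range(3):
        f.train(["meeting", "lunch"], is_spam=False)
        f.train(["viagra", "free"], is_spam=True)
    return f


# The poisoned message mixes harmless words with a spammy one.
poisoned = ["meeting", "lunch", "viagra"]

baseline = build_filter()

marked_spam = build_filter()
marked_spam.train(poisoned, is_spam=True)

marked_ham = build_filter()
marked_ham.train(poisoned, is_spam=False)

# Marking it as spam makes the harmless word look spammier.
print(round(baseline.spam_score("meeting"), 3),
      round(marked_spam.spam_score("meeting"), 3))  # 0.2 0.333

# Marking it as non-spam makes the spammy word look more innocent.
print(round(baseline.spam_score("viagra"), 3),
      round(marked_ham.spam_score("viagra"), 3))    # 0.8 0.667
```

In this toy setup a single poisoned message moves "meeting" from 0.2 to about 0.33 if you call it spam, and moves "viagra" from 0.8 to about 0.67 if you don't. A real filter with a large database would shift far less per message, but the direction of the damage is the same.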

The ever-popular SpamAssassin has a Bayesian component in its arsenal, but uses more traditional content-based filters as well. Apple’s Mail client has a built-in Bayesian junk filter, and Bogofilter is a Unix-based Bayesian filter that I use myself.