The Daily Ping

Ain't no party like a Ping party!

June 7th, 2003

POPFile

I’ve only been using it for a day, but I’ve got to say, I’m already addicted to POPFile.

POPFile is an open-source “mail classification system,” a proxy primarily used to help classify incoming mail as spam or non-spam (though you can use it for more complex applications as well) using Bayesian analysis before it’s sent to your mail reader. I had read about POPFile months ago, but avoided trying it because their web site is organized in a kind of funky way that initially turned me off.

Here’s the simplest way I can explain it: POPFile acts as a proxy mail server on your local machine, so you set your mail reader to get mail through POPFile rather than directly from your ISP’s POP3 server. Some anti-virus and firewall programs work in a similar fashion. When you download your mail, POPFile analyses each piece (very quickly, I might add) and classifies it into different “buckets.” My buckets are simple: ok, spam, and virus. The mail is then passed on to your mail client as usual. You can then filter the mail based on extra headers added by POPFile and automatically delete spam, or just send it to a different folder. The coolest thing, though, is that as you use POPFile more, it learns.
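To make the header-filtering step a bit more concrete, here's a minimal Python sketch of what a mail client's rule is effectively doing. I'm assuming the header POPFile adds is named X-Text-Classification and holds the bucket name (check the headers of your own mail to confirm). The folder names and the folder_for function are made up for illustration.

```python
from email import message_from_string

# Assumption: POPFile tags each message with this header, whose value is
# the bucket name. Verify against the headers of your own mail.
CLASSIFICATION_HEADER = "X-Text-Classification"

# Hypothetical mapping from my three buckets to mail folders.
FOLDER_FOR_BUCKET = {"ok": "Inbox", "spam": "Junk", "virus": "Quarantine"}


def folder_for(raw_message, default="Inbox"):
    """Pick a folder for a raw message based on its classification header."""
    msg = message_from_string(raw_message)
    bucket = (msg.get(CLASSIFICATION_HEADER) or "").strip().lower()
    # Unclassified mail, or mail in an unknown bucket, goes to the default.
    return FOLDER_FOR_BUCKET.get(bucket, default)


sample = (
    "From: someone@example.com\n"
    "Subject: Amazing offer\n"
    "X-Text-Classification: spam\n"
    "\n"
    "Buy now!\n"
)
print(folder_for(sample))
```

Most mail clients let you build the same rule through their filter settings, so you wouldn't normally write this yourself; it's just the logic spelled out.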

POPFile keeps track of your e-mail, and through a web interface you can “teach” it what is spam and what isn’t. Initially, POPFile’s accuracy sucks, but as hundreds of e-mails pass through and you correct its mistakes, it gets better and better, reportedly reaching 98-100% accuracy in a relatively short amount of time.

As mentioned earlier, it uses a Bayesian technique to analyse and classify mail. Though the nuts-and-bolts of Bayesian filtering are way beyond me, I think I can explain it in simple terms. Basically, when you classify an e-mail as spam or not-spam, the program looks at the words that are in the e-mail and keeps track of how many times certain words appear. So, for instance, it may eventually learn that “viagra” appears in almost all spam, but also that any time your mom writes to you, she signs her e-mail in a particular way, and that those e-mails are valid. Then when an e-mail from your mom comes in joking about getting viagra spam, POPFile will not simply filter the e-mail out because it mentions viagra. Instead it will weigh the rest of the words used and determine that even though it contains “viagra,” it’s probably still a valid e-mail.
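For the curious, here's a rough Python sketch of that word-counting idea, using a toy naive Bayes classifier. This is not POPFile's actual code (POPFile is written in Perl, and I haven't read its source). The TinyBayes class, the training messages, and the choices to use add-one smoothing and skip never-seen words are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Toy word-count naive Bayes classifier (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages seen

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, bucket, text):
        """Record the words of one message you've filed under a bucket."""
        self.message_counts[bucket] += 1
        self.word_counts[bucket].update(self.tokenize(text))

    def scores(self, text):
        """Return a log-probability score per bucket (higher is likelier)."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        words = [w for w in self.tokenize(text) if w in vocab]  # skip unseen
        result = {}
        for bucket, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from how common the bucket is overall.
            score = math.log(self.message_counts[bucket] / total_messages)
            for word in words:
                # Add-one smoothing so one missing word can't zero a bucket.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab))
                )
            result[bucket] = score
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)


clf = TinyBayes()
clf.train("spam", "buy cheap viagra now")
clf.train("spam", "cheap viagra pills online")
clf.train("ok", "see you sunday love mom")
clf.train("ok", "call me soon love mom")

# Mom's sign-off outweighs the single "viagra" mention.
print(clf.classify("funny viagra spam again love mom"))  # prints "ok"
```

The point of summing a score over every word is exactly the "context" idea: one spammy word nudges the total toward spam, but several words that usually show up in good mail can pull it back.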

Pretty cool, huh?

POPFile also provides plenty of statistics for number geeks and lets you look under the hood as it does its job. It’s a great little program that takes a little bit of time to understand and set up, but once it’s running you’ll see no noticeable speed difference when fetching your e-mail and you’ll be able to filter out incoming spam much more easily and effectively. Plus it’s available in a Windows version and a cross-platform Perl version, so just about anyone can use it. Good stuff.

Posted in Technology

