Student Information Processing Board
Deluged with spam? In this column, we revisit the topic of spam filtering with SpamAssassin and discuss recent changes to MIT’s SpamAssassin configuration.
Question: What is SpamAssassin?
Answer: SpamAssassin is a mail filter that allows users to control the junk mail they receive. It has been available to the MIT community since February 2003. It uses a set of rules to give each incoming e-mail message a numerical spam score. Messages with scores greater than a configurable threshold get marked as spam, allowing users to deal with them appropriately.
SpamAssassin tags your e-mail so that you can filter and delete messages that might be spam. While this service is optional and not enabled by default, we recommend that you use it if you get a lot of spam.
Keep in mind, however, that the filter is not perfect, so you should do at least a cursory check of your suspected spam before deleting it.
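As a rough illustration of the scoring idea described above, here is a minimal Python sketch. The rule names, patterns, point values, and the threshold of 3.0 are all made up for this example; they are not SpamAssassin's actual rules or MIT's configured threshold.

```python
import re

# Hypothetical rules for illustration only: (name, pattern, points).
# These are NOT SpamAssassin's real rules or scores.
RULES = [
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z0-9 !]+$", re.MULTILINE), 1.5),
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.0),
]

THRESHOLD = 3.0  # assumed value; the column only says the threshold is configurable


def spam_score(message):
    """Add up the points of every rule whose pattern appears in the message."""
    return sum(points for _, pattern, points in RULES if pattern.search(message))


def is_spam(message, threshold=THRESHOLD):
    """A message is marked as spam when its score is greater than the threshold."""
    return spam_score(message) > threshold


junk = "Subject: WIN NOW\n\nGet free money today!!!"
note = "Subject: Lunch tomorrow\n\nAre you free at noon?"
print(spam_score(junk), is_spam(junk))
print(spam_score(note), is_spam(note))
```

Because a message is only tagged when several rules add up past the threshold, a single suspicious phrase in a legitimate message usually is not enough to mark it, but mistakes in both directions are still possible.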
Question: How do I enable SpamAssassin?
Answer: If you are using an IMAP mail client, such as Evolution, Mozilla, Outlook, or Athena Pine, you can have all messages marked as spam filtered into a separate folder automatically. Simply create a new folder in your INBOX named Spamscreen.
Warning: If you create such a folder, you will not be able to use POP mail clients, such as Eudora, SIPB Pine, or nmh, to view e-mail tagged as spam.
For information on configuring SpamAssassin’s settings, or enabling spam filtering with non-IMAP mail clients, you can refer to our March 14 column at http://www.mit.edu/~asksipb/2003columns/2003-03-14-spamassassin/ and the I/S Spam Screening Web page at http://web.mit.edu/is/help/nospam/.
Question: How do I get zephyr notification of non-spam mail only?
Answer: If you’re using zwgc, the default zephyr client, you can do this by typing:
athena% zctl add mail inbox %me%
Then, to remove this setting:
athena% zctl del mail \* %me%
For more information on zephyr, you can refer to our August 27, 2003 column at http://www.mit.edu/~asksipb/2003columns/2003-08-27-zephyr/.
Question: What’s new in SpamAssassin?
Answer: In December, a new version of SpamAssassin was installed on each of the MIT Post Office servers. The major new feature is word-by-word statistical filtering, also known as Bayesian filtering.
This method analyzes messages considered spam and non-spam (called “ham”), and records what words are found in each. When new mail comes in, it analyzes the words in the message and uses the previously recorded statistics to determine whether the message is spam. This method allows the filters to be constantly updated, and is generally very effective.
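To make the idea concrete, here is a small Python sketch of word-by-word statistical filtering. It is a simplified naive Bayes classifier written for this column, not SpamAssassin's actual implementation; the class name WordFilter, the tokenizer, and the smoothing constants are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


class WordFilter:
    """Toy word-based Bayesian filter (illustrative; not SpamAssassin's code)."""

    def __init__(self):
        # For each label, how many training messages contained each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # How many training messages were seen for each label.
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        """Return the set of distinct lowercase words in the text."""
        return set(re.findall(r"[a-z']+", text.lower()))

    def train(self, text, label):
        """Record which words appear in a message known to be 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that a new message is spam."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from the (smoothed) ratio of spam to ham seen so far.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            # Add-one smoothing so unseen words never give a zero probability.
            p_word_given_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_given_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_word_given_spam / p_word_given_ham)
        return 1 / (1 + math.exp(-log_odds))


f = WordFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting notes for tomorrow", "ham")
f.train("lunch tomorrow at noon", "ham")
print(f.spam_probability("buy cheap pills"))
print(f.spam_probability("lunch tomorrow"))
```

Each call to train updates the recorded statistics, which is why the filter can keep adapting as new examples of spam and ham are supplied.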
Question: How do I alter what the Bayesian filter identifies as spam?
Answer: This process, called training, is only possible using an IMAP mail client. If you have created a Spamscreen folder as described above and a piece of spam slips through to your inbox without being tagged, copy that message to your Spamscreen folder. Each night, the filter will be trained using the mail found in that folder.
If you are using a graphical mail client, drag the message into the spam folder with your mouse. If you are using Athena Pine, press S (for Save), and then type Spamscreen.
Conversely, if a legitimate message ends up in the Spamscreen folder, you should train the filter so that it can avoid making the same mistake in the future. To do so, create a Hamscreen folder in your INBOX. Then, copy the legitimate message into the Hamscreen folder, and the filter will be trained with the message that night.
In both cases, after at least one night has passed, you can go back and delete these messages from the Spamscreen and Hamscreen folders as they are no longer needed. Note that all of MIT shares a common Ham and Spam training database, so you will also benefit from other users’ training.
To ask us a question, send e-mail to firstname.lastname@example.org. We'll try to answer you quickly, and we might address your question in our next column. You can also stop by our office in W20-557 or call us at x3-7788 if you need help. Copies of each column and pointers to additional information will be posted on our website: http://www.mit.edu/~asksipb/