SpamAssassin Custom Rules

WARNINGS #

WARNING 1: Writing spam rules involves Perl regular expressions. A mistake in a rule can break the spam scanning services on your server. After you create a new rule and your configuration has been rebuilt, double-check it by running the sa-compile command on your server. If it reports no errors, the rule is OK.

WARNING 2: If you make a custom spam rule, you must include a description. 

WARNING 3: Only letters and underscores _ are allowed in spam rule names. 

WARNING 4: Mailborder custom spam rules are automatically prepended with MC_ when created via the Mailborder GUI. There is no need to prepend your own rules with LOCAL_ or any other convention.

 

Body rules #

Body rules search the body of the message with a regular expression. If the expression matches anything, the score is added to the grand total spam score. Body rules also include the Subject as the first line of the body content.

Here is an example rule:

body  MC_DEMONSTRATION_RULE   /test/
score  MC_DEMONSTRATION_RULE 0.1
describe  MC_DEMONSTRATION_RULE       This is a simple test rule

This rule does a simple case-sensitive search of the body of the email for the string “test” and adds 0.1 to the score of the email if it finds it. Now, this rule is pretty simple as rules go. It will match “test” but also “testing” and “attest”. The describe statement contains the text which will be placed into the verbose report, if verbose reports are used. Verbose reports are the default setting for the body in SpamAssassin version 2.5x and upwards.

In regular expressions a \b can be used to indicate where a word-break (anything that isn’t an alphanumeric character or underscore) must exist for a match. Our rule above can be made to not match “testing” or “attest” like so:

body  MC_DEMONSTRATION_RULE   /\btest\b/

The rule can also be made case-insensitive by adding an i to the end, like this:

body  MC_DEMONSTRATION_RULE   /\btest\b/i
score  MC_DEMONSTRATION_RULE 0.1

Now the rule will match any combination of upper or lower case that spells “test” surrounded by word breaks of some form.
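
The three variants above can be compared side by side. The following is a minimal sketch in Python, whose re module handles \b and case-insensitive matching the same way as Perl for these simple patterns. The sample strings are made up for illustration, and SpamAssassin itself is not involved.

```python
import re

# Hypothetical sample body lines, not taken from real mail.
samples = ["this is a test", "testing 123", "I can attest to that", "TEST message"]

# Python stand-ins for the three rule patterns shown above.
patterns = {
    r"/test/": re.compile(r"test"),
    r"/\btest\b/": re.compile(r"\btest\b"),
    r"/\btest\b/i": re.compile(r"\btest\b", re.IGNORECASE),
}

def matching_samples(pattern):
    """Return the sample lines in which the compiled pattern matches anywhere."""
    return [s for s in samples if pattern.search(s)]

for label, pattern in patterns.items():
    print(label, "->", matching_samples(pattern))
```

Only the first pattern hits “testing” and “attest”, and only the last one hits the upper-case “TEST”.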

 

Resources on Perl Regex Syntax #

Perl regular expressions are quite flexible and powerful. There are entire books on the different kinds of syntax you can use. At this point, I’ve given you a taste of some of the syntax, but if you’re not familiar with regular expressions, you can read one of the many tutorials on the web regarding them.

Here are some resources you might want to check out for information on regular expressions:

You could use your Linux box and its Perl documentation:

  • If you have perl-doc installed you can type at a linux shell prompt:
perldoc perlretut
perldoc perlre

Recommended book to learn the Perl programming language:

  • Learning Perl, 8th Edition, by brian d foy, Tom Phoenix, and Randal Schwartz. Publisher: O’Reilly

 

Header rules #

Now let’s move on to header rules. Header rules let you check a message header for a string. Most commonly these rules check the Subject, From, or To, but they can be written to check any message header, including non-standard ones. Let’s pick up our “test” rule and change it into one that checks the subject line.

header  MC_DEMONSTRATION_SUBJECT      Subject =~ /\btest\b/i
score  MC_DEMONSTRATION_SUBJECT       0.1

In these rules, the part before the =~ names the header you want to check, and the rest is a familiar regular expression. The header name itself is always case-insensitive, so the above rule will match a subject: line containing “test” or a SUBJECT: line containing “test”.

Checking the From: line, or any other header, works much the same:

header  MC_DEMONSTRATION_FROM From =~ /test\.com/i
score  MC_DEMONSTRATION_FROM  0.1

Now, that rule is pretty silly, as it doesn’t do much that a blacklist_from can’t. Usually if you’re making a From line rule you’ll be doing so to use more sophisticated rules. However, I wanted to illustrate how to make one, and also point out that some punctuation characters are part of the regex syntax, and if you want to use them literally, you need to put a \ in front of them. Normally in a perl regex the period . is a wildcard, but the backslash \ escapes the period . so it becomes part of the match string. 
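
To see why the backslash matters, here is a small sketch in Python (its re module treats . and \. the same way as Perl for this pattern). The header values are hypothetical and chosen only to show the difference.

```python
import re

# Hypothetical From: header values, for illustration only.
headers = ["sales@test.com", "sales@testXcom.example", "sales@TEST.COM"]

unescaped = re.compile(r"test.com", re.IGNORECASE)  # . matches any single character
escaped = re.compile(r"test\.com", re.IGNORECASE)   # \. matches only a literal period

def hits(pattern):
    """Return the header values in which the pattern matches anywhere."""
    return [h for h in headers if pattern.search(h)]

print("unescaped:", hits(unescaped))
print("escaped:  ", hits(escaped))
```

The unescaped pattern also hits “testXcom”, which is almost certainly not what the rule author intended.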

There’s also an option to look at all the headers and match if any of them contain the specified regex:

header  MC_DEMONSTRATION_ALL  ALL =~ /test\.com/i
score  MC_DEMONSTRATION_ALL   0.1

This is not very commonly used, but the feature can also be used to do a case-sensitive check on a header name (it looks at whole header lines, not just the parts after the colon). Note that if you want to use the ‘^’ character here, you need to put an m at the end of your regex, which makes ^ match at the start of each header line rather than only at the start of the whole header block.

header  MC_DEMONSTRATION_WEIRD_FROM  ALL =~ /^FrOM\:/m
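
The effect of the m modifier can be sketched in Python, where re.MULTILINE plays the same role as Perl’s /m. The header block below is hypothetical, and modelling ALL as a single multi-line string is an assumption made for the illustration.

```python
import re

# Hypothetical header block; "ALL" is modelled here as one multi-line string.
all_headers = (
    "Received: from mail.example.com\n"
    "FrOM: someone@example.com\n"
    "Subject: hello\n"
)

# Without the modifier, ^ only matches at the very start of the string.
without_m = re.search(r"^FrOM:", all_headers)

# With the modifier, ^ matches at the start of every line.
with_m = re.search(r"^FrOM:", all_headers, re.MULTILINE)

# The check stays case-sensitive: a normally capitalised header does not match.
normal_from = re.search(r"^FrOM:", "From: someone@example.com\n", re.MULTILINE)

print(without_m is None, with_m is not None, normal_from is None)
```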

 

Notes about rule scores #

A few short words about the behavior of the “score” command.

  1. Rules with a score set to 0 are not evaluated at all.
  2. Rules with no score statement will be scored at 1.0, unless note 3 or 4 applies.
  3. Rules starting with a double underscore are evaluated with no score, and are intended for use in meta rules where you don’t want the sub-rules to have a score.
  4. Although intended for the SpamAssassin development effort, any rule starting with T_ will be treated as a “test” rule and will be run with a score of 0.01.
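
The four notes above can be summarised as a small decision function. This is only a sketch of the behaviour as described here, not SpamAssassin’s actual code; the function name is hypothetical, and it assumes an explicit score statement takes precedence over the name-prefix defaults, which the notes do not spell out.

```python
def effective_score(name, score=None):
    """Return (evaluated, score) for a rule, following the four notes above.

    `score` is the value from a score statement, or None if there is none.
    Assumption: an explicit score statement wins over the prefix defaults.
    """
    if score is not None:
        # Note 1: a score of 0 means the rule is not evaluated at all.
        return (score != 0, float(score))
    if name.startswith("__"):
        # Note 3: sub-rules are evaluated but carry no score of their own.
        return (True, 0.0)
    if name.startswith("T_"):
        # Note 4: test rules run with a token score.
        return (True, 0.01)
    # Note 2: default for a rule with no score statement.
    return (True, 1.0)

print(effective_score("MC_DEMONSTRATION_RULE", 0.1))
print(effective_score("MC_DEMONSTRATION_RULE"))
```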

Last in this section I’ll leave you with a word about choosing scores. I’d suggest starting off with a very low score that won’t impact messages very much, like 0.1. Watch your rule and make sure it fires when you want and isn’t firing when you don’t want. Then start increasing the score to make it have more effect, but try not to go overboard. You should be very reluctant to have a custom rule with a score over 1.0 unless you’re sure it’s not going to hit _any_ nonspam messages. Also keep in mind that you can write rules to only match on non-spam messages and give them negative scores to try to correct false-positive problems. Strong negative scores should also be treated with a bit of caution, but aren’t quite as serious. Generally false positives can cause problems as valuable mail might get skipped over, but false-negatives are a minor nuisance, so you can be a bit more liberal with negative scores.

 

Advanced Scoring #

Score commands can have 1 or 4 parameters. If there is only one parameter (the norm) then that score is used all the time. Example:

score  MC_DEMONSTRATION_ALL   0.1

With four parameters:

  • The first parameter applies when the Bayesian classifier and network tests are not in use.
  • The second parameter applies when the Bayesian classifier is not in use, but the network tests are.
  • The third parameter applies when the Bayesian classifier is in use, but network tests are not.
  • The fourth parameter applies when the Bayesian classifier and network tests are both in use.

Example:

score  MC_DEMONSTRATION_ALL   0.1 0.3 0.3 0.1

Note: Mailborder will only allow one score to be specified. 
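
For the four-parameter form, the choice of score can be sketched as an index lookup. The function below is a hypothetical illustration of the ordering described above, not part of SpamAssassin or Mailborder.

```python
def select_score(scores, bayes, network):
    """Pick the applicable value from a 1- or 4-value score statement.

    `bayes` and `network` say whether the Bayesian classifier and the
    network tests are in use.
    """
    if len(scores) == 1:
        return scores[0]
    if len(scores) != 4:
        raise ValueError("a score statement takes 1 or 4 values")
    # Order per the list above: neither, network only, Bayes only, both.
    index = (2 if bayes else 0) + (1 if network else 0)
    return scores[index]

# Distinct values make it easy to see which position was chosen.
print(select_score([0.1, 0.2, 0.3, 0.4], bayes=True, network=False))
```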

 

Advanced rule types (meta, uri, rawbody and friends) #

In addition to body and header rules, SpamAssassin supports several other kinds of rules. In general these are used much less often than the header and body types, but it is still worth having a short introduction to them and what they do.

 

URI rules #

URI rules are very simple: they only match text in the URIs contained in the plain text and HTML sections of mail. This is very handy for finding links to spam-advertised sites.

For example, this rule will look for web links to www.example.com/OrderViagra/:

uri  MC_URI_EXAMPLE   /www\.example\.com\/OrderViagra\//
score  MC_URI_EXAMPLE 0.1

 

Rawbody rules #

Rawbody rules allow you to search the body of the email without certain kinds of preprocessing that SpamAssassin normally does before trying body rules. In particular, HTML tags won’t be stripped and line breaks will still be present. This allows you to create rules searching for HTML tags or HTML comments that are signs of spam or nonspam, or for particular patterns of line breaks.

As an example, this rule looks for an HTML comment claiming the message was “created with spamware 1.0”:

rawbody  MC_RAWBODY_EXAMPLE /\<!\-\- created with spamware 1\.0 \-\-\>/
score  MC_RAWBODY_EXAMPLE 0.1

 

Meta rules #

Meta rules are rules that are boolean or arithmetic combinations of other rules. This allows you to do things like create a meta rule which fires when both a header rule and a body rule are true at the same time. The following example uses a boolean check and will add a negative score to emails from news@example.com containing the body text “Monthly Sales Figures”:

header  __MC_FROM_NEWS         From =~ /news\@example\.com/i
body    __MC_SALES_FIGURES     /\bMonthly Sales Figures\b/
meta    MC_NEWS_SALES_FIGURES  (__MC_FROM_NEWS && __MC_SALES_FIGURES)
score   MC_NEWS_SALES_FIGURES  -1.0

Note that the two sub rules start with a double underscore, so they are run and treated as having no score.

Also note the backslash placed before the @ sign. This is important; otherwise Perl will try to interpret it as an array.

Meta rules can also be arithmetic, but this feature was absent from the original implementation of meta rules in 2.4x. An arithmetic meta rule can be used to tell if more than a certain number of sub rules matched. For example this meta rule will fire if 2 or more of the strings “test1” “test2” and “test3” are found anywhere in the body:

body   __MC_TEST1         /\btest1\b/
body   __MC_TEST2         /\btest2\b/
body   __MC_TEST3         /\btest3\b/
meta   MC_MULTIPLE_TESTS  ((__MC_TEST1 + __MC_TEST2 + __MC_TEST3) > 1)
score  MC_MULTIPLE_TESTS  0.1

The value of the sub rule in an arithmetic meta rule is the true/false (1/0) value for whether or not the rule hit.

If you want to weight your sub rules differently, you can apply weights to them like you would in any standard equation:

meta  MC_MULTIPLE_TESTS (((0.8 * __MC_TEST1) + (0.5 * __MC_TEST2)) > 1)
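
The arithmetic can be checked with a short Python sketch. It mimics the count-based and weighted meta rules above using 1/0 hit values; the function names are hypothetical and this is not how SpamAssassin evaluates rules internally.

```python
import re

# Python stand-ins for the three body sub-rules above.
sub_rules = {
    "TEST1": re.compile(r"\btest1\b"),
    "TEST2": re.compile(r"\btest2\b"),
    "TEST3": re.compile(r"\btest3\b"),
}

def sub_rule_hits(body):
    """Map each sub-rule name to 1 if it matched the body, else 0."""
    return {name: int(bool(p.search(body))) for name, p in sub_rules.items()}

def multiple_tests(body):
    """Count-based meta: fires when two or more sub-rules hit."""
    return sum(sub_rule_hits(body).values()) > 1

def weighted_tests(body):
    """Weighted meta: (0.8 * TEST1) + (0.5 * TEST2) > 1."""
    h = sub_rule_hits(body)
    return 0.8 * h["TEST1"] + 0.5 * h["TEST2"] > 1

print(multiple_tests("test1 and test3"), weighted_tests("test1 only"))
```

With these particular weights the sum can only exceed 1 when both sub-rules hit (0.8 + 0.5 = 1.3), so pick weights and thresholds with the possible totals in mind.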

 

Writing better rules #

Ok, so you now know the basics of how to make some simple rules, and where to read up to build more complicated rules. So you’re all ready to get started, right? Well, not quite yet. Let me take a minute to give you some advice to make good, or at least better rules than you might make on your first try.

Rule name suggestions #

You’ll probably want to have an easy way to tell your rules from the distribution set, so what you’ll want to do is use some kind of naming convention.

Note: Mailborder supplied spam rules are prepended with MB_

 

Picking good strings to search for #

As you can see from the example rules above, picking a text pattern to search on to make a basic rule for SpamAssassin is fairly easy. However, it’s also easy to create rules which seem good at a casual glance but have unintended consequences. Here is some of my generic advice on rule writing.

First, be as specific as you possibly can. In general single word checks make lousy rules as they don’t take into account the context the word is used in. In most cases, phrases are much better to look for, unless the word has no use in normal text.

Next, think about possible alternate uses for the word or phrase you came up with. Many phrases that might sound like good rules for porn spam also wind up matching messages regarding health issues. Think about health newsletters, order status messages, and legitimate financial newsletters. Is your phrase likely to be in them? What about personal emails that have a little off-color humor (to the extent that’s acceptable to you at least)? Personal love letters? A note from your insurance company or doctor? Don’t forget to think about alternate meanings for words; check a dictionary. You can also try a web search for your phrase or word.

This is a partial summary of https://wiki.apache.org/spamassassin/WritingRules by Matt Kettler.