Filter Sequence

CastleCops -> Mailwasher - Troubleshooting / General

Author: Al    Location: Australia    Posted: Tue Oct 28, 2003 12:19 am    Post subject: Filter Sequence

Looking at my statistics, I find that two filters and CFS catch 93% of my SPAM:
My own "CRAP" filter - 62% (2 simple rules)
Invalid Header F/R/H - 16%
CFS - 15%

Does it make sense to rearrange my filters to "match" the "Top 10" sequence?

Author: TimeGhost    Location: USA    Posted: Tue Oct 28, 2003 4:07 pm    Post subject:

I find rearranging filters to be helpful. This aids in debugging a new filter (putting it at the top to see how effective it is). In "production" I have my "friendly" filter first to minimise false positives. This is followed by a junk filter with a zero false positive rate (so far). Last comes a filter which has triggered falsely.

Yes, I have only three filters.

The first two catch nearly everything they're supposed to.

Speed is another issue. If you know you have filters that run slowly, put them near the bottom so that the faster filters can "short-circuit" them.

Please note that I'm not running the CFS-equipped beta, so if it implements filter order differently than the production version, my contribution may have little value to you.

Author: IP: 66.44.*.*    Posted: Tue Oct 28, 2003 6:36 pm    Post subject: Re: Filter Sequence

Al wrote:
Looking at my statistics, I find that two filters and CFS catch 93% of my SPAM:
My own "CRAP" filter - 62% (2 simple rules)
Invalid Header F/R/H - 16%
CFS - 15%

Does it make sense to rearrange my filters to "match" the "Top 10" sequence?


You may want to order your filters by reliability. In other words, place the filter with the fewest false positives at the top of the list. It may be the filter that catches the least SPAM, but if you know that its rate of 'False Positives' is very low, a trap on that one is the most reliable SPAM flag.

I am running 10 personal filters with no other spam detectors turned on. They all look into the headers only (body filters take too much time) and I am getting about a 99% trap rate with less than 0.5% false positives.

I wish I could get the false positives down but they are almost entirely from my 'Direct to MX' filter. This is also a filter that traps about 95% of the SPAM that I receive so it is a pretty good catchall.

By having that one last in the list, I can let the more reliable filters trap as much as possible before it gets to my catchall. By doing it this way, if the high reliability filters trap anything, I can be sure that it was spam.
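The ordering idea above can be sketched in a few lines of Python. This is only an illustration, assuming filters are tried top to bottom and the first one that matches claims the message. The filter names and match rules are made up for the example and are not MailWasher's own.

```python
from typing import Callable, List, NamedTuple, Optional


class Filter(NamedTuple):
    name: str                       # label reported when the filter traps a message
    matches: Callable[[str], bool]  # predicate over the raw header text


def first_trap(filters: List[Filter], header: str) -> Optional[str]:
    """Return the name of the first filter that matches, or None."""
    for f in filters:
        if f.matches(header):
            return f.name
    return None


# Hypothetical filters: most reliable first, broad catch-all last.
FILTERS = [
    Filter("HIREL", lambda h: "[89." in h),
    Filter("CATCHALL", lambda h: "Received: from" in h),
]

forged = "Received: from x (unknown [89.1.2.3]) by mx.example.net"
ordinary = "Received: from y (host [66.44.1.2]) by mx.example.net"

print(first_trap(FILTERS, forged))    # claimed by the high-reliability filter
print(first_trap(FILTERS, ordinary))  # falls through to the catch-all
```

Because the catch-all only sees what the stricter filters let through, the name of the trapping filter tells you how much to trust the trap.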

Here is one of those HIREL (high-reliability) filters, if you are interested. It looks for IP addresses in the "Received: from" line in the header that are listed as IANA-Reserved. ANY "Received: from" line in a msg header that contains one of these Reserved IP addresses was the result of a header forgery.

This filter may only trap 20-30% of the SPAM that I receive, but it is HIREL and anything trapped by it automatically qualifies for immediate deletion, without any further consideration.

The filter is as follows:

[enabled],"IANA RESERVED","IANA RESERVED",180,AND,Delete,TakesPrecedence,EntireHeader,containsRE,"(?# list of IP blocks reserved by IANA - must be updated as required )^Received: from [^[]*?\[([1257]|2[37]|3[1679]|4[129]|5[089]|7\d|8[3-9]|9\d|1[01]\d|12[0-6]|17[3-9]|18[0-79]|19[07]|22[3-9]|2[34]\d|25[0-5])(\.[1-2]?\d?\d?){3}\][^;:]*?"

Cut and paste it into your 'filters.txt' file and see if you get any traps.
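Before pasting the filter in, the expression can be tried outside MailWasher. The sketch below is a rough check using Python's `re` module, which may not behave exactly like MailWasher's own regex engine. The first-octet list is copied from the filter as posted in 2003; most of those blocks have since been allocated, so this shows the mechanics only. The sample header lines and the function name are invented for the test.

```python
import re

# Expression from the filter above. Two changes for Python: the literal "["
# inside the character class is escaped, and the trailing [^;:]*? is left off
# because it does not affect whether a match is found.
IANA_RESERVED = re.compile(
    r"^Received: from [^\[]*?\["
    r"([1257]|2[37]|3[1679]|4[129]|5[089]|7\d|8[3-9]|9\d|1[01]\d|12[0-6]"
    r"|17[3-9]|18[0-79]|19[07]|22[3-9]|2[34]\d|25[0-5])"
    r"(\.[1-2]?\d?\d?){3}\]",
    re.MULTILINE,  # "^" must match at the start of each header line
)


def has_reserved_hop(header: str) -> bool:
    """True if a 'Received: from' line carries an address from the 2003 list."""
    return IANA_RESERVED.search(header) is not None


# Made-up sample headers.
forged = "Received: from mail.example (unknown [89.12.34.56]) by mx.example.net; Tue, 28 Oct 2003 12:19:00 +0000"
normal = "Received: from mail.example (mail.example [66.44.1.2]) by mx.example.net; Tue, 28 Oct 2003 12:19:00 +0000"
private = "Received: from pc (pc [192.168.1.10]) by mx.example.net; Tue, 28 Oct 2003 12:19:00 +0000"

for sample in (forged, normal, private):
    print(has_reserved_hop(sample))
```

Only the first sample is trapped. The 66.x address and the private 192.168.x address pass through.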

Author: Skah_T    Location: USA    Posted: Tue Oct 28, 2003 11:24 pm    Post subject:

I like this filter! Thanks for sharing. One question. What's the purpose of the trailing question mark? Since it comes after an "*" I'm not sure how it's treated. Also, why even bother having the [^;:]*? at the end anyway?

Al & TimeGhost, I would be interested to see your filters. I have over 100 myself and would love to cut down that number.

Author: Al    Location: Australia    Posted: Wed Oct 29, 2003 2:35 am    Post subject:

Skah

The one filter that catches ~62% of my SPAM is set up to catch mail addressed to "old" email addresses, that I no longer use, but are still active.

Let me explain that. I have a dialup account that I use when on the road. I've had it "forever" and it must be on every SPAM list in the world, so any variation on that is in my [C R A P] filter. The account is only used to get Web access, not email.

The other filter (16%) is one that came with the "set" available for download [Invalid Header ....]

Finally CFS (14%).

The remainder are pretty well what came with the "set". They account for the remaining 8%.

I get well over 300 SPAM emails per day.

HTH

Author: IP: 66.44.*.*    Posted: Wed Oct 29, 2003 3:32 am    Post subject:

Skah_T wrote:
I like this filter! Thanks for sharing. One question. What's the purpose of the trailing question mark? Since it comes after an "*" I'm not sure how it's treated. Also, why even bother having the [^;:]*? at the end anyway?

Al & TimeGhost, I would be interested to see your filters. I have over 100 myself and would love to cut down that number.


The trailing ? makes the * non-greedy. In other words, the expression takes the fewest characters that fit the expression rather than the most. The non-greedy form may not be strictly needed here, but it is better to use it unless its use is proven harmful.

For a better understanding, see the section about "Metacharacters - iterators" in the Syntax of Regular Expressions.
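A small Python sketch of the greedy versus non-greedy difference, using a made-up header line. Python's `re` is used only for illustration; the `*` and `*?` iterators are assumed to behave the same way in the engine MailWasher uses.

```python
import re

# Made-up line containing two bracketed addresses.
line = "Received: from a.example ([70.1.2.3]) by b.example ([66.44.1.2]); Tue, 28 Oct 2003"

# Greedy: ".*" grabs as much as it can, so the match runs to the LAST "]".
greedy = re.search(r"from .*\]", line).group(0)

# Non-greedy: ".*?" grabs as little as it can, so the match stops at the FIRST "]".
lazy = re.search(r"from .*?\]", line).group(0)

print(greedy)
print(lazy)
```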

The [^;:]*? is the expression terminator. It prevents the expression from continuing on into the rest of the header in those "Received: from" lines that do not contain an IANA-Reserved IP address. If you look at those header lines, there is usually a date and time in the line. Just prior to the date and time is the semicolon [;] separator, with colon [:] separators in the time itself.

If there are no colon or semicolon separators in the "Received: from" header line, it is usually because of a poorly written forgery. In that case, the next line ("Message-id:", "To:", "From:", etc.) will provide the terminator for the Regular Expression. Without this terminator ([^;:]*?), the expression would continue beyond this point in a wasteful and CPU-consuming search.

TimeGhost actually provided very useful feedback on another filter (DIRECT to MX) that led to that terminator.

Author: Skah_T    Location: USA    Posted: Wed Oct 29, 2003 5:46 am    Post subject:

Anonymous wrote:
The [^;:]*? is the expression terminator. It prevents the expression from continuing on into the rest of the header in those "Received: from" lines that do not contain an IANA-Reserved IP address.


Which part is it terminating? The IP address search is a very defined expression that doesn't include any unbound searches. No *'s, +'s, or anything else that could go on forever. The only thing I can see that it's terminating is the initial [^[]*?.
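One way to look into this question is to compare the expression with and without the trailing [^;:]*? on the same header. The sketch below does that in Python's `re`, which is an assumption: MailWasher's engine may differ. The sample header is made up, and the "[" inside the character class is escaped for Python.

```python
import re

# Core of the posted filter, without the trailing [^;:]*?
CORE = (
    r"^Received: from [^\[]*?\["
    r"([1257]|2[37]|3[1679]|4[129]|5[089]|7\d|8[3-9]|9\d|1[01]\d|12[0-6]"
    r"|17[3-9]|18[0-79]|19[07]|22[3-9]|2[34]\d|25[0-5])"
    r"(\.[1-2]?\d?\d?){3}\]"
)

with_tail = re.compile(CORE + r"[^;:]*?", re.MULTILINE)
without_tail = re.compile(CORE, re.MULTILINE)

# Made-up header: one Received line with a listed address, then another field.
header = (
    "Received: from x.example (y [70.1.2.3]) by mx.example.net; "
    "Tue, 28 Oct 2003 12:19:00 +0000\n"
    "To: someone@example.com\n"
)

m_tail = with_tail.search(header)
m_plain = without_tail.search(header)

# In Python, a non-greedy iterator at the very end of an expression is
# satisfied by zero characters, so both matches cover the same text.
print(m_tail.span(), m_plain.span())
print(repr(m_tail.group(0)))
```

In this engine both versions stop at the closing "]", so the trailing piece adds nothing to the match. Whether it changes how much work the engine does on non-matching headers would have to be measured in MailWasher itself.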

Author: IP: 66.44.*.*    Posted: Wed Oct 29, 2003 12:45 pm    Post subject:

Skah_T,

You are correct if you are thinking of a properly written "Received: from" line.

The problem is that a few of the forged "Received: from" lines are written so badly that this expression will not find a 'properly formatted' IP address in the line. When that happens, the expression does require the termination to keep from wasting any time by searching through the remainder of the header.

If you want to remove the terminator, feel free to do so. You may never receive an e-mail with the kind of forged line that causes the problem.

Author: IP: 66.44.*.*    Posted: Wed Oct 29, 2003 1:04 pm    Post subject:

Skah_T,

Here is a link to another filter that you may find useful. This is the 'DIRECT to MX' filter that I mentioned previously.

Most (almost all) of the SPAM that I receive is sent using bulk mailing software that bypasses the sender ISP's outgoing mail server and sends directly to the recipient's MX server. It's not perfect, but it works pretty well at trapping SPAM.

It will require that you substitute the name of your ISP's own MX server in the expression, but you seem to have a good understanding of RegExp, so it will not be a problem for you.

Author: IP: 66.44.*.*    Posted: Wed Oct 29, 2003 2:16 pm    Post subject:

Also....


Here is a link to the Internet Protocol v4 Address Space that lists the IP Address blocks used in the IANA-RESERVED filter.

You may want to check on this link periodically to ensure that the blocks in the filter are updated as required.



Note:

The following three blocks are reserved for private internets:

10.0.0.0 - 10.255.255.255 (10/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168/16 prefix)

These may appear in legitimate headers (along with 127.0.0.1) and should not be included in the BLOCK list for the filter.

See RFC-1918
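For anyone maintaining the block list by hand, the exemptions above can be double-checked with Python's standard `ipaddress` module. This is a minimal sketch; the function name `is_exempt` is made up for the example.

```python
import ipaddress

# The three RFC 1918 private blocks listed above.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
LOOPBACK = ipaddress.ip_address("127.0.0.1")


def is_exempt(addr: str) -> bool:
    """True if addr may appear in legitimate headers and should not be blocked."""
    ip = ipaddress.ip_address(addr)
    return ip == LOOPBACK or any(ip in block for block in PRIVATE_BLOCKS)


for addr in ("10.255.255.255", "172.31.255.255", "172.32.0.0", "192.168.0.1", "127.0.0.1"):
    print(addr, is_exempt(addr))
```

Note that 172.32.0.0 falls just outside the 172.16/12 block, so it is not exempt.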

Author: TimeGhost    Location: USA    Posted: Wed Oct 29, 2003 5:51 pm    Post subject:

Skah_T wrote:
Al & TimeGhost, I would be interested to see your filters. I have over 100 myself and would love to cut down that number.

My first filter merely looks for friendly subjects and marks matching mail as legit.

The second one is the one that catches nearly everything now. But don't get your hopes up -- my ISP uses a spam heuristics tool (Spam Assassin, I think), which inserts something like this in the header:
X-Spam-Score: 13.141 (*************)
They kept it secret -- I discovered it on my own one day. Maybe you should look to see if yours does something similar.
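A rough Python sketch of how such a header could be read and compared against a cutoff. The header name and layout are taken from the example above; the threshold of 5.0 and the function names are assumptions made for illustration, not values from the ISP or from MailWasher.

```python
import re
from typing import Optional

# Matches a line such as: X-Spam-Score: 13.141 (*************)
SCORE_RE = re.compile(
    r"^X-Spam-Score:\s*(-?\d+(?:\.\d+)?)",
    re.MULTILINE | re.IGNORECASE,
)

THRESHOLD = 5.0  # hypothetical cutoff; tune it against your own mail


def spam_score(header: str) -> Optional[float]:
    """Return the numeric score from the header, or None if it is absent."""
    m = SCORE_RE.search(header)
    return float(m.group(1)) if m else None


def looks_like_spam(header: str, threshold: float = THRESHOLD) -> bool:
    score = spam_score(header)
    return score is not None and score >= threshold


sample = "Subject: hello\nX-Spam-Score: 13.141 (*************)\n"
print(spam_score(sample), looks_like_spam(sample))
```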

The last one is one that the guest in this thread wrote, and I modified slightly to make it work with my ISP's header format. It's discussed here.

Cutting down on the number of filters you use should speed MWP up. Reordering them, as discussed in this thread, should help, too.

Author: Skah_T    Location: USA    Posted: Wed Oct 29, 2003 6:11 pm    Post subject:

Thanks all. I may have to do an overhaul of my filters some time when I have a free day or two :)

Author: IP: 66.44.*.*    Posted: Wed Oct 29, 2003 10:45 pm    Post subject:

TimeGhost wrote:
The second one is the one that catches nearly everything now. But don't get your hopes up -- my ISP uses a spam heuristics tool (Spam Assassin, I think), which inserts something like this in the header:
X-Spam-Score: 13.141 (*************)
They kept it secret -- I discovered it on my own one day. Maybe you should look to see if yours does something similar.



A filter scoring system would be something that Firetrust might look into for MWP. I realize that it would take some codewriting time away from other things they want to accomplish right now so think of this as just a friendly suggestion...

Perhaps someday.....



All times are GMT
