Neep_heid
Sergeant

 Joined: Mar 31, 2003 Posts: 101 Location: Scotland
|
Posted: Mon May 26, 2003 7:05 am Post subject: Experimenting with some simple filters |
|
|
Good morning ladies and gentlemen (well it's morning in Scotland),
I've just had a couple of ideas for very simple filters in the subject field and am going to try them out then report on the number of false negatives.
I'm going to search each subject line for a dot ("."). Obviously, if someone legitimately "r.e's" me, the filter will pick that up, but anybody who r.e's me should be on my "friends list". I'm doing this because a lot of spam seems to start with "r.e", and spammers also seem to use "." to break up words that other filters would otherwise pick up.
Can any of you think of legitimate mails you've had with this character in their subject lines (how often do legitimate mails have a web address or e-mail address in their subject lines?), save for the aforementioned "r.e"? This was a "Eureka" moment in the bath last night but I maybe haven't thought it through.
Also, has anyone ever had a legitimate mail with the dollar sign in the subject line?
As I say, I'm experimenting here. Maybe it's a bad idea, maybe not.
Thanks for listening,
Gregor
|
|
|
 |
Neep_heid
Sergeant

 Joined: Mar 31, 2003 Posts: 101 Location: Scotland
|
Posted: Mon May 26, 2003 7:18 am Post subject: |
|
|
What an eejit I've been. On inspection of my spam log, it doesn't look as if "RE:'s" have dots in them. Why did I think they had????
I've set up a separate Re: / RE: filter but am keeping the "dot" one to see what happens.
Very definitely a neep heid!
|
|
|
 |
gary
Lieutenant
 Premium Member
 Joined: Dec 22, 2002 Posts: 260 Location: Dallas/Ft. Worth, USA
|
Posted: Mon May 26, 2003 2:12 pm Post subject: |
|
|
As for words spaced out in the subject, you might try something like this:
[A-Z]([ \.\*:_-]+)([A-Z]\1){2,}[A-Z]
and
([A-Z]+[:\.\-_]){3,}
I'll be putting this in the next sample filter release, and could certainly use a hand testing! The first should pick up most letters separated with spaces, periods, colons, dashes, etc. (S P A C E D). The second will look for entire words separated in this manner.
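For anyone who wants to try the two patterns outside the filter program, here is a minimal sketch using Python's `re` module as a stand-in. The names `SPACED_LETTERS`, `SEPARATED_WORDS` and `looks_obfuscated` are made up for illustration, and case-insensitive matching is an assumption, since the patterns as posted only list A-Z.

```python
import re

# The two patterns from the post, copied unchanged.
# Assumption: the filter matches case-insensitively, so IGNORECASE is set here.
SPACED_LETTERS = re.compile(r"[A-Z]([ \.\*:_-]+)([A-Z]\1){2,}[A-Z]", re.IGNORECASE)
SEPARATED_WORDS = re.compile(r"([A-Z]+[:\.\-_]){3,}", re.IGNORECASE)


def looks_obfuscated(subject: str) -> bool:
    """Return True if either pattern matches anywhere in the subject line."""
    return bool(SPACED_LETTERS.search(subject) or SEPARATED_WORDS.search(subject))


if __name__ == "__main__":
    # Made-up sample subjects, not taken from any real spam log.
    for subject in ["S P A C E D", "F.R.E.E offer", "buy.cheap.pills.now",
                    "report.doc", "Re: lunch on Friday"]:
        print(f"{subject!r}: {looks_obfuscated(subject)}")
```

Note that the first pattern uses the backreference `\1`, so the same separator has to repeat between every letter, and the second needs at least three word-plus-separator groups in a row, which is why a plain filename with one dot is not caught.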
Cheers!
Last edited by gary on Thu May 29, 2003 5:45 am, edited 1 time in total |
|
|
 |
Neep_heid
Sergeant

 Joined: Mar 31, 2003 Posts: 101 Location: Scotland
|
Posted: Mon May 26, 2003 5:06 pm Post subject: |
|
|
Thanks, Gary
|
|
|
 |
Neep_heid
Sergeant

 Joined: Mar 31, 2003 Posts: 101 Location: Scotland
|
Posted: Tue May 27, 2003 6:39 pm Post subject: |
|
|
The latest: I've junked my "." filter, as far too many messages that aren't spam were being marked as spam. On the other hand, my RE: filter and $ filter are proving to be quite successful, though in the case of the RE: one I'll have to be scrupulous about adding people to my Friends List.
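The way the RE: and $ filters depend on the Friends List can be sketched roughly as below. This is only an illustration in Python: the thread does not show the actual rules, so the patterns `RE_PREFIX` and `DOLLAR`, the function `is_probable_spam`, and the addresses are all hypothetical.

```python
import re

# Hypothetical rules; the real filters are not shown in the thread.
RE_PREFIX = re.compile(r"^\s*re:", re.IGNORECASE)  # subject starts with Re: / RE:
DOLLAR = re.compile(r"\$")                         # literal dollar sign, escaped


def is_probable_spam(sender: str, subject: str, friends: set) -> bool:
    """Flag a message unless the sender is on the friends list."""
    # Friends are let through before any subject rule is checked.
    if sender.lower() in friends:
        return False
    return bool(RE_PREFIX.search(subject) or DOLLAR.search(subject))


if __name__ == "__main__":
    friends = {"pal@example.com"}  # made-up address, stored in lower case
    print(is_probable_spam("pal@example.com", "Re: dinner", friends))
    print(is_probable_spam("stranger@example.net", "RE: your account", friends))
    print(is_probable_spam("stranger@example.net", "Make $$$ fast", friends))
```

Checking the friends list first is what keeps a genuine reply from being flagged, which is why the list has to be kept up to date.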
|
|
|
 |
TimeGhost
Captain

 Joined: Apr 11, 2003 Posts: 747 Location: USA
|
Posted: Wed May 28, 2003 8:44 pm Post subject: |
|
|
Neep_heid:
Remember that dot is a wildcard that matches any character. If you want to match an actual dot, precede it with the escape character this way: \.
Gary's regexp would be the way to go IMHO, since I often send myself file attachments from work, and the subject becomes the filename. Thus I often have a single dot in the subject of my emails. Of course, I do have myself on the friends list...
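To illustrate the difference between the wildcard and the escaped dot, here is a small sketch in Python, used only as a stand-in because its regexp syntax behaves the same way on this point. The names `ANY_CHAR`, `LITERAL_DOT` and `matches`, and the sample subjects, are made up.

```python
import re

ANY_CHAR = re.compile(r".")      # unescaped: any single character (except newline by default)
LITERAL_DOT = re.compile(r"\.")  # escaped: only an actual "."


def matches(pattern: "re.Pattern", subject: str) -> bool:
    """Return True if the pattern is found anywhere in the subject."""
    return pattern.search(subject) is not None


if __name__ == "__main__":
    for subject in ["Hello there", "report.doc"]:
        print(f"{subject!r}: wildcard={matches(ANY_CHAR, subject)}, "
              f"literal={matches(LITERAL_DOT, subject)}")
```

The unescaped pattern matches any non-empty subject, which would explain a filter flagging nearly everything, while the escaped one still catches a filename-style subject with a single dot.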
This just occurred to me. At one time I used Gnus (running on Emacs) as a newsreader. This wonderful program has very similar regexp syntax, and the spam filters we came up with were very effective.
|
|
|
 |
Neep_heid
Sergeant

 Joined: Mar 31, 2003 Posts: 101 Location: Scotland
|
Posted: Wed May 28, 2003 9:38 pm Post subject: |
|
|
TimeGhost wrote:
Neep_heid:
Remember that dot is a wildcard that matches any character. If you want to match an actual dot, precede it with the escape character this way: \.

Oops!
Seriously, the RE: filter and the dollar one are reasonably good but, as you say, I should take the trouble to get my head around Gary's filters.
Thanks for the advice,
Gregor
|
|
|
 |
|
|