IP: 213.84.*.*
Guest
|
Posted: Wed Aug 13, 2003 10:13 pm Post subject: How do you filter for stuff like this? |
|
|
This must have been asked many times before, but I can't seem to find a satisfactory answer:
How does one filter to remove disguised words in the e-mail header like these:
ama<jerjtkelflgh>zing pr<rjrbryyi>ofit margin
I've concocted the following, but it seems rather clumsy.
(am.{2,16}ng|pr.{2,16}t margin).{1,10}(am.{2,16}ng|pr.{2,16}t margin)
Is there a better way of tackling this stuff?
TIA! 
|
|
| Back to top |
|
 |
TonyKlein
Site Moderator Microsoft MVP
 Joined: Oct 15, 2002 Posts: 13120 Location: Netherlands
|
Posted: Wed Aug 13, 2003 10:19 pm Post subject: |
|
|
Sorry, that was me; I forgot to log in...
Tony
|
|
|
 |
TimeGhost
Major

 Joined: Apr 11, 2003 Posts: 750 Location: USA
|
Posted: Thu Aug 14, 2003 3:15 pm Post subject: |
|
|
This has come up before, and the solution I came up with would work in Perl but not in MWP, whose RegExp syntax is merely a subset of Perl's. Even so, the Perl solution was unwieldy because it required a RegExp with a list of all valid HTML tags.
One thing that might help is another type of filter that looks for the occurrence of five or more consonants in a row, which Gary suggested a few weeks ago. The RegExp rule looks something like this (untested):
[bcdfghjklmnpqrstvwxz]{5,}
This is also an effective rule to apply to domain names and subjects.
Unfortunately, most invalid tags tend to be only a few characters long.
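For anyone who wants to try the consonant rule outside MWP, here is a minimal Python sketch. It assumes the match should be case-insensitive (the post does not say), and the helper name `has_consonant_run` is made up for illustration.

```python
import re

# Five or more consonants in a row. 'y' is not in the class, as in the rule above.
# Case-insensitive matching is an assumption, not something the rule specifies.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}", re.IGNORECASE)

def has_consonant_run(text):
    """Return True if the text contains a run of 5+ consonants."""
    return CONSONANT_RUN.search(text) is not None

# Long random tags are caught ("lflgh" and "rjrbr" are 5-consonant runs).
print(has_consonant_run("ama<jerjtkelflgh>zing pr<rjrbryyi>ofit margin"))
# The clean phrase is not flagged.
print(has_consonant_run("amazing profit margin"))
# A short inserted tag slips through, which is the weakness noted above.
print(has_consonant_run("vi<b>agra"))
# Some ordinary words do contain such runs ("ngths"), so expect false positives.
print(has_consonant_run("strengths"))
```

The last sample suggests the rule is safer on domain names and subjects than on long body text, where ordinary English words can trip it.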
Please read the posts in this thread for more ideas.
HTH
|
|
|
 |
Jerry1970
Sergeant

 Joined: Jun 09, 2003 Posts: 140 Location: Netherlands
|
Posted: Fri Aug 15, 2003 7:48 am Post subject: |
|
|
What about a function like PHP's strip_tags? It strips out all HTML tags (everything between < and >) and leaves the rest as plain text. Any filter would then be applied after stripping the tags.
Jerry
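To illustrate the strip-then-filter idea, here is a rough Python sketch. The regex-based `strip_tags` below is only a crude stand-in for PHP's function (it removes anything between < and >, nothing more), and `contains_keyword` is a hypothetical helper, not a MailWasher feature.

```python
import re

# Anything from '<' up to the next '>' is treated as a tag.
# This is a simplification; it is not a real HTML parser.
TAG = re.compile(r"<[^>]*>")

def strip_tags(text):
    """Remove everything between < and >, leaving the plain text."""
    return TAG.sub("", text)

def contains_keyword(text, keyword):
    """Strip tags first, then do a plain case-insensitive keyword search."""
    return keyword.lower() in strip_tags(text).lower()

print(strip_tags("ama<jerjtkelflgh>zing pr<rjrbryyi>ofit margin"))
print(contains_keyword("v<blah>iagra", "viagra"))
print(contains_keyword("vi<blah>agra", "viagra"))
```

Once the tags are gone, a single plain keyword is enough, whatever position the tag was inserted at.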
|
|
|
 |
AlphaCentauri
Guest IP: 168.103.*.*
|
Posted: Fri Aug 15, 2003 9:46 pm Post subject: key words broken up by html tags |
|
|
In the ones I was getting, all the crazy tags in the middle of words seemed to begin with an exclamation point. When I filter for "<!" it catches them.
|
|
|
 |
Jerry1970
Sergeant

 Joined: Jun 09, 2003 Posts: 140 Location: Netherlands
|
Posted: Sat Aug 16, 2003 6:13 am Post subject: |
|
|
But then you have to search for "<!", which is the beginning of a remark (comment) tag. That can occur in any HTML message. I want to filter out "viagra", which is hard when people write "v<blah>iagra" and "vi<blah>agra".
Jerry
|
|
|
 |
Jerry1970
Sergeant

 Joined: Jun 09, 2003 Posts: 140 Location: Netherlands
|
Posted: Sat Aug 16, 2003 7:09 am Post subject: |
|
|
Right, forgot the forum takes html tags...
I meant this: I want to filter out "viagra", which is hard when people write "v<blah>iagra" and "vi<blah>agra" and every other combination.
Jerry
|
|
|
 |
TonyKlein
Site Moderator Microsoft MVP
 Joined: Oct 15, 2002 Posts: 13120 Location: Netherlands
|
Posted: Sat Aug 16, 2003 5:15 pm Post subject: |
|
|
| Jerry1970 wrote: | Right, forgot the forum takes html tags...
I meant this: I want to filter out "viagra", which is hard when people write "v<blah>iagra" and "vi<blah>agra" and every other combination.
Jerry |
Jerry,
I have the following, which is once again not too elegant, but turns out to work well, and should be pretty bug-free:
Vi.{0,4}gr.{1,4}|V.{0,3}agr.{1,4}
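A quick way to see what this expression does and does not catch is to run it over a few samples. The Python sketch below assumes case-insensitive matching (an assumption; the post does not say how MWP treats case), and the sample strings are invented.

```python
import re

# The expression from the post, unchanged. IGNORECASE is an assumption.
PATTERN = re.compile(r"Vi.{0,4}gr.{1,4}|V.{0,3}agr.{1,4}", re.IGNORECASE)

def matches(text):
    """Return True if the expression matches anywhere in the text."""
    return PATTERN.search(text) is not None

# Plain and lightly disguised spellings are caught.
print(matches("Viagra"), matches("Vi@gra"), matches("V1agra"), matches("Vi<b>agra"))
# Insertions longer than the {0,4} / {0,3} windows are not.
print(matches("vi<blah>agra"), matches("v<blah>iagra"))
# Unrelated text can match too, e.g. "vi" + "sa " + "gr" + "ant".
print(matches("visa grant"))
```

Widening the windows would catch longer tags but would also raise the number of accidental matches, so the limits are a trade-off.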
|
|
|
 |
rusticdog
Site Moderator Premium Member
 Joined: Aug 12, 2002 Posts: 5850 Location: New_Zealand
|
Posted: Tue Aug 19, 2003 3:36 am Post subject: |
|
|
Gary's HTML Tags filter would also help, as then you don't need to come up with an expression for every different keyword you want to filter for.
So if the Body matches the RegEx
((<![\w\s,\.\-]+>)([\w\s,\.\-]){1,10}){3,}
then Delete/Bounce... whatever.
Ask Gary, though, if you need an example of how this filter works; try as I might, I get lost.
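As a rough reading of the Body RegEx quoted above: it looks for a tag starting with "<!" and containing only letters, digits, whitespace, commas, dots, or hyphens, followed by 1 to 10 such characters of text, and it requires that pair three or more times in a row. A small Python sketch (sample strings invented, helper name hypothetical) shows the effect:

```python
import re

# The Body RegEx from the post, unchanged.
BANG_TAGS = re.compile(r"((<![\w\s,\.\-]+>)([\w\s,\.\-]){1,10}){3,}")

def has_bang_tag_run(body):
    """True if three or more '<!...>' tags occur, each followed by a little text."""
    return BANG_TAGS.search(body) is not None

# Three '<!...>' tags splitting one word: caught.
print(has_bang_tag_run("v<!abc>ia<!def>gr<!ghi>a"))
# A single ordinary HTML comment: not caught, since three repeats are required.
print(has_bang_tag_run("<!-- one comment -->Hello there"))
# Tags without the '!' are outside what this rule looks for.
print(has_bang_tag_run("ama<jerjtkelflgh>zing"))
```

The repeat count is what keeps a lone legitimate comment from triggering the rule.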
|
|
|
 |
gary
Lieutenant
 Premium Member
 Joined: Dec 22, 2002 Posts: 260 Location: Dallas/Ft. Worth, USA
|
Posted: Wed Aug 20, 2003 12:50 pm Post subject: |
|
|
You might want to try the whole "HTML Spam Tricks" filter. Here's the latest one I've been working with:
| Code: | [enabled],"[2] HTML Spam Tricks (B)","[2] HTML Spam Tricks (B)",16711680,OR,Blacklist,Delete,Body,containsRE,"font size=""?0""?",Body,containsRE,"((<![\w\s,\.\-]+>)+([\w\s,\.\-]){1,20}){3}",Body,containsRE,"(</\w>)[\w\s,\.\-]{1,20}(\1([\w\s,\.\-]){1,20}){2}",Body,containsRE,"(<\w{5,}>[\w\s,\.\-]{1,20}){5}",Body,containsRE,<[\w\s]{50}
|
Be careful about being too specific and including portions of words in your filters. These tags are often inserted at random, which will make your filter useless except on that specific message. You also can't be too lax and filter for any tag, or you will catch legitimate HTML tags and comments. I opted for a sort of probability check, where there must be a certain number of occurrences of something. However, other folks in this forum have some great ideas, too! It's not an easy thing to catch.
Gary
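To experiment with these rules outside MailWasher, the five expressions can be tried in Python. The sketch below copies the patterns from the filter line above (with the doubled quotes unescaped), combines them with OR as the filter does, and assumes case-insensitive matching. The rule names, helper functions, and sample bodies are made up for illustration.

```python
import re

# Patterns copied from the filter definition; names are illustrative only.
RULES = {
    "zero-size font": r'font size="?0"?',
    "repeated <!...> tags": r"((<![\w\s,\.\-]+>)+([\w\s,\.\-]){1,20}){3}",
    "same closing tag three times": r"(</\w>)[\w\s,\.\-]{1,20}(\1([\w\s,\.\-]){1,20}){2}",
    "five tags of 5+ characters": r"(<\w{5,}>[\w\s,\.\-]{1,20}){5}",
    "overlong tag": r"<[\w\s]{50}",
}
# IGNORECASE is an assumption about how the filter treats case.
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in RULES.items()}

def matching_rules(body):
    """Return the names of all rules that match the body."""
    return [name for name, rx in COMPILED.items() if rx.search(body)]

def is_spam(body):
    """OR semantics: any single matching rule is enough."""
    return bool(matching_rules(body))

print(matching_rules('<font size="0">hidden</font>'))
print(matching_rules("pr</b>of</b>it</b> margin"))
print(matching_rules("a<xqzwvk>m<plokij>a<qwerty>z<asdfgh>i<zxcvbn>ng"))
print(matching_rules("<p>Hello <b>world</b>, see you at the meeting.</p>"))
```

Each rule requires several occurrences, or one very unusual tag, before it fires, which is the "probability check" idea: a single stray tag or comment in a legitimate message is not enough.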
|
|
|
 |
TimeGhost
Major

 Joined: Apr 11, 2003 Posts: 750 Location: USA
|
Posted: Wed Aug 20, 2003 2:04 pm Post subject: |
|
|
If Firetrust were to implement a scoring system, that might eliminate the need for your probability check.
|
|
|
 |
gary
Lieutenant
 Premium Member
 Joined: Dec 22, 2002 Posts: 260 Location: Dallas/Ft. Worth, USA
|
Posted: Wed Aug 20, 2003 8:24 pm Post subject: |
|
|
Actually, that will be coming in the future in the form of Bayesian filtering!
Gary
|
|