CastleCops, Internet Crime Fighters
How do you filter for stuff like this?

 
Forum: Mailwasher - Troubleshooting / General
Guest

Posted: Wed Aug 13, 2003 10:13 pm    Post subject: How do you filter for stuff like this?

This must have been asked many times before, but I can't seem to find a satisfactory answer:

How does one filter to remove disguised words in the e-mail header like these:

ama<jerjtkelflgh>zing pr<rjrbryyi>ofit margin

I've concocted the following, but it seems rather clumsy.

(am.{2,16}ng|pr.{2,16}t margin).{1,10}(am.{2,16}ng|pr.{2,16}t margin)

Is there a better way of tackling this stuff?

TIA! :)

TonyKlein

Site Moderator
Microsoft MVP

Joined: Oct 15, 2002
Posts: 13113
Location: Netherlands
MIRT Moderators MVP Premium Security Experts

Posted: Wed Aug 13, 2003 10:19 pm

Sorry, that was me; I forgot to log in...


TonyKlein

Posted: Wed Aug 13, 2003 10:34 pm

I said "header" but I believe I meant "body" (or did I...)


TimeGhost

Captain
Joined: Apr 11, 2003
Posts: 747
Location: USA
Team F@H

Posted: Thu Aug 14, 2003 3:15 pm

This has come up before, and the solution I came up with would work in Perl but not in MWP, whose RegExp syntax is merely a subset of Perl's. Even so, the Perl solution was unwieldy because it required a RegExp with a list of all valid HTML tags.

One thing that might help is another type of filter that looks for the occurrence of five or more consonants in a row, which Gary suggested a few weeks ago. The RegExp rule looks something like this (untested):
[bcdfghjklmnpqrstvwxz]{5,}

This is also an effective rule to apply to domain names and subjects.

Unfortunately, most invalid tags tend to be only a few characters long.

Please read the posts in this thread for more ideas.

HTH
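
[Editor's note] A minimal Python sketch of the consonant-run rule above, for trying it outside MailWasher. The function name find_consonant_runs and the case-insensitive flag are assumptions for illustration, and Python's re module may not behave exactly like MWP's RegExp engine. Note that the character class as posted leaves out "y".

```python
import re

# The rule as posted: five or more consonants in a row ("y" is not in the class).
# re.IGNORECASE is an assumption; the thread does not say how MWP treats case.
CONSONANT_RUN = re.compile(r'[bcdfghjklmnpqrstvwxz]{5,}', re.IGNORECASE)

def find_consonant_runs(text):
    """Return every run of five or more consecutive consonants found in text."""
    return CONSONANT_RUN.findall(text)

# The disguised sample from the first post: the junk inside the fake tags trips the rule.
print(find_consonant_runs("ama<jerjtkelflgh>zing pr<rjrbryyi>ofit margin"))

# The same phrase without the junk does not trip it.
print(find_consonant_runs("amazing profit margin"))

# Ordinary English can trip it too, so this is a hint rather than proof of spam.
print(find_consonant_runs("lengths"))
```

Because real words such as "lengths" contain five consonants in a row, a rule like this is probably safer combined with other checks than used alone.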

Jerry1970

Sergeant
Joined: Jun 09, 2003
Posts: 140
Location: Netherlands

Posted: Fri Aug 15, 2003 7:48 am

What about a function like PHP's strip_tags? This will strip out all HTML tags (everything between < and >) and leave the rest as plain text. Any filter would then be applied after stripping the tags.

Jerry
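
[Editor's note] A rough Python sketch of the strip-then-filter idea above. It is not PHP's strip_tags and not a MailWasher feature; the helper names strip_tags and contains_keyword are hypothetical, and the tag pattern simply removes everything between "<" and the next ">".

```python
import re

# Anything from "<" up to the next ">" is treated as a tag and removed.
ANY_TAG = re.compile(r'<[^>]*>')

def strip_tags(body):
    """Remove everything between < and >, leaving the remaining text."""
    return ANY_TAG.sub('', body)

def contains_keyword(body, keywords):
    """Check for plain keywords after the tags have been stripped."""
    text = strip_tags(body).lower()
    return any(keyword in text for keyword in keywords)

print(strip_tags("ama<jerjtkelflgh>zing pr<rjrbryyi>ofit margin"))
print(contains_keyword("v<blah>iagra", ["viagra"]))
print(contains_keyword("vi<blah>agra", ["viagra"]))
```

With the tags gone, a single plain keyword covers every position where a fake tag could have been inserted, which is the appeal of this approach over one RegExp per keyword.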

AlphaCentauri

Guest

Posted: Fri Aug 15, 2003 9:46 pm    Post subject: key words broken up by html tags

In the ones I was getting, it seemed all the crazy tags in the middle of words began with an exclamation point. When I filter for "<!" it catches them.

Jerry1970

Posted: Sat Aug 16, 2003 6:13 am

But then you have to search for "<!", which is the beginning of a comment tag and can appear in any HTML message. I want to filter out "viagra", which is hard when people write "v<blah>iagra" and "vi<blah>agra".

Jerry

Jerry1970

Posted: Sat Aug 16, 2003 7:09 am

Right, forgot the forum takes html tags... ;)
I meant this: I want to filter out "viagra", which is hard when people write "v<blah>iagra" and "vi<blah>agra" and every other combination.

Jerry

TonyKlein

Posted: Sat Aug 16, 2003 5:14 pm

Thanks for the suggestions, TimeGhost.

Much appreciated! :)


TonyKlein

Posted: Sat Aug 16, 2003 5:15 pm

Jerry1970 wrote:
Right, forgot the forum takes html tags... ;)
I meant this: I want to filter out "viagra", which is hard when people write "v<blah>iagra" and "vi<blah>agra" and every other combination.

Jerry


Jerry,

I have the following, which is once again not too elegant, but turns out to work well, and should be pretty bug-free:

Vi.{0,4}gr.{1,4}|V.{0,3}agr.{1,4}
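
[Editor's note] A small Python sketch for trying the expression above against a few spellings. Case-insensitive matching is an assumption here (the thread does not say how MWP handles case), and the helper name looks_like_viagra is hypothetical.

```python
import re

# The expression as posted; re.IGNORECASE is an assumption.
VIAGRA = re.compile(r'Vi.{0,4}gr.{1,4}|V.{0,3}agr.{1,4}', re.IGNORECASE)

def looks_like_viagra(text):
    """True if the posted expression matches anywhere in text."""
    return VIAGRA.search(text) is not None

for sample in ["Viagra", "V1agra", "Vi@gra", "vi<b>agra", "vi<blah>agra"]:
    print(sample, looks_like_viagra(sample))
```

The bounded gaps (.{0,4} and .{0,3}) allow a few inserted or swapped characters, so a three-character tag such as "<b>" still fits, but a six-character tag such as "<blah>" is too long for either alternative.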

rusticdog

Site Moderator
Premium Member

Joined: Aug 12, 2002
Posts: 5844
Location: New_Zealand
Blue Security Firetrust Moderators Premium

Posted: Tue Aug 19, 2003 3:36 am

Gary's HTML Tags filter would also help, since then you don't need to come up with an expression for every different keyword you want to filter for.

So if the Body matches the RegEx
((<![\w\s,\.\-]+>)([\w\s,\.\-]){1,10}){3,}
then Delete/Bounce... whatever.

Ask Gary, though, if you need an example of how this filter works; try as I might, I get lost.
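
[Editor's note] One way to read the expression above, written as a commented Python sketch. This is an interpretation of the pattern as posted, not Gary's own explanation, and the two sample strings are made up for illustration.

```python
import re

# The same expression, spread out with comments (re.VERBOSE ignores the layout).
TAG_TRICK = re.compile(r"""
    (                        # one 'tag + text' unit:
      (<![\w\s,\.\-]+>)      #   a tag that starts with '<!' and holds only
                             #   word characters, spaces, commas, dots, hyphens
      ([\w\s,\.\-]){1,10}    #   then 1 to 10 ordinary characters of text
    ){3,}                    # the unit must occur at least 3 times back to back
""", re.VERBOSE)

# Made-up spam-like text: three junk '<!...>' tags splitting up short words.
spam_like = "Get<!abc>the<!def>best<!ghi>deal now"

# Made-up legitimate text: a single HTML comment followed by normal content.
legit = "<!-- header --> Hello there"

print(bool(TAG_TRICK.search(spam_like)))
print(bool(TAG_TRICK.search(legit)))
```

A lone "<!" tag is not enough to match; the repeat count at the end is what keeps an ordinary HTML comment from triggering the rule.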

gary

Lieutenant
Premium Member
Joined: Dec 22, 2002
Posts: 260
Location: Dallas/Ft. Worth, USA

Posted: Wed Aug 20, 2003 12:50 pm

You might want to try the whole "HTML Spam Tricks" filter. Here's the latest one I've been working with:

Code:
[enabled],"[2] HTML Spam Tricks (B)","[2] HTML Spam Tricks (B)",16711680,OR,Blacklist,Delete,Body,containsRE,"font size=""?0""?",Body,containsRE,"((<![\w\s,\.\-]+>)+([\w\s,\.\-]){1,20}){3}",Body,containsRE,"(</\w>)[\w\s,\.\-]{1,20}(\1([\w\s,\.\-]){1,20}){2}",Body,containsRE,"(<\w{5,}>[\w\s,\.\-]{1,20}){5}",Body,containsRE,<[\w\s]{50}


Be careful about being too specific and including portions of words in your filters: these tags are often inserted at random, so a filter like that will be useless except on that specific message. You also can't be too lax and filter for any tag, or you will catch legitimate HTML tags and comments. I opted for a sort of probability check, where there must be a certain number of occurrences of something. However, other folks in this forum have some great ideas, too! It's not an easy thing to catch.
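
[Editor's note] A hedged Python sketch of how the five Body expressions in the filter line above could be tried together. The reading that the rules are combined with OR comes from the "OR" field in the line; the rule labels, the function name matching_rules, and the case-insensitive flag are assumptions, and the doubled quotes in the first expression are taken to be quote escaping in the filter file.

```python
import re

# The five expressions from the filter line, with descriptive labels added here.
RULES = [
    ("zero-size font", r'font size="?0"?'),
    ("repeated <!...> tags", r'((<![\w\s,\.\-]+>)+([\w\s,\.\-]){1,20}){3}'),
    ("same one-letter closing tag three times",
     r'(</\w>)[\w\s,\.\-]{1,20}(\1([\w\s,\.\-]){1,20}){2}'),
    ("five tags with long names", r'(<\w{5,}>[\w\s,\.\-]{1,20}){5}'),
    ("very long tag", r'<[\w\s]{50}'),
]

# re.IGNORECASE is an assumption; the thread does not say how MWP treats case.
COMPILED = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in RULES]

def matching_rules(body):
    """Return the labels of every rule that matches the message body."""
    return [label for label, regex in COMPILED if regex.search(body)]

print(matching_rules('<font size="0">hidden</font>'))
print(matching_rules("Buy<!ab>now<!cd>and<!ef>save"))
print(matching_rules("a</b>bc</b>de</b>fg"))
print(matching_rules("Hello <b>world</b>, see you <!-- note --> soon."))
```

Under the OR reading, a message would be flagged as soon as the returned list is non-empty; the repeat counts inside each expression are the "probability check" described above.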


TimeGhost

Posted: Wed Aug 20, 2003 2:04 pm

If Firetrust were to implement a scoring system, that might eliminate the need for your probability check.

gary

Posted: Wed Aug 20, 2003 8:24 pm

Actually, that will be coming in the future in the form of Bayesian filtering! :)


All times are GMT