CastleCops, Internet Crime Fighters
Need help with regex filter.
Cowboy (Guest, IP: 213.112.*.*)
Posted: Sat Nov 22, 2003 11:49 am    Post subject: Need help with regex filter.

I need a filter that will weed out comments placed in the middle of a word.

It would delete:
Buy my delicious sp<!kfgkh8899>am.

But it would not delete:
Buy my delicious <!kfgkh8899>spam.

I can not write the filter for myself, so if someone could help me with this it would do a lot for my spam filtering.

Thanks!

stan_qaz (Premium Member)
Posted: Sat Nov 22, 2003 6:09 pm

Go to the search function, select the Firetrust category, and search on "html". You will find plenty of discussion of the topic and several suggestions for filters.


_________________
Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro

Cowboy (Guest, IP: 213.112.*.*)
Posted: Sat Nov 22, 2003 7:44 pm

There is no filter like the one I need. At least not that I can find.
The closest is the filter that counts the comments but I think that is not what I need.

stan_qaz (Premium Member)
Posted: Sat Nov 22, 2003 10:26 pm

That is as good as it is going to get. The problem isn't easy to solve, as you saw from the posts you looked at.

Are you willing to pay to have a filter written? Make an offer and see if someone is willing to tackle the problem for some cash.

If not, chip in on the threads asking for a processed-message option for the filters, which is the fix I like best.


Cowboy (Guest, IP: 213.112.*.*)
Posted: Sun Nov 23, 2003 12:59 am

Nonsense. It can not be as good as it gets until someone tries to write the filter. Nobody has tried yet!

denn988 (Guest, IP: 66.44.*.*)
Posted: Sun Nov 23, 2003 2:31 am

Cowboy wrote:
Nonsense. It can not be as good as it gets until someone tries to write the filter. Nobody has tried yet!


So....Why don't you try???
(?# finds words broken by html comments )[a-z](<[!/].*?>)[a-z]
You might find it to be easier than you thought possible...

Ikeb (Special Response Team, Forums Admin)
Posted: Sun Nov 23, 2003 6:12 am

Cowboy wrote:
Nonsense. It can not be as good as it gets until someone tries to write the filter. Nobody has tried yet!

Ride 'em cowboy!

Shoot first, ask questions later!

Cowboy (Guest, IP: 213.112.*.*)
Posted: Sun Nov 23, 2003 12:47 pm

I have tried. I've read the help files. I've tried to put together the parameters to make such a filter. I've sat for hours trying everything I can think of. Never once did I get it to work. So I decided I needed help. All I got was a bunch of comments about as useful as my filter attempts.

In the best of worlds I would have gotten a "That's a good idea to filter out just html comments that are used to disguise words, instead of trying to count the comments. Here's your filter.", or I would get a "That's a bad idea because it's impossible to make such a filter." Or at least not a thread bogged down by the assume patrol.

denn988 (Guest, IP: 66.44.*.*)
Posted: Sun Nov 23, 2003 1:29 pm

As long as you have already tried....

See if this will help:

The body....
contains Regular Expr...

Quote:

(?# words broken by html comments )[a-z]<[!/][^<]*?>[a-z]


Anyone who wishes can make any improvements to the filter as they see fit. This is just the simplest version that would seem to work.

If you decide you have to AUTO-DELETE based on this, don't blame me if you lose a few legitimate mails.
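The pattern above can be tried outside MailWasher against the two examples from the first post. This is only a minimal Python sketch: it assumes Python's `re` module treats the pattern the same way MailWasher's regex engine does (not verified here), it drops the `(?# ...)` comment group in favour of a Python comment, and the name `is_broken_word_spam` is made up for illustration.

```python
import re

# Pattern from the post above, without the (?# ...) comment group:
# a lowercase letter, a tag starting with "<!" or "</", then a lowercase letter.
BROKEN_WORD = re.compile(r"[a-z]<[!/][^<]*?>[a-z]")

def is_broken_word_spam(body: str) -> bool:
    """Return True if a comment or closing tag sits between two lowercase letters."""
    return BROKEN_WORD.search(body) is not None

print(is_broken_word_spam("Buy my delicious sp<!kfgkh8899>am."))  # True
print(is_broken_word_spam("Buy my delicious <!kfgkh8899>spam."))  # False
```

Python matches `[a-z]` case-sensitively by default, so "SP<!x>AM" would slip through this sketch unless `re.IGNORECASE` is passed to `re.compile`.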

denn988 (Guest, IP: 66.44.*.*)
Posted: Mon Nov 24, 2003 1:11 am

Cowboy,

I have had a day to see how the above filter works and it looks pretty good so far.

There are a couple of mods that I have made to it that have improved its trap rate.

Change the above RegExp to:

Quote:

(?# words broken by html comments )[a-z]<[^<]*?>[a-z]


I removed the '!/' from the filter, so it will trap any word that has the html brackets in between the letters.

Examples:

s<!tytyt>pam
sp<wretser>am
sp</font>am

are trapped

this </font>is a test

is NOT trapped.

This filter can still result in false positives, so don't auto-delete.
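The four examples listed in this post can be checked with a short Python sketch. It assumes Python's `re` module behaves like the filter's regex engine for this pattern, which is not confirmed in the thread. The names `TAG_INSIDE_WORD`, `samples`, and `results` are illustrative only.

```python
import re

# Revised pattern from the post above: any <...> run between two lowercase letters.
TAG_INSIDE_WORD = re.compile(r"[a-z]<[^<]*?>[a-z]")

samples = ["s<!tytyt>pam", "sp<wretser>am", "sp</font>am", "this </font>is a test"]

# Map each sample to whether the pattern finds a match anywhere in it.
results = {text: bool(TAG_INSIDE_WORD.search(text)) for text in samples}

for text, trapped in results.items():
    print(f"{text!r}: {'trapped' if trapped else 'not trapped'}")
```

The first three samples are reported as trapped and the last one is not, because the character before `</font>` in the last sample is a space instead of a letter.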

Ikeb (Special Response Team, Forums Admin)
Posted: Mon Nov 24, 2003 6:54 am

Denn988, thanks for another good one.

Do you think it's OK to have the filter fire on a single hit?

Also, instead of the [^<] negation, why not use [^>] since it's the closing ">" bracket that will follow this part of the match?

denn988 (Guest, IP: 66.44.*.*)
Posted: Mon Nov 24, 2003 1:05 pm

Ikeb wrote:
Denn988, thanks for another good one.

Do you think it's OK to have the filter fire on a single hit?

Also, instead of the [^<] negation, why not use [^>] since it's the closing ">" bracket that will follow this part of the match?


First...

I don't think it would be a good idea to write this type of filter to look for multiple hits. The reason is that it starts with a wildcard ([a-z]). If you were to write the filter to continue looking for more than one instance, it would require a lot of CPU time for each iteration, and with the 'a-z' at the beginning it would do so for each character in the message.

That would probably make the filter more time-intensive than you would consider acceptable.
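For anyone who wants to experiment with a multiple-hit rule outside MailWasher, here is a minimal Python sketch that counts matches and stops as soon as a threshold is reached. It says nothing about how MailWasher's own engine would perform. The function `count_hits`, its `limit` parameter, and the sample body are assumptions made for illustration.

```python
import re

TAG_INSIDE_WORD = re.compile(r"[a-z]<[^<]*?>[a-z]")

def count_hits(body: str, limit: int = 3) -> int:
    """Count non-overlapping matches, stopping early once `limit` is reached."""
    hits = 0
    for _ in TAG_INSIDE_WORD.finditer(body):
        hits += 1
        if hits >= limit:
            break  # no need to scan the rest of the message
    return hits

body = "Buy my del<!a>icious sp<!b>am and ch<!c>eap me<!d>ds"
print(count_hits(body))            # 3 (stopped at the limit)
print(count_hits(body, limit=10))  # 4
print(count_hits("plain text"))    # 0
```

Stopping at the limit keeps the scan from walking the whole message once the decision is already made.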


Second...

As to the '[^<]' in the Regex...

It is there to prevent the filter from trapping a situation where there are two opening brackets prior to a closing bracket.

Example:

10<20<30
30>20>10

The above is NOT HTML, but represents two valid mathematical expressions.

You don't want the filter to trap on something like that.
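The two negations can be compared side by side in Python, again assuming `re` behaves like the filter engine. One observation from this sketch: the numeric example above is not matched by either variant, because both require a lowercase letter next to each bracket. The second test string is a made-up sentence (not from the thread) in which a stray "<" appears before a real tag.

```python
import re

NOT_OPEN = re.compile(r"[a-z]<[^<]*?>[a-z]")   # variant posted in the thread
NOT_CLOSE = re.compile(r"[a-z]<[^>]*?>[a-z]")  # variant Ikeb asked about

def trapped(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

math_text = "10<20<30\n30>20>10"           # the thread's example
stray_bracket = "if a<b then <b>bold</b>"  # hypothetical: stray "<" before a real tag

for label, text in (("math", math_text), ("stray", stray_bracket)):
    print(label, trapped(NOT_OPEN, text), trapped(NOT_CLOSE, text))
# math False False
# stray False True
```

In the second string the `[^>]` variant runs from the stray "<" through to the ">" of the real tag, while the `[^<]` variant refuses to cross the second "<".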


Before you ask....

You could have another rule in the filter that looks for a "Content-Type: text/html"....but it would be something of a useless rule. There would be no easy way to write the filter so that it would only look at the message part that was HTML, in those cases that were multipart messages.

Anything you tried to do with regex to achieve that would be even more CPU-intensive than the 'multi-hit' filter would be.
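Outside a regex-only filter, restricting the check to the HTML part is easier. The sketch below uses Python's standard `email` package to walk a multipart message and apply the pattern only to `text/html` parts. It is not something MailWasher filters can do as described in this thread. The helper names and the sample message are invented for illustration.

```python
import re
from email import message_from_string

TAG_INSIDE_WORD = re.compile(r"[a-z]<[^<]*?>[a-z]")

def html_parts(raw_message: str):
    """Yield the decoded text of every text/html part in the message."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            yield payload.decode(charset, errors="replace")

def html_has_broken_word(raw_message: str) -> bool:
    """Return True if any HTML part contains a tag between two lowercase letters."""
    return any(TAG_INSIDE_WORD.search(text) for text in html_parts(raw_message))

# Hypothetical two-part message: the plain part holds a math-like expression,
# the HTML part holds the spam example from the first post.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "if a<b then c>d\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Buy my delicious sp<!kfgkh8899>am.</p>\n"
    "--XYZ--\n"
)
print(html_has_broken_word(RAW))
```

The plain-text part is never scanned, so an expression such as "a<b then c>d" there cannot cause a false positive.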

Ikeb (Special Response Team, Forums Admin)
Posted: Mon Nov 24, 2003 2:45 pm

denn988 wrote:
I don't think it would be a good idea to write this type of filter to look for multiple hits. The reason is that it starts with a wildcard ([a-z]). If you were to write the filter to continue looking for more than one instance, it would require a lot of CPU time for each iteration, and with the 'a-z' at the beginning it would do so for each character in the message.

That would probably make the filter more time-intensive than you would consider acceptable.


Second...

As to the '[^<]' in the Regex...

It is there to prevent the filter from trapping a situation where there are two opening brackets prior to a closing bracket.

Example:

10<20<30
30>20>10

The above is NOT HTML, but represents two valid mathematical expressions.

You don't want the filter to trap on something like that.

OK thanks for the clarification.

denn988 wrote:
Before you ask....

You could have another rule in the filter that looks for a "Content-Type: text/html"....but it would be something of a useless rule. There would be no easy way to write the filter so that it would only look at the message part that was HTML, in those cases that were multipart messages.

Anything you tried to do with regex to achieve that would be even more CPU-intensive than the 'multi-hit' filter would be.

You give me too much credit! I hadn't thought of attempting to check the html parts only. Besides I think the math expressions you gave as examples could also occur with HTML messages.

denn988 (Guest, IP: 66.44.*.*)
Posted: Mon Nov 24, 2003 3:13 pm

Quote:
You give me too much credit! I hadn't thought of attempting to check the html parts only. Besides I think the math expressions you gave as examples could also occur with HTML messages.


Those examples would look totally different if they appeared in an HTML part than they would in a Plain Text part.

Those examples, if sent as HTML, would appear in the raw text as:

10<20<30

and

30>20>10

The brackets must be substituted when converting them to the HTML raw text in order to keep the translator from being confused.

Guest (IP: 66.44.*.*)
Posted: Mon Nov 24, 2003 3:22 pm

Sorry,

I forgot to turn the HTML off when I posted.

Those examples, if sent as HTML, would appear in the raw text as:

1 0 & l t ; 2 0 & l t ; 3 0

and

3 0 & g t ; 2 0 & g t ; 1 0

I had to place spaces between each character above to get them to post.

The brackets must be substituted when converting them to the HTML raw text in order to keep the translator from being confused.
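The substitution described here is standard HTML entity escaping. As a small illustration in Python, `html.escape` from the standard library produces the unspaced form of the two strings above. The wrapper name `to_raw_html` is made up for this sketch.

```python
import html

def to_raw_html(text: str) -> str:
    """Escape &, < and > (and quotes) so the text is safe inside HTML source."""
    return html.escape(text)

for expr in ("10<20<30", "30>20>10"):
    print(expr, "->", to_raw_html(expr))
# 10<20<30 -> 10&lt;20&lt;30
# 30>20>10 -> 30&gt;20&gt;10
```

Because the escaped text no longer contains literal angle brackets, the bracket-based filter pattern has nothing to match in it.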
