CastleCops, Internet Crime Fighters

Need help with regex filter.
Forum: Mailwasher - Troubleshooting / General
Cowboy (Guest)
Posted: Thu Nov 27, 2003 1:30 am

I'm stupid, I don't get it.
What is the problem with Ikeb's last filter?

denn988 (Guest)
Posted: Thu Nov 27, 2003 2:40 am

Ikeb,

Have you noticed something in 'TRegExpr' that might help you in another way... provided that the software calling the RegExp engine in MWP were written to use it?

I am talking about the character position information that is returned with each match.

A program written to use that information properly would be able to pinpoint the actual positions where a RegExp matches. Through follow-up matches by position, the kind of thing you are looking for could be done easily.
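To make the idea concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for TRegExpr (TRegExpr is a Delphi library with its own interface, which is not shown here). Instead of a single True/False, each match reports where it starts, where it ends, and which characters it matched. The sample text is made up for illustration.

```python
import re

# Python's re module stands in for TRegExpr here (an assumption for illustration).
pattern = re.compile(r"[a-z]<[^<]*?>[a-z]")
text = "Buy vi<font>agra now, or write to us<br>today"

# Keep (start, end, matched text) for every match instead of one True/False.
matches = [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
for start, end, phrase in matches:
    print(f"{phrase!r} at positions {start}-{end}")
```

Note that the harmless `<br>` is reported too; deciding what to do with each position/phrase pair is the follow-up work the calling program would take on.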

MWP would have to rewrite their filter routine to take advantage of this... but can you imagine the kind of filtering power such a rewrite could give the user?

Think about it.....

denn988 (Guest)
Posted: Thu Nov 27, 2003 3:19 am

Cowboy wrote:
I'm stupid, I don't get it.
What is the problem with Ikeb's last filter?


Cowboy,

My mother's last (and final) husband could not use a screwdriver to save his life. That did not mean he was stupid, it just meant he was not mechanically inclined. Do you know what his job was before he retired?

He was one of the top analysts of 'Soviet Logistics' in one of our Intelligence agencies. Stupid??? Not by a long shot!!!!

You are not 'stupid' either. You just don't have enough of an understanding of Regular Expressions (as they are used in MWP) to realize what Ikeb has realized.

The perfect filter for detecting HTML camouflage attempts cannot be created with RegExps unless MWP changes the strategy it uses for building RegExp filters.

It would require a major rewrite of the code that uses Regular Expressions in order to do it.


Ikeb,

If I have misstated your realization in this matter... I apologize.


By the way... not only does 'TRegExpr' return the character position range when it finds a match, it also returns the characters that matched the RegExp. Imagine the advantage that would give you in writing filters if you had, for each matching occurrence, the positions and the matching phrase, instead of just a simple True or False to work with.

You could create some extremely powerful filters with that!!!

Ikeb (Forums Admin)
Posted: Thu Nov 27, 2003 7:12 am

denn988 wrote:
If I have misstated your realization in this matter... I apologize.

No need! That's exactly what I realized. TimeGhost reminded me of previous discussions about solutions to the very problem Cowboy presented, which at the time went right over my head.

denn988 wrote:
By the way... not only does 'TRegExpr' return the character position range when it finds a match, it also returns the characters that matched the RegExp. Imagine the advantage that would give you in writing filters if you had, for each matching occurrence, the positions and the matching phrase, instead of just a simple True or False to work with.

You could create some extremely powerful filters with that!!!

You mean like a multi-level filter? That would be kewl!

By the way, the topics TimeGhost referenced reminded me about negative lookahead, a Regular Expression means of checking ahead for the absence of a set of characters before proceeding with the match -- sort of a mini two-level match scheme (at least that's how I interpret TimeGhost's explanation). I can't see where TRegExpr supports that, and Gary confirmed some time ago that MWP doesn't either. Perhaps FireTrust should be looking for a more capable Regular Expression search engine?
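For comparison, a minimal sketch of what negative lookahead looks like in an engine that supports it. Python's `re` module is assumed here as that "more capable" engine; neither TRegExpr nor MWP is claimed to accept this syntax. The `(?!...)` group checks that one of the allowed tags does not come next, without consuming any characters. The allowed-tag list is illustrative only.

```python
import re

# Assumption: Python's re (which supports lookahead) stands in for a more capable
# engine. (?!...) succeeds only if an allowed tag does NOT start right after '<'.
ALLOWED = r"/?(?:br|blockquote|b|i|u)>"
BROKEN_WORD = re.compile(r"[a-z]<(?!" + ALLOWED + r")[^<>]*>[a-z]")

for text in ["word<br>another", "go</b>od", "vi<font>agra"]:
    print(text, bool(BROKEN_WORD.search(text)))
```

Because each allowed name must be followed directly by `>`, a tag such as `<bold>` is still flagged, which is what the character-class workarounds in this thread can only approximate.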

As an aside, TRegExpr supports -- and thus presumably so does MWP -- something called backreference metacharacters which I haven't sorted out and which could be used to advantage in other situations.
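A backreference such as `\1` re-matches whatever a capturing group has already matched. A small illustration in Python's `re` (assumed here; TRegExpr's exact backreference syntax is not verified in this thread): the closing tag must repeat the opening tag's name.

```python
import re

# \1 must repeat exactly what group 1 captured, so the closing tag has to
# carry the same name as the opening tag.
PAIRED_TAG = re.compile(r"<(b|i|u)>[^<]*</\1>")

print(bool(PAIRED_TAG.search("a <b>bold</b> word")))  # True: </b> closes <b>
print(bool(PAIRED_TAG.search("a <b>bold</i> word")))  # False: </i> does not
```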

Ikeb (Forums Admin)
Posted: Thu Nov 27, 2003 8:00 am

Cowboy wrote:
I'm stupid, I don't get it.
What is the problem with Ikeb's last filter?

The problem is that I'm stupid. Stupid in the sense that I know HTML fairly well and still overlooked this. Sure, the filter I came up with comes a lot closer, in that it prevents any false positive caused by a 'word<br>another_word' sequence, but I neglected to consider other valid HTML tag sequences that could just as easily cause a false positive. TimeGhost pointed out a few common tags. For example, if someone sent you an email quoting some other document, it could easily be coded as:
"The document states<blockquote>This is the best way of doing that.</blockquote>How was I to know?"

The filter I came up with would fire at both tags.

There are actually lots of HTML tags that could be used this way. Covering them all would require a very long regex, which would unavoidably leave lots of invalid tags undetected. In summary, a zero-false-positive regex would be extremely difficult to write and would still allow false negatives.

That said, I did work out a better regex that allows the popular <br> (actually allows <br*>), <blockquote> (actually allows <bl*>), <big> (actually allows <bi*>; I threw that in), <b>, <i>, and <u> opening tags, as well as the matching closing tags (</b>, etc.), as follows:
Code:
(?# word broken by invalid html tag)[a-z]</??([^b<>/][^>]+?|b[^ilr<>][^>]*?|[^bui])>[a-z]

Starting with the < character match, this filter accepts a '/' tag-close character if one is there, then matches any of the following, in the order listed:
  1. Anything but b, <, >, or / as the first character, followed by one or more characters before the closing > (i.e. any multi-character tag not starting with b), or
  2. b as the first character, followed by any character except i, l, r, <, or >, then any other characters before the closing > (bx*), or
  3. Any single character other than b, u, or i before the closing >.
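One way to check the behaviour described above is to run the filter against a few sample strings. This sketch uses Python's `re` module, which accepts the `(?#...)` comment and the lazy `??` quantifier, as an assumed stand-in for MWP's engine. The sample strings and expected results are illustrative, not taken from MWP, and matching here is case-sensitive.

```python
import re

# Ikeb's filter, copied verbatim; Python's re is an assumed stand-in for MWP's engine.
FILTER = re.compile(
    r"(?# word broken by invalid html tag)[a-z]</??([^b<>/][^>]+?|b[^ilr<>][^>]*?|[^bui])>[a-z]"
)

# Sample text -> whether the filter should fire on it.
samples = {
    "word<br>another": False,         # <br> allowed
    "states<blockquote>this": False,  # <bl*> allowed
    "that</blockquote>how": False,    # closing tag allowed too
    "go<b>od": False,                 # <b> allowed
    "vi<font>agra": True,             # alternative 1
    "free<bz>ee": True,               # alternative 2
    "bu<x>y": True,                   # alternative 3
}

for text, expected in samples.items():
    fired = FILTER.search(text) is not None
    print(f"{text!r:28} fired={fired} expected={expected}")
```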


That's as far as I'm going to take this, but with a list of all valid HTML tags one could extend this filter to allow many more tags... at the expense of letting more invalid character sequences through undetected.

denn988 (Guest)
Posted: Thu Nov 27, 2003 2:20 pm

Ikeb,

I am glad that I didn't misstate what you realized...

As far as a multi-level filter goes, the lack of 'negative lookahead' support in MWP's RegExp engine could be remedied by a filter that does a second comparison on any phrase matching the first-level RegExp.

For instance....

If the first-stage RegExp looked for:

Code:
[a-z]<[^<]*?>[a-z]


you could have a follow-on RegExp (as a sub-rule of the top-level rule) that looked for the following RegExp within any phrase matched by the top-level rule:

Code:
</?(br|blockquote)>


You could place all the 'legal' HTML in that sub-rule.

If the second-level RegExp is found within any of the first-level matches, the sub-rule could be designed to treat it as either a positive or a negative.

In the above case, you would want to treat it as a negative... in other words, if the second-level expression is found, the filter will not trigger on that first-level match.

The filter would then continue looking for any further matches of the top-level rule.
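The two-stage scheme can be sketched in Python as follows. This is a hypothetical illustration, assuming Python's `re` module in place of MWP's engine and a made-up function name `two_stage_fires`; it is not an existing MWP feature. After each excused match the scan resumes one character past the match start, so a letter shared by two adjacent tags (as in `a<br>b<font>c`) is not skipped.

```python
import re

# Hypothetical sketch: Python's re stands in for MWP's engine.
TOP_LEVEL = re.compile(r"[a-z]<[^<]*?>[a-z]")        # first-level RegExp
LEGAL_SUBRULE = re.compile(r"</?(br|blockquote)>")   # 'legal' HTML goes here

def two_stage_fires(text: str) -> bool:
    """True if some first-level match is not excused by the sub-rule."""
    pos = 0
    while (m := TOP_LEVEL.search(text, pos)) is not None:
        if LEGAL_SUBRULE.search(m.group(0)) is None:
            return True  # no legal HTML in this match: the filter triggers
        pos = m.start() + 1  # legal HTML found: treat as a negative, keep scanning
    return False  # whole text searched without an unexcused match

print(two_stage_fires("word<br>another"), two_stage_fires("a<br>b<font>c"))
```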

Would something like this be of any help to you?

By the way... Happy Thanksgiving, all!

Ikeb (Forums Admin)
Posted: Thu Nov 27, 2003 4:21 pm

denn988 wrote:
Would something like this be of any help to you?

Duh! Incidentally, the way TimeGhost explained it, that is exactly what negative lookahead would do. Perhaps it's the same thing with a different name?

denn988 wrote:
By the way... Happy Thanksgiving, all!

Well, I had my turkey about a month ago, but that doesn't mean I won't be watching the NFL games this afternoon!

Happy Thanksgiving to all you Yanks!

denn988 (Guest)
Posted: Fri Nov 28, 2003 2:40 am

Ikeb wrote:
denn988 wrote:
Would something like this be of any help to you?

Duh! Incidentally, the way TimeGhost explained it, that is exactly what negative lookahead would do. Perhaps it's the same thing with a different name?

denn988 wrote:
By the way... Happy Thanksgiving, all!

Well, I had my turkey about a month ago, but that doesn't mean I won't be watching the NFL games this afternoon!

Happy Thanksgiving to all you Yanks!


Ikeb,


Perhaps we had better get over to the 'suggestion' board and start a thread requesting "Sub-Rules" for filters. If 'Negative Lookahead' is not built into the Regex engine (and it is not), there is still a way to get it through a small rewrite of the filters in MWP.

That would come in very handy for a number of filter strategies.

stan_qaz (Premium Member)
Posted: Fri Nov 28, 2003 2:59 am

You might want to contact the folks who created the regex code that MailWasher uses (if I understood that correctly), get them to add the feature at the source, and then get MailWasher to pick up the changes.


_________________
Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
Ikeb (Forums Admin)
Posted: Fri Nov 28, 2003 4:40 am

The fellow credited by FireTrust is one Andrey V. Sorokin, who lives in St. Petersburg, Russia. FireTrust's credit references a web site which was giving Perry some grief a couple of weeks ago. It looks like Andrey hasn't yet cleaned up the site, because I still get an MS Security popup asking if it's OK to install a certain 'Precision Time' program from GAIN Publishing (its Terms and Conditions reference the Gator Corporation, an infamous spyware generator).

The web site offers the TRegExpr utility. The last version of TRegExpr was issued in October 2001, so it would appear Andrey is no longer actively developing his software.

stan_qaz (Premium Member)
Posted: Fri Nov 28, 2003 5:33 am

Thanks for the program install warning. I never see that type of thing, since I don't use Internet Explorer unless I'm desperate.

Has anyone tried to contact Andrey to see if he is interested, or whether the site is truly abandoned? If it is, the FireTrust crew might want to duplicate the information elsewhere in case the site goes away someday.


denn988 (Guest)
Posted: Fri Nov 28, 2003 6:15 am

It would be much easier to modify the calling routines in MWP than it would be to modify the RegExp engine.

From the looks of things, MWP currently only looks for a TRUE/FALSE return from the RegExp routines it uses.

From what I see when I run 'TRegExpr', the RegExp routines return not only the text that matches the RegExp but also the start and end positions within the search string where the matching text was found.

With that information returned to the filter routine (the calling routine), it would not be all that difficult to have that routine do a second stage of filtering... the 'sub-rule' I mentioned previously.

The only thing that might hinder such an endeavor by MWP would be if the license agreement between them and Andrey V. Sorokin prohibited them from pursuing this.

If it does not, all MWP would have to do is modify the 'Edit Filter' GUI to allow a sub-rule for any rule. That sub-rule would simply be:

Code:

Each matching sub-string found within the specified text...

contains...does not contain...contains RegExp...does not contain RegExp

the sub-rule...either a text string or RegExp.


When the filter runs, it would look through the text specified by the parent rule. Any text string that matches would then be tested by the sub-rule.

If the result is TRUE, the filter rule would return TRUE and go on to the next rule. If the result is FALSE, the filter would continue looking through the specified text for the next match of the parent rule.

The cycle would continue until either the sub-rule returned TRUE or the entire string specified by the parent rule had been searched without a match of both the parent rule and the sub-rule.

This would allow a type of 'negative lookahead' in MWP's filters and would be far easier to implement than modifying the RegExp engine.
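The rule/sub-rule cycle above could look roughly like this. Everything here is hypothetical: `SubRuleMode`, `sub_rule_true` and `rule_with_sub_rule` are made-up names for the four GUI choices and the loop, and Python's `re` stands in for the RegExp engine.

```python
import re
from enum import Enum

class SubRuleMode(Enum):
    """Hypothetical names for the four sub-rule choices in the proposed GUI."""
    CONTAINS = "contains"
    NOT_CONTAINS = "does not contain"
    CONTAINS_REGEXP = "contains RegExp"
    NOT_CONTAINS_REGEXP = "does not contain RegExp"

def sub_rule_true(phrase: str, mode: SubRuleMode, operand: str) -> bool:
    """Apply the sub-rule to one phrase matched by the parent rule."""
    if mode is SubRuleMode.CONTAINS:
        return operand in phrase
    if mode is SubRuleMode.NOT_CONTAINS:
        return operand not in phrase
    found = re.search(operand, phrase) is not None
    return found if mode is SubRuleMode.CONTAINS_REGEXP else not found

def rule_with_sub_rule(text: str, parent: str, mode: SubRuleMode, operand: str) -> bool:
    """Return TRUE at the first parent match the sub-rule accepts, else FALSE."""
    parent_re = re.compile(parent)
    pos = 0
    while (m := parent_re.search(text, pos)) is not None:
        if sub_rule_true(m.group(0), mode, operand):
            return True  # parent rule and sub-rule both matched
        pos = m.start() + 1  # otherwise look for the next parent match
    return False  # whole text searched without a combined match
```

For example, `rule_with_sub_rule("word<br>vi<font>agra", r"[a-z]<[^<]*?>[a-z]", SubRuleMode.NOT_CONTAINS_REGEXP, r"</?(br|blockquote)>")` passes over the `d<br>v` match and returns True on `i<font>a`.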

Anonymous (Guest)
Posted: Fri Nov 28, 2003 6:21 am

Ikeb wrote:
The fellow credited by FireTrust is one Andrey V. Sorokin, who lives in St. Petersburg, Russia. FireTrust's credit references a web site which was giving Perry some grief a couple of weeks ago. It looks like Andrey hasn't yet cleaned up the site, because I still get an MS Security popup asking if it's OK to install a certain 'Precision Time' program from GAIN Publishing (its Terms and Conditions reference the Gator Corporation, an infamous spyware generator).

The web site offers the TRegExpr utility. The last version of TRegExpr was issued in October 2001, so it would appear Andrey is no longer actively developing his software.


Are you sure that was 'TRegExpr' that was doing that???

I thought you all were saying that it was 'Visual RegExp' that was causing those problems.

Ikeb (Forums Admin)
Posted: Fri Nov 28, 2003 7:48 am

Anonymous wrote:
Are you sure that was 'TRegExpr' that was doing that???

Not TRegExpr itself. The web site from which TRegExpr can be obtained.

Anonymous wrote:
I thought you all were saying that it was 'Visual RegExp' that was causing those problems.

Where did you get that impression?

Ikeb (Forums Admin)
Posted: Fri Nov 28, 2003 8:42 am

denn988 wrote:
It would be much easier to modify the calling routines in MWP than it would be to modify the RegExp engine.


---Snipped a great explanation of a multi-layer filtering concept based on Sorokin's regex code---

denn988 wrote:
This would allow a type of 'negative lookahead' in MWP's filters and would be far easier to implement than modifying the RegExp engine.


Sounds like it's time for you to post this most wonderful idea to the Suggestions forum, denn988!!!

One thing you might give some thought to: debugging those filters! I can see how debugging such a filter could easily get very complicated, especially if multi-layer nesting of filters is attempted.

  - Just for starters, the Testrexp tool -- pretty much essential when developing any more complicated regex -- just wouldn't cut it.
  - FireTrust would have to develop their own regex filter test interface -- as part of MWP perhaps.
  - Some sort of "single-step" function could allow tracking of parent filter behaviour as child filter(s) do their stuff.
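A "single-step" trace could be as simple as reporting each parent match and the sub-rule's verdict one step at a time. A minimal sketch, assuming Python's `re` and a hypothetical generator name `single_step`; no actual FireTrust tooling is implied.

```python
import re
from typing import Iterator, Tuple

def single_step(text: str, parent: str, sub_rule: str) -> Iterator[Tuple[int, int, str, bool]]:
    """Yield (start, end, phrase, sub_rule_found) for each parent match in turn."""
    parent_re, sub_re = re.compile(parent), re.compile(sub_rule)
    pos = 0
    while (m := parent_re.search(text, pos)) is not None:
        phrase = m.group(0)
        yield m.start(), m.end(), phrase, sub_re.search(phrase) is not None
        pos = m.start() + 1  # step to the next candidate parent match

for step in single_step("word<br>vi<font>agra", r"[a-z]<[^<]*?>[a-z]", r"</?(br|blockquote)>"):
    print(step)
```

Because it is a generator, a test interface could pause after each yielded step, which is roughly the "track the parent as the child does its stuff" behaviour described above.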

Post your idea as a suggestion and let's explore it further!!

Page 4 of 6. All times are GMT.