| Author |
Message |
Cowboy
Guest IP: 213.112.*.*
|
Posted: Thu Nov 27, 2003 1:30 am Post subject: |
|
|
I'm stupid, I don't get it.
What is the problem with Ikeb's last filter?
|
|
| Back to top |
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Thu Nov 27, 2003 2:40 am Post subject: |
|
|
Ikeb,
Have you noticed something in 'TRegExpr' that might help you in another way....provided that the software calling the RegExp engine in MWP were written to use it?
I am talking about the character position information that is returned with each match.
A program written to use that information properly would be able to specify the actual positional occurrences of a RegExp match. Through the use of follow-up matches, by position, the kind of thing that you are looking for could be easily done.
MWP would have to rewrite their filter routine to take advantage of this....but could you imagine the kind of filter power that such a rewrite could provide to the user?
Think about it.....
|
|
| Back to top |
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Thu Nov 27, 2003 3:19 am Post subject: |
|
|
| Cowboy wrote: | I'm stupid, I don't get it.
What is the problem with Ikeb's last filter? |
Cowboy,
My mother's last (and final) husband could not use a screwdriver to save his life. That did not mean he was stupid, it just meant he was not mechanically inclined. Do you know what his job was before he retired?
He was one of the top analysts of 'Soviet Logistics' in one of our Intelligence agencies. Stupid??? Not by a long shot!!!!
You are not 'stupid' either. You just don't have enough of an understanding of Regular Expressions (as they are used in MWP) to realize what Ikeb has realized.
The perfect filter for detecting HTML camouflage attempts using RegExps cannot be created unless the strategy that MWP uses in creating RegExp filters were to change.
It would require a major rewrite of the code for using Regular Expressions in order to do it.
Ikeb,
If I have misstated your realization in this matter...I apologize.
By the way....not only does 'TRegExpr' return the character position range when it finds a match...it also returns the characters that matched the RegExp. Imagine the advantage that would give you in writing filters if you had, for each matching occurrence, the positions...and the matching phrase....instead of just a simple True or False to work with.
You could create some extremely powerful filters with that!!!
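To make that concrete, here is a minimal sketch in Python. Python's `re` module is only a stand-in for TRegExpr here (MWP does not use Python), and the helper name `match_details` and the sample text are made up for illustration. It shows what a caller gets when it asks for every match's position range and matched characters rather than a bare True or False:

```python
import re


def match_details(pattern, text):
    """Return (start, end, matched_text) for every match of pattern in text.

    Hypothetical helper: it only illustrates the kind of information a
    regex engine can hand back beyond a simple True/False.
    """
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


# Illustrative sample text: one made-up tag and one ordinary <br>.
sample = "buy<x>now and pay<br>later"

for start, end, matched in match_details(r"<[^>]+>", sample):
    # Each match reports where it was found and what it matched.
    print(start, end, matched)
```

With the positions and the matched text in hand, a calling routine can run a second test on each occurrence instead of stopping at the first yes/no answer.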
|
|
| Back to top |
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Thu Nov 27, 2003 8:00 am Post subject: |
|
|
| Cowboy wrote: | I'm stupid, I don't get it.
What is the problem with Ikeb's last filter? |
The problem is that I'm stupid. Stupid in the sense that I know HTML fairly well. Sure, the filter I came up with comes a lot closer in the sense that it prevents any false positive due to a 'word<br>another_word' sequence, but I neglected to consider other valid HTML tag sequences that could just as easily cause a false positive. TimeGhost pointed out a few common tags. For example, if someone sent you an email quoting some other document, it could easily be coded as:
"The document states<blockquote>This is the best way of doing that.</blockquote>How was I to know?"
The filter I came up with would fire at both tags.
There are actually lots of HTML tags that could be used this way. To cover them all would require a very long regex, which would unavoidably leave lots of invalid tags undetectable. In summary, a zero false positive regex would be extremely difficult to make and would allow false negatives.
That said, I did work out a better regex that allows the popular <br> (actually allows <br*>), <blockquote> (actually allows <bl*>), <big> (actually <bi*>; I threw that in), <b>, <i>, and <u> opening tags, as well as the matching </b>, etc. closing tags, as follows:
| Code: | | (?# word broken by invalid html tag)[a-z]</??([^b<>/][^>]+?|b[^ilr<>][^>]*?|[^bui])>[a-z] |
Starting with the < character match, this filter now accepts a '/' tag close character if it's there, then matches on any of the following, explained in the order they're listed:
- Anything but 'b, <, >, or /' as the first character, then at least one more character before the closing >, or
- b as the first character, followed by anything except the 'i, l, r, <, or >' characters, followed by any other characters before the closing > (bx*), or
- Anything but 'b, u, or i' as a single character before the closing >
That's as far as I'm going to take this, but with a list of all valid HTML tags, one could extend this filter to allow many more tags ... at the expense of letting more invalid character sequences through undetected.
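For anyone who wants to poke at the regex above, here is a rough sketch in Python. Python's `re` is only a stand-in for the TRegExpr engine that MWP uses, the case-insensitive flag is an assumption about how the filter would be run, and `fires` and the sample strings are made up for illustration:

```python
import re

# The filter regex from the post, split across two raw strings for readability.
# re.IGNORECASE is an assumption, not something stated in the thread.
PATTERN = re.compile(
    r"(?# word broken by invalid html tag)"
    r"[a-z]</??([^b<>/][^>]+?|b[^ilr<>][^>]*?|[^bui])>[a-z]",
    re.IGNORECASE,
)


def fires(text):
    """Hypothetical helper: True if the filter would trigger on text."""
    return PATTERN.search(text) is not None


# Tags the filter is meant to tolerate between two letters.
allowed = ["word<br>another", "states<blockquote>This", "best</b>way"]

# Tags it is meant to catch.
caught = ["vi<font>agra", "vi<x>agra", "vi<bogus>agra"]

# A made-up tag that slips through because anything starting <bi is allowed.
slips_through = ["vi<bixyz>agra"]

for text in allowed + caught + slips_through:
    print(text, fires(text))
```

The last sample is the trade-off in miniature: allowing <bi*> so that <big> passes also lets an invented tag such as <bixyz> through undetected.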
|
|
| Back to top |
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10635
|
Posted: Fri Nov 28, 2003 2:59 am Post subject: |
|
|
Might want to look up the folks who created the regex code that MailWasher uses (if I understood that correctly) and get them to add the code there at the source, then get MailWasher to pick up the changes.
_________________
Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
|
|
| Back to top |
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Fri Nov 28, 2003 4:40 am Post subject: |
|
|
The fellow who receives credit by FireTrust is one Andrey V. Sorokin who lives in St-Petersburg, Russia. FireTrust's credit references a web site which was giving Perry some grief a couple of weeks ago. It looks like Andrey hasn't yet cleaned up the site because I still get an MS Security popup asking if it's OK to install a certain 'Precision Time' program from GAIN Publishing (Terms and Conditions reference the Gator Corporation, an infamous Spyware generator).
The web site offers the TRegExpr utility. The last version of TRegExpr was issued in October 2001, so it would appear Andrey is no longer actively developing his software.
|
|
| Back to top |
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10635
|
Posted: Fri Nov 28, 2003 5:33 am Post subject: |
|
|
Thanks for the program install warning, I never see that type of thing since I don't use explorer unless I'm desperate.
Has anyone tried to contact Andrey to see if he is interested or if the site is truly abandoned? If it is, the FireTrust crew might want to take some action to duplicate the information there elsewhere in case the site goes away someday.
|
|
| Back to top |
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Fri Nov 28, 2003 6:15 am Post subject: |
|
|
It would be much easier to modify the calling routines in MWP than it would be to modify the RegExp engine.
From the looks of things, MWP is currently only looking for a TRUE-FALSE return from the RegExp routines used.
From what I see when I run 'TRegExpr', the RegExp routines will return not only the text that matches the RegExp, but also the start and end positions within the search string where the matching text was found.
With that information returned to the filter routine (the calling routine) it would not be all that difficult to have that calling routine do a second stage of filtering...the 'sub-rule' that I mentioned previously.
The only thing that might hinder such an endeavor by MWP would be if the license agreement between them and Andrey V. Sorokin prohibited them from pursuing this.
If it does not prohibit this, all MWP would have to do would be to modify the 'Edit Filter' GUI to allow a sub-rule to any rule specified. That sub-rule would simply be:
| Code: |
Each matching sub-string found within the specified text...
contains...does not contain...contains RegExp...does not contain RegExp
the sub-rule...either a text string or RegExp.
|
When the filter is run it would look through the text specified by the parent rule. Any text string that matches would then be tested by the sub-rule.
If the result is TRUE...the filter rule would return TRUE and go on to the next RULE. If the result is false, the filter would continue looking through the specified text for the next match of the parent rule.
The cycle would continue until either the sub-rule returned a TRUE or until the entire string specified by the parent rule was searched without a match of both the parent rule and the subrule.
This would allow a type of 'negative lookahead' to be used by MWP's filters and would be far easier to implement than modifying the RegExp engine.
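The parent-rule and sub-rule cycle described above can be sketched roughly as follows. This is Python standing in for whatever MWP's calling routine is actually written in; `rule_with_subrule`, `PARENT`, and `ALLOWED_TAG` are made-up names, both patterns are illustrative assumptions, and only the "contains RegExp" and "does not contain RegExp" sub-rule options are covered:

```python
import re

# Hypothetical parent rule: any tag sandwiched between two letters.
PARENT = r"[a-z]<[^>]+>[a-z]"

# Hypothetical sub-rule pattern: the tags the earlier filter wanted to allow.
ALLOWED_TAG = r"</?(br|blockquote|big|b|i|u)\b[^>]*>"


def rule_with_subrule(text, parent_pattern, sub_pattern, negate=False):
    """Two-stage filter rule.

    Walk through every match of the parent rule and test the matched
    substring against the sub-rule. Return True as soon as one parent
    match satisfies the sub-rule, and False if the whole text is
    searched without that happening. negate=True means the sub-rule
    is "does not contain RegExp".
    """
    sub = re.compile(sub_pattern, re.IGNORECASE)
    for m in re.finditer(parent_pattern, text, re.IGNORECASE):
        found = sub.search(m.group(0)) is not None
        if found != negate:
            return True  # this occurrence passes the sub-rule
    return False  # no occurrence matched both parent rule and sub-rule


# Fires only when a letter-tag-letter sequence does NOT contain an allowed tag.
print(rule_with_subrule("word<br>another", PARENT, ALLOWED_TAG, negate=True))
print(rule_with_subrule("word<br>another vi<font>agra", PARENT, ALLOWED_TAG, negate=True))
```

Because the second stage sees the actual matched text, the list of allowed tags can be written out plainly instead of being folded into character-class exclusions in the parent regex.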
|
|
| Back to top |
|
 |
Anonymous
Guest IP: 66.44.*.*
|
Posted: Fri Nov 28, 2003 6:21 am Post subject: |
|
|
| Ikeb wrote: | The fellow who receives credit by FireTrust is one Andrey V. Sorokin who lives in St-Petersburg, Russia. FireTrust's credit references a web site which was giving Perry some grief a couple of weeks ago. It looks like Andrey hasn't yet cleaned up the site because I still get an MS Security popup asking if it's OK to install a certain 'Precision Time' program from GAIN Publishing (Terms and Conditions reference the Gator Corporation, an infamous Spyware generator).
The web site offers the TRegExpr utility. The last version of TRegExpr was issued in October 2001, so it would appear Andrey is no longer actively developing his software. |
Are you sure that was 'TRegExpr' that was doing that???
I thought you all were saying that it was 'Visual RegExp' that was causing those problems.
|
|
| Back to top |
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Fri Nov 28, 2003 7:48 am Post subject: |
|
|
| Anonymous wrote: | | Are you sure that was 'TRegExpr' that was doing that??? |
Not TRegExpr itself. The web site from which TRegExpr can be obtained.
| Anonymous wrote: | | I thought you all were saying that it was 'Visual RegExp' that was causing those problems. |
Where did you get that impression?
|
|
| Back to top |
|
 |