Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Sat Dec 20, 2003 1:56 am Post subject: Camouflaged Subjects Can't Be Filtered? |
|
|
Using the "View Complete Header" function in MW, I see the following ...
In the scrollable "message" box:
| Code: | | Subject: =?ISO-8859-1?b?QSBkYXRlIHdpdGggYSBtaWxmIGFsd2F5cyBlbmRzIGluIHN1Y2tpbmcgICAgN3d6eGZpMg==?= |
In the "Titles" area (above the separator):
| Code: | | A date with a milf always ends in sucking 7wzxfi2 |
So I think, "Make a filter for < subject > < contains > < =?ISO- > and mark for delete."
But MW never hits it, which I'm guessing must be because the subject passed to the filters is the "decoded" version. Which I wasn't even aware (until now) that MW would do.
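A minimal Python sketch of the behavior guessed at here, assuming MW decodes RFC 2047 "encoded words" before the 'Subject' filter runs (the thread suggests this; nothing here is taken from MW itself). The sample subject and the `decode_subject` helper are made up for illustration; only the standard-library `email.header.decode_header` is real.

```python
import base64
from email.header import decode_header


def decode_subject(raw_subject):
    """Decode RFC 2047 encoded words into plain text (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        # decode_header returns bytes for encoded words, str for plain input
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# Build a harmless sample in the same form as the subject quoted above.
payload = base64.b64encode(b"Cheap offer 7wzxfi2").decode("ascii")
raw_subject = "=?ISO-8859-1?b?" + payload + "?="
decoded = decode_subject(raw_subject)

print("=?ISO-" in raw_subject)  # the raw header carries the marker
print("=?ISO-" in decoded)      # the decoded text does not
```

If the filter only ever sees `decoded`, a "contains =?ISO-" rule on the subject has nothing to match, which would explain the miss.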
Sure, I have plain text filters that will catch this one. But my point was that no one I really want to hear from would be camouflaging the subject line anyway, so I don't need it decoded. And then I wouldn't have to see the (impolite/undesirable/offensive) text in the message list, either.
While rusticdog has tried to convince me that this decoding does not pose a potential security "hole" in MW, I'm still holding on to a tiny bit of paranoia in that matter. I don't pretend to know how it could be done, but if decoding and viewing an encoded attachment is dangerous, how is decoding and viewing an encoded subject less so? Except, of course, that the subject line is probably shorter, but doesn't that just mean it would take a more experienced hacker to exploit it?
Questions:
- Am I correct that MW is filtering the decoded subject only?
- Can it be made to do both?
- Why is MW decoding headers in the first place?
- Why is there not an option for decode/don't decode headers?
In fact, now that I'm thinking about it, if MW can safely decode this stuff, then why can't there be a filter for < If decoded attachment > < contains > < qwertyuiop > ? _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
denn988
Guest IP: 66.44.*.*
|
Posted: Sat Dec 20, 2003 4:09 am Post subject: |
|
|
Eggman5x,
I would also like to see MWP provide for the choice of filtering on RAW text or translated text in the body. The SUBJECT problem you outline is an example of how that ability could come in handy.
I will post a filter that may help you with your problem.
The first thing that you should be aware of is that I used to think that there was no legitimate purpose for the '=?ISO-8859-' in the subject line. That was before Ghol told me that there actually was a legitimate purpose. In his case...it is German umlauts ( Ä Ë Ï, etc). These are 8-bit characters, outside the 7-bit ASCII range.
Because the character encoding used in the header of e-mails is only 7-bit, there has to be a way to encode 8-bit characters into the Subject line of a message....or you could not have those characters in the Subject line (it is part of the header).
In order to get around that problem, the encoding system for Subject lines was created. It allows 8-bit characters to be sent in the subject line in spite of the fact that the header is a 7-bit format. You posted an example of it. It should only be used if required by the presence of 8-bit characters, but the Spammers use it to obscure the subject.
The way to filter around that is to first check the RAW subject line to see if that kind of encoding is being used...then to check the decoded Subject to see if it contains 8-bit characters. If that encoding is present in the RAW Subject line, and the translated subject does not contain 8-bit characters...then you can be almost 100% positive that the subject was encoded strictly to obscure the contents.
Here is the filter:
There are two rules...AND both (all) rules must be satisfied.
Rule 1.
| Code: | The entire header....contain RegExpr....
^Subject:[^\n]*?=\?ISO-8859-[^\n]*?\n |
This rule looks at the RAW 'Subject:' line in the header for the key that signifies that there are encoded characters in the subject line ( =?ISO-8859- ). If this is present...this rule will return TRUE.
By the way...the RegExprs used by MWP are case insensitive by default.
Rule 2.
| Code: | the 'Subject' field....does not contain RegExpr....
[\x80-\xFF] |
If there are no 8-bit characters in the translated subject.......this rule will return TRUE.
If both rules return TRUE...the filter fires.
This should allow you to at least have a filter for those cases when this method is used strictly for obfuscating the subject line.
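For anyone who wants to try the two-rule logic outside MWP, here is a rough Python sketch. It assumes Python's `re` and `email.header` behave closely enough to MWP's own matching and decoding, which has not been verified; `is_camouflaged`, `decode_subject` and `encoded_subject_line` are hypothetical names, and the `?` after `=` is escaped so that it is matched literally.

```python
import base64
import re
from email.header import decode_header

# Rule 1: the raw Subject line contains an ISO-8859 encoded word.
RAW_ENCODED = re.compile(r"^Subject:[^\n]*?=\?ISO-8859-[^\n]*$",
                         re.IGNORECASE | re.MULTILINE)
# Rule 2 checks the decoded subject for any 8-bit character.
HIGH_BIT = re.compile(r"[\x80-\xff]")


def decode_subject(raw_value):
    """Decode RFC 2047 encoded words into plain text (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def is_camouflaged(raw_header):
    """True when both rules hold: encoded raw subject, no 8-bit characters."""
    match = RAW_ENCODED.search(raw_header)
    if match is None:
        return False  # Rule 1 failed: subject is not encoded
    raw_value = match.group(0).split(":", 1)[1].strip()
    return HIGH_BIT.search(decode_subject(raw_value)) is None


def encoded_subject_line(text):
    """Build a sample raw Subject line (for illustration only)."""
    payload = base64.b64encode(text.encode("iso-8859-1")).decode("ascii")
    return "Subject: =?ISO-8859-1?b?" + payload + "?=\n"
```

A subject encoded from pure ASCII text makes `is_camouflaged` return True, while one that really contains umlauts, or one that is not encoded at all, returns False.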
Hope this helps....
By the way....
|
|
denn988
Guest IP: 66.44.*.*
|
Posted: Sat Dec 20, 2003 4:14 am Post subject: |
|
|
ooops, I clicked the wrong button above (Submit instead of Preview)...such is the price of not logging in to this forum and having edit capabilities....
By the way....it looks like the example you posted contains NO 8-bit characters...so the filter should fire on that example.
|
|
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Sat Dec 20, 2003 12:15 pm Post subject: |
|
|
denn988:
Well, okay, I suspected there must be a legitimate use for it, but in the last 5 years I've gotten zero legitimate messages that needed it. And since I'm just "marking" and not "auto-deleting" I don't have a "false positives" problem here. Obviously that will be different for others.
In any case, MW did not 'fire' on the < Subject contains '=?ISO-' > so I don't see how the regex is going to help. Unless you're saying that MW applies "contains" filters to the decoded subject and "containsRE" filters to the RAW subject. Is that documented somewhere? I sure missed it.
I'm going to set up your filter anyway and see what happens. I'll post the results when I have some, which shouldn't be long as I get 2 or 3 of these messages every day.
Thanks for your comments denn988. _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
denn988
Guest IP: 66.44.*.*
|
Posted: Sat Dec 20, 2003 12:56 pm Post subject: |
|
|
Eggman,
The difference is that the filter I posted above looks into the 'entire header' for the RegExp:
| Code: | | ^Subject:[^\n]*?=\?ISO-8859-[^\n]*?\n |
The 'entire header' is NOT translated by MWP before the filter is applied. This filter would see the target string.
What you were trying to filter was looking into the 'Subject' field for: | Code: | | =?ISO- |
The 'Subject' field is translated by MWP before the filter is applied. Your filter would never see the target string.
INT QRK???
|
|
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Sat Dec 20, 2003 11:37 pm Post subject: |
|
|
| denn988 wrote: | | INT QRK??? |
Um - er - eh?
So again you're saying that MW does handle the subject line differently from the rest of the message?
OK, I'll take your word for it for now. But the following questions remain:
Where is this documented? I didn't know MW was going to decode the subject - or anything else. Nor did I know that filtering the subject was subject to different rules than any other part of the message. I'm not sure I want it to, in any case. If it's going to, I think it should definitely be in the docs - preferably with a great big WARNING! in front of it. And ideally it should be "configurable" by the user.
If MW can safely decode/display the subject, can it safely decode/display attachments as well? If not, why? What's the difference? (Actually I'm pretty sure the answer should be, "Yes, it can." because I do it all the time in perl.) If so, an "If an attachment (contains/containsRE/etc.)" filter is very desirable, or at least the ability to apply "body" filters to attachments, subject to a "Treat attachments as part of the body" user option, of course.
If subject decoding is automatic, can the encoded subject be retained for filtering? Yes, it might mean having the choice of "If decoded subject ..." or "If original subject ...", but it's got to save considerable time to be able to look for a short string like "=?ISO-" in just the subject line versus the "entire header", eh?
Thanks again for your response, denn988.
Eggman5X _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10612
|
Posted: Sun Dec 21, 2003 12:57 am Post subject: |
|
|
I'd like to see the decoded / raw option for not only the subject but the body filters. _________________ Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
|
|
Whisperer
Sergeant

 Joined: Mar 29, 2003 Posts: 134 Location: USA
|
Posted: Sun Dec 21, 2003 2:06 am Post subject: |
|
|
Wouldn't the following enable the user to choose having a filter work off of the raw Subject contents?
| Code: |
Entire Header contains: "Subject: specified text"
|
That way you're making "Subject:" part of the actual text string it's looking for in the header.
Just a thought.
|
|
denn988ß
Guest IP: 66.44.*.*
|
Posted: Sun Dec 21, 2003 2:56 am Post subject: |
|
|
| stan_qaz wrote: | | I'd like to see the decoded / raw option for not only the subject but the body filters. |
Stan,
That would be one of the improvements to the filters that Firetrust should really include in an upgrade. They could add a couple of radio buttons that would be greyed out for those fields that would not use that option, but if you choose either
- the 'Subject' field
- the body
you could have the radio buttons available to select NORMAL (translated) or RAW text.
I think that it might be unlikely for them to change the filtering though....simply because they are really focusing their efforts toward 'First Alert'. They can get a subscription fee from every user of 'First Alert' and they are more than likely going to concentrate their efforts on the things that will generate the most income.
It would be nice if I am wrong about that....because I would really like to see several improvements to their filter capabilities.
|
|
denn988¡
Guest IP: 66.44.*.*
|
Posted: Sun Dec 21, 2003 3:24 am Post subject: |
|
|
| Eggman5X wrote: | | denn988 wrote: | | INT QRK??? |
Um - er - eh?
So again you're saying that MW does handle the subject line differently from the rest of the message?
OK, I'll take your word for it for now. But the following questions remain:
Where is this documented? I didn't know MW was going to decode the subject - or anything else. Nor did I know that filtering the subject was subject to different rules than any other part of the message. I'm not sure I want it to, in any case. If it's going to, I think it should definitely be in the docs - preferably with a great big WARNING! in front of it. And ideally it should be "configurable" by the user.
If MW can safely decode/display the subject, can it safely decode/display attachments as well? If not, why? What's the difference? (Actually I'm pretty sure the answer should be, "Yes, it can." because I do it all the time in perl.) If so, an "If an attachment (contains/containsRE/etc.)" filter is very desirable, or at least the ability to apply "body" filters to attachments, subject to a "Treat attachments as part of the body" user option, of course.
If subject decoding is automatic, can the encoded subject be retained for filtering? Yes, it might mean having the choice of "If decoded subject ..." or "If original subject ...", but it's got to save considerable time to be able to look for a short string like "=?ISO-" in just the subject line versus the "entire header", eh?
Thanks again for your response, denn988.
Eggman5X |
Eggman,
The attachments are body parts. Depending on the type of attachment, MWP will either display it in the preview window or not. If the attachment is a text file, MWP will probably display it there, if not, it probably won't.
By the way....
INT QRK is a radio operators' code using what are referred to as Q and Z codes. The QRK is a readability code and would normally be followed by a number between 1 and 5. The higher the number the better the signal is getting through.
When preceded by the INT, it becomes a question (interrogative). INT QRK translates to "How are you reading me?"
Another one that you might be interested in is ZBM2. If a radio operator sends you that code, you can be sure that he is quite unhappy with you. It means "Put a qualified operator on the line".
I won't even get into "INT WTF", because there may be children on this topic.
|
|
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16509
|
Posted: Sun Dec 21, 2003 7:48 am Post subject: |
|
|
| denn988ß wrote: | I think that it might be unlikely for them to change the filtering though....simply because they are really focusing their efforts toward 'First Alert'. They can get a subscription fee from every user of 'First Alert' and they are more than likely going to concentrate their efforts on the things that will generate the most income.
It would be nice if I am wrong about that....because I would really like to see several improvements to their filter capabilities. |
Frankly, if I believed that FireTrust is planning to focus only on FA!, I'd be looking for something else, because I still don't see any evidence that FA! has a future. Admittedly I was wondering if FireTrust's development plans might be ruled by fears of cannibalizing their own services. However RD says they are starting implementation of Bayesian filtering in the next beta release in the new year! So that's evidence they are not just banking on FA!.
Hopefully we'll see FireTrust further developing other filter capabilities including your subrule idea, a more flexible rules structure, the URL blacklist, "sans tags" text filtering, filter "weighting" (à la Spam Sleuth), and other excellent ideas. Perhaps we might even convince FireTrust to give us the means to peer into Base64 (and possibly other) encoded message streams for key SPAM indicators.
|
|
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10612
|
Posted: Sun Dec 21, 2003 10:31 pm Post subject: |
|
|
I have to agree that the First Alert isn't working out, my current hit rate is about 15 percent, worse than most of the filters I use and way worse than the combination of them. The beta program was getting hits on 40% plus and looked to be improving but the trend now is just creeping.
The "no false positives" promise has not worked out well, as several of my newsletters have been listed and unless they are in the friends list or I catch them manually they could be lost. So far it hasn't hit on non-bulk e-mail that I have seen, but this is hard to confirm since you have to disable the friends list to be sure it isn't hitting on them.
I am also getting many repetitive spams that look about the same but that First Alert isn't catching, something that never happens with my "url in body" filter that you folks helped me perfect.
I'd put my effort into finding bugs, improving the friends list (to check for the from string not just the e-mail address), improving the existing filtering and adding a new filter tool for the "url in body" trap that I and so many others have found simple and effective and finally adding Bayesian filtering. Might even look at doing the proxy thing to try to reduce the "spam gap" in the checking mail sequence but that would be the lowest priority of the above.
|
|
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Mon Dec 22, 2003 12:11 am Post subject: |
|
|
I still think - as I think I suggested 6 months ago - the single, most effective and easiest thing to implement would be to allow any individual header to be specified in the first part of the filter rule.
It would ease the use of MW with other tools (including the beloved POPfile) without having to scan the "entire header". It would likewise make it easier to take advantage of the increasing number of tools being implemented by ISPs. Mine, for example, adds at least 6 headers to every message, including the result of a reverse DNS lookup on the originating IP. Right now I filter for
| Code: | | entire header containsRE X-Note:(.+?)\[NO REVERSE DNS\] |
but I'm sure
| Code: | | X-Note contains [NO REVERSE DNS] |
would process much faster, especially since X-Note usually appears in one of the last 3 lines of headers.
This would not require a change to the layout of the filters.txt file, as any value other than the current 7 "The"'s could be considered a header "name". The UI could use a "combo" box for entry instead of a "list" box. Internally MW could build an indexed array of all headers in a message using the header "name" as the index.
Am I missing something or could it really be that easy? _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
denn988
Guest IP: 66.44.*.*
|
Posted: Mon Dec 22, 2003 1:27 am Post subject: |
|
|
| Eggman5x wrote: | I still think - as I think I suggested 6 months ago - the single, most effective and easiest thing to implement would be to allow any individual header to be specified in the first part of the filter rule.
It would ease the use of MW with other tools (including the beloved POPfile) without having to scan the "entire header". It would likewise make it easier to take advantage of the increasing number of tools being implemented by ISPs. Mine, for example, adds at least 6 headers to every message, including the result of a reverse DNS lookup on the originating IP. Right now I filter for
| Code: | | entire header containsRE X-Note:(.+?)\[NO REVERSE DNS\] |
but I'm sure
| Code: | | X-Note contains [NO REVERSE DNS] |
would process much faster, especially since X-Note usually appears in one of the last 3 lines of headers.
This would not require a change to the layout of the filters.txt file, as any value other than the current 7 "The"'s could be considered a header "name". The UI could use a "combo" box for entry instead of a "list" box. Internally MW could build an indexed array of all headers in a message using the header "name" as the index.
Am I missing something or could it really be that easy? |
Eggman,
Your proposal would be extremely difficult for MWP to implement, and not worth any software writer's effort.
The reason....You can place any kind of 'Title' in the header that you can imagine. There would be no possible way for MWP or any other software vendor to be able to give you all those choices as 'canned' options.
There is also nothing that requires such a 'Title' as "X-Note:" to be contained in the header. You may see that often because your ISP is putting it there...but how is MWP to know what the many thousands of ISPs all over the world may be adding to the headers.
Also....
In order to find any given 'Title' in the header, you still have to search through the header...from the beginning. MWP is already doing that now for each of the different fields that they now provide as an option. The search would have to be performed regardless of whether the 'Title' field is included in their optional search fields or not.
If you really want to speed up your RegExp filters, add the ^ character at the beginning of the line when searching for a header 'Title' as follows: | Code: | | entire header containsRE ^X-Note:(.+?)\[NO REVERSE DNS\] |
This will effectively have the RegExp ignore any two-character pattern that does not start with \nx (newline followed by x). As you have written the expression, every time the character 'x' appears in the header, it will be tested to see if it matches the expression. That may not be so bad as far as the letter 'x' goes, but what if it was a letter that was used more often?
By coding the RegExp to look at the beginning of the line for 'Titles' in the header lines, you are speeding up the process dramatically.
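Besides speed, the anchor also changes what can match. The sketch below uses Python's `re` as a stand-in for MWP's RegExp engine, on the assumption that `^` behaves there as described above (Python needs `re.MULTILINE` for `^` to match at every line start). The sample headers are invented, and the brackets are escaped so they match literally.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Same expression with and without the line-start anchor.
UNANCHORED = re.compile(r"X-Note:(.+?)\[NO REVERSE DNS\]", FLAGS)
ANCHORED = re.compile(r"^X-Note:(.+?)\[NO REVERSE DNS\]", FLAGS)

# A genuine X-Note header added by the ISP.
real = "Received: from host\nX-Note: lookup [NO REVERSE DNS]\n"
# The same text smuggled into a Subject line by the sender.
spoof = "Subject: fake X-Note: x [NO REVERSE DNS]\n"

for name, header in (("real", real), ("spoof", spoof)):
    print(name,
          bool(UNANCHORED.search(header)),
          bool(ANCHORED.search(header)))
```

The unanchored pattern is attempted at every position in the header, so it also fires on the spoofed Subject line; the anchored one is only attempted at line starts and fires on the real header alone.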
The other thing that you can do to speed up your RegExps is to place a terminator on the RegExp.
The way that your example is written, it will continue its search clear on through to the end of the header if the matching text at the end of the expression is not there.
For example, if the line that you posted will never go on past the first newline, you could re-write it as follows: | Code: | | ^X-Note:[^\n]*?\[NO REVERSE DNS\] |
If the text contained in that line would never contain a ':' you could use that as a limiter as follows: | Code: | | ^X-Note:[^:]*?\[NO REVERSE DNS\] |
That would prevent the RegExp from looking beyond the next 'Header Title' as the header titles are terminated by the : character.
In those cases where the line may continue for several newlines, you could do it this way: | Code: | | ^X-Note:(.*?[\n]){0,4}[^\n]*?\[NO REVERSE DNS\] |
That would limit your search to no more than four lines beyond the beginning of the search string. (Credit to IKEB for a very good limiting strategy on this one)
Any of those three examples listed directly above would prevent your RegExp from searching beyond the point that is needed in those cases where the key text at the end of the expression is not there.
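To see how the three limiters differ, here is a small Python comparison. It is only a sketch: Python's `re` may not match MWP's engine exactly, the sample headers are invented, and the brackets are escaped so they are taken literally.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE
MARK = r"\[NO REVERSE DNS\]"

SAME_LINE = re.compile(r"^X-Note:[^\n]*?" + MARK, FLAGS)             # stop at newline
UP_TO_COLON = re.compile(r"^X-Note:[^:]*?" + MARK, FLAGS)            # stop at next ':'
FEW_LINES = re.compile(r"^X-Note:(.*?\n){0,4}[^\n]*?" + MARK, FLAGS)  # stop after 4 lines

# Marker on the same line as the header name.
one_line = "X-Note: 1.2.3.4 [NO REVERSE DNS]\n"
# Marker on a folded continuation line of the same header.
folded = "X-Note: checked by ISP\n [NO REVERSE DNS]\nX-Other: 1\n"
# Marker belongs to a different header further down.
other_header = "X-Note: fine\nX-Spam: [NO REVERSE DNS]\n"

for label, header in (("one_line", one_line),
                      ("folded", folded),
                      ("other_header", other_header)):
    print(label,
          bool(SAME_LINE.search(header)),
          bool(UP_TO_COLON.search(header)),
          bool(FEW_LINES.search(header)))
```

In this sketch the newline limiter misses a folded header, the colon limiter follows the fold but stops at the next header name, and the line-count limiter follows the fold but will also run into a following header within its four lines.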
|
|
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Mon Dec 22, 2003 2:58 am Post subject: |
|
|
| denn988 wrote: | Your proposal would be extremely difficult for MWP to implement, and not worth any software writer's effort.
The reason....You can place any kind of 'Title' in the header that you can imagine. There would be no possible way for MWP or any other software vendor to be able to give you all those choices as 'canned' options.
There is also nothing that requires such a 'Title' as "X-Note:" to be contained in the header. You may see that often because your ISP is putting it there...but how is MWP to know what the many thousands of ISPs all over the world may be adding to the headers. |
Bulldroppings!
MW doesn't NEED to know all the possibilities. Anything that is not in the current list - "The 'From' field", etc. - is considered a user header and accepted without validation. If a matching header is found in a message, the expression is applied to the "content" or "value" of the header. If not, the filter is ignored.
Likewise, MW doesn't need to know what the possibilities are in a message. It simply parses the headers, splitting on the RFC required colon-space. Whatever's on the left is the header 'name', what's on the right is the value.
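A rough Python sketch of that idea, not a claim about how MW works internally: `parse_headers` and `header_contains` are hypothetical names. The sketch splits on the first colon only, so a ": " inside a value is preserved, and it joins continuation lines (lines starting with a space or tab) onto the header they belong to.

```python
def parse_headers(raw):
    """Index header values by lower-cased header name (hypothetical helper)."""
    headers = {}
    name = None
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header block
        if line[0] in " \t" and name is not None:
            # Continuation line: append to the most recent header value.
            headers[name][-1] += " " + line.strip()
            continue
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line, skip it
        name = field.strip().lower()
        headers.setdefault(name, []).append(value.strip())
    return headers


def header_contains(raw, header_name, text):
    """Plain 'contains' test against one named header; False if it is absent."""
    values = parse_headers(raw).get(header_name.lower(), [])
    return any(text in value for value in values)


sample = ("Received: from host\n"
          "X-Note: lookup result\n"
          " [NO REVERSE DNS]\n"
          "Subject: hi: there\n"
          "\n"
          "body text mentioning X-Note: [NO REVERSE DNS]\n")

print(header_contains(sample, "X-Note", "[NO REVERSE DNS]"))
print(header_contains(sample, "X-Missing", "anything"))
```

Because the lookup is by name, a rule such as "X-Note contains [NO REVERSE DNS]" needs no regular expression at all, and a header that is not present simply makes the rule not apply.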
If you wish to debate the simplicity further, please PM.
| denn988 wrote: | | If you really want to speed up your RegExp filters, add the ^ character at the beginning of the line when searching for a header 'Title' as follows: |
Good point here. Especially when filtering "the entire header".
| denn988 wrote: | | For example. If the line that you posted will never go on past the first newline, you could re-write it as follows: |
As I understand the RFC, the "value" of any header may span any number of multiple lines as long as each line after the first begins with a space. Therefore any limiting strategy is arbitrary and can be defeated by the simple insertion of additional lines containing at least one space.
| denn988 wrote: | | If the text contained in that line would never contain a ':' you could use that as a limiter as follows: |
The pattern in question contains (.+?) which means anything might or might not be in between the "header title" and the part of the string I'm interested in and that its length is variable. There is no known, always present value that tells me I can stop looking, except as you point out later, the next "header title". But even then the check would be for "^\w" not ": " because ": " could be a legitimate part of the "value".
But thanks for commenting, anyway. _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|