| Author |
Message |
Karel
Guest IP: 192.167.*.*
|
Posted: Wed Oct 08, 2003 7:09 pm Post subject: filter word in the body |
|
|
Hi
I have set a filter for some words in the body, but some spam is not filtered and marked for deletion even though I can see the word in the preview. If I send a normal mail to my own address containing this word, it is filtered OK. How is that possible?
Karel
|
|
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10629
|
Posted: Wed Oct 08, 2003 7:37 pm Post subject: |
|
|
Look at the source of the e-mail (right-click and select "view complete header"); spammers use tricks to get around simple filters.
Use the search function on the FireTrust category here to look for "gary filters" and you will find some good discussion and sample filters.
_________________
Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
|
|
|
 |
Hugh
Captain
 Premium Member
 Joined: Jun 03, 2003 Posts: 456
|
Posted: Wed Oct 08, 2003 7:50 pm Post subject: |
|
|
Well, it seems that your filter is working correctly, as you have tested it yourself. Are you absolutely sure the words in the filter and in the body are _exactly_ the same? (Look at the HTML source.) Is there a dot or space in it that you might have overlooked?
Hugh
|
|
|
 |
Karel
Guest IP: 192.167.*.*
|
Posted: Thu Oct 09, 2003 8:00 am Post subject: |
|
|
I was looking at the mail and the word Enlargement was written as
E<kehejlbcuqxcn>nl<khoejladpwslyp>arg<kfntuzbcoyml>em<kgkkmlgdeqz>e<kfrnvwvabwitlsp>n<koguobsroxvhibi>t
so the body filter will not recognize it.
But what I meant was: when I see the word "Enlargement" displayed correctly in MWP's preview window, why is it not filtered? Or how can I set a filter for the text in the preview window?
Karel
|
|
|
 |
TimeGhost
Captain

 Joined: Apr 11, 2003 Posts: 747 Location: USA
|
Posted: Thu Oct 09, 2003 1:26 pm Post subject: |
|
|
The trick is to write a filter that looks for a string of several nonsense characters enclosed in angle brackets. In a sense, this makes filtering a great deal easier: you don't need a list of keywords and their misspelled variants (such as enlargement/enlargment). And you can be reasonably certain none of your friends would send you a message like that, whereas a filter on "enlargement" might catch a message from someone interested in photography.
There have been several posts that discuss filters of this type. Off the top of my head, I'd create the filter this way (untested):
Body contains RegEx <[a-z]{5,}>
Of course, filters that rely on the message body need to download the message body....
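As a rough illustration of what the suggested pattern does, here is a minimal sketch that runs the same expression through Python's `re` module against the sample from Karel's post. MailWasher's own regex dialect may differ in details such as case sensitivity, so this is an assumption, and the names `NONSENSE_TAG` and `looks_obfuscated` are made up for the example.

```python
import re

# Python rendering of the pattern from the post. MailWasher's regex
# engine may not behave identically (assumption).
NONSENSE_TAG = re.compile(r"<[a-z]{5,}>")

# The obfuscated word quoted earlier in the thread, split over two literals.
obfuscated = ("E<kehejlbcuqxcn>nl<khoejladpwslyp>arg<kfntuzbcoyml>em"
              "<kgkkmlgdeqz>e<kfrnvwvabwitlsp>n<koguobsroxvhibi>t")
plain = "Enlargement"


def looks_obfuscated(body: str) -> bool:
    """Return True if body contains a run of 5+ lowercase letters in angle brackets."""
    return NONSENSE_TAG.search(body) is not None


print(looks_obfuscated(obfuscated))  # the bogus tags trigger the pattern
print(looks_obfuscated(plain))       # the plain word does not
```

Note that the pattern keys only on the bracketed filler, so it never needs to know which word the spammer was hiding.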
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Thu Oct 09, 2003 3:28 pm Post subject: |
|
|
| Karel wrote: | I was looking at the mail and the word Enlargement was written as
E<kehejlbcuqxcn>nl<khoejladpwslyp>arg<kfntuzbcoyml>em<kgkkmlgdeqz>e<kfrnvwvabwitlsp>n<koguobsroxvhibi>t
so the body filter will not recognize it.
But what I meant was, when in the MWP's preview window I see the word "Enlargement' displayed correctly, why it is not filtered. Or how can I set a filter for the text in the preview window. |
At present, MWP isn't capable of performing a "filter on visible text" although it has been requested. As TimeGhost points out, you can however filter for the hidden "nonsense" HTML tags. Some MWP users find the presence of such tags to be a better SPAM indicator than the visible text itself (i.e. there's no need to look for specific words, just the weird tag).
|
|
|
 |
Hugh
Captain
 Premium Member
 Joined: Jun 03, 2003 Posts: 456
|
Posted: Thu Oct 09, 2003 7:13 pm Post subject: |
|
|
| Karel wrote: | the word Enlargement was written as
E<kehejlbcuqxcn>nl<khoejladpwslyp>arg<kfntuzbcoyml>em<kgkkmlgdeqz>e<kfrnvwvabwitlsp>n<koguobsroxvhibi>t
so the body filter will not recognize it. |
Perhaps spammers are trying to bypass filters this way; I don't know.
Hugh
|
|
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10629
|
Posted: Thu Oct 09, 2003 7:55 pm Post subject: |
|
|
That is the purpose of these kinds of tricks.
Possibly, as some have suggested, MailWasher could implement an optional HTML-stripping function for its filters. It would apply the filter conditions after first removing any HTML code. This wouldn't be clear-cut, as there is no positive way to tell HTML from normal text, just guesses that can be made based on probability.
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Fri Oct 10, 2003 7:39 am Post subject: |
|
|
| stan_qaz wrote: | | Possibly as some have suggested mailwasher could implement an optional HTML stripping function for their filters. It would apply the filter conditions after first removing any html code. This wouldn't be clear cut as there is no positive way to tell HTML from normal text, just guesses that can be made based on probability. |
Huh? Then how come a browser or email reader can strip the HTML and display only the "normal" text reliably? Simply by treating tags (the text within the <> angle brackets) differently from the remaining text.
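To make the "treat tags differently from the remaining text" idea concrete, here is a small sketch using Python's standard `html.parser`. It is not how MailWasher or any particular mail reader works internally; the class `VisibleText` and the function `visible_text` are hypothetical names for this example. The parser reports tags and text through separate callbacks, and the sketch keeps only the text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only character data; start and end tags are silently dropped."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, never for the tags themselves.
        self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.parts)


sample = ("E<kehejlbcuqxcn>nl<khoejladpwslyp>arg<kfntuzbcoyml>em"
          "<kgkkmlgdeqz>e<kfrnvwvabwitlsp>n<koguobsroxvhibi>t")
print(visible_text(sample))
```

This is deliberately simplistic: it would also return the contents of script or style elements as text, which a real renderer would not display.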
|
|
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10629
|
Posted: Fri Oct 10, 2003 6:57 pm Post subject: |
|
|
Ikeb, yes, browsers do treat tags within the angle brackets differently from normal text. But what they do is simply ignore any text within the angle brackets that does not parse into an expected format. To display an angle bracket in HTML you have to escape it or use a character entity.
Without building a more complete HTML parser into MW, dealing with the bogus HTML tag problem will be tricky. Just dropping anything between angle brackets for e-mail with a MIME type of HTML would work for MailWasher, but might cause problems when a mail program sends mail without the proper MIME settings.
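The "only strip when the MIME type is HTML" idea can be sketched with Python's standard `email` package. This is an illustration under stated assumptions, not MailWasher's behaviour: the function name `filterable_text` is hypothetical, and the sketch handles single-part messages only.

```python
import re
from email import message_from_string

# Anything between a pair of angle brackets, treated as a tag (crude on purpose).
TAG = re.compile(r"<[^<>]*>")


def filterable_text(raw_message: str) -> str:
    """Return the body text a keyword filter should look at."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        raise ValueError("this sketch handles single-part messages only")
    body = msg.get_payload()
    if msg.get_content_type() == "text/html":
        return TAG.sub("", body)  # drop tags only when the sender declared HTML
    return body                   # leave plain text untouched


html_mail = "Content-Type: text/html\n\nE<kehej>nl<khoej>argement\n"
text_mail = "Content-Type: text/plain\n\nif a<b and c>d then stop\n"

print(filterable_text(html_mail).strip())
print(filterable_text(text_mail).strip())
```

The second message shows why the check matters: stripping brackets from plain text would silently eat `<b and c>`. It also shows the weakness raised in the post, since an HTML message sent with the wrong Content-Type header would pass through unstripped.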
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Fri Oct 10, 2003 8:28 pm Post subject: |
|
|
| stan_qaz wrote: | | Just dropping anything between angle brackets for e-mail with a MIME type of HTML would work for mailwasher but might cause problems when a mail program sends mail without the proper MIME settings. |
Of course the email client needs to examine the HTML tags more closely; however, we're only considering MWP filtering. It seems to me that all MWP has to do is ignore any text within the angle brackets in order to filter on the displayed text.
|
|
|
 |
DearWebby
Lieutenant

 Joined: Oct 03, 2003 Posts: 262 Location: Canada
|
Posted: Fri Oct 10, 2003 10:02 pm Post subject: |
|
|
If you NEVER have to send or request a code snippet, that regular expression should work fine, at least until MW learns to ignore stuff between brackets when filtering.
That would be the proper way to do it, not use the occurrence of text between brackets as a trigger.
Possibly you could rewrite that expression so that the filter ignores the brackets and anything that might be in between them, including foreign characters and smileys or high ASCII stuff.
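The code-snippet concern is easy to reproduce. The sketch below again uses Python's `re` as a stand-in for MailWasher's regex engine (an assumption) and applies the suggested pattern to an invented support question that quotes ordinary HTML tags.

```python
import re

# Same pattern as suggested earlier in the thread.
NONSENSE_TAG = re.compile(r"<[a-z]{5,}>")

# Invented examples of legitimate mail that mentions HTML.
support_question = ("How do I center this? I tried <center> and <table> "
                    "but the layout breaks.")
short_tags_only = "Use <b> for bold and put <html> at the top."

print(NONSENSE_TAG.findall(support_question))  # real tags of 5+ letters are hit
print(NONSENSE_TAG.findall(short_tags_only))   # shorter tags slip under the limit
```

So a message quoting `<table>` or `<center>` would trip the filter just as the nonsense tags do, while tags shorter than five letters would not.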
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Sat Oct 11, 2003 5:13 am Post subject: |
|
|
DearWebby, I'm confused. The Regular Expression TimeGhost suggested is intended to key on the nonsense characters between the <> brackets. Your comments seem to suggest that the filter should do something else. Or am I missing something?
|
|
|
 |
DearWebby
Lieutenant

 Joined: Oct 03, 2003 Posts: 262 Location: Canada
|
Posted: Sat Oct 11, 2003 5:36 am Post subject: |
|
|
Ikester, he is proposing to use the occurrence of brackets and western alphabet characters as a trigger to delete mail.
If somebody sends him a tech support question and has an example of HTML in it, that would trigger the filter, the mail would be deleted, and the sender would think he is being ignored. The same would occur with mathematical formulas. They should not trigger a deletion and blacklisting.
Instead of using that as a trigger, the brackets and the stuff between them should simply be ignored for filtering purposes. That way the spread words will be shrunk together and can be seen by the filter the same way you see them in the list.
Ideally, that shrinking should be done at the program level and prefaced to any filter, not manually added as an "OR" choice to a filter.
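A minimal sketch of that "shrink first, then filter" arrangement might look like this. It is not an existing MailWasher feature; `shrink` and `matches_any` are hypothetical names, and the bracket-removal rule is the crude one described in the post.

```python
import re

# Brackets plus whatever sits between them, whatever the characters are.
BRACKETED = re.compile(r"<[^<>]*>")


def shrink(body: str) -> str:
    """Remove bracketed runs so split-up words close up again."""
    return BRACKETED.sub("", body)


def matches_any(body: str, keywords: list[str]) -> bool:
    """Apply the shrink step once, then run every keyword rule on the result."""
    visible = shrink(body).lower()
    return any(keyword.lower() in visible for keyword in keywords)


sample = ("E<kehejlbcuqxcn>nl<khoejladpwslyp>arg<kfntuzbcoyml>em"
          "<kgkkmlgdeqz>e<kfrnvwvabwitlsp>n<koguobsroxvhibi>t")

print("enlargement" in sample.lower())       # raw body: keyword is not found
print(matches_any(sample, ["enlargement"]))  # after shrinking: keyword is found
```

Because the shrink step runs before any rule, individual filters stay plain keyword lists and need no bracket handling of their own.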
|
|
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10629
|
Posted: Sat Oct 11, 2003 5:50 am Post subject: |
|
|
Again, I'm not a filter expert here... but wouldn't just ignoring the brackets and the text between them make it easy to give us a spam surprise, simply by putting brackets around the text in a message with a non-HTML MIME type?
|
|
|
 |
|
|