denn988
Guest IP: 66.44.*.*
|
Posted: Mon Dec 22, 2003 4:07 am Post subject: |
|
|
| Eggman5X wrote: | | denn988 wrote: | Your proposal would be extremely difficult for MWP to implement, and not worth any software writer's effort.
The reason....You can place any kind of 'Title' in the header that you can imagine. There would be no possible way for MWP or any other software vendor to be able to give you all those choices as 'canned' options.
There is also nothing that requires such a 'Title' as "X-Note:" to be contained in the header. You may see that often because your ISP is putting it there...but how is MWP to know what the many thousands of ISPs all over the world may be adding to the headers. |
Bulldroppings!
MW doesn't NEED to know all the possibilities. Anything that is not in the current list - "The 'From' field", etc. - is considered a user header and accepted without validation. If a matching header is found in a message, the expression is applied to the "content" or "value" of the header. If not, the filter is ignored.
Likewise, MW doesn't need to know what the possibilities are in a message. It simply parses the headers, splitting on the RFC required colon-space. Whatever's on the left is the header 'name', what's on the right is the value.
If you wish to debate the simplicity further, please PM. |
The addition of a text box to insert a header 'field' in another place on the GUI, or re-writing the GUI to accept any arbitrary header 'field' in the current 'fields' options would be a total waste of programming time. The option for any header 'field' you could possibly want to look into is already available through the proper use of RegExps.
I am not going to debate what would be a total waste of programmer time with you for a concept that can easily be done with the program as written. Your idea would not speed up the process one bit, and it would waste valuable codewriting time that could be spent developing more worthwhile features.
| Eggman5x wrote: |
| denn988 wrote: | | If you really want to speed up your RegExp filters, add the ^ character at the beginning of the line when searching for a header 'Title' as follows: |
Good point here. Especially when filtering "the entire header".
| denn988 wrote: | | For example. If the line that you posted will never go on past the first newline, you could re-write it as follows: |
As I understand the RFC, the "value" of any header may span any number of multiple lines as long as each line after the first begins with a space. Therefore any limiting strategy is arbitrary and can be defeated by the simple insertion of additional lines containing at least one space.
| denn988 wrote: | | If the text contained in that line would never contain a :' you could use that as a limiter as follows: |
The pattern in question contains (.+?) which means anything might or might not be in between the "header title" and the part of the string I'm interested in and that its length is variable. There is no known, always present value that tells me I can stop looking, except as you point out later, the next "header title". But even then the check would be for "^\w" not ": " because ": " could be a legitimate part of the "value".
But thanks for commenting, anyway. |
You seem concerned about speeding up your filters. I offered a suggestion that would help by terminating the RegExp in those cases where the following target string is not present. I use terminations of some kind or another on all of my filters and they speed through the messages. In the past, I used RegExps that were not terminated as explained in my previous post, and the results were that the filters would take much longer to process....
Using (.+?) in your filter without a terminator will result in the expression searching completely through the test string if the next target is not present. That suggestion was only intended to be good filter writing advice. Take it as you wish.
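To make the "terminator" advice concrete, here is a minimal Python sketch. It assumes an engine in which `.` can match line breaks (`re.DOTALL`); whether MailWasher's engine behaves that way is not stated in this thread, and the sample header text is invented for illustration.

```python
import re

# Invented sample: the X-Note line does NOT carry the tag, but a later line does.
HEADERS = (
    "Received: from mail.example.net\r\n"
    "X-Note: checked by example scanner\r\n"
    "Subject: hello [NO REVERSE DNS] is just text here\r\n"
)

# Unterminated: the lazy group may run past the end of the X-Note line,
# so the engine keeps scanning the rest of the header block.
unterminated = re.compile(r"X-Note:(.+?)\[NO REVERSE DNS\]", re.DOTALL)

# Terminated: anchored at the start of a line, and the gap may not cross CR or LF.
terminated = re.compile(r"^X-Note:[^\r\n]*?\[NO REVERSE DNS\]", re.MULTILINE)


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(hits(unterminated, HEADERS))  # matches, via the Subject line
print(hits(terminated, HEADERS))    # no match: search stops at the end of X-Note
```

The terminated form gives up at the end of the X-Note line instead of scanning onward. It would not look into a folded continuation line, which is the trade-off raised earlier in the thread.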
As far as your concern about: | Eggman5x wrote: | As I understand the RFC, the "value" of any header may span any number of multiple lines as long as each line after the first begins with a space. Therefore any limiting strategy is arbitrary and can be defeated by the simple insertion of additional lines containing at least one space (emphasis added).
|
Anyone who tried to use that kind of strategy to defeat a filter would be caught so easily by another filter tuned for just such a tactic.
As far as 'Bulldroppings!' goes.
I will remember that. You can be assured that I will no longer offer YOU any useful advice, nor try to answer any of your questions if that is your attitude.
|
|
|
 |
Whisperer
Sergeant

 Joined: Mar 29, 2003 Posts: 134 Location: USA
|
Posted: Mon Dec 22, 2003 5:38 am Post subject: |
|
|
| stan_qaz wrote: | | -snip- ...adding a new filter tool for the "url in body" trap that I and so many others have found simple and effective... -snip- |
I'd love to know what that is.
Again, I'm trying to use filters in a hide-and-delete and no-false-positives manner by manually pasting URLs from spam bodies into filters. I'll see if the effort proves worth the time. I'd love to know a reliable shortcut!!!
The thing is, NO FALSE POSITIVES is my first priority.
IMHO, if MW were to add a very easy way to do the following, it might be a very useful addition:
1. Search the raw source body for all instances of "http" or "www."
2. Highlight as much of the URL as is found and, with a click, add it to a special filter that can have many more than ten trigger strings and that knows it should look in the body.
Not in this lifetime, I know. Just daydreaming.
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Mon Dec 22, 2003 7:35 am Post subject: |
|
|
| Eggman5X wrote: | | I still think - as I think I suggested 6 months ago - the single, most effective and easiest thing to implement would be to allow any individual header to be specified in the first part of the filter rule. |
I take it that when you say "header" you are in fact referring to each 'parameter' in the message header?
| Eggman5X wrote: | | It would ease the use of MW with other tools (including the beloved POPfile) without having to scan the "entire header". |
Not sure how POPfile gets into the picture here but it really doesn't take a lot of time to check the header. That part of the message is what ... perhaps 20 lines max?
| Eggman5X wrote: | | It would likewise make it easier to take advantage of the increasing number of tools being implemented by ISPs. Mine, for example, adds at least 6 headers to every message, including the result of a reverse DNS lookup on the originating IP. |
Keep in mind that my ISP isn't likely to format the header 'parameters' the same way your ISP happens to do it.
| Eggman5X wrote: | Right now I filter for | Code: | | entire header containsRE X-Note:(.+?)[NO REVERSE DNS] | |
In a regex, the [ and ] brackets are used to specify character classes. I would think you need to structure the regex as:
| Code: | | entire header containsRE X-Note:(.+?)\[NO REVERSE DNS\] |
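A short Python sketch of why the escaping matters. Python's `re` module stands in for MailWasher's regex engine here, which is an assumption, and the sample header values are invented.

```python
import re

# Unescaped: [NO REVERSE DNS] is a character class matching ONE character
# out of N, O, space, R, E, V, S, D.
unescaped = re.compile(r"X-Note:(.+?)[NO REVERSE DNS]")

# Escaped: the brackets are literal, so the whole tag must be present.
escaped = re.compile(r"X-Note:(.+?)\[NO REVERSE DNS\]")

clean = "X-Note: Sender verified"                # invented, tag absent
tagged = "X-Note: [NO REVERSE DNS] for 192.0.2.1"  # invented, tag present

print(bool(unescaped.search(clean)))   # the class matches the 'S' in "Sender"
print(bool(escaped.search(clean)))
print(bool(escaped.search(tagged)))

# re.escape() builds the literal form without hand-typing the backslashes.
literal = re.compile(re.escape("[NO REVERSE DNS]"))
print(bool(literal.search(tagged)))
```

The unescaped version would trigger on almost any X-Note line that contains an upper-case letter from the class, so it is a false-positive risk rather than just a slow filter.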
| Eggman5X wrote: | but I'm sure | Code: | | X-Note contains [NO REVERSE DNS] | would process much faster, especially since X-Note usually appears in one of the last 3 lines of headers. |
I don't follow this. Why would this process faster? ... Or does Denn988's suggested use of the ^ regex feature supersede this?
| Eggman5X wrote: | This would not require a change to the layout of the filters.txt file, as any value other than the current 7 "The"'s could be considered a header "name". The UI could use a "combo" box for entry instead of a "list" box. Internally MW could build an indexed array of all headers in a message using the header "name" as the index.
Am I missing something or could it really be that easy? |
Sounds like an interesting idea but I'm just not so sure that it buys you all that much.
Edited the filter I proposed. (Added a space and then took it back out.)
Last edited by Ikeb on Mon Dec 22, 2003 9:24 am, edited 2 times in total |
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Mon Dec 22, 2003 8:27 am Post subject: |
|
|
| Eggman5X wrote: | Bulldroppings!
MW doesn't NEED to know all the possibilities. Anything that is not in the current list - "The 'From' field", etc. - is considered a user header and accepted without validation. If a matching header is found in a message, the expression is applied to the "content" or "value" of the header. If not, the filter is ignored.
Likewise, MW doesn't need to know what the possibilities are in a message. It simply parses the headers, splitting on the RFC required colon-space. Whatever's on the left is the header 'name', what's on the right is the value.
If you wish to debate the simplicity further, please PM. |
While I agree with you that this doesn't seem all that difficult, it would be one more thing that takes time to do. I just don't see the value in doing something like this that can be handled equally well with the current regex search engine ... assuming it is "tuned" as suggested by Denn988. I'd much prefer FireTrust spend their time developing other features.
BTW, why PM stuff like this? Wouldn't it be better to discuss in open forum?
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Mon Dec 22, 2003 9:16 am Post subject: |
|
|
| Whisperer wrote: | | stan_qaz wrote: | | -snip- ...adding a new filter tool for the "url in body" trap that I and so many others have found simple and effective... -snip- |
IMHO, if MW were to add a very easy way to do the following, it might be a very useful addition:
1. Search the raw source body for all instances of "http" or "www."
2. Highlight as much of the URL as is found and, with a click, add it to a special filter that can have many more than ten trigger strings and that knows it should look in the body.
Not in this lifetime, I know. Just daydreaming. |
Why say that? FireTrust has been responsive to ideas expressed here. They can't do everything at once though. But if there's enough folks chanting for it.....
WRT adding the URL to the blacklist, the current 'Normal view' shows the links being referenced already. When clicking on the link a window pops up that warns that the link to be opened may be harmful and gives a choice as to whether to continue. I'd say that window could be modified to add a button to drop the URL into the blacklist. Also, a right-click menu item could be added when moused over a link that does the same thing.
WRT what gets added to the list .... that should normally be just the domain name part. An option should allow wildcarding of subdomains without the current email address wildcarding dilemma (i.e. automatically remove any previously blacklisted subdomains now covered by the wildcard). When a subaccount is indicated with a subdomain.domain.TLD/~accountname/etc... everything to the accountname should be the default URL. An option to wildcard accountnames could be allowed, again with the provision to clear out any listed URLs covered by the wildcard.
The list itself could also be added to or edited directly from the SPAM Tools and IMO is important enough to warrant a tab in the filter sidebar. An import/export function would be a good idea so that MWP users can share SPAMversized site URLs.
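A rough Python sketch of the domain-level blacklist described above. The function names and the `*.domain` wildcard notation are hypothetical, chosen only for illustration; nothing here reflects how MailWasher actually stores its lists, and the `/~accountname` case is not covered.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if there is none."""
    if "://" not in url:
        url = "http://" + url  # handle bare "www.example.com/..." links
    return (urlsplit(url).hostname or "").lower()


def is_blacklisted(url, blacklist):
    """Check a URL's host against exact entries and '*.domain' wildcards."""
    host = host_of(url)
    for entry in blacklist:
        if entry.startswith("*."):
            base = entry[2:]
            if host == base or host.endswith("." + base):
                return True
        elif host == entry:
            return True
    return False


def add_wildcard(blacklist, domain):
    """Add '*.domain' and drop older entries that the wildcard now covers."""
    kept = {e for e in blacklist
            if not (e == domain or e.endswith("." + domain))}
    kept.add("*." + domain)
    return kept


old = {"pills.example.com", "shop.example.com", "other.test"}
new = add_wildcard(old, "example.com")
print(sorted(new))
print(is_blacklisted("http://www.example.com/~acct/page.html", new))
```

Matching on `"." + base` rather than a plain suffix keeps a wildcard for example.com from also catching an unrelated host such as notexample.com.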
|
|
|
 |
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Mon Dec 22, 2003 9:34 am Post subject: |
|
|
| Ikeb wrote: | | Eggman5X wrote: | | I still think - as I think I suggested 6 months ago - the single, most effective and easiest thing to implement would be to allow any individual header to be specified in the first part of the filter rule. |
I take it that when you say "header" you are in fact referring to each 'parameter' in the message header? |
If by parameter you mean "X-user-defined-header: some data", yes.
| Ikeb wrote: | | Eggman5X wrote: | | It would ease the use of MW with other tools (including the beloved POPfile) without having to scan the "entire header". |
Not sure how POPfile gets into the picture here but it really doesn't take a lot of time to check the header. That part of the message is what ... perhaps 20 lines max? |
POPfile uses headers to "classify" the messages, correct? Such as "X-Text-Classification: spam". The rest is basic math. It is going to take less time to apply any string comparison or regex to one line of data than to 20. The amount of time saved is proportionate to the number of messages, the number of hits, and the relative position of the desired text within the "entire header".
| Ikeb wrote: | | Eggman5X wrote: | | It would likewise make it easier to take advantage of the increasing number of tools being implemented by ISPs. Mine, for example, adds at least 6 headers to every message, including the result of a reverse DNS lookup on the originating IP. |
Keep in mind that my ISP isn't likely to format the header 'parameters' the same way your ISP happens to do it. |
Ahh, but they must, Most Honorable Ikeb-san. The format is defined by an RFC. (Sorry, I don't know the specific number off the top of my feeble little head). There are the standard headers - From:, To:, Received:, Subject:, etc. - optional headers - Message-Id:, Content-type:, etc. - and user-defined headers, which by agreed to convention are of the form "X-something: data". The header "title" or "name" must begin in column 1, and be followed by a colon and a space, then the data. The data may be split over multiple lines, by leaving the first column blank. Even MW relies on adherence to this format.
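The format described above can be sketched in a few lines of Python. This is an illustration only, not MailWasher's parser: it splits on the first colon (which also covers the colon-space form) and treats a line starting with a space or tab as a continuation of the previous header. The sample message is invented.

```python
def parse_headers(raw):
    """Split a raw header block into (name, value) pairs, unfolding continuations."""
    pairs = []
    for line in raw.splitlines():
        if not line:
            break  # blank line: end of headers, start of body
        if line[0] in " \t" and pairs:
            # Continuation line: append to the value of the previous header.
            name, value = pairs[-1]
            pairs[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            pairs.append((name.strip(), value.strip()))
    return pairs


SAMPLE = (
    "Subject: hello\r\n"
    "X-Note: first part\r\n"
    " second part\r\n"
    "Received: a\r\n"
    "Received: b\r\n"
    "\r\n"
    "body: not a header\r\n"
)

for name, value in parse_headers(SAMPLE):
    print(name, "=>", value)
```

A list of pairs is used instead of a plain dictionary so that repeated names such as Received are not lost.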
| Ikeb wrote: | | Eggman5X wrote: | Right now I filter for | Code: | | entire header containsRE X-Note:(.+?)[NO REVERSE DNS] | |
In a regex, the [ and ] brackets are used to specify character classes. I would think you need to structure the regex as:
| Code: | | entire header containsRE X-Note:(.+?)\[NO REVERSE DNS\] |
|
You're absolutely right. My mistake. Should have done a copy/paste instead of trying to type from memory.
| Ikeb wrote: | | Eggman5X wrote: | but I'm sure | Code: | | X-Note contains [NO REVERSE DNS] | would process much faster, especially since X-Note usually appears in one of the last 3 lines of headers. |
I don't follow this. Why would this process faster? ... Or does Denn988's suggested use of the ^ regex feature supersede this? |
While the use of the "^" anchor will speed things up, there is still the question of skipping past n headers to get to "^X-Note: ". What I'm suggesting is an internal, indexed array which would immediately access the corresponding value, if present, or return 'false' if no such header was in the message. For those who are familiar with writing CGI programs, it is very similar to the standard approach for processing name=value pairs of form data.
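As a sketch of that indexed-array idea, the following Python uses the standard library's `email` parser to read the headers and then builds a dictionary keyed by lower-cased header name. The helper names `build_index` and `header_contains` are hypothetical and the message is invented; how MailWasher would do this internally is not known from this thread.

```python
from email import message_from_string

RAW = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Note: [NO REVERSE DNS] for 192.0.2.1\n"
    "\n"
    "body\n"
)


def build_index(msg):
    """Map lower-cased header names to the list of their values."""
    index = {}
    for name, value in msg.items():
        index.setdefault(name.lower(), []).append(value)
    return index


def header_contains(index, name, needle):
    """Plain 'contains' test on one named header; False if the header is absent."""
    return any(needle in value for value in index.get(name.lower(), []))


index = build_index(message_from_string(RAW))
print(header_contains(index, "X-Note", "[NO REVERSE DNS]"))
print(header_contains(index, "X-Text-Classification", "spam"))
```

The index is built once per message, after which each rule is a dictionary lookup plus a substring test on a single value, with no regex and no escaping of the brackets.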
| Ikeb wrote: | | Eggman5X wrote: | This would not require a change to the layout of the filters.txt file, as any value other than the current 7 "The"'s could be considered a header "name". The UI could use a "combo" box for entry instead of a "list" box. Internally MW could build an indexed array of all headers in a message using the header "name" as the index.
Am I missing something or could it really be that easy? |
Sounds like an interesting idea but I'm just not so sure that it buys you all that much. |
Again, it depends on the number of tests, number of messages, etc. But what it does offer in all cases, is an extremely flexible means of accessing a particular substring of "The entire header". And since the headers must conform to the protocol it is forward-compatible for any header added by any program or service that complies with the POP3 protocol, which we're likely to be using for some time.
The problem of SPAM is being attacked from many angles. And at this time the biggest effort is being made by ISPs who are implementing all sorts of spam-hunting/killing tools on their servers. For many users this will make MW redundant and obsolete, as they will simply let their ISP's tools auto-delete whatever they see fit to consider spam. Some ISPs may choose to not even give the end-user a choice.
Those of us who hang out here, I dare say, are those who are not quite so ready and willing to let someone else make our spam/not spam decisions for us. Even so our configurations vary widely, and are likely to become more so every day. What I think I'm suggesting is a way to make it easy to take maximum advantage of whatever additional tools any of us might have available, without negatively impacting MW's performance.
I'm sure there are many who are not getting the results they could get from MW, or perhaps not using it at all, because they are put off by those scary, weird looking regexes. Heck, even those of us who are competent or better complain about the struggle to get just the right regex from time to time. What I'm seeing - or hallucinating, perhaps - is that many filters that are now "containsRE" could be expressed in the more easily-understood "contains" form. Note also that MW does not even define all the standard headers in the filter rules dropdown list, such as the oft referenced "Content-transfer-encoding:"
Obviously it would be most desirable to do some performance-testing analysis of this technique before putting it in a public release. Is there an overhead to the indexed array that I am not aware of? Perhaps. I don't claim to be an expert on modern OS/compiler/interpreter internals. But I'm sure that one thing that was deeply-drilled into our heads back in my IBM/360 assembler classes has not changed: There is no operation more time consuming than a string within a string comparison.
If someone can prove me wrong, please feel free to do so. _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
|
 |
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Mon Dec 22, 2003 9:44 am Post subject: |
|
|
| Ikeb wrote: | | BTW, why PM stuff like this? Wouldn't it be better to discuss in open forum? |
I was inviting denn988 to PM if he/she wished to continue his/her personal assault on my understanding of programming techniques as evidenced by the remark:
| denn988 wrote: | | Your proposal would be extremely difficult for MWP to implement, and not worth any software writer's effort. |
After 30 years of writing code in dozens of languages from assembler to C++, perl, and Java - and being paid quite nicely for it - I do know perhaps a little bit whereof I speak. If anyone wants to challenge that I suggest it would be in everyone's best interests to do so privately. _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
|
 |
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Mon Dec 22, 2003 10:50 am Post subject: |
|
|
M2C re: "url in body" trap, if I may ...
I have been tempted to do this many times myself, but as yet have not found a message that would be caught by this method that would not be caught by some less-specific means. Even so, if it were available as a drag/drop, or right-click "add to list" option, I might give it a test drive. Although I would suggest a separate "body blacklist" would perform better than mingling these items with the "from blacklist".
There are a few other things that may be worth keeping in mind here though.
First, spammers are going to change URLs as frequently as most of us change underwear. Maybe more often. So this is a list which will likely fill quickly but where individual entries will have a relatively short useful lifespan.
Second, as has been discussed in other threads, there are many ways of obfuscating a URL. This can only mean much more time spent "selecting" URLs to blacklist, which has also been commented on, and which seems in direct opposition to what most of us seek to gain by using MW in the first place, which is less time spent dealing with spam.
Third is the ever-present use of Base64 and quoted-printable encoding, both of which are going to make such a list less "productive".
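The encoding point can be shown with a small Python sketch: a URL that is obvious in plain text is invisible to a body filter until the transfer encoding is undone. The helper name `decode_body` and the sample text are invented, and whether MailWasher decodes bodies before filtering is not stated here.

```python
import base64
import quopri
import re


def decode_body(payload, encoding):
    """Undo a Content-Transfer-Encoding; other encodings pass through unchanged."""
    enc = encoding.strip().lower()
    if enc == "base64":
        return base64.b64decode(payload)
    if enc == "quoted-printable":
        return quopri.decodestring(payload)
    return payload


URL_RE = re.compile(rb"https?://[^\s\"'<>]+")

plain = b"Visit http://spam.example.com/buy now"
b64 = base64.b64encode(plain)                    # same text, Base64-encoded
qp = b"Visit http://spam.example=2Ecom/buy now"  # '.' written as =2E

print(URL_RE.search(b64))                        # nothing recognisable in raw Base64
print(URL_RE.findall(decode_body(b64, "base64")))
print(URL_RE.findall(decode_body(qp, "quoted-printable")))
```

A blacklist entry for the plain domain would miss both encoded forms unless the body is decoded first.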
As an alternative, here is my latest version of "Removers" which I find particularly valuable. I suggest placing it near the bottom of your filters list to start and gradually moving it higher if you like the results.
| Code: | [enabled],Removers,Removers,6815952,OR,Delete,
Body,containsRE,\/remove\.(asp|cfm|cgi|html|jsp|php),
Body,containsRE,\/un?sub(scribe)?\.(asp|cfm|cgi|html|jsp|php),
Body,containsRE,\/opt-?out\.shtml,
Body,containsRE,mailto:(.+?)\?subject=(opt-?out|remove|unsub(scribe)?),
Body,containsRE,(^|\W+)To\W+Be\W+Removed(\W+|$)
|
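For anyone who wants to try the "Removers" expressions outside MailWasher, here is a minimal Python harness. It copies the five patterns from the filter above and ORs them together. Case-insensitive matching is an assumption, Python's `re` may differ from MailWasher's engine, and the sample bodies are invented.

```python
import re

# The five expressions from the "Removers" filter, combined with OR.
REMOVERS = [
    r"\/remove\.(asp|cfm|cgi|html|jsp|php)",
    r"\/un?sub(scribe)?\.(asp|cfm|cgi|html|jsp|php)",
    r"\/opt-?out\.shtml",
    r"mailto:(.+?)\?subject=(opt-?out|remove|unsub(scribe)?)",
    r"(^|\W+)To\W+Be\W+Removed(\W+|$)",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in REMOVERS]


def is_remover(body):
    """Return True if any of the Removers expressions matches the body."""
    return any(p.search(body) for p in COMPILED)


samples = [
    "Click http://example.com/remove.php to stop these mails",
    "http://example.com/unsub.cgi?id=1",
    "mailto:list@example.com?subject=remove",
    "To be removed from our list, reply to this message",
    "Last week I had my doctor remove one lung and one kidney",
]
for text in samples:
    print(is_remover(text), "-", text)
```

The last sample uses the word "remove" in ordinary prose and is not caught, because every expression requires either a script-style URL, a mailto subject, or the full phrase "to be removed".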
Happy spam-hunting to all, and to all a goodnight.
 _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Mon Dec 22, 2003 4:41 pm Post subject: |
|
|
| Eggman5X wrote: | | Ikeb wrote: | BTW, why PM stuff like this? Wouldn't it be better to discuss
in open forum? |
I was inviting denn988 to PM if he/she wished to continue his/her personal assault on my
understanding of programming techniques as evidenced by the remark:
| denn988 wrote: | Your proposal would be extremely difficult for MWP to implement, and not worth
any software writer's effort. |
|
Eggman,
That was not a personal assault. It may have been a poorly written statement that could have been interpreted as such, but it was not intended in that manner. That does explain the 'Bulldroppings' comment though, as a response to a perceived personal attack. Given that I responded in kind...I apologize for that.
| Eggman5X wrote: |
After 30 years of writing code in dozens of languages from assembler to C++, perl, and Java - and being paid quite nicely for it - I do know perhaps a little bit whereof I speak. If anyone wants to challenge that I suggest it would be in everyone's best interests to do so privately. |
While my overall programming experience may not be as extensive as yours (I am primarily a hardware developer) I have done quite a bit of programming in the past 27 years myself. The first programs I wrote had to be entered into the computer in binary format by flipping toggle switches, one 16-byte word at a time. Most tedious.
I am not trying to challenge your abilities in programming here. I was just trying to point out that the usefulness of the code has to be weighed against the time it takes to write it, and the priority given to the function that it is accomplishing.
It is possible that MWP is using some other form than simple text comparisons to parse the headers into the 'fields' that they allow the user to look into. They could conceivably be doing that function as you suggest, in a binary form...creating lookup tables to begin and end a text search within a header field.
The headers are REQUIRED to be in US-ASCII and to follow the form:
- Beginning of text (or after the HEX code sequence 0D0A (CR LF)) - followed by
- any number of printable ASCII characters (HEX code 21 to 7E; 20 (a space) is excluded because at the beginning of the line it would signify a 'foldover' of the last line) - followed by
- the ASCII character ':' (HEX code 3A). The presence of this character terminates the header 'Title'.
They could then set up a lookup table for header titles that contains a pointer to the locations within the header text (by header 'Title') that contain the 'data' for that 'Title'
They could also continue to parse the header for that 'Title' to find the end point of the 'data' - by looking for the specific hex code combination of 0D0A followed by
- 21 to 7E (CR LF followed by any printable ASCII character other than a 'space').
The position just prior to that HEX code combination could be placed in the lookup table as the pointer that
indicates the end of the 'data' field for that title.
The user could do as you suggest...enter any header 'Title' they wish and MWP could immediately find the field (based on the pointers in the lookup table) and perform the string match specified by the rule (contains, contains RegExpr, etc).
NOTE:
The above sequences are to be considered as examples only. The 'space' character would include
the TAB (HEX 09) and the FORMFEED (HEX 0C). Also....The sequence 0D0A0D0A would signify the
end of the 'header', and the beginning of the 'body'.
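The pointer-table scheme outlined above might look roughly like this in Python. It is a sketch under simplifying assumptions: line breaks are always CR LF, only space and tab count as fold characters, and the function names and sample message are invented. It is not a description of what MWP actually does.

```python
def build_offset_table(raw):
    """Map each header name to a list of (start, end) offsets of its data in raw."""
    table = {}
    end_of_headers = raw.find(b"\r\n\r\n")  # 0D0A0D0A: header/body boundary
    if end_of_headers == -1:
        end_of_headers = len(raw)
    pos = 0
    while pos < end_of_headers:
        line_end = raw.find(b"\r\n", pos, end_of_headers)
        if line_end == -1:
            line_end = end_of_headers
        colon = raw.find(b":", pos, line_end)  # 3A terminates the 'Title'
        # Extend the field across folded lines (CR LF followed by space or tab).
        field_end = line_end
        while (field_end < end_of_headers
               and raw[field_end + 2:field_end + 3] in (b" ", b"\t")):
            nxt = raw.find(b"\r\n", field_end + 2, end_of_headers)
            field_end = nxt if nxt != -1 else end_of_headers
        if colon != -1:
            name = raw[pos:colon].decode("ascii").lower()
            table.setdefault(name, []).append((colon + 1, field_end))
        pos = field_end + 2
    return table


def field_contains(raw, table, name, needle):
    """Search only inside the byte ranges recorded for the named header."""
    return any(needle in raw[s:e] for s, e in table.get(name.lower(), []))


RAW = (b"Subject: hi\r\n"
       b"X-Note: part one\r\n\tpart two\r\n"
       b"To: me\r\n\r\nbody: x")
table = build_offset_table(RAW)
print(table)
print(field_contains(RAW, table, "X-Note", b"part two"))
```

Because each search is confined to a recorded (start, end) range, a rule on one 'Title' cannot run on into the next header, so no explicit terminator is needed in the expression.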
I want you to know that as I have been writing this out, I have seen the advantages to such a method. You would no longer be required to 'terminate' the RegExp for use in the header, as the field is already pre-limited by the lookup table.
The creation of the lookup table would also be done much faster than parsing based on 'string' searches, so there would also be that advantage.
In light of what I now understand concerning your suggestion...I withdraw any arguments that I have made against it. It would probably be worth the effort to do it, because the advantages gained by simplifying the header 'Title' search would be well worth the codewriting time that it would take to implement...and the extra 'box' that may be required in the GUI for such a feature would be justified by the overall improvement in user friendliness.
|
|
|
 |
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Mon Dec 22, 2003 9:21 pm Post subject: |
|
|
| denn988 wrote: | | That was not a personal assault. It may have been a poorly written statement that could have been interpreted as such, but it was not intended in that manner. That does explain the 'Bulldroppings' comment though, as a response to a perceived personal attack. Given that I responded in kind...I apologize for that. |
Accepted. And my apology to you for any misinterpretation and overreaction on my part
As to the rest, yeah, that's what I meant. Perhaps my first explanation was a little too terse. I never was that good at writing specs for others to code. _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Mon Dec 22, 2003 9:56 pm Post subject: |
|
|
| Eggman5X wrote: | | denn988 wrote: | | That was not a personal assault. It may have been a poorly written statement that could have been interpreted as such, but it was not intended in that manner. That does explain the 'Bulldroppings' comment though, as a response to a perceived personal attack. Given that I responded in kind...I apologize for that. |
Accepted. And my apology to you for any misinterpretation and overreaction on my part
As to the rest, yeah, that's what I meant. Perhaps my first explanation was a little too terse. I never was that good at writing specs for others to code. |
Accepted here also.
I understand your suggestion now, but I had to put myself through the exercise of modeling it before I did understand. All in all I now think that it would be an excellent investment in programming time for Firetrust to work on.
I also understand where you are coming from as far as writing specs for others to code. I have developed software models that demonstrate any and all functions that the targeted 'hardware' will provide, including the full repertoire of commands and responses associated with the hardware. I even demonstrate any self-troubleshooting capabilities that may or may not be readily apparent.
The 'real' (in other words 'Professional') programmers can use that model to understand, and take advantage of, all the possibilities that are available when they write the actual 'operational' software for that hardware.
Building those software models is also 'most' tedious, and requires extensive comments in the 'source' code of the model. The programmers do appreciate it though, because most (not all) of them are not hardware oriented. They do appreciate a 'hardware type' that can put together such a software model for them. It makes their job a lot easier.
|
|
|
 |
Whisperer
Sergeant

 Joined: Mar 29, 2003 Posts: 134 Location: USA
|
Posted: Mon Dec 22, 2003 11:34 pm Post subject: |
|
|
| Eggman5X wrote: | | M2C re: "url in body" trap, if I may ... I have been tempted to do this many times myself, but as yet have not found a message that would be caught by this method that would not be caught by some less-specific means. -snip- |
I can appreciate that, plus please remember two things:
1. You have about a zillion times more programming experience and understanding of writing expressions for MWP filters than me or the vast majority of MWP users.
2. The risk of false positives when dealing with filters that hide and delete is not acceptable for a business owner like myself, and even your "remove" filters are likely to lose some wanted if not important mail.
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Tue Dec 23, 2003 1:12 am Post subject: |
|
|
| Ikeb wrote: | | Eggman5X wrote: | Right now I filter for | Code: | | entire header containsRE X-Note:(.+?)[NO REVERSE DNS] | |
In a regex, the [ and ] brackets are used to specify character classes. I would think you need to structure the regex as:
| Code: | | entire header containsRE X-Note:(.+?)\[NO REVERSE DNS\] |
|
Good catch Ikeb...
I was looking at another aspect of the expression and did not notice that myself.
| Ikeb wrote: | | Eggman5X wrote: | but I'm sure | Code: | | X-Note contains [NO REVERSE DNS] | would process much faster, especially since X-Note usually appears in one of the last 3 lines of headers. |
I don't follow this. Why would this process faster? ... Or does Denn988's suggested use of the ^ regex feature supersede this?
| Eggman5X wrote: | This would not require a change to the layout of the filters.txt file, as any value other than the current 7 "The"'s could be considered a header "name". The UI could use a "combo" box for entry instead of a "list" box. Internally MW could build an indexed array of all headers in a message using the header "name" as the index.
Am I missing something or could it really be that easy? |
Sounds like an interesting idea but I'm just not so sure that it buys you all that much.
|
Ikeb,
After examining what Eggman suggested more closely (see my post above) I have to agree with him on this one. His suggestion would indeed make searching through headers easier. By limiting the portion of the header searched to the individual 'field', an expression 'terminator' would not be required.
Note: RegExp filters to be used on the body would still benefit from the use of a 'terminator'.
There is one thing that I think would be an additional improvement on his idea. It would be the way the 'Received:' fields are handled.
Unlike most other fields in the header, the 'Received:' fields will usually occur multiple times in the header, and each time is meaningful on its own. It would be most desirable to make certain that if there are multiple instances of the same named field, that MWP's filters would search through each instance of the named 'field' separately.
You would want to be sure that each instance of a named field was looked into for a match....at least to the point that no prior instance of that field returned a positive (triggered the rule).
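A short Python sketch of the repeated-field case, using the standard library's `email` module. The helper name and the sample Received lines are invented; the point is only that a single-value lookup sees the first instance, while the rule should be tried against every instance until one triggers.

```python
from email import message_from_string

RAW = (
    "Received: from relay.example.net by mx.example.org\n"
    "Received: from unknown (HELO spammer) by relay.example.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def any_instance_contains(msg, name, needle):
    """Test each instance of a repeated header; stop at the first hit."""
    for value in msg.get_all(name, []):
        if needle in value:
            return True
    return False


msg = message_from_string(RAW)
print(len(msg.get_all("Received", [])))          # both instances are available
print("HELO spammer" in msg.get("Received"))     # get() returns only the first
print(any_instance_contains(msg, "Received", "HELO spammer"))
```

Stopping at the first hit matches the behaviour described above: later instances only need checking while no earlier instance has triggered the rule.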
Between Eggman's suggestion and the addition of 'Sub-Rules' to the filter scheme you could have a very powerful filter tool available that each user could easily tailor to their own situation.
|
|
|
 |
Eggman5X
Captain

 Joined: Mar 13, 2003 Posts: 699 Location: HOU TX USA
|
Posted: Tue Dec 23, 2003 2:03 am Post subject: |
|
|
| Whisperer wrote: | | 2. The risk of false positives when dealing with filters that hide and delete is not acceptable for a business owner like myself, and even your "remove" filters are likely to lose some wanted if not important mail. |
I totally agree on avoiding false positives. But I'd like to repeat here - and elaborate further on - something I just sent you in a PM ...
First, I'm going to assume that you've got a working blacklist, friends list, and a few "legitimate" filters for newsgroups, mailing list subscriptions, etc.
Now obviously if you were going to use "Removers" - or any other "mail is spam" filter - you want it in your filter list after your legitimate mail filters.
So the mail you can definitely recognize as good is going to be caught by your friends list or your "legitimate" filters without "Removers" ever seeing it.
So, what we're concerned about here is false positives on "unsure" mail. Things like first contacts from new customers, and so on.
Say your product is a nutritional supplement. (Where did I get that idea? ) How often does someone write to inquire about your product and use the words "unsubscribe", "remove", "optout" etc.?
| Code: | | I was looking at your product and thought it might help me, but just last week I had my doctor remove one lung and one kidney and am wondering if it would still be safe for me to use with all the other medications I'm taking. What do you think? |
All the time? Really? Ok, you've got a problem ...
But let's go a step further. Look at those spams again. It's not just "remove" or "unsubscribe", but many times it's "remove.php" or "unsubscribe.asp". And unless one wanted to unsubscribe their snake, that's probably not going to give you a false positive any time soon, eh?
One more thing (and then I'll shut up) ...
{collective sigh of relief from audience}
If absolutely zero false positives is your goal, and you don't want any more software to deal with than MW and an email program, then make sure you're using an email program that has its own friends list or filtering scheme and can deliver mail to multiple mailboxes or folders or whatever it wants to call them.
Next, make sure that your friends list in MW is duplicated in your email program, and have it deliver mail from friends to "Now" and the rest to "Later".
Theoretically, if you want to go the POPfile route you can use the classification header POPfile adds to filter the mail to separate destinations in your email program. But as Ikeb has pointed out, then your statistics in POPfile are going to be doo-doo. (Which amuses the daylights out of me since POPfile is Bayesian, and Bayesian is really nothing more than an implementation of a statistical analysis. But there I go picking those nits again ...)
OK. I shut up now. _________________ Lightly scrambled, over-easy and stuffed with all sorts of goodies.
|
|
|
 |