| Author |
Message |
IP: 24.136.*.*
Guest
|
Posted: Thu Nov 27, 2003 5:18 pm Post subject: Filter nonexistent email addresses? |
|
|
How would I filter out nonexistent email addresses? For example, I might get an email from something like this:
?B8L?#LSJK<@>
MWP somehow knows that it's not a real email address, because it shows all that crap as the friendly name, and puts empty brackets for the email address. It doesn't let you blacklist it. Well, I'd like to filter out or blacklist empty email addresses. How do you do this? TIA
-Jeremy
|
|
|
 |
IP: 68.51.*.*
Guest
|
Posted: Thu Nov 27, 2003 11:07 pm Post subject: |
|
|
You can try this filter:
| Code: | | [enabled],"[1] Bad ""From"" (F)","[1] Blank From: Address (F)",255,OR,Delete,From,doesn'tContainRE,"[\w.-]+@([\w-]+\.)+[A-Z]{2,4}" |
It is copied from Gary P.'s filter page here:
http://www.w5hq.com/MailWasher/MailWasherFilters.txt
I do not use this filter, so I cannot say how it performs in real life; you will need to test it to see whether it gives any false positives. The filter sets matches to be deleted because an incomplete From field contains no address that could be blacklisted.
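For anyone who wants to try the pattern outside MailWasher first, here is a minimal Python sketch of what a doesn'tContainRE rule amounts to: the message is flagged when the pattern finds no match anywhere in the From header. The function name `is_flagged` and the example.com address are made up for illustration, and the case-insensitive flag is an assumption about how MailWasher's regex engine treats `[A-Z]`; the thread does not confirm it.

```python
import re

# Pattern copied from the filter line above.
# ASSUMPTION: case-insensitive matching, so [A-Z] also covers lowercase TLDs.
ADDRESS_RE = re.compile(r"[\w.-]+@([\w-]+\.)+[A-Z]{2,4}", re.IGNORECASE)


def is_flagged(from_header: str) -> bool:
    """Mimic a doesn'tContainRE rule: flag the header when no match is found."""
    return ADDRESS_RE.search(from_header) is None


# Header from the first post: nothing usable precedes the "@", so no match.
print(is_flagged("?B8L?#LSJK<@>"))                # True
# Hypothetical well-formed header: the pattern matches, so it is not flagged.
print(is_flagged("Jeremy <jeremy@example.com>"))  # False
```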
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16506
|
Posted: Fri Nov 28, 2003 6:29 am Post subject: |
|
|
| Anonymous wrote: | | The filter sets matches to be deleted because an incomplete From field contains no address that could be blacklisted. |
Presumably one could set a wildcard entry in the blacklist for '?B8L?#LSJK@*' in this case. I don't imagine it would ever be a valid email address.
Do you get repeats of the same address and/or common character patterns Jeremy?
|
|
|
 |
Anonymous, I guess
Guest IP: 67.2.*.*
|
Posted: Fri Nov 28, 2003 7:01 am Post subject: I'm glad I found this post |
|
|
| Anonymous wrote: | You can try this filter:
| Code: | | [enabled],"[1] Bad ""From"" (F)","[1] Blank From: Address (F)",255,OR,Delete,From,doesn'tContainRE,"[\w.-]+@([\w-]+\.)+[A-Z]{2,4}" |
It is copied from Gary P.'s filter page here:
http://www.w5hq.com/MailWasher/MailWasherFilters.txt
I do not use this filter, so I cannot say how it performs in real life; you will need to test it to see whether it gives any false positives. The filter sets matches to be deleted because an incomplete From field contains no address that could be blacklisted. |
I'm glad I found this post because I found that filter and I do not believe it does what is intended. I have two issues with it:
- According to the ICANN website, there are legal TLDs of more than 4 letters (.museum).
- The length limit doesn't really work anyway: as soon as the pattern finds a presumed TLD of 2-4 characters it counts as a match, because the pattern isn't anchored to the end of the string, so it fails to exclude TLDs that are too long to be legal.
With simple modifications these problems are resolved and
| Code: |
[\w.-]+@([\w-]+\.)+[A-Z]{2,4}
|
becomes
| Code: |
[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>$
|
The ">$" prevents something like "no_one@nowhere.way_too_long_tld" from being accepted as valid. The 6 instead of 4 is there to accommodate ".museum".
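A quick way to see both points is to run the two patterns side by side. This is only a sketch in Python rather than MailWasher's own engine, the sample headers are made up, and the case-insensitive flag is an assumption.

```python
import re

# ASSUMPTION: case-insensitive matching, as MailWasher presumably does.
ORIGINAL = re.compile(r"[\w.-]+@([\w-]+\.)+[A-Z]{2,4}", re.IGNORECASE)
MODIFIED = re.compile(r"[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>$", re.IGNORECASE)

# Hypothetical From headers, each with the address in angle brackets.
samples = [
    "<no_one@nowhere.way_too_long_tld>",
    "<curator@example.museum>",
    "<user@example.com>",
]

# Map each header to (matches original pattern, matches modified pattern).
results = {s: (bool(ORIGINAL.search(s)), bool(MODIFIED.search(s))) for s in samples}

for header, (orig_ok, mod_ok) in results.items():
    print(f"{header:40} original={orig_ok} modified={mod_ok}")
```

The unanchored original accepts the first header because it stops after "way", and it accepts ".museum" only by stopping after "muse". The anchored version rejects the first header and matches the full ".museum".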
|
|
|
 |
IP: 68.51.*.*
Guest
|
Posted: Fri Nov 28, 2003 7:27 am Post subject: Re: I'm glad I found this post |
|
|
| Anonymous, I guess wrote: |
I'm glad I found this post because I found that filter and I do not believe it does what is intended. I have two issues with it: |
Hello Anonymous, I guess,
As I mentioned in the post above, I do not use the filter (since I get almost no spam with an invalid From address), so I have no real-life experience with it. For everyone's benefit, how does your modification perform in real life?
Btw, thanks for posting a further development of the filter.
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16506
|
Posted: Fri Nov 28, 2003 7:32 am Post subject: Re: I'm glad I found this post |
|
|
| Anonymous, I guess wrote: |
| Code: |
[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>$
|
The ">$" prevents something like "no_one@nowhere.way_too_long_tld" from being accepted as valid. The 6 instead of 4 is there to accommodate ".museum". |
Good points ... except that the '>' isn't always present. So I'd say the regex should be
| Code: | | [\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>??$ |
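To illustrate the difference, here is a small Python sketch comparing the pattern that requires the closing bracket with the one that makes it optional. The sample headers are invented, and the case-insensitive flag is an assumption about MailWasher's engine.

```python
import re

# ASSUMPTION: case-insensitive matching.
STRICT = re.compile(r"[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>$", re.IGNORECASE)
RELAXED = re.compile(r"[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>??$", re.IGNORECASE)

bare = "user@example.com"                   # no angle brackets at all
bracketed = "Some User <user@example.com>"  # friendly name plus brackets

strict_bare = bool(STRICT.search(bare))
relaxed_bare = bool(RELAXED.search(bare))
strict_bracketed = bool(STRICT.search(bracketed))
relaxed_bracketed = bool(RELAXED.search(bracketed))

print(strict_bare, relaxed_bare)            # False True
print(strict_bracketed, relaxed_bracketed)  # True True
```

With the strict pattern inside a doesn'tContainRE rule, the bare but valid address would be flagged, which is the false positive the optional bracket avoids.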
|
|
|
 |
Anonymous, I guess
Guest IP: 67.2.*.*
|
Posted: Fri Nov 28, 2003 9:04 am Post subject: Re: I'm glad I found this post |
|
|
| Ikeb wrote: | | Anonymous, I guess wrote: |
| Code: |
[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>$
|
The ">$" prevents something like "no_one@nowhere.way_too_long_tld" from being accepted as valid. The 6 instead of 4 is there to accommodate ".museum". |
Good points ... except that the '>' isn't always present. So I'd say the regex should be
| Code: | | [\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>??$ |
|
I didn't think of that. I suppose there's no reason why ">" should be there if some spam mailing program or worm SMTP client doesn't choose to put it there. BTW, what's the difference between the "?" and "??" iterators?
| Anonymous wrote: |
Hello Anonymous, I guess,
As I mentioned in the post above, I do not use the filter (since I get almost no spam with an invalid From address), so I have no real-life experience with it. For everyone's benefit, how does your modification perform in real life?
Btw, thanks for posting a further development of the filter. |
Actually, I had the same thought about it not being very practical, but I've seen it triggered several times in a couple of days. I started using MailWasher a couple of days ago because I'm being flooded by the Swen.a worm and I wanted to be able to delete those messages on the server without having to go to the webmail every time (in addition to POP, my ISP allows access through the web). I actually put together a set of filters that seem to work well for excluding Swen.a-infected messages. I was surprised to find that a few messages (not many, I grant you) were caught by the non-existent address filter (most or all were using an empty e-mail address).
So yes, some real world e-mails come with a malformed email address. However, counting on this to catch Swen.a won't work as the wormy e-mails produced by Swen.a with a malformed e-mail address are in the minority.
I won't post those here because they'll mess with the formatting (being rather long) but they can be found at the following address:
http://webspace4me.net/~cosmicaug/msf.html#swenfil
I think the best working filter in there is the last one, which wasn't really something I put together (I just read about that approach elsewhere and tried to rewrite it so that MailWasher would understand it).
It has caught everything so far that something else (like a black hole or the non-existent mail address filter) hasn't caught first.
I also turned it into a batch file that appends the filters to "filters.txt" (adding an "echo" to the beginning of the lines and an append redirector --">>"-- to the end of the lines) to get around the problem of unwittingly adding unwanted linefeeds when cutting and pasting by hand.
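The same idea (append each filter as exactly one line, with no stray linefeeds) can be sketched in Python instead of a batch file. This is not the batch file from the page above; the function name `append_filters` and the two sample filter lines are hypothetical placeholders, and real filter lines would come from the page.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def append_filters(filters_path, filter_lines: Iterable[str]) -> int:
    """Append each filter as exactly one line; return how many were written."""
    path = Path(filters_path)
    cleaned = []
    for line in filter_lines:
        # Remove line breaks picked up while copying, so a filter that was
        # split across lines is rejoined into a single line.
        one_line = "".join(line.splitlines()).strip()
        if one_line:
            cleaned.append(one_line)
    # If the existing file does not end with a newline, add one first so the
    # first new filter does not get glued onto the last existing line.
    needs_newline = (
        path.exists()
        and path.stat().st_size > 0
        and not path.read_bytes().endswith(b"\n")
    )
    with path.open("a", encoding="utf-8") as fh:
        if needs_newline:
            fh.write("\n")
        for one_line in cleaned:
            fh.write(one_line + "\n")
    return len(cleaned)


# Demo against a throwaway file rather than a real filters.txt.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "filters.txt"
    # Hypothetical placeholder filter lines, one of them broken by a linefeed.
    written = append_filters(target, ['[enabled],"Example A"\n', '[enabled],\n"Example B"'])
    result = target.read_text(encoding="utf-8").splitlines()

print(written, result)
```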
Though the filters have produced zero false positives, I can't claim they won't do so as I haven't really received that much e-mail the last day (other than the wormy messages, of which I've received plenty).
BTW, I'm not really intending to be anonymous, I'm just too lazy to register.
--August Pamplona
|
|
|
 |
Anonymous, I guess
Guest IP: 67.2.*.*
|
Posted: Fri Nov 28, 2003 9:09 am Post subject: Re: I'm glad I found this post |
|
|
| Anonymous, I guess wrote: |
BTW, I'm not really intending to be anonymous, I'm just too lazy to register.
--August Pamplona |
Which is too bad, because as an unregistered guest I can't go back and edit the many spelling mistakes one only manages to see after clicking the 'Submit' button.
--August Pamplona
|
|
|
 |
IP: 68.51.*.*
Guest
|
Posted: Fri Nov 28, 2003 9:50 am Post subject: |
|
|
Welcome to the MailWasher forums August Pamplona. Looks like the forum has gained another great poster.
For those who use IE:
http://www.iespell.com/
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16506
|
Posted: Fri Nov 28, 2003 7:52 pm Post subject: Re: I'm glad I found this post |
|
|
| Anonymous, I guess wrote: | | Ikeb wrote: | Good points ... except that the '>' isn't always present. So I'd say the regex should be
| Code: | | [\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>??$ |
|
I didn't think of that. I suppose there's no reason why ">" should be there if some spam mailing program or worm SMTP client doesn't choose to put it there. BTW, what's the difference between the "?" and "??" iterators? |
From the TRegExpr help document:
| Code: | ? zero or one ("greedy"), similar to {0,1}
?? zero or one ("non-greedy"), similar to {0,1}? |
So if the search engine had a choice, the former would choose one while the latter would choose zero. To be honest, I don't know if there could be something that would make a difference in this case. Denn988 has convinced me to default to the non-greedy form unless there's a reason to do otherwise.
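The difference only shows up in what the match contains when the engine really does have a choice. A tiny sketch using Python's `re` module, whose `?` and `??` follow the same greedy and non-greedy convention as the TRegExpr excerpt above (TRegExpr itself is not used here):

```python
import re

text = "a"

# Greedy: given the choice, take the optional character.
greedy = re.match(r"a?", text).group()
# Non-greedy: given the choice, take nothing.
lazy = re.match(r"a??", text).group()

print(repr(greedy))  # 'a'
print(repr(lazy))    # ''
```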
Yeah, even if applying the 'Code' and '/Code' BBCode tags to bracket such filter expressions, the formatting is screwed up.
Thanks for putting up the web page and providing the reference.
| Anonymous, I guess wrote: | BTW, I'm not really intending to be anonymous, I'm just too lazy to register.
--August Pamplona |
But not too lazy to put up a web page of some neat Swen filters? What's up with that?
|
|
|
 |
Anonymous, I guess
Guest IP: 67.2.*.*
|
Posted: Fri Nov 28, 2003 8:46 pm Post subject: In-form spell checking |
|
|
| Anonymous wrote: | Welcome to the MailWasher forums August Pamplona. Looks like the forum has gained another great poster.
For those who use IE:
http://www.iespell.com/ |
Thanks, I didn't know about that. I'm actually using Firebird most of the time, which doesn't have a spellchecker extension yet, but I've just found out that there's one under development right now, which I've downloaded and may test if I feel brave enough. The Mozilla suite (the older project, which includes browser and mail/news client functionality, whereas the newer pre-release project separates these into Firebird and Thunderbird respectively) has had spellchecking built into builds starting at 1.5b, and the Mozilla Thunderbird mail/news client also has it built in.
|
|
|
 |
IP: 67.2.*.*
Guest
|
Posted: Fri Nov 28, 2003 9:10 pm Post subject: Re: I'm glad I found this post |
|
|
| Ikeb wrote: | | Anonymous, I guess wrote: |
I didn't think of that. I suppose there's no reason why ">" should be there if some spam mailing program or worm SMTP client doesn't choose to put it there. BTW, what's the difference between the "?" and "??" iterators? |
From the TRegExpr help document:
| Code: | ? zero or one ("greedy"), similar to {0,1}
?? zero or one ("non-greedy"), similar to {0,1}? |
So if the search engine had a choice, the former would choose one while the latter would choose zero. To be honest, I don't know if there could be something that would make a difference in this case. Denn988 has convinced me to default to the non-greedy form unless there's a reason to do otherwise.
|
Got it! Greedy vs. non-greedy was a little hard to grasp on the first couple of readings (for me in any case) but I think I've got it now.
However, giving it a second look I'm not sure why we should care what the regexp returns since the filters, in the context of MailWasher, operate in a boolean fashion (a match either exists or it doesn't --with actual contents of the match not being of great consequence).
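That point can be checked with a short sketch: because the pattern ends in "$", the optional ">" is forced one way or the other, so the greedy and non-greedy forms should give the same yes/no answer. This uses Python's `re` rather than MailWasher's engine, the sample headers are invented apart from the one in the first post, and case-insensitive matching is an assumption.

```python
import re

# ASSUMPTION: case-insensitive matching.
GREEDY = re.compile(r"[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>?$", re.IGNORECASE)
LAZY = re.compile(r"[\w\.-]+@([\w-]+\.)+[A-Z]{2,6}>??$", re.IGNORECASE)

headers = [
    "<user@example.com>",
    "user@example.com",
    "?B8L?#LSJK<@>",
    "<no_one@nowhere.way_too_long_tld>",
]

# One (greedy, lazy) pair of booleans per header.
verdicts = [(bool(GREEDY.search(h)), bool(LAZY.search(h))) for h in headers]

for header, (g, l) in zip(headers, verdicts):
    print(f"{header:40} greedy={g} lazy={l}")
```

For these samples the two columns agree on every row, which fits the idea that only the existence of a match matters to the filter.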
By the way, I've already had a false positive (it was actually spam but the e-mail address was valid) because of the issue corrected by the addition of "?". Which shows the issue you raised does have "real world" application.
Yeah, even if applying the 'Code' and '/Code' BBCode tags to bracket such filter expressions, the formatting is screwed up.
Thanks for putting up the web page and providing the reference.
| Ikeb wrote: | | Anonymous, I guess wrote: | BTW, I'm not really intending to be anonymous, I'm just too lazy to register.
--August Pamplona |
But not too lazy to put up a web page of some neat Swen filters? What's up with that?  |
Just weird, I guess.
--August Pamplona
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16506
|
|
|
 |
IP: 24.136.*.*
Guest
|
Posted: Wed Dec 03, 2003 12:00 am Post subject: |
|
|
| Ikeb wrote: |
Presumably one could set a wildcard entry in the blacklist for '?B8L?#LSJK@* in this case. I don't imagine it would ever be a valid email address.
Do you get repeats of the same address and/or common character patterns Jeremy? |
No, it's always something different.
-Jeremy
|
|
|
 |