| Author |
Message |
LordLiverpool
Sergeant

 Joined: Feb 19, 2004 Posts: 76 Location: Madrid, Spain (UK exile)
|
Posted: Sat Dec 30, 2006 7:21 pm Post subject: Targeted Bayesian filters & spam scoring |
|
|
I imagine these ideas have been suggested before (I did look but I couldn't see anything recent). Anyway, here goes:
1) Targeted Bayesian filters
I don't find the learning filter very useful because it's too broad-brush to avoid plenty of false positives. I like to delete spam automatically if possible, but the learning filter doesn't give me the confidence to do this. It would be nice to be able to define, say, a "Stock pick spam" filter and feed all the stock pick emails into it so that MW learns the pattern of that type of email. I imagine it would need to draw on the pattern of legitimate emails too, to avoid tagging those. This specific type of filter might be accurate enough to allow the emails it tagged to be deleted without user intervention. This is basically what I already do manually with filters like "stock pick", "penile enhancement" and all the rest.
2) Spam scoring
This would allow a score to be calculated for emails based on rules which can be configured by the user, and execute actions (like automatic deletion) based on the score. Imagine that the following features tend to indicate spam:
1. Origin is a "biz" domain
2. Origin is a hotmail address
3. It's been "spam-tagged"
4. The learning filter has identified it as spam
5. The subject contains words such as "opportunity", "rolex" or "casino"
6. The format of the header is invalid
7. The first part of your email address appears in the subject as if it were a name ("dear Davidp" etc)
8. The text part of the email is identified as gobbledegook (see previous suggestion) and the email contains HTML
You might not have enough confidence to automatically delete emails based on any one of these, and no emails will comply with all of them. But if you could assign a "spam probability" score to each rule, you could automatically delete an email when its total score exceeded a certain number.
_________________
"I'd rather laugh with the sinners than cry with the saints"--Billy Joel
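The scoring idea above can be sketched in a few lines of Python. This is only an illustration, not anything MW provides: the `Email` and `Rule` types, the weights, and `DELETE_THRESHOLD` are all made-up assumptions, and the weights are simple additive points rather than true probabilities. Only a handful of the listed rules are shown.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Email:
    # Hypothetical message record; the field names are assumptions.
    sender: str
    subject: str
    spam_tagged: bool = False    # rule 3: already "spam-tagged"
    learned_spam: bool = False   # rule 4: learning filter says spam


@dataclass
class Rule:
    name: str
    weight: float                      # points added when the rule matches
    matches: Callable[[Email], bool]


SUBJECT_WORDS = re.compile(r"\b(opportunity|rolex|casino)\b", re.IGNORECASE)

# Illustrative weights only; a user would tune these by hand.
RULES = [
    Rule("biz domain", 2.0, lambda m: m.sender.lower().endswith(".biz")),
    Rule("hotmail address", 1.0, lambda m: m.sender.lower().endswith("@hotmail.com")),
    Rule("spam-tagged", 2.5, lambda m: m.spam_tagged),
    Rule("learning filter", 3.0, lambda m: m.learned_spam),
    Rule("subject words", 2.0, lambda m: SUBJECT_WORDS.search(m.subject) is not None),
]

DELETE_THRESHOLD = 5.0  # assumed cut-off for automatic deletion


def spam_score(mail: Email) -> float:
    """Sum the weights of every rule that matches the message."""
    return sum(rule.weight for rule in RULES if rule.matches(mail))


def should_delete(mail: Email) -> bool:
    """Delete automatically only when the total score exceeds the threshold."""
    return spam_score(mail) > DELETE_THRESHOLD
```

With these weights no single rule reaches the threshold on its own, so a lone hotmail address or a lone "casino" in the subject is never enough to delete a message.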
|
|
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10612
|
Posted: Sat Dec 30, 2006 9:53 pm Post subject: |
|
|
If your learning tool isn't catching almost all your spam and rarely making errors, something is wrong. Mine hasn't tagged a good message in months, and on 100-200 spams per day it rarely misses one.
You can't "target" a Bayesian filter other than by training it; for targeting you'd have to use a regular filter.
Weighted results from the various tools would be excellent. Several of us have discussed this here in the past, but it isn't an easy add to the current way MW works. Check for posts by Ikeb on POPFile and you'll see some things that might work for you.
_________________
Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
|
|
|
 |
LordLiverpool
Sergeant

 Joined: Feb 19, 2004 Posts: 76 Location: Madrid, Spain (UK exile)
|
Posted: Sat Dec 30, 2006 11:36 pm Post subject: Bayesian filters |
|
|
Well, I know you can't apply Bayesian filtering in a targeted way now; that's why I'm putting it forward as a suggestion. It's simply a matter of having a series of labelled "boxes" into which you direct your spam for training, rather than having one large box called "spam", as we have now.
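A rough Python sketch of the labelled "boxes" idea follows, using a small multi-category naive Bayes classifier with add-one smoothing. It is not how MW's learning filter works internally; the `BoxClassifier` class, its methods, and the `min_confidence` cut-off are hypothetical names chosen for illustration. A "legitimate" box is trained alongside the spam boxes so that good mail has somewhere to go.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BoxClassifier:
    """Naive Bayes over any number of labelled boxes (hypothetical sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # box -> token counts
        self.doc_counts = Counter()              # box -> messages trained
        self.vocab = set()

    def train(self, box, text):
        tokens = tokenize(text)
        self.word_counts[box].update(tokens)
        self.doc_counts[box] += 1
        self.vocab.update(tokens)

    def posteriors(self, text):
        """Return an estimated probability for each box."""
        if not self.doc_counts:
            raise ValueError("train at least one box before classifying")
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for box, n_docs in self.doc_counts.items():
            counts = self.word_counts[box]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # box prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            log_scores[box] = score
        # Normalise in log space to avoid underflow on long messages.
        peak = max(log_scores.values())
        scaled = {box: math.exp(s - peak) for box, s in log_scores.items()}
        norm = sum(scaled.values())
        return {box: value / norm for box, value in scaled.items()}

    def classify(self, text, min_confidence=0.99):
        """Return the best box, or None when the classifier is not sure enough."""
        post = self.posteriors(text)
        box, prob = max(post.items(), key=lambda item: item[1])
        return box if prob >= min_confidence else None
```

Returning `None` below the cut-off is the design choice that matters for automatic deletion: a message is only acted on when one box wins clearly, and anything ambiguous is left for manual review.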
MW catches practically all spam as it stands, "catch" meaning "correctly identify and label". There are very few false positives, too, so there's no complaint there. My suggestion is aimed at using Bayesian logic in a more targeted way to create very accurate automatic deletion filters. Obviously, when you're deleting automatically, you have to have a higher level of confidence. One false positive in three months is a big problem if I don't even realise the email ever existed without checking the stats.
I receive between 100 and 300 spam emails every day, so automatic deletion is a must to cut the numbers down. My current auto-delete filters work very well - just one of my "stock pick" filters accounts for 25% of spam. But they're hand-crafted, and what I've realised is that I'm really doing something by hand which Bayesian logic could do for me.
_________________
"I'd rather laugh with the sinners than cry with the saints"--Billy Joel
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16509
|
Posted: Sun Dec 31, 2006 4:58 am Post subject: |
|
|
Might be worth looking at ... or perhaps classification à la POPFile would accomplish the same thing? Still not accurate enough IMO. I'd sooner have the ability to combine regex filters with the learning filter in some sort of weighted scheme. With just a logical AND of POPFile results and MWP regex filters, I get a FP rate of around 1 in 10,000 msgs.
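The logical AND described above can be sketched as follows. This is a guess at the shape of such a setup, not the actual POPFile or MWP configuration: the `STOCK_PICK` pattern and the `auto_delete` function are hypothetical, and the classifier verdict is passed in as a plain boolean.

```python
import re

# Hypothetical hand-written filter; a real setup would have one per spam type.
STOCK_PICK = re.compile(r"\bstock\s+(pick|alert|tip)s?\b", re.IGNORECASE)


def auto_delete(subject, body, classifier_says_spam):
    """Delete only when the classifier AND the regex filter both agree."""
    regex_hit = (
        STOCK_PICK.search(subject) is not None
        or STOCK_PICK.search(body) is not None
    )
    return classifier_says_spam and regex_hit
```

A message is wrongly deleted only when both checks are wrong at once. If the two made their mistakes independently, their false-positive rates would multiply, though independence is an assumption here and not something measured.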
|
|