CastleCops, Internet Crime Fighters

Targeted Bayesian filters & spam scoring

LordLiverpool (Sergeant; joined Feb 19, 2004; 76 posts; location: Madrid, Spain (UK exile))
Posted: Sat Dec 30, 2006 7:21 pm    Post subject: Targeted Bayesian filters & spam scoring

I imagine these ideas have been suggested before (I did look but I couldn't see anything recent). Anyway, here goes:

1) Targeted Bayesian filters

I don't find the learning filter very useful because it's too broad-brush and produces too many false positives. I like to delete spam automatically if possible, but the learning filter doesn't give me the confidence to do this. It would be nice to be able to define, say, a "Stock pick spam" filter and feed all the stock pick emails into it so that MW learns the pattern of that type of email. I imagine it would need to draw on the pattern of legitimate emails too, to avoid tagging those. This specific type of filter might be accurate enough to allow the emails it tagged to be deleted without user intervention. It would basically do what I now do by hand with filters like "stock pick", "penile enhancement" and all the rest.
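A minimal sketch of such a targeted filter, assuming a plain naive Bayes model with two classes: the one spam category being targeted and legitimate mail. The class name `TargetedFilter`, the tokenizer and the `DELETE_THRESHOLD` value are hypothetical illustrations, not anything MailWasher provides.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class TargetedFilter:
    """Binary naive Bayes: ONE spam category versus legitimate ("ham") mail."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.counts = {"target": Counter(), "ham": Counter()}
        self.docs = {"target": 0, "ham": 0}

    def train(self, text: str, is_target: bool) -> None:
        label = "target" if is_target else "ham"
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def probability(self, text: str) -> float:
        """P(target | text) with Laplace smoothing, computed in log space."""
        vocab = set(self.counts["target"]) | set(self.counts["ham"])
        total_docs = self.docs["target"] + self.docs["ham"]
        log_scores = {}
        for label in ("target", "ham"):
            total = sum(self.counts[label].values())
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokenize(text):
                # +1 in the denominator leaves room for never-seen tokens.
                score += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
            log_scores[label] = score
        diff = log_scores["ham"] - log_scores["target"]
        # P(target) = 1 / (1 + exp(diff)), written to avoid overflow on long messages.
        if diff >= 0:
            z = math.exp(-diff)
            return z / (1.0 + z)
        return 1.0 / (1.0 + math.exp(diff))


# Usage: train on examples of the target category AND on legitimate mail.
stock = TargetedFilter("Stock pick spam")
stock.train("hot stock pick buy shares now before the price explodes", True)
stock.train("strong buy alert this stock will soar", True)
stock.train("minutes from tuesday meeting attached", False)
stock.train("are we still on for dinner on friday", False)

DELETE_THRESHOLD = 0.99  # hypothetical: high so only very confident hits are auto-deleted
p = stock.probability("stock alert: buy shares now")
action = "delete" if p >= DELETE_THRESHOLD else "keep for review"
```

Because each filter only has to separate one narrow kind of spam from legitimate mail, its probabilities can be checked against a strict per-filter threshold before anything is deleted automatically.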

2) Spam scoring

This would allow a score to be calculated for each email based on rules that the user can configure, with actions (like automatic deletion) executed based on that score. Imagine that the following features tend to indicate spam:

1. Origin is a ".biz" domain
2. Origin is a hotmail address
3. It's been "spam-tagged"
4. The learning filter has identified it as spam
5. The subject contains words such as "opportunity", "rolex" or "casino"
6. The format of the header is invalid
7. The first part of your email address appears in the subject as if it were a name ("Dear Davidp", etc.)
8. The text part of the email is identified as gobbledegook (see previous suggestion) and the email contains HTML

You might not have enough confidence to delete emails automatically based on any one of these, and no email will match all of them. But you could assign a "spam probability" score to each rule and automatically delete any email whose total score exceeded a certain number.
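A minimal sketch of this kind of additive scoring, assuming each rule is a yes/no test with a user-chosen weight and a message is deleted once the sum of matched weights reaches a threshold. The `Message` fields, the weights and `DELETE_THRESHOLD` are hypothetical; only six of the eight rules above are modelled (header validity and the gobbledegook check are left out).

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    sender: str
    recipient: str
    subject: str
    spam_tagged: bool = False           # e.g. a spam tag added upstream
    learning_filter_spam: bool = False  # verdict of the existing learning filter


SPAM_WORDS = re.compile(r"\b(opportunity|rolex|casino)\b", re.IGNORECASE)


def name_in_subject(m: Message) -> bool:
    """Rule 7: the part of your address before the @ is used as a name in the subject."""
    local_part = m.recipient.split("@", 1)[0].lower()
    return bool(local_part) and local_part in m.subject.lower()


# (description, weight, test); the weights are illustrative guesses.
RULES: list[tuple[str, int, Callable[[Message], bool]]] = [
    ("sender in a .biz domain", 20, lambda m: m.sender.lower().endswith(".biz")),
    ("sender is a hotmail address", 10, lambda m: m.sender.lower().endswith("@hotmail.com")),
    ("already spam-tagged", 30, lambda m: m.spam_tagged),
    ("learning filter says spam", 40, lambda m: m.learning_filter_spam),
    ("spammy words in subject", 25, lambda m: bool(SPAM_WORDS.search(m.subject))),
    ("address used as a name in subject", 15, name_in_subject),
]

DELETE_THRESHOLD = 70


def spam_score(m: Message) -> tuple[int, list[str]]:
    """Sum the weights of every rule the message matches."""
    matched = [(desc, weight) for desc, weight, test in RULES if test(m)]
    return sum(w for _, w in matched), [d for d, _ in matched]


def action(m: Message) -> str:
    total, _ = spam_score(m)
    return "delete" if total >= DELETE_THRESHOLD else "keep"


# Usage: no single rule is enough on its own, but several together cross the threshold.
msg = Message(sender="deals@offers.biz", recipient="davidp@example.com",
              subject="Dear Davidp, a casino opportunity", learning_filter_spam=True)
print(spam_score(msg)[0])  # 100 (20 + 40 + 25 + 15)
print(action(msg))         # delete
```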


_________________
"I'd rather laugh with the sinners than cry with the saints"--Billy Joel
stan_qaz (Premium Member; joined Mar 31, 2003; 10612 posts)
Posted: Sat Dec 30, 2006 9:53 pm

If your learning tool isn't catching almost all your spam while rarely making errors, something is wrong. Mine hasn't tagged a good message in months, and with 100-200 spams per day it rarely misses one.

You can't "target" a Bayesian filter other than by training it; for targeting you'd have to use a regular filter.

Weighted results from the various tools would be excellent; several of us have discussed this here in the past, but it isn't an easy addition to the current way MW works. Check for posts by Ikeb on POPFile and you'll see some things that might work for you.


_________________
Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
LordLiverpool (Sergeant; joined Feb 19, 2004; 76 posts; location: Madrid, Spain (UK exile))
Posted: Sat Dec 30, 2006 11:36 pm    Post subject: Bayesian filters

Well, I know you can't apply Bayesian filtering in a targeted way now; that's why I'm putting it forward as a suggestion. It's simply a matter of having a series of labelled "boxes" into which you direct your spam for training, rather than having one large box called "spam", as we have now.
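The "labelled boxes" idea can be sketched as a single multi-class naive Bayes classifier in which every box, including one for legitimate mail, is trained separately. This is a minimal sketch under that assumption; `BoxClassifier`, the box names and `AUTO_DELETE_BOXES` are hypothetical, and a message is only auto-deleted when it lands in a spam box with high probability.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9$']+", text.lower())


class BoxClassifier:
    """Multinomial naive Bayes over user-named boxes."""

    def __init__(self) -> None:
        self.words = defaultdict(Counter)  # box -> token counts
        self.docs = Counter()              # box -> number of training messages

    def train(self, box: str, text: str) -> None:
        self.words[box].update(tokenize(text))
        self.docs[box] += 1

    def classify(self, text: str) -> tuple[str, float]:
        """Return the most likely box and its normalized probability."""
        if not self.docs:
            raise ValueError("train at least one box first")
        vocab = set().union(*self.words.values())
        n_docs = sum(self.docs.values())
        logs = {}
        for box, counts in self.words.items():
            total = sum(counts.values())
            lp = math.log(self.docs[box] / n_docs)
            for tok in tokenize(text):
                lp += math.log((counts[tok] + 1) / (total + len(vocab) + 1))
            logs[box] = lp
        top = max(logs.values())  # subtract the max before exp() for stability
        weights = {b: math.exp(v - top) for b, v in logs.items()}
        z = sum(weights.values())
        best = max(weights, key=weights.get)
        return best, weights[best] / z


clf = BoxClassifier()
clf.train("stock pick", "hot stock pick buy shares now")
clf.train("stock pick", "strong buy alert this stock will soar")
clf.train("enhancement", "enlarge your results with our pills")
clf.train("legitimate", "minutes from tuesday meeting attached")
clf.train("legitimate", "are we still on for dinner on friday")

AUTO_DELETE_BOXES = {"stock pick", "enhancement"}  # hypothetical per-box policy
box, p = clf.classify("buy this stock now")
delete = box in AUTO_DELETE_BOXES and p >= 0.95
```

Keeping "legitimate" as a box of its own is what lets each spam box learn what it is *not*, which is the point raised in the first post about drawing on the pattern of legitimate email.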

MW catches practically all spam as it stands, "catch" meaning "correctly identify and label". There are very few false positives, too, so there's no complaint there. My suggestion is aimed at using Bayesian logic in a more targeted way to create very accurate automatic deletion filters. Obviously, when you're deleting automatically, you need a higher level of confidence. One false positive in three months is a big problem if I never even realise the email existed unless I check the stats.

I receive between 100 and 300 spam emails every day, so automatic deletion is a must to cut the numbers down. My current auto-delete filters work very well - just one of my "stock pick" filters accounts for 25% of spam. But they're hand-crafted, and what I've realised is that I'm really doing something by hand which Bayesian logic could do for me.


Ikeb (Special Response Team, Forums Admin; joined Apr 20, 2003; 16509 posts)
Posted: Sun Dec 31, 2006 4:58 am

Might be worth looking at ... or perhaps classification à la POPFile would accomplish the same thing? Still not accurate enough, IMO. I'd sooner have the ability to combine regex filters with the learning filter in some sort of weighted scheme. With just a logical AND of POPFile results and MWP regex filters, I get a false-positive rate of around 1 in 10,000 messages.
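The two ways of combining a classifier with regex filters mentioned here can be contrasted in a short sketch. The regex patterns, weights and threshold below are hypothetical stand-ins, not MailWasher Pro's filter syntax or POPFile's output format; the classifier verdict is simply passed in as a boolean or a probability.

```python
import re

# Hypothetical stand-ins for hand-written regex filters.
REGEX_FILTERS = [
    re.compile(r"\bstock\s+(pick|alert)\b", re.IGNORECASE),
    re.compile(r"\brolex\b", re.IGNORECASE),
]


def regex_hit(text: str) -> bool:
    return any(r.search(text) for r in REGEX_FILTERS)


def delete_by_and(classifier_says_spam: bool, text: str) -> bool:
    """Logical AND: delete only when BOTH the classifier and a regex agree."""
    return classifier_says_spam and regex_hit(text)


def delete_by_weight(spam_probability: float, text: str,
                     w_clf: float = 0.6, w_regex: float = 0.4,
                     threshold: float = 0.8) -> bool:
    """Weighted scheme: blend classifier confidence with the regex verdict."""
    score = w_clf * spam_probability + w_regex * (1.0 if regex_hit(text) else 0.0)
    return score >= threshold
```

The AND rule can only delete mail that both tools flag, so each tool's false positives are filtered by the other; the weighted version trades some of that safety for the ability to tune how much each tool counts.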

All times are GMT.