| Author |
Message |
Whisperer
Sergeant

 Joined: Mar 29, 2003 Posts: 134 Location: USA
|
Posted: Wed Dec 24, 2003 6:55 am Post subject: Filtering for URLs and other text in body |
|
|
I'm determined to avoid false positives and have decided to try making a lot of filters that contain links found in the body of spams I get, on the assumption that there are a lot of repeats coming in. Initial testing shows that within a few days this is catching about 20-50% of the spam I get.
I'm setting these as hide-and-delete filters, so I can't leave room for error. I figure that as long as I'm only using the URLs that the spams want me to click through to, or even the main domains as appears appropriate, I'm safe on that issue.
My question is this. Sometimes I see two URLs in sequence, separated by an asterisk, such as:
On the assumption that spammers will change things frequently, but also that they're less likely to change the URL they want you to click through to (yes, I realize that's a hypothetical assumption and that they can use redirects to keep us guessing until the cows come home), I'm just wondering which of the two you'd expect to remain as the more permanent one, the first or the second in the sequence.
I also note that usually when I see this, the first one is at a major ISP like Yahoo, while the latter is an unknown domain.
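For anyone who wants to compare the two halves of such a link, here is a minimal Python sketch (not a MailWasher feature) that splits an asterisk-joined URL and reports the host of each part. The function name and the example.com / example.net addresses are made up for illustration, since the original example link is not shown in this thread.

```python
from urllib.parse import urlsplit


def split_redirect(url: str) -> list[str]:
    """Return the host of each URL chained together with '*' separators."""
    hosts = []
    for part in url.split("*"):
        host = urlsplit(part.strip()).hostname
        if host:  # skip empty pieces, e.g. from a trailing '*'
            hosts.append(host)
    return hosts


# Hypothetical link in the "redirector*target" shape described above.
link = "http://rd.example.com/promo/*http://pills.example.net/buy"
print(split_redirect(link))
```

With both hosts listed side by side, either one (or both) can be copied into a filter and watched to see which keeps recurring.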
On a separate note, sometimes the body contains nothing but pure gibberish in a long string of characters making up a sizable paragraph. I'm just wondering if it might be somewhat useful to take the first dozen or two dozen characters and put them into a filter, on the chance that the spammers will use the same gibberish at least for a while before changing it.
Thanks.
Last edited by Whisperer on Mon Dec 29, 2003 4:15 am, edited 1 time in total |
|
| Back to top |
|
 |
IP: 66.44.*.*
Guest
|
Posted: Wed Dec 24, 2003 7:35 pm Post subject: Re: Filtering for URLs and other text in body |
|
|
Whisperer,
Ikeb posted a filter that is designed to look for such tricks. I don't have the time to do a search for it right now...but perhaps Ikeb will post a link to it.
In the meantime....I have a RegExpr that could help for such a filter.
To prevent it from scrolling, I am posting it in smaller sections. Just cut and paste it together in the order posted. You can add more domains to it as required.
| Code: |
http://[^"<>]*?[.]? |
| Code: | | (aqmp\.net|asphost\.com|carlz\.us|carriespickspreview\.com| |
| Code: | | cherrypickedofferz\.com|controlz\.us|dkldk\.com|dock1\.com| |
| Code: | | dubnh\.us|ero-roots\.com|ecom-universe\.net|emedorders\.com|faithweb\.com| |
| Code: | | ff545zz\.com|figure7v\.com|ghkp\.us|goandbuyit\.com| |
| Code: | | gono\.us|hostgym\.com|imgehost\.com|kiffergly\.net| |
| Code: | | klinelenderspress\.com|linkcounter\.com|re55steel4\.com| |
| Code: | | remote-cars1\.com|rx359\.net|mdwebdoctor\.com|medsfactory\.com| |
| Code: | | netidcuh\.com|oldcactus\.com|paylesscanadiandrugs\.com| |
| Code: | | prefer\d+f\.com|pills\d+as\.com|rhinoceros\.us| |
| Code: | | sacrosanctraindrop\.net|shopnsavecentral\.com| |
| Code: | | savvypurchaser\.com|seeingnoone\.com|swena\.net| |
| Code: | | tashabo\.com|theholmesgroup\.com|unone\.us|webrxonline\.com|\.biz) |
You will note that at the end of the above RegExp, any link to a '.biz' address will trigger the expression. You can remove that, or any other domain name, if you would like. You
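To try the expression outside MailWasher before trusting it with deletions, something like the following Python sketch may help. It pastes the prefix together with a small subset of the domains posted above; the variable and function names are mine, and Python's `re` module is only a stand-in for MailWasher's RegExpr engine, which may differ in details.

```python
import re

# Prefix and a subset of the alternatives from the sections posted above.
PREFIX = r'http://[^"<>]*?[.]?'
DOMAINS = [
    r"aqmp\.net",
    r"gono\.us",
    r"hostgym\.com",
    r"prefer\d+f\.com",
    r"pills\d+as\.com",
    r"\.biz",
]
PATTERN = re.compile(PREFIX + "(" + "|".join(DOMAINS) + ")")


def is_listed(body: str) -> bool:
    """True if the body contains a link that the assembled pattern matches."""
    return PATTERN.search(body) is not None


print(is_listed('<a href="http://www.gono.us/x">click</a>'))
print(is_listed("http://www.example.org/index.html"))
```

One thing this kind of test shows: as written, the `\.biz` alternative is not tied to the end of the host name, so a link such as `http://www.example.org/news.bizarre.html` also matches. That is worth knowing if the filter is set to hide and delete.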
|
|
| Back to top |
|
 |
IP: 65.37.*.*
Guest
|
Posted: Sun Dec 28, 2003 6:37 am Post subject: Not too shabby |
|
|
I want to report my results so far with creating hide-and-delete filters based mainly on the URLs in the body but also using some key phrases in the body, Subject, and From fields.
It's been about a week to ten days and I have to say that, while it's been quite a lot of work so far to create -- whoa! -- almost 70 filters with anywhere from one or two to up to the max number of expressions each (I'm not using RegExpr), the work is decreasing dramatically day by day.
In the past few days, out of about fifty spams, I'd say the filter caught as much as 75% of them overall, and in the past 24-48 hours it's caught, well, about 12 out of 18, and then, just now, seven out of seven. And because of the way I'm doing it, I feel the chance of false positives is slim to none.
I'd say I've had to create maybe a dozen new entries in the past two days.
That's starting to become considerably less time than it takes me to visually scan through many dozens of spams before deleting them.
The question is how long these will continue to be valid and how fast new ones will keep appearing.
I'll try to post a follow-up... or, feel free to remind me to do so.
|
|
| Back to top |
|
 |
Whisperer
Sergeant

 Joined: Mar 29, 2003 Posts: 134 Location: USA
|
Posted: Sun Dec 28, 2003 6:41 am Post subject: |
|
|
That double-post, above, was a mistake -- obviously.
And I thought I was logged in but apparently I wasn't.
My bad.
Whisperer
|
|
| Back to top |
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Sun Dec 28, 2003 7:34 am Post subject: |
|
|
Yes indeed. But you were logged in for your OP. That's the one that's screwing up this page's width. Just edit that post and break up that one long line that insists on posting without wrapping....
The filter for redirected HTTP links you were inquiring about:
| Code: | | If the Body contains the RegExpr "(?i)<\s*a[\s\w=]+(?s)href=(3D)??"?http://[\d\w\./]+(@|&#0{0,5}64;|\*|&#0{0,5}42;).+>" then hide the message from the messages list, and mark the message as mail to be deleted. This filter takes priority over the friends list. |
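A rough way to check this expression against sample messages is the Python sketch below. Two assumptions are baked in and should be treated as such: the numeric-entity alternatives are taken to read `&#0{0,5}64;` and `&#0{0,5}42;` (the character codes for `@` and `*`), and the inline `(?i)` and `(?s)` switches are passed as compile flags because recent Python versions reject inline flags in the middle of a pattern. The sample links use made-up example.com / example.net hosts.

```python
import re

# Python adaptation of the redirected-link expression; see assumptions above.
REDIRECT_LINK = re.compile(
    r'<\s*a[\s\w=]+href=(3D)??"?http://[\d\w\./]+'
    r'(@|&#0{0,5}64;|\*|&#0{0,5}42;).+>',
    re.IGNORECASE | re.DOTALL,
)


def has_redirected_link(body: str) -> bool:
    """True if an <a href> host part is followed by '@', '*', or their entities."""
    return REDIRECT_LINK.search(body) is not None


samples = [
    '<a href="http://rd.example.com/promo/*http://spam.example.net/buy">Click</a>',
    '<a href=3D"http://www.example.com&#64;spam.example.net/">x</a>',
    '<a href="http://www.example.com/page.html">Home</a>',
]
for s in samples:
    print(has_redirected_link(s), s)
```

The first two samples carry a redirect marker right after the host and path, the third is an ordinary link, so only the first two should be flagged.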
BTW, I'm finding that Denn988's Banned Dialup filter strategy is just the ticket for me! The strategy focuses on the header, doesn't involve endless filter additions, and can't be easily bypassed by a SPAMer. It really is brilliant ... as the Eggman stated right after Denn988 posted it but which took me a day or so -- along with some help from Denn988 -- to fully appreciate.
|
|
| Back to top |
|
 |
Whisperer
Sergeant

 Joined: Mar 29, 2003 Posts: 134 Location: USA
|
Posted: Mon Dec 29, 2003 4:21 am Post subject: |
|
|
| Ikeb wrote: | | Yes indeed. But you were logged in for your OP. That's the one that's screwing up this page's width. Just edit that post and break up that one long line that insists on posting without wrapping.... |
Gotcha. Done.
| Ikeb wrote: | The filter for redirected HTTP links you were inquiring about:
| Code: | | If the Body contains the RegExpr "(?i)<\s*a[\s\w=]+(?s)href=(3D)??"?http://[\d\w\./]+(@|&#0{0,5}64;|\*|&#0{0,5}42;).+>" then hide the message from the messages list, and mark the message as mail to be deleted. This filter takes priority over the friends list. |
|
Thanks. But how good is it for reliably avoiding false positives?
| Ikeb wrote: | | BTW, I'm finding that Denn988's Banned Dialup filter strategy is just the ticket for me! The strategy focuses on the header, doesn't involve endless filter additions, and can't be easily bypassed by a SPAMer. It really is brilliant ... as the Eggman stated right after Denn988 posted it but which took me a day or so -- along with some help from Denn988 -- to fully appreciate. |
Cool... but, uh, how good is it for reliably avoiding false positives?
Thanks!
|
|
| Back to top |
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Mon Dec 29, 2003 5:30 am Post subject: |
|
|
| Whisperer wrote: |
Thanks. But how good is it for reliably avoiding false positives? |
Just within the last week, out of the 183 SPAM messages my filters detected, this filter detected 19 of them. Since I've been using it, I have not found a single false positive.
BTW the "Buried Email address" regex I also included in that same post has detected 30 SPAM messages over the last week. I fully expected that one to get false positives from my emag subscriptions but since I combine that one with another SPAM indicator, I haven't had a single false positive with that one either.
| Whisperer wrote: |
Cool... but, uh, how good is it for reliably avoiding false positives?
|
Too early to make a certainty of it, but so far so good! In the last couple of days the filters I've set up to trigger on suspect "Received: from" field SPAM indicators have caught almost ALL the SPAM detected! (I.e. other filters might well have detected the same message based on other indicators, but the higher priority of the "Received: from" filters means they trigger first if SPAM is detected that way.) In fact the only other filter that I can recall detecting SPAM is the "Redirected HTML" filter.
I have had false positives on a couple I'm still testing and which may yet prove to be dead end ideas but the one posted by Denn988, as well as a couple more I've developed from his, have yet to trigger falsely!
|
|
| Back to top |
|
 |
AlphaCentauri
SIRT Handler Premium Member
 Joined: Nov 20, 2003 Posts: 2763
|
Posted: Mon Dec 29, 2003 5:12 pm Post subject: autodeleting by url's in spam |
|
|
I also primarily screen by URL's. As far as the autodelete, there is one very big caveat: If you are using Reg Exp's so you can have lots of URL's in one filter, you may make a mistake and have two dividers next to each other (||) or end a line with a divider. It won't be easy to see because they look like l's. And it won't be easy to notice its effect because it will label everything that isn't on the Friends list as spam, and usually, it is. But it's an easy mistake to make even if you are aware of it, and you may not want to set any reg exps. to autodelete.
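The double-divider problem is easy to demonstrate and to check for. The Python sketch below is only an illustration with names of my own choosing, and it uses Python's `re` module rather than MailWasher's engine: an empty alternative matches any text at all, and a small helper can flag the pattern before it is pasted into a filter.

```python
import re

# Empty alternative: "|" at the start or right after "(", "||", or "|" at the
# end or right before ")".
_EMPTY_ALT = re.compile(r"(?:^|\()\||\|\||\|(?:\)|$)")


def has_empty_alternative(pattern: str) -> bool:
    """Flag '||', a leading or trailing '|', '(|' and '|)' in a filter expression."""
    # Neutralise escaped characters first so a literal '\|' is not counted.
    unescaped = re.sub(r"\\.", "x", pattern)
    return _EMPTY_ALT.search(unescaped) is not None


good = r"gono\.us|hostgym\.com"
bad = r"gono\.us||hostgym\.com"

# The empty alternative in `bad` matches a message that mentions neither domain.
print(re.search(bad, "Lunch on Friday?") is not None)
print(has_empty_alternative(good), has_empty_alternative(bad))
```

Running a check like this on each line of a long expression is cheap insurance if the filter is ever switched to autodelete.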
You are right that there are a lot of repeats. The spam is advertising the sites, and if the sites change names, the nitwits who answer the spam can't send money. So they fake the header, but never the URL, and at least for a few weeks, you get a lot of spam for the same URL.
They do put a lot of nonsense in front of the address. I only screen for the last domain name before the .com, .net, or whatever. The rest doesn't mean anything.
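As a sketch of that trimming step, the Python below keeps only the last two labels of the host name and throws away everything in front. The function name and the sample URL are hypothetical, and the two-label rule is a simplification: it does not handle country-code endings such as .co.uk.

```python
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Return the last two labels of the URL's host, e.g. 'oldcactus.com'."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])


# Random-looking subdomains in front of the name are dropped.
print(base_domain("http://a1b2c3.tracking.oldcactus.com/offer?id=99"))
```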
I also have a text file where I keep a copy of my filtered expressions. Basically, I just keep copying and pasting into that file with dividers. Then I copy and paste it into a MailWasher filter called "Link to Spam Recent" and make it my highest level filter after "fake from me." When I get to the point where WordPad has to start a new line even without text wrap being on, it's time to start a new line in the MailWasher filter, too. I guess when I get to 10 lines of filters, I'll swap out the first line and see if any of those URL's are still showing up. Anytime a piece of spam is caught by a lower level filter, I open up "show complete header" and harvest the URL for another filtering expression and paste it into my WordPad file and then into MailWasher.
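The copy-and-paste bookkeeping could also be scripted. Here is a minimal Python sketch, assuming the harvested domains are kept as a plain list: it escapes each one, joins them with dividers, and starts a new line whenever the current one would exceed a chosen length, so no line begins or ends with a divider. The function name and the length limit are my own choices, not MailWasher settings.

```python
import re


def wrap_alternation(domains: list[str], max_len: int = 60) -> list[str]:
    """Join escaped domains with '|' into lines no longer than max_len."""
    lines: list[str] = []
    current = ""
    for domain in domains:
        piece = re.escape(domain)  # turns '.' into '\.'
        candidate = piece if not current else current + "|" + piece
        if current and len(candidate) > max_len:
            lines.append(current)  # close the full line without a trailing '|'
            current = piece
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


harvested = ["gono.us", "hostgym.com", "imgehost.com", "kiffergly.net"]
for line in wrap_alternation(harvested, max_len=25):
    print(line)
```

Each printed line can be pasted into its own filter line as is.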
The ones in base 64 are harder. I go to http://www.opinionatedgeek.com/dotnet/tools/Base64Decode/safedecode.aspx to decode it, and strip out letters four at a time and keep decoding until I get to the sequence I can filter for. I have a separate filter "base 64 href" that has the various permutations of letters that could code "href=http://" since I don't know of anyone who sends html code in base 64. Again, I don't autodelete, but it's been 100% specific so far. You can also strip out the code for the actual url's but since there are several ways to code each sequence of English letters, it's pretty tedious to do for every spammer. I just go for the ones that include an html link. I figure the odds that the same long sequence of letters will show up in a photo or something is pretty remote.
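The "permutations" come from the fact that base 64 encodes three bytes at a time, so the same text encodes to three different letter sequences depending on where it starts within a group of three. The Python sketch below works them out for a given string; the function name is made up, and the characters at each end that depend on neighbouring text are trimmed off so that what remains always appears in the encoded body.

```python
import base64


def base64_fragments(needle: bytes) -> list[str]:
    """Return the three base 64 sequences that `needle` can encode to."""
    fragments = []
    for offset in range(3):
        encoded = base64.b64encode(b"\x00" * offset + needle).decode("ascii")
        # Leading characters that mix in bits from the preceding (unknown) bytes.
        lead = (0, 2, 3)[offset]
        # Trailing characters (including '=' padding) that depend on what follows.
        tail = (0, 3, 2)[(offset + len(needle)) % 3]
        fragments.append(encoded[lead:len(encoded) - tail])
    return fragments


for fragment in base64_fragments(b"href=http://"):
    print(fragment)
```

Any of the three printed sequences showing up in a base 64 body means the decoded text contains `href=http://`, whatever comes before or after it.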
Another sneaky thing is spammers that don't actually include a link in the message -- they require the recipient to cut and paste the address into an email or browser. Of course, that's not as effective a way of marketing their sites. I screen for body contains "html" AND body contains ">g<|>o<|>e<" (e because it's common, g and o because gono.com is the main offender and they can't break up that name in too many different ways.)
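For reference, the two-part rule above can be written out as the following Python sketch. It is only a stand-in for the MailWasher filter: the function name is hypothetical, and it assumes case-sensitive matching, which may not be how MailWasher's "contains" behaves.

```python
import re

# Single letters wrapped in tags, as in the ">g<|>o<|>e<" expression above.
LETTER_IN_TAGS = re.compile(r">g<|>o<|>e<")


def looks_letter_split(body: str) -> bool:
    """True only if the body mentions 'html' AND has a lone g, o or e between tags."""
    return "html" in body and LETTER_IN_TAGS.search(body) is not None


print(looks_letter_split("<html><b>g</b><i>o</i>n<b>o</b></html>"))
print(looks_letter_split("<html><p>Hello there</p></html>"))
```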
|
|
| Back to top |
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16515
|
Posted: Mon Dec 29, 2003 6:12 pm Post subject: |
|
|
I've toyed with the SPAMvertized URL strategy myself and built up a list of sites to be detected. After only a week or less, I was beginning to see hits (i.e. repeated SPAMvertisements for the same site). I was using this strategy to "blacklist" any SPAM that "leaked" through my other filters.
However, since Denn988 posted his "Received: from" strategy and I added some filters of my own based on it, I haven't had any further "leaks". Therefore I haven't made use of the URL strategy since Christmas.
But back to your post, Alpha: I didn't bother with Base64 encoded messages. I figured the pain wasn't worth it. (Actually I never had one of those "leak" through anyway.) Of course if FireTrust were to add a decoder, that would allow easy extension of SPAMvertized URL detection. But the best thing FireTrust could do to ease implementation of this strategy is to add a SPAMvertized URL blacklist feature.
|
|
| Back to top |
|
 |
stan_qaz
Premium Member
 Joined: Mar 31, 2003 Posts: 10626
|
Posted: Mon Dec 29, 2003 6:32 pm Post subject: |
|
|
It looks like this one is making the most-requested list. Hopefully we can get it added to an upcoming version in the new year.
rusticdog, what are the chances of seeing this and when should we look for it if it is possible? _________________ Questions? Try the wiki
http://wiki.castlecops.com/MailWasher_Pro
|
|
| Back to top |
|
 |