| Author |
Message |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Sun Nov 30, 2003 9:32 am Post subject: Filtering nasty URL References |
|
|
I have a couple of filters that I'd like to share. I'll dedicate one to Cowboy 'cause he got on his high horse and unwittingly goaded me to action WRT words broken by HTML tags. The other goes to Denn988 for challenging me to learn this Regular Expression gobbledygook.
Both filters look for URLs with SPAM fingerprints all over them. One checks for the recipient's email address buried in the URL, the other for redirected URLs - URLs that appear to go one place (e.g. ebay.com) but actually go to some sleazebag site (e.g. 202.108.255.203).
| Code: | [enabled],"Email Account Buried in Link [B]","Email Address Buried in Link [B]",16711680,AND,Delete,TakesPrecedence,Body,containsRE,"<a .*href=""?http://.+=(uname1|uname2|etc)[^>]*?domain_name""?>"
[enabled],"Redirected HTTP Link [B]","Redirected HTTP Link [B]",16711680,AND,Delete,TakesPrecedence,Body,containsRE,"(?i)<\s*a[\s\w=]+(?s)href=(3D)??""?http://[\d\w\./]+(@|& #064;|\*|& #42;).+>" |
For the first one, just change the 'uname1|uname2|etc' part to whatever account names you use and 'domain_name' to your domain name.
In the second one, just get rid of the spaces between the two instances of the '&' and '#' characters (it seems neither disabling HTML nor using the [code] tags prevents this forum from converting an encoded character back to the character it represents). Just to explain this filter a bit, the redirection is triggered by a '@' or '*' character (or the encoded equivalent) in the URL. Everything before such a character is ignored and the browser jumps to the hidden part behind that character. So that's what the filter looks for.
I've been using them for a week or so and haven't had a false positive. I was expecting the first one might trigger on some of my mailing lists and such but that hasn't happened to this point. BTW, I have found a strong correlation for hits on these filters and words broken by HTML tags.
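For anyone who wants to try the redirect pattern outside the mail filter, here is a minimal Python sketch of the "Redirected HTTP Link" expression. It is an adaptation, not the filter's own syntax: the doubled quotes of the filter file become single quotes, the spaces in the '& #' sequences are removed, and the inline (?i) and (?s) switches are passed as compile flags because recent Python versions reject a global flag in the middle of a pattern. The function name and the sample bodies are made up for illustration.

```python
import re

# Port of the "Redirected HTTP Link" filter from the post above.
# (?i) and (?s) are supplied as flags instead of inline switches.
REDIRECT_RE = re.compile(
    r'<\s*a[\s\w=]+href=(3D)??"?http://[\d\w./]+(@|&#064;|\*|&#42;).+>',
    re.IGNORECASE | re.DOTALL,
)


def is_redirected_link(body: str) -> bool:
    """Return True if body has an <a href> URL containing '@' or '*' (or the encoded form)."""
    return REDIRECT_RE.search(body) is not None


# Hypothetical message bodies, not taken from real mail.
disguised = '<a href="http://www.ebay.com@202.108.255.203/login">eBay</a>'
encoded = '<A HREF=3D"http://www.ebay.com&#064;202.108.255.203/">eBay</A>'
plain = '<a href="http://www.ebay.com/help">Help</a>'

for sample in (disguised, encoded, plain):
    print(is_redirected_link(sample), sample)
```

The first two samples carry a redirect marker after the visible host name; the third is an ordinary link and should be left alone.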
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Sat Dec 06, 2003 7:45 pm Post subject: |
|
|
Ikeb,
I just saw the above post and it looks rather interesting. I have added the two filters to my system for testing and evaluation.
As to special characters used, you might find this link handy for many of the special characters available.
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Sat Dec 06, 2003 9:46 pm Post subject: |
|
|
| denn988 wrote: | | I just saw the above post and it looks rather interesting. I have added the two filters to my system for testing and evaluation. |
I look forward to your review.
| denn988 wrote: | | As to special characters used, you might find this link handy for many of the special characters available. |
Thanks for the reference. I think I fully covered the '*' and '@' characters. Certainly I couldn't find anything else at the page you referenced at Syd Allan's site. Dunno about other redirection characters although I certainly haven't run across any others.
BTW, Syd Allan appears to be a kewl kat (even putting aside the fact that he's a fellow Canuck). He has an interesting SPAM essay, I can appreciate his I Am Rude And Uncooperative essay, and his Friends essay is touching. There's a lot more to read. Looks like I'll be here a while.
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Sat Dec 06, 2003 11:06 pm Post subject: |
|
|
Ikeb,
I thought you might like that link
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Sun Dec 07, 2003 6:04 am Post subject: |
|
|
Showoff!
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Mon Dec 08, 2003 7:11 pm Post subject: |
|
|
Ikeb,
I have looked at the 'Redirected Link' filter and it shows promise. It is a very interesting strategy and would make a good addition to a filter set.
I have made a few changes to the filter to make it simpler and faster. Here is the code as modified:
| Code: |
<[^>]*?http://[^>]*?(@|& #0?64;|\*|& #0?42;)http://[^>]*?> |
REMOVE the spaces in the two '& #' sequences in the above code.
Let me know what you think of the changes...
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Mon Dec 08, 2003 8:18 pm Post subject: |
|
|
Your regex is certainly simpler but I'm not sure that it would be any faster. I wanted to be sure that such a redirect is part of an '<a href>' tag, nothing else.
I note that you allow for '& #64' and '& #42' character encoding as well. I've never seen that but don't doubt this would represent the same thing as '& #064' and '& #042' respectively.
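The equivalence of the two encodings can be checked with Python's standard `html.unescape`, which decodes numeric character references the way an HTML parser would. This is only a quick sketch; the padded forms with extra zeros are added for illustration and are not from the posts above.

```python
import html

# Decimal character references name a code point, so leading zeros
# should not change which character comes out.
at_refs = ["&#64;", "&#064;", "&#0000064;"]
star_refs = ["&#42;", "&#042;", "&#0000042;"]

decoded_at = [html.unescape(ref) for ref in at_refs]
decoded_star = [html.unescape(ref) for ref in star_refs]

print(decoded_at)
print(decoded_star)
```

Every entry in the first list should decode to '@' and every entry in the second to '*', which is why a filter needs to allow for an optional zero (or several) in the encoded form.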
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Tue Dec 09, 2003 2:09 am Post subject: |
|
|
| denn988 wrote: |
To tell you the truth...I see no reason to limit the expression to an 'a href' tag. If you look at the following example tag, you may understand why.
| Code: | <img src=http://rd.yahoo.com/iuepyariczdds/*http://frpq27mz.
com\hond\jphoto10.gif> |
It is doing the same thing, but it is using an image source (The above was taken from a post by Cowboy). |
Yeah I noticed that redirect. I just don't know what to make of it. Why bother redirecting a gif? So what if it's picked up from a site other than the first reference? It's not as if anyone pays attention to what site a picture comes from. I just don't see any purpose to the redirect.
That said I'm going to track any such <img src> redirections just to find out how often this happens and if it's a useful SPAM indicator.
| denn988 wrote: | ALSO....
I was just testing the '@' (& #064;) and '*' (& #042;) character representations using this forum and it will translate with as many as five leading zeros (7 numerals total...more than that and this forum returns a '?').
Try it for yourself...
Because of that, the code that I posted above would be better if it were changed to:
| Code: |
<[^>]*?http://[^>]*?(@|&#0*64;|\*|&#0*42;)http://[^>]*?>
OR
<[^>]*?http://[^>]*?(@|&#0{0,5}64;|\*|&#0{0,5}42;)http://[^>]*?> |
You would want to make sure that your filter would still work no matter what the Spammer could legally come up with to zap it. Besides...you can post it without the spaces. |
OK, thanks for the info. Certainly it's worth taking the trouble to close any loopholes a SPAMer could take advantage of. And there's a neat little side benefit to boot.
| denn988 wrote: |
The main thing is that it looks like a good filter strategy....  |
So far I haven't had a false positive .... and when you think of it, who, other than a SPAMer, would use this technique? I continue to find a strong correlation between this filter and the latest "word broken by an invalid HTML tag" filter I posted earlier.
Incidentally, I also find a strong correlation between these two techniques and messages hit by the "Email buried in an HTML link" filter as detailed in the OP. I have had a few false positives (emag subscriptions) with the latter however. Nothing the Friends list can't deal with though.
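The leading-zero version of denn988's simplified expression can be sketched in Python as below. It assumes the encoded markers are written as '&#' followed by any number of zeros and then 64 or 42. The image tag is the Yahoo redirect quoted earlier in this post, joined onto one line; the other two tags are hypothetical.

```python
import re

# Any tag (not only <a href>) holding "http://...<marker>http://", where the
# marker is '@' or '*', literal or as a decimal reference with leading zeros.
TAG_REDIRECT_RE = re.compile(
    r'<[^>]*?http://[^>]*?(@|&#0*64;|\*|&#0*42;)http://[^>]*?>'
)

samples = {
    # Redirect quoted in the thread, with the line break removed.
    "img": r'<img src=http://rd.yahoo.com/iuepyariczdds/*http://frpq27mz.com\hond\jphoto10.gif>',
    # Hypothetical link using a zero-padded '*' reference.
    "padded": '<a href="http://rd.yahoo.com/x/&#000042;http://spam.example/">',
    # Hypothetical ordinary image with no redirect marker.
    "clean": '<img src=http://www.example.com/logo.gif>',
}

hits = {name: bool(TAG_REDIRECT_RE.search(tag)) for name, tag in samples.items()}
print(hits)
```

Note that this expression wants a second 'http://' right after the marker, so it is aimed at the 'rd.yahoo.com/.../*http://...' style of redirect rather than at a bare 'ebay.com@202.108.255.203' style link.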
|
|
|
 |
denn988
Guest IP: 66.44.*.*
|
Posted: Tue Dec 09, 2003 3:20 am Post subject: |
|
|
| Ikeb wrote: | Yeah I noticed that redirect. I just don't know what to make of it. Why bother redirecting a gif? So what if it's picked up from a site other than the first reference? It's not as if anyone pays attention to what site a picture comes from. I just don't see any purpose to the redirect.
|
Besides 'cookies' which many people (such as myself) block as much as possible, one of the tracking methods used is through 'gifs'. They may not provide any personally identifiable info, but just the fact that your client is going to the site to get the download might be enough to generate income for the Spammer.
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
|
|
 |
Guest
Guest IP: 68.51.*.*
|
Posted: Tue Dec 09, 2003 2:44 pm Post subject: |
|
|
Ikeb,
I've put both of your filters from this thread into my filters list (at the top of the list) and I've gotten hits on both of them.
This morning I received an email that had the status of 'Origin blacklisted' so that email made it past all of my filters, but when I looked at the body of the email in the preview pane I could see my email address in one of the links. So for some reason it made it past your 'Email Account Buried in Link' filter.
I present the line from the preview pane for you to take a look at:
| Code: | | http://womc.info/pass.php?a=donotemail&b=EmailAccountHere&c=true |
|
|
|
 |
Cowboy
Guest IP: 213.112.*.*
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
Posted: Tue Dec 09, 2003 7:06 pm Post subject: |
|
|
| Guest wrote: | | I've put both of your filters from this thread into my filters list (at the top of the list) and I've gotten hits on both of them. |
That confirms you installed the filters OK.
| Guest wrote: | | This morning I received an email that had the status of 'Origin blacklisted' so that email made it past all of my filters, but when I looked at the body of the email in the preview pane I could see my email address in one of the links. So for some reason it made it past your 'Email Account Buried in Link' filter. |
Doesn't an 'Origin blacklisted' status take precedence over filters ... unless the filter is set to take precedence over the Friends list? Is the filter set to take precedence? Are you using RBL? If so, perhaps try turning off RBL and let us know what happens.
| Guest wrote: | I present the line from the preview pane for you to take a look at:
| Code: | | http://womc.info/pass.php?a=donotemail&b=EmailAccountHere&c=true |
|
Looks fine to me. (i.e. the filter should catch this .... assuming of course that 'EmailAccountHere' matches the account PLUS mail domain you placed into the regex filter.)
NOTE: The filter strategy assumes that account name (i.e. the part before the '@' sign) is placed in the filter separately from your mail server domain (i.e. the part after the '@' sign).
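To see how the account and domain parts slot into the expression, here is a rough Python sketch that builds the "Email Account Buried in Link" pattern from a list of account names and a domain. The helper name, the account names and 'example.com' are all hypothetical, and the helper escapes the dot in the domain, which the filter as posted does not. The sample tags are invented around the URL shape the Guest posted; the Guest's actual message was not shown, so this says nothing definite about why it slipped through.

```python
import re


def build_filter(accounts, domain):
    """Build the buried-address pattern: account name(s) first, mail domain separately."""
    names = "|".join(re.escape(name) for name in accounts)
    return re.compile(
        r'<a .*href="?http://.+=(' + names + r')[^>]*?' + re.escape(domain) + r'"?>'
    )


# Hypothetical account names and mail domain.
link_filter = build_filter(["alice", "bob"], "example.com")

# Address is the last thing in the URL.
at_end = '<a href="http://spam.example.net/t.php?u=alice@example.com">'
# Address is followed by another parameter.
trailing = '<a href="http://womc.info/pass.php?a=donotemail&b=alice@example.com&c=true">'
# Only the account name appears, without the domain.
account_only = '<a href="http://womc.info/pass.php?a=donotemail&b=alice&c=true">'

results = {
    "at_end": bool(link_filter.search(at_end)),
    "trailing": bool(link_filter.search(trailing)),
    "account_only": bool(link_filter.search(account_only)),
}
print(results)
```

In this sketch only the first tag matches. The expression expects the domain to sit directly before the closing quote and '>', so a link that carries more parameters after the address, or that carries the account name without the domain, would not be caught by the pattern as written.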
|
|
|
 |
Ikeb
Special Response Team Forums Admin
 Joined: Apr 20, 2003 Posts: 16543
|
|
|
 |
|
|