CastleCops, Internet Crime Fighters

Please help with Regular Expression problem
Walt

Sergeant
Sergeant


Joined: Mar 15, 2003
Posts: 91
Location: USA

Posted: Fri Dec 12, 2003 10:07 pm    Post subject: Please help with Regular Expression problem

OK, I thought I understood regular expressions. Apparently, not.

Here is the Regular Expression I have:

Code:
^Content-Type: .*\.exe"$


MW Pro is successfully finding this RE within this email body:

Code:
Status:  U
Return-Path: <victorsa@rieder.net.py>
Received: from mail.rieder.net.py ([66.178.33.5])
   by meadowlark (Earthlink/Onemain SMTP Server) with ESMTP id 1auV4y3Fe3NZFl60
   for <walt@xxxx.com>; Fri, 12 Dec 2003 13:43:40 -0800 (PST)
Received: from ixjwl (ptp205.rieder.net [66.178.35.205])
   by mail.rieder.net.py (8.12.10/8.12.10) with SMTP id hBCLXGhl010652;
   Fri, 12 Dec 2003 18:33:16 -0300
Date: Fri, 12 Dec 2003 18:33:16 -0300
Message-Id: <200312122133.hBCLXGhl010652@mail.rieder.net.py>
FROM: "admin" <smailengine@microsoft.com>
TO: " " <>
Subject: {Virus?}
Mime-Version: 1.0
Content-type: multipart/mixed; boundary="hzmqugk"
X-Rieder_Internet-MailScanner-Information: Please contact the ISP for more information
X-RiederAV: Found to be infected

--hzmqugk
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<HTML>

conten=EDa uno o m=E1s virus que pudieron ser desinfectados.

Los virus han sido eliminados y los archivos desinfectados se han
anexado a este mensaje.

--=20
MailScanner
Protecci=F3n contra Virus de E-mail
www.mailscanner.info

<HEAD></HEAD>
<BODY>
<iframe src=3D"cid:wrkmupxnzhclsbw" height=3D0 width=3D0></iframe>
<BR>Message from microsoft.com
<BR><BR><BR><BR>Undelivered to <B>ahcrif@microsoft.com</B>
</BODY></HTML>

--hzmqugk
Content-Type: text/plain; charset="us-ascii"; name="VirusWarning.txt"
Content-Disposition: attachment; filename="VirusWarning.txt"
Content-Transfer-Encoding: quoted-printable

Este es un mensaje del Servicio de Protecci=F3n de Virus para Correo
Electr=F3nico MailScanner
----------------------------------------------------------------------
El archivo anexado original "bzestyf.exe"
se considera que ha sido infectado por un virus y el mismo
ha sido reemplazado por este mensaje de aviso.

El Fri Dec 12 18:33:50 2003 el analizador de virus dijo:
   ClamAV: bzestyf.exe contains Worm.Gibe.F=20
   MailScanner: Executable DOS/Windows programs are dangerous in email (bze=
styf.exe)


Nota para el departamento de soporte: Revisar en Rieder_Mail_Antivirus en
/var/spool/MailScanner/quarantine/20031212 (mensaje hBCLXGhl010652).

--
Postmaster

--hzmqugk--


I am sorry, but I just don't see how.

Can someone help me understand what I did wrong in coding my RE, and exactly what within that spam body the RE matches?

Ikeb

Special Response Team
Forums Admin

Joined: Apr 20, 2003
Posts: 16506

Forums Admin Moderators MVP Premium SRT Team CC Committee Team F@H

Posted: Sat Dec 13, 2003 3:20 am

Actually I would think it should get a multiline match:

Code:
Content-Type: text/plain; charset="us-ascii"; name="VirusWarning.txt"
Content-Disposition: attachment; filename="VirusWarning.txt"
Content-Transfer-Encoding: quoted-printable

Este es un mensaje del Servicio de Protecci=F3n de Virus para Correo
Electr=F3nico MailScanner
----------------------------------------------------------------------
El archivo anexado original "bzestyf.exe"

But when I plugged the text you posted into TestRExp and ran your regex against the text, I couldn't get the match ... even with /s enabled. Nada.

If you can't locate the problem, wait until you get another hit, don't delete it (even if it's SPAM), download it to your email client and save it as a text file. Then run it against your regex using TestRExp.

Edit: OK, I got a match when I removed the space character at the end of the "El archivo anexado original "bzestyf.exe" " line. Is it possible that the space was added as you copied and pasted?

Second edit: Answering my own question here. It appears that the [ code ] BBCode adds a space at the end of each line.
... which means, of course, that we can't post text segments with complete confidence that the displayed text accurately reflects what was pasted. (Come to think of it, neither can &nn character representations.)
I'll raise a flag at the Site Inbox ...
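
The trailing-space effect is easy to reproduce outside MailWasher. Below is a minimal Python sketch, assuming Python's `re` module with `re.MULTILINE` is a fair stand-in for the filter engine (that equivalence is an assumption, not something the thread confirms). The `tolerant` pattern and the `hits` helper are illustrative names, not part of anyone's actual filter.

```python
import re

# The line as it appears in the message, and the same line with the
# trailing space that the [ code ] display adds.
clean = 'El archivo anexado original "bzestyf.exe"'
padded = clean + " "

# Tail of Walt's original expression: .exe" must sit right at end of line.
strict = re.compile(r'\.exe"$', re.MULTILINE)
# Illustrative variant that tolerates spaces or tabs before end of line.
tolerant = re.compile(r'\.exe"[ \t]*$', re.MULTILINE)

def hits(pattern, text):
    """Return True when the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

print(hits(strict, clean))     # True
print(hits(strict, padded))    # False
print(hits(tolerant, padded))  # True
```

A single invisible character between `.exe"` and the end of the line is enough to defeat the `$` anchor, which is why text copied from the displayed page behaves differently from the original message.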

denn988

Guest
IP: 66.44.*.*






Posted: Sat Dec 13, 2003 4:40 am

Ikeb,

I concur with part of your analysis above: either the space was mistakenly added to the text under test... or removed from the RegExp.

The fact of the matter is that this filter (with the space taken into account) would actually fire on this larger part of the text, as it would be the first match that the filter would actually find:

Code:
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<HTML>

conten=EDa uno o m=E1s virus que pudieron ser desinfectados.

Los virus han sido eliminados y los archivos desinfectados se han
anexado a este mensaje.

--=20
MailScanner
Protecci=F3n contra Virus de E-mail
www.mailscanner.info

<HEAD></HEAD>
<BODY>
<iframe src=3D"cid:wrkmupxnzhclsbw" height=3D0 width=3D0></iframe>
<BR>Message from microsoft.com
<BR><BR><BR><BR>Undelivered to <B>ahcrif@microsoft.com</B>
</BODY></HTML>

--hzmqugk
Content-Type: text/plain; charset="us-ascii"; name="VirusWarning.txt"
Content-Disposition: attachment; filename="VirusWarning.txt"
Content-Transfer-Encoding: quoted-printable

Este es un mensaje del Servicio de Protecci=F3n de Virus para Correo
Electr=F3nico MailScanner
----------------------------------------------------------------------
El archivo anexado original "bzestyf.exe"



As to the problem with the filter itself....

Code:
^Content-Type: .*\.exe"$


The above RegExp has a very serious flaw built into it. The use of    .*    in the expression causes the filter to look through an unlimited amount of text till it finds the    .exe"    at the end of a line.

In this case, the    .exe"    occurred 10 lines down the body from the start point that the filter was looking for.

You might try placing a limit on how far down the expression will look once it finds a start point. For example, you could change the RegExp to read as follows:

Code:
^Content-Type:.{0,150}?\.exe"


This would limit the RegExp to look at no more than 150 characters between    Content-Type:    and    .exe"

It would also be 'non-greedy' which would be preferred. Finally, it would allow the filter to trigger even if there is a space following the    .exe"
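
To see the difference between the two expressions, here is a rough Python sketch. It assumes that `re.MULTILINE | re.DOTALL` approximates the MailWasher Pro behaviour described in this thread (`^` and `$` match at line breaks, `.` matches line breaks too). The `far_away` text is a shortened stand-in for the message body, and the `nearby` header is a hypothetical attachment line made up for the example.

```python
import re

# Assumption: these flags approximate MailWasher Pro's defaults.
FLAGS = re.MULTILINE | re.DOTALL

unbounded = re.compile(r'^Content-Type: .*\.exe"$', FLAGS)
bounded = re.compile(r'^Content-Type:.{0,150}?\.exe"', FLAGS)

# Shortened stand-in for the body: the only .exe" sits several lines and
# well over 150 characters below the Content-Type line.
far_away = "\n".join([
    "Content-Type: text/html",
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "Los virus han sido eliminados y los archivos desinfectados se han",
    "anexado a este mensaje.",
    "<BR>Message from microsoft.com",
    "----------------------------------------------------------------------",
    'El archivo anexado original "bzestyf.exe"',
])

# Hypothetical attachment header where .exe" follows closely.
nearby = 'Content-Type: application/octet-stream; name="bzestyf.exe"'

# The unbounded version swallows everything from the header to the last line.
print(unbounded.search(far_away).group(0) == far_away)  # True
# The bounded version gives up after 150 characters.
print(bounded.search(far_away))                          # None
print(bounded.search(nearby) is not None)                # True
```

The 150 figure is a tuning choice: large enough for a long attachment header, small enough that unrelated text further down the body is out of reach.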

Ikeb

Special Response Team
Forums Admin

Joined: Apr 20, 2003
Posts: 16506

Forums Admin Moderators MVP Premium SRT Team CC Committee Team F@H

Posted: Sat Dec 13, 2003 6:28 am

denn988 wrote:
I concur with part of your analysis above: either the space was mistakenly added to the text under test... or removed from the RegExp.

Oh it's definitely that the [ code ] field adds a space at the end of each line. And it's easy enough to check just by selecting some of the text in display mode and comparing that to the text in edit mode.

denn988 wrote:
The fact of the matter is that this filter (with the space taken into account) would actually fire on this larger part of the text, as it would be the first match that the filter would actually find:

Code:
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<HTML>

etc.

Good point. I was so caught up by the [ code ] issue I missed that. But now that I look at the text there's actually an earlier match:
Code:
Content-type: multipart/mixed; boundary="hzmqugk"
X-Rieder_Internet-MailScanner-Information: Please contact the ISP for more information
X-RiederAV: Found to be infected

--hzmqugk
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<HTML>

etc.

denn988 wrote:
The above RegExp has a very serious flaw built into it. The use of    .*    in the expression causes the filter to look through an unlimited amount of text till it finds the    .exe"    at the end of a line.

Again good point that I missed due to the [ code ] "flustration".

denn988 wrote:
You might try placing a limit on how far down the expression will look once it finds a start point. For example, you could change the RegExp to read as follows:
Code:
^Content-Type:.{0,150}?\.exe"


This would limit the RegExp to look at no more than 150 characters between    Content-Type:    and    .exe"

It would also be 'non-greedy' which would be preferred. Finally, it would allow the filter to trigger even if there is a space following the    .exe"

Good points. Perhaps it would be a bit more deterministic if limited by the number of lines it allows though:
Code:
^Content-Type:([^\n]*?\n){0,4}?[^\n]*?\.exe"


BTW, I'm assuming this filter should NOT be firing on this message.....
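
The line-counting behaviour can be checked with a small Python sketch. `re.MULTILINE` is assumed here to stand in for how MailWasher Pro treats `^`; the pattern contains no `.`, so the dot-matches-newline setting does not affect it. The `sample` helper and its filler lines are made up for the test.

```python
import re

# Ikeb's pattern: cross at most four line breaks between the header name
# and .exe". MULTILINE is an assumption about the filter engine.
line_limited = re.compile(
    r'^Content-Type:([^\n]*?\n){0,4}?[^\n]*?\.exe"', re.MULTILINE
)

def sample(filler_lines):
    """Build a hypothetical body with N filler lines before the .exe" line."""
    lines = ["Content-Type: text/plain"]
    lines += ["filler"] * filler_lines
    lines.append('original "bzestyf.exe"')
    return "\n".join(lines)

print(line_limited.search(sample(0)) is not None)  # True  (1 line break)
print(line_limited.search(sample(3)) is not None)  # True  (4 line breaks)
print(line_limited.search(sample(4)) is not None)  # False (5 line breaks)
```

Each pass through the group `([^\n]*?\n)` consumes exactly one line, so the `{0,4}` count is a count of line breaks, independent of how long the lines are.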

Walt

Sergeant
Sergeant


Joined: Mar 15, 2003
Posts: 91
Location: USA

Posted: Sat Dec 13, 2003 12:27 pm

denn988 wrote:

As to the problem with the filter itself....

Code:
^Content-Type: .*\.exe"$


The above RegExp has a very serious flaw built into it. The use of    .*    in the expression causes the filter to look through an unlimited amount of text till it finds the    .exe"    at the end of a line.

In this case, the    .exe"    occurred 10 lines down the body from the start point that the filter was looking for.

You might try placing a limit on how far down the expression will look once it finds a start point. For example, you could change the RegExp to read as follows:

Code:
^Content-Type:.{0,150}?\.exe"


This would limit the RegExp to look at no more than 150 characters between    Content-Type:    and    .exe"

It would also be 'non-greedy' which would be preferred. Finally, it would allow the filter to trigger even if there is a space following the    .exe"


Wow, once again, a BIG "Thank you" to all!

I guess the point I missed, when I was trying to learn all this stuff, was that I had incorrectly thought that if I start a RE with an "^" and end it with a "$", that I bounded the search to a single line. In other words, the first CR,LF would terminate any further matching.

Now I see that is where the root of my problem lies.

Again, thanks to you all!

Walt

Sergeant
Sergeant


Joined: Mar 15, 2003
Posts: 91
Location: USA

Posted: Sat Dec 13, 2003 12:31 pm

Ikeb wrote:
But when I plugged the text you posted into TestRExp and ran your regex against the text, I couldn't get the match ... even with /s enabled. Nada.

If you can't locate the problem, wait until you get another hit, don't delete it (even if it's SPAM), download it to your email client and save it as a text file. Then run it against your regex using TestRExp.


What, where, is TestRExp ????

denn988

Guest
IP: 66.44.*.*






Posted: Sat Dec 13, 2003 2:12 pm

Ikeb wrote:

Good point. I was so caught up by the [ code ] issue I missed that. But now that I look at the text there's actually an earlier match:
Code:
Content-type: multipart/mixed; boundary="hzmqugk"
X-Rieder_Internet-MailScanner-Information: Please contact the ISP for more information
X-RiederAV: Found to be infected

--hzmqugk
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<HTML> 



Actually, there is no way the filter would have fired within MWP for the example above. The reason is that it contains the first blank line of the entire message. That first blank line is the separator between the header and the body. In other words, a body filter would have never seen the following in the example above:

Code:
Content-type: multipart/mixed; boundary="hzmqugk"
X-Rieder_Internet-MailScanner-Information: Please contact the ISP for more information
X-RiederAV: Found to be infected
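
The header/body split at the first blank line can be illustrated with a few lines of Python. This is only a sketch of the general mail-format rule; how MailWasher Pro hands the two parts to its filters internally is not shown in this thread, and `split_header_body` is a hypothetical helper name.

```python
def split_header_body(raw):
    """Split a raw message at the first blank line (header/body separator)."""
    normalized = raw.replace("\r\n", "\n")
    header, _sep, body = normalized.partition("\n\n")
    return header, body

# Shortened copy of the message from the first post.
raw = (
    "Subject: {Virus?}\n"
    'Content-type: multipart/mixed; boundary="hzmqugk"\n'
    "X-RiederAV: Found to be infected\n"
    "\n"
    "--hzmqugk\n"
    "Content-Type: text/html\n"
)

header, body = split_header_body(raw)
print("multipart/mixed" in header)   # True
print("multipart/mixed" in body)     # False
print(body.startswith("--hzmqugk"))  # True
```

A filter that is only given `body` can never start its match on the `Content-type: multipart/mixed` line, which is the point being made above.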







Ikeb wrote:

denn988 wrote:
You might try placing a limit on how far down the expression will look once it finds a start point. For example, you could change the RegExp to read as follows:
Code:
^Content-Type:.{0,150}?\.exe"


This would limit the RegExp to look at no more than 150 characters between    Content-Type:    and    .exe"

It would also be 'non-greedy' which would be preferred. Finally, it would allow the filter to trigger even if there is a space following the    .exe"

Good points. Perhaps it would be a bit more deterministic if limited by the number of lines it allows though:
Code:
^Content-Type:([^\n]*?\n){0,4}?[^\n]*?\.exe"


BTW, I'm assuming this filter should NOT be firing on this message.....



Nice bit of filter writing there, Ikeb... That is a much better way to limit the filter than counting characters.

Walt

Sergeant
Sergeant


Joined: Mar 15, 2003
Posts: 91
Location: USA

Posted: Sat Dec 13, 2003 2:30 pm

OK, I tried these two different RE's in my actual MW Pro filters, against email which contained these two examples,

Code:
Content-Type: text/plain; name="VirusWarning.exe"


and

Code:
Content-Type: text/plain;
 name="VirusWarning.exe"


The RE

Code:
^Content-Type:.{0,150}?\.exe"


"hits" on both.

The RE
Code:
^Content-Type:([^\n]*?\n){0,4}?[^\n]*?\.exe"


"hits" on neither.

Unfortunately, I don't seem to know enough about RE's to see what is wrong with the second one.

denn988

Guest
IP: 66.44.*.*






Posted: Sat Dec 13, 2003 3:32 pm

Walt,

You put a space at the end of Ikeb's version. You seem to have a lot of trouble with that, based on your earlier posts. Watch out for that in the future.



Ikeb,

I like your new code so much that I modified my TEXT BASE64 filter to use it.

The RegExp for that filter now reads:

Code:

^Content-Type: text/([^\n]*?\n){0,4}?[^\n]*?^Content-Transfer-Encoding: base64


Set this filter to look in both the header and the body.

It will detect any text file that is encoded in BASE64. As e-mail is a TEXT format, there is no need to encode text into BASE64, other than to obfuscate the text.
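
A quick way to sanity-check this filter is to run it against two small MIME part headers. The Python sketch below assumes `re.MULTILINE` mirrors how the filter engine treats `^`. Both sample parts are hypothetical, and Python matches case-sensitively by default, which may differ from MailWasher Pro.

```python
import re

# denn988's TEXT BASE64 expression, unchanged.
text_base64 = re.compile(
    r'^Content-Type: text/([^\n]*?\n){0,4}?[^\n]*?'
    r'^Content-Transfer-Encoding: base64',
    re.MULTILINE,
)

# Hypothetical text part sent as base64: should trigger the filter.
encoded_part = (
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
)
# Hypothetical text part sent as quoted-printable: should not trigger.
plain_part = (
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
)

print(text_base64.search(encoded_part) is not None)  # True
print(text_base64.search(plain_part) is not None)    # False
```

The second `^` in the expression forces `Content-Transfer-Encoding:` to start its own line, so the phrase appearing in the middle of body text does not count.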

Thanks for the nifty new technique Ikeb...

denn988

Guest
IP: 66.44.*.*






Posted: Sat Dec 13, 2003 3:35 pm

Walt and Ikeb..

I just noticed...Ikeb put a space on the end of his version when he posted it. If Walt copied it exactly as posted it would have been the reason the filter did not fire.

We all have to watch out for those minor details.
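
Walt's result can be reproduced with the two test headers he posted. This Python sketch compares the expression as intended with the same expression carrying one stray trailing space. As before, `re.MULTILINE` is an assumed approximation of the filter engine, and the variable names are illustrative.

```python
import re

IKEB = r'^Content-Type:([^\n]*?\n){0,4}?[^\n]*?\.exe"'

as_intended = re.compile(IKEB, re.MULTILINE)
# Same expression with the stray space picked up from the forum display.
as_copied = re.compile(IKEB + " ", re.MULTILINE)

# Walt's two test headers: one on a single line, one folded onto two lines.
one_line = 'Content-Type: text/plain; name="VirusWarning.exe"'
folded = 'Content-Type: text/plain;\n name="VirusWarning.exe"'

for text in (one_line, folded):
    print(as_intended.search(text) is not None,
          as_copied.search(text) is not None)
# True False
# True False
```

With the space in place the expression demands a literal space after `.exe"`, which neither header has, so it "hits" on neither.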

Walt

Sergeant
Sergeant


Joined: Mar 15, 2003
Posts: 91
Location: USA

Posted: Sat Dec 13, 2003 6:29 pm

denn988 wrote:
Walt and Ikeb..

I just noticed...Ikeb put a space on the end of his version when he posted it. If Walt copied it exactly as posted it would have been the reason the filter did not fire.

We all have to watch out for those minor details.


Yep, three things I learned are:
    - Watch out for trailing spaces. They all too easily go unnoticed.
    - Don't use ".*". It spans lines, and will continue eating characters all the way until the very end. Instead, find something that limits the search to either the current line, or just a few lines.
    - Wow, you guys are great.
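
For the "current line only" case in the second point, one option is to replace `.*` with `[^\n]*`, which cannot cross a line break. A minimal Python sketch follows, with the flags again assumed to approximate MailWasher Pro's defaults and with made-up sample text:

```python
import re

# Assumption: these flags approximate MailWasher Pro's defaults.
FLAGS = re.MULTILINE | re.DOTALL

# [^\n]* cannot cross a line break, whatever the dot is allowed to match.
same_line_only = re.compile(r'^Content-Type:[^\n]*\.exe"', FLAGS)

spread_out = 'Content-Type: text/html\n<HTML>\noriginal "bzestyf.exe"'
attachment = 'Content-Type: text/plain; name="VirusWarning.exe"'

print(same_line_only.search(spread_out))              # None
print(same_line_only.search(attachment) is not None)  # True
```

The trade-off is that this form misses a header folded onto a second line, like the second of the two test samples earlier in the thread; the line-counting expression covers that case.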

Ikeb

Special Response Team
Forums Admin

Joined: Apr 20, 2003
Posts: 16506

Forums Admin Moderators MVP Premium SRT Team CC Committee Team F@H

Posted: Sat Dec 13, 2003 8:01 pm

Walt wrote:
I guess the point I missed, when I was trying to learn all this stuff, was that I had incorrectly thought that if I start a RE with an "^" and end it with a "$", that I bounded the search to a single line. In other words, the first CR,LF would terminate any further matching.

Right! I thought exactly the same thing until Denn988 informed me that with MWP, the global /s parameter is ON by default. This could use some further explanation within the MWP Help file.
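
The effect of the /s setting can be shown in Python, where the equivalent flag is `re.DOTALL` and line-aware `^`/`$` is `re.MULTILINE`. Treating these as equivalent to MailWasher Pro's options is an assumption. The text is a three-line cut-down of the message body.

```python
import re

pattern = r'^Content-Type: .*\.exe"$'
text = 'Content-Type: text/html\n<HTML>\nEl archivo anexado original "bzestyf.exe"'

# Without dot-all, . stops at the line break, so ^...$ really is one line.
line_bound = re.compile(pattern, re.MULTILINE)
# With dot-all (the /s behaviour), .* runs straight through the line breaks.
dot_all = re.compile(pattern, re.MULTILINE | re.DOTALL)

print(line_bound.search(text))                 # None
print(dot_all.search(text).group(0) == text)   # True
```

So `^` and `$` only pin the two ends of the match to line boundaries. Whether the middle may cross lines is decided by what `.` is allowed to match.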

Ikeb

Special Response Team
Forums Admin

Joined: Apr 20, 2003
Posts: 16506

Forums Admin Moderators MVP Premium SRT Team CC Committee Team F@H

Posted: Sat Dec 13, 2003 8:17 pm

Walt wrote:
What, where, is TestRExp ????

Actually it's referenced in the MWP About screen .... http://anso.da.ru/ will get you to the site. TestRExp is the utility you can use to test Regular Expressions intended to run as a MWP filter rule .... code developed by the Russian chap who put up the web site.

Note: The web site has a spyware utility set to autoinstall via your browser. Make sure your security profile will detect and stop the installation. I have noted the issue here and suggested that FireTrust take action to protect their customers. Having this web site referenced directly from their About page surely implies a certain element of liability on the part of FireTrust.

Ikeb

Special Response Team
Forums Admin

Joined: Apr 20, 2003
Posts: 16506

Forums Admin Moderators MVP Premium SRT Team CC Committee Team F@H

Posted: Sat Dec 13, 2003 8:27 pm

denn988 wrote:
Walt and Ikeb..

I just noticed...Ikeb put a space on the end of his version when he posted it. If Walt copied it exactly as posted it would have been the reason the filter did not fire.

We all have to watch out for those minor details.

I was just about to say "I did not! The [code] function did that!" but decided to confirm first. Indeed I forgot to take out the space at the end before posting. And you're right, we have to watch that. To that end, please note that the [code] BBcode adds an extra space at the end of each line so don't copy from the displayed code, rather click the Quote button and copy the [code]ed section from there.

Ikeb

Special Response Team
Forums Admin

Joined: Apr 20, 2003
Posts: 16506

Forums Admin Moderators MVP Premium SRT Team CC Committee Team F@H

Posted: Sat Dec 13, 2003 8:31 pm

denn988 wrote:
Ikeb wrote:

Good point. I was so caught up by the [ code ] issue I missed that. But now that I look at the text there's actually an earlier match:
Code:
Content-type: multipart/mixed; boundary="hzmqugk"
X-Rieder_Internet-MailScanner-Information: Please contact the ISP for more information
X-RiederAV: Found to be infected

--hzmqugk
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<HTML> 



Actually, there is no way the filter would have fired within MWP for the example above. The reason is that it contains the first blank line of the entire message. That first blank line is the separator between the header and the body. In other words, a body filter would have never seen the following in the example above:

Code:
Content-type: multipart/mixed; boundary="hzmqugk"
X-Rieder_Internet-MailScanner-Information: Please contact the ISP for more information
X-RiederAV: Found to be infected



Thanks for that bit of info. So the end of the header is marked by a blank line eh? What the h*** is the
Code:
Content-type: multipart/mixed; boundary="hzmqugk"
line doing in the header anyway?

denn988 wrote:
Nice bit of filter writing there, Ikeb... That is a much better way to limit the filter than counting characters.

Thanks! It took me a while to come up with this approach. I was just about to give up when I realized that it needed another [^\n]* to complete the match on the last line.
