Problem with regular expression?
I'm trying to write a program that processes a web log file. Here's an example entry:

3236 "GET /robert/./index.php?page=links HTTP/1.1" "30/Sep/2005:11:11:38 -0400" "Java/1.4.1_05" "-" - - 200 69.177.179.241

I made a regex to get the important parts, and I'm trying to print out the first backreference, the 'bytes':

$entry =~ /\

I get an error:

Use of uninitialized value in print at ./prog11.pl line 54 (#1)
|
What line is line 54?
|
Sorry, forgot to post that. It's the 'print $bytes'. It works until I put the print statement in there.
|
That regexp looks a little strange to me. To quote the sense of it here, but in the squished up way I'm more used to looking at:
(\d+)\s+(".*")\s+(".*")\s+(".*")\s+(".*")\s+.*\s+.*\s+(\d+)\s+(.*)

This regexp seems to have several faults. The first one is using (".*") to match a quoted string. What this actually matches is zero or more of anything (including quotes) surrounded by quotes, and because * is greedy it runs on to the last quote it can reach. You could use a nongreedy quantifier here (? after the *), but the more efficient method would be to change that . to a character class that, if you wanted to be generous, just excluded quotes:

^(\d+)\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+\S+\s+\S+\s+(\d{3})\s+(.*)$

I've replaced (".*") with the character-class version each time it occurs, and replaced the two later .*'s (the ones you weren't capturing) with \S+ (one or more nonwhitespace characters), because otherwise the .* consumes the whitespace too. I changed the \d+ for the status code to \d{3} (the {3} is a quantifier meaning exactly three of the preceding atom, which is OK here since all HTTP status codes are 3 digits long). Other than that, I've added start and end anchors (^ and $) to the expression to make sure it matches the whole line and fails on lines where it can't do that.

This may not appear to have much to do with the error about $bytes, but the greedy .* constructs can grab more text than you intended, and if that stops the expression as a whole from matching, none of the submatch variables ($1, $2, ...) will be set by it. Anything you assign from them is then undefined, which is what the print is warning about.

Like I say, I haven't actually tested this expression, but I hope it helps!
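For what it's worth, here is a minimal sketch of how the suggested expression might be used so that a failed match is noticed before anything is printed. It uses the sample entry from the first post. The helper name parse_entry and the variable names other than $entry and $bytes are made up for illustration, and which quoted field is the user agent and which is the referer is a guess from the sample line, not something the original program states.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample entry copied from the first post.
my $entry = '3236 "GET /robert/./index.php?page=links HTTP/1.1" '
          . '"30/Sep/2005:11:11:38 -0400" "Java/1.4.1_05" "-" - - 200 69.177.179.241';

# Greedy ".*" runs from the first quote to the LAST quote on the line.
my ($greedy) = $entry =~ /(".*")/;

# The negated character class stops at the next quote instead.
my ($first_quoted) = $entry =~ /("[^"]+")/;

# Hypothetical helper: in list context a match returns its captures,
# or an empty list if the expression did not match.
sub parse_entry {
    my ($line) = @_;
    return $line =~ /^(\d+)\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+\S+\s+\S+\s+(\d{3})\s+(.*)$/;
}

# Field names are assumptions based on the sample line.
my ($bytes, $request, $date, $agent, $referer, $status, $ip) = parse_entry($entry);

if (defined $bytes) {
    print "bytes: $bytes\n";
} else {
    # Report the failure instead of printing an undefined value.
    warn "entry did not match: $entry\n";
}
```

With the sample entry this prints "bytes: 3236". Assigning the captures straight from the match in list context, rather than reading $1 afterwards, means a line that fails to match leaves $bytes undefined in a way you can test for.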