#1
Newbie
Join Date: Sep 2005
Location: Omaha, NE
Posts: 8
Rep Power: 0
Problem with regular expression?
I'm trying to write a program that processes a web log file. Here's an example entry:

    3236 "GET /robert/./index.php?page=links HTTP/1.1" "30/Sep/2005:11:11:38 -0400" "Java/1.4.1_05" "-" - - 200 69.177.179.241

I made a regex to get the important parts, and I'm trying to print out the first captured group, the 'bytes':

    $entry =~ /\
    (\d+)\s+ #bytes
    (".*")\s+ #method.url.hvers
    (".*")\s+ #date & time
    (".*")\s+ #useragent
    (".*")\s+ #referer
    .*\s+.*\s+
    (\d+)\s+ #statuscode
    (.*) #ipaddy
    /x;
    $bytes = $1;
    print $bytes;

I get this error:

    Use of uninitialized value in print at ./prog11.pl line 54 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
#2
Expert Programmer
Join Date: Aug 2005
Location: Rotterdam, the Netherlands
Posts: 942
Rep Power: 4
What line is line 54?
#3
Newbie
Join Date: Sep 2005
Location: Omaha, NE
Posts: 8
Rep Power: 0
Sorry, forgot to post that. It's the 'print $bytes'. It works until I put the print statement in there.
#4
Professional Programmer
Join Date: Mar 2005
Location: Glasgow, Scotland
Posts: 317
Rep Power: 4
That regexp looks a little strange to me. To quote the sense of it here, but in the squished-up form I'm more used to looking at:

    (\d+)\s+(".*")\s+(".*")\s+(".*")\s+(".*")\s+.*\s+.*\s+(\d+)\s+(.*)

This regexp seems to have several faults. The first one is using (".*") to match a quoted string: what this actually matches is zero or more of anything (including quotes) surrounded by quotes. You could use a non-greedy quantifier here (a ? after the *), but the more efficient method would be to change that . to a character class that, if you wanted to be generous, just excluded quotes:

    ^(\d+)\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+\S+\s+\S+\s+(\d{3})\s+(.*)$

I haven't actually tested the above expression. I've replaced (".*") with the character-class version each time it occurs, and replaced the two later .*'s (the ones you weren't capturing) with \S+ (one or more non-whitespace characters), because otherwise the .* consumes the whitespace too. I changed the \d+ for the status code to \d{3} (the {3} is a quantifier meaning exactly three of the preceding atom, which is OK here since all HTTP status codes are 3 digits long). Other than that, I've added start and end anchors (^ and $) to the expression to make sure it matches the whole line and fails on lines where it can't.

This may not appear to have much to do with the error about $bytes, but if the .* constructs end up eating the wrong text and the expression as a whole fails to match, none of the submatch variables will be set.

Hope this helps!
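For anyone who wants to try it, here is a minimal sketch that runs the character-class pattern against the sample entry from the first post. The names `$log_re` and `parse_entry` are made up for illustration and are not from the original program. The main point is to check whether the match succeeded before touching the captures, because `$1` is not set by a failed match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample entry copied from the first post.
my $entry = '3236 "GET /robert/./index.php?page=links HTTP/1.1" '
          . '"30/Sep/2005:11:11:38 -0400" "Java/1.4.1_05" "-" - - 200 69.177.179.241';

# The character-class pattern, laid out with the x modifier.
# $log_re is a hypothetical name chosen for this sketch.
my $log_re = qr/
    ^
    (\d+)     \s+     # bytes
    ("[^"]+") \s+     # method, URL, HTTP version
    ("[^"]+") \s+     # date and time
    ("[^"]+") \s+     # user agent
    ("[^"]+") \s+     # referer
    \S+ \s+ \S+ \s+   # two fields that are not captured
    (\d{3})   \s+     # status code
    (.*)              # IP address
    $
/x;

# parse_entry is a hypothetical helper: it returns the seven captured
# fields on success and an empty list when the line does not match.
sub parse_entry {
    my ($line) = @_;
    my @fields = $line =~ $log_re;   # list-context match returns the captures
    return @fields;
}

my @fields = parse_entry($entry);
if (@fields) {
    my ($bytes, $request, $date, $agent, $referer, $status, $ip) = @fields;
    print "bytes=$bytes status=$status ip=$ip\n";
} else {
    warn "entry did not match: $entry\n";
}
```

On the sample entry this prints `bytes=3236 status=200 ip=69.177.179.241`. One more thing that may be worth checking in the original code: the pattern as posted starts with a backslash at the end of the first line. Under the x modifier an escaped newline is matched literally, so if that backslash is really in the program (and not just a pasting artifact), the pattern would need a newline in front of the digits.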