Programming Forums
Oct 24th, 2005, 10:07 AM   #1
wingz198
Problem with regular expression?

I'm trying to write a program that processes a web log file. Here's an example entry:
3236 "GET /robert/./index.php?page=links HTTP/1.1" "30/Sep/2005:11:11:38 -0400" "Java/1.4.1_05" "-" - - 200 69.177.179.241

I made a regex to capture the important parts, and I'm trying to print the first capture, the 'bytes' field:
$entry =~ /\
                   (\d+)\s+                        #bytes
                   (".*")\s+               #method.url.hvers
                   (".*")\s+               #date & time
                   (".*")\s+               #useragent
                   (".*")\s+               #referer
                   .*\s+.*\s+
                   (\d+)\s+                #statuscode
                   (.*)                    #ipaddy
                   /x;
             $bytes = $1;
             print $bytes;

I get this warning:
Use of uninitialized value in print at ./prog11.pl line 54 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
when I try to print it out. Is there something wrong with the expression? The way I see it, it should work.
Oct 24th, 2005, 10:18 AM   #2
Polyphemus_
What line is line 54?
Oct 24th, 2005, 10:28 AM   #3
wingz198
Sorry, forgot to post that. It's the 'print $bytes' line. It works until I put the print statement in there.
Jan 24th, 2006, 5:49 PM   #4
mackenga
That regexp looks a little strange to me. Here is the gist of it, squished up the way I'm more used to reading it:

(\d+)\s+(".*")\s+(".*")\s+(".*")\s+(".*")\s+.*\s+.*\s+(\d+)\s+(.*)

This regexp seems to have several faults. The first one is (".*") to match a quoted string - what this actually matches is zero or more of anything (including quotes) surrounded by quotes. You could use a nongreedy quantifier here (? after the *), but the more efficient method would be to change that . to a character class that, if you wanted to be generous, just excluded quotes:

^(\d+)\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+("[^"]+")\s+\S+\s+\S+\s+(\d{3})\s+(.*)$

I haven't actually tested the above expression. I've replaced (".*") with the character-class version each time it occurs, and replaced the two later .*'s (the ones you weren't capturing) with \S+ (one or more nonwhitespace characters), because otherwise the .* can consume the whitespace too. I changed the \d+ for the status code to \d{3} (the {3} is a quantifier meaning exactly three of the preceding atom, which is OK here since all HTTP status codes are 3 digits long). Other than that, I've added start and end anchors (^ and $) so the expression has to match the whole line and fails on lines where it can't.
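
For anyone who wants to try it, here is a minimal sketch that runs the revised expression against the sample entry from the first post. The variable names ($log_re, $request, $when and so on) are my own labels and not from the original script, and the first two lines just contrast a greedy (".*") with the character-class version on the same entry. Treat it as a starting point, not a finished log parser.

```perl
use strict;
use warnings;

# Sample entry copied from the first post.
my $entry = '3236 "GET /robert/./index.php?page=links HTTP/1.1" '
          . '"30/Sep/2005:11:11:38 -0400" "Java/1.4.1_05" "-" - - 200 69.177.179.241';

# Contrast: a lone greedy (".*") runs from the first quote to the last one,
# while the character class stops at the first closing quote.
my ($greedy) = $entry =~ /(".*")/;
my ($class)  = $entry =~ /("[^"]+")/;
print "greedy: $greedy\n";
print "class:  $class\n";

# The revised expression, laid out with /x.
my $log_re = qr/
    ^ (\d+)      \s+    # bytes
    ("[^"]+")    \s+    # method, URL, HTTP version
    ("[^"]+")    \s+    # date and time
    ("[^"]+")    \s+    # user agent
    ("[^"]+")    \s+    # referer
    \S+ \s+ \S+  \s+    # two fields that are not captured
    (\d{3})      \s+    # status code
    (.*) $              # IP address
/x;

# In list context the match returns the captures, or an empty list on failure.
if (my ($bytes, $request, $when, $agent, $referer, $status, $ip) = $entry =~ $log_re) {
    print "bytes=$bytes status=$status ip=$ip\n";
}
else {
    print "no match\n";
}
```

Assigning the captures to named variables inside the if means they only exist when the match succeeded, which avoids using a stale or undefined $1.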

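Like I say, I haven't actually tested this, but I hope it helps. This may not appear to have much to do with the warning about $bytes, but if the whole expression doesn't match, none of the submatch variables will be set, so $1 is undefined when you copy it into $bytes. The greedy .* constructs can usually still find a match by backtracking, but they make the expression slow and can capture more than you meant, so it's worth checking that the match succeeded before you use $1.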
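
A small sketch of that last point, checking the match before touching $1. It also tries one possible cause that may be worth ruling out, assuming the backslash right after the opening slash in the first post is really in the script and not a copy-and-paste artifact: under /x, whitespace is only ignored when it is not backslashed, so a backslash followed by a newline has to match a literal newline. The shortened $entry and the variable names here are made up for illustration.

```perl
use strict;
use warnings;

# Shortened, hypothetical entry; only the leading byte count matters here.
my $entry = '3236 "GET /index.php HTTP/1.1" 200';

# Pattern that starts with a backslash-newline, as in the first post.
my $with_backslash = qr/\
    (\d+) \s+    # bytes
/x;

# Same pattern without the backslash.
my $without_backslash = qr/
    (\d+) \s+    # bytes
/x;

for my $case ( [ 'with backslash', $with_backslash ],
               [ 'without backslash', $without_backslash ] ) {
    my ($label, $re) = @$case;
    if ($entry =~ $re) {
        # $1 is only meaningful after a successful match.
        print "$label: bytes = $1\n";
    }
    else {
        print "$label: no match, so \$1 was not set\n";
    }
}
```

If the first case reports no match on your real log lines too, removing that backslash is the first thing I'd try.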

Hope this helps!
DaniWeb IT Discussion Community
All times are GMT -5.