Programming Forums
Old Apr 22nd, 2005, 1:28 PM   #1
mpn
Newbie
 
Regular expressions - extracting what's found

I have a bunch of lines of the format

(02:25:26) user: message

More precisely, in the language of regular expressions,

(\d+:\d+:\d+) .+:.*

What's the easiest way to extract the timestamp, user, and message from such a line? scanf can't quite do it.

Thanks,
- Mike Nolan
Old Apr 22nd, 2005, 1:41 PM   #2
Ooble
I eat cake for breakfast.
 
Like that, I should think. I don't know Perl, but could you tell us where scanf's falling down?
Old Apr 22nd, 2005, 1:50 PM   #3
mpn
Newbie
 
I only know how to match with that regular expression. How do I extract the three variables using it?

I'm actually using Python+scanf, but I posted here because I thought you all would be the best with regexp. Reasonable assumption? So here's where/how it's failing in Python:

>>> scanf.sscanf('(02:24:26) username: message', '(%s) %s: %s')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "scanf.py", line 341, in sscanf
    return bscanf(CharacterBufferFromIterable(inputString), formatString)
  File "scanf.py", line 362, in bscanf
    return parser(buffer)
  File "scanf.py", line 523, in __call__
    raise IncompleteCaptureError, (e, tuple(results))
scanf.IncompleteCaptureError: (<scanf.FormatError instance at 0x403f54ac>, ('02:24:26)',))
Old Apr 27th, 2005, 10:34 AM   #4
spydoor
Programmer
 
Looks like a Perl answer may not help you at this point, but since it's in the Perl forum...

my $var = "(02:25:26) user: message";
$var =~ /\((\d+):(\d+):(\d+)\) (.+):(.*)/;

You put parentheses around the parts of the expression you're trying to capture, which means the literal parentheses around the timestamp now have to be escaped.
The captures are stored in the automatic variables $1, $2, $3, ... from left to right.

so

$1 = 02
$2 = 25
$3 = 26
$4 = user
$5 = " message" (with the leading space, since nothing in the pattern consumes the space after the colon)
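
Since the original poster mentioned working in Python, here is a minimal sketch of the same capture-group idea using Python's standard re module. The name LINE_RE is just chosen for illustration, and the pattern is copied from the Perl one above rather than tuned in any way.

```python
import re

# Same pattern as the Perl example: escaped literal parentheses around the
# timestamp, and five capturing groups (hour, minute, second, user, message).
LINE_RE = re.compile(r"\((\d+):(\d+):(\d+)\) (.+):(.*)")

line = "(02:25:26) user: message"
match = LINE_RE.match(line)

if match:
    # groups() returns the captures left to right, like $1..$5 in Perl.
    hour, minute, second, user, message = match.groups()
    print(hour, minute, second, user, repr(message))
```

Note that the last group keeps the space that follows the colon, so the message may need stripping afterwards.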
Old Apr 27th, 2005, 11:24 AM   #5
Infinite Recursion
Programming Guru
 
Use AWK and select the corresponding fields for the values you want.
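
For comparison, a field-splitting approach in the same spirit can be sketched without any regular expression. This is not AWK itself, just a rough Python analogue, and it assumes every line has a space after the timestamp and that the user name contains no colon.

```python
line = "(02:25:26) user: message"

# Split off the timestamp at the first space, then the user at the first colon.
stamp, _, rest = line.partition(" ")
user, _, message = rest.partition(":")

# Drop the surrounding parentheses and break the timestamp into its fields.
timestamp = stamp.strip("()")
hour, minute, second = timestamp.split(":")

# Remove the space that follows the colon in the original line.
message = message.lstrip()

print(hour, minute, second, user, message)
```

Because partition splits only at the first separator, a colon inside the message stays in the message.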
Old May 2nd, 2005, 11:44 AM   #6
mackenga
Professional Programmer
 
If you're already in Perl, why call out to awk? Spydoor's regexp isn't quite there:

my $var = "(02:25:26) user: message";
$var =~ /^\((\d+):(\d+):(\d+)\) (.+?):(.*)$/;
my ($hour, $min, $sec, $user, $msg) = ($1, $2, $3, $4, $5);

I'm pretty sure this ought to work. The regexp was nearly there; I just put a ? after .+ to make it nongreedy (otherwise it would have eaten up the colon and the message) and put anchors on to ensure it would match the whole line.

As usual with code I post, I haven't actually tested it. Ain't I helpful?
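
A rough Python counterpart of the anchored, non-greedy version, using named groups instead of assigning $1 to $5 by hand. The names LINE_RE and parse_line are hypothetical, picked only for this sketch.

```python
import re

# Anchored pattern with a non-greedy user group, as in the Perl version above.
# Named groups take the place of the manual ($hour, $min, ...) assignment.
LINE_RE = re.compile(
    r"^\((?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)\) "
    r"(?P<user>.+?):(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of the five fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


fields = parse_line("(02:25:26) user: message")
print(fields)
print(parse_line("not a chat line"))
```

Returning None for non-matching lines leaves it to the caller to decide whether to skip or report them.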
Old May 5th, 2005, 3:37 PM   #7
spydoor
Programmer
 
The (.+): would not have eaten up the colon or anything after it unless there was another colon later in the string. In that case it would match up to the final : in the string.

So you're right that the ? should be there, but the reason was a little off.
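
The difference only shows up when the message itself contains a colon. A small Python sketch of that case, where GREEDY and LAZY are illustrative names and the second test line is made up for the demonstration:

```python
import re

# Identical patterns except for the quantifier on the user group.
GREEDY = re.compile(r"^\((\d+):(\d+):(\d+)\) (.+):(.*)$")
LAZY = re.compile(r"^\((\d+):(\d+):(\d+)\) (.+?):(.*)$")

plain = "(02:25:26) user: message"
tricky = "(02:25:26) user: see you at 10:30"  # made-up line with a second colon

# Keep only the user and message groups for comparison.
plain_greedy = GREEDY.match(plain).groups()[3:]
plain_lazy = LAZY.match(plain).groups()[3:]
tricky_greedy = GREEDY.match(tricky).groups()[3:]
tricky_lazy = LAZY.match(tricky).groups()[3:]

print(plain_greedy, plain_lazy)
print(tricky_greedy)
print(tricky_lazy)
```

With a single colon both patterns agree. With two, the greedy group runs up to the last colon and the lazy one stops at the first.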

Last edited by spydoor; May 5th, 2005 at 3:39 PM.
Old May 5th, 2005, 4:39 PM   #8
Ooble
I eat cake for breakfast.
 
Can I suggest replacing the "(.+?)" with a simple "([^:]+)"?
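
A quick Python sketch of the negated-character-class variant. The class needs a + quantifier to cover a whole user name, since a bare [^:] matches exactly one character. NEGATED is an illustrative name and the test line is invented.

```python
import re

# [^:]+ matches one or more characters that are not a colon, so the user
# group stops at the first colon without needing a non-greedy quantifier.
NEGATED = re.compile(r"^\((\d+):(\d+):(\d+)\) ([^:]+):(.*)$")

line = "(02:25:26) user: see you at 10:30"  # invented line with a second colon
match = NEGATED.match(line)

user, message = match.group(4), match.group(5)
print(user, repr(message))
```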
Old May 19th, 2005, 3:54 PM   #9
mackenga
Professional Programmer
 
Thanks for the correction, spydoor. I had my head on squint.

Ooble: yeah, written that way it would also work with an earlier regexp engine. I like nongreedy quantifiers though. I was so pleased when they added something new to get confused about to regexps.
Old May 19th, 2005, 4:39 PM   #10
Ooble
I eat cake for breakfast.
 
Using lazy quantifiers apparently slows down your regexps, so you should only use them when you have to. :p