#1

Newbie
Join Date: Apr 2005
Posts: 2
Rep Power: 0
Regular expressions - extracting what's found
I have a bunch of lines of the format

(02:25:26) user: message

More precisely, in the language of regular expressions:

(\d+:\d+:\d+) .+:.*

What's the easiest way to extract the timestamp, user, and message from such a line? scanf can't quite do it.

Thanks,
- Mike Nolan
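For the Python side of the question, here is a minimal sketch that uses the standard `re` module. It assumes two changes to the asker's pattern. First, the literal parentheses are escaped, because unescaped `(` and `)` form a group instead of matching the characters. Second, each of the three fields gets its own capture group. `parse_line` and `LINE_RE` are hypothetical names, and because `.+` is greedy, the user part runs up to the last `": "` in the line.

```python
import re

# Asker's pattern with the literal parentheses escaped and each field captured.
LINE_RE = re.compile(r"\((\d+:\d+:\d+)\) (.+): (.*)")

def parse_line(line):
    """Return (timestamp, user, message), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

print(parse_line("(02:25:26) user: message"))
# ('02:25:26', 'user', 'message')
```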

#2

I eat cake for breakfast.
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9
Like that, I should think. I don't know Perl, but could you tell us where scanf's falling down?

#3

Newbie
Join Date: Apr 2005
Posts: 2
Rep Power: 0
I only know how to match with that regular expression. How do I extract the three variables using it?

I'm actually using Python + scanf, but I posted here because I thought you'd all be the best with regexps. Reasonable assumption? Here's where and how it fails in Python:

>>> scanf.sscanf('(02:24:26) username: message', '(%s) %s: %s')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "scanf.py", line 341, in sscanf
    return bscanf(CharacterBufferFromIterable(inputString), formatString)
  File "scanf.py", line 362, in bscanf
    return parser(buffer)
  File "scanf.py", line 523, in __call__
    raise IncompleteCaptureError, (e, tuple(results))
scanf.IncompleteCaptureError: (<scanf.FormatError instance at 0x403f54ac>, ('02: 24:26)',))
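The captured value in that traceback ends with the closing parenthesis. That fits the usual C `scanf` rule that `%s` reads a run of non-whitespace characters: after the literal `(`, the first `%s` swallows the `)` as well, so the `)` in the format has nothing left to match. The Python `scanf` module is assumed to follow that rule. Below is a rough simulation of the step using plain string methods. It illustrates the C semantics and is not the module's own code.

```python
line = "(02:24:26) username: message"

# The literal "(" at the start of the format consumes the first character.
rest = line[1:]

# A C-style %s then reads every character up to the next whitespace,
# so the closing parenthesis ends up inside the captured value.
first_field = rest.split()[0]
print(first_field)  # 02:24:26)

# What follows starts with a space, not ")", so the literal ")" in the
# format '(%s) %s: %s' cannot match at this point.
remaining = rest[len(first_field):]
print(repr(remaining))  # ' username: message'
```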

#4

Programmer
Join Date: Feb 2005
Posts: 64
Rep Power: 4
Looks like a Perl answer may not help you at this point, but since it's in the Perl forum...

my $var = "(02:25:26) user: message";
$var =~ /\((\d+):(\d+):(\d+)\) (.+):(.*)/;

You put parentheses around the parts of the expression you're trying to capture. We also have to escape the literal parentheses. The captures are stored in the automatic variables $1, $2, $3, ... from left to right, so:

$1 = 02
$2 = 25
$3 = 26
$4 = user
$5 = " message" (with a leading space, because the space after the colon is not part of the pattern)
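In Python, the standard `re` module numbers groups the same way, left to right by opening parenthesis. That means `m.group(1)` through `m.group(5)` correspond to Perl's `$1` through `$5`. Here is a sketch with the same pattern, using `re.search` as the closest analog of an unanchored Perl match.

```python
import re

line = "(02:25:26) user: message"

# Same pattern as the Perl version; groups are numbered left to right,
# mirroring Perl's $1 .. $5.
m = re.search(r"\((\d+):(\d+):(\d+)\) (.+):(.*)", line)
if m:
    for n in range(1, 6):
        print(n, repr(m.group(n)))  # group 5 keeps the space after the colon
```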

#5

Programming Guru
Use AWK and select the corresponding fields for the values you want.
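The AWK suggestion depends on whitespace-separated fields. Below is a rough Python analog. It assumes user names contain no spaces, since a space would shift the fields. The line is split at most twice so the message keeps its internal spaces, and then the parentheses and the trailing colon are trimmed. The sample message text is made up.

```python
line = "(02:25:26) user: message with spaces"

# Split on runs of whitespace, at most twice, so the message stays whole.
stamp_field, user_field, message = line.split(None, 2)

timestamp = stamp_field.strip("()")  # '(02:25:26)' -> '02:25:26'
user = user_field.rstrip(":")        # 'user:' -> 'user'

print(timestamp, user, message, sep=" | ")
# 02:25:26 | user | message with spaces
```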
__________________
http://jasonpowers.net "There are a thousand hacking at the branches of evil to one who is striking at the root."

#6

Professional Programmer
Join Date: Mar 2005
Location: Glasgow, Scotland
Posts: 314
Rep Power: 4
If you're already in Perl, why call out to awk? Spydoor's regexp isn't quite there:

my $var = "(02:25:26) user: message";
$var =~ /^\((\d+):(\d+):(\d+)\) (.+?):(.*)$/;
my ($hour, $min, $sec, $user, $msg) = ($1, $2, $3, $4, $5);

I'm pretty sure this ought to work. The regexp was nearly there; I just put a ? after the .+ to make it non-greedy (otherwise it would have eaten up the colon and the message) and added anchors to ensure it matches the whole line. As usual with code I post, I haven't actually tested it. Ain't I helpful?
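Here is a Python sketch of the anchored, non-greedy version. As in the Perl code, the groups are unpacked in one assignment. `re.fullmatch` (Python 3.4+) takes the place of the `^...$` anchors. The `[log]`-prefixed line is made-up input. It shows that the anchored match rejects extra text that an unanchored `re.search` would skip past.

```python
import re

PATTERN = r"\((\d+):(\d+):(\d+)\) (.+?):(.*)"

line = "(02:25:26) user: message"
m = re.fullmatch(PATTERN, line)  # whole line must match, like ^...$
hour, minute, sec, user, msg = m.groups()
print(hour, minute, sec, user, repr(msg))

# Unanchored search skips leading text; fullmatch requires the whole line to match.
noisy = "[log] (02:25:26) user: message"
print(re.search(PATTERN, noisy) is not None)  # True
print(re.fullmatch(PATTERN, noisy) is None)   # True
```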

#7

Programmer
Join Date: Feb 2005
Posts: 64
Rep Power: 4
The (.+): would not have eaten up the colon or anything after it unless there was another colon later in the string. In that case it would match up to the final colon in the string.

So you're right that the ? should be there, but the reason was a little off.

Last edited by spydoor; May 5th, 2005 at 3:39 PM.
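Here is a small Python comparison of the two quantifiers. The line with a second colon in its message is made up. With only one colon after the timestamp, greedy `(.+)` and non-greedy `(.+?)` split in the same place. Once a second colon appears, the greedy version backs off only as far as the last colon, while the non-greedy version stops at the first.

```python
import re

GREEDY = r"\((\d+):(\d+):(\d+)\) (.+):(.*)"
LAZY = r"\((\d+):(\d+):(\d+)\) (.+?):(.*)"

# With only one colon after the timestamp, both patterns agree.
simple = "(02:25:26) user: message"
print(re.match(GREEDY, simple).group(4), re.match(LAZY, simple).group(4))  # user user

# A second colon in the message is where they differ.
line = "(02:25:26) user: note: call me"
greedy = re.match(GREEDY, line)
lazy = re.match(LAZY, line)
print(repr(greedy.group(4)), repr(greedy.group(5)))  # 'user: note' ' call me'
print(repr(lazy.group(4)), repr(lazy.group(5)))      # 'user' ' note: call me'
```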

#8

I eat cake for breakfast.
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9
Can I suggest replacing the "(.+?)" with a simple "([^:]+)"?
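Here is a quick Python check of the character-class variant, using a made-up line with a second colon. A bare `([^:])` matches exactly one character, so it needs a `+` to cover a multi-character user name. With the `+`, `([^:]+)` stops at the first colon without relying on a non-greedy quantifier.

```python
import re

line = "(02:25:26) user: note: call me"

# One non-colon character only: fails, because "user" is four characters.
single = re.match(r"\((\d+):(\d+):(\d+)\) ([^:]):(.*)", line)
print(single)  # None

# One or more non-colon characters: stops at the first colon, like (.+?).
multi = re.match(r"\((\d+):(\d+):(\d+)\) ([^:]+):(.*)", line)
print(repr(multi.group(4)), repr(multi.group(5)))  # 'user' ' note: call me'
```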

#9

Professional Programmer
Join Date: Mar 2005
Location: Glasgow, Scotland
Posts: 314
Rep Power: 4
Thanks for the correction, spydoor. I had my head on squint.

Ooble: yeah, it would work that way with an earlier regexp engine. I like non-greedy quantifiers, though. I was so pleased when they added something new to regexps to get confused about.