Programming Forums

Jan 25th, 2005, 2:43 PM   #1
Dizzutch
Professional Programmer
Join Date: Dec 2004
Location: Worcester, MA
Posts: 441
removing certain lines from a file

Hey all, I have a script that parses my weblog, but it skips some lines because they're too long or something.
I use this command to find the line numbers that were skipped:
awk '/Log file/ {print $5}' webdruid.log | cut -d ':' -f1
Now, I was planning on just loading all the lines individually and then deleting the ones whose number matches the output of the awk. But is there a better way of deleting those lines? There are about 35000 to go through, although only about 100 to delete.

TIA

Dizz
__________________
naked pictures of you | PFO F@H stats

Jan 25th, 2005, 4:36 PM   #2
Lance
Programmer
Join Date: Oct 2004
Posts: 74
Huh?

I just read this post. Then I read it again, 10 times or so. I still can't understand what you're trying to accomplish...

Examples? Pseudo-code? I'll code it if you give me the basic algorithm. Or at least what you want accomplished, and exceptions to look for. Maybe some data to test my code on.

Or do you just want help in coding this? Or did you really mean to post this in the shell section? Because your little code looks like a bash script.

Anyways, just clarify what you need help on.

Jan 25th, 2005, 5:43 PM   #3
Dizzutch
I'll keep it simple: I have a list of line numbers that I want to remove from a file. Since I can't do "remove line number X from file", I need to iterate through all the lines and drop the ones that are on the list. This way is rather slow, since the file I want to remove the lines from is about 35000 lines long. So now I'm hoping there is an easier way of accomplishing this than iterating through all 35000 lines every 5 minutes (that's how often the file needs to be purged). Hope this clears things up.
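
The single pass described here does not have to be slow. A minimal sketch, assuming the line numbers to drop have been saved one per line in a file: awk loads the list into a lookup table, then streams the log once and prints every line whose number is not in the table. The function name delete_lines and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# delete_lines NUMBERS FILE
# Print FILE to stdout, leaving out the lines whose numbers are listed
# (one per line) in NUMBERS. Both names are placeholders, not from the thread.
delete_lines() {
    awk '
        # First file: remember each listed line number as a table key.
        FILENAME == ARGV[1] { skip[$1 + 0]; next }
        # Second file: print the line unless its number is in the table.
        !(FNR in skip)
    ' "$1" "$2"
}

# Usage (file names are hypothetical):
#   awk '/Log file/ {print $5}' webdruid.log | cut -d ':' -f1 > skipped.txt
#   delete_lines skipped.txt access.log > access.log.clean
```

This is still one sequential read, but the per-line work is a single table lookup, so the number of lines to delete barely affects the cost.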
Last edited by Dizzutch; Jan 25th, 2005 at 5:49 PM.

Jan 25th, 2005, 6:35 PM   #4
Lance
So do a:

open(my $in, '<', 'file') or die "Can't open file: $!";
while (my $line = <$in>) {    # read one line at a time
    $line =~ s/thing to remove//g;
    print $line;
}
close($in);

Or you can open() the file and create a new file. Print each line from one file to the other, unless that line doesn't belong there. Then you can delete the original file and rename the new one to whatever it should be.

That's the gist of it. Unless you give me some example data to process, and what I should be removing, I can't really code anything. So you're on your own without more specifications...
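
The copy-then-rename approach can be sketched in shell as follows. This is an illustration under assumptions rather than a tested recipe for the log in question: filter_in_place is a made-up helper name, and the filter command is whatever decides which lines survive.

```shell
#!/bin/sh
# filter_in_place FILE CMD [ARGS...]
# Run CMD with FILE on stdin, write the result to a temp file in the same
# directory, and rename the temp file over FILE only if CMD succeeded.
filter_in_place() {
    file=$1
    shift
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if "$@" < "$file" > "$tmp"; then
        mv "$tmp" "$file"      # replace the original in one rename
    else
        rm -f "$tmp"           # leave the original untouched on failure
        return 1
    fi
}

# Usage (hypothetical file name): drop lines 2 and 4.
#   filter_in_place access.log sed '2d;4d'
```

Writing the temp file next to the original keeps the final mv a rename on the same filesystem. One caveat: the result is a new file with the temp file's permissions, so a process that still has the old log open will keep writing to the old one.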

Jan 25th, 2005, 8:59 PM   #5
Dizzutch
I'm not asking you to code anything. I just don't want to iterate through 35000+ lines every five minutes if I don't have to. But it doesn't seem like I have any other option.

Jan 25th, 2005, 11:32 PM   #6
Lance
Ohhh, I see! You were looking for a different algorithm/method. But unfortunately, there is none. The usual iterate-over-every-damn-line is what you must do. And even if you were to use some higher-level construct for it, what do you expect it would do? Iterate over every damn line.

Sorry, but you're stuck. :/ Either way, it's going to go searching through it all. Even if something were to look for just that line, it would first have to find all the line breaks before it, and that's iteration over every character. Anyways, yeah. You're stuck.

Sorry again...

Jan 25th, 2005, 11:55 PM   #7
Dizzutch
It's all good. Maybe I can somehow pre-index the file: make another file that has an index for, say, every 500th line (the direct address/inode, if you will). Not sure if that's possible; I'll have to look further into it. But if it is possible, it will narrow the search down a lot.
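
One way to sketch this idea: record the byte offset of every Nth line in a side file, then jump to the nearest recorded offset and read forward from there. The offsets are positions counted from the start of the file, so they do not depend on the filesystem or the inode. The helper names build_index and get_line are hypothetical, and the sketch assumes Unix newlines and a file that is only appended to after indexing.

```shell
#!/bin/sh
# build_index FILE STEP
# Print "line_number byte_offset" for lines 1, STEP+1, 2*STEP+1, ...
build_index() {
    # LC_ALL=C makes length() count bytes rather than characters.
    LC_ALL=C awk -v step="$2" '
        BEGIN { offset = 0 }
        (NR - 1) % step == 0 { print NR, offset }
        { offset += length($0) + 1 }    # +1 for the newline
    ' "$1"
}

# get_line FILE INDEX N
# Print line N of FILE, starting the scan at the nearest indexed line.
get_line() {
    file=$1 index=$2 n=$3
    # Pick the last index entry whose line number is <= N.
    set -- $(awk -v n="$n" '$1 <= n { l = $1; o = $2 } END { print l, o }' "$index")
    # tail -c +K starts output at byte K (1-based), hence offset + 1.
    tail -c "+$(( $2 + 1 ))" "$file" | sed -n "$(( n - $1 + 1 )){p;q;}"
}

# Usage (hypothetical file names):
#   build_index access.log 500 > access.idx
#   get_line access.log access.idx 12345
```

An index like this only speeds up finding a line. Removing bytes from the middle of a file still means rewriting everything after the first deleted line.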

Jan 26th, 2005, 9:20 AM   #8
Infinite Recursion
Programming Guru
Join Date: Jul 2004
Location: United States
Posts: 3,508
I think C has a file seek function that will allow you to go to specific locations within the file. Using the line numbers, you can calculate the offset of each line based on the size of each entry (which you can generally base on the max size of each data type within the entry)... so you could go exactly to each location in the file and hammer that line. There is probably a bit more to it than this, but this is the gist of how it works. I realize you wanted this in Perl, but I'm not sure of a way this can be done in Perl (aside from calling a C program).
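
The offset arithmetic only works when every entry occupies the same number of bytes, which is usually not true of a web log unless the lines are padded. Under that assumption, a seek-based sketch in shell can use dd, whose skip and seek operands address a record by block number and typically position a regular file with a seek rather than by reading up to it. RECLEN and the helper names are illustrative. (Perl also has built-in seek and tell functions, so a C program is not strictly required.)

```shell
#!/bin/sh
# Assumption: every record is exactly RECLEN bytes, newline included.
RECLEN=8

# read_record FILE N: print record N (1-based) from its computed offset,
# which is (N - 1) * RECLEN bytes from the start of the file.
read_record() {
    dd if="$1" bs="$RECLEN" skip="$(( $2 - 1 ))" count=1 2>/dev/null
}

# blank_record FILE N: overwrite record N in place with spaces, keeping
# its newline. conv=notrunc stops dd from truncating the rest of the file.
blank_record() {
    printf "%$(( RECLEN - 1 ))s\n" '' |
        dd of="$1" bs="$RECLEN" seek="$(( $2 - 1 ))" count=1 conv=notrunc 2>/dev/null
}
```

Note that this overwrites a record in place; it cannot shrink the file, so a later reader has to treat blank records as deleted.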
__________________
http://jasonpowers.net

"There are a thousand hacking at the branches of evil to one who is striking at the root."

Jan 26th, 2005, 10:12 AM   #9
Dizzutch
Yeah, I was thinking along the same lines, but I'll need to test whether the file address changes when the file gets updated. I'm running a JFS file system, which I don't know that much about, but I'll look into it. If there's a better way of doing it, with less iteration, then I'll do it in C; it's not a big deal.

Jan 26th, 2005, 10:55 AM   #10
Infinite Recursion
You could also use egrep...

For instance, your lines are numbered and you want to delete line 444.

egrep -v ^444 infile > outfile

Although you may have to read line by line, which is no better than what you are already doing.
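
Two hedged variations on this, as a sketch. If the lines are not actually numbered, sed can delete by position. If they are, the pattern needs a boundary after the number, otherwise ^444 also matches lines starting with 4440 or 4445. The function names are made up, and the second one assumes the number sits at the very start of the line and is followed by a non-digit or nothing.

```shell
#!/bin/sh
# delete_by_position FILE N: print FILE without its Nth line.
delete_by_position() {
    sed "${2}d" "$1"
}

# delete_by_prefix FILE N: print FILE without lines that start with the
# number N followed by a non-digit or the end of the line.
delete_by_prefix() {
    grep -v -E "^${2}([^0-9]|\$)" "$1"
}

# Usage (infile and outfile as in the post above):
#   delete_by_position infile 444 > outfile
#   delete_by_prefix infile 444 > outfile
```

Both still read the whole file once; they only save writing the loop by hand.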

If the file is only updated / appended to at the end, the fseek function would be your best bet. Even if it is updated in the middle, this wouldn't really matter as long as every entry is padded to the max possible size of its data types, since the seek is based on that fixed size.

If you can find a way to interact with vi, you can just do a '444G' followed by a 'dd' to delete line 444 without going line by line.
DaniWeb IT Discussion Community
All times are GMT -5.
Copyright ©2007 DaniWeb® LLC