SCRIPT TO TRACE 100MB FILE (One million records)
Hi,
I have to sort and search for a particular pattern in a 100MB file (1 million records). Can you please provide an efficient solution to this? Thanks in advance. Regards, Pavan
The file contains only mobile numbers. I need to search the file and find out whether a particular number exists or not. The script is already written. The problem I am facing is that the awk script sends CPU usage shooting up. The OS is Linux.
If the file is sorted, the search should be relatively efficient. I suppose a germane question is: how often is the file modified and how extensive are the modifications? Problems such as this need to be addressed in terms of their specifics whereas general solutions can be applied to less recalcitrant things.
Yes. First I sort the file and then apply binary search logic. The file content is fixed: it is exactly 100MB in size and contains only mobile numbers, one per line.
The problem is that once I run the script, CPU usage shoots up, although the script itself works fine.
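For reference, the sort-then-binary-search approach described here can be sketched in a few lines of Python. This is only an illustration, not the awk script from the post (which is not shown); the function names `load_sorted` and `contains` are hypothetical, and the sketch assumes every number has the same number of digits so that string order matches numeric order.

```python
import bisect


def load_sorted(lines):
    """Strip each line, drop blank ones, and sort the numbers once."""
    numbers = [line.strip() for line in lines if line.strip()]
    numbers.sort()
    return numbers


def contains(sorted_numbers, target):
    """Binary search: about log2(n) comparisons per lookup."""
    i = bisect.bisect_left(sorted_numbers, target)
    return i < len(sorted_numbers) and sorted_numbers[i] == target
```

Since the file content is fixed, the sort only needs to happen once: something like `numbers = load_sorted(open(path))` pays the sorting cost up front, and each later `contains(numbers, "9123456780")` call is cheap. Re-sorting on every lookup would be one plausible source of the CPU load.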
If the file is modified rarely or in a limited manner, I would think an insertion procedure could maintain the sort, as one would do with a linked list. If it's modified grossly, then a heap sort might be in order. There are other methods that require more storage and an initial organization, but are amenable to more effective operations subsequently. A database is a good example of a sophisticated and complex process that yields ultimate gains. My initial impression is that you should deep-six the general-purpose utility approach and go for an application dedicated to the problem. One hundred megs is not a breathtaking amount these days, but one needs to realize the problems that can arise when an improper amount of cache leads to thrashing (undue page swapping) and other undesirable thangys. The fact that each item is on one line is immaterial: a "line" is just the result of another delimiter. It's possible that a mere modification of your approach is in order. Consider a silly extreme: you perform a bubble sort on 10 million items when the data has been modified only slightly or not at all, then you perform a binary search. If something like this is an invariant, then there is much room for improvement. Again, specific suggestions can't be made in the absence of specific information regarding the process as well as the mere organization of the data.
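As a rough illustration of the insertion idea for rarely modified data, here is a minimal Python sketch that keeps an in-memory list sorted as new numbers arrive, instead of re-sorting everything. The function name `insert_sorted` is hypothetical, and a plain Python list stands in for the linked list mentioned above.

```python
import bisect


def insert_sorted(sorted_numbers, new_number):
    """Insert new_number at its sorted position unless already present.

    Returns True if the number was inserted, False if it was a duplicate.
    """
    i = bisect.bisect_left(sorted_numbers, new_number)
    if i < len(sorted_numbers) and sorted_numbers[i] == new_number:
        return False
    sorted_numbers.insert(i, new_number)
    return True
```

Finding the position takes about log2(n) comparisons; the list insert still shifts elements, but for occasional changes that is far cheaper than sorting a million records again.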
You can use grep "123-333-1234" <filename>, but it is slow.
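To see why a grep-style scan is slow on a large file, consider this minimal Python sketch of a linear search. It is not what grep does internally, and unlike grep it matches whole lines rather than a pattern anywhere in the line; `linear_find` is a hypothetical name.

```python
def linear_find(lines, target):
    """Scan line by line and stop at the first exact match."""
    for line in lines:
        if line.strip() == target:
            return True
    return False
```

When the number is absent, every one of the million records has to be read and compared, which is the cost a sorted file and a binary search avoid.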
We have a C routine that can do a find in about 20 I/O operations or less, searching a fixed-record-length file. It is a kind of binary search on a file. This is a simplified version; I tested it on a file with 8 million records. Test results:
kcsdev:/home/jmcnama> time ffind 000000000 8000000 /tmp/test
code:
/*****************************************
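The C listing is cut off in this thread, so the following is not the ffind routine itself. It is a minimal Python sketch of the same general idea, a binary search that seeks directly into a sorted file of fixed-length records. The function name `find_record` and its parameters are assumptions, and the sketch assumes ASCII records of equal length, each ending in a newline.

```python
import os


def find_record(path, target, record_len):
    """Binary search over a sorted file of fixed-length records.

    record_len counts every byte of a record, including the newline.
    Returns the zero-based record index, or -1 if target is absent.
    """
    key = target.encode("ascii")
    count = os.path.getsize(path) // record_len
    lo, hi = 0, count - 1
    with open(path, "rb") as fh:
        while lo <= hi:
            mid = (lo + hi) // 2
            fh.seek(mid * record_len)  # jump straight to record `mid`
            rec = fh.read(record_len).rstrip(b"\r\n")
            if rec == key:
                return mid
            if rec < key:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```

Each loop iteration costs one seek and one small read, so a lookup needs roughly log2(n) reads (about 20 for one million records) and never loads the whole file into memory.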
Yes, Jim. Thanks for that.