Let's try efficiency - a coding challenge
Suppose that we have a really big file of numbers, say 20 MB of numbers in a single column. It has duplicates. Let's call this file bigfile. You are given another file, which is much smaller, say 20000 lines. We'll call it smallfile. It is also a list of numbers, with no duplicates.

Requirements Statement: Create a new file based on the data in bigfile.
1. The new file will contain no lines found in smallfile.
2. The new file will have no duplicates.

This code snippet, while it works, will take a large number of operations (file I/Os and searches):
# -v: drop matching lines, -x: match whole lines only, -F: fixed strings, -f: read the patterns from smallfile
grep -v -x -F -f smallfile bigfile | sort -u > newfile

But we're in another forum... and we're trying something. So... based on the name of the forum (sed & awk, in case you forgot), how would you create a nice, efficient chunk of code that meets the Requirements Statement above? Efficient means the fewest passes through bigfile, say two or three at most. Forget grep and sort. In other words, you have to create a result set from bigfile "minus" smallfile that is a unique list. Go for it. And think associative arrays.
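One possible answer, sketched under assumptions rather than offered as the official solution: awk reads smallfile once and stores each line as a key of an associative array, then reads bigfile once and prints each line that is neither a key of that array nor already printed. That is a single pass through each file, with no grep and no sort. The helper name minus_unique is hypothetical. Lines are compared as exact text (like grep -x -F), so "07" and "7" count as different numbers, and the output keeps bigfile's first-occurrence order instead of the sorted order that sort -u would produce.

```shell
#!/bin/sh
# Sketch: print the lines of a big file that are not in a small file,
# each distinct line once, using awk associative arrays.
# One pass over each file; no grep, no sort.
# minus_unique is a hypothetical helper name, not an existing command.
minus_unique() {
    # $1 = small file (lines to exclude), $2 = big file (lines to keep)
    awk '
        # Pass 1 (first file only): remember every excluded line as an array key.
        FILENAME == ARGV[1] { skip[$0] = 1; next }
        # Pass 2: default action (print) fires when the line is not excluded
        # and has not been seen before; seen[$0]++ marks it as seen.
        !($0 in skip) && !seen[$0]++
    ' "$1" "$2"
}

# Usage with the file names from the challenge:
#   minus_unique smallfile bigfile > newfile
```

The test FILENAME == ARGV[1] is used instead of the common NR == FNR idiom so that an empty smallfile does not cause every line of bigfile to be loaded into the exclusion array. Memory use grows with the number of distinct lines kept, since each one becomes an array key.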
Copyright ©2007 DaniWeb® LLC