Aug 23rd, 2006, 4:43 PM
jim mcnamara
Hobbyist Programmer
 
Join Date: Jun 2005
Location: New Mexico
Posts: 228
Let's try efficiency - a coding challenge

Suppose we have a really big file of numbers, say 20 MB, one number per
line. It contains duplicates. Let's call this file bigfile.

You are given another, much smaller file, say 20000 lines. We'll call it
smallfile. It is also a list of numbers, one per line, with no duplicates.

Requirements Statement:
Create a new file based on the data in bigfile.
1. The new file will contain no lines found in smallfile.
2. The new file will have no duplicates.

This snippet works, but it is expensive: it reads and rewrites all of bigfile once for every line of smallfile (about 20000 full passes), then sorts it once more:

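while IFS= read -r number
do
	# -x: match whole lines only, -F: fixed string, so "12" does not also delete "123"
	grep -v -x -F -- "$number" bigfile > newfile
	mv newfile bigfile
done < smallfile
sort -u bigfile > newfile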

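grep -v -x -F -f smallfile bigfile | sort -u > newfile
is one possibility, if your grep doesn't choke on more than 2048000 bytes in the -f file (the XOPEN limit). Note that -f takes the pattern file as its argument, so it must be written -f smallfile, and -x -F keep a number like 1 from also removing 12.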
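If you want a known-good result to check an awk answer against, comm(1) can compute the same set difference from two sorted inputs. This uses sort, so it breaks the rules of the challenge; it is only a baseline. A minimal sketch, assuming one number per line and a system with mktemp; the helper name reference_minus is hypothetical:

```shell
#!/bin/sh
# Baseline only: sorted, de-duplicated bigfile "minus" smallfile via comm(1).
# It uses sort, which the challenge rules out; use it just to verify answers.
# reference_minus is a hypothetical helper name, not a standard command.
reference_minus() {
    # $1 = smallfile (values to exclude), $2 = bigfile (values to keep)
    small_sorted=$(mktemp) || return 1
    # Sort both inputs in the C locale so comm sees one consistent collation.
    LC_ALL=C sort -u "$1" > "$small_sorted"
    # comm -23 suppresses lines only in file 2 and lines common to both,
    # leaving the lines that appear only in file 1 (here, stdin).
    LC_ALL=C sort -u "$2" | LC_ALL=C comm -23 - "$small_sorted"
    rm -f "$small_sorted"
}
```

Usage: reference_minus smallfile bigfile > newfile. The output comes back sorted as text (so 12 sorts before 3), which is fine for a diff against another answer that has been sorted the same way.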

But this is a different forum, so let's try something else.

So, given the name of the forum (sed & awk, in case you forgot), how
would you write a nice, efficient chunk of code that meets the Requirements
Statement above? Efficient means the fewest passes through bigfile,
say two or three at most. Forget grep and sort.

In other words, you have to produce the set difference bigfile "minus"
smallfile, as a list of unique values.


Go for it. And think associative arrays.
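One possible answer along those lines, sketched under stated assumptions: awk reads smallfile once into an associative array, then reads bigfile once and prints each number the first time it appears, unless it was already in smallfile. That is a single pass over each file. It assumes one number per line and that equal numbers are written identically, since lines are compared as strings ("7" and "07" count as different). The function name minus_unique is hypothetical:

```shell
#!/bin/sh
# Sketch: bigfile "minus" smallfile with duplicates removed, one pass per file.
# Assumes one number per line, compared as strings ("7" and "07" differ).
# minus_unique is a hypothetical helper name, not a standard command.
minus_unique() {
    # $1 = smallfile (values to exclude), $2 = bigfile (values to keep)
    awk '
        # First file: mark every value to exclude as already seen.
        # Testing FILENAME instead of NR==FNR keeps this correct even
        # when smallfile is empty.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # bigfile: seen[$0]++ yields the old count (0 for a new value),
        # so a line prints only on its first appearance and only if it
        # was not marked from smallfile.
        !seen[$0]++
    ' "$1" "$2"
}
```

Usage: minus_unique smallfile bigfile > newfile. The output keeps bigfile's first-occurrence order rather than sorted order, and the array holds every distinct value from both files, which for 20 MB of numbers should fit comfortably in memory on most machines.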
DaniWeb IT Discussion Community
Copyright ©2007 DaniWeb® LLC