Hobbyist Programmer
Join Date: Jun 2005
Location: New Mexico
Posts: 228
Rep Power: 4
Let's try efficiency - a coding challenge
Suppose that we have a really big file of numbers, say 20 MB, in a single column. It has duplicates. Let's call this file bigfile.

You are given another file, which is much smaller, say 20000 lines. We'll call it smallfile. It is also a list of numbers, with no duplicates.

Requirements Statement: Create a new file based on the data in bigfile.
1. The new file will contain no lines found in smallfile.
2. The new file will have no duplicates.

This code snippet, while it works, takes a large number of operations (file I/Os and searches), because it rereads and rewrites bigfile once for every line in smallfile:

    while read number
    do
        # -x matches whole lines only, so 12 does not also remove 123
        grep -vx "$number" bigfile > newfile
        mv newfile bigfile
    done < smallfile
    sort -u bigfile > newfile

A single pipeline does the same job:

    # -f takes the pattern file as its argument; -x and -F match whole lines literally
    grep -vxFf smallfile bigfile | sort -u > newfile

But we're in another forum... and we're trying something. So, based on the name of the forum (sed & awk, in case you forgot), how would you create a nice, efficient chunk of code that meets the Requirements Statement above?

Efficient means the least number of passes through bigfile. Say 2-3, maybe. Forget grep & sort. In other words, you have to create a result set from bigfile "minus" smallfile that is a unique list.

Go for it. And think associative arrays.
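For anyone who wants a starting point, here is a minimal sketch of one associative-array approach, not the only answer. It reads smallfile once and bigfile once. The sample data, the temporary directory, and the array names skip and seen are made up for illustration. It assumes smallfile is not empty (otherwise the NR == FNR test would treat bigfile as the first file) and that the unique lines of bigfile fit in memory.

```shell
#!/bin/sh
# Hypothetical sample data so the sketch runs on its own.
workdir=$(mktemp -d)
printf '%s\n' 5 3 5 9 3 7 12 > "$workdir/bigfile"
printf '%s\n' 3 12 > "$workdir/smallfile"

# NR == FNR is true only while awk reads the first file (smallfile):
# store each of its lines as a key in the skip array, then move on.
# For bigfile, print a line only if it is not a key in skip and
# has not been printed before (seen[$0]++ is 0 the first time).
awk 'NR == FNR { skip[$0]; next }
     !($0 in skip) && !seen[$0]++' \
    "$workdir/smallfile" "$workdir/bigfile" > "$workdir/newfile"

cat "$workdir/newfile"
```

Unlike sort -u, this keeps the surviving lines in the order they first appear in bigfile rather than sorting them. If sorted output matters, that would be an extra step.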