Programming Forums
Nov 9th, 2004, 9:05 AM   #1
satimis
Newbie
 
Join Date: Oct 2004
Posts: 17
Rep Power: 0 satimis is on a distinguished road
Hi folks,

I want to write a script that checks for inconsistencies between 2 documents, say doc_a and doc_b, but I have no idea how to start.

doc_b is reproduced from doc_a (the original document) by retyping, not with a 'copy and paste' command.

To keep it simple at first, take a one-line document, as in the following example:

1)
Original document "doc_a"
Check this link to sea what scannars are supported by SANE
It already has 2 typing mistakes:
sea
scannars

2)
The reproduced document "doc_b" must keep these 2 mistakes for consistency.
check thes link to sea what scannars are suppurted by SeNE
Unfortunately, another 3 typing mistakes were made:
thes
suppurted
SeNE

What I expect to see in the printout is:
Original   Mistake    Line No.  Word No.
this       thes       1         2
supported  suppurted  1         9
SANE       SeNE       1         11
not just a printout of their contents saying "differ".

Kindly advise how to start. TIA

B.R.
satimis
Nov 9th, 2004, 11:54 AM   #2
kurifu
Expert Programmer
 
Join Date: Jul 2004
Location: Halifax, Nova Scotia (Canada)
Posts: 784
Rep Power: 5 kurifu is on a distinguished road
Look up the source code for the GNU diff program. It is really difficult to tell you how to start, because even for a seemingly simple program like this a lot of thought and design still needs to go into the algorithm if you want good functionality.

You could just do word checking: store a sequence of words from each document and compare them. The only problem is that if one document inserts a word and the other does not, the remainder of the document will be out of sync, while a real diff program would understand that only a small segment of the document went out of sync... and this is only one example.

Checking the GNU diff source code is honestly probably the best idea if you need to figure out an algorithm. You could also just pipe and reformat the output from diff too.
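
To illustrate the out-of-sync problem described above, here is a minimal sketch. It is written in Python rather than Perl, and it uses Python's standard `difflib` module only as a stand-in for a diff-style alignment; it is not GNU diff's actual algorithm. The function names and the extra word `new` in the retyped sentence are made up for the example.

```python
from difflib import SequenceMatcher


def positional_mismatches(a_words, b_words):
    """Naive check: compare the words found at the same index in both lists."""
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(a_words, b_words))
            if a != b]


def aligned_changes(a_words, b_words):
    """Diff-style check: align the two word lists, then keep only the changed runs."""
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [(tag, a_words[i1:i2], b_words[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


original = "Check this link to sea what scannars are supported by SANE".split()
# Hypothetical copy with one extra word ("new") typed in after "this".
retyped = "Check this new link to sea what scannars are supported by SANE".split()

naive = positional_mismatches(original, retyped)
aligned = aligned_changes(original, retyped)

print(len(naive), "positional mismatches")  # every word after the insertion is flagged
print(aligned)                              # only the inserted word is reported
```

With the naive check, a single inserted word makes every later position look wrong, whereas the aligned check reports one small change. That is the behaviour a real diff program gives you.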
__________________
Clifford Matthew Roche <geek@cliffordroche.com>
Web Hosting: http://www.crd-hosting.com
Consulting: http://www.crdev-consulting.com
Nov 9th, 2004, 1:45 PM   #3
monkey8
Newbie
 
Join Date: Nov 2004
Posts: 12
Rep Power: 0 monkey8 is on a distinguished road
Quote:
Originally posted by kurifu@Nov 9 2004, 05:54 PM
You could just do word checking: store a sequence of words from each document and compare them. The only problem is that if one document inserts a word and the other does not, the remainder of the document will be out of sync, while a real diff program would understand that only a small segment of the document went out of sync... and this is only one example.
To add to kurifu's suggestion:

You could open both files at the same time.

For each line, and for each word in it, highlight the word if it's different. This would keep one inserted word from throwing off the rest of the document, but it would still break if they added a new line.
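
The loop described above might look roughly like this. It is a sketch in Python rather than Perl, and `compare_in_lockstep` is a made-up name. The demo writes the one-line example from the first post to temporary files; the first word is capitalised the same way in both copies so that only the three listed mistakes show up. A word inserted or dropped inside a line would still shift the rest of that line.

```python
import os
import tempfile


def compare_in_lockstep(path_a, path_b):
    """Yield (original, mistake, line_no, word_no) for words that differ at the same position."""
    with open(path_a) as file_a, open(path_b) as file_b:
        # zip() walks both files line by line; it stops at the end of the shorter file.
        for line_no, (line_a, line_b) in enumerate(zip(file_a, file_b), start=1):
            pairs = zip(line_a.split(), line_b.split())
            for word_no, (word_a, word_b) in enumerate(pairs, start=1):
                if word_a != word_b:
                    yield word_a, word_b, line_no, word_no


# Demo files (hypothetical paths inside a temporary directory).
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, "doc_a.txt")
path_b = os.path.join(tmp_dir, "doc_b.txt")
with open(path_a, "w") as f:
    f.write("Check this link to sea what scannars are supported by SANE\n")
with open(path_b, "w") as f:
    f.write("Check thes link to sea what scannars are suppurted by SeNE\n")

rows = list(compare_in_lockstep(path_a, path_b))
print(f"{'Original':<10} {'Mistake':<10} {'Line No.':<9} {'Word No.'}")
for original, mistake, line_no, word_no in rows:
    print(f"{original:<10} {mistake:<10} {line_no:<9} {word_no}")
```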

If you just want a program that does this, this one looks pretty good: http://meld.sourceforge.net/
Nov 11th, 2004, 9:16 PM   #4
satimis
Hi kurifu,

Thanks for your advice. I'm a newbie in Perl. My example is only for preparing an experimental script.

Quote:
Look up the source code for the GNU diff program...
Did you mean Algorithm/Diff or diff in Bash? Please advise how to check the source code.

Quote:
You could just do word checking: store a sequence of words from each document and compare them. The only problem is that if one document inserts a word and the other does not, the remainder of the document will be out of sync, while a real diff program would understand that only a small segment of the document went out of sync... and this is only one example.
Can you please explain in more detail? Any hint on how to start a script that finds given input words for checking, similar to the "Find" function in a GUI?

Quote:
Checking the GNU diff source code is honestly probably the best idea if you need to figure out an algorithm. You could also just pipe and reformat the output from diff too.
Please explain in more detail.

What worries me about line checking, i.e. cutting the document into lines, is that if 1 or 2 words on a line are left out while typing, the words on the following line will be pushed up to fill the space. In that case there will be a mess.
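
One way around this worry is to compare the documents as one long stream of words while remembering which line and word each came from in doc_a. The sketch below assumes that approach. It is in Python rather than Perl, it uses the standard `difflib` module for the alignment, and the function names and the two-line sample texts are made up for the example.

```python
from difflib import SequenceMatcher


def words_with_positions(text):
    """Return (word, line_no, word_no) for every word, numbering from 1."""
    return [(word, line_no, word_no)
            for line_no, line in enumerate(text.splitlines(), start=1)
            for word_no, word in enumerate(line.split(), start=1)]


def word_differences(text_a, text_b):
    """Align the two word streams, ignoring where the line breaks fall."""
    tokens_a = words_with_positions(text_a)
    tokens_b = words_with_positions(text_b)
    matcher = SequenceMatcher(None,
                              [t[0] for t in tokens_a],
                              [t[0] for t in tokens_b],
                              autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        original = " ".join(t[0] for t in tokens_a[i1:i2])
        mistake = " ".join(t[0] for t in tokens_b[j1:j2])
        # Position in doc_a where the change starts; None if it is past the last word.
        line_no, word_no = tokens_a[i1][1:] if i1 < len(tokens_a) else (None, None)
        rows.append((original, mistake, line_no, word_no))
    return rows


# Hypothetical two-line version of the example: in doc_b the word "to" was
# left out, so "scannars" has moved up from line 2 to line 1.
doc_a = "Check this link to sea what\nscannars are supported by SANE"
doc_b = "Check thes link sea what scannars\nare suppurted by SANE"

rows = word_differences(doc_a, doc_b)
for row in rows:
    print(row)
```

A dropped word shows up as a row with an empty "mistake" column, and the words that moved to another line are not reported at all, because line breaks are ignored during the comparison.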

Any advice? TIA

B.R.
satimis
Nov 11th, 2004, 9:25 PM   #5
satimis
Hi monkey8,

Thanks for your advice and URL.

Quote:
You could open both files at the same time.
I'll try it with the following command:

$ perl compare.sh doc_a.txt doc_b.txt

or something similar.

Quote:
For each line, and for each word in it, highlight the word if it's different. This would keep one inserted word from throwing off the rest of the document, but it would still break if they added a new line.
Any hints on how to start the script:

1) on line checking
2) on word checking

TIA

B.R.
satimis
Nov 13th, 2004, 4:42 PM   #6
monkey8
I would suggest you read these tutorials.

You can do very complex "word checking" using Perl regular expressions.

http://www.comp.leeds.ac.uk/Perl/start.html (read this one first)
http://www.perlmonks.org/?node=Tutorials
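
As a small illustration of regex-based word checking, here is a sketch in Python rather than Perl (the same pattern syntax works in Perl). The pattern, the helper names, and the punctuation added to the first sample line are assumptions made for the example. It pulls out words while ignoring punctuation, and compares them without regard to case.

```python
import re

# A "word" here is a run of letters, digits, or apostrophes (an assumption;
# adjust the pattern to whatever should count as a word in your documents).
WORD = re.compile(r"[A-Za-z0-9']+")


def words(line):
    """Extract the words from a line, skipping punctuation and extra spaces."""
    return WORD.findall(line)


def same_ignoring_case(word_a, word_b):
    """Treat 'Check' and 'check' as the same word."""
    return word_a.lower() == word_b.lower()


line_a = "Check this link, to sea what scannars are supported by SANE."
line_b = "check thes link to sea what scannars are suppurted by SeNE"

mismatches = [(a, b)
              for a, b in zip(words(line_a), words(line_b))
              if not same_ignoring_case(a, b)]
print(mismatches)
```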
Nov 17th, 2004, 5:16 AM   #7
satimis
Quote:
Originally posted by monkey8@Nov 13 2004, 10:42 PM
I would suggest you read these tutorials......
Hi monkey8,

Thanks for your advice and URLs.

B.R.
satimis