Programming Forums (http://www.programmingforums.org/forumindex.php)
-   Perl (http://www.programmingforums.org/forum21.html)
-   -   Perl Script (http://www.programmingforums.org/showthread.php?t=1079)

satimis Nov 9th, 2004 10:05 AM

Hi folks,

I want to write a script that checks for inconsistencies between 2 documents, say doc_a and doc_b, and have no idea how to start.

doc_b is retyped from doc_a (the original document), not produced with 'copy and paste'.

To keep it simple at first, take the following example of a one-line document:

1) Original document "doc_a":

Check this link to sea what scannars are supported by SANE

It already contains 2 typing mistakes:
sea
scannars

2) The reproduced document "doc_b", which must keep these 2 mistakes for consistency:

check thes link to sea what scannars are suppurted by SeNE

Unfortunately, 3 more typing mistakes were introduced:
thes
suppurted
SeNE

What I expect to have in the printout is:

Original   Mistake    Line No.  Word No.
this       thes       1         2
supported  suppurted  1         9
SANE       SeNE       1         11

and not just a printout of their contents with the message "differ".

Kindly advise how to start. TIA

B.R.
satimis

kurifu Nov 9th, 2004 12:54 PM

Look up the source code for the GNU diff program. It is really difficult to tell you how to start because, for a seemingly simple program like this, a lot of thought and design still needs to go into the algorithm if you want good functionality.

You could just do word checking: store the sequence of words from each document and compare them. The only problem is that if one document inserts a word and the other does not, the remainder of the document will be out of sync, whereas a real diff program would understand that only a small segment of the document went out of sync... and this is only one example.

Checking the GNU diff source code is honestly probably the best idea if you need to figure out an algorithm. You could also just pipe the output from diff into your script and reformat it.
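
To make the "out of sync" point concrete, here is a minimal sketch in Python rather than Perl. It uses the standard library's difflib.SequenceMatcher as a stand-in for a real diff; this is not GNU diff's algorithm, and the function name word_diffs and the row layout are invented for the example.

```python
from difflib import SequenceMatcher


def word_diffs(original, copy):
    """Return (original_words, copied_words, word_no) rows for each difference.

    word_no is the 1-based position in the original text.
    Hypothetical helper, for illustration only.
    """
    a, b = original.split(), copy.split()
    rows = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = a[i1:i2], b[j1:j2]
        if len(old) == len(new):
            # Same number of words on both sides: pair them up one by one.
            for k, (o, n) in enumerate(zip(old, new)):
                rows.append((o, n, i1 + k + 1))
        else:
            # Inserted, deleted, or unevenly replaced run: report it as a whole.
            rows.append((" ".join(old), " ".join(new), i1 + 1))
    return rows


doc_a = "Check this link to sea what scannars are supported by SANE"
doc_b = "check thes link to sea what scannars are suppurted by SeNE"

for row in word_diffs(doc_a, doc_b):
    print(row)
```

An inserted word only produces one row here, because the matcher realigns on the words that follow it. Note that an exact comparison also reports Check/check at word 1; whether case differences should count is a decision for the script's author.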

monkey8 Nov 9th, 2004 2:45 PM

Quote:

Originally posted by kurifu@Nov 9 2004, 05:54 PM
You could just do word checking: store the sequence of words from each document and compare them. The only problem is that if one document inserts a word and the other does not, the remainder of the document will be out of sync, whereas a real diff program would understand that only a small segment of the document went out of sync... and this is only one example.
To append to kurifu's suggestion:

You could open both files at the same time.

Then, for each line and each word, if it's different, highlight it. This keeps the "one inserted word" problem from breaking the rest of the document, since the comparison starts fresh on every line. But it would still break if a new line was added.
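
A rough sketch of that line-by-line, word-by-word loop, written in Python rather than Perl. The function name compare_by_position and the sample texts are made up for illustration; the same structure should carry over to two nested foreach loops in Perl.

```python
from itertools import zip_longest


def compare_by_position(text_a, text_b):
    """Compare two texts line by line, then word by word within each line.

    Returns (original, mistake, line_no, word_no) rows, numbered from 1.
    Hypothetical helper, for illustration only.
    """
    rows = []
    line_pairs = zip_longest(text_a.splitlines(), text_b.splitlines(), fillvalue="")
    for line_no, (line_a, line_b) in enumerate(line_pairs, start=1):
        # A missing word on the shorter side is represented by an empty string.
        word_pairs = zip_longest(line_a.split(), line_b.split(), fillvalue="")
        for word_no, (word_a, word_b) in enumerate(word_pairs, start=1):
            if word_a != word_b:
                rows.append((word_a, word_b, line_no, word_no))
    return rows


text_a = "one two three\nfour five six\nseven eight"
text_b = "one extra two three\nfour five six\nseven eight"

for row in compare_by_position(text_a, text_b):
    print(row)
```

With these sample texts the inserted word "extra" throws off every later word on line 1, but lines 2 and 3 still compare cleanly. An extra line in one file would shift all the following line pairs in the same way.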

If you just want a program to do this, this one looks pretty good: http://meld.sourceforge.net/

satimis Nov 11th, 2004 10:16 PM

Hi kurifu,

Thanks for your advice. I'm a newbie in Perl. My example is only for preparing an experimental script.

Quote:

Look up the source code for the GNU diff program...
Did you mean Algorithm/Diff or the diff command in Bash? Please advise how to check the source code.

Quote:

You could just do word checking: store the sequence of words from each document and compare them. The only problem is that if one document inserts a word and the other does not, the remainder of the document will be out of sync, whereas a real diff program would understand that only a small segment of the document went out of sync... and this is only one example.
Can you please explain in more detail? Any hint on how to start a script that searches for given words, similar to "Find" in a GUI?

Quote:

Checking the GNU diff source code is honestly probably the best idea if you need to figure out an algorithm. You could also just pipe and reformat the output from diff too.
Please explain in more detail.

What worries me about line checking (cutting the document into lines) is that if 1 or 2 words on a line are left out during typing, words from the following line will be pushed up to fill the space. In that case there will be a mess.

Any advice? TIA

B.R.
satimis

satimis Nov 11th, 2004 10:25 PM

Hi monkey8,

Thanks for your advice and the URL.

Quote:

You could open both files at the same time.
I'll check it with the following command:

$ perl compare.sh doc_a.txt doc_b.txt

or something similar.

Quote:

Then, for each line and each word, if it's different, highlight it. This keeps the "one inserted word" problem from breaking the rest of the document, since the comparison starts fresh on every line. But it would still break if a new line was added.
Any hints on how to start the script?

1) on line checking
2) on word checking

TIA

B.R.
satimis

monkey8 Nov 13th, 2004 5:42 PM

I would suggest you read these tutorials.

You can do very complex "word checking" using Perl regular expressions.

http://www.comp.leeds.ac.uk/Perl/start.html (read this one first)
http://www.perlmonks.org/?node=Tutorials
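
As one possible starting point for regex-based word checking, the sketch below splits text into words and records each word's line number and word number. It uses Python's re module rather than Perl, though the character class in the pattern is also valid Perl syntax. The pattern, the tokenize name and the sample text are assumptions made for illustration.

```python
import re

# A "word" here is a run of letters, digits, or apostrophes (an assumption).
WORD = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    """Return (word, line_no, word_no) for every word, numbered from 1."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word_no, match in enumerate(WORD.finditer(line), start=1):
            tokens.append((match.group(), line_no, word_no))
    return tokens


sample = 'Check this link,\nto "sea" what scannars are supported.'

for token in tokenize(sample):
    print(token)
```

Because punctuation and quotes are dropped by the pattern, "link," and "link" come out as the same word, which avoids reporting punctuation changes as typing mistakes. The recorded positions can then be used to print the Line No. and Word No. columns.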

satimis Nov 17th, 2004 6:16 AM

Quote:

Originally posted by monkey8@Nov 13 2004, 10:42 PM
I would suggest you read these tutorials......
Hi monkey8,

Thanks for your advice and the URLs.

B.R.
satimis


All times are GMT -5.