Programming Forums


tobyhughes Jul 13th, 2007 3:24 PM

MODI and OCR
 
Hi,

I am trying to do car number plate recognition using OCR. I tried the Microsoft Office Document Imaging (MODI) library, but it will not work because it says the text is not in English. Can anyone help, either with MODI or with some other free OCR library that could do this job?

lectricpharaoh Jul 13th, 2007 11:07 PM

I don't know much about OCR, but one thing I do know is that many OCR systems try to infer 'questionable' characters from context. For example, say the scanned word is 'LlTTLE' (the second character is actually a lowercase L, not an uppercase I). By checking its dictionary, the OCR system will conclude that the character must be an I rather than some other similarly shaped character, since 'LITTLE' is a word and 'LlTTLE' is not. The implication is that if the context is completely nonsensical (as it will be for the random characters in a license plate number), the system gets confused.
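A minimal Python sketch of that dictionary lookup, assuming a tiny hypothetical confusion table and word list (neither comes from MODI or any real OCR engine). It tries every look-alike substitution and keeps the first spelling that is a real word; for a plate string no substitution produces a word, so the context step has nothing to offer.

```python
from itertools import product

# Hypothetical confusion table: characters a shape-only recogniser might mix up.
CONFUSABLE = {
    "I": "Il1", "l": "lI1", "1": "1Il",
    "O": "O0", "0": "0O",
}

# Toy word list standing in for a real OCR dictionary (an assumption).
DICTIONARY = {"LITTLE", "LATTE", "TITLE"}


def dictionary_correct(raw, dictionary=DICTIONARY):
    """Return the first look-alike spelling found in the dictionary, else raw."""
    # For each character, list the alternatives it could plausibly be.
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    # Enumerate every combination of alternatives (fine for short strings).
    for candidate in product(*options):
        word = "".join(candidate)
        if word in dictionary:
            return word
    # No dictionary word fits: context gives no help, so keep the raw reading.
    return raw


print(dictionary_correct("LlTTLE"))   # context resolves the l/I confusion
print(dictionary_correct("Kl0 4O1"))  # plate-like string: nothing to match
```

The brute-force enumeration grows exponentially with the number of confusable characters; it is only meant to show why a dictionary helps with words and not with plates.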

See if it lets you disable this functionality, and instead operate solely on the shapes of the letters. You may be able to 'teach' it how the letters and numbers look (license plates in one area will share a 'font'). You may also be able to tell it what character patterns to look for with regular expressions (say, three letters, a space, then three numbers). This approach can often resolve otherwise ambiguous characters.
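The pattern idea can be sketched the same way. This assumes the example format from the post (three letters, a space, three digits) and invented look-alike tables; real plate formats and confusable pairs vary by region and font. Each position is forced into the character class it must hold, and the result is then checked against the regular expression.

```python
import re
from typing import Optional

# Example format from the post: three letters, a space, three digits.
PLATE_RE = re.compile(r"[A-Z]{3} [0-9]{3}")
# Class required at each position: "A" = letter, "9" = digit, " " = space.
LAYOUT = "AAA 999"

# Hypothetical look-alike tables; a real system would tune these per plate font.
TO_LETTER = {"0": "O", "1": "I", "5": "S", "8": "B", "2": "Z", "l": "I"}
TO_DIGIT = {"O": "0", "o": "0", "D": "0", "I": "1", "l": "1",
            "S": "5", "B": "8", "Z": "2"}


def constrain_plate(raw: str) -> Optional[str]:
    """Force each character into the class its position requires, then validate."""
    if len(raw) != len(LAYOUT):
        return None
    fixed = []
    for ch, kind in zip(raw, LAYOUT):
        if kind == "A":
            # Letter slot: swap digit look-alikes for letters, normalise case.
            fixed.append(TO_LETTER.get(ch, ch).upper())
        elif kind == "9":
            # Digit slot: swap letter look-alikes for digits.
            fixed.append(TO_DIGIT.get(ch, ch))
        else:
            fixed.append(ch)
    result = "".join(fixed)
    # Reject anything that still does not fit the expected pattern.
    return result if PLATE_RE.fullmatch(result) else None


print(constrain_plate("A8C l2O"))  # ambiguous characters resolved by position
print(constrain_plate("XYZ 7Q9"))  # 'Q' has no digit look-alike, so rejected
```

The position tells the code which reading of an ambiguous shape is allowed, which is exactly the information a dictionary cannot supply for a plate.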

DaWei Jul 14th, 2007 1:31 AM

Back in the early to mid 1980s, OCR was based on a templating approach, as (apparently) suggested by Lectric. In such a system, 'I' and 'l' are virtually indistinguishable, and they remained so even with the newer approaches. A lowercase script 'I' is not clearly different from a lowercase script 'L' if the person producing the text writes in a "loopy" manner. You, as a human, tell them apart not by character shape but by context.
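To make the template idea concrete, here is a toy Python sketch of template matching: each glyph is a 5x7 bitmap, and the score is simply the fraction of pixels that agree with a stored template. The bitmaps are invented for illustration, not taken from any real OCR product. With these serif-style templates, 'I' and 'l' differ in a single pixel, so a worn, serif-less stroke lands almost equally close to both.

```python
# Tiny 5x7 bitmaps; '#' is ink. These glyph shapes are invented for illustration.
TEMPLATES = {
    "I": [".###.", "..#..", "..#..", "..#..", "..#..", "..#..", ".###."],
    "l": [".##..", "..#..", "..#..", "..#..", "..#..", "..#..", ".###."],
    "O": [".###.", "#...#", "#...#", "#...#", "#...#", "#...#", ".###."],
}


def similarity(a, b):
    """Fraction of pixels on which two equal-sized bitmaps agree."""
    agree = total = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            agree += pa == pb
            total += 1
    return agree / total


def classify(glyph):
    """Return (best_label, scores) by comparing the glyph with every template."""
    scores = {label: similarity(glyph, t) for label, t in TEMPLATES.items()}
    return max(scores, key=scores.get), scores


# A serif-less vertical stroke, as a worn plate or a poor scan might show.
plain_bar = ["..#.."] * 7
label, scores = classify(plain_bar)
print(label, {k: round(v, 3) for k, v in scores.items()})
```

Here the plain stroke scores 32/35 against 'l' and 31/35 against 'I', a one-pixel margin that any noise in the scan could flip. Shape alone cannot settle it, which is why context ends up doing the work.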

The best OCR systems take this into account, albeit sometimes poorly. The OCR software bundled with your $100 printer is about 10 times as good as what we sold to the USPS for $250,000 in those years.

If you can figure out how to solve the last few percent of ambiguous items, you won't ever have to worry about standing in the unemployment line.


All times are GMT -5.

Copyright ©2007 DaniWeb® LLC