Thread: MODI and OCR
Jul 14th, 2007, 12:31 AM   #3
DaWei
Resident Grouch
 
 
Join Date: Jun 2005
Posts: 6,453
Back in the early to mid 80s, OCR was based on a templating approach, as (apparently) suggested by Lectric. In such a system, 'I' and 'l' are virtually indistinguishable, and they remained so even in the newer approaches. A lower case script 'I' is not clearly different from a lower case script 'L' if the writer has a "loopy" hand. You, as a human, tell them apart not by character shape but by context.
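To make the templating problem concrete, here is a rough sketch in Python. It is not how any real product of that era was built; the 5x3 bitmaps, the TEMPLATES table, and the function names are all made up for illustration. It scores a scanned glyph against each stored template by counting mismatched pixels and returns every label tied for the best score.

```python
# Hypothetical 5x3 glyph bitmaps, invented for this sketch only.
# '1' is an inked pixel, '0' is background.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "l": ["010", "010", "010", "010", "010"],
    "T": ["111", "010", "010", "010", "010"],
    "O": ["111", "101", "101", "101", "111"],
}


def mismatch(glyph, template):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def best_matches(glyph):
    """Return every template label tied for the lowest mismatch score."""
    scores = {label: mismatch(glyph, tpl) for label, tpl in TEMPLATES.items()}
    best = min(scores.values())
    return sorted(label for label, score in scores.items() if score == best)


# A plain vertical bar, as a sans-serif 'I' or 'l' might scan.
scanned_bar = ["010", "010", "010", "010", "010"]
print(best_matches(scanned_bar))
```

With these toy bitmaps the vertical bar ties between 'I' and 'l', because shape alone carries nothing that separates them. Something other than the template has to break the tie.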

The best OCR systems take this into account, albeit sometimes poorly. The OCR facilities bundled with your $100 printer are about 10 times as good as what we sold to the USPS for $250,000 in those years.
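One simple way to "take context into account" is a dictionary check, sketched below in Python. This is only an assumption about what such a step could look like, not a description of any shipped system; the CONFUSABLE table, the WORDS list, and resolve() are hypothetical. It expands every confusable character into its look-alikes and keeps the spellings that turn out to be known words.

```python
from itertools import product

# Hypothetical confusion set and word list, made up for this sketch.
CONFUSABLE = {"I": "Il1", "l": "Il1", "1": "Il1"}
WORDS = {"Illinois", "list", "Island", "mail"}


def resolve(raw):
    """Return the known words that `raw` could be, given look-alike characters."""
    # Each position offers either its look-alike set or just the character itself.
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    candidates = ("".join(chars) for chars in product(*options))
    return sorted(c for c in candidates if c in WORDS)


print(resolve("lllinois"))  # three look-alike bars in a row
print(resolve("1ist"))
print(resolve("III"))       # nothing in the word list fits
```

A real system would need far more than a word list (frequencies, grammar, address databases and so on), and the candidate count grows exponentially with the number of ambiguous characters, so treat this as a toy. Inputs that match no known word, or more than one, are the leftover ambiguous cases.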

If you can figure out how to solve the last few percent of ambiguous items, you won't ever have to worry about standing in the unemployment line.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.