Back in the early to mid 1980s, OCR was based on a templating approach, as (apparently) suggested by Lectric. In such a system, 'I' and 'l' are virtually indistinguishable, and they remained so even in the newer approaches. A lowercase script 'i' is not clearly different from a lowercase script 'l' if the writer has a "loopy" hand. You, as a human, tell them apart not by character shape but by context.
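To make the templating problem concrete, here is a toy sketch. It is not how any real product worked: the 3x5 bitmaps, the pixel-agreement score, and the names TEMPLATES, match_score and classify are all made up for illustration. It assumes a sans-serif font in which capital 'I' and lowercase 'l' are both plain vertical bars.

```python
# Hypothetical 3x5 bitmap templates ('1' = ink, '0' = background).
# In this assumed sans-serif font, 'I' and 'l' are the same vertical bar.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "l": ["010", "010", "010", "010", "010"],
    "T": ["111", "010", "010", "010", "010"],
}


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def classify(glyph):
    """Return every character whose template ties for the best score."""
    scores = {ch: match_score(glyph, tpl) for ch, tpl in TEMPLATES.items()}
    best = max(scores.values())
    return sorted(ch for ch, score in scores.items() if score == best)


# A cleanly scanned vertical bar matches two templates equally well.
scanned = ["010", "010", "010", "010", "010"]
print(classify(scanned))
```

Shape alone gives a tie between 'I' and 'l', so something other than the template has to break it.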
The best OCR systems take this into account, albeit sometimes poorly. The OCR facilities bundled with your $100 printer are about 10 times as good as what we sold to the USPS for $250,000 in those years.
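One simple way to use context is a word-list lookup: try each reading of the ambiguous characters and keep the one that forms a known word. The sketch below is only an illustration under assumptions. The tiny WORDS set, the CONFUSABLE table, and the function names are invented here, and real systems use far richer language models than this.

```python
from itertools import product

# Hypothetical confusion table: each ambiguous glyph maps to its possible readings.
CONFUSABLE = {"I": "Il", "l": "Il"}

# Hypothetical stand-in for a real dictionary.
WORDS = {"Illinois", "mail", "It", "little"}


def candidates(raw):
    """Yield every string obtained by trying each reading of each ambiguous character."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    return ("".join(combo) for combo in product(*options))


def resolve(raw):
    """Return the single dictionary word among the candidates, else the raw text."""
    matches = [word for word in candidates(raw) if word in WORDS]
    return matches[0] if len(matches) == 1 else raw


print(resolve("lllinois"))
print(resolve("maiI"))
```

When no candidate, or more than one, is in the word list, resolve returns the input unchanged. Those leftovers are the hard cases: names, part numbers, and anything else a dictionary does not cover.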
If you can figure out how to solve the last few percent of ambiguous items, you won't ever have to worry about standing in the unemployment line.