#1
Newbie
Join Date: Jan 2007
Posts: 8
Rep Power: 0
MODI and OCR
Hi,
I am trying to do car number plate recognition using OCR. I tried the Microsoft Office Document Imaging (MODI) library, but it will not work: it says the text is not in English. Can anyone help, either with MODI or with some other free OCR library that could do this job?
#2
Caffeinated Neural Net
Join Date: Jun 2005
Location: Dry west coast of Canada
Posts: 1,005
Rep Power: 5
I don't know much about OCR, but one thing I do know is that many OCR systems try to infer 'questionable' characters based on context. For example, say you have 'LlTTLE' (that second character is actually a lowercase L, not an uppercase I). The OCR system will, by checking its dictionary, conclude that it's an I rather than a similarly-shaped character. The implication is that if the context is completely nonsensical (as it will be for the random characters in a license plate number), the system gets confused.
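To make the dictionary idea concrete, here is a minimal sketch of that kind of context check. It is not how MODI or any particular engine works internally; the confusion sets, the tiny word list, and the function names are all made up for illustration.

```python
from itertools import product

# Hypothetical confusion sets: glyphs an OCR engine might mix up.
CONFUSABLE = {
    "l": "lI1",
    "I": "Il1",
    "1": "1lI",
    "O": "O0",
    "0": "0O",
}

# Toy word list standing in for the engine's real dictionary.
DICTIONARY = {"LITTLE", "TITLE", "LIST"}


def candidates(raw):
    """Yield every spelling reachable by swapping confusable glyphs."""
    options = [CONFUSABLE.get(ch, ch) for ch in raw]
    for combo in product(*options):
        yield "".join(combo)


def correct(raw):
    """Return the dictionary words consistent with the raw OCR output."""
    return sorted({c for c in candidates(raw) if c in DICTIONARY})


print(correct("LlTTLE"))   # a real word: the dictionary settles the doubt
print(correct("ABl 123"))  # a plate: no dictionary entry can help
```

With a real word the dictionary picks a single spelling; with a plate-like string nothing matches, so the ambiguity is left unresolved.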
See if it lets you disable this functionality, and instead operate solely on the shapes of the letters. You may be able to 'teach' it how the letters and numbers look (license plates in one area will share a 'font'). You may also be able to tell it what character patterns to look for with regular expressions (say, three letters, a space, then three numbers). This approach can often resolve otherwise ambiguous characters.
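As a rough sketch of the pattern idea, assuming the plate layout really is three letters, a space, then three digits: the look-alike tables and the `normalize_plate` helper below are hypothetical and would need tuning for the plates and OCR engine actually in use.

```python
import re

# Assumed plate layout: three letters, a space, three digits.
PLATE = re.compile(r"^[A-Z]{3} [0-9]{3}$")

# Hypothetical look-alike tables, applied according to position.
TO_LETTER = {"0": "O", "1": "I", "5": "S", "8": "B", "2": "Z"}
TO_DIGIT = {"O": "0", "I": "1", "L": "1", "S": "5", "B": "8", "Z": "2"}


def normalize_plate(raw):
    """Coerce look-alike glyphs to the class their position requires.

    Returns the corrected plate, or None if it still does not fit.
    """
    text = raw.strip().upper()
    if len(text) != 7:
        return None
    # The first three positions must be letters, the last three digits.
    letters = "".join(TO_LETTER.get(ch, ch) for ch in text[:3])
    digits = "".join(TO_DIGIT.get(ch, ch) for ch in text[4:])
    fixed = f"{letters}{text[3]}{digits}"
    return fixed if PLATE.match(fixed) else None


print(normalize_plate("A8C l2O"))  # look-alikes resolved by position
print(normalize_plate("ABCD123"))  # wrong layout, rejected
```

The position does the work the dictionary cannot: an '8' among the letters is read as 'B', while an 'O' among the digits is read as '0'.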
__________________
And once again, Probability proves itself willing to sneak into a back alley and service Drama as would a copper-piece harlot. - Vaarsuvius, Order of the Stick
#3
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
Back in the early to mid '80s, OCR was based on a templating approach, as (apparently) suggested by Lectric. In such a system, 'I' and 'l' are virtually indistinguishable. They remained so even in the newer approaches. A lower case script 'I' is not clearly different from a lower case script 'L' if the producer of the text writes in a "loopy" manner. You, as a human, tell them apart not by character shape but by context.
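A toy illustration of why templating cannot separate those two characters, assuming a plain sans-serif face in which both are a single vertical bar. The 5x3 bitmaps and function names are invented for the example; real templates were far larger and the matching more elaborate.

```python
# Hypothetical 5x3 bitmaps ("1" = ink); real templates would be far larger.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "l": ["010", "010", "010", "010", "010"],
    "T": ["111", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
}


def score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    pairs = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def classify(glyph):
    """Return (score, label) pairs for every template, best match first."""
    return sorted(
        ((score(glyph, t), label) for label, t in TEMPLATES.items()),
        reverse=True,
    )


scanned_bar = ["010", "010", "010", "010", "010"]
for match_score, label in classify(scanned_bar):
    print(f"{label}: {match_score:.2f}")
```

The scanned bar matches both 'I' and 'l' perfectly, so shape alone gives no basis for choosing between them.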
The best OCR systems take this into account, albeit sometimes poorly. The OCR facilities bundled with your $100 printer are about 10 times as good as what we sold to the USPS for $250,000 in those years. If you can figure out how to solve the last few percent of ambiguous items, you won't ever have to worry about standing in the unemployment line.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers