Programming Forums
Jul 13th, 2007, 2:24 PM   #1
tobyhughes
Newbie
Join Date: Jan 2007
Posts: 8
MODI and OCR

Hi,

I am trying to do car number plate recognition using OCR. I tried the Microsoft Office Document Imaging (MODI) library, but it will not work because it says the text is not in English. Can anyone help, either with MODI or with some other free OCR library that could do this job?
Jul 13th, 2007, 10:07 PM   #2
lectricpharaoh
Caffeinated Neural Net
Join Date: Jun 2005
Location: Dry west coast of Canada
Posts: 925
I don't know much about OCR, but one thing I do know is that many OCR systems try to infer 'questionable' characters based on context. For example, say you have 'LlTTLE' (that second character is actually a lowercase L, not an uppercase I). By checking its dictionary, the OCR system will conclude that it's an I rather than some other similarly shaped character. The implication is that if the context is completely nonsensical (as it will be for the random characters in a license plate number), the system gets confused.
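To make the dictionary effect concrete, here is a minimal sketch, not how MODI or any particular engine actually works. It assumes a hand-made table of confusable characters (`CONFUSABLE`) and a toy word list (`DICTIONARY`); both are hypothetical. Each ambiguous character is swapped for its look-alikes, and the first spelling found in the dictionary wins.

```python
from itertools import product

# Hypothetical table of visually confusable characters. A real engine would
# derive these from its own shape-classifier confidences.
CONFUSABLE = {
    "l": "lI1",
    "I": "Il1",
    "1": "1lI",
    "O": "O0",
    "0": "0O",
}

# Toy dictionary, for illustration only.
DICTIONARY = {"LITTLE", "LIST", "OIL"}


def candidates(word):
    """Yield every spelling reachable by swapping confusable characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def dictionary_correct(word):
    """Return the first candidate found in the dictionary, else the raw word."""
    for cand in candidates(word):
        if cand in DICTIONARY:
            return cand
    return word


print(dictionary_correct("LlTTLE"))  # LITTLE: context helps for real words
print(dictionary_correct("0IL"))     # OIL: a plate's zero is wrongly "corrected"
print(dictionary_correct("7K3"))     # 7K3: no dictionary word, so no help
```

The second call shows the danger for plates: a plate reading zero-I-L gets "corrected" to a dictionary word, which is exactly the behaviour you would want to turn off.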

See if it lets you disable this functionality, and instead operate solely on the shapes of the letters. You may be able to 'teach' it how the letters and numbers look (license plates in one area will share a 'font'). You may also be able to tell it what character patterns to look for with regular expressions (say, three letters, a space, then three numbers). This approach can often resolve otherwise ambiguous characters.
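A rough sketch of the pattern idea, assuming the plate format from the example above (three letters, a space, three digits) and a hypothetical list of confusable letter/digit pairs; a real system would tune both to its region's plates and font. Each position is coerced to the class it must hold, then the result is checked against the regular expression.

```python
import re

# Assumed plate format: three letters, a space, three digits.
PLATE_RE = re.compile(r"[A-Z]{3} [0-9]{3}")

# Per-position classes for that format: 'A' = letter, '9' = digit, ' ' = literal.
LAYOUT = "AAA 999"

# Hypothetical confusable pairs (digit -> letter); tune for the actual plate font.
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B", "6": "G"}
TO_DIGIT = {letter: digit for digit, letter in TO_LETTER.items()}


def coerce_to_layout(raw):
    """Swap confusable characters so each position fits its expected class.

    Returns the corrected string if it matches PLATE_RE, otherwise None.
    """
    if len(raw) != len(LAYOUT):
        return None
    out = []
    for ch, kind in zip(raw, LAYOUT):
        if kind == "A":
            out.append(TO_LETTER.get(ch, ch))  # a digit in a letter slot
        elif kind == "9":
            out.append(TO_DIGIT.get(ch, ch))   # a letter in a digit slot
        else:
            out.append(ch)                     # literal separator
    fixed = "".join(out)
    return fixed if PLATE_RE.fullmatch(fixed) else None


print(coerce_to_layout("A8C 1Z3"))  # ABC 123
print(coerce_to_layout("A#C 123"))  # None: '#' cannot be repaired
```

The design choice is that the pattern, not a dictionary, supplies the context: an '8' in a letter slot can only sensibly be a 'B', and a 'Z' in a digit slot can only be a '2'.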
__________________
A man's knowledge is like an expanding sphere, the surface corresponding to the boundary between the known and the unknown. As the sphere grows, so does its surface; the more a man learns, the more he realizes how much he does not know. Hence, the most ignorant man thinks he knows it all. - L. Sprague de Camp
Jul 14th, 2007, 12:31 AM   #3
DaWei
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Back in the early-to-mid '80s, OCR was based on a template-matching approach, as (apparently) suggested by Lectric. In such a system, 'I' and 'l' are virtually indistinguishable, and they remained so even in the newer approaches. A lowercase script 'i' is not clearly different from a lowercase script 'l' if the writer has a "loopy" hand. You, as a human, tell them apart not by character shape but by context.
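A minimal sketch of the template-matching idea, assuming tiny made-up 5x5 bitmaps for a plain sans-serif "font" (not any real OCR product's templates). A scanned glyph is compared pixel by pixel against each template, and the lowest Hamming distance wins. Because 'I' and 'l' are drawn identically here, they always tie, so shape alone cannot decide.

```python
# Hypothetical 5x5 glyph templates. In many sans-serif fonts, capital I and
# lowercase l are drawn the same way, so their templates are identical.
TEMPLATES = {
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "l": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def hamming(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match(glyph):
    """Return (best distance, sorted labels tied at that distance)."""
    scores = {label: hamming(glyph, tpl) for label, tpl in TEMPLATES.items()}
    best = min(scores.values())
    return best, sorted(label for label, s in scores.items() if s == best)


# A scanned vertical stroke with one noisy pixel.
noisy = ["..#..",
         "..#..",
         "..##.",
         "..#..",
         "..#.."]
print(match(noisy))  # (1, ['I', 'l']): a tie only context can break
```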

The best OCR systems take this into account, albeit sometimes poorly. The OCR software bundled with your $100 printer is about 10 times as good as what we sold to the USPS for $250,000 in those years.

If you can figure out how to solve the last few percent of ambiguous items, you won't ever have to worry about standing in the unemployment line.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
DaniWeb IT Discussion Community
All times are GMT -5.

Copyright ©2007 DaniWeb® LLC