Macro Express Forums

OCR (Optical Character Recognition)



At my last job, we often needed to pull data from archival documents such as PDFs, where the text was regular, consistent, and always in the same font and size. I developed a way to read targeted text by comparing pixel colors to the background color. I was able to "read" data off these documents very quickly, which streamlined a lot of processes. It worked best with standard fonts (non-italic, non-bold). I created a character map database that lived in a submacro and compared pixel groupings in the document against it. The data could then be transferred to any form, file, or website as required. The process started out in VB but was gradually moved into ME.
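For readers who want to see the general idea in code, here is a rough sketch in Python rather than Macro Express. It is not the original macro: the 3x3 glyphs, the `CHAR_MAP` contents, the color tolerance, and the `max_mismatch` threshold are all made-up assumptions for illustration. It marks each pixel as "ink" when its color differs from the background, then looks up the resulting pixel grouping in a character map, allowing a small number of mismatched pixels.

```python
from typing import Dict, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue)
Bitmap = Tuple[str, ...]       # rows of '1' (ink) and '0' (background)

# Hypothetical character map: tiny 3x3 glyphs, for illustration only.
CHAR_MAP: Dict[str, Bitmap] = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def binarize(pixels: Sequence[Sequence[Pixel]], background: Pixel,
             tolerance: int = 30) -> Bitmap:
    """Mark a pixel as ink when its color differs from the background.

    The difference is the sum of absolute channel differences; the
    tolerance value is an assumption, not a figure from the post.
    """
    def is_ink(p: Pixel) -> bool:
        return sum(abs(a - b) for a, b in zip(p, background)) > tolerance

    return tuple("".join("1" if is_ink(p) else "0" for p in row)
                 for row in pixels)


def mismatches(a: Bitmap, b: Bitmap) -> int:
    """Count cells that differ between two bitmaps of the same size."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def match_glyph(glyph: Bitmap, char_map: Dict[str, Bitmap],
                max_mismatch: int = 0) -> Optional[str]:
    """Return the closest character, or None if nothing is close enough."""
    best_char, best_score = None, max_mismatch + 1
    for ch, ref in char_map.items():
        same_size = len(ref) == len(glyph) and all(
            len(r) == len(g) for r, g in zip(ref, glyph))
        if not same_size:
            continue  # only compare glyphs with identical dimensions
        score = mismatches(glyph, ref)
        if score < best_score:
            best_char, best_score = ch, score
    return best_char
```

With `max_mismatch=0` only exact pixel matches count, which suits documents where the font and size never vary; raising it trades strictness for tolerance to small rendering differences.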


19 hours ago, Jeff said:

In my last job, we encountered a lot of situations where we wanted to gather data from archival documents like PDFs where the text was very regular, consistent, and always in the same font and size. I had developed the means to read targeted text by comparing pixel colors to the background color. I was able to "read" data off of these documents very quickly and it streamlined a lot of processes. It worked best with standard fonts, for example non-italic, non-bold. I created a character map database that lived in a submacro and compared pixel groupings in the document to the database. The data could be transferred to any form, file, or website as required. The process started out in VB, but was gradually transferred into ME.

 

Now that's impressive!!! You did OCR strictly with ME macros???

How much data did you have to get off a typical document -- a few key characters, or whole lines, or pages of text?

How large were the fonts, or did you blow up the PDFs to make gigantic letters?

Even though you were working with known fonts and sizes, there must have been slight variations between the pixels "read" and the standardized pixel maps. How did you adjust for the differences? How did you determine that a character was, or was not, a match to one of the maps? By sampling a few dozen, or a few hundred, pixels within a known space on the screen?

When you say you could read "very quickly", what does that mean in characters per second, or however you measured it?

Sounds like wicked good programming fun!!!


  • 3 years later...

I don't think it's possible to programmatically incorporate OCR into a Macro Express script.

 

But if you have another application that does the OCR, you may be able to trigger it from within Macro Express. Several techniques could work; for example, a Macro Express script can simulate the hotkey that starts the OCR operation.

There are limits to what can be accomplished this way. It may not be easy, reliable, or even possible for Macro Express and the OCR application to exchange information.
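One common way two programs exchange information is through a file: the OCR application saves its result as text, and the calling script waits for that file to appear. The sketch below shows the waiting side in Python, not Macro Express. It assumes the OCR application can be set up to write its output to a known text file, which may not be true of every OCR tool; the function name `wait_for_text` and the timeout values are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_text(path: Path, timeout: float = 10.0,
                  poll: float = 0.1) -> Optional[str]:
    """Poll until `path` exists and is non-empty, then return its text.

    Returns None if the file has not shown up (or is still empty) once
    `timeout` seconds have passed. The caller should treat None as
    "the OCR step failed or is too slow" and decide what to do next.
    """
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            text = path.read_text(encoding="utf-8")
            if text:
                return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

The timeout is what keeps this from hanging forever when the OCR application never produces output, which is the reliability problem described above.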
