Jeff Posted June 28, 2020

In my last job, we often needed to gather data from archival documents such as PDFs, where the text was very regular and consistent and always in the same font and size. I developed a way to read targeted text by comparing pixel colors to the background color. I could "read" data off these documents very quickly, and it streamlined a lot of processes. It worked best with standard fonts, for example non-italic and non-bold. I built a character map database that lived in a submacro and compared pixel groupings in the document against it. The data could then be transferred to any form, file, or website as required. The process started out in VB but was gradually moved into Macro Express (ME).
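Jeff didn't post his code, so the following is only a minimal Python sketch of the general idea he describes, not his actual method. Each pixel is classified as ink or background by its color distance from the background color; fixed-size pixel groupings are then compared against a small character map, and the closest entry is accepted only if it differs by no more than a few pixels. The color threshold of 60, the mismatch limit of 2, the tiny 3x5 glyphs, the fixed-width cell layout, and all function names (`binarize`, `match_glyph`, `read_fixed_width`) are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]
# A binarized glyph: one string per pixel row, '#' = ink, '.' = background.
Bitmap = Tuple[str, ...]

# Hypothetical character map with tiny 3x5 glyphs; a real map would be
# captured from the actual document font at its actual size.
CHAR_MAP: Dict[str, Bitmap] = {
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def binarize(pixels: Sequence[Sequence[RGB]], background: RGB, threshold: int = 60) -> Bitmap:
    """Mark a pixel as ink when its color differs enough from the background color."""
    rows = []
    for row in pixels:
        cells = []
        for r, g, b in row:
            # Sum of per-channel differences (Manhattan distance in RGB space).
            diff = abs(r - background[0]) + abs(g - background[1]) + abs(b - background[2])
            cells.append("#" if diff > threshold else ".")
        rows.append("".join(cells))
    return tuple(rows)


def mismatches(a: Bitmap, b: Bitmap) -> int:
    """Count pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def match_glyph(glyph: Bitmap, char_map: Dict[str, Bitmap], max_mismatch: int = 2) -> str:
    """Return the closest character, or '?' if no template is within max_mismatch pixels."""
    best_char, best_score = "?", max_mismatch + 1
    for ch, template in char_map.items():
        # Only compare templates with exactly the same dimensions as the glyph.
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            continue
        score = mismatches(glyph, template)
        if score < best_score:
            best_char, best_score = ch, score
    return best_char


def read_fixed_width(line: Bitmap, cell_width: int, char_map: Dict[str, Bitmap],
                     max_mismatch: int = 2) -> str:
    """Read a binarized text line, assuming it is split into equal-width character cells."""
    out = []
    for x in range(0, len(line[0]) - cell_width + 1, cell_width):
        cell = tuple(row[x:x + cell_width] for row in line)
        out.append(match_glyph(cell, char_map, max_mismatch))
    return "".join(out)


def to_pixels(bitmap: Bitmap) -> List[List[RGB]]:
    """Render a bitmap as black-on-white RGB pixels (demo helper only)."""
    return [[(0, 0, 0) if c == "#" else (255, 255, 255) for c in row] for row in bitmap]


if __name__ == "__main__":
    # Build a two-character line "17" and read it back.
    line_bitmap = tuple(a + b for a, b in zip(CHAR_MAP["1"], CHAR_MAP["7"]))
    line = binarize(to_pixels(line_bitmap), background=(255, 255, 255))
    print(read_fixed_width(line, 3, CHAR_MAP))  # prints "17"
```

Allowing a small number of mismatched pixels is one way to absorb slight rendering variations; proportional fonts would also need a segmentation step (for example, splitting on blank pixel columns) instead of fixed-width cells.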
Cory Posted June 28, 2020

What's your question?
Cory Posted June 28, 2020

BTW, Tesseract OCR has worked well for me, and so has OmniPage. I have also used free online OCR engines, but I would never use those for sensitive data.
rberq Posted June 28, 2020

Now that's impressive! You did OCR strictly with ME macros?

How much data did you have to get off a typical document: a few key characters, whole lines, or pages of text? How large were the fonts, or did you blow up the PDFs to make gigantic letters?

Even though you were working with known fonts and sizes, there must have been slight variations between the pixels "read" and the standardized pixel maps. How did you adjust for the differences? How did you decide whether a character matched one of the maps? By sampling a few dozen, or a few hundred, pixels within a known space on the screen?

When you say you could read "very quickly", what does that mean in characters per second, or however you measured it?

Sounds like wicked good programming fun!
ihuxihy Posted July 3, 2023

Hello, is there a way to integrate OCR into my macro?
acantor Posted July 3, 2023

I don't think it's possible to build OCR directly into a Macro Express script. But if another application does the OCR, Macro Express may be able to trigger it. Several techniques could work; for example, the script could simulate the hotkey that starts the OCR operation.

This approach has limits. Exchanging information between Macro Express and the OCR application might be difficult, impossible, or unreliable.
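The trigger-an-external-OCR-program idea can be sketched outside Macro Express with the Tesseract command-line tool mentioned earlier in the thread: start the OCR program as a separate process and collect the text it prints, which sidesteps reading results back out of another application's window. This is a hedged Python stand-in, not Macro Express code. It assumes `tesseract` is installed and on the PATH, and `tesseract_command` and `ocr_image` are hypothetical names.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str) -> List[str]:
    """Build a tesseract CLI call; the output base "stdout" prints the text
    instead of writing a .txt file."""
    return ["tesseract", image_path, "stdout"]


def ocr_image(image_path: str) -> str:
    """Run the external OCR program and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True,  # collect the text the program prints
        text=True,            # decode stdout as a string
        check=True,           # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    print(ocr_image(sys.argv[1]))
```

A macro could follow the same pattern by launching the OCR program with an output file and then reading that file, which is usually more reliable than simulating hotkeys and scraping another window.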