Optical character recognition (OCR) hardly gets the credit it deserves. Without it, we’d be stuck in the Dark Ages retyping entire pages to make an edit or convert a file — which would inevitably lead to more mistakes. Collaborating on documents would be a nightmare, and finding information would be a frustrating and painstaking task. And all of those automated document processes and big data and analytics solutions we enjoy today? Impossible without OCR.
OCR has come a long way from a technology that could read and replicate only a handful of fonts. Today’s technology can handle machine or handwritten text in multiple languages on paper and electronic documents, all while maintaining their entire structure, down to that tiny page number at the bottom corner of a page. We can point OCR at specific places on a document, like where an invoice number lives, or at each field on a form, and integrate it with a number of other solutions to optimize business processes. OCR is even smart enough to learn new tricks, like how to recognize weird fonts and characters, or that strange jargon used exclusively by your niche vertical.
Newer OCR solutions also take into account what needs to happen to documents after they've been converted. Businesses need a solution that makes it easy for their workforce to create, edit, collaborate on and compare PDF documents.
A PDF toolkit
There aren’t many folks who don’t use PDFs throughout their day-to-day duties, which makes it useful to keep a good set of PDF tools handy. And the marketplace is adjusting to meet customer demand.
Longtime OCR developer ABBYY, for instance, has repositioned its flagship product, FineReader 14, as a PDF editing and document comparison tool in addition to an OCR solution. Its latest offering, released earlier this year, added new features to help workers do more with PDFs. Users can edit text; extract and modify pictures, tables, charts, graphs and other elements normally found in a PDF; and add, delete or reorder pages. In addition, users have access to a full complement of document markup tools, like sticky notes and stamps, and can comment on documents to make it easier to work as a team.
The latest version also provides features to help protect sensitive information. For instance, users can password-protect PDFs, and IT departments can control which users have access to which documents and what they can do with them. In addition, users can redact text and hide metadata before sharing a document with others. Digital signatures are also supported, so documents can be signed and returned right away rather than printed, signed, scanned and then returned.
While not new, document comparison functionality — which displays and compares two different versions of the same document, even if they’re in different formats — has made it into products like FineReader 14. Document comparison functionality parses the contents of two documents and identifies major changes, such as deleted, added or edited text. In some cases, users can configure the solution to ignore insignificant changes like different fonts and formatting.
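To make the idea concrete, here is a minimal sketch of text comparison in Python. It is not how FineReader or any other product works internally; it simply assumes both documents have already been converted to plain text (by OCR, for example) and uses the standard library's difflib module to label changes as deleted, added or edited. The function name compare_texts is made up for this illustration. Splitting on whitespace is a simple stand-in for ignoring insignificant layout differences, because line breaks and extra spaces vanish before the comparison runs.

```python
import difflib


def compare_texts(old_text, new_text):
    """Return a list of (kind, old_words, new_words) tuples, one per change.

    Hypothetical helper for illustration only. Both inputs are plain text.
    Splitting on whitespace means differences in spacing and line breaks
    are ignored, so only changes to the words themselves are reported.
    """
    old_words = old_text.split()
    new_words = new_text.split()

    # Compare the two word sequences; autojunk is off so common words
    # are never skipped in longer documents.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    labels = {"delete": "deleted", "insert": "added", "replace": "edited"}
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged span, nothing to report
        changes.append(
            (labels[tag], " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
        )
    return changes


if __name__ == "__main__":
    old = "Payment is due within 30 days of the invoice date."
    new = "Payment  is due within 45 days of the\ninvoice date. Late fees apply."
    for kind, before, after in compare_texts(old, new):
        print(kind, repr(before), "->", repr(after))
```

In this example the extra space and the line break in the second version are not reported, while the change from 30 to 45 shows up as an edit and the new sentence as an addition. A real product would also have to deal with fonts, formatting and page structure, which plain text cannot represent.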
OCR will continue to be the workhorse of document technology, translating static documents into actionable information. And as OCR solutions incorporate PDF and document comparison tools, the technology's value will only increase, even if we don't spend a lot of time thinking about it.
Ames is president and senior analyst for BPO Media, which publishes The Imaging Channel and Workflow magazines. As a market analyst and industry consultant, Ames has worked for prominent consulting firms including KPMG and has more than 15 years of experience in the imaging industry covering technology and business sectors. Ames has lived and worked in the United States, Southeast Asia and Europe and enjoys being a part of a global industry and community.