Improve Document Processing Workflows With Intelligent Data Capture: Part 2

In the second part of this blog series on intelligent data capture, let me emphasize again that effectively managing document processing activities is critical to business success. It can help raise employee productivity, reduce the cost of processing documents such as invoices, and enhance customer service. Intelligent data capture is a system that can help you update document processing activities in order to realize these and other goals.

I previously focused on the potential business advantages of intelligent data capture. In this article and the next, I will address data capture at work, including five key steps to implementing a solution.

The capture process begins where data enters the organization, typically in many different formats and from multiple locations. The information is captured using scanning technology, comprising hardware and software. OCR (optical character recognition) is applied, and at this point the real power of the approach begins to be harnessed. Data is extracted using business rules, which specify such details as how a document is recognized, what information is needed from a particular document type, how this information is identified, where it is sent and in what format.
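To make the idea of a business rule more concrete, here is a minimal sketch in Python of what one rule might look like once the OCR text is available. It is an illustration only, not a description of any particular capture product: the CaptureRule structure, the regular expressions and the "accounts-payable-queue" destination are all assumptions invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class CaptureRule:
    """One hypothetical business rule for a single document type."""
    doc_type: str
    recognize: str        # regex that identifies the document type
    fields: dict          # field name -> regex with one capture group
    destination: str      # where the extracted data is sent
    output_format: str = "json"


# Illustrative rule; the patterns and the queue name are assumptions.
INVOICE_RULE = CaptureRule(
    doc_type="invoice",
    recognize=r"\binvoice\b",
    fields={
        "invoice_number": r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)",
        "total": r"Total\s*(?:Due)?\s*:?\s*\$?([\d,]+\.\d{2})",
    },
    destination="accounts-payable-queue",
)


def apply_rule(rule, ocr_text):
    """Return a routing record if the rule recognizes the text, else None."""
    if not re.search(rule.recognize, ocr_text, re.IGNORECASE):
        return None
    data = {}
    for name, pattern in rule.fields.items():
        match = re.search(pattern, ocr_text, re.IGNORECASE)
        # A field that cannot be found is left empty for manual review.
        data[name] = match.group(1) if match else None
    return {
        "doc_type": rule.doc_type,
        "destination": rule.destination,
        "format": rule.output_format,
        "data": data,
    }
```

Calling apply_rule(INVOICE_RULE, text) on the OCR text of a scanned page returns either a record that says what was found and where it should go, or None when the page is not recognized as an invoice. A real deployment would hold many such rules, one or more per document type.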

This is a broad view of how the process works, but how can an organization plan to implement an intelligent data capture solution? We have determined five essential steps that can support a successful implementation.

1. Centralize Intake

Step one is to centralize the intake of information. This is perhaps the most important step in the process of implementing a data capture solution. It revolves around understanding how and where useful data is being received, and in what formats.

Another key to this step is for the organization to identify the type of business model it wants to use for the capture process: internally managed or outsourced. We’ll examine some pros and cons of each model later in this paper. The point for now is that, depending on the model chosen, the enterprise will want to designate a point of data capture and begin to consolidate information streams through that point. This helps ensure that all useful data is handled by the new workflow and that duplication of effort is avoided as much as possible.

It is important to note that centralizing intake does not have to disrupt ongoing processes that the company wants to retain. For example, perhaps the established practice is to send invoices to different locations. The organization might want to continue this approach so that the locations can efficiently maintain current vendor relationships. In this case, the invoices could be scanned locally at multifunctional devices, and then the electronic images could be automatically routed to a central point for processing, keeping the local workflow intact.

2. Automate Capture

Step two involves automating the document capture process using OCR and business rules. Implementing new capture systems of this type can require a substantial investment. Some of the more complex software systems are not easy to deploy, and many IT departments lack the expertise to install them. Consequently, service costs from vendors can accumulate.

One way to contain these expenses is to use a phased approach. This includes identifying the important channels and/or locations and bringing them online one at a time. In this manner a company can purchase fewer licenses and reduce installation time and associated costs.

Another possibility is to leverage the expertise and technology that the right managed services provider can deliver. This approach can work well regardless of the centralization model, but is especially helpful for organizations deploying an offsite solution because they can take advantage of the provider’s technology and infrastructure.

3. Classify Documents

The goal of step three is to virtually eliminate manual document sorting by classifying document types, beginning with the most important. One challenge in this step is to ensure that explicit rules exist to identify or classify the document types. In our experience working with clients, we often find that manual processes, especially those in place for a long time, may not have specific written rules for document classification. For example, how does the company identify purchase order and non-purchase-order invoices? Is the purchase order number required to be on each invoice? Another challenge is that processors may routinely have to read some unstructured documents, such as a contract or proposal, in order to determine their type.
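As an illustration of what an explicit, written classification rule could look like, the short Python sketch below separates purchase order invoices from non-purchase-order invoices. It assumes, purely for the sake of the example, that a purchase order number is written as "PO" followed by exactly six digits; each company would need to substitute its own convention. The function name and labels are likewise hypothetical.

```python
import re

# Assumed convention for this example only: "PO", an optional hyphen or
# space, then exactly six digits (for instance PO-004512).
PO_NUMBER = re.compile(r"\bPO[-\s]?(\d{6})\b", re.IGNORECASE)


def classify_invoice(ocr_text):
    """Label an invoice by whether it carries a purchase order number."""
    match = PO_NUMBER.search(ocr_text)
    if match:
        return "po_invoice", match.group(1)
    return "non_po_invoice", None
```

Writing the rule down this way forces the questions raised above into the open: if the purchase order number is not required on every invoice, a rule like this one will send some purchase order invoices down the wrong path.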

Companies can meet these challenges in three ways. First, they can write rules for document classification, starting with documents that are either most important or easiest to classify. Second, the implementation team can undertake an analysis designed to help gain better control of unstructured documents. This could involve talking to processors and determining what key words or phrases they look for when classifying documents. Third, the team can investigate the possibility of leveraging a solution that is commonly referred to as clustering. This is a machine learning technology that involves scanning documents and automatically grouping together those that have similar characteristics. The document groups can be reviewed and corrected, and the clustering system continually adjusted, so that over time it can identify document types without the need to code a series of complex rules.
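The grouping idea behind clustering can be sketched in a few lines of Python. The example below is deliberately simple and is not how commercial clustering products work: instead of a trained machine learning model, it uses a greedy grouping based on word overlap (Jaccard similarity), and the 0.3 threshold is an arbitrary assumption chosen for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercased alphabetic words of 3+ letters."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(documents, threshold=0.3):
    """Greedy grouping: each document joins the first group whose first
    (seed) document is similar enough, otherwise it starts a new group.
    Returns a list of groups, each a list of document positions."""
    groups = []  # pairs of (seed_tokens, member_indices)
    for index, text in enumerate(documents):
        doc_tokens = tokens(text)
        for seed_tokens, members in groups:
            if jaccard(doc_tokens, seed_tokens) >= threshold:
                members.append(index)
                break
        else:
            # No existing group was similar enough.
            groups.append((doc_tokens, [index]))
    return [members for _, members in groups]
```

Given the OCR text of a batch of scanned pages, cluster returns groups of page positions that a reviewer could inspect and label, for example as invoices or contracts. Raising or lowering the threshold is a crude stand-in for the continual adjustment described above.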

In my next article I’ll examine steps four and five in implementing a successful intelligent data capture solution.

is a Solutions Engineer, Information Governance for Canon Business Process Services, Inc., a leading provider of managed services and technology. Please visit for more information.