Effectively managing document processing activities is critical to business success. It can help raise employee productivity, reduce the cost and cycle time associated with processing documents such as invoices, strengthen security and enhance customer service. Intelligent data capture is an approach that can help you streamline document processing activities and realize these and other goals.

Defining Intelligent Data Capture

Before looking in more detail at the business benefits and implementation strategies connected with intelligent data capture, let’s start with how I define the term. Intelligent data capture is the process of automatically scanning traditional long-form documents or electronic pages, extracting specific data (regardless of its structure) and integrating the information with downstream workflow processes and systems.

When implemented correctly, this approach can save valuable time in end-to-end document processing, which in turn can drive lower overall costs. Other benefits of integrating useful information earlier and more reliably in the document management process include, as noted earlier, an improved customer experience, less manual data entry, which in turn improves data quality, and greater security controls that enable an enterprise to better meet today’s compliance requirements.

Why It Matters

With this definition in mind, let’s begin our closer look at intelligent data capture with why I believe it matters. According to my own industry research and discussions with clients and subject matter experts, approximately 80 percent of an organization’s information assets exist as unstructured content. In a sense, this means that the vast majority of your company’s “intelligence” is trapped inside unstructured, unmanaged documents. In this scenario, a tremendous amount of useful information goes underused because it is not supported by processes and technologies that can efficiently move it into the workflow systems that support and help grow the business.

Trapped does not necessarily mean stuck forever. In reality, the data is caught in a bottleneck that hinders its flow to downstream systems, some of which can include:

  • Enterprise risk management systems
  • Client management software
  • Business intelligence dashboards
  • Private or public cloud repositories that are used for collaboration

Two Important Concepts

The key to unlocking this trapped data so it can more effectively support downstream processes is to extract it. We’ll look at this process in more detail shortly. First, it’s important to understand two concepts. The first concerns the level of technology required to extract data from a document. With structured documents, less advanced technology is required to extract data. With semi-structured and unstructured documents, the technology required becomes increasingly robust in order to accommodate the need to make more complex decisions. Let me explain this in a bit more detail.

Within structured documents, the essential information is consistently located in the same place. Examples of these documents include government I-9 and medical HCFA (Health Care Financing Administration) 1500 forms. Within semi-structured documents, information generally is found in the same location with some variation in how data is displayed, depending on the individual company. Documents such as invoices and purchase orders fall into this category. Unstructured documents display information with the most variation. Data could be located anywhere within these documents, which typically include contracts, correspondence, reports and business proposals. Therefore, extracting data from these documents requires more complex decisions, driving the need for more sophisticated technology.
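To make the difference in required technology concrete, here is a minimal Python sketch that contrasts the two simpler cases. It is an illustration under stated assumptions, not a description of any particular capture product: the field layout, the field names, the label patterns and both function names are hypothetical, and the input is assumed to be text that OCR has already produced.

```python
import re

# Structured case: every field sits at a known position on the page.
# Hypothetical layout: (line index, start column, end column).
FIXED_LAYOUT = {
    "last_name": (0, 0, 20),
    "first_name": (0, 20, 40),
    "date_of_birth": (1, 0, 10),
}


def extract_structured(text, layout=FIXED_LAYOUT):
    """Read each field from its fixed position; no decisions are needed."""
    lines = text.splitlines()
    return {
        field: lines[row][start:end].strip()
        for field, (row, start, end) in layout.items()
    }


# Semi-structured case: the same fields appear on every invoice, but each
# vendor labels and places them differently, so we search for the labels.
# These label patterns are illustrative assumptions, not a complete set.
INVOICE_PATTERNS = {
    "invoice_number": re.compile(
        r"\b(?:Invoice|Inv)\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)",
        re.IGNORECASE,
    ),
    "po_number": re.compile(
        r"\b(?:PO|P\.O\.|Purchase Order)\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)",
        re.IGNORECASE,
    ),
}


def extract_semi_structured(text, patterns=INVOICE_PATTERNS):
    """Find each field by its label, wherever it appears in the text."""
    result = {}
    for field, pattern in patterns.items():
        match = pattern.search(text)
        # A missing field is reported as None so it can be sent for review.
        result[field] = match.group(1) if match else None
    return result
```

The structured extractor never has to make a decision, while the semi-structured one must already tolerate different labels and positions. Unstructured documents such as contracts have no dependable labels at all, which is why they call for more sophisticated technology than simple pattern matching.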

Besides document structure, the second concept that companies need to be clear about is that data enters an enterprise in many different formats. Ironically, technology has actually increased the number of these formats rather than reducing it, as was anticipated a decade ago. Data is received in the form of paper documents as well as faxes, e-faxes and emails with attachments. Beyond that, information is transmitted as Microsoft Office files (Word, Excel and PowerPoint), PDF documents and more. This is compounded by the fact that data can enter your company from different devices (desktop, laptop or mobile) and multiple locations within and external to your organization.

Why Extract Data?

As I pointed out earlier, businesses extract data in order to make downstream systems and workflow processes more efficient and cost effective. This in turn enables an organization to grow, maintain a competitive edge and realize other significant benefits. Let me provide one practical example of the value of data extraction, which concerns an accounts payable solution.

We teamed with an investment bank that wanted to improve its extraction process and technology. The company had moved its approval process offshore, but wanted to reduce the cycle time for getting invoices into its payment system. Additionally, the bank saw an opportunity to reduce the staff required to manage this process so it could redeploy resources to more strategic areas of the operation.

While our project included using OCR (optical character recognition) technology to extract and index a variety of key data points such as vendor name, invoice number and line item detail, the purchase order number was especially important. Once validated, purchase order invoices could be paid without a departmental approver, which could reduce processing cycle time by as much as a week.

Also, the information for all invoices, including those that required approval, would flow more quickly into our client’s ERP system, which triggered the approval workflow. Due to the enhanced extraction process and technology we put into place, the bank reduced processing cycle time from seven days to one and reassigned 10 staff members to areas that more directly supported its core business. These results indicate the potential business value of effective data extraction.
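The routing rule at the heart of this example, in which validated purchase order invoices bypass the departmental approver, can be sketched in a few lines of Python. This is a simplified illustration and not the bank's actual implementation: the Invoice fields, the route_invoice function, the two route names and the idea of validating against a set of open purchase order numbers are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Invoice:
    """Key data points extracted from an invoice (hypothetical field set)."""
    vendor_name: str
    invoice_number: str
    po_number: Optional[str] = None  # None when no PO number was found


def route_invoice(invoice: Invoice, open_purchase_orders: Set[str]) -> str:
    """Decide the next workflow step for an extracted invoice.

    Assumption: a PO number counts as validated when it appears in the
    set of open purchase orders known to the payment system.
    """
    if invoice.po_number and invoice.po_number in open_purchase_orders:
        # Validated PO invoice: no departmental approver is required.
        return "pay"
    # Missing or unrecognized PO number: follow the approval workflow.
    return "departmental_approval"
```

For instance, with an open purchase order set of {"450012"}, an invoice carrying PO number 450012 would be routed to "pay", while an invoice with no PO number or an unknown one would be routed to "departmental_approval".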

In my next few articles in this series, I’ll examine intelligent data capture at work, including some key steps to implementing a solution. 

Gary Allen is a Solutions Engineer, Information Governance, for Canon Business Process Services, Inc., a leading provider of managed services and technology. Please visit www.cbps.canon.com for more information.