Seven SharePoint Document-Scanning Best Practices Every Organization Needs to Know
By Nalaka Withanage 10/18/13
Are you planning to scan and digitize all your paper documents and manage them electronically in Microsoft SharePoint? Then you are in the right place. SharePoint is a mature enterprise content management platform that allows you to manage the full lifecycle of content in an organization.
However, to get the maximum value from its rich ECM capabilities, you need to get the content into SharePoint the right way. If you are coming from a paper-based environment, this requires seamless integration of your scanning solution with SharePoint, delivering an uninterrupted flow of metadata-rich content into the platform.
Here are the seven best practices companies need to follow in order to plan and execute a successful SharePoint document-scanning project:
1. Define a metadata extraction strategy
SharePoint, as an enterprise content management system, allows you to organize, share, manage and find documents in the enterprise. However, your ability to leverage the ECM features in SharePoint depends on how well you give context to unstructured content. In SharePoint, that context is defined via metadata. Manual metadata entry (manual indexing) is inconsistent, cumbersome and error-prone. This is where you might want to look at automated metadata extraction (automated indexing) for all documents released into SharePoint during the document-scanning process.
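As a rough illustration of automated indexing, the sketch below pulls a few index fields out of OCR text with regular expressions. The field names (InvoiceNumber, InvoiceDate, Vendor), the patterns and the sample text are all hypothetical; a real capture product would typically use its own extraction engine rather than hand-written patterns.

```python
import re
from typing import Dict, Optional

# Hypothetical index fields and patterns for an invoice-like document.
FIELD_PATTERNS = {
    "InvoiceNumber": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "InvoiceDate": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "Vendor": re.compile(r"Vendor\s*:\s*(.+)", re.IGNORECASE),
}


def extract_metadata(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return one value per index field, or None when no pattern matches."""
    metadata: Dict[str, Optional[str]] = {}
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        metadata[field_name] = match.group(1).strip() if match else None
    return metadata


# Illustrative OCR output for a single scanned page.
sample = "Vendor: Contoso Ltd\nInvoice No: INV-1042\nDate: 2013-10-18\n"
print(extract_metadata(sample))
```

Fields that come back as None are the ones that would be sent to an operator for manual indexing, which keeps human effort limited to the exceptions.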
2. Define content search criteria first and work backwards
One of the major goals in going paperless is a seamless content search experience for your users. The ability to find and discover documents and information from any place, at any time, using any device is fundamental to meeting today’s business demands. Providing an advanced search experience requires proper planning of the index fields used to find and discover content efficiently. Low-fidelity, general search can be enabled by releasing searchable PDF documents as the scanned output into SharePoint, while high-precision search requires upfront planning of managed properties, search schema, taxonomy and search result refinement criteria. Once you have a clear idea of how your end users want to find information, you will have much deeper knowledge of which metadata fields to associate with the documents at the point of scanning and data extraction.
3. Get your SharePoint content type design right
Content types (such as vendor contracts, HR contracts and policy documents) are the fundamental base artifacts in SharePoint that allow you to organize content. Content types can have their own metadata fields, workflows and information management policies, with retention schedules defined as they apply to specific groups of documents. When you analyze the documents of a single department, you may find hundreds of candidate content types, which is clearly a challenge to manage. Therefore, it is important to design a set of base-level and child-level content types as part of your information architecture in SharePoint. Once the content type design is completed, it can be mapped to document scanning and indexing profiles for an uninterrupted flow of information into SharePoint straight from your scanning software solution.
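The base-level and child-level design can be modeled on paper before anything is built in SharePoint. The sketch below is a plain Python model, not SharePoint API code; the content type names, field names and scan profile names are hypothetical and only show how child types inherit fields from their parents and how scan profiles can map onto the result.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Conceptual model of a content type with inherited metadata fields."""
    name: str
    parent: Optional["ContentType"] = None
    own_fields: List[str] = field(default_factory=list)

    def all_fields(self) -> List[str]:
        # Inherited fields come first, then fields added at this level.
        inherited = self.parent.all_fields() if self.parent else []
        return inherited + [f for f in self.own_fields if f not in inherited]


# Hypothetical hierarchy: one base type, one mid-level type, two leaf types.
base = ContentType("Scanned Document", own_fields=["ScanDate", "BatchId"])
contract = ContentType("Contract", parent=base,
                       own_fields=["Counterparty", "ExpiryDate"])
vendor_contract = ContentType("Vendor Contract", parent=contract,
                              own_fields=["VendorId"])
hr_contract = ContentType("HR Contract", parent=contract,
                          own_fields=["EmployeeId"])

# Hypothetical mapping from scanning/indexing profiles to content types.
SCAN_PROFILES = {
    "vendor-contract-profile": vendor_contract,
    "hr-contract-profile": hr_contract,
}

for profile, ctype in SCAN_PROFILES.items():
    print(profile, "->", ctype.name, ctype.all_fields())
```

Keeping shared fields on the base types means a scan profile only has to know about its leaf content type, and a new shared field is added in one place.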
4. Centralize the routing and business rules
SharePoint has a built-in document routing and workflow engine that allows you to distribute documents across the organization based on the metadata properties associated with them. This can be enabled via the content organizer feature in SharePoint with the help of drop-off libraries. It is important that you centrally manage the business rules that are distributed and enforced across the entire organization. Your scanning software should release each document into a drop-off library along with the right metadata so that SharePoint can make the routing and distribution decisions from that point forward. A clear separation of document data validation and business rule validation is very important for maintaining a solution in the enterprise. Data validation can be done within your scanning software application at the point where the data is captured from the original document, while rule validation must be done centrally in SharePoint upon release.
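The separation between capture-time data validation and central routing rules can be sketched as two independent steps. This is a conceptual Python model only: it does not call the content organizer or any SharePoint API, and the rule names, field names and library paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, str]


def validate_capture(metadata: Metadata, required: List[str]) -> List[str]:
    """Capture-side data validation: return required fields that are missing or blank."""
    return [name for name in required if not metadata.get(name, "").strip()]


@dataclass
class RoutingRule:
    """Central business rule: a metadata condition and a destination."""
    name: str
    matches: Callable[[Metadata], bool]
    target_library: str


# Hypothetical, centrally managed rules evaluated in order.
RULES = [
    RoutingRule("HR contracts",
                lambda m: m.get("ContentType") == "HR Contract",
                "/sites/hr/Contracts"),
    RoutingRule("Vendor contracts",
                lambda m: m.get("ContentType") == "Vendor Contract",
                "/sites/procurement/Contracts"),
]


def route(metadata: Metadata, rules: List[RoutingRule],
          fallback: str = "/sites/records/DropOffLibrary") -> str:
    """Return the first matching destination, or leave the document in the fallback."""
    for rule in rules:
        if rule.matches(metadata):
            return rule.target_library
    return fallback


doc = {"ContentType": "HR Contract", "EmployeeId": "E-204"}
if not validate_capture(doc, ["ContentType", "EmployeeId"]):
    print(route(doc, RULES))
```

Because the scanning side only checks that data is present and well formed, a change to where HR contracts are filed touches the central rule list and nothing on the capture stations.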
5. Configure the SharePoint site collection as DRM sites
SharePoint 2010 and SharePoint 2013 have mature document and records management (DRM) features built into the platform. High-volume scanning sites can be configured as DRM sites to manage the full lifecycle of the scanned content and information. When SharePoint sites are configured as DRM sites, they facilitate information capture, control, storage, retrieval and disposition with little or no manual intervention. Some of the most popular features to leverage include in-place records management, the records center, information management policy automation, legal holds, document IDs, the managed metadata service and the term store.
6. Do your capacity planning and container design right
SharePoint, as an enterprise content management system, has been designed to hold millions of documents if it is set up correctly. Unlike a file-share environment, it has various levels of containers in which you can store and organize digital content: site collections, sites, document libraries, document sets, folders and so on. Some of these containers have default throttling limits; for example, SharePoint 2010 and 2013 apply a default list view threshold of 5,000 items. Therefore, it is important to design a document indexing and cataloguing architecture that can work within the default throttling limits. In a high-volume, batch-scanning scenario, capacity planning for SharePoint containers should account for both the initial document workload to be scanned and the expected annual growth of documents. This is a fundamental design decision that is required to construct a sustainable document-scanning solution with SharePoint.
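A back-of-the-envelope version of this capacity planning is shown below. All volumes are made-up figures, growth is assumed to be a fixed number of new documents per year, and the per-container cap of 5,000 items is an assumption chosen to line up with the default list view threshold in SharePoint 2010 and 2013; check the limits that actually apply to your farm before relying on any of these numbers.

```python
import math


def projected_documents(initial: int, annual_new: int, years: int) -> int:
    """Total documents after the planning horizon, assuming linear growth."""
    return initial + annual_new * years


def containers_needed(total_documents: int, max_items_per_container: int) -> int:
    """Minimum number of containers (e.g. folders) so none exceeds the cap."""
    return math.ceil(total_documents / max_items_per_container)


# Illustrative inputs only.
INITIAL_BACKLOG = 1_200_000      # paper documents to scan on day one
ANNUAL_NEW_DOCUMENTS = 150_000   # expected new documents per year
PLANNING_YEARS = 5
MAX_ITEMS_PER_FOLDER = 5_000     # assumed cap per folder

total = projected_documents(INITIAL_BACKLOG, ANNUAL_NEW_DOCUMENTS, PLANNING_YEARS)
folders = containers_needed(total, MAX_ITEMS_PER_FOLDER)
print(f"{total:,} documents -> at least {folders} folders")
```

Running the same arithmetic per document library, rather than for the whole repository, shows early on whether a single library is enough or whether content should be split across libraries or site collections.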
7. Select an OCR scanning software solution that has seamless integration with SharePoint
Today there are a number of OCR scanning solutions that offer varying degrees of integration with SharePoint as an ECM system. Some scanning software solutions make better use of rich enterprise content management features such as content types, taxonomy, document sets and drop-off libraries. It is equally important to select a scanning solution that has zero footprint (no installation of software) on the SharePoint environment.
Modern intelligent scanning software solutions can handle enterprise content with minimal or no human intervention required. Some of the features to look for in selecting an enterprise OCR scanning and capture solution include:
— Distributed capture and centralized processing architecture
— Capture documents and content from multiple input channels (paper, fax, email attachments, web, file share, social, mobile etc.).
— Automated document separation and image enhancement
— Auto-Classification of documents against SharePoint taxonomy
— Automated data extraction, validation and data redaction; Learn by example (auto learning) features
A fine blend of OCR scanning and the ECM features in SharePoint allows you to manage content as a profitable asset across its full lifecycle, starting from the point of origination. Data Capture Experts has helped a number of organizations with the design and implementation of high-volume document-scanning solutions that leverage the enterprise content management features built into SharePoint.
Nalaka Withanage is the founder and CTO of Data Capture Experts, a company specializing in enterprise content management solutions with Microsoft SharePoint. Data Capture Experts’ approach to information lifecycle management uses a proven digital content analysis and transformation framework that allows companies to capture, control, store, find, and deliver content and documents related to organizational processes. This framework helps organizations maximize the value of their SharePoint investment to drive process efficiencies and minimize regulatory compliance risks.