In the past several years, we have increasingly seen artificial intelligence (AI) applied to information management use cases. These typically involve either simple classification of content during capture or ingestion, or a more advanced, learning-based form of optical character recognition (OCR). With 2019 now underway, AI is likely to be turned toward an even bigger opportunity: managing and optimizing an organization’s “digital landfill.”
Think about the mass of content and data that already exists within your organization: that’s your digital landfill. In today’s world of Big Data and Big Content, finding information within that hulking landfill is like finding a needle in a haystack. How do you quickly and reliably locate important information residing in your organization’s various systems and repositories? By applying AI to this information management challenge, you can dig out nuggets of valuable information, recycle what is still useful, and even get rid of content that doesn’t need to be kept.
Here are a few AI applications that can help you tackle these information management challenges in 2019.
Metadata is, simply put, information about information, and that makes it enormously valuable. If, as they say, “information is everything, and everything is information,” then metadata is king.
This represents a step forward from the days of traditional document management, or even old-school enterprise content management (ECM). In those days the content (that is, the document) took top billing. Each stored document was the focal point for invoice processing, claims management, and other document-centric enterprise processes. Each one carried a set of metadata attributes (or tags), typically limited to information such as filename, creation date, author, and content type. In most systems, the set of stored metadata (the metadata “schema”) was rarely touched once it had been defined. Changing a schema required tedious development work and mass updates to all of the content that used it.
In a modern content services platform (CSP), metadata schemas are both flexible and extensible, so you can add new metadata fields quickly and easily. Organizations are also storing and using far more metadata than ever before, such as image resolution, document language, geophysical data, and more. As a result, CSPs can put metadata to far more effective use.
But, you ask, what about the content stored in those legacy solutions? CSPs can also connect to content residing in legacy systems. The content stays in its native repository, and the CSP still provides access to it. CSPs also let you add metadata properties and values to that legacy content without making any changes to the legacy system itself.
This becomes massively powerful when combined with AI, which can automate the work of adding that metadata. Imagine a legacy ECM repository containing customer contracts. These contracts are poorly managed, and the only relevant metadata attribute associated with them is the customer reference number. You can use a CSP to pass that content through an AI enrichment engine and potentially apply additional metadata attributes to every file currently stored. That immediately injects more context, intelligence, and insight into your information management ecosystem.
Among other attributes, the AI engine can identify:
- The type of each document: contract, correspondence, invoice, etc.
- Documents containing personal information, which may then automatically trigger additional security controls required by privacy policies or regulations.
- Documents that should be deleted per retention policies.
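To make the enrichment idea concrete, here is a minimal Python sketch. It is illustrative only and not any product's API, so all names are hypothetical:

- It assumes a CSP-side metadata overlay, a dictionary keyed by document ID, so nothing is written back to the legacy repository.
- A hypothetical `toy_classifier` stands in as a placeholder for a real AI engine.
- The personal-information check is a deliberately crude regex for email addresses and US Social Security number patterns.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass(frozen=True)
class LegacyDocument:
    """Hypothetical legacy record: the only metadata is a customer reference."""
    doc_id: str
    customer_ref: str
    text: str

# Crude stand-ins for a real PII detector (assumption: emails and US SSN patterns).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

@dataclass
class MetadataOverlay:
    """Extra metadata held on the CSP side, keyed by document ID.
    The legacy document is read but never modified."""
    entries: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def enrich(self, doc: LegacyDocument,
               classify: Callable[[str], str]) -> Dict[str, object]:
        has_pii = bool(EMAIL.search(doc.text) or SSN.search(doc.text))
        meta: Dict[str, object] = {
            "customer_ref": doc.customer_ref,   # carried over from the legacy system
            "doc_type": classify(doc.text),     # supplied by the (placeholder) AI engine
            "contains_pii": has_pii,
            # Hypothetical policy hook: PII switches the document to restricted access.
            "access": "restricted" if has_pii else "standard",
        }
        self.entries[doc.doc_id] = meta
        return meta

def toy_classifier(text: str) -> str:
    """Placeholder keyword rules standing in for a trained AI model."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "agreement" in lowered or "contract" in lowered:
        return "contract"
    return "correspondence"

overlay = MetadataOverlay()
doc = LegacyDocument("D-001", "C-4411", "Service agreement. Contact: jane@example.com")
print(overlay.enrich(doc, toy_classifier))
# → {'customer_ref': 'C-4411', 'doc_type': 'contract', 'contains_pii': True, 'access': 'restricted'}
```

Keeping the overlay separate from the source record mirrors the approach described above. The legacy system stays untouched while the CSP gains richer metadata.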
Identifying content for compliance
Every industry has retention policies, rules, or other compliance regulations that require organizations to keep documents and records for specific periods. If you can’t determine a document’s content type, how on earth can you apply retention policies to it? Historically, organizations handled this in one of two ways: manually, or not at all. The manual approach was tedious, error-prone, and time-consuming, so many organizations fell back on a “keep everything just in case” approach.
An AI-driven engine can classify content stored within legacy systems. AI tools have already proven valuable in distinguishing, for example, a contract from a resume. Advanced engines extend this principle by building AI models trained on an organization’s own content. Suppose your business needs to tell a personal life insurance document from a life annuity document. That distinction can be built into a specifically trained AI model, which delivers a far more detailed classification than a generic classifier ever could.
Modern CSPs with such AI capabilities enable organizations to apply this classification to the mass of content stored in those legacy systems. This can deliver significant business benefit and increase your visibility into both your key information assets and your liabilities.
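The idea of an organization-specific model can be sketched as a small multinomial naive Bayes classifier in plain Python, trained on a few hypothetical labeled snippets. Naive Bayes is simply our choice for brevity. The article does not say which technique any particular AI engine uses, and a real model would need far more training data and proper evaluation.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesClassifier":
        for text, label in examples:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.doc_counts.values())
        best_label: Optional[str] = None
        best_score = -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical organization-specific training examples.
training = [
    ("Death benefit payable to the named beneficiary upon death of the insured", "life_insurance"),
    ("Annual premium keeps the term life policy and death benefit in force", "life_insurance"),
    ("Annuitant receives guaranteed monthly income payments for life", "life_annuity"),
    ("Deferred annuity accumulation phase followed by income payout", "life_annuity"),
]
model = NaiveBayesClassifier().fit(training)
print(model.predict("Beneficiary claim for the death benefit"))   # → life_insurance
print(model.predict("Monthly annuity income payout schedule"))    # → life_annuity
```

Add-one smoothing means a word seen in only one class still gets a small probability in the other class, so a single unusual term cannot veto a label outright.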
Ditch the trash
The “keep everything just in case” approach described above worsened the digital landfill effect. It also meant that a lot of information that could (and often should) have been destroyed was not. Beyond cutting the cost of storing this content ad infinitum, AI can help mitigate the legal issues that arise from keeping information longer than you need to.
Part of the challenge of managing records, or even simply applying retention policies, is the sheer volume of content involved. In the past, the only way to work through it was by hand, document by document.
AI can help with this. By combining AI classification of content with a CSP, you can determine quickly, and at massive scale, what is a record and, equally important, what is NOT. Many organizations continue to store content that is ROT: redundant, obsolete, or trivial. Clearing out the ROT makes it easier to identify the relevant content to which retention policies apply. AI can then identify the types of content that remain, match them to the relevant retention rules, and make recommendations to the appropriate staff members. This greatly clarifies the whole process of identifying, declaring, and managing records or other information that must be retained for compliance purposes. The approach is also more scalable and more cost-effective, because the storage needed for old content shrinks dramatically.
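The triage described above might look roughly like the following Python sketch. Everything here is a hypothetical assumption for illustration:

- the document types, which are assumed to come from an upstream AI classifier;
- the ROT heuristics: a duplicate content hash means redundant, an explicit "superseded" flag means obsolete, and a tiny size means trivial;
- the retention schedule;
- starting the retention clock at the last-modified date, whereas real schedules often start from an event such as contract expiry.

In line with the article, the output is a list of recommendations for staff to review, not automatic deletions.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set

@dataclass
class Item:
    doc_id: str
    doc_type: str             # assumed to come from an upstream AI classifier
    content: bytes
    last_modified: date
    superseded: bool = False  # assumed flag: a newer version exists

# Hypothetical retention schedule: document type -> years to keep.
RETENTION_YEARS: Dict[str, int] = {"contract": 7, "invoice": 6, "correspondence": 2}
TRIVIAL_BELOW_BYTES = 16  # assumed threshold for "trivial" content

def rot_reason(item: Item, seen_hashes: Set[str]) -> Optional[str]:
    """Return why an item counts as ROT, or None if it does not."""
    digest = hashlib.sha256(item.content).hexdigest()
    if digest in seen_hashes:
        return "redundant"  # exact duplicate of an earlier item
    seen_hashes.add(digest)
    if item.superseded:
        return "obsolete"
    if len(item.content) < TRIVIAL_BELOW_BYTES:
        return "trivial"
    return None

def recommend(items: List[Item], today: date) -> List[str]:
    """Produce recommendations for staff to review; nothing is deleted here."""
    seen_hashes: Set[str] = set()
    recs: List[str] = []
    for item in items:
        reason = rot_reason(item, seen_hashes)
        if reason:
            recs.append(f"{item.doc_id}: review for disposal ({reason})")
            continue
        years = RETENTION_YEARS.get(item.doc_type)
        age_days = (today - item.last_modified).days
        if years is None:
            recs.append(f"{item.doc_id}: no retention rule for '{item.doc_type}', "
                        "route to records manager")
        elif age_days > years * 365:  # approximate years as 365 days
            recs.append(f"{item.doc_id}: retention period ({years}y) elapsed, "
                        "review for disposal")
        else:
            recs.append(f"{item.doc_id}: declare as record, retain {years}y")
    return recs

today = date(2019, 6, 1)
items = [
    Item("A", "contract", b"Master services agreement v2 ...", date(2015, 3, 1)),
    Item("B", "contract", b"Master services agreement v2 ...", date(2015, 3, 2)),
    Item("C", "invoice", b"Invoice 2009-114, net 30 terms", date(2010, 1, 15)),
    Item("D", "correspondence", b"ok, thx", date(2018, 11, 5)),
    Item("E", "memo", b"Draft memo on office move", date(2018, 1, 1), superseded=True),
    Item("F", "memo", b"Board memo on 2019 budget plan", date(2019, 2, 1)),
]
recs = recommend(items, today)
for line in recs:
    print(line)
```

Filtering ROT before applying retention rules keeps the rule-matching step focused on content that might genuinely be a record. Anything without a matching rule is routed to a person rather than guessed at.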
2019 is the year that AI helps organizations clean out their digital landfill. Whoever thought that sorting the trash out could be such a rewarding exercise?
David Jones is an information management professional with more than 20 years’ experience in the emerging technologies space across multiple industries, including big data, analytics, cloud, and enterprise content management. As VP of Marketing at AODocs, David is responsible for developing the global go-to-market strategy and execution plan for AODocs’ modern, intelligent Content Services Platform.