What is Intelligent Document Processing? Why IDP Matters in Business


Paperwork is the lifeblood of many organizations. According to one source, 15% of a company’s revenue is spent creating, managing and distributing paper documents. But documents are not only expensive, they are also time-consuming and error-prone. More than nine in 10 employees responding to an ABBYY survey in 2021 said they waste up to eight hours a week sifting through documents to find data, and that creating a new document the traditional way takes an average of three hours and results in six punctuation, spelling, omission or printing errors.

Intelligent Document Processing (IDP) is touted as a solution to the problem of file management and orchestration. IDP combines technologies such as computer vision, optical character recognition (OCR), machine learning and natural language processing to scan paper and electronic documents and extract data from them – as well as analyze them. For example, IDP can validate information in files such as invoices by cross-referencing them with databases, lexicons, and other digital data sources. The technology can also sort documents into different storage compartments to keep them up to date and better organized.
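As a rough illustration of the cross-referencing described above, the sketch below checks fields extracted from an invoice against a small reference table. The `Invoice` fields, the `KNOWN_VENDORS` lookup and the three rules are all hypothetical stand-ins, not the behavior of any particular IDP product.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_number: str
    vendor_name: str
    total: float


# Hypothetical reference data standing in for a real vendor database.
KNOWN_VENDORS = {"acme supplies": "V-001", "globex corp": "V-002"}


def validate_invoice(invoice, known_vendors=KNOWN_VENDORS):
    """Return a list of problems found; an empty list means the invoice passed."""
    problems = []
    # Cross-reference the extracted vendor name against the reference table.
    if invoice.vendor_name.strip().lower() not in known_vendors:
        problems.append("unknown vendor")
    if not invoice.invoice_number.strip():
        problems.append("missing invoice number")
    if invoice.total <= 0:
        problems.append("non-positive total")
    return problems


print(validate_invoice(Invoice("INV-1", "Acme Supplies ", 120.0)))
print(validate_invoice(Invoice("", "Unknown LLC", -5.0)))
```

A production system would presumably query a live database and use fuzzy matching on names; the exact-match lookup here only keeps the idea visible.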

Because of IDP’s potential to reduce costs and free up employees for more meaningful work, interest in it is on the rise. According to KBV Research, the IDP solutions market could reach $4.1 billion by 2027, growing at a compound annual growth rate of 29.2% from 2021.

Process documents with AI

Paper documents abound in all industries and businesses, no matter how strongly the industry or business has embraced digitization. Whether for compliance, governance, or organizational reasons, businesses use files for things like order tracking, records, purchase orders, statements, maintenance logs, onboarding employees, complaints, proof of delivery, etc.

A 2016 Wakefield study found that 73% of “owners and decision makers” at companies with fewer than 500 employees print at least four times a day. As Randy Dazo, group director at InfoTrends, explained to CIO in a recent article, employees use printing and scanning both for ad hoc business processes (for example, because it is faster to scan a receipt on the spot) and for “transactional” business processes (as part of a daily workflow in human resources, accounting and legal departments).

Adopting digitization alone cannot solve all processing bottlenecks. In a 2021 study published by PandaDoc, over 90% of companies using digital files still found business proposals and HR documents difficult to create.

The answer – or at least part of the answer – lies in IDP. IDP automates the processing of data in documents, which involves understanding what the document is about and the information it contains, extracting that information, and sending it to the right place.

IDP platforms start by capturing data, often from multiple document types. The next step is recognizing and classifying things like form fields, customer and company names, phone numbers, and signatures. Finally, the IDP platform validates and verifies the data (through rules, humans in the loop, or both) before integrating it into a target system, such as customer relationship management software or enterprise resource planning software.
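The capture, recognize, validate and integrate steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the regular expressions, the field names, and the `target_system` and `review_queue` lists are hypothetical placeholders for real extraction models, a real CRM or ERP system, and a real human review queue.

```python
import re

# Hypothetical patterns; real platforms typically rely on trained models.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
INVOICE_RE = re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)


def extract_fields(text):
    """Recognize and label fields in already-captured document text."""
    phone = PHONE_RE.search(text)
    invoice = INVOICE_RE.search(text)
    return {
        "phone": phone.group(0) if phone else None,
        "invoice_number": invoice.group(1) if invoice else None,
    }


def validate(fields):
    """Rule-based check: report the names of fields that were not found."""
    return [name for name, value in fields.items() if value is None]


def process(text, target_system, review_queue):
    """Send clean records to the target system, the rest to a human reviewer."""
    fields = extract_fields(text)
    missing = validate(fields)
    if missing:
        review_queue.append({"fields": fields, "missing": missing})
        return "needs_review"
    target_system.append(fields)
    return "integrated"


crm, queue = [], []
print(process("Invoice No: INV-2041\nCall (555) 123-4567", crm, queue))
print(process("Phone 555.123.4567", crm, queue))
```

Routing incomplete records to a queue rather than rejecting them reflects the "rules, humans in the loop, or both" validation step: the rules catch what they can, and people handle the remainder.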

OCR and handwritten text recognition are two ways IDP recognizes data in documents. Technologies that have been around for decades, OCR and handwritten text recognition attempt to capture key features of text, glyphs, and images, such as global features that describe the text as a whole and local features that describe individual characters of the text (such as symmetry in the letters).
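To make the idea of a local feature concrete, the sketch below scores how symmetric a single character is about its vertical axis. The tiny text bitmaps and the `mirror_symmetry` function are illustrative assumptions; production OCR engines use far richer features, and this is not how any specific engine works.

```python
def mirror_symmetry(glyph):
    """Fraction of pixels that match their left-right mirror (1.0 = fully symmetric)."""
    total = 0
    matches = 0
    for row in glyph:
        for col, pixel in enumerate(row):
            total += 1
            # Compare each pixel with the one mirrored across the vertical axis.
            if pixel == row[len(row) - 1 - col]:
                matches += 1
    return matches / total if total else 0.0


# Hypothetical 5x5 bitmaps: '#' is ink, '.' is background.
LETTER_A = [
    "..#..",
    ".#.#.",
    "#####",
    "#...#",
    "#...#",
]
LETTER_L = [
    "#....",
    "#....",
    "#....",
    "#....",
    "#####",
]

print(mirror_symmetry(LETTER_A))
print(mirror_symmetry(LETTER_L))
```

A symmetric letter such as "A" scores higher than an asymmetric one such as "L", which is the kind of signal a classifier could combine with other features to tell characters apart.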

When it comes to recognizing images or image content, computer vision comes into play. Computer vision algorithms are “trained” to recognize patterns by “looking” at collections of data and learning, over time, the relationships between data elements. For example, a basic computer vision algorithm can learn to distinguish cats from dogs by ingesting large databases of images of cats and dogs captioned “cat” and “dog” respectively.

OCR, handwritten text recognition and computer vision are not flawless. In particular, computer vision is susceptible to biases that can affect its accuracy. But the relative predictability of documents (for example, invoices and barcodes follow a certain format) allows them to work well in IDP.

Other algorithms handle post-processing steps such as brightening files and removing artifacts like ink smudges and blots. As for text comprehension, it generally falls under natural language processing (NLP). Like computer vision systems, NLP systems improve their understanding of text by examining many examples. Examples come in the form of documents in training datasets, which contain terabytes to petabytes of data fetched from social media, Wikipedia, books, software hosting platforms like GitHub, and other public web sources.

NLP-based document processing can allow employees to search for key text in documents or highlight trends and changes in documents over time. Depending on how the technology is implemented, an IDP platform can bundle onboarding forms into a folder or automatically paste salary information into relevant tax PDFs.

The final stages of IDP may involve robotic process automation (RPA), a technology that automates tasks traditionally performed by a human using software robots that interact with business systems. These AI-powered bots can handle a slew of tasks, from moving files between databases to copying text from a document, pasting it into an email, and sending the message.

With RPA, a company could, for example, automate the creation of reports by having a software robot pull data from various processed documents. Or it could eliminate duplicate entries in spreadsheets across various file formats and programs.
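The duplicate-removal task can be approximated with a short script. This is only a sketch: it assumes the spreadsheets have already been exported to CSV text, and the `dedupe_rows` helper and its `key_fields` parameter are hypothetical names. A real RPA bot would typically drive the spreadsheet applications themselves.

```python
import csv
import io


def dedupe_rows(csv_texts, key_fields):
    """Merge rows from several CSV sources, keeping the first row seen per key."""
    seen = set()
    unique = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Normalize case and whitespace so near-identical entries collide.
            key = tuple(row[field].strip().lower() for field in key_fields)
            if key not in seen:
                seen.add(key)
                unique.append(row)
    return unique


sheet_a = "name,email\nAda,ada@example.com\nBob,bob@example.com\n"
sheet_b = "name,email\nada,ADA@example.com\nCy,cy@example.com\n"

for row in dedupe_rows([sheet_a, sheet_b], ["email"]):
    print(row["name"], row["email"])
```

Keying on a normalized email address is one design choice among many. Which fields identify a duplicate depends on the data, so the key is passed in rather than hard-coded.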

Growing IDP Platforms

Attracted by the huge addressable market, a growing number of vendors are offering IDP solutions. While they don’t all take the same approach, they share the goal of abstracting away filing work that would otherwise be done by a human.

For example, Rossum provides an IDP platform that extracts data while making corrections through what it calls “spatial OCR (optical character recognition)”. The platform basically learns to recognize different structures and patterns of different documents, such as the fact that an invoice number may be at the top left of one invoice but elsewhere in another.

Another IDP provider, Zuva, focuses on contract and document review, offering out-of-the-box trained models that can extract data points and present them as questions and answers. M-Files applies algorithms to document metadata to create structure, unifying the categories and keywords used within a company. Meanwhile, Indico ingests documents and does post-processing with models that can classify and compare text as well as detect sentiment and phrases.

Among the tech giants, Microsoft uses IDP to pull knowledge from paying organizations’ emails, messages, and documents into a knowledge base. Amazon Web Services’ Textract service can recognize scans, PDFs, and photos and forward all extracted data to other systems. For its part, Google hosts DocAI, a collection of AI-powered document analyzers and tools available through an API.

How IDP makes the difference

According to IDC, 42% of knowledge workers say paper-based workflows make their daily tasks less efficient, more costly, and less productive. And Foxit Software reports that more than two-thirds of companies admit their need for paperless office processes has increased during the pandemic.

The benefits of IDP cannot be overstated. But its implementation is not always easy. As KPMG analysts point out in a report, companies run the risk of failing to define a clear strategy or achievable business goal, failing to keep humans in the loop, and misjudging what IDP technology can do. Companies that operate in highly regulated industries may also need to take additional security measures or precautions when using IDP platforms.

Yet the technology promises to transform the way companies do business, while saving money in the process. “Semi-structured and unstructured documents can now be automated faster and with greater accuracy, leading to happier customers,” writes Lewis Walker of Deloitte. “As business leaders evolve to gain competitive advantage in the age of automation, they will need to unlock higher value opportunities by processing documents more efficiently and turning that information into deeper insights faster than ever.”
