The Emerging Technology Frontier

Five Steps to Digitization

Bringing it all together

An orchestrator acts as the framework for the system: it links the five independent, modular components together, ensuring that they communicate and provide optimal outputs.
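To make the orchestrator idea concrete, the following is a minimal sketch, not Fractal's implementation. It assumes each module can be modelled as a function that takes a payload dictionary and returns an updated one; the function names (`orchestrate`, `consolidate`, `ocr`, `extract`, `validate`, `report`) and their placeholder bodies are hypothetical.

```python
from typing import Callable, Dict, List

Payload = Dict[str, object]
Stage = Callable[[Payload], Payload]


def orchestrate(payload: Payload, stages: List[Stage]) -> Payload:
    """Run each stage in order and record which stages completed."""
    trace = []
    for stage in stages:
        payload = stage(payload)
        trace.append(stage.__name__)
    return {**payload, "trace": trace}


# Hypothetical placeholder stages standing in for the five modules.
def consolidate(p):
    # Drop duplicate file names.
    return {**p, "files": sorted(set(p["files"]))}


def ocr(p):
    # Pretend OCR: produce some text per file.
    return {**p, "text": " ".join(f"text of {f}" for f in p["files"])}


def extract(p):
    # Pretend extraction: derive one field from the text.
    return {**p, "fields": {"word_count": len(p["text"].split())}}


def validate(p):
    # Pretend validation: a single rule on the extracted field.
    return {**p, "valid": p["fields"]["word_count"] > 0}


def report(p):
    # Summarize the run for downstream consumption.
    return {**p, "summary": f"{len(p['files'])} file(s), valid={p['valid']}"}


result = orchestrate(
    {"files": ["a.pdf", "a.pdf", "b.pdf"]},
    [consolidate, ocr, extract, validate, report],
)
```

Because the orchestrator only depends on the shared payload shape, any stage can be swapped or retrained without touching the others, which is the modularity the text describes.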

Five main components are necessary for document digitization. These components make up the microservice architecture of Fractal’s Doc. Digit solution and interact with each other as needed to streamline end-to-end document digitization for different business processes.

Module 1 : Consolidation

The first component consolidates data from various sources (emails, chats, and shared locations) into a single source of truth, eliminating duplicate or repetitive documents. A centralized location for all documents ensures that the digitization process runs smoothly.
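One simple way to eliminate duplicates during consolidation is to fingerprint each document's content. The sketch below is an illustration under stated assumptions, not the method used in Doc. Digit: the function name `consolidate_sources` and the input shape (a mapping from source name to `(file name, bytes)` pairs) are hypothetical. It catches only byte-identical copies; near-duplicates would need a fuzzier comparison.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def consolidate_sources(
    sources: Dict[str, Iterable[Tuple[str, bytes]]]
) -> List[Dict[str, str]]:
    """Merge documents from all sources, keeping the first copy of each distinct content."""
    seen = set()
    store = []
    for source, docs in sources.items():
        for name, content in docs:
            # SHA-256 of the raw bytes acts as the document fingerprint.
            digest = hashlib.sha256(content).hexdigest()
            if digest in seen:
                continue  # identical content already stored
            seen.add(digest)
            store.append({"source": source, "name": name, "sha256": digest})
    return store


inbox = {
    "email": [("invoice.pdf", b"invoice body")],
    "chat": [("invoice_copy.pdf", b"invoice body"), ("po.pdf", b"purchase order")],
}
central_store = consolidate_sources(inbox)
```

Here the copy shared over chat is dropped because its content matches the emailed invoice, leaving two entries in the central store.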

Today’s biggest challenge

In addition to common issues like data protection, computing, and power costs, document digitization faces two significant challenges: handwritten and multilingual documents.

The first challenge is the recognition of handwritten documents. Although OCR technology is used to extract such information, recognizing handwritten text accurately remains difficult. To solve this problem, Fractal is investigating the application of the intelligent character recognition (ICR) framework, which uses convolutional neural network (CNN) models to determine the most probable characters or words in handwritten text.

The second challenge is the digitization of multilingual documents. While it is relatively straightforward to digitize templated documents such as invoices, documents that require an accurate interpretation of context, such as legal contracts, are proving to be much more challenging. We are assessing the potential of different approaches to decipher multilingual documents, ranging from transformer-based models to open-language frameworks that can be tailored for contextual understanding. Although positive steps have been taken to develop solutions, they are still being tested in controlled environments and are not yet mature.

Fractal & the future of digitization

Our solution for document management has been developed through the collaboration of technical and business teams, focusing on a solution that produces results closer to the business’s specific needs, even if the output is not 100% accurate. This has allowed us to find the sweet spot between accuracy levels from a technical perspective and business validation rules regarding specific information that needs to be extracted. This framework also goes beyond just digitizing and storing information. The envisioned solution is about organizing documents and data and using them to support business operations. In other words, the end goal is more than simply providing structured data: we want to help organizations with their functions, and the applications for this are exciting, vast, and wide-reaching.

Module 2 : IVA OCR (Fractal Image Processing Engine)

The next step is to translate scanned copies of the physical documents into unstructured text. The Fractal IVA platform’s customized optical character recognition (OCR) algorithms offer superior extraction rates, making the digitization process more efficient and accurate. The output is a body of unstructured text containing all content from the original document, including text representations of non-text elements such as nested tables and embedded JPG and PNG files.

Module 3 : dCrypt (NLP engine)

The third module, dCrypt, is an NLP suite and accelerator for post-OCR data preparation that extracts relevant information from the unstructured text corpus. This module is the core component that allows a high level of customization to address different types of documents and business requirements.

Each module in Doc. Digit draws upon the previous module for input but operates independently, providing flexibility in component usage. Modules come with pre-trained and configured components that can be retrained or tweaked based on specific client requirements.

Module 4 : Validation engine

The next step is to pass the extracted information through the validation engine, which checks it against simple predetermined rules based on business processes and document standards, such as a character limit for the invoice number. All documents with issues are returned to the submitter for resolution.
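The rule-based check described for the validation engine can be sketched as a list of named predicates applied to the extracted fields. This is a minimal illustration, not the engine's actual rule set: the field name `invoice_number`, the 12-character limit, and the function `failed_rules` are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]
Rule = Tuple[str, Callable[[Fields], bool]]

# Assumed limit for illustration only; real limits come from business rules.
MAX_INVOICE_NUMBER_LENGTH = 12

RULES: List[Rule] = [
    ("invoice_number is present",
     lambda f: bool(f.get("invoice_number"))),
    ("invoice_number within character limit",
     lambda f: len(f.get("invoice_number", "")) <= MAX_INVOICE_NUMBER_LENGTH),
]


def failed_rules(fields: Fields) -> List[str]:
    """Return the description of every rule the extracted fields fail."""
    return [description for description, check in RULES if not check(fields)]


ok = failed_rules({"invoice_number": "INV-0001"})
too_long = failed_rules({"invoice_number": "INV-2023-000000001"})
```

An empty list means the document passes; a non-empty list gives the reasons that would accompany a document returned to the submitter.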

Module 5 : Reporting / Consumption

Finally, data is summarized and prepared for consumption through dashboards, integrated into other applications, or even sent directly to customers (e.g., a notification that their ticket has been actioned).


© 2023 Fractal Analytics Inc. All rights reserved
