Document extraction with AI and intelligent document processing (IDP) automates the classification, extraction, and processing of unstructured and semi-structured documents. It combines OCR, computer vision, natural language processing (NLP), and machine learning to turn those documents into accurate, structured information.
In 2016, the Financial Accounting Standards Board (FASB) issued ASC 842 to replace ASC 840. This change introduced a new set of rules for how organizations report leases on their balance sheets. The new standard covers leases for warehouses, vehicles, phones, and other equipment, and it represents the most significant update to lease accounting since 1976.
ASC 842 divides leases into operating leases and finance leases (previously called capital leases). Under ASC 840, only capital leases were listed on the balance sheet; under ASC 842, operating leases must be recognized there as well. As a result, many companies that were never required to keep a centralized record of their leases now need one. Lease agreements are often hard to track and classify because they are buried in other contracts.
The new FASB requirements impose an additional burden on all US companies, requiring more resources to complete the work. Tackling this complex issue is crucial, as companies could face debt covenant failures and future financing risks.
To comply with the new FASB rules, the fast-food franchise mentioned in the introduction must extract more than 350 fields from lease-related documents every week and enter the data into its lease management system. The challenge is compounded by commercial leasing agents submitting lease documents of varying quality, formats (such as PDFs, images, and email attachments), and languages. The documents also contain handwriting, checkboxes, and tables, making them difficult to process uniformly.
After several attempts with different advisory firms and technologies, the client was eager to find a resolution. In the interest of time, Ashling repurposed an array of technical components that were previously built for other solutions.
While trialing several technology combinations across five different approaches, the team discovered that one particular mix of technologies yielded a superior result. The combo? SS&C Blue Prism, ABBYY Vantage, and GPT-4 Turbo.
ABBYY Vantage, an IDP solution, categorized the different agreement types and extracted segments of each agreement (such as Preamble, Premises, and Renewal Options). GPT-4 Turbo then provided accurate field-level extraction, especially when fed a specific segment of the agreement rather than the whole document.
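The segment-then-extract idea can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the field lists in `SEGMENT_FIELDS`, the title-matching `split_into_segments` (a toy stand-in for the IDP segmentation step, not ABBYY Vantage's API), and the `llm` callable, which represents whatever function sends a prompt to the model and returns its JSON reply.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical field lists per segment; the real solution covers 350+ fields.
SEGMENT_FIELDS: Dict[str, List[str]] = {
    "Preamble": ["landlord_name", "tenant_name", "effective_date"],
    "Premises": ["property_address", "square_footage"],
    "Renewal Options": ["renewal_term_years", "notice_period_days"],
}


def split_into_segments(text: str) -> Dict[str, str]:
    """Toy stand-in for IDP segmentation: start a new segment at each known title line."""
    collected: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        title = line.strip().rstrip(":")
        if title in SEGMENT_FIELDS:
            current = title
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in collected.items()}


def build_prompt(segment_name: str, segment_text: str, fields: List[str]) -> str:
    """Ask only for the fields that belong to this segment, as a JSON object."""
    return (
        f"From the '{segment_name}' section of a lease below, return a JSON object "
        f"with exactly these keys: {', '.join(fields)}. Use null when a value is absent.\n\n"
        f"{segment_text}"
    )


def extract_fields(text: str, llm: Callable[[str], str]) -> Dict[str, Dict[str, object]]:
    """Run one model call per segment and keep only the expected keys."""
    results: Dict[str, Dict[str, object]] = {}
    for name, body in split_into_segments(text).items():
        fields = SEGMENT_FIELDS[name]
        answer = json.loads(llm(build_prompt(name, body, fields)))
        results[name] = {field: answer.get(field) for field in fields}
    return results
```

Passing the model call in as `llm` keeps the sketch independent of any particular client library, and sending one short segment per prompt mirrors the observation that extraction was more accurate on a specific segment than on a full agreement.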
GPT-4 Turbo and ABBYY Vantage created a consolidated user experience, providing results before any field-level training. This approach, which leverages IDP for classification and segmentation, GenAI for field extraction, and RPA to automate data extraction and entry, achieved an 82% accuracy rate. For the small subset of fields falling outside the configured confidence threshold, ABBYY Vantage’s manual review station let human users correct machine-extracted values. The machine learning model learns from the human-corrected values and improves over time, reducing the manual review effort.
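The confidence-threshold routing described above might look roughly like the following sketch. The `ExtractedField` type, the function names, and the 0.8 threshold are all hypothetical; the source does not state the configured threshold, and this is not ABBYY Vantage's review API. The retraining step that learns from corrections is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def route_fields(
    fields: List[ExtractedField], threshold: float = 0.8  # assumed value
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into auto-accepted ones and ones queued for human review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


def apply_corrections(
    needs_review: List[ExtractedField], corrections: Dict[str, str]
) -> List[ExtractedField]:
    """Overwrite reviewed values with the human-entered ones and mark them as verified."""
    corrected = []
    for field in needs_review:
        if field.name in corrections:
            corrected.append(replace(field, value=corrections[field.name], confidence=1.0))
        else:
            corrected.append(field)  # not yet reviewed; left unchanged
    return corrected
```

In a pipeline like the one described, the accepted fields would go straight to the RPA step for data entry, while the corrected fields would both be entered and kept as training examples.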