Looking At OCR Use Cases In Mortgage Lending

With the cost of processing each mortgage continuing to rise, lenders must leverage automation to improve profitability and consistency in their business processes. With the right Advanced Mortgage OCR solution, mortgage companies have been able to reduce manual document indexing and data entry, enabling them to process more loans per day at a lower cost per loan, yielding a leaner process and increased profit margins.

Advanced OCR, More Than Just Reading Characters

An Advanced Mortgage OCR solution needs to do more than just convert document images to text. Once the images are converted, the solution should be able to interpret that text using Semantic Analysis and artificial intelligence (“AI”) rules engines, in much the same way a human being would process the content. Based on these results, documents can be automatically indexed and relevant data points extracted. This information is then passed to downstream applications for appropriate routing and archival.
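
The index-then-extract flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists, field names, and regular expressions below are hypothetical, and simple phrase matching stands in for the far richer semantic analysis a production rules engine would apply.

```python
import re

# Hypothetical keyword rules: document type -> phrases that suggest it.
CLASSIFICATION_RULES = {
    "Closing Disclosure": ["closing disclosure", "cash to close"],
    "Loan Estimate": ["loan estimate", "estimated closing costs"],
    "W-2": ["wage and tax statement"],
}

# Hypothetical extraction rules: field name -> regex with one capture group.
EXTRACTION_RULES = {
    "Closing Disclosure": {
        "loan_amount": r"loan amount\s*\$?([\d,]+(?:\.\d{2})?)",
        "interest_rate": r"interest rate\s*([\d.]+)\s*%",
    },
}


def classify(page_text):
    """Return the document type whose phrases match most often, or None."""
    text = page_text.lower()
    scores = {
        doc_type: sum(phrase in text for phrase in phrases)
        for doc_type, phrases in CLASSIFICATION_RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_fields(doc_type, page_text):
    """Apply the extraction rules for doc_type to the full page text."""
    fields = {}
    for name, pattern in EXTRACTION_RULES.get(doc_type, {}).items():
        match = re.search(pattern, page_text, flags=re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
    return fields


# Text as it might come back from a full-page OCR pass (made-up sample).
sample = (
    "Closing Disclosure\n"
    "Loan Amount $250,000\n"
    "Interest Rate 4.25 %\n"
    "Cash to Close $12,300"
)
doc_type = classify(sample)
print(doc_type, extract_fields(doc_type, sample))
```

Because the rules search the whole page text rather than fixed coordinates, the same rules keep working when a field moves to a different position on the page.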

A Technology Vendor with a Unique Approach

For today’s most advanced OCR solution, the process begins with a full-page OCR scan of each image, typically completed in less than one second per page. This extremely high speed is critical, yet difficult for many vendors to achieve. It is this performance that allows every word on the page to be included in the scope of the AI rules engine analysis, just as a human being would interpret the content. The content evaluation process is unique in combining this speed with the ability to include all page content in the evaluation scope, which makes it extremely flexible with documents of varying layout (for example, bank statements).

OCR in Action: Use Cases from Leading Lenders

>>TRID Capture and Audit

The ideal OCR solution provides a rigorous tool for a comprehensive review of each TRID transaction. Typically, during the origination process there are several iterations of both a Loan Estimate and a Closing Disclosure. The most efficient TRID Audit solution is able to extract every data element from all initial and re-disclosed Loan Estimates and Closing Disclosures. The system can be configured to either output all of the data from each document iteration, or output just the differences found from the prior document. Output formats should include MISMO v3.3 or custom XML schemas. 

In the case where a loan origination system is generating the TRID disclosures, this differential reporting may be something produced by the LOS itself. However, in the correspondent lending channel, or in the case of a split, “borrower-only” and “seller-only” Closing Disclosure transaction, this Advanced OCR solution closes a gap that the LOS is unable to address. 

In these cases where the lender’s LOS does not generate all iterations of the Closing Disclosure and Loan Estimate, a solution is needed that can natively read PDF or scanned TIFF versions of these documents. This type of TRID Audit solution has been developed and tested to support any layout of these documents from any source.
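
The choice between outputting every iteration in full and outputting only the differences from the prior document can be illustrated with a short Python sketch. The field names and values are made up for illustration, and the dictionaries stand in for data already extracted from each Loan Estimate or Closing Disclosure; the source does not describe how any vendor implements this step.

```python
def diff_iterations(previous, current):
    """Return {field: (old, new)} for fields added, removed, or changed."""
    changes = {}
    for field in sorted(set(previous) | set(current)):
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes


def differential_report(iterations):
    """Report the first iteration in full and each later one as changes only."""
    if not iterations:
        return []
    report = [{"iteration": 1, "data": dict(iterations[0])}]
    for index in range(1, len(iterations)):
        changes = diff_iterations(iterations[index - 1], iterations[index])
        report.append({"iteration": index + 1, "changes": changes})
    return report


# Hypothetical extracted data from an initial and a re-disclosed Loan Estimate.
loan_estimates = [
    {"loan_amount": "250000.00", "interest_rate": "4.250", "origination_charges": "1800.00"},
    {"loan_amount": "250000.00", "interest_rate": "4.125", "origination_charges": "1950.00"},
]

report = differential_report(loan_estimates)
for entry in report:
    print(entry)
```

A field that appears in only one iteration is reported with `None` on the missing side, so additions and removals show up alongside changed values.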

>>UCD Creation and Audit

The Uniform Closing Dataset (UCD) provides a common industry dataset to support the Consumer Financial Protection Bureau’s (CFPB) Closing Disclosure and its ability to be communicated electronically. 

Loans closed on or after September 25, 2017 that are acquired by the GSEs are required to have a UCD XML file and, after June 25, 2018, an embedded PDF of the associated Borrower Closing Disclosure.

Over time the UCD is intended to provide the following benefits:

A. Greater data consistency by promoting better and more efficient data integration and exchange between business partners.

B. A common understanding, as all parties use a consistent approach and language to describe the information on the Closing Disclosure.

C. Improved data accuracy by eliminating the need for proprietary formats that can be costly to maintain and can lead to misinterpretation of the data.

The GSEs are collecting UCD data because it:

A. Helps enhance credit risk management with more data and better quality data.

B. Provides important information to help increase their ability to detect fraud and misrepresentation at loan delivery.

C. Provides additional transparency into the mortgage loan transaction file to help assess whether the loan, as closed, meets the GSEs’ eligibility requirements.

According to the GSEs, a PDF of the Closing Disclosure needs to be embedded in the UCD because, “The Borrower Closing Disclosure is the definitive record of the fees, charges, and adjustments that occurred in the loan transaction. As such, it is used to validate that the information provided in the UCD submission is complete and accurate.”

July 2018 UPDATE: As the new requirement to embed a PDF of the Borrower’s Closing Disclosure began to roll out, leading solution providers engineered an audit that statistically measures the agreement between the data found on the embedded PDF and the MISMO XML data found in the UCD.

The right solution provides the tools to determine if the data on the embedded PDF Closing Disclosure source document actually matches the same data within the UCD XML file. While this capability is certainly valuable to GSE entities, it is also possible to use this audit for other loan transfers.  As part of a due diligence process, investors may use this capability to verify that a set of loans to be purchased is as advertised and all critical metadata provided is accurate.
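
A minimal sketch of such a PDF-to-XML comparison follows. It assumes the values have already been captured from the embedded Closing Disclosure by OCR, and it uses a deliberately simplified stand-in XML document: the element names below are illustrative only and are not the actual MISMO/UCD schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Simplified stand-in XML. These element names are hypothetical and are
# NOT the real MISMO/UCD schema.
UCD_XML = """
<Loan>
  <LoanAmount>250000.00</LoanAmount>
  <InterestRate>4.125</InterestRate>
  <CashToClose>12300.00</CashToClose>
</Loan>
"""


def normalize(value):
    """Compare numerically when possible, so '$250,000' equals '250000.00'."""
    cleaned = value.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return cleaned.lower()


def audit(pdf_fields, xml_text):
    """Return (match_rate, mismatches) between OCR-captured and XML values."""
    root = ET.fromstring(xml_text)
    xml_fields = {child.tag: (child.text or "") for child in root}
    all_names = sorted(set(pdf_fields) | set(xml_fields))
    mismatches = {}
    for name in all_names:
        pdf_value, xml_value = pdf_fields.get(name), xml_fields.get(name)
        if (pdf_value is None or xml_value is None
                or normalize(pdf_value) != normalize(xml_value)):
            mismatches[name] = (pdf_value, xml_value)
    total = len(all_names)
    match_rate = (total - len(mismatches)) / total if total else 1.0
    return match_rate, mismatches


# Hypothetical values captured from the embedded PDF.
pdf_fields = {"LoanAmount": "$250,000", "InterestRate": "4.125", "CashToClose": "12,800.00"}
match_rate, mismatches = audit(pdf_fields, UCD_XML)
print(f"match rate: {match_rate:.1%}", mismatches)
```

Normalizing both sides before comparing keeps formatting differences (currency symbols, thousands separators, trailing zeros) from being reported as data errors, so the match rate reflects genuine disagreements only.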

>>HMDA Audit

In order to promote compliance with federal consumer protection laws, lenders are required to submit certain borrower demographic data to the federal government. HMDA (Home Mortgage Disclosure Act) disclosures provide the public with information on the home mortgage lending activities of most lenders.

One of the challenges for a lender in reporting HMDA data is ensuring that the documents from which data is pulled are, in fact, the final versions. Many errors in HMDA reporting result from reporting data taken from a non-final source document.

The most advanced OCR solution for HMDA Audits searches through an image archive for every version of every document relevant to the HMDA reporting process and automatically determines the final versions. Data is then automatically captured from these final documents via their AI data extraction rules and coalesced into an XML file or spreadsheet to be used for reporting. 
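
The version-selection step can be sketched as follows. The source does not say how final versions are determined, so this example makes a simplifying assumption: the version with the latest document date is treated as final. A real rule set would likely weigh other evidence as well. The record layout and image IDs are hypothetical.

```python
from datetime import date


def final_versions(documents):
    """Keep, for each document type, the record with the latest document date."""
    latest = {}
    for doc in documents:
        current = latest.get(doc["doc_type"])
        if current is None or doc["doc_date"] > current["doc_date"]:
            latest[doc["doc_type"]] = doc
    return latest


# Hypothetical index records for one loan's image archive.
archive = [
    {"doc_type": "Loan Application", "doc_date": date(2018, 1, 5), "image_id": "img-001"},
    {"doc_type": "Loan Application", "doc_date": date(2018, 2, 20), "image_id": "img-014"},
    {"doc_type": "Closing Disclosure", "doc_date": date(2018, 2, 18), "image_id": "img-010"},
    {"doc_type": "Closing Disclosure", "doc_date": date(2018, 2, 25), "image_id": "img-019"},
]

finals = final_versions(archive)
for doc_type, doc in finals.items():
    print(doc_type, doc["image_id"], doc["doc_date"].isoformat())
```

Data extraction would then run only against the selected records, so superseded versions never feed the reporting file.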

This process gives lenders a highly automated method for assuring the accuracy of required Loan Application Register (LAR) reporting data and for maintaining the quality of the database of record for future reporting needs.

What’s New in 2018?

>>OnDemand OCR capture (W2s, Paystubs, and Tax forms)

As the industry continues to look for faster and more efficient ways to capture key data from prospective borrowers, a leading OCR provider has been listening.  Their sub-second speed OCR is the ideal technology platform from which to allow borrowers, loan officers and others to submit supporting loan documentation for quick automated document identification and data field capture.

For example, a user may drag a PDF of their federal IRS Form 1040 income tax return into a browser-based app. The form is identified, all data fields are captured within a short time frame, and the results are immediately available to loan officers and loan origination systems.

>>Necessary but Unique Capabilities

The key capabilities and features of the Paradatec Advanced OCR solution that make these use cases possible are:

A. Sub-second per image full OCR processing

Paradatec’s advanced indexing and data capture technology is at least 10 times faster than others, which allows it to take an approach that others would like to take but cannot because of their system performance. This capability is unique and enables Paradatec to evaluate all text on every page, just as a human can, but much faster.

B. Extreme scalability with a small hardware footprint 

Paradatec’s Advanced OCR solution scales from processing over 1,000,000 images daily on a single eight-core server to tens of millions of images daily by simply enlisting additional cores into the configuration.
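
As a rough back-of-the-envelope check on these figures, the short calculation below works out the per-image time budget they imply. It assumes round-the-clock processing and throughput that scales linearly with core count; neither assumption is stated in the source, and the 20,000,000 target is simply an example of "tens of millions".

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def core_seconds_per_image(images_per_day, cores):
    """Core-seconds available per image if all cores run around the clock."""
    return SECONDS_PER_DAY * cores / images_per_day


def cores_needed(target_images_per_day, images_per_core_per_day):
    """Cores required for a target volume, assuming linear scaling."""
    return math.ceil(target_images_per_day / images_per_core_per_day)


# Figures from the text: 1,000,000 images daily on one eight-core server.
budget = core_seconds_per_image(1_000_000, 8)
per_core_daily = 1_000_000 / 8

# Assumed example target in the "tens of millions" range.
needed = cores_needed(20_000_000, per_core_daily)

print(f"{budget:.4f} core-seconds per image")
print(f"{per_core_daily:,.0f} images per core per day")
print(f"{needed} cores for 20,000,000 images per day")
```

Under these assumptions each image gets roughly 0.69 core-seconds, which is consistent with the sub-second per-image processing described in item A.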

C. Pre-built mortgage OCR library

Over 500 mortgage document types ready to be indexed, and more than 6,000 mortgage loan data fields able to be captured right “out of the box”.  

D. Web services API

Paradatec’s OnDemand OCR feature extends their Advanced OCR capabilities to other applications through seamless integration with a web services API.

E. Document versioning

Documents can be stacked, with like documents consolidated together, to streamline the document versioning process.

F. Bookmarked PDF output

Paradatec’s WritePDF module provides a bookmarked and annotated PDF of the submitted loan package, including a table of contents with links to key data elements within the package.  Clients find this feature invaluable and a significant documentation addition to their inventories of mortgage loans.

Paradatec’s Advanced Mortgage OCR solution is designed to make mortgage lending faster and more accurate. In 2017, Paradatec’s Mortgage OCR solution processed over 1,500,000,000 images (representing over 2,500,000 loans), helping lenders and servicers streamline their origination, onboarding, and compliance obligations by automating document indexing, automating data extraction, meeting tighter service level agreements, and delivering more accurate data much faster than manual data entry alone. In 2018, Paradatec is on track to exceed those volumes and to extend the automation it provides to its lender, servicer, and technology provider clients in the mortgage lending industry.
