Regression analysis isn’t just a statistical tool — it’s a strategic asset for financial institutions (FIs) navigating mounds of mortgage, consumer, and small business lending data and staying ahead of fair lending compliance requirements.
While it’s a game-changer, understanding how and when to apply fair lending regression analysis is crucial to maximizing its potential. Read on to learn everything you need to know to determine if it’s the right tool for your FI and how to integrate it into your fair lending program.
Want more regression analysis insights? Check out our Fair Lending Regression Analysis Primer.
Regression analysis is a statistical method that helps organizations understand how variables relate to one another. Rather than relying on line-by-line comparisons, organizations can evaluate data holistically to uncover patterns and explain disparities.
Regression analysis can be used in fair lending to analyze several factors — such as DTI (Debt-to-Income ratio), LTV (Loan-to-Value ratio), and credit score — to explain disparities and determine areas that need to be explored.
Many FIs must navigate massive, complex data sets. When performed correctly, regression analysis can simplify data testing by flagging disparities that warrant further investigation, which helps mitigate fair lending compliance risk. For example, multiple logistic regression is often used to analyze credit or approval decisions, while multiple linear regression is typically applied to pricing.
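To make the pricing case concrete, here is a minimal sketch of a multiple linear regression of APR on credit factors plus a group indicator. Everything in it is an assumption for illustration: the data are synthetic, the field names (`score_100`, `ltv_10`, `group`) are hypothetical, and a 0.25-point disparity is deliberately built into the simulated APRs so the model has something to find. It uses only the Python standard library; a real review would rely on a statistics package that also reports standard errors and significance tests.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X rows start with 1.0)."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic portfolio (assumption): 1,000 loans with a built-in 0.25-point gap.
random.seed(0)
rows, apr = [], []
for _ in range(1000):
    score_100 = (random.uniform(580, 820) - 700) / 100  # credit score, per 100 points
    ltv_10 = (random.uniform(50, 97) - 80) / 10          # LTV, per 10 percentage points
    group = 1 if random.random() < 0.3 else 0            # hypothetical group indicator
    rate = 6.5 - 0.5 * score_100 + 0.2 * ltv_10 + 0.25 * group + random.gauss(0, 0.2)
    rows.append([1.0, score_100, ltv_10, group])
    apr.append(rate)

intercept, b_score, b_ltv, b_group = ols(rows, apr)
print(f"APR difference for the group, holding score and LTV fixed: {b_group:.3f}")
```

The coefficient on `group` estimates the average APR difference between otherwise similar borrowers. A value near zero suggests the credit factors explain the pricing; a larger value is a signal to investigate further, not a conclusion on its own.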
Underwriting and pricing are two areas of significant risk in lending. From potential biases in the loan underwriting process to unequal loan terms for similar applicants, discriminatory outcomes can spell trouble for lenders.
A lender can use regression analysis to estimate how protected factors (such as race or gender) are statistically related to the likelihood of a given outcome (e.g., approval or denial), controlling for legitimate credit factors.
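The underwriting case can be sketched the same way with a logistic regression of the approval decision. This is a rough illustration under stated assumptions, not a production method: the applications are simulated, the variable names are hypothetical, the simulated decisions depend only on credit score and DTI (no group effect is built in), and the model is fitted with plain gradient descent from the standard library rather than a statistics package.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent (X rows start with 1.0)."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            z = sum(b * v for b, v in zip(beta, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus outcome
            for j, v in enumerate(xi):
                grad[j] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic applications (assumption): approval depends on credit factors only.
random.seed(1)
X, approved = [], []
for _ in range(1000):
    score_z = random.gauss(0, 1)                 # standardized credit score
    dti_z = random.gauss(0, 1)                   # standardized DTI
    group = 1 if random.random() < 0.3 else 0    # hypothetical protected-class indicator
    logit = 0.5 + 1.5 * score_z - 1.0 * dti_z
    p_approve = 1.0 / (1.0 + math.exp(-logit))
    X.append([1.0, score_z, dti_z, group])
    approved.append(1 if random.random() < p_approve else 0)

beta = fit_logistic(X, approved)
odds_ratio_group = math.exp(beta[3])
print(f"Approval odds ratio for the group, controlling for score and DTI: {odds_ratio_group:.2f}")
```

An odds ratio close to 1 means group membership adds little once the credit factors are controlled for. This sketch reports no standard errors, so it cannot say whether a given ratio is statistically significant.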
Here are some other ways regression analysis can help lenders:
The data required for regression analysis depends on an FI’s goals, risk profile, and lending type. While the core fields are often the same, the details shift depending on whether the focus is on underwriting, pricing, or a specific product line.
For most underwriting and loan pricing analyses, key data points include Credit Score, LTV, DTI, Interest Rate, APR, Branch ID, and Loan Term. These variables establish the baseline for how credit and pricing decisions are made across the portfolio.
From there, additional data are layered in depending on the type of lending. Mortgage reviews typically draw on Home Mortgage Disclosure Act Loan/Application Register (HMDA LAR) data, rate type, loan terms, and rate lock information. Auto lending requires vehicle-specific attributes such as age and mileage, along with buy rates and dealer details. Consumer lending analysis often centers on applicant demographics and the factors explicitly used in pricing or underwriting decisions.
Other factors can also shape credit decisions or pricing, often involving more subjective considerations such as special offers or promotional programs, borrower relationships, market-driven pricing differences, or the individual review of credit information.
| Category | Core/Key Data Fields | Additional/Lending-Specific Data | Subjective Factors |
| --- | --- | --- | --- |
| Underwriting & Pricing | Credit Score, Loan-to-Value (LTV), Debt-to-Income (DTI), Interest Rate, APR, Branch ID, Loan Term | N/A | Special offers or promotions, borrower relationships, market-driven pricing differences, and individual review of credit information |
| Mortgage Lending | Core fields above | HMDA LAR data, rate type, loan terms, rate lock information, etc. | Same as above |
| Auto Lending | Core fields above | Vehicle-specific attributes (age, mileage), buy rates, dealer information, etc. | Same as above |
| Consumer Lending | Core fields above | Credit type, rate set date, loan terms, etc. | Same as above |
Regression analysis is an ideal fit for FIs that:
Even with powerful tools like regression analysis, lenders can undermine their own efforts with avoidable mistakes, such as using a dataset that is too small, analyzing inappropriate data, skipping basic fair lending reviews, or relying on poor-quality data.
If your FI can check “yes” next to any of the following statements, adjustments may be necessary before proceeding with regression analysis.
Regression analysis is not robust with small datasets. Larger files yield more accurate and meaningful predictive models. A minimum of 1,000 records is generally recommended, as organizations with fewer records often find regression analysis to be limited.
Regression requires variation in both dependent and independent variables. It will not work effectively if, for example:
Regression relies on clean, accurate data. Rushing into analysis without reviewing data integrity undermines results. At a minimum, ensure that every record includes the price (rate/APR) and the action taken, and that the data contain no invalid values, such as negative credit scores or other obvious errors.
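The sample-size, variation, and data-integrity checks described above can be automated before any model is run. The sketch below is one possible pre-flight check, assuming loan records are held as dictionaries with hypothetical field names (`apr`, `action_taken`, `credit_score`, `dti`, `ltv`). The 1,000-record threshold comes from the guideline above; the function name and the rest are illustrative.

```python
def readiness_issues(records, min_records=1000):
    """Return a list of human-readable problems found in the loan records."""
    issues = []
    # Small files tend to produce limited, unreliable regression results.
    if len(records) < min_records:
        issues.append(f"only {len(records)} records (fewer than {min_records})")
    # Every record needs a price and an action taken.
    for field in ("apr", "action_taken"):
        missing = sum(1 for r in records if r.get(field) is None)
        if missing:
            issues.append(f"{missing} records missing {field}")
    # Regression needs variation in the outcome and in each predictor.
    for field in ("action_taken", "credit_score", "dti", "ltv"):
        values = {r.get(field) for r in records if r.get(field) is not None}
        if len(values) < 2:
            issues.append(f"no variation in {field}")
    # Flag obviously invalid values such as negative credit scores.
    bad_scores = sum(
        1 for r in records
        if r.get("credit_score") is not None and r["credit_score"] < 0
    )
    if bad_scores:
        issues.append(f"{bad_scores} records with negative credit_score")
    return issues

# Tiny hypothetical sample: too few records, one missing APR,
# every application approved, and one negative credit score.
sample = [
    {"apr": 6.1, "action_taken": "approved", "credit_score": 720, "dti": 31, "ltv": 80},
    {"apr": None, "action_taken": "approved", "credit_score": -1, "dti": 44, "ltv": 95},
    {"apr": 7.3, "action_taken": "approved", "credit_score": 655, "dti": 38, "ltv": 90},
]
issues = readiness_issues(sample)
for issue in issues:
    print(issue)
```

An empty list does not prove the data are sound; it only means none of these basic checks failed. Range checks for other fields would need to be added to match each FI's own data.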
Variables included in the model should have a direct relationship with the outcome being tested. For example, in underwriting analysis, including a variable such as lock term would invalidate the results, since lock term should not influence underwriting decisions.