Redesigning Credit Risk Modeling, Chapter 6
Chapter 6: The failure of risk-based pricing
This is where I get to mention that I've been observing how lenders set loan pricing for 30 years. It comes down to two approaches:
1. Meet the market -- look at your peers, recent yield trends, recent loss trends, recent volume trends, and decide to nudge prices up or down a bit.
2. Pricing by Score -- use moving average loss and prepayment rates by risk tier, run a financial projection, and nudge prices up or down a bit.
The group-think of option 1 failed spectacularly in the run-up to the 2007-2009 Great Recession, and it was not because of the house price collapse. The loans were already bad and mispriced. The fall in house prices meant there was no escape.
The more advanced lenders regularly follow option 2, but it only works when nothing much is changing... which is a bit ironic. The problem is that no matter how good your credit score is, whether bureau or in-house, it is not tied to forward-looking default probabilities under economic scenarios. That connection is usually made through the magic of cut-off scores or, equivalently, judgmental pricing changes that reflect assumptions about the future of the economy. (The lack of adjustment for shifts in the borrowing pool will come up later.)
To connect credit scores to cash flow models so that we can optimize prices, we must abandon the fixed outcome window approach to scoring. It is not good enough to know the historical probability that an account with specific attributes will default in a fixed window of time -- say, 24 or 36 months. Instead, we need to record WHEN the account defaulted, so that we can compare the timing to the product lifecycle and economic conditions. This allows the model to measure the amount of "surprise" in the default, meaning how far it departs from what lifecycle and environment alone would predict.
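To make the difference from a fixed outcome window concrete, here is a minimal sketch of turning one account record into monthly observations that keep the timing of default. The `Account` fields, the integer month encoding, and the `to_panel` helper are hypothetical names chosen for illustration, not anything from the book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    account_id: str
    open_month: int               # origination month as a running index (assumed encoding)
    default_month: Optional[int]  # month the default was observed, or None
    last_obs_month: int           # last month the account was observed


def to_panel(acct):
    """Return one row per account per month on book, flagging the default month."""
    end = acct.default_month if acct.default_month is not None else acct.last_obs_month
    rows = []
    for month in range(acct.open_month + 1, end + 1):
        rows.append({
            "account_id": acct.account_id,
            "age": month - acct.open_month,   # months on book (lifecycle axis)
            "calendar_month": month,          # environment axis
            "vintage": acct.open_month,       # cohort axis
            "default": int(month == acct.default_month),
        })
    return rows


# Example: an account opened in month 100 that defaulted in month 103.
for row in to_panel(Account("A", 100, 103, 110)):
    print(row)
```

Each row carries age, calendar month, and vintage, so the default flag can be lined up against lifecycle and economic conditions instead of being collapsed into a single "defaulted within 24 months" label.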
The solution is panel data models. Panel data is where we observe every account every month. Yes, 20 years ago our storage and compute resources made this difficult, but if you have the resources to create a machine learning model, you can create a panel logistic regression model. This works particularly well when you estimate a vintage analysis (Age-Period-Cohort) model first, so that the lifecycle and environment functions can be provided as fixed inputs while creating the score.
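As a rough illustration of "lifecycle and environment as fixed inputs," the sketch below fits a logistic regression on monthly rows where the lifecycle and environment values enter as a fixed offset on the logit scale, so only the intercept and one attribute coefficient are estimated. Everything here is an assumption for demonstration: the data are synthetic, the `lifecycle` and `environment` shapes are made up, and the small Newton-Raphson fitter stands in for whatever estimation software you actually use.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_with_offset(x, y, offset, iters=25):
    """Newton-Raphson for logit(p) = offset + b0 + b1 * x, with offset held fixed."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, oi in zip(x, y, offset):
            p = sigmoid(oi + b0 + b1 * xi)
            r = yi - p              # residual drives the gradient
            w = p * (1.0 - p)       # weight drives the curvature
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up lifecycle and environment functions; in practice these would come
# from a previously estimated Age-Period-Cohort model.
def lifecycle(age):
    return -3.0 + math.exp(-((age - 12) / 8.0) ** 2)


def environment(month):
    return 0.3 * math.sin(2 * math.pi * month / 48.0)


# Synthetic account-month rows with a known attribute effect of 0.8.
rng = random.Random(0)
x, y, offset = [], [], []
for _ in range(4000):
    xi = rng.gauss(0.0, 1.0)
    oi = lifecycle(rng.randint(1, 36)) + environment(rng.randint(0, 95))
    x.append(xi)
    offset.append(oi)
    y.append(1 if rng.random() < sigmoid(oi + 0.8 * xi) else 0)

b0, b1 = fit_with_offset(x, y, offset)
print(f"intercept={b0:.2f}, attribute coefficient={b1:.2f}")
```

Because the offset absorbs age and calendar effects, the fitted coefficient only has to explain what lifecycle and environment do not. A real score would have many attributes and would use a proper statistics package rather than this two-parameter fitter.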
The result, whether origination score or behavior score, is a set of coefficients for input variables that looks just like a logistic regression score today. You might not even notice the shifts in the coefficients, but they adapt to the amount of surprise relative to lifecycle and environment. Consequently, 1) you can deploy a panel data score exactly the way you do traditional scores, 2) the score adds directly to the lifecycle and environment functions to predict forward-looking, monthly PDs under future economic scenarios, 3) exactly the same can be done for prepayment probability, and 4) you now have an account-level cash flow model to use for yield forecasting and pricing optimization.
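The chain from score to cash flows can be sketched in a few lines. This is a toy under loudly stated assumptions: an interest-only loan with a balloon payment, a flat loss given default, prepayment probabilities supplied directly, and invented lifecycle and scenario values. The function names `monthly_pd` and `expected_cash_flows` are hypothetical.

```python
import math


def monthly_pd(score, lifecycle, environment):
    """Monthly PDs: score, lifecycle, and environment add on the logit scale."""
    return [1.0 / (1.0 + math.exp(-(score + l + e)))
            for l, e in zip(lifecycle, environment)]


def expected_cash_flows(balance, monthly_rate, pd, pp, lgd):
    """Expected monthly cash flows for a toy interest-only loan."""
    alive = 1.0  # probability the account is still open at the start of the month
    flows = []
    for t, (d, p) in enumerate(zip(pd, pp)):
        performing = alive * (1.0 - d - p)
        flow = performing * balance * monthly_rate   # interest from performing accounts
        flow += alive * p * balance                  # principal returned on prepayment
        flow += alive * d * balance * (1.0 - lgd)    # recovery on default
        if t == len(pd) - 1:
            flow += performing * balance             # balloon repayment at maturity
        flows.append(flow)
        alive = performing
    return flows


# Invented inputs: a 12-month horizon, a flat lifecycle, and two scenarios.
months = 12
lifecycle = [-4.0] * months
base = [0.0] * months
stress = [0.7] * months      # a worse environment raises the logit
pp = [0.01] * months         # assumed monthly prepayment probability

for name, env in (("base", base), ("stress", stress)):
    pd = monthly_pd(score=-0.5, lifecycle=lifecycle, environment=env)
    total = sum(expected_cash_flows(10000.0, 0.008, pd, pp, lgd=0.6))
    print(name, round(total, 2))
```

Swapping the scenario changes only the environment input; the score and lifecycle stay put. Repeating the calculation across candidate rates is the starting point for the pricing optimization mentioned above.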
Next week, I'll explain how this solves the overfitting problem...
The book is available on Amazon
Joseph Breeden
Posted on LinkedIn