Rosella Machine Intelligence & Data Mining

Finance / Credit Risk Predictive Modeling and Risk Management

Credit risk analysis (finance risk analysis, credit loan default risk analysis, retail loan delinquency analysis) and credit risk management are important to financial institutions that provide loans to businesses and individuals. Credit loans arise for various reasons: bank mortgages (or home loans), motor vehicle purchase finance, credit card purchases, installment purchases, retail loans and so on. Every credit loan carries a risk of default or delinquency. To understand the risk levels of credit users, credit providers normally collect vast amounts of information on borrowers. Statistical predictive analytic techniques can be used to analyze or determine the risk levels of credits, finances, and loans, i.e., default risk levels. This page discusses credit risk predictive modeling using Machine Learning methods.

Why internal credit scoring?

Personal credit scores are normally computed from information in credit reports collected by external credit bureaus and ratings agencies. Credit scores may indicate personal financial history and current situation. However, they do not tell you exactly what separates a "good" score from a "bad" score. More specifically, they do not tell you the level of risk for the lending you may be considering. Furthermore, in many countries, no credit rating system is available. The internal credit scoring methods described on this page address these problems. Internal credit scoring techniques can be applied to commercial credits as well.

Credit Risk Analysis and Credit Risk Prediction by Machine Learning

This page describes the following credit risk analysis and credit risk prediction methods:

Credit risk prediction and predictive modeling by machine learning.
Credit risk machine learning and deep learning.
Credit risk predictive modeling step-by-step guides.
Credit loans default analysis by hotspot profiling.
Internal credit risk scoring.

Deep learning risk model output.

YouTube Tutorial Videos: Credit Risk Neural Network Modeling

- YouTube video on Neural network modeling for risk management
- Other CMSR Youtube Tutorial Videos

Credit Risk Analysis by Hotspot Profiling of Risky Credit Segments

Credit risk profiling (finance risk profiling) is very important. The Pareto principle suggests that 80%~90% of the credit defaults may come from 10%~20% of the lending segments. Profiling risky segments can reveal useful information for credit risk management.

Credit providers often collect a vast amount of information on credit users. Information on credit users (or borrowers) often consists of dozens or even hundreds of variables, involving both categorical and numerical data with noisy information. Hotspot profiling identifies the factors or variables that best summarize risky segments.

CMSR Studio hotspot profiling tools drill down into data systematically, detect important relationships, co-factors, interactions, dependencies and associations amongst many variables and values using Artificial Intelligence techniques, and generate profiles of the most interesting segments. Hotspot analysis can identify profiles of high (and low) risk loans through systematic analysis of all available data. The following figure is an example output of Hotspot Profiling of Risky Credit Segments. It shows the most risky credit customer segments:

Credit Hotspot Profiling of Risky Credit Segments

For more on customer risk hotspot profiling, please read customer profiling.
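
The idea of ranking segments by risk can be illustrated with a minimal sketch. It is a plain group-by over hypothetical loan records, not the CMSR hotspot algorithm; the field names (`employment`, `housing`, `defaulted`) and the data are assumptions made up for illustration.

```python
from collections import defaultdict

def segment_default_rates(records, segment_keys, outcome_key="defaulted"):
    """Group loan records by the given keys and return per-segment
    statistics, sorted from the highest to the lowest default rate."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [loans, defaults]
    for rec in records:
        segment = tuple(rec[key] for key in segment_keys)
        counts[segment][0] += 1
        counts[segment][1] += 1 if rec[outcome_key] else 0
    rows = [
        {"segment": seg, "loans": n, "defaults": d, "rate": d / n}
        for seg, (n, d) in counts.items()
    ]
    return sorted(rows, key=lambda row: row["rate"], reverse=True)

# Hypothetical toy data, for illustration only.
loans = [
    {"employment": "casual", "housing": "rent", "defaulted": True},
    {"employment": "casual", "housing": "rent", "defaulted": True},
    {"employment": "casual", "housing": "rent", "defaulted": False},
    {"employment": "casual", "housing": "own", "defaulted": False},
    {"employment": "permanent", "housing": "rent", "defaulted": False},
    {"employment": "permanent", "housing": "rent", "defaulted": True},
    {"employment": "permanent", "housing": "own", "defaulted": False},
    {"employment": "permanent", "housing": "own", "defaulted": False},
]

hotspots = segment_default_rates(loans, ["employment", "housing"])
for row in hotspots:
    print(row)
```

With real data, segments holding only a handful of loans would need to be filtered out or treated with caution, since their rates are not statistically reliable.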

Credit Risk Predictive Modeling and Credit Risk Prediction by Machine Learning

If the past is any guide for predicting future events, credit risk prediction by Machine Learning is an excellent technique for credit risk management. Prediction models are developed from historical data on credit loans, containing financial, demographic, psychographic, geographic information, etc. From this past credit information, predictive models can learn patterns of different credit default/delinquency ratios, and can be used to predict the risk levels of future credit loans. It is important to note that this statistical process requires a substantially large number of historical records (or customer loans) containing useful information. Useful information is any factor that differentially affects credit default/delinquency ratios.

Credit Risk Predictive Modeling Techniques and Software Tools

CMSR Data Miner / Machine Learning / Rule Engine Studio provides robust, easy-to-use predictive modeling and machine learning tools. Users can develop models with the help of intuitive model visualization tools. CMSR supports the following predictive modeling and machine learning software tools:

Neural Network is a very powerful modeling tool. It generally offers the most accurate and versatile models. It is very easy to develop neural network predictive models with CMSR. Network visualization tools guide users through configuration, training, testing and, more importantly, direct application to databases (for testing and scoring).
Cramer Decision Tree produces compact and thus general decision trees. Decision trees can be used to predict segmentation-based statistical probabilities of credit loan defaults.
Regression produces mathematical functions for predicting default risk levels. It can be very limiting as a general-purpose credit risk predictive modeling method. However, it can be useful when combined with the above methods.
RME-EP (Rule-based Model Evaluation with Event Processing) is an amalgam of (machine learning) predictive modeling and a (forward chaining) rule engine. It provides a powerful platform for Deep Learning models, on which very intelligent models can be developed. RME-EP can be used to combine a number of (machine learning) predictive models into a single model, producing combined predictions such as maximum, minimum, average, etc. In addition, it can be used to classify combined predictions into classes such as "Very high risk", "High risk", "Medium risk", "Low risk", etc.

Decision Explainability and Modeling Techniques

Because they make rule-based decisions, decision tree models naturally explain how decisions are reached. Regression also offers some explanation. Neural networks normally have multiple layers, which makes it very difficult to read how they make decisions. To understand how a neural network reaches its decisions, a network with no hidden layer is recommended. Such networks are regression neural networks.
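
As a rough illustration of why a network with no hidden layer is readable, the sketch below scores one record and lists how much each input contributes to the weighted sum. The input names, weights, and bias are hypothetical, and the sigmoid output function is an assumption; none of this is taken from CMSR.

```python
import math

def score_and_explain(inputs, weights, bias):
    """Score one record with a no-hidden-layer network and return each
    input's contribution (weight * value) to the weighted sum."""
    contributions = {name: weights[name] * value for name, value in inputs.items()}
    total = bias + sum(contributions.values())
    score = 1.0 / (1.0 + math.exp(-total))  # sigmoid output: an assumption
    return score, contributions

# Hypothetical inputs and weights, for illustration only.
inputs = {"debt_ratio": 0.8, "years_employed": 0.2}
weights = {"debt_ratio": 2.0, "years_employed": -1.5}
score, contributions = score_and_explain(inputs, weights, bias=-1.0)

print(f"score = {score:.3f}")
for name, value in contributions.items():
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} ({direction} the risk score)")
```

Each weight can be read directly: a positive weight pushes the score up as the input grows, a negative weight pushes it down. With hidden layers this one-to-one reading is lost.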

Does Credit Risk Prediction by Machine Learning Work?

The effectiveness of credit risk prediction by machine learning depends on the quality of historical data. If historical data contains information that can predict customer tendencies and behaviors, credit risk prediction by machine learning can be very effective. Otherwise, reliable credit risk predictive models will be difficult to obtain. Good historical data is essential to produce good predictive models. How can you know whether your customer data contains predictive information? You need to perform variable relevancy analysis, build models, and test them!

Data is Knowledge!

In this age of Machine Learning, good knowledge can be extracted from good data automatically using Machine Learning algorithms. Good data is essential in Machine Learning because the rule is Garbage In, Garbage Out (GIGO): garbage data produces garbage models. But good data can result in good Credit Risk Predictive Models that can be used as important risk management tools.

Credit Risk Scoring by Machine Learning - Credit Risk Predictive Models

A credit risk score is a risk rating of a credit loan. It measures the level of risk of default/delinquency. The level of default/delinquency risk can be best predicted with predictive modeling using machine learning tools. Credit risk scores can be expressed as default/delinquency probabilities and/or relative numerical ratings. The following sections outline credit risk scoring methods by AI Machine Learning:

Why Neural Network and Deep Learning?

Regression is a commonly used method in risk prediction. It works well if the information structure is functional and simple. However, it does not perform well on complex information with many categorical variables. Another commonly cited method is decision tree classification. Decision trees are not suitable if the dependent variable is heavily skewed, and credit loan data have this skew. Decision trees can work if the default/delinquency rate is roughly 30% ~ 70%. A commonly used method to overcome this problem is boosting the rare class by duplicating its records (oversampling). But duplication turns outliers into statistically significant patterns, introducing bogus patterns. It produces many false positive or false negative predictions. This is a bad approach! In fact, all classification methods suffer from misclassifications.
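
The effect of skew on classification can be seen in a small arithmetic sketch. It assumes a tree leaf is labeled by majority vote, which is the usual convention for classification trees; the leaf names and counts are hypothetical.

```python
def leaf_summary(n_loans, n_defaults):
    """Return the majority-vote class label and the default rate of one leaf."""
    rate = n_defaults / n_loans
    label = "default" if rate > 0.5 else "no default"
    return label, rate

# Hypothetical leaves: (number of loans, number of defaults).
leaves = {"leaf_a": (1000, 10), "leaf_b": (1000, 80)}
summaries = {name: leaf_summary(*counts) for name, counts in leaves.items()}

for name, (label, rate) in summaries.items():
    print(f"{name}: label={label!r}, default rate={rate:.1%}")
```

Both leaves receive the same "no default" label even though one is eight times riskier than the other. A risk-level score keeps that difference, while a class label discards it.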

This makes Neural Network and Deep Learning methods based on risk-level scoring the methods of choice for credit risk predictive modeling. The following figure shows a neural network model:

A neural network arranges information in nodes and weight-links as shown in the above figure. Nodes represent input/output values. Nodes are organized into layers: an input layer, (optional) internal layers (normally a single layer as in the figure), and an output layer. Input layer nodes accept input values. Values of internal layer nodes and output layer nodes are computed by summing the values of the previous layer's nodes, each multiplied by the value of its weight-link.

Weight-link values are computed in such a way that, given input values, the network produces certain output value(s) at the output layer node(s). This process is called network training. It is performed using past data. A neural network is a heuristic predictive system.

Bias nodes are similar to the constant (intercept) term in regression. They have the value 1 and tend to improve the network's learning capability.

The output node "RiskScore" produces risk score values, normally between 0 and 1, which can be read as risk probability.
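
The computation described above can be sketched in a few lines. The weighted sum and the bias node with value 1 follow the description; the sigmoid applied to each sum is an assumption (a common choice that keeps outputs between 0 and 1), and all weights below are hypothetical rather than trained values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(prev_values, weights, bias_weights, activation=sigmoid):
    """Compute the node values of one layer.

    weights[j][i] is the weight-link from previous-layer node i to node j;
    bias_weights[j] is the weight-link from the bias node (value 1) to node j.
    """
    outputs = []
    for node_weights, bias_weight in zip(weights, bias_weights):
        total = bias_weight * 1.0 + sum(w * v for w, v in zip(node_weights, prev_values))
        outputs.append(activation(total))
    return outputs

def risk_score(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input layer -> one internal layer -> single output node."""
    hidden = layer_output(inputs, hidden_w, hidden_b)
    return layer_output(hidden, out_w, out_b)[0]

# Hypothetical network: 2 inputs, 2 internal nodes, 1 output node.
score = risk_score(
    inputs=[0.8, 0.2],
    hidden_w=[[1.5, -2.0], [-0.5, 1.0]],
    hidden_b=[0.1, -0.3],
    out_w=[[2.0, -1.0]],
    out_b=[-0.5],
)
print(f"RiskScore = {score:.3f}")
```

Training consists of adjusting the weight values so that this forward pass reproduces the outcomes recorded in past data as closely as possible.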

In the above figure, weight-links with positive values are colored red and weight-links with negative values are colored blue. Color intensity is scaled by the ratio of each absolute value to the largest absolute value. An absolute value of zero is colored white, the largest absolute value is colored pure red or pure blue, and the rest are scaled in between.

Note that neural networks are not good at predicting on information they have not seen in training; they can make very wild predictions. Thus good, comprehensive training data is very important. For more on neural networks, please read Neural Network.
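
A coloring scheme of this kind could be written as follows. This is a sketch under the assumption of linear scaling between white and the pure color; the exact scaling used by CMSR is not stated here, and the function name is hypothetical.

```python
def weight_color(weight, max_abs):
    """Map a weight-link value to an (R, G, B) color with 0-255 channels:
    white for zero, pure red for the largest positive value,
    pure blue for the largest negative value."""
    if max_abs == 0 or weight == 0:
        return (255, 255, 255)
    ratio = min(abs(weight) / max_abs, 1.0)
    fade = round(255 * (1.0 - ratio))  # how much white remains in the color
    if weight > 0:
        return (255, fade, fade)  # shades of red
    return (fade, fade, 255)      # shades of blue

# Hypothetical weight-link values.
weights = [2.0, -2.0, 0.5, 0.0]
largest = max(abs(w) for w in weights)
for w in weights:
    print(w, weight_color(w, largest))
```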

In the following sections, credit risk modeling steps are described.

Step 1: Develop Neural Network Models

Predictive models infer predictions from a set of variables called independent variables. The first step in developing models is to analyze which variables contain predictive information through relevancy analysis. Only relevant variables should be used as independent variables. CMSR Data Miner / Machine Learning Studio provides tools for variable relevancy analysis. For more on this, please read Variable Relevancy and Factor Analysis, aka, Principal Component Analysis, in Predictive Modeling.

Once relevant variables are identified, neural network models can be configured and trained using past historical data. Neural network training is a repetitive process which may take hours or even days, so a fast computer may be needed. Before they are used, fully trained models should be tested on historical data that was not used in training. Single models can have biases and weaknesses. To overcome this, multiple models can be developed and combined as described in the next section.

It is very important to note that the art of predictive modeling is to develop models that are both accurate and general at the same time. Models that are not both accurate and general will not predict accurately on unseen information. These conflicting goals are difficult to achieve without advanced tools. CMSR Studio is well equipped with powerful modeling tools.
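
One possible relevancy measure for a categorical variable is Cramér's V, which ranges from 0 (no association with the outcome) to 1 (perfect association). The sketch below is only an illustration of the idea; it is an assumption that such a measure is suitable here, not a description of the analysis CMSR performs, and the example variables are made up.

```python
import math
from collections import Counter

def cramers_v(values, outcomes):
    """Cramér's V between a categorical variable and an outcome,
    computed from the chi-square statistic of their contingency table."""
    n = len(values)
    row_totals = Counter(values)
    col_totals = Counter(outcomes)
    observed = Counter(zip(values, outcomes))
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical data: 1 = defaulted, 0 = repaid.
defaulted = [1, 1, 0, 0]
housing = ["rent", "rent", "own", "own"]      # lines up with the outcome
branch = ["north", "south", "north", "south"]  # unrelated to the outcome

print("housing:", cramers_v(housing, defaulted))
print("branch:", cramers_v(branch, defaulted))
```

Variables scoring near zero on a measure like this carry little predictive information on their own, although they may still matter in combination with other variables.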
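
Setting aside test data before training can be sketched as a simple random holdout split. The function name, the 30% test fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and split them into (training, test) lists.
    The test list must not be used during training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical record identifiers standing in for historical loans.
historical = list(range(10))
training, test = holdout_split(historical)
print(len(training), "training records,", len(test), "test records")
```

Because credit data is heavily skewed, it is worth checking that both parts contain enough defaulted/delinquent records to be meaningful.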

Step 2: Combine Neural Network Models

Once models are fully trained and tested, they can be integrated to produce combined outputs such as the largest (=maximum), smallest (=minimum), average, average without the largest and smallest values, etc. This can be done easily using RME-EP (Rule-based Model Evaluation, available in CMSR Studio).

Testing credit risk predictive models requires data visualization tools. The following histogram shows the largest (=maximum) scores and the risk distribution in past historical data. The horizontal axis, "RiskScore", represents the combined largest (=maximum) values. The vertical axis shows the risk (=delinquent/defaulted) proportion (in red). Note that the class label "Risk" represents historical records that were in a delinquent/defaulted state. The histogram clearly shows that higher model scores (=RiskScore) had a higher proportion of risk in the past historical data, so the models are effective and useful. Note that the neural network models are trained to predict values between 0 and 1, but predictions can be slightly higher or lower, as seen in the following histogram.

The following figure plots the data of the above chart. "RECORDSEQ" (=record sequence) is used to spread the data horizontally so that prediction information can be seen easily. The vertical axis shows the values scored by the models. Red circles represent historical records that were delinquent or defaulted. The plot clearly shows that the higher the score, the higher the risk. Records scoring 0.6 and above were all delinquent/defaulted. Scores from 0.4 to 0.6 also carry high risk. Scores from 0.1 to 0.4 carry medium risk. The rest carry very low risk.
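
The combined outputs listed above amount to simple arithmetic over the individual model scores. The following sketch is plain Python, not RME-EP syntax, and the four scores are hypothetical.

```python
def combine_scores(scores):
    """Combine the risk scores that several models gave to the same loan."""
    ordered = sorted(scores)
    combined = {
        "maximum": ordered[-1],
        "minimum": ordered[0],
        "average": sum(ordered) / len(ordered),
    }
    # Average without the largest and smallest values needs at least 3 scores.
    if len(ordered) > 2:
        trimmed = ordered[1:-1]
        combined["trimmed_average"] = sum(trimmed) / len(trimmed)
    return combined

# Hypothetical scores from four models for one loan.
combined = combine_scores([0.2, 0.5, 0.8, 0.1])
print(combined)
```

Taking the maximum is the most cautious choice, since a loan is flagged as soon as any one model considers it risky; the average and trimmed average smooth out the bias of individual models.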

Scatterplot of maximum risk scores
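
The proportions behind a histogram of this kind can be tabulated with a short sketch like the one below. The bin count of 10 and the sample scores are assumptions; the data is made up and does not reproduce the charts on this page.

```python
import math
from collections import defaultdict

def risk_proportion_by_bin(scores, outcomes, n_bins=10):
    """Group scores into bins of width 1/n_bins and return, per bin index,
    a tuple (records, risky records, risk proportion)."""
    counts = defaultdict(lambda: [0, 0])
    for score, is_risk in zip(scores, outcomes):
        # Rounding first keeps values such as 0.6 out of the bin below
        # when floating-point error makes score * n_bins slightly too small.
        index = math.floor(round(score * n_bins, 9))
        counts[index][0] += 1
        counts[index][1] += 1 if is_risk else 0
    return {i: (n, r, r / n) for i, (n, r) in sorted(counts.items())}

# Hypothetical combined maximum scores and historical outcomes.
scores = [0.05, 0.08, 0.15, 0.45, 0.48, 0.65, 0.72]
outcomes = [False, False, False, True, False, True, True]

table = risk_proportion_by_bin(scores, outcomes)
for index, (n, risky, proportion) in table.items():
    print(f"{index / 10:.1f}-{(index + 1) / 10:.1f}: {risky}/{n} = {proportion:.0%}")
```

Scores slightly below 0 or above 1 simply land in bins with index -1 or 10, so they remain visible rather than being dropped.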

Credit Risk Deep Learning and Decision Support Expert System on Model

Good news! Now you can develop Risk Management Expert Advisor Models using Deep Learning techniques very easily. What you need is relevant data for Deep Learning.

Large neural networks are very powerful. They can learn finely detailed information, which can lead to overfitting. This is no good for predictive modeling, which requires learning statistical patterns. To overcome the overfitting problem, Deep Learning techniques can be used. Instead of one large network with a large number of input and hidden layer nodes, multiple smaller specialized neural networks can each perform a smaller decomposed task. The smaller models are then integrated with integration neural networks. Advanced credit risk deep learning models can work as Credit Risk Management Decision Support Expert Systems. For more on this technique, please read Rule Engine with Machine Learning, Deep Learning, Neural Network.

Step 3: Risk Scores to Risk Classification

Risk scores produced by neural networks and RME-EP models can be confusing to users. They are easier to understand when translated into plain terms such as "Very high risk", "High risk", "Medium risk", "Low risk", etc. The above histogram clearly shows that when the maximum risk score is equal to or greater than 0.6, all records were delinquent/defaulted (100% risk), so this range can be coded as "Very high risk". Next, a maximum risk score equal to or greater than 0.4 is coded as "High risk", and a maximum risk score equal to or greater than 0.1 as "Medium risk". The rest is "Low risk". This classification can be easily implemented with RME-EP rule engine rules. It produces the risk distribution shown in the following categorical bar chart, which shows how much risk each class had in the past historical data.
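
The thresholds above translate directly into a small function. This is a plain Python sketch of the same logic, not RME-EP rule syntax; the thresholds are the ones read from the example histogram and would need to be re-derived for other data.

```python
def classify_risk(max_score):
    """Translate a combined maximum risk score into a risk class,
    using the thresholds 0.6, 0.4 and 0.1 from the example histogram."""
    if max_score >= 0.6:
        return "Very high risk"
    if max_score >= 0.4:
        return "High risk"
    if max_score >= 0.1:
        return "Medium risk"
    return "Low risk"

for example_score in (0.75, 0.45, 0.2, 0.03):
    print(example_score, "->", classify_risk(example_score))
```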

* Note that models and charts used in this page are produced using CMSR Data Miner / Machine Learning / Rule Engine Studio and are based on artificially generated data. For free software download, please visit CMSR Download & Install.


Free Software Download

If your organization has data that can be used to develop risk predictive models, please try CMSR Data Miner / Machine Learning Studio. Download from CMSR Download & Install.
