Insurance Risk Predictive Modeling & Management

Risk management is central to the insurance industry: by selling a policy, an insurer takes over risk from the customer. Insurers consider every quantifiable factor available to develop profiles of high and low insurance risk, and the level of risk determines the premium. Generally, policies involving factors with a greater risk of claims are charged at a higher rate. The more information an insurer has at hand, the more accurately it can evaluate the risk of a policy. To this end, insurers collect vast amounts of information about policy holders and insured objects. Statistical methods and data mining tools can then be used to analyze and determine insurance policy risk levels. This page discusses insurance risk predictive modeling.

Insurance Risk Analysis

On this page, the following insurance risk analysis methods are described:

  • Insurance risk factors hotspot profiling
  • Insurance risk predictive modeling
  • Insurance risk modeling
  • Insurance scoring
  • Insurance risk-level classification

Hotspot Profiling of Risky Insurance Segments

Profiling insurance risk factors is essential. The Pareto principle suggests that 80–90% of insurance claims may come from 10–20% of the insured segment groups. Profiling these hotspot segments can reveal invaluable information for insurance risk management. Insurance providers collect large amounts of information on insured entities. Policy information (for automobile insurance, life insurance, general insurance, and so on) often comprises dozens or even hundreds of variables, mixing categorical and numerical data with noise. Profiling identifies the factors and variables that best summarize these segments.

Combinatorial Factor Analysis and Combinatorial Blowout

Analyzing such a volume of information is an extremely difficult and challenging task. In conventional profiling, factor analysis is performed on a few variables at a time using statistical software. As the total number of variables grows, the number of combinations to examine this way grows combinatorially. With a large number of variables, the number of combinations becomes too large, and thorough systematic analysis is all but impossible. A conventional workaround is to examine only the combinations judged likely to have influence, but such hunches can silently leave out important factors.

Fortunately, this problem can be overcome with hotspot profiling analysis tools. Hotspot profiling analysis drills down into data systematically, accurately detecting important relationships, co-factors, interactions, dependencies, and associations among many variables and values using artificial intelligence techniques such as incremental learning and search, and generates profiles of the most interesting segments. Note that insurance premiums are normally stipulated with profiles of risky (or very low-risk) policy holders. Hotspot analysis can identify profiles of high-risk (and low-risk) policies accurately through thorough analysis of all available insurance data.
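The systematic drill-down described above can be illustrated with a minimal sketch. This is not CMSR's actual algorithm; it simply enumerates variable-value combinations up to a small depth and reports segments whose historical claim rate is unusually high. The record fields and parameter names are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def hotspot_profiles(records, variables, claim_key="claim", max_depth=2,
                     min_support=3, min_claim_rate=0.5):
    """Enumerate variable-value combinations up to max_depth and return
    segments whose claim rate meets min_claim_rate with enough records
    (min_support) to be credible."""
    stats = defaultdict(lambda: [0, 0])  # segment -> [record count, claim count]
    for rec in records:
        for depth in range(1, max_depth + 1):
            for combo in combinations(variables, depth):
                segment = tuple((v, rec[v]) for v in combo)
                stats[segment][0] += 1
                stats[segment][1] += rec[claim_key]
    hotspots = []
    for segment, (n, claims) in stats.items():
        rate = claims / n
        if n >= min_support and rate >= min_claim_rate:
            hotspots.append((segment, n, rate))
    # Most claim-prone segments first
    hotspots.sort(key=lambda h: h[2], reverse=True)
    return hotspots
```

With, say, `variables=["age_band", "vehicle"]`, the function returns segments such as `(("age_band", "18-25"), ("vehicle", "sports"))` together with their record counts and claim rates. The combinatorial blowout is visible directly: the inner loop grows with the number of variable combinations, which is why exhaustive analysis needs tool support.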

Insurance Risk Modeling

If the past is any guide to future events, predictive modeling is an excellent technique for insurance risk management. Predictive models are developed from historical records of insurance policies, containing financial, demographic, psychographic, and geographic information along with properties of the insured objects. From this historical policy information, predictive models can learn the patterns behind different insurance claim ratios and can then be used to predict the risk levels of future policies. Note that the statistical process requires a substantially large number of historical records (insurance policies) containing useful information, that is, factors that differentially affect claim ratios.

Insurance Risk Predictive Modeling Software Tools

CMSR Data Miner provides robust, easy-to-use predictive modeling tools. Users can develop models with the help of intuitive model visualization tools, and applying and deploying insurance risk models is also very simple. CMSR supports the following predictive modeling tools:

  • Neural Network is a very powerful modeling tool that generally offers the most accurate and versatile predictive models. Developing neural network models with CMSR is straightforward: network visualization tools guide users through configuration, training, testing, and, more importantly, direct application to databases.
  • Cramer Decision Tree produces compact, and thus highly general, decision trees. Decision trees can be used to predict the segmentation-based statistical probability of insurance claims.
  • Regression produces mathematical functions for predicting insurance claims. It can be limiting as a general-purpose method for claims prediction, but may be useful for ad-hoc special-case modeling.
  • RME (Rule-based Model Evaluation) is a powerful model integration tool. It can combine a number of predictive models into a single model, producing combined predictions such as the maximum, minimum, or average. In addition, it can classify combined predictions into classes such as "Very high risk", "High risk", "Medium risk", and "Low risk".
  • RME-EP (Rule-based Model Evaluation with Event Processing) is an amalgam of predictive modeling and a forward-chaining rule engine. It provides a powerful platform for Deep Learning-style modeling, allowing very sophisticated models to be developed.

Does Predictive Modeling Work?

The effectiveness of predictive modeling depends on the quality of the historical data. If the historical data contain information that can predict customer tendencies and behaviors, predictive modeling can be very effective; otherwise, reliable predictive models will be difficult to obtain. How can you know whether your customer data contain predictive information? Perform variable relevancy analysis, then build models and test them!
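As a rough sketch of what variable relevancy analysis checks (this is an illustrative measure, not CMSR's method), one can score each candidate variable by how much the claim rate varies across its values: a variable whose values all show the same claim rate carries no predictive signal.

```python
from collections import defaultdict

def relevancy_scores(records, variables, claim_key="claim"):
    """Score each candidate variable by the spread of claim rates across
    its values; a larger spread suggests more predictive information."""
    scores = {}
    for var in variables:
        groups = defaultdict(lambda: [0, 0])  # value -> [record count, claim count]
        for rec in records:
            groups[rec[var]][0] += 1
            groups[rec[var]][1] += rec[claim_key]
        rates = [claims / n for n, claims in groups.values()]
        scores[var] = max(rates) - min(rates)
    # Most relevant variables first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))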

Do you have recent historical data?

If your organization has data that can be used to develop predictive models, please write to us by filling out the CMSR Data Miner Download Application form. We will provide the software and email technical support free for up to a year. You will also receive the "Predictive Modeling Guide to Credit and Insurance Risk Scoring" ebooks. If you are unsure how predictive models can be used, please try the MyDataSay Android App.

If you have questions about the predictive modeling described here, please write to us. (Note that academic questions are not covered.)

Insurance Risk Scoring

Insurance risk scoring is the numerical rating of insurance policies: it measures the level of risk that a policy will be claimed against. This section describes advanced insurance risk modeling and scoring methods.

Why Neural Network?

A commonly used method in risk prediction is regression. Regression works well when the information structure is functional and simple, but it does not perform well on complex information with many categorical variables. Another commonly used method is the decision tree, which is not suitable when the dependent variable is heavily skewed, as insurance claims data typically are. This makes the neural network the method of choice for insurance risk modeling. The following figure shows a neural network model:

Neural Network

A neural network arranges information in nodes and weight-links, as shown in the figure above. Nodes represent input/output values and are organized into layers: an input layer, optional internal layers (normally a single layer, as in the figure), and an output layer. Input layer nodes accept input values. The value of each internal and output layer node is computed by summing the previous layer's node values multiplied by the connecting weight-link values.
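The node computation just described can be sketched in a few lines. This is a generic illustration, not CMSR's internals; in particular, the sigmoid activation applied after each weighted sum is an assumption, since the text only describes the summation. Each weight row carries a trailing bias weight (see the note on bias nodes below).

```python
import math

def forward(inputs, hidden_weights, output_weights):
    """One forward pass through a network with a single internal layer.
    Each node sums the previous layer's node values times the connecting
    weights (plus a bias weight), then applies a sigmoid (assumed)."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # Internal layer: one weight row per internal node, last entry = bias
    hidden = [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + bias)
              for *ws, bias in hidden_weights]
    # Output layer: same computation over the internal node values
    return [sigmoid(sum(h * w for h, w in zip(hidden, ws)) + bias)
            for *ws, bias in output_weights]
```

With all weights at zero, every node outputs sigmoid(0) = 0.5; training (described next) adjusts the weight-links so the outputs match the desired values.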

The weight-links are computed in such a way that, given the input values, the network produces the desired output value(s) at the output layer node(s). This process is called network training and is performed using past data. A neural network is a heuristic predictive system.

Bias nodes play a role similar to the constant term in regression. They have the fixed value 1 and tend to improve the network's learning capability.

In the chart above, positive weight-links are colored red and negative weight-links blue. Colors are scaled by the ratio of each link's absolute value to the largest absolute value: an absolute value of zero is white, the largest absolute value is pure red or blue, and the rest are shaded accordingly.

Note that neural networks are not good at predicting from unseen information and can make wildly wrong predictions, so good training data are very important.

In the following sections, insurance risk modeling steps are described.

Step 1: Develop Neural Network Models

Predictive models infer predictions from a set of variables called independent variables. The first step in developing models is to analyze which variables contain predictive information through relevancy analysis. Once the relevant variables are identified, (neural network) models can be configured and trained using past historical data. Neural network training is a repetitive process that may take a long time, so a fast computer may be needed. Fully trained models should be tested against historical data before use. A single model can have biases and weaknesses; to overcome this, multiple models can be developed and combined as described in the next section.
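To show the repetitive nature of training concretely, here is a minimal sketch that trains a single sigmoid unit (the smallest possible "network", with no internal layer) by gradient descent on 0/1 claim outcomes. Real insurance models would have internal layers and far more data; all names and data here are illustrative.

```python
import math
import random

def train_risk_model(records, lr=0.5, epochs=2000, seed=0):
    """Train a single sigmoid unit on (feature_vector, claimed) pairs by
    repeatedly adjusting weights against the prediction error."""
    rng = random.Random(seed)
    n = len(records[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # small random start
    b = 0.0                                         # bias weight
    for _ in range(epochs):                         # repetitive training loop
        for x, y in records:
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = p - y                             # prediction error
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(model, x):
    """Score a new policy: a value between 0 and 1."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
```

After training, the model should be tested on held-back historical records, exactly as the text advises, before any deployment.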

Step 2: Combine Neural Network Models

Once models are fully trained and tested, they can be integrated to produce combined outputs such as the largest (maximum), the smallest (minimum), the average, or the average excluding the largest and smallest values. This can be done easily using RME/RME-EP (Rule-based Model Evaluation, available in CMSR Data Miner).
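The combination rules listed above are simple aggregates over the individual models' scores. A plain-Python sketch of what an RME-style combination computes (not RME's actual syntax or engine):

```python
def combine_scores(scores):
    """Combine several models' risk scores for one policy into the
    aggregates described above: maximum, minimum, average, and the
    average excluding the single largest and smallest values."""
    s = sorted(scores)
    trimmed = s[1:-1] if len(s) > 2 else s  # drop extremes when possible
    return {
        "max": s[-1],
        "min": s[0],
        "avg": sum(s) / len(s),
        "trimmed_avg": sum(trimmed) / len(trimmed),
    }
```

For four model scores of 0.2, 0.8, 0.5, and 0.3, this yields a maximum of 0.8, a minimum of 0.2, an average of 0.45, and a trimmed average of 0.4.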

The following histogram shows the maximum scores and risk distribution in past historical data. "RSCORE1", on the horizontal axis, represents the combined maximum values; the vertical axis shows risk (the claimed proportion). (The label "Risky" denotes historical records that were in a claimed state; it is used because it makes sense in classification modeling.) The histogram clearly shows that higher scores carry a higher proportion of risk in the historical data, so the models are effective and useful. Note that the neural network models are trained to predict values between 0 and 1, though scores can fall slightly outside that range, as seen in the histogram.

Histogram of maximum risk scores

The following figure plots the data behind the above chart. The "RID" record identifier is used to spread the data horizontally so that points can be seen easily; the vertical axis shows the value scored by a model. Red circles represent historical records that were in fact insurance claims. The plot clearly shows that the higher the score, the higher the risk: scores of 0.6 and above were all insurance claims, scores from 0.4 to 0.6 also carry high risk, scores from 0.2 to 0.4 carry medium risk, and the rest carry low risk.

Scatterplot of maximum risk scores

Step 3: Risk Scores to Risk Classification

Risk scores produced by neural network and RME/RME-EP models can be confusing to users. It is better to verbalize them into more easily understood labels such as "Very high risk", "High risk", "Medium risk", and "Low risk". The histogram above clearly shows that a maximum risk score of 0.6 or greater carried 100% risk, so it can be coded as "Very high risk". The next class, a maximum risk score of 0.3 or greater, is "High risk"; a maximum risk score of 0.2 or greater is "Medium risk"; and the rest are "Low risk". This classification produces the risk distribution shown in the following chart.
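The verbalization step above is a simple threshold mapping. A sketch of what the classification rule computes, using the 0.6 / 0.3 / 0.2 cut-offs from the histogram (in practice this rule would be expressed as an RME/RME-EP model, not Python):

```python
def risk_class(max_score):
    """Map a combined maximum risk score onto the verbal risk classes
    using the thresholds read off the histogram above."""
    if max_score >= 0.6:
        return "Very high risk"
    if max_score >= 0.3:
        return "High risk"
    if max_score >= 0.2:
        return "Medium risk"
    return "Low risk"
```

The thresholds are not universal: they are read off this (artificial) data set's histogram, and a real deployment would calibrate them against its own historical claim proportions.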

Proportions of risk classes

This chart shows the historical risk proportion of each class. The classification is coded using an RME/RME-EP model. You need two RME/RME-EP models: one to combine the scores into maximum scores for analysis, and another to produce the classification for deployment.

This classification can be extended to include minimum and average risk scores. Expanded documentation of this extension can be found in the MyDataSay Android application; download is available here: MyDataSay Android App.

* A Deep Learning version of this risk scoring is described at the bottom of Rule Engine With Predictive Modeling.

** Note that the charts used on this page are based on artificially generated data. Your data may not produce similar outcomes.

Step 4: Deploy Models for Users

Once models are fully trained, tested, and combined into RME/RME-EP models, they are ready to deploy to customer-facing users. We provide the following deployment options:

  • Web Server: Rosella BI Server serves predictive models to users over the web and can support a large number of users. It is optimized for small-screen devices such as smart phones and tablets as well as ordinary computers. In addition, it can be incorporated into your web-based business applications using Java JSP pages and JSON HTTP requests. For more, read Rosella BI Server.
  • Android Application: MyDataSay is an Android application that can be used by an unlimited number of users. Download is available here: MyDataSay Android App. We recommend downloading and trying MyDataSay to learn how predictive modeling can be used.

For more details on these modeling steps, please read Nine Steps Predictive Modeling Guide for Risk Management and Predictive Modeling Cook Book.

Android App for Insurance Risk Predictive Models (Downloads)

An Android app for predictive models is available for download. You can install it on your Android phones and tablets and try out how predictive models are used in insurance risk management. It is a perfect app for deploying your insurance claims prediction models to customer-facing staff, who are the people who should eventually use them. Download is available here: MyDataSay Android App.

For information about predictive modeling, please read Predictive Modeling Software Tools.