Rosella Machine Intelligence & Data Mining

Why do predictive models perform so badly?

There are a number of reasons why predictive models perform poorly. The major sources of the problems are the following:

Historical data used to develop models may not show clear predictive patterns.
The past may not be representative of the future. New patterns may have emerged.
Data may contain biases introduced by business practices such as screening.
Data may be severely skewed towards one class.
Models may have been developed in a simplistic way.
Overfitting.
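
As an illustration of the last item, overfitting usually shows up as a gap between accuracy on the training data and accuracy on data the model has not seen. The sketch below is a toy example, not a method from this page: the data, the noise points and the two models (a nearest-neighbour "memorizer" and a one-threshold rule) are all invented for illustration.

```python
# Toy data, invented for illustration: the true pattern is "label 1 when x >= 20".
def true_label(x):
    return 1 if x >= 20 else 0

# Training set: even x values, with two labels flipped to simulate noise.
noisy_points = {6, 30}
train = [(x, (1 - true_label(x)) if x in noisy_points else true_label(x))
         for x in range(0, 40, 2)]

# Holdout set: unseen x values, labeled by the true pattern.
holdout = [(x + 0.5, true_label(x + 0.5)) for x in range(0, 40, 2)]

def accuracy(model, data):
    """Fraction of cases where the model's prediction matches the label."""
    return sum(1 for x, y in data if model(x) == y) / len(data)

def memorizer(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def fit_threshold(data):
    """Pick the single cut-off that gives the best training accuracy."""
    candidates = sorted(x for x, _ in data)
    return max(candidates,
               key=lambda t: accuracy(lambda x: 1 if x >= t else 0, data))

threshold = fit_threshold(train)

def simple_rule(x):
    return 1 if x >= threshold else 0

mem_train = accuracy(memorizer, train)      # memorizes the noise as well
mem_holdout = accuracy(memorizer, holdout)  # repeats the noise on new cases
rule_train = accuracy(simple_rule, train)   # cannot fit the two noisy points
rule_holdout = accuracy(simple_rule, holdout)

print("memorizer   train/holdout:", mem_train, mem_holdout)
print("simple rule train/holdout:", rule_train, rule_holdout)
```

The memorizer looks perfect on the data it was built from and worse on new cases, while the simpler rule does the opposite. Comparing training and holdout accuracy in this way is one practical check for overfitting.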

To develop reliable predictive models, the first step is to identify datasets that reasonably represent future patterns. The second step is to identify biases introduced by business practices. Models built on biased data cannot predict broadly, so their scope must be limited to the patterns that actually exist in the data. Cases outside that scope can be covered using an advanced rule engine with machine learning.
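
One way to check whether historical data still represents current cases is to compare how a variable is distributed in the two periods. The sketch below uses the Population Stability Index (PSI), a common drift measure that this page does not itself prescribe. The bin counts are invented, and the function name `psi` and the `eps` floor are assumptions made for the example.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: counts per bin in the historical (model-building) data.
    actual_counts:   counts per bin in the recent data, using the same bins.
    eps:             floor on bin shares to avoid log(0); an assumption here.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total

# Invented counts of customers per age band, for illustration only.
historical = [200, 300, 300, 200]
recent_same = [20, 30, 30, 20]          # same shape, smaller sample
recent_shifted = [100, 200, 300, 400]   # population has moved to older bands

psi_same = psi(historical, recent_same)
psi_shifted = psi(historical, recent_shifted)

print("PSI, unchanged population:", round(psi_same, 4))
print("PSI, shifted population:  ", round(psi_shifted, 4))
```

A widely quoted rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, but those cut-offs are conventions rather than guarantees. A high value suggests that the historical dataset may no longer be a reasonable representation of future patterns.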

Dealing with Data Skew

Data skew can mean different things. Here it means an imbalance in the ratios among the classes being predicted. For example, in marketing campaigns, the classes may be "positive" and "negative" responses. Normally, the positive response ratio is very low, say, less than 10%. Predictive modeling is useful only if models can identify the positive respondents accurately. Unfortunately, classification techniques often fail badly on such data, because a model can reach high accuracy simply by labeling nearly every case negative. A better approach is to use scoring methods, which rank cases by how likely they are to respond.
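
The difference between classifying and scoring can be seen in a small sketch. The customers, scores and the 0.5 cut-off below are invented for illustration and do not come from any real campaign or from a specific product. The point is only that the same scores look useless as a yes/no classifier and useful as a ranking.

```python
# Toy campaign data: (model score, responded) pairs. All values are made up.
# 100 customers, 5 of whom responded (a 5% positive rate).
customers = [(0.40, 1), (0.35, 1), (0.30, 0), (0.28, 1), (0.25, 0),
             (0.22, 1), (0.20, 0), (0.18, 0), (0.15, 0), (0.12, 0)]
customers += [(0.10, 0)] * 89
customers += [(0.05, 1)]

def classify(score, threshold=0.5):
    """Yes/no classification with a conventional 0.5 cut-off (an assumption)."""
    return 1 if score >= threshold else 0

def accuracy(data):
    return sum(1 for s, y in data if classify(s) == y) / len(data)

def recall(data):
    """Share of actual responders that the classifier labels positive."""
    positives = [s for s, y in data if y == 1]
    return sum(classify(s) for s in positives) / len(positives)

def top_k_capture(data, k):
    """Share of all responders found among the k highest-scored customers."""
    ranked = sorted(data, key=lambda p: p[0], reverse=True)
    captured = sum(y for _, y in ranked[:k])
    return captured / sum(y for _, y in data)

acc = accuracy(customers)           # high, because almost everyone is negative
rec = recall(customers)             # no responder is ever labeled positive
cap = top_k_capture(customers, 10)  # responders reached by mailing the top 10
lift = cap / (10 / len(customers))  # compared with mailing 10 at random

print("accuracy:", acc, "recall:", rec)
print("top-10 capture:", cap, "lift:", lift)
```

In this toy data the classifier scores 95% accuracy while finding none of the responders, whereas contacting only the ten highest-scored customers reaches four of the five. That is why ranking by score, rather than a hard class label, tends to be the more useful output when the positive class is rare.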

Better solutions?

In a nutshell, one size does not fit all! All the problems described on this page point in one direction: models that think like human experts, who combine knowledge, rule inference and predictive models in decision making. To find out more, read "rule engine with machine learning".
