Rosella Predictive Knowledge & Data Mining
Correlation and Link Analysis
Link analysis takes many forms. Its most common application is the analysis of web-page hypertext links, as described in Web Navigation Pattern Analysis. Another common application is Healthcare Fraud Detection, where it analyzes associations between providers and patients potentially involved in healthcare insurance scams. Other uses of link analysis are described in the subsequent sections.
Correlation analysis and Categorical data
A correlation coefficient is a numerical measure of the linear association between two numerical variables. Correlation analysis is very important in selecting variables for clustering/segmentation and predictive modeling. Correlation coefficients range between -1 and 1. If two variables are perfectly negatively correlated, the coefficient is -1. If they are perfectly positively correlated, it is 1. A simple coefficient computation reveals linear correlations such as the one shown below. (Figure: Linear correlation)
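As an illustration of the simple coefficient computation described above, the following is a minimal sketch of the standard Pearson correlation coefficient in plain Python. The function name `pearson` and the sample data are assumptions made for this example; they are not taken from CMSR.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data: two exactly linear relations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson(xs, [2 * x + 1 for x in xs])   # increasing line
r_neg = pearson(xs, [-3 * x + 7 for x in xs])  # decreasing line
print(r_pos, r_neg)
```

The increasing line gives a coefficient of 1 and the decreasing line gives -1 (up to floating-point rounding), matching the two extremes described above.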
However, the type of correlation shown below cannot be exposed by a simple coefficient computation. CMSR employs advanced techniques to identify non-linear correlation. (Figure: Non-linear correlation)
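The limitation can be seen on a small example. The sketch below uses hypothetical data with y = x² over a range symmetric about zero, where the linear coefficient is zero even though y is fully determined by x. It then computes a binned correlation ratio (eta), a standard statistic chosen here purely for illustration; the page does not say which techniques CMSR uses, so this should not be read as CMSR's method. The helper names and the bin count are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (linear association only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlation_ratio(xs, ys, bins=6):
    """Correlation ratio (eta) of y given x, with x cut into equal-width bins.

    eta^2 = between-bin sum of squares / total sum of squares.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    groups = {}
    for x, y in zip(xs, ys):
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        groups.setdefault(idx, []).append(y)
    mean_y = sum(ys) / len(ys)
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - mean_y) ** 2 for g in groups.values()
    )
    return math.sqrt(between_ss / total_ss)

# Hypothetical data: a parabola sampled symmetrically around zero.
xs = [-3 + 0.25 * i for i in range(25)]
ys = [x * x for x in xs]

r = pearson(xs, ys)              # linear measure misses the relation
eta = correlation_ratio(xs, ys)  # binned measure picks it up
print(r, eta)
```

On this data the Pearson coefficient is zero while the correlation ratio is close to 1. The result of the binned measure depends on the number of bins, so it is a rough indicator and not a precise one.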
In addition, correlation coefficients cannot be computed directly from categorical variables. Normally, linearization techniques are used. (Figure: Non-linear categorical correlation)
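The page does not specify which linearization techniques are meant. One possible reading, sketched below as an assumption, is to replace each category with the mean of a numeric variable within that category and then compute the ordinary coefficient on the encoded values. The function names, category labels, and figures are all hypothetical.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def mean_encode(categories, values):
    """Map each category label to the mean of `values` within that category."""
    buckets = defaultdict(list)
    for cat, val in zip(categories, values):
        buckets[cat].append(val)
    means = {cat: sum(vals) / len(vals) for cat, vals in buckets.items()}
    return [means[cat] for cat in categories]

# Hypothetical data: a categorical variable and a numeric variable.
regions = ["north", "south", "east"] * 3
spend = [10.0, 20.0, 30.0, 12.0, 22.0, 28.0, 11.0, 18.0, 32.0]

encoded = mean_encode(regions, spend)  # categories become numbers
r_cat = pearson(encoded, spend)        # ordinary coefficient on encoded data
print(encoded[:3], r_cat)
```

Here the three regions are encoded as 11, 20, and 30, and the resulting coefficient is high because spend differs strongly between regions. Because the encoding is derived from the same numeric variable it is compared against, the value is never negative and tends to be optimistic on small samples.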
Correlation analysis is a feature of CMSR Data Miner, which can be downloaded from Data Mining Software.