CramerTree - Cramer Decision Tree

The quality of a predictive model is measured by its accuracy on unseen data. For decision trees, two factors can be measured: accuracy and the number of nodes. The latter is a very important factor for unseen data. For the same or similar accuracy, a smaller number of nodes means that the tree was constructed with more general splitting criteria, so it can work better with unseen data. Conversely, a higher number of nodes implies that the tree uses splitting variables with large numbers of values, which can have a negative impact on unseen data.

CramerTree uses Cramer coefficients as the node splitting criterion. In general, it produces the best trees as measured by both accuracy and number of nodes. Most node splitting criteria inherently favor splits with many branches, which tends to result in trees with a large number of nodes. The Cramer coefficient offsets this tendency by taking degrees of freedom into account, so it tends to produce the most compact decision trees. Less branching also means larger nodes, which translates to higher statistical support. Generally, this leads to better predictive accuracy on unseen data. We performed three experiments on personnel data as follows.

  • Test 1: Job category as target: multiple target values, i.e., five values.
  • Test 2: Gender as target: balanced binary target values.
  • Test 3: Race as target: skewed binary target values.
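
The Cramer coefficient used as the splitting criterion can be illustrated with a small sketch. It assumes the coefficient is the standard Cramér's V, i.e. the square root of the chi-square statistic divided by the sample size times the smaller of (rows - 1) and (columns - 1). The source does not give the exact formula CramerTree uses, and the function names and toy data below are hypothetical.

```python
from collections import Counter
from math import sqrt


def chi_square(values, targets):
    """Pearson chi-square statistic of the values-by-targets contingency table."""
    n = len(values)
    if n == 0 or n != len(targets):
        raise ValueError("values and targets must be non-empty and equally long")
    row_totals = Counter(values)
    col_totals = Counter(targets)
    observed = Counter(zip(values, targets))  # missing cells count as 0
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2


def cramers_v(values, targets):
    """Cramér's V (assumed form): chi-square scaled by n * min(rows - 1, cols - 1)."""
    k = min(len(set(values)) - 1, len(set(targets)) - 1)
    if k == 0:
        return 0.0  # a constant variable cannot split anything
    return sqrt(chi_square(values, targets) / (len(values) * k))


# Hypothetical toy data: a three-class target and two candidate split variables.
targets = ["A", "A", "B", "B", "C", "C"]
three_way = ["a", "a", "b", "b", "c", "c"]  # one branch per class
two_way = ["x", "x", "y", "y", "y", "y"]    # separates A from {B, C}

for name, split in [("three_way", three_way), ("two_way", two_way)]:
    print(name, round(chi_square(split, targets), 3), round(cramers_v(split, targets), 3))
```

On this toy data the raw chi-square statistic is larger for the three-branch split than for the two-branch split, while Cramér's V rates both the same, because each split is as strongly associated with the target as its number of branches allows.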

The following table shows the outcome of the experiments. Note that "(b)" denotes binary splits. The table shows that CramerTree in general produces trees with fewer nodes while maintaining high accuracy. This justifies the use of CramerTree as the default splitting criterion.

                        Test 1             Test 2             Test 3
Splitting criterion     accuracy%  nodes   accuracy%  nodes   accuracy%  nodes
CramerTree              81.8       74      89.7       46      85.7       55
CramerTree (b)          79.3       71      89.2       43      85.0       49
Entropy Gain Ratio      80.9       86      89.8       40      84.9       66
Entropy Gain Ratio (b)  81.3       95      89.9       41      85.3       67
Entropy                 80.3       86      90.0       60      85.6       60
Entropy (b)             81.9       83      90.6       39      85.5       63
X^2 statistics          79.9       85      89.7       46      85.7       54
X^2 statistics (b)      79.3       71      89.2       43      85.0       49
X^2 probability         81.6       87      89.7       52      86.3       47
X^2 probability (b)     80.4       77      89.1       47      85.0       49
GINI Index              80.2       82      89.7       46      85.7       55
GINI Index (b)          79.4       73      89.2       43      85.0       49
Twoing (b)              81.2       89      89.2       43      85.0       49
Expected Accuracy       79.5       75      87.1       63      83.8       56
Expected Accuracy (b)   78.2       79      86.0       31      82.9       67
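
To compare the criteria at a glance, the table figures can be aggregated across the three tests. The sketch below copies only the multi-way (non-binary) rows from the table; the choice of summary, mean accuracy and total node count, is an assumption made here for illustration and is not part of the original experiments.

```python
# Multi-way (non-binary) rows copied from the table:
# (accuracy %, nodes) for Test 1, Test 2 and Test 3.
results = {
    "CramerTree": [(81.8, 74), (89.7, 46), (85.7, 55)],
    "Entropy Gain Ratio": [(80.9, 86), (89.8, 40), (84.9, 66)],
    "Entropy": [(80.3, 86), (90.0, 60), (85.6, 60)],
    "X^2 statistics": [(79.9, 85), (89.7, 46), (85.7, 54)],
    "X^2 probability": [(81.6, 87), (89.7, 52), (86.3, 47)],
    "GINI Index": [(80.2, 82), (89.7, 46), (85.7, 55)],
    "Expected Accuracy": [(79.5, 75), (87.1, 63), (83.8, 56)],
}


def summarize(rows):
    """Return (mean accuracy, total nodes) over the three tests."""
    mean_accuracy = sum(acc for acc, _ in rows) / len(rows)
    total_nodes = sum(nodes for _, nodes in rows)
    return mean_accuracy, total_nodes


summary = {name: summarize(rows) for name, rows in results.items()}

# List the criteria from the most compact trees to the largest.
for name, (mean_accuracy, total_nodes) in sorted(summary.items(), key=lambda kv: kv[1][1]):
    print(f"{name:20s} mean accuracy {mean_accuracy:5.1f}%  total nodes {total_nodes}")
```

Summed over the three tests, the multi-way CramerTree rows use 175 nodes, the fewest of the seven criteria listed.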

For more, read decision trees and drill-down analysis.

For information about the software, please read Data Mining Software. The software can be downloaded from that page.

Applications of Decision Tree Classification Predictive Modeling

Find out how decision trees are used in:

  • Database Marketing
  • Targeted Marketing
  • Direct Marketing
  • Direct Mail Marketing
  • Credit Predictive Modeling and Credit Scoring
  • Insurance Predictive Modeling and Insurance Scoring