Top-down Drill-down Data analysis
Conventional techniques using statistical packages, OLAP pivot tables, and BI software
can be considered bottom-up approaches. In this approach, fields and their
values are examined manually, one by one, using statistical data visualization
and reporting tools that generally support only a few dimensions. This approach works
well as long as the numbers of variables and dimensions are small.
Combinational Factor analysis and Combinatorial Blowout!
The conventional bottom-up approach does not work well as the numbers of variables
and dimensions grow, because the number of combinations analysts must examine
grows combinatorially with them.
With a manual bottom-up approach, analysis becomes increasingly
difficult: analysts have to compare different combinations and compile the
results, which is a time-consuming and error-prone process. Beyond a certain
point, the approach is simply impractical. As a result, analysts normally do not perform
a systematic, thorough analysis, but only a partial one.
Note that many business and scientific datasets consist of dozens
of variables, many of them categorical variables that lend themselves to
dimensional data analysis. For example, customer survey data,
direct marketing customer records, and government census data
consist of numerous fields.
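To make the growth concrete, the short Python sketch below counts how many variable combinations of size one to three an analyst would face for a given number of variables, and how many cells a single cross-tabulation produces. The function names are illustrative only; the arithmetic is just binomial coefficients and products of category counts.

```python
from math import comb
from typing import Sequence


def count_variable_subsets(num_variables: int, max_order: int) -> int:
    """Number of distinct variable combinations of size 1..max_order."""
    return sum(comb(num_variables, k) for k in range(1, max_order + 1))


def count_cells(cardinalities: Sequence[int]) -> int:
    """Number of value cells in a full cross-tabulation of the given variables."""
    cells = 1
    for c in cardinalities:
        cells *= c
    return cells


if __name__ == "__main__":
    # Combinations of up to three variables an analyst might inspect by hand.
    for n in (5, 10, 20, 40):
        print(f"{n:>2} variables -> {count_variable_subsets(n, 3):>6} combinations")
    # One 3-way table over variables with 5 values each already has 125 cells.
    print(count_cells([5, 5, 5]))
```

With 40 variables there are already 10,700 combinations of up to three variables, before any of their individual values are examined.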
A better technology is needed!
Hotspot Drill-down Segmentation analysis
Hotspot analysis employs a drill-down analytic process using
Artificial Intelligence techniques such as search and
incremental learning. Analysis starts from the whole
population. Step by step, it generates hypotheses in all possible directions,
tests (or scores) them against the input data, and orders them by user-selected
scoring criteria.
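The drill-down loop described above can be sketched as a simple beam search. This is a minimal illustration under stated assumptions, not CMSR's actual algorithm: each hypothesis is assumed to be a conjunction of `field == value` conditions, the scoring criterion is assumed to be the proportion of records with a chosen target value (subject to a minimum segment size), and only the top `beam_width` segments are refined at each step. The names `hotspot_search`, `score`, and the sample `POLICIES` data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Segment = Tuple[Tuple[str, str], ...]  # conjunction of (field, value) conditions
Score = Tuple[float, int]              # (target rate, segment size)


def matches(record: Record, segment: Segment) -> bool:
    return all(record[field] == value for field, value in segment)


def score(records: List[Record], segment: Segment, target: str,
          positive: str, min_support: int) -> Optional[Score]:
    """Assumed criterion: share of records in the segment whose target == positive."""
    covered = [r for r in records if matches(r, segment)]
    if len(covered) < min_support:
        return None  # too small to count as a hotspot
    hits = sum(1 for r in covered if r[target] == positive)
    return hits / len(covered), len(covered)


def hotspot_search(records: List[Record], target: str, positive: str,
                   max_depth: int = 2, beam_width: int = 5,
                   min_support: int = 2) -> List[Tuple[Segment, Score]]:
    fields = [f for f in records[0] if f != target]
    values = {f: sorted({r[f] for r in records}) for f in fields}
    frontier: List[Segment] = [()]  # empty conjunction = whole population
    scored: Dict[Segment, Score] = {}
    for _ in range(max_depth):
        candidates: Dict[Segment, Score] = {}
        for seg in frontier:
            used = {f for f, _ in seg}
            # Hypotheses "in all directions": add one condition on any unused field.
            for f in fields:
                if f in used:
                    continue
                for v in values[f]:
                    new = tuple(sorted(seg + ((f, v),)))
                    if new in candidates or new in scored:
                        continue
                    s = score(records, new, target, positive, min_support)
                    if s is not None:
                        candidates[new] = s
        # Order hypotheses by rate, then by size (largest first).
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1][0], -kv[1][1]))
        scored.update(ranked)
        # Only the best-scoring segments are drilled into further (beam search).
        frontier = [seg for seg, _ in ranked[:beam_width]]
        if not frontier:
            break
    return sorted(scored.items(), key=lambda kv: (-kv[1][0], -kv[1][1]))


# Hypothetical insurance records.
POLICIES = [
    {"age": "young", "car": "sports", "claimed": "yes"},
    {"age": "young", "car": "sports", "claimed": "yes"},
    {"age": "young", "car": "sedan",  "claimed": "no"},
    {"age": "old",   "car": "sports", "claimed": "no"},
    {"age": "old",   "car": "sedan",  "claimed": "no"},
    {"age": "old",   "car": "sedan",  "claimed": "no"},
]

if __name__ == "__main__":
    for seg, (rate, n) in hotspot_search(POLICIES, "claimed", "yes")[:3]:
        print(" & ".join(f"{f}={v}" for f, v in seg), f"rate={rate:.2f}", f"n={n}")
```

Swapping in a different scoring function (for example, lift) changes which segments rise to the top, and the beam width bounds how many directions are explored at each level.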
This gives analysts an accurate map of the most interesting segments, i.e., hotspots.
The hotspot drill-down process is performed automatically by the system,
after which analysts can work in a top-down fashion. Hotspot search can first
be used to identify factors, properties, sub-populations, and so on, which
then serve as starting points for top-down data analysis.
The following figure shows an example of hotspot analysis output. The top-left panel is the hotspot
drill-down tree, the top-right panel shows detailed statistics for the selected hotspots,
and the bottom-left and bottom-right panels provide gains/lift factor analysis.
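The gains/lift panels are not reproduced here, but the underlying quantities can be sketched, assuming the usual definitions: lift is a segment's target rate divided by the overall target rate, and the gains curve is the cumulative share of positives captured when segments are taken in descending order of rate. CMSR's own charts may compute or display these differently, and the segment counts below are made up for illustration.

```python
from typing import List, Sequence, Tuple


def lift(segment_hits: int, segment_size: int, total_hits: int, total_size: int) -> float:
    """Segment target rate divided by the overall target rate."""
    return (segment_hits / segment_size) / (total_hits / total_size)


def cumulative_gains(segments: Sequence[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Gains curve points from (hits, size) segments.

    Segments are ordered by target rate, highest first; each point is
    (cumulative share of population, cumulative share of all hits).
    """
    total_hits = sum(h for h, _ in segments)
    total_size = sum(n for _, n in segments)
    ordered = sorted(segments, key=lambda s: s[0] / s[1], reverse=True)
    points = []
    cum_hits = cum_size = 0
    for hits, size in ordered:
        cum_hits += hits
        cum_size += size
        points.append((cum_size / total_size, cum_hits / total_hits))
    return points


if __name__ == "__main__":
    # Hypothetical segments: (claimed policies, policies in segment).
    segments = [(40, 100), (10, 200), (50, 200)]
    print(lift(40, 100, 100, 500))  # segment claim rate 0.40 vs overall 0.20
    print(cumulative_gains(segments))
```

Here the top segment covers 20% of policies but 40% of claims, a lift of 2.0.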
Hotspot analysis can also help the insurance industry develop profiles of risky insurance policies.
Hierarchical Drill-down Segmentation analysis
Another useful drill-down analysis is the
decision tree.
A decision tree repeatedly divides a population into smaller segments.
At each node, it selects a single variable whose values best boost the proportion
of the largest categorical value (the majority class) in each resulting segment.
If the population consists of insurance policies, each split tries to increase
the proportion of either never-claimed or claimed policies, which tends to
produce segments with higher proportions of claimed policies. The same reasoning applies
to other areas, e.g., credit, finance, direct-mail catalog responses,
customer churn, and so on. This type of segmentation is useful in many applications.
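The split-selection step can be sketched as follows. As an assumption, each candidate categorical variable splits a node into one segment per value, and variables are scored by the size-weighted proportion of the largest categorical value in each resulting segment, which mirrors the description above. Production decision trees, possibly including CMSR's, may use other criteria such as entropy-based gain or Gini impurity. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def majority_proportion(labels: List[str]) -> float:
    """Share of the most common target value among `labels`."""
    return max(Counter(labels).values()) / len(labels)


def split_score(records: List[Record], variable: str, target: str) -> float:
    """Size-weighted majority proportion over the segments produced by `variable`."""
    segments: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        segments[r[variable]].append(r[target])
    n = len(records)
    return sum(len(labels) / n * majority_proportion(labels)
               for labels in segments.values())


def best_split(records: List[Record], target: str) -> Tuple[str, Dict[str, float]]:
    """Pick the single variable whose values best boost the majority proportion."""
    candidates = [f for f in records[0] if f != target]
    scores = {f: split_score(records, f, target) for f in candidates}
    return max(scores, key=scores.__getitem__), scores


# Hypothetical insurance records.
POLICIES = [
    {"age": "young", "car": "sports", "claimed": "yes"},
    {"age": "young", "car": "sedan",  "claimed": "yes"},
    {"age": "young", "car": "sports", "claimed": "yes"},
    {"age": "old",   "car": "sports", "claimed": "no"},
    {"age": "old",   "car": "sedan",  "claimed": "no"},
    {"age": "old",   "car": "sedan",  "claimed": "yes"},
]

if __name__ == "__main__":
    variable, scores = best_split(POLICIES, "claimed")
    print(variable, {f: round(s, 3) for f, s in scores.items()})
```

On this toy data, splitting on `age` scores about 0.833, while `car` scores about 0.667, no better than the root node's own majority proportion of 4/6. Applying `best_split` again within each resulting segment gives the repeated division into smaller segments.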
The following figure shows the CMSR Data Miner decision tree
classifier module. Note that trees are drawn from left to right, which gives a compact
presentation of trees!
Node statistics are shown on the right-hand side. They include variable-selection criteria scores,
value distribution, and prediction value distribution.
In the insurance example, red represents the proportion of customers who have claimed and green
the proportion who have never claimed. A red node indicates that over 50% of the customers in
that segment have claims; a green node indicates that fewer than 50% do.
In addition to red nodes, green nodes with shorter green bars may be of interest,
since they represent relatively higher proportions of risky customers.
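The color convention can be expressed as a small rule. The sketch assumes that a node is red only when its claim share is strictly above 50%, and it uses the overall population claim rate as a hypothetical yardstick for a "shorter green bar"; neither threshold comes from CMSR itself. Node names and counts are invented for illustration.

```python
from typing import Dict, List, Tuple


def node_color(claimed: int, total: int) -> str:
    """Red if more than half of the segment has claims, otherwise green (assumed rule)."""
    return "red" if claimed / total > 0.5 else "green"


def nodes_of_interest(nodes: Dict[str, Tuple[int, int]],
                      overall_rate: float) -> List[Tuple[str, str, float]]:
    """Red nodes, plus green nodes whose claim rate exceeds the overall rate.

    A green node above the overall rate has a shorter green bar than the
    population as a whole, i.e. it is relatively risky.
    """
    flagged = []
    for name, (claimed, total) in nodes.items():
        rate = claimed / total
        color = node_color(claimed, total)
        if color == "red" or rate > overall_rate:
            flagged.append((name, color, rate))
    # Riskiest segments first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Hypothetical tree nodes: name -> (claimed policies, total policies).
NODES = {
    "all policies":   (100, 1000),
    "young & sports": (60, 100),
    "young & sedan":  (30, 150),
    "old":            (10, 750),
}

if __name__ == "__main__":
    claimed, total = NODES["all policies"]
    for name, color, rate in nodes_of_interest(NODES, claimed / total):
        print(f"{name:<15} {color:<6} {rate:.0%}")
```

With an overall claim rate of 10%, the sketch flags the red `young & sports` node and the green `young & sedan` node, whose 20% claim rate is under half but still twice the population average.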