Clustering and Segmentation
Segmentation is the process of grouping similar objects together into clusters,
which is why it is often referred to as clustering.
Objects within a cluster should be homogeneous, and different clusters should
ideally be heterogeneous.
The rationale of intragroup homogeneity is that objects with similar attributes
are likely to respond somewhat similarly to a given action.
This property has various uses both in business and in scientific research.
What are the problems with clustering techniques?
Most clustering techniques were developed for simple, laboratory-generated data
consisting of a handful of numerical variables. Applying these techniques to
business data, which often contain many complex categorical variables, runs into
various limitations, as described below:

Numerical variables and normalization
Most clustering techniques are based on distance calculations, and distance is
very sensitive to the ranges of variables. For example, "age" normally ranges from
0 to 100, whereas "salary" can range from 0 to 100,000. When both variables are
used together, differences in salary overwhelm differences in age.
Values therefore have to be normalized. However, the choice of normalization is
somewhat subjective, and no transformation is entirely free of bias.
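A minimal sketch of the problem and of two common normalizations (min-max scaling and z-scores), assuming a small hypothetical table of (age, salary) records. The function names are illustrative only and do not come from CMSR or any library.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Rescale every column to [0, 1] using that column's observed min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def z_score_scale(rows):
    """Center every column on its mean and divide by its population standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, mus, sds)]
        for row in rows
    ]


# Hypothetical customers as (age, salary) pairs.
customers = [[25, 30_000], [65, 31_000], [26, 90_000]]

# Raw distance: a 40-year age gap is swamped by a 1,000 salary gap (about 1000.8).
print(euclidean(customers[0], customers[1]))

# After min-max scaling, both gaps are measured on a comparable 0..1 scale.
scaled = min_max_scale(customers)
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```

Min-max scaling depends on the observed extremes, so a single extreme value compresses every other value in its column; this is one concrete reason the choice of normalization is subjective.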

Outliers and numerical variables
Outliers in numerical variables also cause problems in data mining,
especially for clustering based on distance calculations. In such systems,
outliers should be identified and removed before mining. (In fact, removing
outliers is recommended for all data mining techniques.)
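One common way to identify numerical outliers is Tukey's fences, which flag values lying more than 1.5 interquartile ranges outside the first and third quartiles. The sketch below assumes that rule (the source does not name a specific method) and a hypothetical salary column; `tukey_fences` and `remove_outliers` are illustrative names.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(records, field, k=1.5):
    """Keep only records whose `field` value lies inside the Tukey fences."""
    low, high = tukey_fences([r[field] for r in records], k)
    return [r for r in records if low <= r[field] <= high]


# Hypothetical salary column with one extreme value.
salaries = [30_000, 31_000, 32_000, 33_000, 34_000, 35_000, 36_000, 2_500_000]
people = [{"id": i, "salary": s} for i, s in enumerate(salaries)]
kept = remove_outliers(people, "salary")
print([p["salary"] for p in kept])  # the 2,500,000 record is dropped
```

The multiplier `k` is itself a judgment call; larger values remove fewer records.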

Categorical variables and binary variable encoding
Dealing with categorical variables (also called nonnumeric data, nonnumeric variables,
categorical data, nominal data, or nominal variables) is much more problematic.
Normally, we use "one-of-N" or "thermometer" encoding, both of which transform each
categorical value into a true/false binary variable. This can introduce extra
bias that depends on how many values each categorical variable has. It can also
significantly increase the total number of variables, which in turn reduces the
effectiveness of many clustering techniques. For more, read the section
"Why k-means clustering does not work well with business data?".
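The sketch below shows both encodings on two hypothetical variables and how they interact with distance. The thermometer convention used here (set every column up to and including the value's rank) is one common choice, assumed for illustration; other variants drop the first column.

```python
import math


def one_of_n(value, categories):
    """One-of-N: one binary column per category, exactly one of them set to 1."""
    return [1.0 if value == c else 0.0 for c in categories]


def thermometer(value, ordered_categories):
    """Thermometer (one common convention): set columns 0..rank of the value to 1."""
    rank = ordered_categories.index(value)
    return [1.0 if i <= rank else 0.0 for i in range(len(ordered_categories))]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical categorical variables.
regions = ["north", "south", "east", "west", "central"]   # nominal
income_bands = ["low", "medium", "high", "very high"]     # ordinal

a = one_of_n("north", regions) + thermometer("low", income_bands)
b = one_of_n("south", regions) + thermometer("very high", income_bands)

print(len(a))           # 9 binary columns now stand in for 2 original variables
print(euclidean(a, b))  # region mismatch adds 2 to the squared distance, income gap adds 3
```

Any two different regions contribute the same fixed amount to the distance no matter how many regions exist, while each extra category adds another column; both effects are sources of the encoding bias described above.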

Clustering variable selection and weighting
Selecting clustering variables is another problem, because the choice of variables
largely determines the clustering results. A commonly used method is to assign
different weights to variables and categorical values. However, choosing the weights
is itself a problematic process: when many variables and categorical values are
involved, it is practically impossible to find the weights that yield the best-quality
clustering.
For clustering variable selection methods, read
Variable & value link analysis.
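To see why weighting matters, consider a weighted Euclidean distance on hypothetical, already-scaled records. Changing a single weight changes which record counts as the nearest neighbor, and therefore which cluster a record would join. This is a generic illustration, not CMSR's weighting scheme.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance where each squared difference is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest(query, candidates, weights):
    """Index of the candidate closest to `query` under the given weights."""
    return min(range(len(candidates)),
               key=lambda i: weighted_distance(query, candidates[i], weights))


# Hypothetical records, already scaled to 0..1: (age, salary, tenure).
query = [0.2, 0.5, 0.9]
candidates = [[0.2, 0.9, 0.9],   # differs only in salary
              [0.6, 0.5, 0.5]]   # differs in age and tenure

print(nearest(query, candidates, [1, 1, 1]))  # equal weights: candidate 0 is nearer
print(nearest(query, candidates, [1, 4, 1]))  # salary weighted 4x: candidate 1 is nearer
```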

Behavioral modeling on time-variant variables
Capturing and modeling the patterns (or behaviors) hidden in time-varying variables
is another difficult problem. In database marketing, it is desirable to build
segmentation models, as predictive models, from data on previous marketing campaigns,
and then to run new campaigns by applying the same models to current customer
information. Most clustering techniques lack this predictive modeling capability.
Lost in Translation?
The transformations described above can obscure patterns hidden in the data. Generally speaking,
transformation changes information, so patterns discovered from transformed
data may not represent the genuine patterns. At the very least, they are less
accurate and precise. For optimal segmentation, you need clustering
tools that do not require extensive data transformation.
Why k-means clustering does not work well with business data?
K-means clustering suffers from all the problems described above. It is suitable only
for simple numerical data, especially clean, laboratory-generated scientific data.
Business data, such as CRM data and direct marketing customer information, generally
consist of many categorical variables with complex taxonomic domain structures, and
they often contain noisy information. For such data, k-means is simply not suitable
for accurate clustering. Better technology is needed!

Self-Organizing Maps (SOM) and Competitive Learning
Self-Organizing Maps (SOM), also known as Kohonen feature maps, were
developed to simulate the way vision systems work in the brain.
The maps constructed by SOM are very useful for clustering data,
because SOM can automatically learn the patterns present in the data.
SOM is a neural network model, and neural networks do not suffer greatly
from the limitations discussed above. SOM uses competitive learning
to train the network (that is, to learn patterns). This is often called
the "winner takes all" strategy, since nodes compete among themselves
to display the strongest activation for a given data record.
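A minimal sketch of a textbook Kohonen SOM on a 9 x 9 grid, assuming purely numeric input vectors, linearly decaying learning rate and neighborhood radius, and a Gaussian neighborhood. This is not CMSR's implementation. The winner is chosen by competition as described above; as in the standard Kohonen rule, the winner's grid neighbors are also pulled toward the input, which is what places similar records in nearby cells.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competition: the grid cell whose prototype is closest to x wins."""
    return min(weights, key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)))


def train_som(data, rows=9, cols=9, epochs=20, lr0=0.5, seed=0):
    """Minimal Kohonen SOM on numeric vectors; returns {(row, col): prototype}."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Initialise each cell's prototype with a copy of a random training record.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Cooperation: Gaussian falloff with grid distance from the winner.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for i, v in enumerate(x):
                    w[i] += lr * h * (v - w[i])
            step += 1
    return weights


def assign_cells(weights, data):
    """Map each record index to its winning cell; unused cells simply stay empty."""
    members = {}
    for idx, x in enumerate(data):
        members.setdefault(best_matching_unit(weights, x), []).append(idx)
    return members


# Hypothetical 2-D data: two well-separated groups of 30 points each.
gen = random.Random(1)
data = [(gen.uniform(-0.05, 0.05), gen.uniform(-0.05, 0.05)) for _ in range(30)]
data += [(1 + gen.uniform(-0.05, 0.05), 1 + gen.uniform(-0.05, 0.05)) for _ in range(30)]
som = train_som(data)
cells = assign_cells(som, data)
print(len(cells), "of 81 cells have members")
```

The grid size, decay schedules and seed are illustrative assumptions; categorical inputs would first need an encoding such as one-of-N.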
Neural Clustering
CMSR provides a neural clustering procedure based on SOM.
Neural clustering is best explained with the figures shown below.
In the figures, objects are placed into the cells of a two-dimensional grid
of 81 cells (9 rows by 9 columns). Note that some cells have no members.
Each cell contains the most similar objects, i.e., objects that share many
properties. Objects in neighboring (or nearby) cells are also similar, so a
short distance between cells indicates a high degree of similarity.
Neural clustering offers the following advantages:
- It does not suffer greatly from the limitations of other techniques.
- No extensive data preparation effort is required.
- Clusters are organized to show how close their objects are to objects in other clusters.
- It provides rich visualization, as shown in "Segment Analysis".
- Most importantly, it provides robust pattern detection and clustering.
Segment Analysis
In the left figure, the colored circles are pie charts showing the distribution
of gender-and-race combinations in each cell. Notice that every cell contains objects
of a single type, and that cells containing the same type of objects are grouped
together: an example of perfect clustering! The middle figure shows histograms of a
numerical variable; notice that nearby cells have similar distributions.
The right figure shows all-in-one distribution charts for a specific cell segment.
Neural clustering detects patterns robustly and organizes them in a way that
supports powerful cluster visualization, as the figures above show.
This is extremely useful for marketing and business data.
The following is another example of neural clustering, this time
based on two numerical variables. This type of clustering is common
in scientific research. Notice how well neural clustering
works with both numerical and categorical data.
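The per-cell pie charts described above summarize, for each cell, how a categorical attribute is distributed among the cell's members. The sketch computes those shares plus a simple purity score (1.0 when every cell holds a single type, as in the "perfect clustering" case). The cell assignments and label strings are hypothetical, and purity is a generic measure chosen for illustration rather than a CMSR statistic.

```python
from collections import Counter, defaultdict


def cell_counts(cell_of, labels):
    """Count label occurrences per cell; cell_of[i] is the cell of record i."""
    counts = defaultdict(Counter)
    for cell, label in zip(cell_of, labels):
        counts[cell][label] += 1
    return counts


def cell_distributions(cell_of, labels):
    """Per-cell label shares: the numbers behind a per-cell pie chart."""
    return {
        cell: {label: n / sum(c.values()) for label, n in c.items()}
        for cell, c in cell_counts(cell_of, labels).items()
    }


def purity(cell_of, labels):
    """Fraction of records carrying their cell's majority label (1.0 = every cell single-typed)."""
    counts = cell_counts(cell_of, labels)
    return sum(c.most_common(1)[0][1] for c in counts.values()) / len(labels)


# Hypothetical cell assignments and gender/race combinations for five records.
cell_of = [(0, 0), (0, 0), (0, 1), (0, 1), (0, 1)]
labels = ["F/group1", "F/group1", "M/group2", "M/group2", "F/group1"]
print(cell_distributions(cell_of, labels))
print(purity(cell_of, labels))
```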
Statistical Predictive Segmentation Modeling
Generally, clustering tools do only one thing: group similar objects
in a given dataset. CMSR neural clustering, by contrast, is a segmentation
modeling system. It builds segmentation models, which can then segment not only
the dataset used for network training but also other datasets
stored in database systems. The models can also be used for predictive modeling
of statistical inferences: probabilities, averages, and
classification.
In fact, CMSR uses predictive segmentation modeling to perform data
segmentation! The network is trained to learn the clustering patterns as well as the
statistics of the resulting segments. The network is then applied to datasets for
segmentation and statistical value prediction.
CMSR predictive segmentation modeling is based on a variant of
Radial Basis Functions (RBF) that allows flexible use of models.
The most prominent use of segmentation modeling is
behavioral modeling on time-variant variables. It can also be
used as an alternative to standard predictive modeling methods.
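The train-then-apply cycle described above can be sketched as a nearest-prototype model that carries per-segment statistics. This is a minimal sketch under stated assumptions, not CMSR's RBF-based method: the segment prototypes are taken as given (for example, from a trained map), records are assigned to segments by plain Euclidean distance, and the statistic names `p_response` and `avg_amount` are hypothetical.

```python
import math
from collections import defaultdict


def nearest_segment(prototypes, x):
    """Segment whose prototype vector is closest (Euclidean) to record x."""
    return min(prototypes, key=lambda seg: math.dist(prototypes[seg], x))


def fit_segment_stats(prototypes, records, responded, amounts):
    """Training step: segment the records, then store each segment's statistics."""
    groups = defaultdict(list)
    for x, r, a in zip(records, responded, amounts):
        groups[nearest_segment(prototypes, x)].append((r, a))
    return {
        seg: {
            "size": len(rows),
            "p_response": sum(r for r, _ in rows) / len(rows),  # probability
            "avg_amount": sum(a for _, a in rows) / len(rows),  # average
        }
        for seg, rows in groups.items()
    }


def apply_model(prototypes, stats, new_records):
    """Apply the model to another dataset: segment label plus predicted statistics.
    Segments that received no training records get None."""
    return [(seg, stats.get(seg))
            for seg in (nearest_segment(prototypes, x) for x in new_records)]


# Hypothetical prototypes (e.g. from a trained map) and a tiny training set.
prototypes = {"A": (0.0, 0.0), "B": (1.0, 1.0)}
train = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9)]
stats = fit_segment_stats(prototypes, train,
                          responded=[1, 0, 0, 0], amounts=[100.0, 0.0, 0.0, 0.0])
print(apply_model(prototypes, stats, [(0.05, 0.05), (0.95, 0.95)]))
```

Every new record in a segment receives the same predicted probability and average, which is exactly the "statistical aggregate" character of segmentation modeling.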
Behavioral modeling on time-variant variables
The very idea behind clustering on similar attributes is that
people (or objects) with similar properties tend to exhibit similar behaviors.
It is important to note that attribute values, and hence similarity,
may change over time, which makes clustering on time-varying variables
challenging for other techniques. In RFM database marketing, for example,
it is common practice to segment customers by "recency", "frequency",
and "monetary value". Once a marketing campaign is completed, these values
change, so modeling on either the previous values alone or the current values
alone will not produce optimal results. It is better to develop segmentation
models from the values at the time of previous campaigns, and then to segment
customers by applying those models to their current values (that is, the values
at the time of catalog mailing). The result is a better segmentation with more
predictive power!
For more, read RFM Segmentation Marketing.
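The key point above is that the same customer has different RFM values depending on the reference date. The sketch computes recency (days since the last purchase), frequency (number of purchases) and monetary value (total spent) as of a given date, assuming a hypothetical transaction log of (customer, date, amount) tuples. `training_rows` is a hypothetical helper that labels RFM values taken at a past campaign date with the response to that campaign, which is the kind of training set the text recommends.

```python
from datetime import date


def rfm(transactions, customer, as_of):
    """(recency_days, frequency, monetary) for one customer, using only
    transactions dated strictly before `as_of`; None if there are none."""
    own = [(d, amt) for cust, d, amt in transactions if cust == customer and d < as_of]
    if not own:
        return None
    last = max(d for d, _ in own)
    return (as_of - last).days, len(own), sum(amt for _, amt in own)


def training_rows(transactions, responders, customers, campaign_date):
    """Hypothetical helper: RFM as of a past campaign, labelled by the response to it."""
    rows = []
    for c in customers:
        values = rfm(transactions, c, campaign_date)
        if values is not None:
            rows.append((c, values, c in responders))
    return rows


# Hypothetical transaction log: (customer, date, amount).
log = [
    ("c1", date(2023, 1, 10), 50.0),
    ("c1", date(2023, 5, 2), 20.0),
    ("c1", date(2023, 9, 15), 80.0),
]
print(rfm(log, "c1", date(2023, 6, 1)))  # → (30, 2, 70.0), values at the previous campaign
print(rfm(log, "c1", date(2024, 1, 1)))  # → (108, 3, 150.0), values now, for applying the model
```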
Predictive Modeling vs Statistical Predictive Modeling
The main difference between predictive modeling and statistical predictive segmentation modeling
lies in the kinds of values they predict. In a nutshell, segmentation modeling
predicts statistical aggregates such as probabilities and averages: segmentation
induces partitions, each of which may contain multiple instances from which statistical
aggregates can be inferred. Predictive modeling, on the other hand, is not designed to
predict aggregate values. Segmentation modeling is therefore useful when the events
of interest occur with very low frequency, making standard predictive modeling
unsuitable.
Such events are best inferred as probabilities or averages attached to
segments. Typical examples include
catalog mail marketing,
insurance scoring, and
credit scoring.
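A small illustration of why attached probabilities help with rare events, assuming hypothetical per-segment response rates for a catalog mailing. A classifier-style 0.5 cutoff labels every segment a non-responder, while ranking segments by probability under a mailing budget still picks out the most responsive ones. The greedy budget rule is an illustrative choice, not a method taken from the source.

```python
def select_by_threshold(segment_probs, threshold=0.5):
    """Classifier-style decision: keep segments whose probability clears the cutoff."""
    return [seg for seg, p in segment_probs.items() if p >= threshold]


def select_for_budget(segment_probs, segment_sizes, budget):
    """Mail the most responsive segments first, skipping any that would exceed the budget."""
    chosen, used = [], 0
    for seg in sorted(segment_probs, key=segment_probs.get, reverse=True):
        if used + segment_sizes[seg] <= budget:
            chosen.append(seg)
            used += segment_sizes[seg]
    return chosen


def expected_responses(segment_probs, segment_sizes, segments):
    """Expected number of responders if every member of `segments` is mailed."""
    return sum(segment_probs[s] * segment_sizes[s] for s in segments)


# Hypothetical segment response rates for a rare event (a catalog response).
probs = {"s1": 0.08, "s2": 0.03, "s3": 0.01}
sizes = {"s1": 1000, "s2": 2000, "s3": 5000}

print(select_by_threshold(probs))  # [] : every segment looks like "non-responder"
chosen = select_for_budget(probs, sizes, budget=3000)
print(chosen, expected_responses(probs, sizes, chosen))
```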

Segmentation Variable Selection Methods
Although neural clustering can adjust variable weights automatically, it is often
desirable to work only with variables of significant importance. Limiting the model
to such variables can produce segments with simple, clean profiles.
Careful selection of segmentation variables with respect to the target is essential
in predictive segmentation modeling. Unlike standard predictive modeling,
predictive segmentation modeling relies on the modeler's manual selection of
predictive variables; without careful selection, segmentation may not induce models
with predictive power.
Identifying significant variables can be very difficult without proper tools. CMSR
link analysis and predictive neural networks can be used to analyze the
significance of variables.
Selecting segmentation variables with link analysis and/or a predictive neural
network helps ensure that the segmentation results will have predictive power.
For more on segmentation variable selection methods,
read Link Analysis.
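CMSR's link analysis is not described in this section, so the sketch below swaps in a different, well-known measure of variable significance: the mutual information (in bits) between each categorical variable and the target. It assumes records are dictionaries of categorical values and that `resp` is a hypothetical target field; treat it as a stand-in for ranking candidate segmentation variables, not as CMSR's method.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of categorical values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_variables(records, target):
    """Rank every non-target field by its mutual information with the target."""
    ys = [r[target] for r in records]
    scores = {var: mutual_information([r[var] for r in records], ys)
              for var in records[0] if var != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records: region fully determines response, color is unrelated.
records = [
    {"region": "N", "color": "red", "resp": "yes"},
    {"region": "N", "color": "blue", "resp": "yes"},
    {"region": "S", "color": "red", "resp": "no"},
    {"region": "S", "color": "blue", "resp": "no"},
]
print(rank_variables(records, "resp"))  # region scores 1 bit, color scores 0
```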
Similar objects identification: k-nearest neighbors
k-nearest neighbors search suffers from the same problems as other
clustering techniques. Neural clustering can be used to overcome these
limitations. Because neural clustering organizes objects into cells that
are arranged by similarity, cell distance can be used to identify
objects similar to a given object. For example, objects can be
segmented into a large number of cells (say, 1,000). The objects
in the cell containing a given object, and in nearby cells,
may then be selected.
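A minimal sketch of the cell-based neighbor lookup described above, assuming each object has already been assigned to a (row, col) cell by a trained map. Cell distance is measured here as Chebyshev distance, so the 8 surrounding cells count as distance 1; that choice, the function names and the sample assignments are all illustrative assumptions.

```python
def grid_distance(a, b):
    """Chebyshev distance between two (row, col) cells: 1 for all 8 surrounding cells."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def similar_objects(cell_of, target, radius=1):
    """Objects in the target's cell or in cells within `radius` of it (target excluded)."""
    home = cell_of[target]
    return sorted(o for o, cell in cell_of.items()
                  if o != target and grid_distance(cell, home) <= radius)


def k_similar(cell_of, target, k):
    """The k objects whose cells are closest to the target's cell (ties broken by id)."""
    home = cell_of[target]
    others = [o for o in cell_of if o != target]
    others.sort(key=lambda o: (grid_distance(cell_of[o], home), o))
    return others[:k]


# Hypothetical object-to-cell assignments from a trained map.
cell_of = {"a": (5, 5), "b": (5, 5), "c": (5, 6), "d": (7, 7), "e": (0, 0)}
print(similar_objects(cell_of, "a"))  # ['b', 'c']
print(k_similar(cell_of, "a", 3))     # ['b', 'c', 'd']
```

The lookup trades exact attribute distances for a cheap grid comparison; objects that share a cell are treated as equally similar unless their attribute distances are compared as well.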

To find out how clustering is used in practice, read:
- Market Segmentation
- Geographic Segmentation
- Customer Segmentation
- Database Marketing
- Direct Marketing
- Direct Mail Marketing
Data Mining Tools for Segmentation
CMSR Data Miner supports segmentation tools based on neural networks.
For software information and downloads, please read
CMSR Data Miner.

