Clustering and Segmentation

Segmentation is the process of grouping similar objects together into clusters, which is why it is often referred to as clustering. Ideally, clusters are homogeneous within and heterogeneous between one another. The rationale for intra-group homogeneity is that objects with similar attributes are likely to respond similarly to a given action. This property is useful both in business and in scientific research.

What are the problems with clustering techniques?

Most clustering techniques were developed for simple, laboratory-generated data consisting of a few numerical variables. Applying them to business data, which contain many complex categorical variables, runs into several limitations, as described below:

• Numerical variables and normalization

Most clustering techniques are based on distance calculation, and distance is very sensitive to the ranges of the variables. For example, "age" normally ranges from 0 to 100, whereas "salary" can spread from 0 to 100,000. When both variables are used together, the salary component of the distance can overwhelm the age component. Values therefore have to be normalized. However, normalization is a rather subjective step: there is no way to transform the values without introducing some bias.
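The effect is easy to see with a small calculation. The sketch below uses three made-up customers and min-max scaling, which is only one of several possible normalizations; the function names and the data are illustrative assumptions, not part of any particular tool. Before scaling, the salary gap alone decides which customers look close. After scaling, a 40-year age gap counts about as much as a 40,000 salary gap.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's own min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical customers: (age, salary)
customers = [(25, 50_000), (65, 50_500), (26, 90_000)]

# Raw distances: customer 0 looks far closer to customer 1 (40 years older)
# than to customer 2 (1 year older), purely because of the salary scale.
raw_01 = euclidean(customers[0], customers[1])
raw_02 = euclidean(customers[0], customers[2])

# After scaling, both variables contribute on the same 0..1 range.
scaled = min_max_scale(customers)
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"scaled: d(0,1)={scaled_01:.3f}  d(0,2)={scaled_02:.3f}")
```

A different normalization, such as z-scores, would produce different distances again, which is exactly the subjectivity described above.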

• Outliers and numerical variables

Outliers in numerical variables also create problems in data mining, especially for clustering based on distance calculations. In such systems, outliers should be identified and removed before mining. (Removing outliers is recommended for data mining techniques in general!)
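One common way to identify outliers in a single numerical variable is Tukey's interquartile-range rule. The sketch below is a minimal illustration with made-up salary values; the 1.5 multiplier is the conventional default and, like the function name, is an assumption rather than something this article prescribes.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical salaries with one extreme value
salaries = [32_000, 35_000, 38_000, 41_000, 44_000, 47_000, 50_000, 900_000]

flagged = iqr_outliers(salaries)
kept = [v for v in salaries if v not in flagged]

print("flagged:", flagged)
print("kept:   ", kept)
```

Removing the flagged value before normalization matters: with min-max scaling, a single extreme salary would otherwise squeeze every other customer into a tiny part of the 0 to 1 range.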

• Categorical variables and binary variable encoding

Dealing with categorical variables (also called non-numeric data, non-numeric variables, categorical data, nominal data, or nominal variables) is much more problematic. Normally, "one-of-N" or "thermometer" encoding is used. Both encodings transform each categorical value into a true-false binary variable, so a variable with many values contributes many binary variables, which can introduce extra bias. It can also significantly increase the total number of variables, which in turn decreases the effectiveness of many clustering techniques. For more, read the section "Why k-means clustering does not work well with business data?".
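The two encodings can be sketched in a few lines. The variables and category lists below are hypothetical, and the thermometer convention shown (one flag per level, set up to and including the value's level) is one common variant; some descriptions use one flag fewer.

```python
def one_of_n(value, categories):
    """One true/false flag per category; exactly one flag is set."""
    return [1 if value == c else 0 for c in categories]

def thermometer(value, ordered_levels):
    """Flags are set for every level up to and including the value's level."""
    rank = ordered_levels.index(value)
    return [1 if i <= rank else 0 for i in range(len(ordered_levels))]

regions = ["north", "south", "east", "west"]        # hypothetical nominal variable
education = ["primary", "secondary", "college"]     # hypothetical ordered variable

# A record with two categorical variables becomes seven binary variables.
record = {"region": "east", "education": "secondary"}
encoded = one_of_n(record["region"], regions) + thermometer(record["education"], education)

print(one_of_n("east", regions))             # [0, 0, 1, 0]
print(thermometer("secondary", education))   # [1, 1, 0]
print(len(encoded), "binary variables")
```

With realistic business data, where a single variable such as product category may have hundreds of values, the number of binary variables grows accordingly.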

• Clustering variable selections and weighting

Clustering variable selection is another problem, because the choice of variables largely determines the clustering results. A commonly used method is to assign different weights to variables and categorical values. However, choosing the weights is itself a problematic process. When many variables and categorical values are involved, it is practically impossible to arrive at the best-quality clustering. For clustering variable selection methods, read Variable & value link analysis.
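To illustrate how strongly weights influence the outcome, the sketch below applies a weighted Euclidean distance to made-up, already-normalized (age, salary) values. The names and numbers are assumptions chosen for illustration. The same target customer has a different nearest neighbor depending only on which variable is weighted more heavily.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-variable weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest(target, candidates, weights):
    """Name of the candidate closest to target under the given weights."""
    return min(candidates,
               key=lambda name: weighted_distance(target, candidates[name], weights))

# Hypothetical normalized values: (age, salary)
target = (0.30, 0.50)
candidates = {
    "A": (0.35, 0.90),   # similar age, very different salary
    "B": (0.70, 0.52),   # similar salary, very different age
}

age_heavy = nearest(target, candidates, weights=(4, 1))
salary_heavy = nearest(target, candidates, weights=(1, 4))

print("age weighted more:   ", age_heavy)      # A
print("salary weighted more:", salary_heavy)   # B
```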

• Behavioral modeling on time-variant variables

Capturing and modeling the patterns (or behaviors) hidden in time-varying variables is another difficult problem. In database marketing, it is desirable to build segmentation models, as predictive models, from customer data as it stood at previous marketing campaigns, and then to execute new campaigns by applying the same models to current customer information. Most clustering techniques do not possess this predictive modeling capability.

Lost in Translation?

The transformation processes described above obscure the patterns hidden in data. Generally speaking, transformation changes information, so patterns discovered in the transformed data may not represent the genuine patterns in the original data. At the very least, they will be less accurate and less precise. For optimum segmentation, you need clustering tools that do not require extensive data transformation.

Why k-means clustering does not work well with business data?

K-means clustering suffers from all the problems described above. It is suitable only for simple numerical data, especially clean, laboratory-generated scientific data. Business data, by contrast, generally consist of many categorical variables with complex taxonomic domain structures, and they generally contain noisy information. CRM data and direct marketing customer information are typical examples. K-means is simply not suitable for accurate clustering of such data. Better technology is needed!

Self Organizing Maps (SOM) and Competitive Learning

Self Organizing Maps (SOM), also known as Kohonen Feature Maps, were developed to simulate the way that vision systems work in our brain. The organizations constructed by a SOM are very useful for clustering data, because a SOM can automatically learn the patterns present in the data. SOM is a type of neural network, and neural networks do not suffer greatly from the limitations discussed above. SOM uses competitive learning to train the network (that is, to learn patterns). This is often referred to as a "winner takes all" strategy, since nodes compete among themselves to display the strongest activation for a given input.
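The competitive learning step can be sketched as follows. This is a generic textbook-style SOM written in plain Python, not CMSR's implementation; the 3 x 3 grid, the linearly decaying learning rate, the Gaussian neighborhood, and the two made-up groups of points are all assumptions chosen to keep the example small.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid cell whose weight vector is closest to x (the 'winner')."""
    return min(weights,
               key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)))

def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    # One randomly initialized weight vector per grid cell.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        lr = lr0 * decay                        # learning rate shrinks over time
        radius = max(radius0 * decay, 0.5)      # so does the neighborhood
        for x in rng.sample(data, len(data)):   # shuffled pass over the data
            win_r, win_c = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - win_r) ** 2 + (c - win_c) ** 2
                pull = lr * math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (x[i] - w[i])   # move toward the input
    return weights

# Two hypothetical, well-separated groups of 2-D points.
rng = random.Random(1)
group_a = [(rng.gauss(0.1, 0.03), rng.gauss(0.1, 0.03)) for _ in range(20)]
group_b = [(rng.gauss(0.9, 0.03), rng.gauss(0.9, 0.03)) for _ in range(20)]
som = train_som(group_a + group_b, rows=3, cols=3)

cell_a = best_matching_unit(som, (0.1, 0.1))
cell_b = best_matching_unit(som, (0.9, 0.9))
print("group A maps to cell", cell_a, "and group B to cell", cell_b)
```

Note that the winner is not the only node updated: its grid neighbors are pulled toward the input as well, with a strength that falls off with grid distance. That neighborhood update is what makes nearby cells end up holding similar objects.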

Neural Clustering

CMSR provides a neural clustering procedure that is based on SOM. Neural clustering is best explained with the figures shown below. In the figures, objects are placed into the cells of a two-dimensional grid. There are 81 cells, arranged in 9 rows and 9 columns. Note that some cells are empty, with no members. Each cell contains the objects that are most similar to one another, i.e., that share many properties. Objects in neighboring (or nearby) cells are also similar in nature: the closer two cells are, the higher the degree of similarity. Neural clustering exhibits the following advantages:

• It does not suffer greatly from the limitations of other techniques.
• No extensive data preparation effort is required.
• Clusters are organized in a way that shows how close their objects are to objects in other clusters.
• It provides rich visualization, as shown in "Segment Analysis".
• Most importantly, it offers robust pattern detection and clustering.

Segment Analysis

In the left figure, the colored circles are pie charts representing the distribution of gender and race combinations. Notice that every cell contains objects of a single type. Furthermore, nearby cells holding the same type of objects are clustered together: an example of perfect clustering! The middle figure shows histograms for a numerical variable; you will notice that nearby cells have similar distributions. The right figure shows all-in-one distribution charts for a specific cell segment.

Neural clustering is robust in detecting patterns and organizes them in a way that provides powerful cluster visualization, as shown in the above figures. This is extremely useful with marketing and business data. The following is another example of neural clustering, this time based on two numerical variables. This type of clustering is common in scientific research. Notice how well neural clustering works with both numerical and categorical data.

Statistical Predictive Segmentation Modeling

Generally, clustering tools do only one thing: they group together the similar objects in a given dataset. CMSR neural clustering, by contrast, is a segmentation modeling system. It builds segmentation models, and those models can then be used to segment not only the dataset used for network training but also other datasets stored in database systems. In addition, the models can be used for predictive modeling of statistical inferences: probabilities, averages, and classification. As a matter of fact, CMSR uses predictive segmentation modeling to perform data segmentation itself! The network is trained to learn the clustering patterns as well as the statistics of the resulting segments. The trained network is then applied to datasets for segmentation and statistical value prediction. CMSR predictive segmentation modeling is based on a variant form of Radial Basis Functions (RBF) that allows flexible use of models. The most prominent use of segmentation modeling is behavioral modeling on time-variant variables. It can also be used as an alternative to standard predictive modeling methods.
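The general idea of a segmentation model that also carries statistics can be sketched as follows. This is not CMSR's RBF-based method: it is a deliberately simple stand-in that assigns each record to the nearest of two fixed, hypothetical segment centers, stores a response rate and an average spend per segment, and then reuses both on a record that was not in the training data. All names and values are assumptions.

```python
from collections import defaultdict

def assign(prototypes, x):
    """Index of the prototype (segment center) nearest to x."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i], x)))

def fit_segment_stats(prototypes, records):
    """records: (features, responded 0/1, spend). Returns per-segment aggregates."""
    groups = defaultdict(list)
    for features, responded, spend in records:
        groups[assign(prototypes, features)].append((responded, spend))
    return {
        seg: {
            "n": len(rows),
            "response_rate": sum(r for r, _ in rows) / len(rows),
            "avg_spend": sum(s for _, s in rows) / len(rows),
        }
        for seg, rows in groups.items()
    }

def predict(prototypes, stats, features):
    """Segment a new record and return the statistics learned for its segment."""
    return stats.get(assign(prototypes, features))

# Hypothetical segment centers, standing in for a trained network's nodes.
prototypes = [(0.2, 0.2), (0.8, 0.8)]

training = [
    ((0.10, 0.20), 0, 0.0), ((0.25, 0.15), 0, 0.0),
    ((0.20, 0.30), 1, 40.0), ((0.15, 0.25), 0, 0.0),
    ((0.90, 0.80), 1, 120.0), ((0.70, 0.85), 1, 80.0),
    ((0.80, 0.75), 0, 0.0), ((0.85, 0.90), 1, 100.0),
]
stats = fit_segment_stats(prototypes, training)

# A record from another dataset is segmented and scored with the same model.
new_customer = predict(prototypes, stats, (0.78, 0.82))
print(new_customer)
```

The prediction is an aggregate for the whole segment (here a response probability and an average spend), not an individual outcome for the new record.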

Behavioral Modeling on Time-Variant Variables

The very idea of clustering on similar attributes is that people (or objects) with similar properties tend to exhibit similar behaviors. It is very important to note that similarity in terms of attributes may change over time. Clustering with time-varying variables can be a challenging task with other techniques. In RFM database marketing, for example, it is common practice to segment customers based on "recency", "frequency", and "monetary values". Once a marketing campaign is completed, these values change. Modeling on the previous values alone, or on the current values alone, will not produce optimal results. It is desirable to develop segmentation models based on the values at the time of previous campaigns, and then to use those models to segment the customers based on their current values (or their values at the time of catalog mailing). The result is a better segmentation with more predictive power! For more, see RFM Segmentation Marketing.
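The two-snapshot idea can be sketched with a very coarse RFM grid. Everything below is hypothetical: the high/low thresholds, the six customers, and the response flags are made up, and the simple binning stands in for a real segmentation model. Response rates are learned from RFM values as they stood at the previous campaign, and a customer is then scored using current RFM values.

```python
def rfm_cell(recency_days, frequency, monetary):
    """Coarse, hypothetical RFM binning: each value becomes 'H' or 'L'."""
    r = "H" if recency_days <= 90 else "L"   # recent buyers score high
    f = "H" if frequency >= 3 else "L"
    m = "H" if monetary >= 200 else "L"
    return r + f + m

def fit_response_rates(snapshot, responded):
    """Response rate per RFM cell, from values at the time of the campaign."""
    counts = {}
    for customer, rfm in snapshot.items():
        cell = rfm_cell(*rfm)
        hits, total = counts.get(cell, (0, 0))
        counts[cell] = (hits + responded[customer], total + 1)
    return {cell: hits / total for cell, (hits, total) in counts.items()}

# (recency in days, purchase count, total spend) as of the PREVIOUS campaign
at_previous_campaign = {
    "c1": (30, 5, 400), "c2": (45, 4, 350), "c5": (60, 3, 250),
    "c3": (200, 1, 50), "c4": (300, 1, 80), "c6": (250, 2, 40),
}
responded = {"c1": 1, "c2": 1, "c5": 0, "c3": 0, "c4": 0, "c6": 1}
rates = fit_response_rates(at_previous_campaign, responded)

# Customer c3 today, after several further purchases.
c3_now = (10, 3, 260)
expected_now = rates.get(rfm_cell(*c3_now))                           # current values
expected_stale = rates.get(rfm_cell(*at_previous_campaign["c3"]))     # outdated values

print(f"scored on current values:  {expected_now:.2f}")
print(f"scored on outdated values: {expected_stale:.2f}")
```

Scoring c3 on the outdated values would place an active customer in the low-response segment, which is the mismatch the two-snapshot approach avoids.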

Predictive Modeling vs Statistical Predictive Segmentation Modeling

The main difference between predictive modeling and statistical predictive segmentation modeling is the kind of values they predict. In a nutshell, segmentation modeling predicts statistical aggregates such as probabilities and averages. Segmentation induces partitions, each of which may consist of multiple instances, and statistical aggregates can be inferred over those instances. Predictive modeling, on the other hand, is not designed to predict aggregate values. Segmentation modeling is useful in situations where the events of interest occur with very low frequency, so that standard predictive modeling becomes unsuitable. Such events are best inferred with attached probabilities or averages. Typical examples include catalog mail marketing, insurance scoring, and credit scoring.

Segmentation Variable Selection Methods

Although neural clustering can automatically adjust variable weights, it is often desirable to work only with variables of significant importance. Limiting the model to such variables can generate segments with simple and clean profiles. Careful selection of segmentation variables with respect to the target is essential in predictive segmentation modeling. Unlike standard predictive modeling, predictive segmentation modeling relies on the modeler's manual selection of predictive variables. Without it, segmentation may not induce models that show predictive power.

Identifying significant variables can be very difficult without proper tools. CMSR link analysis and predictive neural networks can be used to analyze a variable's significance. Selecting segmentation variables using link analysis and/or a predictive neural network helps ensure that the segmentation results will have predictive power. For more on segmentation variable selection methods, read Link Analysis.

Similar Object Identification: k-Nearest Neighbors

k-nearest-neighbor search has the same problems as the clustering techniques discussed above. Neural clustering can be used to overcome these limitations. Objects in neural clustering are organized into cells, and the cells are arranged based on similarity, so the distance between cells can be used to identify objects similar to a specific object. For example, objects can be segmented into a large number (say, 1000) of cells. The objects in the cell containing a given object (and in nearby cells) may then be selected as its neighbors.
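The cell-based lookup can be sketched as follows, assuming the clustering step has already placed every object in a grid cell. The cell assignments, the object names, and the choice of counting diagonal neighbors as one step away are illustrative assumptions.

```python
def similar_objects(cell_members, cell_of, target, radius=1):
    """Objects in the target's cell and in cells within `radius` grid steps."""
    row, col = cell_of[target]
    found = []
    for (r, c), members in cell_members.items():
        # Chebyshev distance: diagonal neighbors count as one step away.
        if max(abs(r - row), abs(c - col)) <= radius:
            found.extend(m for m in members if m != target)
    return sorted(found)

# Hypothetical result of a clustering run: grid cell -> objects placed in it
cell_members = {
    (0, 0): ["a", "b"],
    (0, 1): ["c"],
    (1, 1): ["d"],
    (3, 3): ["e", "f"],
}
# Reverse index: object -> its grid cell
cell_of = {obj: cell for cell, members in cell_members.items() for obj in members}

same_cell_only = similar_objects(cell_members, cell_of, "a", radius=0)
with_neighbors = similar_objects(cell_members, cell_of, "a", radius=1)

print(same_cell_only)    # ['b']
print(with_neighbors)    # ['b', 'c', 'd']
```

Widening the radius trades precision for recall: more candidates are returned, but the ones from farther cells are less similar to the target.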

To find out how clustering is used in practice, see Market Segmentation, Geographic Segmentation, Customer Segmentation, Database Marketing, Direct Marketing, and Direct Mail Marketing.