Rosella       Machine Intelligence & Data Mining

Profiling

The Macquarie Dictionary defines profiling as the "outlining of something seen against a background". Profiles can be developed for human populations, animals, and inanimate objects such as products and parts, and they provide summarized views of the groups under study. Profiling has been used especially extensively to describe human populations. The following list explains some special cases of profiling human populations:

  • Demographic Profiling describes characteristics of populations in terms of demographic variables such as age, gender, race, education, occupation, income, religion, marital status, family size, children, home ownership, socioeconomic status, and so on.

  • Psychographic Profiling describes characteristics of populations in terms of psychographic variables such as lifestyle, personality, values, attitudes, and so on.

  • Racial Profiling describes characteristics of populations in terms of racial and ethnic background. It is often misused by law enforcement agencies.

  • Criminal Profiling describes characteristics of criminals.

  • Geographic Profiling describes the geographic characteristics of crimes, which are used to help locate criminals.

  • Customer Profiling describes characteristics of customers.

Combinational Factor Analysis and Combinatorial Blowout!

Note that profiling is often performed on data with the following properties:

  • Profiling information can consist of many variables (dozens or hundreds of them).
  • The majority of them are categorical variables (also called non-numeric or nominal variables).

In conventional profiling methods, analysts use visualization and statistical reporting tools. These tools can work on only a few variables at a time. When they are applied to data with many variables, the number of cases to be examined grows combinatorially with the number of variables. Thorough, systematic, and accurate analysis of such data is therefore all but impossible. The general practice is to examine only the variable combinations that experts consider promising. However, intuition can miss important emerging trends and patterns. Better methods are needed for timely, thorough, and systematic analysis!
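To give a rough sense of this growth, the following sketch counts how many k-variable combinations exist among n variables, and how many cross-tabulation cells they produce. The function names are hypothetical, and the assumption that every variable has the same number of distinct values is made only to keep the arithmetic simple.

```python
from math import comb

def variable_subsets(n_vars: int, k: int) -> int:
    """Number of ways to pick k variables out of n_vars."""
    return comb(n_vars, k)

def cells_to_examine(n_vars: int, k: int, values_per_var: int) -> int:
    """Cells across all k-way cross-tabulations, assuming every variable
    has the same number of distinct values (an illustrative assumption)."""
    return comb(n_vars, k) * values_per_var ** k

# 50 variables with 5 values each: print the counts for 1- to 4-way analysis.
for k in (1, 2, 3, 4):
    print(k, variable_subsets(50, k), cells_to_examine(50, k, 5))
```

Under these assumptions, three-way combinations of 50 variables already give 19,600 variable subsets and about 2.45 million cells, far more than an analyst can inspect by hand.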

Hotspot Profiling

Hotspot Profiling Analysis can systematically search for hotspot profiles across a given set of demographic, geographic, and psychographic variables. It generates accurate profiles based on Artificial Intelligence Search & Incremental Learning techniques. Hotspot search can be based on various performance criteria, depending on the type of target variable:

  • Categorical information: probability, Laplace, goodness of fit, entropy, etc.
  • Numerical information: average, total, harmonic ratio, etc.
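As an illustration of the categorical criteria, the sketch below implements the common textbook definitions of probability, the Laplace estimate, and entropy for a candidate segment. This page does not state how the tool itself defines these measures, so the formulas and function names here are assumptions.

```python
from math import log2

def probability(hits: int, total: int) -> float:
    """Observed proportion of the target class within a segment."""
    return hits / total

def laplace(hits: int, total: int, n_classes: int = 2) -> float:
    """Laplace-corrected proportion: (hits + 1) / (total + n_classes)."""
    return (hits + 1) / (total + n_classes)

def entropy(class_counts: list[int]) -> float:
    """Shannon entropy (in bits) of the class distribution in a segment."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# A tiny pure segment versus a large, nearly pure one.
print(probability(3, 3), laplace(3, 3))
print(probability(90, 100), laplace(90, 100))
```

The plain probability ranks the 3-out-of-3 segment first, while the Laplace estimate ranks the 90-out-of-100 segment higher. This is one reason a search might prefer a corrected measure when small segments are common.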

Generalized Profiling

Customer profiling is very important in business. It reveals overall patterns among customers and can play a vital role in developing marketing strategies.

[Example 1] A company that sells boutique products may develop a customer profile as follows. Most customers are female (91%), and the majority work as office workers (78%). Note that office workers tend to have a greater need for beauty products. This information can be used to select magazines or TV and radio programs for advertising.

Gender=Female 91%
Vocation=Office worker 78%
Education=High school 67%
Age=20s 36%
Age=30s 31%
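A generalized profile of this kind is essentially a frequency count of each variable=value pair. The following is a minimal sketch, using hypothetical customer records invented for illustration (they are not the data behind the figures above) and a hypothetical function name.

```python
from collections import Counter

# Hypothetical customer records, invented for illustration only.
customers = [
    {"Gender": "Female", "Vocation": "Office worker", "Age": "20s"},
    {"Gender": "Female", "Vocation": "Office worker", "Age": "30s"},
    {"Gender": "Female", "Vocation": "Student", "Age": "20s"},
    {"Gender": "Male", "Vocation": "Office worker", "Age": "30s"},
]

def generalized_profile(records, min_share=0.0):
    """Return (variable, value, share) tuples sorted by share, largest first."""
    counts = Counter((var, val) for rec in records for var, val in rec.items())
    n = len(records)
    rows = [(var, val, cnt / n) for (var, val), cnt in counts.items()]
    rows = [r for r in rows if r[2] >= min_share]
    return sorted(rows, key=lambda r: r[2], reverse=True)

# Print only the values shared by at least half of the customers.
for var, val, share in generalized_profile(customers, min_share=0.5):
    print(f"{var}={val} {share:.0%}")
```

The `min_share` threshold keeps only the dominant values, which is what makes the result read as a summary rather than a full frequency table.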

Focus Group Profiling

Developing profiles of specific focus groups is much harder than developing generalized profiles, especially when there are many variables. Hotspot analysis can develop focus group profiles effectively and accurately.

In the insurance industry, profiling is very important in determining premium rates. Typically, insurers collect all the information available. However, analyzing it thoroughly is not feasible, since the number of variables is normally large. The following two examples demonstrate how hotspot analysis can be used to profile risky insurance policies from dozens of customer variables:

[Example 2] An insurance company keeps health insurance or life insurance records in its database, including gender, age, education, smoking, drinking, sun activity, height, weight (i.e., obesity level), and claim payments, as well as contact information. The company wishes to know which health insurance groups are at the highest risk, i.e., have the highest claim ratio. The following is a possible output of hotspot analysis:

[Figure: hotspot analysis output from the profiling software]

[Example 3] An insurance company keeps motor vehicle (automobile) insurance records in its database, containing driver and vehicle information together: gender, age, license experience, education, occupation, drinking, smoking, and mobile phone use, plus vehicle manufacturer, type, model, year of make, etc. The company wishes to know which motor vehicle insurance groups are at the highest risk, i.e., have the highest average insurance payouts. The following is a possible output of hotspot analysis:

[Figure: hotspot analysis output from the profiling software]
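To make the idea of a focus group search concrete, the sketch below ranks segments (conjunctions of variable=value conditions) by average payout. It uses plain exhaustive enumeration limited to a few conditions, as a stand-in for illustration only. It is not the Artificial Intelligence Search & Incremental Learning method of the tool, and the records, variable names, and function name are all hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical policy records, invented for illustration only.
policies = [
    {"Gender": "M", "Age": "20s", "Phone": "Yes", "payout": 900.0},
    {"Gender": "M", "Age": "20s", "Phone": "No", "payout": 500.0},
    {"Gender": "F", "Age": "20s", "Phone": "Yes", "payout": 700.0},
    {"Gender": "M", "Age": "40s", "Phone": "Yes", "payout": 300.0},
    {"Gender": "F", "Age": "40s", "Phone": "No", "payout": 100.0},
    {"Gender": "M", "Age": "20s", "Phone": "Yes", "payout": 1100.0},
]

def hotspots(records, target, max_conditions=2, min_size=2, top=3):
    """Rank segments (conjunctions of variable=value) by mean target value."""
    variables = [k for k in records[0] if k != target]
    segments = {}
    for rec in records:
        for k in range(1, max_conditions + 1):
            for combo in combinations(variables, k):
                # The segment this record falls into for this variable subset.
                key = tuple((v, rec[v]) for v in combo)
                segments.setdefault(key, []).append(rec[target])
    # Drop segments that are too small to be meaningful, then sort by mean.
    ranked = [(mean(vals), len(vals), key)
              for key, vals in segments.items() if len(vals) >= min_size]
    ranked.sort(key=lambda row: row[0], reverse=True)
    return ranked[:top]

for avg, size, conditions in hotspots(policies, "payout"):
    print(round(avg, 1), size, conditions)
```

On this toy data, the top segment is Age=20s combined with Phone=Yes. Because the number of segments grows combinatorially with `max_conditions`, exhaustive enumeration like this is only workable for small problems, which is the motivation for a guided search.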

To learn more about this, see Insurance Risk Analysis.


The following figure shows an example of hotspot analysis output. The top-left panel is the hotspot drill-down tree. The top-right panel shows detailed statistics for the selected hotspots. The bottom-left and bottom-right panels provide gains/lift factor analysis.

[Figure: Hotspot profiling software]
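For readers unfamiliar with gains and lift, the sketch below uses the definitions common in data mining: lift compares the target rate inside a segment with the overall rate, and gain is the share of all target cases that the segment captures. This page does not give the tool's own definitions, so these formulas and function names are assumptions.

```python
def lift(segment_hits, segment_size, total_hits, total_size):
    """Lift = target rate inside the segment / target rate overall."""
    segment_rate = segment_hits / segment_size
    overall_rate = total_hits / total_size
    return segment_rate / overall_rate

def gain(segment_hits, total_hits):
    """Gain = share of all target cases captured by the segment."""
    return segment_hits / total_hits

# Hypothetical numbers: 1000 policies with 100 claims overall,
# and a hotspot of 50 policies containing 20 of those claims.
print(lift(20, 50, 100, 1000), gain(20, 100))
```

A lift well above 1 means the segment is much riskier than the population as a whole, and the gain shows how much of the total risk that one segment accounts for.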

For more information, see Hotspot Analysis.


Thorough Systematic Accurate Analysis

The benefit of hotspot analysis is that thorough, systematic, and accurate profiling becomes possible in very little time. This frees analysts from time-consuming statistical analysis and allows them to focus on interpreting the hotspot profiles identified.

Data Mining Tools for Profiling

The content of this page is largely based on the CMSR Data Miner tools. For more information, see CMSR Data Mining Software.