Rosella Machine Intelligence & Data Mining

RME-EP Rule Specification Language

The Rete engine is the de facto industry standard for rule-based expert system shells and has been used extensively across industries to develop expert systems. RME-EP is a Rete engine based on the RME (Rule-based Model Evaluation) language with predictive models. Its syntax is based on SQL-99. SQL owes much of its popularity as the standard database query language to its intuitiveness and rich logical expressions, qualities that matter especially when developing business rules. This makes RME-EP a strong language platform for business rule specification. The following sections describe the RME-EP rule specification language in detail. Further details can be found in the accompanying language manual.

The coding conventions are as follows:

Reserved words: Reserved words are used by the system as keywords, e.g., IF, THEN, CASE, etc. They can be written in uppercase, lowercase or mixed case.
Identifiers: Identifiers are fact names. They may contain "." and alphanumeric characters, but no white space. For example, Cust.name, cust.age, etc. Note that identifiers are case sensitive!
Quoted identifiers: Plain identifiers cannot contain white space or special characters; quoted identifiers can. For example, "My Special Dog.age", "MyNew.bride", etc. Note that quoted identifiers use double quotes and are case sensitive.
Literals: Non-numeric literals (strings) are enclosed in single quotes. Numeric literals without a decimal point are treated as INTEGER. Numeric literals with a decimal point are treated as REAL. For example:
- STRING: 'My new String', ...
- INTEGER: 1200, 1563, ...
- REAL: 1.234, 1.234e23, ...
Comments: Line comments start with "//" and run to the end of the line. Block comments are enclosed between "/*" and "*/".
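The literal typing convention above can be illustrated with a small Python sketch. This is not part of RME-EP; the function name literal_type and the regular expressions are assumptions made for illustration, and forms the text does not mention (signed numbers, exponents without a decimal point, escaped quotes) are simply rejected.

```python
import re

# Patterns follow the conventions stated above. The optional sign inside
# the exponent is an assumption; the text only shows the form 1.234e23.
_STRING = re.compile(r"^'[^']*'$")
_INTEGER = re.compile(r"^\d+$")
_REAL = re.compile(r"^\d+\.\d+([eE][+-]?\d+)?$")


def literal_type(token: str) -> str:
    """Return 'STRING', 'INTEGER' or 'REAL' for a literal token."""
    if _STRING.match(token):
        return "STRING"
    if _INTEGER.match(token):
        return "INTEGER"   # numeric, no decimal point
    if _REAL.match(token):
        return "REAL"      # numeric, with a decimal point
    raise ValueError(f"not a recognised literal: {token!r}")


for token in ["'My new String'", "1200", "1.234", "1.234e23"]:
    print(token, "->", literal_type(token))
```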

Fact names and Data types

Facts are the basis of rule inference: rules are applied to a set of facts. Each fact has a name and a value. Given the popularity and support of XML, RME-EP supports dot notation for fact names, e.g., cust, cust.name, cust.age, etc. Quoted identifiers may also be used, normally when names contain white space, e.g., "My house.age", "His dog.doctor", etc. Each fact is associated with a data type, normally one of "INTEGER", "REAL" and "STRING". Fact data types are defined with the following expressions:

   DECLARE ALL AS <data-type> ;
   DECLARE <fact-name> AS <data-type> ;

The first form sets the default fact type. The second form defines the type of an individual fact as an exception to the default. In the following example, the first line sets all facts to STRING and the remaining lines declare numeric types as exceptions. (Note that STRING is the system default, so the first expression does not need to be written explicitly!)

   DECLARE ALL AS STRING ;
   DECLARE cust.age AS INTEGER ;
   DECLARE cust.income AS INTEGER ;
   DECLARE cust.weight AS REAL ;
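The default-plus-exceptions behaviour of these declarations can be pictured as a lookup table with a fallback. The Python sketch below is only a model of that idea, not of how RME-EP stores types; the names DEFAULT_TYPE, DECLARED, fact_type and typed_value are hypothetical.

```python
# DECLARE ALL AS STRING ;  (also the system default according to the text)
DEFAULT_TYPE = "STRING"

# DECLARE <fact-name> AS <data-type> ;  (exceptions to the default)
DECLARED = {
    "cust.age": "INTEGER",
    "cust.income": "INTEGER",
    "cust.weight": "REAL",
}

# Assumed mapping from RME-EP type names to Python converters.
CONVERTERS = {"STRING": str, "INTEGER": int, "REAL": float}


def fact_type(name: str) -> str:
    """Look up a fact's type; fact names are case sensitive."""
    return DECLARED.get(name, DEFAULT_TYPE)


def typed_value(name: str, raw: str):
    """Convert a raw text value to the fact's declared type."""
    return CONVERTERS[fact_type(name)](raw)


print(fact_type("cust.name"), fact_type("cust.age"))
print(typed_value("cust.age", "42"), typed_value("cust.weight", "71.5"))
```

Because lookups are exact, a differently cased name such as Cust.age is a different fact and falls back to the default type in this sketch.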

Rules

Rules are the basic units of RME-EP. Rules have the following syntax:

   RULE <rule-name>: <conditional-expression> ;

Rule names can be numbers, identifiers without white space, or quoted names, e.g., 1, MyRule, "My Rule", etc. A conditional expression can be one of the following:

IF-THEN-ELSE expressions
CASE expressions
ON expressions

The following are examples of RME-EP rules:

RULE 1: IF cust.offer = 'yes' AND cust.gender = 'male' THEN THROWEVENT('email', 'Dear Sir ....') END;
RULE 2: IF cust.offer = 'yes' AND cust.gender = 'female' THEN THROWEVENT('email', 'Dear Mdm ....') END;
RULE 3: IF cust.age > 40 and cust.income > 50000 THEN SET cust.status AS 'Interested' END;
RULE 4: IF cust.status = 'Interested' THEN SET cust.offer AS 'yes' END;
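Note how these four rules chain: Rule 3 sets cust.status, which enables Rule 4, which sets cust.offer, which in turn enables Rule 1 or Rule 2. The Python sketch below mimics that chaining with a naive loop that re-applies rules until nothing changes. It is not the Rete algorithm and not RME-EP's implementation; the helper names (set_fact, throw_event, run_rules) and the choice to fire each rule at most once are assumptions made for illustration.

```python
def set_fact(name, value):
    """Build an action that mimics SET <name> AS <value>."""
    def action(facts, events):
        facts[name] = value
    return action


def throw_event(kind, message):
    """Build an action that mimics THROWEVENT by recording the event."""
    def action(facts, events):
        events.append((kind, message))
    return action


# (rule name, condition over facts, action) for Rules 1 to 4 above.
RULES = [
    ("1", lambda f: f.get("cust.offer") == "yes" and f.get("cust.gender") == "male",
     throw_event("email", "Dear Sir ....")),
    ("2", lambda f: f.get("cust.offer") == "yes" and f.get("cust.gender") == "female",
     throw_event("email", "Dear Mdm ....")),
    ("3", lambda f: f.get("cust.age", 0) > 40 and f.get("cust.income", 0) > 50000,
     set_fact("cust.status", "Interested")),
    ("4", lambda f: f.get("cust.status") == "Interested",
     set_fact("cust.offer", "yes")),
]


def run_rules(initial_facts):
    """Re-apply the rules until no further rule fires (naive forward chaining)."""
    facts = dict(initial_facts)
    events = []
    fired = set()          # assumption: each rule fires at most once
    changed = True
    while changed:
        changed = False
        for name, condition, action in RULES:
            if name not in fired and condition(facts):
                action(facts, events)
                fired.add(name)
                changed = True
    return facts, events


facts, events = run_rules({"cust.age": 45, "cust.income": 60000, "cust.gender": "female"})
print(facts["cust.status"], facts["cust.offer"], events)
```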

Deep Learning Style Risk Modeling Example

The following RME-EP example performs risk scoring in a deep-learning style. It uses five neural network models: Model1, Model2, Model3, Model4 and Model5. Model1, Model2 and Model3 evaluate the input data; Model4 and Model5 integrate the results of Model1 to Model3. Variables are declared at the beginning. Rules 1, 2 and 3 evaluate the first three neural network models and store the results in the variables "Model1 score", "Model2 score" and "Model3 score", respectively. Rule 4 then computes the maximum, minimum and average of these three scores. Rules 5 and 6 evaluate the second-tier neural network models "Model4" and "Model5", using the three model outputs together with their maximum, minimum and average values. The resulting scores are stored in "Final score" and "Final score1", and Rule 7 computes their average. Finally, Rule 8 translates "Final average" and the max/min/avg values into a verbal risk level. In summary:

Three neural networks (Model1, Model2 and Model3) are evaluated (Rules 1, 2 and 3).
The maximum, minimum and average scores of the three models are computed (Rule 4).
Two upper-level neural networks (Model4 and Model5) are evaluated (Rules 5 and 6).
The final average score of the two upper-level networks is computed (Rule 7).
The final average score and the max/min/avg scores are used to classify risk (Rule 8).
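The data flow of the first four steps can be sketched in Python as a two-tier pipeline. This only illustrates the flow of values, under the assumption that each model can be treated as a function returning a score; the function risk_scores and the stand-in lambda models are hypothetical and do not correspond to any RME-EP or CMSR API.

```python
from statistics import mean


def risk_scores(record, tier1_models, tier2_models):
    """Two-tier scoring: first-tier scores feed the second-tier models."""
    # Rules 1-3: evaluate each first-tier model on the input record.
    scores = [model(record) for model in tier1_models]
    features = {f"Model{i}Score": s for i, s in enumerate(scores, start=1)}

    # Rule 4: summary statistics of the first-tier scores.
    features["MaximumScore"] = max(scores)
    features["MinimumScore"] = min(scores)
    features["AverageScore"] = mean(scores)

    # Rules 5-6: second-tier models see the scores and their statistics.
    finals = [model(dict(features)) for model in tier2_models]

    # Rule 7: average of the second-tier outputs.
    features["FinalAverage"] = mean(finals)
    return features


# Stand-in models for illustration only; real models would be trained networks.
tier1 = [lambda r: 0.2, lambda r: 0.4, lambda r: 0.6]
tier2 = [lambda f: f["MaximumScore"], lambda f: f["MinimumScore"]]

result = risk_scores({"Gender": "F", "Salary": 52000}, tier1, tier2)
print(result)
```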

// define input data fields and values in order of appearance;
DECLARE Gender AS STRING INPUT VALUES IN GENDER OF Model1;
DECLARE Race AS STRING INPUT VALUES IN RACE OF Model1;
DECLARE Jobpost AS STRING INPUT VALUES IN JOBPOST OF Model1;
DECLARE CLASSIFICATION1 AS STRING INPUT VALUES IN CLASSIFICATION1 OF Model1;
DECLARE EDUCLEVEL AS STRING INPUT VALUES IN EDUCLEVEL OF Model1;
DECLARE AGEGROUP AS STRING INPUT VALUES IN AGEGROUP OF Model1;
DECLARE Salary AS INTEGER INPUT VALUES IN SALARY OF Model1;

// define output fields in order of appearance;
DECLARE "Model1 score", "Model2 score", "Model3 score" AS REAL OUTPUT;
DECLARE "Maximum score", "Minimum score", "Average score" AS REAL OUTPUT;
DECLARE "Final score", "Final score1", "Final average" AS REAL OUTPUT;
DECLARE "Risk level" AS STRING OUTPUT;

RULE 1: // compute model 1 prediction;
IF TRUE THEN
	SET "Model1 score" AS PREDICT(Model1) USING(
		GENDER AS Gender,
		RACE AS Race,
		JOBPOST AS Jobpost,
		CLASSIFICATION1 AS CLASSIFICATION1,
		EDUCLEVEL  AS  EDUCLEVEL,
		AGEGROUP AS  AGEGROUP,
		SALARY AS Salary
	) 
END;

RULE 2: // compute model 2 prediction;
IF TRUE THEN
	SET "Model2 score" AS PREDICT(Model2) USING(
		GENDER AS Gender,
		RACE AS Race,
		JOBPOST AS Jobpost,
		CLASSIFICATION1 AS CLASSIFICATION1,
		EDUCLEVEL  AS  EDUCLEVEL,
		AGEGROUP AS  AGEGROUP,
		SALARY AS Salary
	) 
END;

RULE 3: // compute model 3 prediction;
IF TRUE THEN
	SET "Model3 score" AS PREDICT(Model3) USING(
		GENDER AS Gender,
		RACE AS Race,
		JOBPOST AS Jobpost,
		CLASSIFICATION1 AS CLASSIFICATION1,
		EDUCLEVEL  AS  EDUCLEVEL,
		AGEGROUP AS  AGEGROUP,
		SALARY AS Salary
	) 
END;

// compute max/min/avg;
RULE 4:  // evaluated after the local rules are evaluated
IF TRUE THEN 
{
	SET "Maximum score" AS MAX("Model1 score", "Model2 score", "Model3 score");
	SET "Minimum score" AS MIN("Model1 score", "Model2 score", "Model3 score");
	SET "Average score" AS AVG("Model1 score", "Model2 score", "Model3 score");
}
END;

RULE 5: // compute final score using Model4;
IF TRUE THEN
	SET "Final score" AS PREDICT(Model4) USING(
		Model1Score AS "Model1 score",
		Model2Score AS "Model2 score",
		Model3Score AS "Model3 score",
		MaximumScore AS "Maximum score",
		MinimumScore  AS  "Minimum score",
		AverageScore AS  "Average score"
	) 
END;

RULE 6: // compute final score1 using Model5;
IF TRUE THEN
	SET "Final score1" AS PREDICT(Model5) USING(
		Model1Score AS "Model1 score",
		Model2Score AS "Model2 score",
		Model3Score AS "Model3 score",
		MaximumScore AS "Maximum score",
		MinimumScore  AS  "Minimum score",
		AverageScore AS  "Average score"
	) 
END;

RULE 7:  // compute the average of final scores;
IF TRUE THEN 
	SET "Final average" AS AVG("Final score", "Final score1")
END;

RULE  8:   // classify risk levels;
CASE  
WHEN "Final average" >= 0.7 THEN 
	SET "Risk level" AS 'Very high risk'    
WHEN "Final average" < 0.01 THEN 
	SET "Risk level" AS 'Low risk'    
WHEN "Final average" < 0.1 THEN 
	SET "Risk level" AS 'Medium risk'    
WHEN "Maximum score" >= 0.3 OR "Minimum score" >= 0.05 OR "Average score" >= 0.2 THEN 
	SET "Risk level" AS 'High risk'    
WHEN "Maximum score" >= 0.2 OR "Minimum score" >= 0.0 OR "Average score" >= 0.1 THEN 
	SET "Risk level" AS 'Medium risk'    
ELSE 
	SET "Risk level" AS 'Low risk'   
END;
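The order of the WHEN branches in Rule 8 matters. Assuming RME-EP's CASE follows SQL CASE semantics, where the first branch whose condition holds is taken (an assumption based on the SQL-99 heritage stated earlier, not confirmed by this page), the rule can be mirrored in Python as a chain of early returns. The function name classify_risk is hypothetical.

```python
def classify_risk(final_avg, max_score, min_score, avg_score):
    """Mirror Rule 8, assuming the first matching WHEN branch wins."""
    if final_avg >= 0.7:
        return "Very high risk"
    if final_avg < 0.01:
        return "Low risk"
    if final_avg < 0.1:
        return "Medium risk"
    # From here on, 0.1 <= final_avg < 0.7.
    if max_score >= 0.3 or min_score >= 0.05 or avg_score >= 0.2:
        return "High risk"
    if max_score >= 0.2 or min_score >= 0.0 or avg_score >= 0.1:
        return "Medium risk"
    return "Low risk"


print(classify_risk(0.75, 0.9, 0.5, 0.7))
print(classify_risk(0.5, 0.35, 0.0, 0.1))
print(classify_risk(0.5, 0.1, 0.0, 0.05))
```

Under this reading, the last WHEN branch matches whenever "Minimum score" is zero or positive, so the ELSE branch is reached only if the minimum score is negative.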

For more, please read Predictive Modeling Software.
