Rosella       Machine Intelligence & Data Mining

Big Data Analytics and Tools for Big Databases: Hive/Hadoop, MySQL, PostgreSQL, ...

Big data simply means a large volume of data. Size alone doesn't make it special. Prolonged business activity can accumulate big data, and analyzing time-series patterns in such data can reveal much useful information.

CMSR Data Miner / Machine Learning / Rule Engine Studio is an advanced analytics system incorporating machine learning, a rule engine and big data analytics. It provides easy-to-use big data analytic tools for users of big databases. It combines dimensional data analysis reporting with advanced time-series techniques such as regression, moving average/exponential smoothing, seasonal adjustment, chi-square statistic analysis, etc. Its visualization tools in particular present information intuitively.

Supported and fully tested database systems include Apache Hive+Hadoop, MySQL, MariaDB, PostgreSQL, MS SQL Server, and MS Office Access. Other systems such as Oracle, DB2, etc., should work as well. In addition, CSV/TSV text files are also supported.

CMSR Studio runs on Windows, Linux, and macOS.

Main features include:

  • group-by-group reports with regression forecasting and seasonal adjustment
  • time-series charts with regression forecasting and seasonal adjustment
  • cross table reports
  • summation data output to CSV files
  • regression
  • smoothing: moving average and exponential smoothing
  • seasonal adjustment
  • chi-square statistic analysis
  • hotspot visualization
  • dimensional charts

For more information and downloads, please visit CMSR Data Miner / Machine Learning / Rule Engine Studio.

Big Data Time-series Trend Analysis

The following figures show group-by-group time-series trend analysis tables. They incorporate moving average/exponential smoothing, seasonal adjustment and chi-square statistic analysis. Green columns are series data. Orange columns are values projected by regression. CMSR uses advanced function fitting techniques to determine the best regression functions.

Group by group time-series analysis.
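As a rough illustration of the smoothing and projection ideas above, the following Python sketch computes a trailing moving average, simple exponential smoothing, and a straight-line least-squares projection. It is not CMSR's implementation: CMSR selects among fitted functions, whereas this sketch assumes a plain linear trend, and the function names and the sample `sales` series are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; the first window-1 positions have no value (None)."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def linear_projection(series, steps):
    """Fit y = a + b*t by least squares and project `steps` future points."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return [a + b * t for t in range(n, n + steps)]


# Hypothetical series data (the "green columns")
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
ma3 = moving_average(sales, 3)
es = exponential_smoothing(sales, 0.5)
projected = linear_projection(sales, 2)  # the "orange columns"
```

A smaller `alpha` gives a smoother curve that reacts more slowly to recent changes; a larger window does the same for the moving average.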


Forecasting with Seasonal Adjustment Using Neural Network

As an alternative to regression, a neural network can be used to capture time-series trends and seasonal patterns. Regression is limited in the information it can use, whereas a neural network is a robust modeling tool that can also include various related indicators. Details are discussed in the following link, which also describes how to import neural network models into Excel sheets.
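To make the idea of feeding trend and seasonal information to a neural network more concrete, the sketch below shows one possible way to prepare training rows: each row holds the previous few values plus a one-hot marker for the position within the season. It only prepares inputs and does not train a network. The function name, the lag count, and the sample data are assumptions for illustration, and the series is assumed to start at season position 0.

```python
def build_training_rows(series, n_lags, season_length):
    """Turn a series into (inputs, target) rows for a time-series model.

    Inputs are the previous n_lags values followed by a one-hot encoding of
    the target's position within the season (for example, quarter of year
    when season_length is 4).
    """
    rows = []
    for t in range(n_lags, len(series)):
        lags = list(series[t - n_lags:t])
        season = [0.0] * season_length
        season[t % season_length] = 1.0
        rows.append((lags + season, series[t]))
    return rows


# Hypothetical quarterly data covering two years
quarterly = [100.0, 120.0, 90.0, 150.0, 110.0, 132.0, 99.0, 165.0]
rows = build_training_rows(quarterly, n_lags=2, season_length=4)
first_inputs, first_target = rows[0]
```

Related indicators, such as prices or promotion flags, could be appended to each input list in the same way.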

The following YouTube video shows how to develop time-series neural network models:

Big Data Bar Charts

Categorical bar charts can also reveal trends and patterns, as in the following figure:

Big data bar chart.

Big Data 3D Bar Charts

3D bar charts provide a three-dimensional view of information:

Big data 3D bar chart.

Big Data Cross Table Reports

CMSR cross table reports incorporate chi-square analysis and hotspot visualization, as in the following figure:

Big data cross table reports.
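For readers unfamiliar with the chi-square statistic behind such reports, here is a minimal Python sketch that computes it for a cross table of observed counts and returns each cell's contribution. Treating the cell with the largest contribution as a "hotspot" is an assumption made for this example and may differ from how CMSR highlights cells. The counts are hypothetical, and all row and column totals are assumed to be non-zero.

```python
def chi_square(table):
    """Return (statistic, per-cell contributions) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contrib_row = []
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            contrib_row.append((observed - expected) ** 2 / expected)
        contributions.append(contrib_row)
    statistic = sum(sum(r) for r in contributions)
    return statistic, contributions


# Hypothetical counts: rows = GENDER (two groups), columns = BUYFLAG (0, 1)
observed = [[30, 10],
            [30, 30]]
stat, contrib = chi_square(observed)

# Cell that deviates most from independence (illustrative "hotspot")
cells = [(i, j) for i in range(len(observed)) for j in range(len(observed[0]))]
hotspot = max(cells, key=lambda ij: contrib[ij[0]][ij[1]])
```

A large statistic relative to the table's degrees of freedom suggests that the two dimensions are not independent.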

Big Data Group-by Report Table

Group-by tables are very commonly used in report generation. CMSR incorporates hotspot visualization and time-series regression with smoothing and seasonal adjustment, which can be used when database columns represent time-series data.

Big data group-by report table.
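Seasonal adjustment can be sketched in a few lines of Python. The version below assumes a simple multiplicative model: each season position gets an index equal to its average divided by the overall average, and each value is divided by its index. This is one common textbook approach, not necessarily the method CMSR uses. It assumes the series covers whole seasons, starts at season position 0, and has no strong trend; the names and data are hypothetical.

```python
def seasonal_indices(series, season_length):
    """Average of each season position divided by the overall average."""
    overall_mean = sum(series) / len(series)
    indices = []
    for pos in range(season_length):
        values = series[pos::season_length]
        indices.append((sum(values) / len(values)) / overall_mean)
    return indices


def seasonally_adjust(series, season_length):
    """Divide each value by the index of its season position."""
    indices = seasonal_indices(series, season_length)
    return [x / indices[t % season_length] for t, x in enumerate(series)]


# Hypothetical quarterly values for one group, two full years
quarterly = [80.0, 100.0, 120.0, 100.0, 80.0, 100.0, 120.0, 100.0]
indices = seasonal_indices(quarterly, 4)
adjusted = seasonally_adjust(quarterly, 4)
```

Regression can then be fitted to the adjusted values, and a forecast is multiplied by the index of its season position to restore the seasonal pattern.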

Big Data SQL Tools

CMSR provides metadata browsing and data transfers between databases and CSV/TSV files. In addition, the SQL tool can be used to prepare and execute SQL statements. The following figure shows a Hive SQL DDL example for Hadoop CSV/TSV files.

Hive SQL DDL example for CSV file.

How to turn CSV/TSV files into Hive Database Tables

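To turn CSV/TSV files into Hive database tables, perform the following steps:

1. Create a Hadoop directory for your CSV/TSV file as follows. Change the path names for your data. Absolute paths are used here so that the directory matches the LOCATION clause in step 3; relative paths would resolve under your HDFS home directory instead.

        hadoop fs -mkdir /user/hive/csvfiles
        hadoop fs -mkdir /user/hive/csvfiles/yourcsvfile

2. Load the CSV/TSV file into the Hadoop server as follows:

        hadoop fs -put /root//yourcsvfile.csv /user/hive/csvfiles/yourcsvfile/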

3. Define a Hive EXTERNAL table as follows. You can do this from the CMSR SQL tool.

CREATE EXTERNAL TABLE yourcsvfile (
  CID int,
  GENDER string,
  RACE string,
  AGE int,
  SALARY int,
  BUYFLAG int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','  -- for TSV, use '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/csvfiles/yourcsvfile'  -- Hadoop directory containing the CSV file
TBLPROPERTIES ("skip.header.line.count"="1");  -- skips the first line; omit if the file has no header row

4. Now you are ready to use CMSR Big Data Analytic Tools.
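When many files need the same treatment, the DDL in step 3 can be generated rather than typed by hand. The Python sketch below builds such a statement from a list of column names and Hive types. The helper `hive_external_table_ddl` and its parameters are hypothetical and not part of CMSR or Hive, and it does not validate identifiers or paths, so treat it as a starting point only.

```python
def hive_external_table_ddl(table, columns, location, delimiter=",", has_header=True):
    """Build a CREATE EXTERNAL TABLE statement for a delimited text file.

    columns is a list of (name, hive_type) pairs; location is the HDFS
    directory that holds the file.
    """
    column_lines = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    # Hive expects a tab delimiter written as the escape sequence \t
    field_sep = "\\t" if delimiter == "\t" else delimiter
    ddl = (
        f"CREATE EXTERNAL TABLE {table} (\n{column_lines}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{field_sep}'\n"
        "LINES TERMINATED BY '\\n'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
    if has_header:
        # Skip the first line when it contains column names
        ddl += '\nTBLPROPERTIES ("skip.header.line.count"="1")'
    return ddl


columns = [("CID", "int"), ("GENDER", "string"), ("RACE", "string"),
           ("AGE", "int"), ("SALARY", "int"), ("BUYFLAG", "int")]
ddl = hive_external_table_ddl("yourcsvfile", columns,
                              "/user/hive/csvfiles/yourcsvfile")
print(ddl)
```

Passing `delimiter="\t"` produces the TSV variant, and `has_header=False` leaves out the header-skipping property.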

