Rosella       Machine Intelligence & Data Mining

Computer Vision and Object Detection

Object detection in computer vision locates objects of certain types in images. More specifically, it predicts the bounding box of each object: its X/Y coordinates, width and height. The following images show detected objects in green bounding boxes.
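A detected object's bounding box can be held in a small struct like the one below. This is a minimal sketch, not a CMSR type: the names `BoundingBox` and `fromCentre` are hypothetical, and it assumes X/Y is the top-left corner in pixels with Y growing downwards. The source does not say whether a detector's X/Y is the corner or the centre, so a centre-based constructor is included as well.

```cpp
#include <cassert>

// Hypothetical representation of one detected object (not a CMSR type).
// Assumes (x, y) is the top-left corner, in pixels, with y growing downwards.
struct BoundingBox {
    float x;       // left edge
    float y;       // top edge
    float width;
    float height;

    float right() const { return x + width; }    // right edge
    float bottom() const { return y + height; }  // bottom edge
    float area() const { return width * height; }

    // True if the pixel coordinate (px, py) lies inside the box.
    bool contains(float px, float py) const {
        return px >= x && px < right() && py >= y && py < bottom();
    }
};

// Builds a box from its centre point, for detectors that report centres.
inline BoundingBox fromCentre(float cx, float cy, float w, float h) {
    assert(w >= 0.0f && h >= 0.0f);
    return BoundingBox{cx - w / 2.0f, cy - h / 2.0f, w, h};
}
```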

Vehicle/Car Detection Convolutional Neural Network

Person Detection Convolutional Neural Network

Limitations of YOLO Object Detection Modeling

The biggest limitation of YOLO-style object detection is its poor accuracy. The average accuracy of the claimed state-of-the-art models is below 56%! Only a limited range of applications can make use of them. In addition, these models do not provide the depth, distance or physical size of objects; for those, alternative techniques such as lidar or radar are recommended.
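Detection accuracy is usually judged by how much a predicted box overlaps the ground-truth box, measured as intersection over union (IoU); a common threshold for counting a detection as correct is an IoU of at least 0.5. The sketch below computes IoU for boxes given as top-left corner plus width and height. The names `Box`, `iou` and `matches` are illustrative assumptions, not part of CMSR.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned box: top-left corner (x, y) plus width and height, in pixels.
struct Box {
    float x, y, width, height;
};

// Intersection over union of two boxes: 0 = no overlap, 1 = identical.
float iou(const Box& a, const Box& b) {
    assert(a.width >= 0 && a.height >= 0 && b.width >= 0 && b.height >= 0);
    // Edges of the overlap rectangle.
    float left   = std::max(a.x, b.x);
    float top    = std::max(a.y, b.y);
    float right  = std::min(a.x + a.width,  b.x + b.width);
    float bottom = std::min(a.y + a.height, b.y + b.height);
    // Clamp to zero when the boxes do not overlap.
    float interW = std::max(0.0f, right - left);
    float interH = std::max(0.0f, bottom - top);
    float inter = interW * interH;
    float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// A prediction "matches" a ground-truth box if IoU reaches the threshold.
bool matches(const Box& predicted, const Box& truth, float threshold = 0.5f) {
    return iou(predicted, truth) >= threshold;
}
```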

Object Detection Computer Vision Modeling

Object Detection Convolutional Neural Networks (OD-CNN), also known as YOLO, are used to model object detection tasks in computer vision. A fairly good understanding of CNNs is essential for object detection modeling. Object detection model development involves the following stages:

  • Image data collection and preparation.
  • Labeling objects of training data images.
  • OD-CNN model network configuration.
  • Model training.
  • Model testing.
  • Deployment to edge computing devices such as the Orange Pi.

Computer vision machine learning is a very complex process, so you need machine learning software that is powerful yet easy to learn and use. CMSR Machine Learning Studio is a fully GUI-based machine learning model development platform. Users can train machine learning models without coding: you don't write any code until you embed the CMSR-generated model code into your application project, and then you just call a model function from your main program. It is very easy to use and comes with powerful features.

Special Free Full License for Single Board Computer Developers

A special free full license of CMSR ML Studio is available for single board computer developers. Registered users also receive (limited) technical support.

For free downloads, please visit CMSR Download / Install.

Labeling Objects of Training Data Images

Object detection requires the objects in training data images to be marked with a bounding box and a classification label name. In the following figure, two objects in red boxes have been added to the object list (at the bottom left). The blue box is the currently selected object, which is about to be added to the object list. To mark an object's bounding box, click at the top-left corner of the box and drag to the bottom-right corner. Then select a label name and press the "Add" button.
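Conceptually, each labeled object pairs a label name with the box traced by the mouse drag. The sketch below shows one way to turn the two drag corners into a normalized box, so that dragging in any direction still produces a valid box. The `LabeledObject` struct and the `fromDrag` and `isValid` functions are hypothetical and do not describe how the CMSR labeling software stores its data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical record for one labeled object in a training image.
struct LabeledObject {
    std::string label;  // classification label name, e.g. "person"
    int x, y;           // top-left corner in pixels
    int width, height;
};

// Converts a mouse drag from (x0, y0) to (x1, y1) into a labeled box.
// Normalizes the corners, so dragging bottom-right to top-left also works.
LabeledObject fromDrag(int x0, int y0, int x1, int y1, const std::string& label) {
    assert(!label.empty());  // a label must be selected before "Add"
    LabeledObject obj;
    obj.label = label;
    obj.x = std::min(x0, x1);
    obj.y = std::min(y0, y1);
    obj.width = std::abs(x1 - x0);
    obj.height = std::abs(y1 - y0);
    return obj;
}

// Rejects zero-size boxes, such as a click without a drag.
bool isValid(const LabeledObject& obj) {
    return obj.width > 0 && obj.height > 0;
}
```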

Note that this labeling software can also create augmented images by cropping and rotating.
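When an image is cropped for augmentation, each label's box has to be moved into the crop's coordinate system and clipped to the crop. The sketch below is a generic illustration of that bookkeeping, not the labeling software's actual algorithm. The names `Rect` and `cropBox` are hypothetical, and dropping boxes with less than half of their area left inside the crop is an assumed policy.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned rectangle: top-left corner plus size, in pixels.
struct Rect {
    int x, y, width, height;
};

// Maps an object box into the coordinates of a cropped image.
// Writes the result to out and returns true, or returns false if less than
// minVisible of the box area remains inside the crop.
bool cropBox(const Rect& box, const Rect& crop, Rect& out, double minVisible = 0.5) {
    assert(box.width > 0 && box.height > 0);
    // Clip the box to the crop region (still in original-image coordinates).
    int left   = std::max(box.x, crop.x);
    int top    = std::max(box.y, crop.y);
    int right  = std::min(box.x + box.width,  crop.x + crop.width);
    int bottom = std::min(box.y + box.height, crop.y + crop.height);
    if (right <= left || bottom <= top) {
        return false;  // box lies entirely outside the crop
    }
    double visible = double(right - left) * (bottom - top) /
                     (double(box.width) * box.height);
    if (visible < minVisible) {
        return false;  // too little of the object remains to be useful
    }
    // Shift into the cropped image's coordinate system.
    out = Rect{left - crop.x, top - crop.y, right - left, bottom - top};
    return true;
}
```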

Model Training: A Powerful GPU and Large RAM Are Essential!

Computer vision is extremely compute intensive, and training in particular takes a huge amount of computing time even on powerful computers. You cannot train on an Nvidia Jetson SBC! A computer with a powerful GPU and large RAM is essential, and high-performance gaming computers are ideal for machine learning. However, a less powerful computer doesn't prevent you from developing computer vision models, since you can develop small models. More GPU shader cores are better, as CMSR Studio can take advantage of large numbers of shader cores such as Nvidia CUDA cores. CMSR employs fine-grained data parallelism. On 896 Nvidia CUDA cores, we observed model training 165 times faster than on a single CPU core. With more CUDA cores, the speedup can exceed 1,000 times.
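Fine-grained data parallelism means splitting a large array operation into many small, independent pieces that run at the same time, which is exactly what GPUs with many CUDA cores are built for. CMSR's GPU kernels are not shown in the source; the sketch below only illustrates the idea on CPU threads with `std::thread`, applying a ReLU activation to chunks of an array in parallel. The function name `parallelRelu` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Applies ReLU (max(0, v)) to every element, splitting the array into
// contiguous chunks that are processed by separate threads. Each element is
// independent, so no two threads ever touch the same data.
void parallelRelu(std::vector<float>& data, unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = data.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= n) break;  // fewer chunks than threads
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                data[i] = std::max(0.0f, data[i]);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();  // wait for all chunks to finish
    }
}
```

On a GPU the same pattern is taken much further: instead of a few chunks per CPU core, each element (or a handful of elements) gets its own lightweight thread.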

Models are trained on images presented in randomized order; otherwise, they develop a bias towards the images seen last. To read images in random order, all training images must be loaded into main memory; otherwise, training becomes extremely slow. You can estimate the total RAM needed, in bytes, with the following formula:

   total image dataset size = ((image width) * (image height) * 3 + 3) * (number of images)

Apply this formula to the largest image dataset you plan to use. Your computer's RAM should be about twice this size, as the OS also uses RAM. If you don't have much memory, you will have to make do with small training datasets. Note that this is CPU RAM. GPU VRAM is a different matter: it can be much smaller, as it stores only the model parameters and some extras.
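The memory estimate, and the random-order reading that makes it necessary, can be sketched in C++ as below. The function names are hypothetical and this is not CMSR's internal code: `datasetBytes` applies the formula above, `recommendedRamBytes` applies the rule of thumb of about twice that size, and `epochOrder` shuffles image indices with `std::shuffle` so that each epoch visits every in-memory image once, in a new random order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// In-memory size of a training image dataset, per the formula above:
// ((width * height * 3) + 3) * numImages bytes.
std::uint64_t datasetBytes(std::uint64_t width, std::uint64_t height,
                           std::uint64_t numImages) {
    return (width * height * 3 + 3) * numImages;
}

// Suggested computer RAM: about twice the dataset size, since the
// operating system also needs memory.
std::uint64_t recommendedRamBytes(std::uint64_t width, std::uint64_t height,
                                  std::uint64_t numImages) {
    return 2 * datasetBytes(width, height, numImages);
}

// Order in which to visit the in-memory images during one epoch.
// Only indices are shuffled, so each image is still used exactly once.
std::vector<std::size_t> epochOrder(std::size_t numImages, std::mt19937& rng) {
    std::vector<std::size_t> order(numImages);
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

For example, 10,000 images of 245 x 245 pixels (the input size used in the embedding example further down) give `datasetBytes(245, 245, 10000)` = 1,800,780,000 bytes, about 1.8 GB, so about 3.6 GB of RAM is recommended. Calling `epochOrder` at the start of every epoch with the same generator yields a different order each time, while a fixed seed keeps runs reproducible.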

Object Detection and Classification

OD-CNN can perform object detection and classification at the same time. However, this is not a good idea, for two reasons. First, training with classification is much harder than training without it. Second, the model input image size is limited, so small objects don't have enough pixel information for accurate classification. The solution is to separate classification from object detection.

First, create and train an object detection model. Then create a classification model using the object images that appear in the object detection training data. This classification training data can be generated automatically with the "Generate classification training data" tool in the "File" menu of the labeling software: select the import configuration file of the OD-CNN model, then select an output configuration file. Note that the folder of the output configuration file must not contain subfolders. The tool automatically creates the relevant files, including the object classification images.
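In an application, the two-stage approach means running the detector first and then cutting each detected box out of the image and passing it to the separate classification model. If the camera frame is larger than the detector's input size, cropping from the full frame can keep more pixels per object. The sketch below assumes an interleaved 8-bit RGB image stored row by row and a box already expressed in that image's pixel coordinates; `cropRegion` is a hypothetical helper, and resizing the crop to the classifier's input size is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies the box (x, y, w, h) out of an interleaved RGB image stored as
// [row][column][color], clamping the box to the image borders.
// outW/outH receive the size of the returned crop (0 x 0 if the box
// lies outside the image).
std::vector<std::uint8_t> cropRegion(const std::vector<std::uint8_t>& image,
                                     int imgW, int imgH,
                                     int x, int y, int w, int h,
                                     int& outW, int& outH) {
    assert(image.size() == static_cast<std::size_t>(imgW) * imgH * 3);
    const int left = std::max(0, x);
    const int top = std::max(0, y);
    const int right = std::min(imgW, x + w);
    const int bottom = std::min(imgH, y + h);
    if (right <= left || bottom <= top) {
        outW = outH = 0;
        return {};
    }
    outW = right - left;
    outH = bottom - top;

    std::vector<std::uint8_t> crop(static_cast<std::size_t>(outW) * outH * 3);
    for (int row = 0; row < outH; ++row) {
        // Each crop row is one contiguous run of outW pixels (3 bytes each).
        const std::uint8_t* src =
            image.data() + (static_cast<std::size_t>(top + row) * imgW + left) * 3;
        std::copy(src, src + outW * 3,
                  crop.data() + static_cast<std::size_t>(row) * outW * 3);
    }
    return crop;
}
```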

Aspect ratio can play an important role in classification. For example, humans have a vertically elongated shape, whereas dogs have a horizontally elongated shape. Preserving the shape of objects can improve classification accuracy, and the "Shape & Aspect Ratio" options provide this. When you model with "Inversion", choose "Cubic keep aspect ration with 255 fill"; when you model without "Inversion", choose "Cubic keep aspect ration with 0 fill". These options create square ("cubic") images padded with the fill value. When the width is larger than the height, the height is extended to the width with fill values, and the actual image is placed in the middle. When the height is larger than the width, the width is extended to the height in the same way.
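The padding described above can be sketched as follows for a single-channel (grayscale) image: the shorter side is extended with the fill value (255 or 0) and the original pixels are centred. This illustrates the described behaviour and is not CMSR's code. The function name `padToSquare` is hypothetical, and since the source does not say how an odd amount of padding is split between the two sides, this sketch puts the extra row or column at the bottom or right.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pads a grayscale image (row-major, width x height) to a square of side
// max(width, height). The original pixels are centred; the added rows or
// columns are set to fillValue (e.g. 255 with "Inversion", 0 without).
std::vector<std::uint8_t> padToSquare(const std::vector<std::uint8_t>& image,
                                      int width, int height,
                                      std::uint8_t fillValue, int& side) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    side = width > height ? width : height;
    // Offsets that centre the original image; odd leftovers go right/bottom.
    const int offsetX = (side - width) / 2;
    const int offsetY = (side - height) / 2;

    std::vector<std::uint8_t> square(static_cast<std::size_t>(side) * side, fillValue);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            square[static_cast<std::size_t>(row + offsetY) * side + (col + offsetX)] =
                image[static_cast<std::size_t>(row) * width + col];
        }
    }
    return square;
}
```

The square image can then be resized to the model's input size without distorting the object's proportions.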

Model Code Generation for Embedded Applications

Forget about ChatGPT! CMSR ML Studio can generate highly efficient AI/ML code.

CMSR ML Studio makes it easy to embed models into applications: just generate the program code, compile it with your project code, and call a function from your main code. CMSR can generate the following types of code:

  • Single CPU thread: Java, C, Swift.
  • Multicore CPU: C++, Java.
  • GPU: OpenCL (Java, C++), CUDA (Java, C++), OpenGL ES3 (C++), Metal (Swift, Objective-C).

For a generated program example, please see CMSR Generated Program Example: C++ OpenCL.

Embedding Models into Applications

CMSR-generated code is highly efficient, and in edge computing especially, efficiency and speed are among the most important factors. You only need to write the code that actually uses the model. The following code shows how to use an OD-CNN object detection model in C++ on an OpenCL GPU: create arrays to receive the results, then call the main "evaluate" function with its parameters. You can call "evaluate" as many times as your application needs. Of course, a real application should get its image data from the onboard camera!

#include <iostream>
#include "CMSRModel.hpp"
using namespace std;

int main(void) {
	char gpukernelfile[] = "../"; // GPU shader program
	char devicename[] = "Intel(R) HD Graphics 5500"; // GPU name
	char filename[]   = "data/modelfile.odcnn"; // model parameter file
	char imagefile[]  = "data/odcnnimages.rgb"; // test image file. can get from camera.

	int  IMAGEARRAY[245*245*3];

	int   outputcount;
	int   maxOutCount = 10;
	int   *outClassIndex = new int[maxOutCount+1]; // notice +1 here.
	float *outClassProbability = new float[maxOutCount+1]; // notice +1 here.
	float *outX = new float[maxOutCount+1]; // notice +1 here.
	float *outY = new float[maxOutCount+1]; // notice +1 here.
	float *outWidth = new float[maxOutCount+1]; // notice +1 here.
	float *outHeight = new float[maxOutCount+1]; // notice +1 here.
	int   blackandwhite = 0;
	int   r0g1b2 = 1;

	CMSRModel *model = new CMSRModel();
	// model->verbose = true;

	// initialize resources.
	model->initializeGpuAndModel(devicename, gpukernelfile, filename);

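	// the following steps can be repeated as many times as needed.
	model->populateImageArray((int*)IMAGEARRAY, imagefile, 245*245*3); // can get from camera.
	outputcount = model->evaluate (
		blackandwhite, /* 1 if black and white, otherwise 0 */
		r0g1b2,        /* 1 if IMAGEARRAY[][][0] is red, otherwise 0 */
		IMAGEARRAY,    /* [row/height][column/width][colors] */
		// The remaining arguments are missing from this listing. The ones below
		// are an assumption based on the arrays declared above; check the
		// evaluate() declaration in the generated CMSRModel.hpp for the real list.
		maxOutCount,
		outClassIndex, outClassProbability,
		outX, outY, outWidth, outHeight);

	// one line per detected object:
	// class index / probability / X / Y / width / height
	cout << "Results:\n";
	for (int i=0; i < outputcount; i++) {
		cout << i << ": " 
		<< outClassIndex[i] << " / " << outClassProbability[i] << " / " 
		<< outX[i] << " / " << outY[i] << " / " 
		<< outWidth[i] << " / " << outHeight[i] << " / " 
		<< "\n";
	}

	// release resources and finish.
	model->releaseResources(); cout << " releaseAllGpuResources\n";
	delete model;

	// free the result arrays allocated with new[].
	delete[] outClassIndex;
	delete[] outClassProbability;
	delete[] outX;
	delete[] outY;
	delete[] outWidth;
	delete[] outHeight;
	cout << "End.\n";

	return 0;
}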
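In an application, the `populateImageArray` call would be replaced by code that fills `IMAGEARRAY` from a camera frame. The sketch below assumes that the camera delivers 8-bit interleaved 3-channel pixels (for example BGR, as many capture libraries do) and that the model expects integer values from 0 to 255 in the `[row/height][column/width][colors]` layout noted in the comments above. Both are assumptions to check against the generated code. It resizes with nearest-neighbour sampling to the 245 x 245 input used in the example; the function name `frameToImageArray` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fills dst (dstW * dstH * 3 ints, laid out [row][column][color]) from an
// 8-bit interleaved 3-channel camera frame of size srcW x srcH, using
// nearest-neighbour resizing. Channel order is copied unchanged, so for a
// BGR frame pass r0g1b2 = 0 to evaluate(); for an RGB frame pass 1.
void frameToImageArray(const std::uint8_t* frame, int srcW, int srcH,
                       int* dst, int dstW, int dstH) {
    assert(frame != nullptr && dst != nullptr);
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    for (int row = 0; row < dstH; ++row) {
        // Nearest source row for this destination row.
        const int srcRow = static_cast<int>(static_cast<long long>(row) * srcH / dstH);
        for (int col = 0; col < dstW; ++col) {
            const int srcCol = static_cast<int>(static_cast<long long>(col) * srcW / dstW);
            const std::uint8_t* px =
                frame + (static_cast<std::size_t>(srcRow) * srcW + srcCol) * 3;
            int* out = dst + (static_cast<std::size_t>(row) * dstW + col) * 3;
            out[0] = px[0];
            out[1] = px[1];
            out[2] = px[2];
        }
    }
}
```

With the listing above, this would be called as `frameToImageArray(frame, camW, camH, IMAGEARRAY, 245, 245);` in place of `populateImageArray`, where `frame`, `camW` and `camH` are hypothetical names for the captured frame and its size. Note that this stretches non-square frames to a square; the source does not say whether the model expects stretching or padding.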