Rosella Machine Intelligence & Data Mining
Install OpenCL on Debian, Ubuntu and Armbian for Orange Pi 5 RK3588

This installation should also work on other systems that use the Rockchip RK3588/RK3588S, such as Radxa Rock 5, OKdo Rock 5, Mixtile Core 3588E and Khadas Edge 2!

For certain applications such as computer vision (CNN) and generative AI (LLM), GPUs can deliver performance that multicore CPUs cannot match. Single board computers are well suited to robotics, and computer vision, which is important for robotics, is very compute intensive. The Orange Pi 5 has a quite decent GPU with 64 shading cores. Object detection is a good example of GPU usage in computer vision.

For OpenCL GPU programming guides, please read OpenCL Programing Guides with Example.

If you are using Joshua Riek's Ubuntu, OpenCL 3.0 may already be installed. In that case, you only need to create a symbolic link and, if you wish to compile OpenCL programs, copy the CL include files. Jump to "Create Symbolic Link for libOpenCL.so". If that fails, continue with the steps below.

This page explains how to enable OpenCL on Debian (Bookworm), Armbian and Ubuntu (Jammy) for Orange Pi 5 single board computers. The OS images we tested are the Debian, Ubuntu and Armbian images from the Orange Pi 5 Home.
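Since OpenCL may already be present on some images, it can help to probe for a loadable OpenCL runtime before going through the installation. The following is a minimal sketch in C, not part of the original instructions. It loads a library by name at run time and asks it for the number of platforms, so it compiles even when the "CL" include files are not installed. The helper name probe_opencl and its return codes are hypothetical, and the type definitions are stand-ins written for this example.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-ins for the OpenCL types used below, so that this file
   compiles even when the CL headers are not installed yet. */
typedef int32_t cl_int;
typedef uint32_t cl_uint;
typedef struct _cl_platform_id *cl_platform_id;
typedef cl_int (*clGetPlatformIDs_fn)(cl_uint, cl_platform_id *, cl_uint *);

/* Hypothetical helper (not part of OpenCL): loads the library `lib_name`
   at run time and asks it how many platforms it exposes.
   Returns  0 and stores the count in *num_platforms on success,
           -1 if the library cannot be loaded,
           -2 if it has no clGetPlatformIDs symbol,
           -3 if clGetPlatformIDs itself reports an error. */
int probe_opencl(const char *lib_name, unsigned *num_platforms)
{
    assert(lib_name != NULL && num_platforms != NULL);

    void *handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", lib_name, dlerror());
        return -1;
    }

    void *sym = dlsym(handle, "clGetPlatformIDs");
    if (sym == NULL) {
        dlclose(handle);
        return -2;
    }

    /* POSIX allows converting the dlsym() result to a function pointer. */
    clGetPlatformIDs_fn get_ids = (clGetPlatformIDs_fn)sym;
    cl_uint count = 0;
    cl_int ret = get_ids(0, NULL, &count);   /* 0 means CL_SUCCESS */
    dlclose(handle);

    if (ret != 0)
        return -3;
    *num_platforms = (unsigned)count;
    return 0;
}
```

Calling probe_opencl("libOpenCL.so.1", &n) from a small main() is one way to use it. The library name here is an assumption based on the usual loader naming, so adjust it to whatever your image provides. A return value of 0 with n of at least 1 suggests a working runtime. On older glibc versions you may need to add -ldl when linking.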
To enable OpenCL, take the following steps. First download the Mali GPU library and firmware:

    cd /usr/lib
    sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-x11-wayland-gbm.so
    cd /lib/firmware
    sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/firmware/g610/mali_csffw.bin

Install the mesa-opencl-icd and clinfo packages, then add the Mali GPU blob to the OpenCL ICD config file as follows:

    sudo apt install mesa-opencl-icd clinfo

On Ubuntu, you may get "not found" errors, especially for huawei links. These appear to be harmless and can be ignored. Proceed with the following:

    sudo mkdir -p /etc/OpenCL/vendors
    echo "/usr/lib/libmali-valhall-g610-g6p0-x11-wayland-gbm.so" | sudo tee /etc/OpenCL/vendors/mali.icd

Install the dependencies of the Mali OpenCL library as follows:

    sudo apt install libxcb-dri2-0 libxcb-dri3-0 libwayland-client0 libwayland-server0 libx11-xcb1

Now you can run "clinfo" to check whether OpenCL is working. You will see the following if OpenCL is installed correctly:

    ...> clinfo
    Number of platforms                     1
      Platform Name                         ARM Platform
      Platform Vendor                       ARM
      Platform Version                      OpenCL 2.1 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd
      Platform Profile                      FULL_PROFILE
      Platform Extensions                   cl_khr_global_int32_base_atomics ...
      Platform Extensions function suffix   ARM
      Platform Host timer resolution        1ns
    ...

Check whether any dependencies are missing using the ldd command as follows:

    ldd /usr/lib/libmali-valhall-g610-g6p0-x11-wayland-gbm.so

Create Symbolic Link for libOpenCL.so

The directory "/usr/lib/aarch64-linux-gnu/" will contain "libOpenCL.so.1.0.0" but no "libOpenCL.so" file. In this case, create a symbolic link as follows. You need to be logged in as root to do this, for example with "su -":

    cd /usr/lib/aarch64-linux-gnu/
    ln -s libOpenCL.so.1.0.0 libOpenCL.so

Copying OpenCL "CL" Folder into "/usr/include"

This section is needed only if you want to compile OpenCL programs. Your "/usr/include/" directory may not have a "CL" folder.
In this case, you need to copy the "CL" folder from the CLv2.zip (version 2) file, or from the CLv3.zip file (version 3, for Joshua Riek's Ubuntu), into the "/usr/include/" directory. Extract the "CL" folder into a convenient folder. Log into the root account, for example with "su -", and "cd" to the folder that holds the extracted "CL" folder. From there, copy it to "/usr/include" as follows:

    cp -r CL /usr/include

Restart your Orange Pi 5. Now you are ready to compile and run. An example compile command is shown here; change it to suit your needs. CMake files can be adjusted accordingly.

    g++ CMSRModel.cpp OpenclModel.cpp Main.cpp -L "/usr/lib/aarch64-linux-gnu/" -lOpenCL -o app

RK3588/Orange Pi 5 OpenCL GPU Information

The following is the Orange Pi 5 GPU information reported by CMSR Machine Learning Studio.

    [Device 0] Mali-LODX r0p0
    - Platform: ARM Platform
    - Platform version: OpenCL 2.1 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd
    - Vendor: ARM
    - Driver version: 2.1
    - Address bits: 64
    - Compute units: 4
    - Max work group size: 1024
    - Clock frequency: 1000 MHz
    - Global memory size: 3724 MB
    - Local memory size: 32 KB
    - Max allocation size: 3724 MB
    - Max work item dimensions: 3
    - Max work item sizes: 1024x1024x1024
    - Max constant buffer size: 3905794048
    - Global memory cache size: 1024 KB
    - Max on device queues: 1
    - Device profile: FULL_PROFILE
    - C version: OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd

On Joshua Riek's Ubuntu, you will see OpenCL version 3.0 as follows:

    [Device 1] Mali-G610 r0p0
    - Platform: ARM Platform
    - Platform version: OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
    - Vendor: ARM
    - Driver version: 3.0
    - Address bits: 64
    - Compute units: 4
    - Max work group size: 1024
    - Clock frequency: 1000 MHz
    - Global memory size: 3724 MB
    - Local memory size: 32 KB
    - Max allocation size: 3724 MB
    - Max work item dimensions: 3
    - Max work item sizes: 1024x1024x1024
    - Max constant buffer size: 3905794048
    - Global memory cache size: 1024 KB
    - Max on device queues: 1
    - Device profile: FULL_PROFILE
    - C version: OpenCL C 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4

RK3588/Orange Pi 5 OpenCL GPU Performance Comparison

The following shows the time (in minutes) taken on each GPU to train a 104-layer computer vision deep neural network with 77 convolution layers. Each figure is the training time for one epoch of 3,400 images.

    - OPi5 RK3588 4 eu 64 cores 1000MHz : (76m)
    - Toshiba Intel i5 internal 24 cu 900MHz : (143m)
    - MacMini Intel i5 internal 40 cu 1100MHz : (77m)
    - Acer Intel i7 internal 24 cu 1150MHz : (61m)
    - Acer Nvidia Quadro T1000 896 cores 1725MHz : (8m)

You can see that even the fastest Intel internal GPU in this list only narrowly beats the OPi5 GPU! The Nvidia external GPU is outstanding. That's why everyone is rushing to buy Nvidia GPUs.

Useful OpenCL Example Program

The following OpenCL program lists the names of the GPUs in your system. Create a file, say, "cldevices.c" and copy the following program into the file.

    #ifndef CL_TARGET_OPENCL_VERSION
    #define CL_TARGET_OPENCL_VERSION 120
    #endif
    #include <CL/cl.h>
    #include <stdio.h>

    #define MAX_PLATFORMS 16   // maximum number of platforms, say, 16;
    #define MAX_DEVICES   20   // maximum number of GPU devices, say, 20;

    int main() {
        cl_uint i, j;
        int k = 0;
        cl_int ret;
        cl_uint numPlatforms = 0;
        cl_uint numDevices = 0;
        int verbose = 1;

        // get the number of platforms, then the platform IDs;
        ret = clGetPlatformIDs(0, NULL, &numPlatforms);
        if (CL_SUCCESS != ret) {
            if (verbose) printf("Error clGetPlatformIDs() : %d\n", ret);
            return 1;
        }
        if (numPlatforms == 0) {
            if (verbose) printf("No OpenCL platforms found\n");
            return 0;
        }
        if (numPlatforms > MAX_PLATFORMS) numPlatforms = MAX_PLATFORMS;
        cl_platform_id platforms[MAX_PLATFORMS];
        ret = clGetPlatformIDs(numPlatforms, platforms, NULL);
        if (CL_SUCCESS != ret) {
            if (verbose) printf("Error clGetPlatformIDs() : %d\n", ret);
            return 1;
        }

        char local_dev_buf[250];
        cl_device_id devices[MAX_DEVICES];

        // list the GPU devices of every platform;
        for (i = 0; i < numPlatforms; i++) {
            ret = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
            if (CL_SUCCESS != ret) {
                continue;   // no GPU on this platform;
            }
            if (numDevices > MAX_DEVICES) numDevices = MAX_DEVICES;
            ret = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, numDevices, devices, NULL);
            if (CL_SUCCESS != ret) {
                continue;
            }
            for (j = 0; j < numDevices; j++) {
                ret = clGetDeviceInfo(devices[j], CL_DEVICE_NAME, sizeof(local_dev_buf), local_dev_buf, NULL);
                if (CL_SUCCESS != ret) {
                    if (verbose) printf("Error clGetDeviceInfo() : %d\n", ret);
                    continue;
                }
                printf("%d : %s\n", k++, local_dev_buf);
            }
        }
        return 0;
    }

To compile this, run the following command. Then run "./cldevices".

    gcc cldevices.c -L "/usr/lib/aarch64-linux-gnu/" -lOpenCL -o cldevices
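The platform version strings reported on this page ("OpenCL 2.1 v1.g6p0-..." and "OpenCL 3.0 v1.g13p0-...") start with "OpenCL", a space, and then major.minor, so the version can be read out in code. This is useful, for example, to decide which "CL" header set matches your driver. The sketch below is an illustration only. The helper names parse_opencl_version and target_version_macro are hypothetical. The second helper assumes the usual encoding of CL_TARGET_OPENCL_VERSION values (120 for 1.2, 210 for 2.1, 300 for 3.0).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts major and minor from a platform version
   string such as "OpenCL 2.1 v1.g6p0-...".
   Returns 0 on success, -1 if the string does not start with
   "OpenCL <major>.<minor>". */
int parse_opencl_version(const char *version, int *major, int *minor)
{
    assert(version != NULL && major != NULL && minor != NULL);

    int ma = 0, mi = 0;
    if (sscanf(version, "OpenCL %d.%d", &ma, &mi) != 2)
        return -1;
    *major = ma;
    *minor = mi;
    return 0;
}

/* Hypothetical helper: value to define CL_TARGET_OPENCL_VERSION as,
   assuming the major*100 + minor*10 encoding (1.2 -> 120, 3.0 -> 300). */
int target_version_macro(int major, int minor)
{
    return major * 100 + minor * 10;
}
```

With the two strings from this page, the helpers yield 2.1 (macro value 210) and 3.0 (macro value 300). A device "C version" string such as "OpenCL C 2.0 ..." is rejected on purpose, because it does not follow the platform version layout.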