OpenCL and GPU (Apalis iMX8QM SOM)

toddhwang · March 3, 2021, 9:41pm

I got the following output from “clinfo”:

root@apalis-imx8:~# clinfo
Number of platforms                               1
  Platform Name                                   Vivante OpenCL Platform
  Platform Vendor                                 Vivante Corporation
  Platform Version                                OpenCL 1.2 V6.4.3.p1.305572
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             viv

  Platform Name                                   Vivante OpenCL Platform
Number of devices                                 1
  Device Name                                     Vivante OpenCL Device GC7000XSVX.6009.0000
  Device Vendor                                   Vivante Corporation
  Device Vendor ID                                0x564956
  Device Version                                  OpenCL 1.2 
  Driver Version                                  OpenCL 1.2 V6.4.3.p1.305572
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             996MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     (n/a)
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
=== CL_PROGRAM_BUILD_LOG ===
(6:0) : error : syntax error at 'kernel'
  Preferred work group size multiple              <getWGsizes:1200: create kernel : error -45>
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                4 / 4       
    int                                                  4 / 4       
    long                                                 4 / 4       
    half                                                 0 / 0        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           <printDeviceInfo:68: get CL_DEVICE_HALF_FP_CONFIG : error -30>
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              268435456 (256MiB)
  Error Correction support                        Yes
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        65536 (64KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 8192 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             8192x8192x8192 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_ex 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [viv]
  clCreateContext(NULL, ...) [default]            Success [viv]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Vivante OpenCL Platform
    Device Name                                   Vivante OpenCL Device GC7000XSVX.6009.0000
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Vivante OpenCL Platform
    Device Name                                   Vivante OpenCL Device GC7000XSVX.6009.0000
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Vivante OpenCL Platform
    Device Name                                   Vivante OpenCL Device GC7000XSVX.6009.0000

Is the GPU working correctly? I enabled the extra packages “dnn” and “opencl”. Will OpenCV programs then use the GPU for their computation automatically?

PACKAGECONFIG ??= "gapi dnn python3 eigen jpeg png tiff v4l libv4l opencl gstreamer samples tbb gphoto2 \
    ${@bb.utils.contains("DISTRO_FEATURES", "x11", "gtk", "", d)} \
    ${@bb.utils.contains("LICENSE_FLAGS_WHITELIST", "commercial", "libav", "", d)}"
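The clinfo listing above contains two failed queries (error -45 and error -30) next to an otherwise normal device report. As a rough sketch, the failed entries can be picked out of a saved clinfo dump like this. The helper name `find_clinfo_errors` and the regular expression are illustrative assumptions based only on the two lines seen above; the error names are the standard constants from `CL/cl.h`.

```python
import re

# Standard OpenCL error codes (CL/cl.h); only the two that appear in the listing.
CL_ERROR_NAMES = {
    -30: "CL_INVALID_VALUE",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
}

# Assumed shape of a failed query, taken from the listing:
#   <function:line: what failed : error -NN>
ERROR_RE = re.compile(r"<(\w+):(\d+): (.+?) : error (-\d+)>")


def find_clinfo_errors(text):
    """Return (property, failed step, error code, error name) per failed query."""
    results = []
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match is None:
            continue
        prop = line[:match.start()].strip()
        code = int(match.group(4))
        name = CL_ERROR_NAMES.get(code, "unknown")
        results.append((prop, match.group(3), code, name))
    return results


if __name__ == "__main__":
    sample = (
        "  Preferred work group size multiple              "
        "<getWGsizes:1200: create kernel : error -45>\n"
        "  Half-precision Floating-point support           "
        "<printDeviceInfo:68: get CL_DEVICE_HALF_FP_CONFIG : error -30>\n"
        "  Device Available                                Yes\n"
    )
    for prop, step, code, name in find_clinfo_errors(sample):
        print(f"{prop}: '{step}' failed with {code} ({name})")
```

Both failures concern individual queries. The platform and device are still enumerated, and the context-creation checks at the end of the listing report success, so these two lines alone do not show that the GPU is unusable.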

jeremias.tx · March 4, 2021, 7:36pm

Greetings @toddhwang,

It’s not clear which version of our software you are running. Could you tell us?

Best Regards,
Jeremias

toddhwang · March 4, 2021, 7:47pm

I copied and pasted the output of “repo info” for the Yocto build sources from your site. I am now building a final image with “bitbake tdx-reference-multimedia-image”. Further, I added up gpu option in a kernel by modifying DTS file. The kernel version begins with “Linux apalis-imx8 5.4.91-32991-g590db576d04d-dirty #1 SMP PREEMPT …”.

Manifest branch:
Manifest merge branch: refs/heads/dunfell-5.x.y
Manifest groups: all,-notdefault
----------------------------
Project: meta-freescale.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-freescale
Current revision: 5337af9072484509a996dcf8e9872972cbcfd8d1
Manifest revision: 5337af9072484509a996dcf8e9872972cbcfd8d1
Local Branches: 0
----------------------------
Project: meta-freescale-3rdparty.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-freescale-3rdparty
Current revision: ed841161a97307ebd901c31c62f8ecbee6baaacf
Manifest revision: ed841161a97307ebd901c31c62f8ecbee6baaacf
Local Branches: 0
----------------------------
Project: meta-freescale-distro.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-freescale-distro
Current revision: 5d882cdf079b3bde0bd9869ce3ca3db411acbf3b
Manifest revision: 5d882cdf079b3bde0bd9869ce3ca3db411acbf3b
Local Branches: 0
----------------------------
Project: meta-openembedded.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-openembedded
Current revision: 5bba79488b7d393d2258d6e917f7bf7b0d7c4073
Manifest revision: 5bba79488b7d393d2258d6e917f7bf7b0d7c4073
Local Branches: 0
----------------------------
Project: meta-qt5.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-qt5
Current revision: 0d8eb956015acdea7e77cd6672d08dce18061510
Manifest revision: 0d8eb956015acdea7e77cd6672d08dce18061510
Local Branches: 0
----------------------------
Project: meta-toradex-bsp-common.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-toradex-bsp-common
Current revision: 2b830c7a4aaf39dc7ea971c638b5042290c9ee1e
Manifest revision: 2b830c7a4aaf39dc7ea971c638b5042290c9ee1e
Local Branches: 0
----------------------------
Project: meta-toradex-demos.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-toradex-demos
Current revision: bf1bc67f2de9f0245da66a3a15d5c439ee7462d7
Manifest revision: bf1bc67f2de9f0245da66a3a15d5c439ee7462d7
Local Branches: 0
----------------------------
Project: meta-toradex-distro.git
Mount path: /home/infot/karma/yocto/oe-core/layers/meta-toradex-distro
Current revision: 2561e4a6cc6d5a8f6badb32ed3ead6eb4b536519
Manifest revision: 2561e4a6cc6d5a8f6badb32ed3ead6eb4b536519
Local Branches: 0

jeremias.tx · March 4, 2021, 9:22pm

Oh, you’re using our reference images then; this is different from Torizon. Also, what do you mean by “I added up gpu option in a kernel by modifying DTS file.”?

toddhwang · March 5, 2021, 1:38am

&imx8_gpu_ss {
    status = "okay";
};

Doesn’t this work for enabling the GPU? --;

jeremias.tx · March 5, 2021, 10:35pm

Where are you changing this? The node is already enabled here: arch/arm64/boot/dts/freescale/imx8-apalis-eval.dtsi in linux-toradex.git (Linux kernel for Apalis, Colibri and Verdin modules)

toddhwang · March 8, 2021, 5:56pm

So you mean that OpenCV is already running with the assistance of the GPU? Wow… it’s too slow and it stutters. As much as 70% of the CPU is consumed by this. Would you let me know how I can see whether the GPU is actually being used by the OpenCV libraries?

gustavo.tx · March 8, 2021, 6:19pm

@toddhwang,

You can check the OpenCV build information to make sure it’s compiled with OpenCL support. In Python, you can do this with:

import cv2
print(cv2.getBuildInformation())

Look for the line

OpenCL:                        YES (no extra features)

to check if it’s enabled.
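A build-time YES only says that OpenCL support was compiled in. As a minimal sketch, the check can be automated and combined with OpenCV's runtime queries `cv2.ocl.haveOpenCL()`, `cv2.ocl.setUseOpenCL()` and `cv2.ocl.useOpenCL()`. The helper names `opencl_lines`, `opencl_compiled_in` and `report` are illustrative, and the parsing assumes the report line has the `OpenCL: YES ...` form shown above.

```python
def opencl_lines(build_info):
    """Return the lines of an OpenCV build report that mention OpenCL."""
    return [line.strip() for line in build_info.splitlines() if "OpenCL" in line]


def opencl_compiled_in(build_info):
    """True if the report contains an 'OpenCL: YES ...' entry."""
    for line in opencl_lines(build_info):
        key, _, value = line.partition(":")
        if key.strip() == "OpenCL" and value.strip().startswith("YES"):
            return True
    return False


def report():
    """Print build-time and runtime OpenCL status, if cv2 is importable."""
    try:
        import cv2
    except ImportError:
        print("cv2 is not installed in this environment")
        return
    info = cv2.getBuildInformation()
    print("compiled with OpenCL:", opencl_compiled_in(info))
    # Runtime check: is an OpenCL platform/device actually usable?
    print("OpenCL available at runtime:", cv2.ocl.haveOpenCL())
    # Request the OpenCL code paths, then read back whether they are enabled.
    cv2.ocl.setUseOpenCL(True)
    print("OpenCL enabled:", cv2.ocl.useOpenCL())


if __name__ == "__main__":
    report()
```

Even with all three checks positive, OpenCV's transparent API only dispatches to OpenCL for data held in `cv2.UMat` (`cv::UMat` in C++), and only for functions that have an OpenCL implementation. Code that passes plain `Mat` or NumPy arrays stays on the CPU.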

toddhwang · March 8, 2021, 6:36pm

I am using the C++ version of OpenCV, and I am sure that it was compiled with OpenCL, because I tested the HAVE_OPENCL condition last week. Is compiling with OpenCL all that is needed? Would you please check the app attached to this post? CPU app. I am seeing more than 70% CPU utilization when I run the face recognition app. 70% is not a good number for us when deciding on our next platform, so please take another look at this for me. App → “cpu -r -i 1”

denis.tx · March 10, 2021, 12:03am

@toddhwang, which layer are you using to add OpenCV support? Are you using the recipe from meta-imx? opencv-imx - i.MX OpenCV

toddhwang · March 11, 2021, 5:10pm

Dear Denis.

I am not sure I understand your question exactly, but I can tell you that I am using all of the software from the official Toradex distribution. That should be the case, since I used the command “bitbake tdx-reference-multimedia-image” to build the final image. Do you mean that I need to replace the OpenCV sources with the ones from the source.codeaurora.org site?

A more basic, fundamental question: is it right to expect an OpenCV application to use somewhat fewer computation resources when it runs with the “GPU”? We found that about 50% of the computation resources are used when we run the SVM inside OpenCV. If we take the single C statement that calls the SVM out of the application code, the CPU app reports only 15% ~ 20% CPU utilization, so CPU utilization drops substantially depending on whether the SVM inside OpenCV is invoked. Could this be closely related to the GPU? Or does the GPU only contribute to 2D/3D graphics? Do you think I am tracing the problem back to its origin correctly? I would really like to hear your opinion. Do you have any experience with this?
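One way to attribute CPU load to a single call, such as the SVM invocation described above, is to compare the CPU time the process consumes with the wall-clock time that passes around that call. The following is only a sketch using the Python standard library: `measure` is an illustrative helper and `busy_work` is a hypothetical stand-in for the real workload, not part of the app from this thread.

```python
import time


def measure(fn, repeats=5):
    """Run fn() `repeats` times; return (wall seconds, CPU seconds, CPU/wall)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeats):
        fn()
    # process_time() counts user + system CPU time of the whole process,
    # so with several busy threads the ratio can exceed 1.0.
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return wall, cpu, ratio


def busy_work():
    # Hypothetical CPU-bound stand-in for the call being investigated.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    wall, cpu, ratio = measure(busy_work)
    print(f"wall {wall:.3f} s, cpu {cpu:.3f} s, cpu/wall {ratio:.2f}")
```

A ratio near or above 1.0 means the call keeps at least one core busy for its whole duration. A call that was really offloaded to the GPU would normally show a clearly lower ratio, because the host mostly waits for the device.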

jeremias.tx · March 16, 2021, 7:12pm

I believe Denis is asking which Yocto recipe your OpenCV support comes from. The opencv-imx recipes are built specifically to utilize the hardware features of the i.MX platform. Other, generic OpenCV recipes probably wouldn’t be able to utilize the GPU.