GPU acceleration on iMX8MP

Sudharshan · December 2, 2024, 6:33am

Hi,

I’ve been trying to use the GPU on the iMX8MP to show a live feed on screen using wxWidgets.
By default wxWidgets doesn’t use OpenGL, so I used wxCanvas to get the job done, but I’m getting the error below:

runtime directory ‘/tmp/1000-runtime-dir’ is not owned by UID 0, but a directory permissions 0700 owned by UID 1000 GID 0
libEGL warning: MESA-LOADER: failed to open vivante: /usr/lib/dri/vivante_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/aarch64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)

I am running a Weston container for the GUI. My application resides in a wayland-vivante based container. How can I make sure the Vivante drivers work?

Regards
Samarth

ToradexAI · December 2, 2024, 6:34am

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!

Hello Samarth,

Thank you for reaching out with your question about GPU acceleration on the iMX8MP using wxWidgets. It sounds like you’re encountering an issue with the Vivante drivers. Let’s go through some steps to help resolve this:

Verify Driver Installation: First, ensure that the Vivante GPU drivers are installed in your container. These drivers are crucial for enabling hardware acceleration on the iMX8MP. Check if the package libgal-vivante1 or a similar package is installed in your container.
Container Configuration: Make sure your Wayland-Vivante container is configured to access the GPU. This typically involves ensuring that the /dev/galcore device is accessible within the container. You can achieve this by adding the following to your Docker run command:
```
--device=/dev/galcore
```
Weston Configuration: Since you are using a Weston container for the GUI, ensure it is correctly set up to communicate with your application container. This involves sharing the Wayland socket between the containers. You can do this by bind-mounting the /tmp directory:
```
-v /tmp:/tmp
```
This allows the Wayland client (your application) to communicate with the Weston compositor.
Check Permissions: The error message about the runtime directory suggests a permissions issue. Ensure that the UID and GID used in your container match those expected by the Weston container. You might need to adjust the user running the application to match the UID 1000.
Troubleshooting: If the Vivante drivers are still not found, verify the library paths in your container. The error indicates that the library is being searched in /usr/lib/dri. Ensure that the Vivante driver files are present in the expected directories or adjust the LD_LIBRARY_PATH to include the correct path.
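
The device and permission checks from the steps above can be sketched as a small script to run inside the application container. This is only a minimal sketch, not a Toradex tool: the helper name `check_runtime_dir` is hypothetical, and the paths `/dev/galcore` and `/tmp/1000-runtime-dir` are simply the ones mentioned in this thread.

```python
import os
import stat


def check_runtime_dir(path, expected_uid=None):
    """Return a list of problems found with an XDG runtime directory.

    Hypothetical helper: checks that the directory exists, is owned by
    expected_uid (default: the current user) and has mode 0700.
    """
    if expected_uid is None:
        expected_uid = os.getuid()
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    info = os.stat(path)
    if info.st_uid != expected_uid:
        problems.append(
            f"{path} is owned by UID {info.st_uid}, expected UID {expected_uid}"
        )
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o700:
        problems.append(f"{path} has mode {mode:04o}, expected 0700")
    return problems


if __name__ == "__main__":
    # Paths taken from this thread; adjust them for your own setup.
    if not os.path.exists("/dev/galcore"):
        print("/dev/galcore is not visible in this container")
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", "/tmp/1000-runtime-dir")
    for problem in check_runtime_dir(runtime_dir):
        print(problem)
```

An empty result only means ownership and mode look consistent for the user running the script; it says nothing about whether the Vivante libraries themselves are installed.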

By following these steps, you should be able to resolve the issue with the Vivante drivers and ensure that your application can utilize GPU acceleration effectively on the iMX8MP with Torizon.

If you have any further questions, feel free to ask.

Best regards,
Toradex AI Assistant

Please, also check if these links can help you solve your problem:

GPU performance in Chromium container on Torizon OS iMX8M-Plus.

jeremias.tx · December 2, 2024, 7:21pm

Greetings @Sudharshan,

Just to clarify, what version of Torizon OS are you using here?

Also when you say:

I am running a Weston container for the GUI. My application resides in a wayland-vivante based container.

Do you mean you’re using the torizon/wayland-base-vivante container image as your base? If yes, then what version tag of this are you using?

Best Regards,
Jeremias

Sudharshan · December 3, 2024, 2:19am

Hi @jeremias.tx ,

Please find my software stack below:

Software summary
------------------------------------------------------------
Bootloader:               U-Boot
Kernel version:           5.15.129-6.5.0+git.6f8fd49366db #1-TorizonCore SMP PREEMPT Fri Dec 22 11:15:52 UTC 2023
Kernel command line:      root=LABEL=otaroot rootfstype=ext4 quiet logo.nologo vt.global_cursor_default=0 plymouth.ignore-serial-consoles splash fbcon=map:3 ostree=/ostree/boot.0/torizon/0675ef60b6510e69b99103e48d52b4fc37504a16b10cb79a10beaee292d75b0c/0
Distro name:              NAME="TorizonCore"
Distro version:           VERSION_ID=6.5.0-build.8
Distro variant:           VARIANT="Docker"
Hostname:                 verdin-imx8mp-xxxxx
------------------------------------------------------------

Hardware info
------------------------------------------------------------
HW model:                 Toradex Verdin iMX8M Plus on Verdin Development Board
Toradex version:          0063 V1.1A
Serial number:            xxxxxxxx
Processor arch:           aarch64
------------------------------------------------------------

Base for GUI application.
torizon/wayland-base-vivante 3

Weston for GUI on torizon os
torizon/weston-vivante 3

The idea was to keep the application in a separate container from weston-vivante and to leverage the GPU for OpenCV and OpenGL based processing.

For your reference, this is how I created the two containers.

Weston-vivante :

docker run -e ACCEPT_FSL_EULA=1 -d --rm --name=weston --net=host --cap-add CAP_SYS_TTY_CONFIG \
    -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
    --device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' \
    --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
    torizon/weston-vivante:$CT_TAG_WESTON_VIVANTE --developer \
    --tty=/dev/tty7

Application container based on wayland-vivante, with a little customization in my Dockerfile:

FROM torizon/wayland-base-vivante:3

# Set the Wayland display variable
ENV WAYLAND_DISPLAY=wayland-0
ENV XDG_RUNTIME_DIR=/tmp/1000-runtime-dir

# Create the runtime directory with appropriate permissions
RUN mkdir -p /tmp/1000-runtime-dir && chmod 0700 /tmp/1000-runtime-dir

Start command for container:

docker run -it --rm --name qt-vivante \
    --device=/dev/galcore \
    --device=/dev/dri \
    -v /tmp:/tmp \
    -v /dev:/dev \
    -e WAYLAND_DISPLAY=wayland-0 \
    -e XDG_RUNTIME_DIR=/tmp/1000-runtime-dir \
    torizon/wayland-base-vivante:custom

Let me know if I’m missing something. Is this the standard way to use the GPU for a GUI application that requires OpenCV hardware acceleration?

Regards
Samarth

jeremias.tx · December 3, 2024, 7:15pm

After looking into this I have the following findings. Your application is complaining about:

libEGL warning: MESA-LOADER: failed to open vivante: /usr/lib/dri/vivante_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/aarch64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)

The issue is that Vivante is a vendor specific GPU interface from NXP themselves. What this means is that Vivante doesn’t provide any DRI shared library file, which is what your framework is looking for here.

We don’t typically see this with other graphics frameworks because most frameworks will use higher level abstractions that will then use the proprietary mechanisms. In your case here the graphics framework you are using seems to be trying to use the lower software layers directly, which Vivante doesn’t provide.
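
To see exactly which files the Mesa loader was looking for, the warning can be taken apart and each candidate path checked. The following is a rough sketch that only assumes the message format shown in this thread; the function names are made up for illustration.

```python
import os
import re

# Warning text as reported in this thread.
WARNING = (
    "libEGL warning: MESA-LOADER: failed to open vivante: "
    "/usr/lib/dri/vivante_dri.so: cannot open shared object file: "
    "No such file or directory (search paths "
    "/usr/lib/aarch64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)"
)


def parse_mesa_loader_warning(text):
    """Extract the driver name, search paths and suffix from the warning."""
    match = re.search(
        r"failed to open (?P<driver>\S+?):.*"
        r"\(search paths (?P<paths>.+), suffix (?P<suffix>\S+)\)",
        text,
    )
    if match is None:
        return None
    return {
        "driver": match.group("driver"),
        "paths": match.group("paths").split(":"),
        "suffix": match.group("suffix"),
    }


def candidate_files(info):
    """List the files the loader looks for, skipping ORIGIN-relative entries."""
    name = f"{info['driver']}{info['suffix']}.so"
    return [os.path.join(p, name) for p in info["paths"] if "ORIGIN" not in p]


if __name__ == "__main__":
    info = parse_mesa_loader_warning(WARNING)
    for path in candidate_files(info):
        print(path, "exists" if os.path.exists(path) else "missing")
```

Given the explanation above, every candidate is expected to be reported as missing in a Vivante-based container, since no `vivante_dri.so` is shipped.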

I don’t have any further suggestions since this seems specific to how the wx framework operates.

Best Regards,
Jeremias

Sudharshan · December 6, 2024, 1:24am

I see. But then how can I make hardware accelerated GUI applications?
I’m trying with torizon/qt6-wayland-examples-vivante:3 container.
So I have weston + qt-vivante containers, and I ran the openglcube and other example apps mentioned here. Checking with the gputop tool (I created the Qt container with --privileged), I don’t see any application attached to the GPU.

Clients attached to GPU | 0 / 6  (sample_mode: TIME - 1.0 secs)
Galcore version:6.4.3.398061, gpuperfcnt:1b1261761bbe, 1.4
3D:GC7000,Rev:6204 Core: 1000 MHz, Shader: 1000 MHz
3D:GC8000,Rev:8002 Core: 1000 MHz, Shader: 1000 MHz 2D:GC520,Rev:5341
3D Cores:2,2D Cores:1,VG Cores:0

IMX8_DDR0: axid-read:10.15,axid-write:2.19
IMX8_DDR1:

     PID   RES(kB)   CONT(kB)   VIRT(kB)  Non-PGD(kB)  Total(kB)              CMD
      31     28118          0          0            0      28118             cube
     208     18490          0          0            0      18490             cube

TOT:         46608          0          0            0      46608

So,

1) Is the vivante GPU really being utilized with the sample openGL apps that toradex is providing? If yes, then how can I develop similar applications? I tried building a simple QT6 app with openGL ES support on the qt container itself but ran into build issue because of the qt6 apt package not supporting openGL. I probably have to build Qt from source. I also tried to use the OpenGL libs directly, but no luck. Which is the most preferred framework for using graphics acceleration in torizon stack?
2) In continuation of point 1, I ultimately want to showcase the image processing capability of the iMX8MP by leveraging its GPU, because I see almost 80% CPU utilization when running OpenCV Haar cascade face detection on an RTSP stream, using a modified version of the example given by Toradex here. This was with a native OpenCV application. When I integrate this with wxWidgets it is basically not usable because of the very high CPU usage of my application.
3) I also tried building the tflite-rtsp container to check, but I’m running into multiple build errors. The sample was updated 2 years ago. Can you confirm that the bookworm version is tested?

I don’t want to go with a custom OS build, as I’m not familiar with that and my timeline doesn’t allow it.
How can I use the Toradex Torizon stack (dev environment, SDK, libraries, packages) to create an application that fully utilizes the hardware on the iMX8MP? Since my main application uses wxWidgets, that’s my first choice of framework. I thought of using the wxCanvas library, which supports OpenGL, in my framework, and probably OpenCV with OpenCL/Vulkan/GStreamer, so that my application doesn’t run only on the CPU, but no luck so far.

Best regards
Samarth

jeremias.tx · December 6, 2024, 8:38pm

Is the vivante GPU really being utilized with the sample openGL apps that toradex is providing?

I’m confused; what you shared clearly shows the processes for the cube example in gputop. I can see the same on my side, and top shows these processes barely use any CPU. It seems fairly clear they are utilizing the GPU then.
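
For anyone who wants to check this programmatically rather than by eye, the process table from gputop can be parsed with a few lines of Python. This is just a sketch against the exact table layout pasted in this thread; the function name is hypothetical and the column meanings are taken at face value from the headers.

```python
# Process table copied from the gputop output in this thread.
GPUTOP_TABLE = """\
     PID   RES(kB)   CONT(kB)   VIRT(kB)  Non-PGD(kB)  Total(kB)              CMD
      31     28118          0          0            0      28118             cube
     208     18490          0          0            0      18490             cube
"""


def parse_gputop_processes(table):
    """Return one dict per process row of a gputop process table."""
    lines = [line.split() for line in table.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    processes = []
    for fields in rows:
        row = dict(zip(header, fields))
        processes.append({
            "pid": int(row["PID"]),
            "cmd": row["CMD"],
            "total_kb": int(row["Total(kB)"]),
        })
    return processes


processes = parse_gputop_processes(GPUTOP_TABLE)
total_kb = sum(p["total_kb"] for p in processes)
print(f"{len(processes)} processes, {total_kb} kB in the Total(kB) column")
```

The sum matches the `TOT:` row in the pasted output, which is a quick sanity check that the rows were read correctly.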

I tried building a simple QT6 app with openGL ES support on the qt container itself but ran into build issue because of the qt6 apt package not supporting openGL.

Not sure what you did but basic Qt6 apps should be using the GPU as shown.

Which is the most preferred framework for using graphics acceleration in torizon stack?

We have various graphical framework templates in our VSCode extension: GitHub - torizon/vscode-torizon-templates: VS Code Torizon Integrated Development Environment Templates

Any of these should work. Any other graphical framework we can’t guarantee since it’s not feasible for us to test/try every framework out there.

The sample was updated 2 years ago. Can you confirm that the bookworm version is tested?

Are you sure you’re using the bookworm version? From what I see this was last updated 6 months ago. These samples are not formally tested and are provided as is. Any issues are examined as reported.

Best Regards,
Jeremias

Sudharshan · December 9, 2024, 7:20am

Okay, I was referring to the “Clients attached to GPU” line in the gputop output. By default any QT application will use GPU? I believe it has to be built with GL support, and the libs for that cannot be found in the Toradex Debian repos.

I came across a few Toradex pages about using GStreamer/OpenCV in this context.

From this page:
OpenCV provides modules for Machine Learning and Neural Networks. These modules target Arm Cortex-A cores and utilize Arm Neon to process models. However, until this moment, these modules are not accelerated by GPU. You can use other inference engines in conjunction with OpenCV to increase AI processing performance. See other AI solutions that Toradex offers through its network of partners.

Which other inference engines can be used? I don’t want to approach toradex partners at the moment.

And basically I cannot use GPU acceleration as of now? I was going through a few articles on integrating OpenCV with OpenGL, but from what I’ve read so far, Haar cascade models will use only the CPU. Has anyone from toradex tried using opencv with openGL?

To use GStreamer, I cloned the bookworm-new branch of toradex/torizon-samples on GitHub.
I’m using the iMX8MP, so I kept the MACHINE argument unchanged in the Dockerfile.

When building the Dockerfile I get the following errors:

[+] Building 27.9s (13/29)                                                                                                         docker:default
 => [internal] load build definition from Dockerfile                                                                                         0.0s
 => => transferring dockerfile: 8.19kB                                                                                                       0.0s
 => WARN: StageNameCasing: Stage name 'Base' should be lowercase (line 22)                                                                   0.0s
 => [internal] load metadata for docker.io/torizon/wayland-base-vivante:3                                                                    2.9s
 => [internal] load .dockerignore                                                                                                            0.0s
 => => transferring context: 56B                                                                                                             0.0s
 => [internal] load build context                                                                                                            0.0s
 => => transferring context: 7.24kB                                                                                                          0.0s
 => CACHED [base  1/17] FROM docker.io/torizon/wayland-base-vivante:3@sha256:4ec6af586a7f2b955f116cbdcc454f253732007736017a28e9d67b4a4e0a7e  0.0s
 => CACHED [base  2/17] RUN wget -O- https://feeds.toradex.com/debian/toradex-debian-repo.gpg     | gpg --dearmor     | tee /usr/share/keyr  0.0s
 => CACHED [base  3/17] RUN echo "\nTypes: deb\nURIs: https://feeds.toradex.com/debian/snapshots/20230328T143626Z\nSuites: testing\nCompone  0.0s
 => CACHED [base  4/17] RUN echo "\nTypes: deb\nURIs: http://snapshot.debian.org/archive/debian/20220209T031931Z\nSuites: bookworm\nCompone  0.0s
 => CACHED [base  5/17] RUN cat /etc/apt/sources.list.d/toradex.sources                                                                      0.0s
 => CACHED [base  6/17] RUN echo "Acquire::Check-Valid-Until false;"     | tee -a /etc/apt/apt.conf.d/10-nocheckvalid                        0.0s
 => CACHED [base  7/17] COPY version-pinned-packages /etc/apt/preferences.d/version-pinned-packages                                          0.0s
 => CACHED [base  8/17] RUN apt-get update && apt-get install -y     imx-gpu-viv-wayland-dev     libg2d-viv-dev     linux-imx-headers-dev    0.0s
 => ERROR [base  9/17] RUN apt-get update && apt-get install -y     libdrm-dev     libgstreamer1.0-dev     libgstreamer-plugins-base1.0-de  24.8s
------
 > [base  9/17] RUN apt-get update && apt-get install -y     libdrm-dev     libgstreamer1.0-dev     libgstreamer-plugins-base1.0-dev     libgstreamer-plugins-bad1.0-dev     && apt-get clean && apt-get autoremove && rm -rf /var/lib/apt/lists/*:
1.481 Get:2 http://deb.debian.org/debian bookworm InRelease [151 kB]
1.501 Get:1 http://snapshot.debian.org/archive/debian/20220209T031931Z bookworm InRelease [129 kB]
1.591 Get:3 http://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
1.591 Get:4 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
2.057 Get:5 https://feeds.toradex.com/debian/snapshots/20230328T143626Z testing InRelease [13.0 kB]
2.592 Get:6 https://feeds1.toradex.com/debian/snapshots/20240227T131454Z testing InRelease [15.1 kB]
2.699 Get:7 http://deb.debian.org/debian bookworm/main arm64 Packages [8688 kB]
3.708 Get:8 http://deb.debian.org/debian bookworm-updates/main arm64 Packages [8844 B]
4.664 Get:9 http://deb.debian.org/debian-security bookworm-security/main arm64 Packages [211 kB]
5.686 Get:10 http://snapshot.debian.org/archive/debian/20220209T031931Z bookworm/main arm64 Packages [8180 kB]
6.790 Get:11 https://feeds.toradex.com/debian/snapshots/20230328T143626Z testing/non-free arm64 Packages [5276 B]
6.909 Get:12 https://feeds.toradex.com/debian/snapshots/20230328T143626Z testing/main arm64 Packages [63.3 kB]
8.841 Get:13 https://feeds1.toradex.com/debian/snapshots/20240227T131454Z testing/non-free arm64 Packages [5573 B]
9.106 Get:14 https://feeds1.toradex.com/debian/snapshots/20240227T131454Z testing/main arm64 Packages [59.9 kB]
9.978 Fetched 17.6 MB in 9s (1967 kB/s)
9.978 Reading package lists...
16.42 Reading package lists...
22.74 Building dependency tree...
23.57 Reading state information...
24.15 Some packages could not be installed. This may mean that you have
24.15 requested an impossible situation or if you are using the unstable
24.15 distribution that some required packages have not yet been created
24.15 or been moved out of Incoming.
24.15 The following information may help to resolve the situation:
24.15
24.15 The following packages have unmet dependencies:
24.68  gir1.2-gst-plugins-bad-1.0 : Depends: gir1.2-gst-plugins-base-1.0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
24.68  libgstreamer-opencv1.0-0 : Depends: libgstreamer-plugins-base1.0-0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
24.68  libgstreamer-plugins-bad1.0-0 : Depends: libgstreamer-gl1.0-0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
24.68                                  Depends: libgstreamer-plugins-base1.0-0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
24.69 E: Unable to correct problems, you have held broken packages.
------
Dockerfile:68
--------------------
  67 |
  68 | >>> RUN apt-get update && apt-get install -y \
  69 | >>>     libdrm-dev \
  70 | >>>     libgstreamer1.0-dev \
  71 | >>>     libgstreamer-plugins-base1.0-dev \
  72 | >>>     libgstreamer-plugins-bad1.0-dev \
  73 | >>>     && apt-get clean && apt-get autoremove && rm -rf /var/lib/apt/lists/*
  74 |
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update && apt-get install -y     libdrm-dev     libgstreamer1.0-dev     libgstreamer-plugins-base1.0-dev     libgstreamer-plugins-bad1.0-dev     && apt-get clean && apt-get autoremove && rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 100

This sample was updated 19 days ago on the repo. I was expecting this to run without any issues. Please correct me if I’m missing out something.
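
The relevant part of the log above is the list of unmet dependencies: several packages require version 1.22.0 or newer of the GStreamer base libraries, while the candidate selected for installation is 1.20.0-2+toradex1. As a small illustration, a sketch like the following pulls that information out of the apt output. The regular expression only assumes the message format shown in the log, and the function name is made up.

```python
import re

# Lines copied from the build log above (BuildKit timestamps removed).
APT_OUTPUT = """\
 gir1.2-gst-plugins-bad-1.0 : Depends: gir1.2-gst-plugins-base-1.0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
 libgstreamer-opencv1.0-0 : Depends: libgstreamer-plugins-base1.0-0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
 libgstreamer-plugins-bad1.0-0 : Depends: libgstreamer-gl1.0-0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
                                 Depends: libgstreamer-plugins-base1.0-0 (>= 1.22.0) but 1.20.0-2+toradex1 is to be installed
"""

DEPENDS = re.compile(
    r"Depends: (?P<dep>\S+) \((?P<op>[<>=]+) (?P<wanted>\S+)\) "
    r"but (?P<candidate>\S+) is to be installed"
)


def unmet_dependencies(text):
    """Map each dependency name to (required constraint, candidate version)."""
    unmet = {}
    for match in DEPENDS.finditer(text):
        constraint = f"{match.group('op')} {match.group('wanted')}"
        unmet[match.group("dep")] = (constraint, match.group("candidate"))
    return unmet


for dep, (constraint, candidate) in sorted(unmet_dependencies(APT_OUTPUT).items()):
    print(f"{dep}: needs {constraint}, candidate is {candidate}")
```

All three dependencies point at the same mismatch between the two package sources, which suggests a version alignment problem in the base image or feeds rather than a mistake in the build command itself.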

Best Regards,
Samarth

jeremias.tx · December 11, 2024, 10:51pm

By default any QT application will use GPU?

If you use the Qt packages that we package into our container images then these should be built to utilize the GPU.

Which other inference engines can be used? I don’t want to approach toradex partners at the moment.

These are all of our articles on machine learning: Machine Learning Documentation Overview | Toradex Developer Center

Of course not all of these are necessarily hardware accelerated and some are just more demos. Anything more advanced than what we have documented is not really the expertise of Toradex.

And basically I cannot use GPU acceleration as of now?

Last I checked our tensorflow-lite example should be capable of using the GPU/NPU.

Has anyone from toradex tried using opencv with openGL?

I checked around but the only thing we have is what we have documented on our website.

This sample was updated 19 days ago on the repo. I was expecting this to run without any issues. Please correct me if I’m missing out something.

Could you try modifying the Dockerfile to have ARG BASE_VERSION=3.3? This seems similar to another known issue.

Best Regards,
Jeremias

Sudharshan · January 14, 2025, 8:11am

Hi @jeremias.tx,

Firstly I tried building OpenCV with openGL/openCL headers provided by NXP from the imx-wayland-dev package, but CMake refuses to accept the OpenCL headers path provided by NXP. I’m able to use OpenCL in a standalone terminal-based program, but I’m still trying to get it working with OpenCV.

Secondly I tried modifying the tflite/rtsp source to accept an RTSP feed as a source rather than a USB camera. I’m able to read the RTSP feed as a source in standalone code, but when I try to read the same feed in the tflite/rtsp Python file, GStreamer fails to start the stream. Also, the tflite/rtsp image takes around an hour and a half to build after a source modification. How can I speed up this process? I don’t want to build the inference model every time, as I don’t need any change in that. Just the src object_detection.py file needs modification.

Regards
Samarth

jeremias.tx · January 14, 2025, 11:22pm

Firstly I tried building OpenCV with openGL/openCL headers

I don’t think you need to rebuild this yourself. We should already provide an opencv package that has openCL support built in. I built a container containing opencv using the following example as a template: torizon-samples/opencv/Dockerfile at bookworm · toradex/torizon-samples · GitHub

Then I ran the container and used Python to confirm the existence of OpenCL support:

docker run --privileged -v /dev:/dev --rm -it test /bin/bash
root@6916ea895137:/home/torizon# python3
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> print(cv2.ocl.haveOpenCL())
True
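
As a follow-up to the check above, the same `cv2.ocl` calls can be wrapped in a small helper that also tries to switch OpenCL on. This is a minimal sketch: `opencl_status` is a hypothetical name, and it takes the `cv2` module as an argument so it can be exercised without OpenCV installed. Note that `haveOpenCL()` returning True only means OpenCL is available to OpenCV; as far as I know, only operations called on `cv2.UMat` objects are dispatched through it.

```python
def opencl_status(cv2_module):
    """Report whether OpenCV sees OpenCL and try to switch it on."""
    available = cv2_module.ocl.haveOpenCL()
    if available:
        cv2_module.ocl.setUseOpenCL(True)
    in_use = available and cv2_module.ocl.useOpenCL()
    return {"available": available, "in_use": in_use}


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("OpenCV Python bindings are not installed here")
    else:
        print(opencl_status(cv2))
```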

Also, the tflite/rtsp image takes around an hour and a half to build after a source modification. How can I speed up this process? I don’t want to build the inference model every time, as I don’t need any change in that. Just the src object_detection.py file needs modification.

The Dockerfile will rebuild every line starting from the modification onwards. So if you move the python file to the end of the Dockerfile it shouldn’t need to run the prior steps. If your docker build isn’t caching the steps properly it might be some configuration issue that is preventing the caching mechanism for the build.
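
The caching behaviour described above can be illustrated with a toy model. This is not how Docker is implemented, just a sketch of the rule that a step is reused only if it and every step before it are unchanged; the step lists are hypothetical and "v1"/"v2" stand for the content of the copied file.

```python
def first_rebuilt_step(old_steps, new_steps):
    """Index of the first step that must be rebuilt, or None if all are cached.

    Toy model: a step is reused only if it and every step before it are
    unchanged, which is why a change early in a Dockerfile is expensive.
    """
    for index, (old, new) in enumerate(zip(old_steps, new_steps)):
        if old != new:
            return index
    if len(new_steps) > len(old_steps):
        return len(old_steps)
    return None


# Hypothetical step lists for the two Dockerfile layouts.
copy_early_v1 = ["FROM base", "COPY object_detection.py v1", "RUN build-model"]
copy_early_v2 = ["FROM base", "COPY object_detection.py v2", "RUN build-model"]
copy_late_v1 = ["FROM base", "RUN build-model", "COPY object_detection.py v1"]
copy_late_v2 = ["FROM base", "RUN build-model", "COPY object_detection.py v2"]

# With the COPY placed early, the slow model build step is rerun as well.
print(first_rebuilt_step(copy_early_v1, copy_early_v2))
# With the COPY placed last, only that final step is rerun.
print(first_rebuilt_step(copy_late_v1, copy_late_v2))
```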

Best Regards,
Jeremias

Sudharshan · January 28, 2025, 12:44am

Hi @jeremias.tx ,

I’ve used the vpu-generic sample available under bookworm-new branch to run the gstreamer pipeline for showing rtsp video feeds on my displays connected on iMX8MP.

gst-launch-1.0 rtspsrc location=rtsp://user:pwd@myIP/axis-media/media.amp?camera=2 latency=0 ! rtph264depay ! h264parse ! vpudec ! waylandsink

I tried running 5 simultaneous streams this way, and since VPU decoding was working, CPU usage barely went over 15-20% and the feeds were shown without any issue.

Now I tried to open a single stream with OpenCV using the GStreamer pipeline mentioned above, and my CPU usage goes over 40%.

import cv2
gst_pipeline = (
    "rtspsrc location=rtsp://usr:pwd@my_ip:554/axis-media/media.amp?camera=2 latency=0 ! "
    "rtph264depay ! h264parse ! vpudec ! "
    "videoconvert ! appsink"
)
cap = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab a frame. Exiting...")
        break
    cv2.imshow('VPU Decoded Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        print("Exiting on user request...")
        break
cap.release()
cv2.destroyAllWindows()

Is that expected, or am I missing something? I just want to show multiple feeds using OpenCV while keeping CPU usage under control. Is that possible, or will OpenCV process frames only on the CPU, leading to high CPU usage? What would be a suitable high-level framework to properly utilize hardware acceleration for showing video feeds on the iMX8MP?
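
One difference from the `waylandsink` pipeline is that here the frames pass through `videoconvert`, a software element, and are then handed to the application through `appsink`, which plausibly accounts for part of the extra CPU load. The sketch below builds a variant of the pipeline string that may be worth trying. The explicit BGR caps and the `appsink` properties `drop`, `max-buffers` and `sync` are assumptions added here, not something from the thread, and this is untested on the iMX8MP, so no claim is made about the resulting CPU usage.

```python
def build_capture_pipeline(rtsp_url, latency_ms=0):
    """Build a GStreamer pipeline string for cv2.VideoCapture.

    Assumptions (not from the thread): BGR caps before appsink, and the
    appsink properties drop, max-buffers and sync.
    """
    elements = [
        f"rtspsrc location={rtsp_url} latency={latency_ms}",
        "rtph264depay",
        "h264parse",
        "vpudec",
        "videoconvert",
        "video/x-raw,format=BGR",
        "appsink drop=true max-buffers=1 sync=false",
    ]
    return " ! ".join(elements)


# Placeholder URL in the same form as the one used in the post.
pipeline = build_capture_pipeline(
    "rtsp://usr:pwd@my_ip:554/axis-media/media.amp?camera=2"
)
print(pipeline)
```

The resulting string would be passed to `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)` exactly as in the script above.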

Regards
Samarth

jeremias.tx · January 28, 2025, 6:05pm

On Torizon OS 6 containers we are making use of the OpenCV fork and libraries from NXP. It’s hard to say what the “expectation” is since we don’t have any official benchmarks. Also, CPU usage is very use-case dependent and we can’t reasonably test every use case, so it’s difficult for us to give you an answer on what is “expected”.

Also, as I said, this OpenCV comes from an NXP fork, so it’s hard for us to comment on what we are receiving from them.

Best Regards,
Jeremias