How to check if the Vivante GPU is being used in a container

Hi,

I am using the iMX8MP module on a custom carrier board. Our Torizon OS image is based on your 6.7 release.
As far as I know, the Vivante GPU is enabled in both the SDK and debug containers.
However, when running the application I get only 16 fps, which makes the rendering look slow. With a Yocto image based on your BSP I get around 60 fps. Is this normal?

I tried to investigate the GPU performance using gputop, following the thread "GPU performance in Chromium container on Torizon OS iMX8M-Plus". However, I get the following errors:

Failed to mount debugfs. Make sure you are root!
Failed to mount debugfs!
Failed to init debugfs
Segmentation fault (core dumped)

Is there a way to reliably check that the GPU is active?

Regards,
direttore

Hi, I’m the Toradex AI Assistant. I’m in early beta, and I would appreciate your feedback.
The Toradex Support team will follow up with you shortly!

Hello,

To verify if the Vivante GPU is being utilized within your container on the iMX8MP module with Torizon OS, you can follow these steps:

  1. Check for Vivante GPU Drivers in the Container: Ensure that the container has access to the Vivante GPU drivers. This can be done by checking the presence of /dev/dri or /dev/galcore inside the container. You can use the command ls /dev/dri or ls /dev/galcore within the container to verify their presence.

  2. Use Vivante GPU Tools: Torizon provides containers with tools optimized for the Vivante GPU. You can use the viv_samples (if available in your SDK or debug containers) to run GPU-specific tests and benchmarks. Running these samples will not only confirm the GPU’s activity but also give you an insight into its performance.

  3. Leverage Environment Variables: If your application relies on environment variables to enable GPU acceleration, make sure they are set inside the container. Which variables apply depends on the application and its graphics stack; VIVANTE_GPU=1 is meant only as an illustration of the form such a setting might take.

  4. Debugging with gputop: Regarding the issues with gputop, the errors you encountered suggest a lack of necessary permissions or missing configurations for debugfs. Ensure that your container is run with the necessary privileges and has access to the host’s system resources. For gputop to function correctly, it needs access to debugfs mounted on /sys/kernel/debug. You can mount this directory into your container by adding -v /sys/kernel/debug:/sys/kernel/debug to your docker run command. Additionally, ensure that your container runs in privileged mode or with the necessary capabilities added to access hardware performance counters.

  5. Comparing Performance with Yocto Images: If you observe a significant performance difference between the Torizon and Yocto-based images, ensure that both environments are configured similarly, especially regarding GPU drivers and acceleration libraries. Differences in driver versions, kernel configurations, or missing GPU acceleration libraries in the container could lead to performance discrepancies.
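As a rough illustration of step 1, the device-node check could be sketched as a small shell function run inside the container. The function name check_gpu_nodes is made up for this example, and the node paths are simply the ones mentioned above; which of them actually exist depends on the image and the driver stack.

```shell
# check_gpu_nodes PATH...: report which of the given device nodes are visible.
# Hypothetical helper for illustration; it only tests for existence and says
# nothing about whether the application really renders through the GPU.
check_gpu_nodes() {
    for node in "$@"; do
        if [ -e "$node" ]; then
            echo "present: $node"
        else
            echo "missing: $node"
        fi
    done
}

# Paths taken from step 1 above.
check_gpu_nodes /dev/galcore /dev/dri
```

A "missing" line usually points at the container's device mappings rather than at the application, so the compose file is the first place to look.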

If after these steps, you still face issues or cannot confirm GPU utilization, it might be necessary to review the container’s configuration and ensure it’s correctly set up for GPU acceleration, including verifying the container’s base image and any Torizon-specific settings or optimizations for the Vivante GPU.

Best regards,
Toradex AI Assistant

Please, also check if these links can help you solve your problem:

How to use OpenCL 1.2 in iMX8 on Torizon | Toradex Developer Center.

After adding:

cap_add:
  - SYS_ADMIN
...
volumes:
  - type: bind
    source: /sys/kernel/debug
    target: /sys/kernel/debug

to the docker-compose.yml, I was able to start gputop and got the following information:

Clients attached to GPU | 0 / 6  (sample_mode: TIME - 1.0 secs)
Galcore version:6.4.3.398061, gpuperfcnt:1b1261761bbe, 1.4
3D:GC7000,Rev:6204 Core: 1000 MHz, Shader: 1000 MHz
3D:GC8000,Rev:8002 Core: 1000 MHz, Shader: 1000 MHz 2D:GC520,Rev:5341
3D Cores:2,2D Cores:1,VG Cores:0

IMX8_DDR0: axid-read:437.65,axid-write:276.59
IMX8_DDR1:

     PID   RES(kB)   CONT(kB)   VIRT(kB)  Non-PGD(kB)  Total(kB)              CMD
      39     53930          0          0            0      53930              app

TOT:         53930          0          0            0      53930
TOT_CON:         -          -          -            -     208213

What does this mean? Is the GPU working at full capacity?

Hello @direttore,

Just to confirm, in which application do you see this drop in framerate?
If you are using Chromium, I would recommend that you test it with GPU compositing disabled (--disable-gpu-compositing). We have seen performance improvements by changing this configuration.

In general, the performance in Torizon OS should be very similar to the Reference Multimedia Image, as the container has no significant overhead for most use cases.


Regarding being able to tell if the GPU is being used, there are a few different approaches that can be taken, such as:

  • Check the CPU usage: if it is very high, some sort of software rendering may be active.
  • Use gputop, as you have set up, to see the GPU usage.

Please note that gputop has multiple pages and only one of them has the actual occupancy of the GPU. You can change the current page using the arrow keys.
I would also recommend that you test the system at idle first, to see what that looks like in gputop. The reason for this is that some earlier versions of gputop showed the GPU occupancy inverted, so when the GPU was idle it would show 100% load and when it was fully used it would show 0% load.

Best Regards,
Bruno

Hi Bruno,

I am running a Qt5 application in a qt5-wayland-vivante based container. The CPU load is low and balanced.
Is there an option similar to the one you mentioned for Chromium (--disable-gpu-compositing) that I can use here?

Hello @direttore,

"If using a Yocto image based on your BSP I get around 60 fps."

Are you running your application in our BSP or do you have this FPS on the default Qt application from our multimedia image?

Also, can you share your docker-compose file so I can have a look?

Hi Rudhi,

In the BSP based image I am running our own Qt application.

Here is the docker-compose file:
docker-compose.yml (2.4 KB)

Hi @direttore
maybe this blog post is useful to understand what happens.
If I understood what you wrote:

  • you have a Qt application
  • if you run it on your Yocto build using Toradex BSP you get 60 fps
  • if you run the same application inside a container on Torizon OS you have 16 fps

Is this your situation?

Hi @vix

Correct. Thank you for the link. I have run glmark2 inside the application container with the following results:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Vivante Corporation
    GL_RENDERER:    Vivante GC7000UL
    GL_VERSION:     OpenGL ES 3.1 V6.4.3.p4.398061
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 192 FrameTime: 5.208 ms
[build] use-vbo=true: FPS: 238 FrameTime: 4.204 ms
[texture] texture-filter=nearest: FPS: 242 FrameTime: 4.148 ms
[texture] texture-filter=linear: FPS: 235 FrameTime: 4.263 ms
[texture] texture-filter=mipmap: FPS: 252 FrameTime: 3.980 ms
[shading] shading=gouraud: FPS: 280 FrameTime: 3.583 ms
[shading] shading=blinn-phong-inf: FPS: 268 FrameTime: 3.739 ms
[shading] shading=phong: FPS: 275 FrameTime: 3.647 ms
[shading] shading=cel: FPS: 266 FrameTime: 3.765 ms
[bump] bump-render=high-poly: FPS: 296 FrameTime: 3.379 ms
[bump] bump-render=normals: FPS: 245 FrameTime: 4.091 ms
[bump] bump-render=height: FPS: 286 FrameTime: 3.499 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 186 FrameTime: 5.379 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 98 FrameTime: 10.282 ms
[pulsar] light=false:quads=5:texture=false: FPS: 255 FrameTime: 3.932 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 102 FrameTime: 9.826 ms
[desktop] effect=shadow:windows=4: FPS: 183 FrameTime: 5.472 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 54 FrameTime: 18.672 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 55 FrameTime: 18.308 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 84 FrameTime: 12.015 ms
[ideas] speed=duration: FPS: 112 FrameTime: 8.963 ms
[jellyfish] <default>: FPS: 128 FrameTime: 7.828 ms
[terrain] <default>: FPS: 17 FrameTime: 61.359 ms
[shadow] <default>: FPS: 235 FrameTime: 4.258 ms
[refract] <default>: FPS: 44 FrameTime: 22.810 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 282 FrameTime: 3.554 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 142 FrameTime: 7.079 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 292 FrameTime: 3.432 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 228 FrameTime: 4.387 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 155 FrameTime: 6.484 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 222 FrameTime: 4.516 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 222 FrameTime: 4.523 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 156 FrameTime: 6.434 ms
=======================================================
                                  glmark2 Score: 190
=======================================================

Not sure if this is a good score though.

I’m not an expert, but I see good FPS results (greater than 60 is good for me, but the target depends on the application), except for some items:

...
[terrain] <default>: FPS: 17 FrameTime: 61.359 ms
...
[refract] <default>: FPS: 44 FrameTime: 22.810 ms
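To pull the slow scenes out of a saved glmark2 log without reading it by eye, a small filter along these lines might help. This is only a sketch: the function name slow_scenes and the threshold argument are made up for the example, and it assumes the "FPS: N FrameTime: ..." line layout shown in the output above.

```shell
# slow_scenes THRESHOLD: read glmark2 output on stdin and print every
# result line whose FPS value is below THRESHOLD.
# Hypothetical helper; assumes result lines contain "FPS: <integer>".
slow_scenes() {
    awk -v min="$1" '
        / FPS: [0-9]+ / {
            # Find the field after "FPS:" and compare it numerically.
            for (i = 1; i < NF; i++) {
                if ($i == "FPS:" && $(i + 1) + 0 < min) {
                    print
                    break
                }
            }
        }
    '
}

# Usage (glmark2.log is a placeholder file name):
#   slow_scenes 60 < glmark2.log
```

With a threshold of 60, the log above would also list the two [buffer] scenes at 54 and 55 FPS, in addition to [terrain] and [refract].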

Maybe these benchmarks use some specific GPU features.
What I have learned in the past few months is that some of these features were buggy in the NXP drivers (the latest release of linux-imx should contain the fixes; this is what NXP tech support told me).
But the problem is that NXP follows every release of Yocto (LTS or not), so you would have to update Yocto too.
Maybe your Yocto build is nanbield or scarthgap, and so it includes a newer NXP linux-imx?
Torizon OS follows Yocto LTS releases only (I agree with this), so Torizon OS 6 is on kirkstone. For this reason it is stuck on an older linux-imx (the latest release by NXP for kirkstone).

If you’re curious you can try the benchmarks at https://webglsamples.org/ with both Yocto and Torizon OS. I would expect more or less the same FPS for some of them, and better Yocto performance for others (fishtank, lots-o-images).
Can you confirm this?

After some more testing and editing the docker-compose file I found the cause in ENABLE_VNC=1.
After removing this line the fps went from 16 to 63.
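For anyone hitting the same slowdown, a quick way to see whether a compose file still sets this variable could look like the sketch below. The function name find_vnc_setting is made up for the example, and it assumes the variable is written literally in the file, as it was in my case.

```shell
# find_vnc_setting FILE: print the lines (with line numbers) that mention
# ENABLE_VNC, or a short note if there are none.
# Hypothetical helper for illustration only.
find_vnc_setting() {
    if ! grep -n 'ENABLE_VNC' "$1"; then
        echo "ENABLE_VNC not found in $1"
    fi
}

# Usage (path is a placeholder):
#   find_vnc_setting docker-compose.yml
```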

Thanks everyone for your support!