How to use GPU Acceleration with AM62

Hello,

We are trying to run our application smoothly on a Verdin AM62 with a Mallow Carrier Board, but so far performance has been poor. To improve it, we switched from the chromium-am62 image to the cog-am62 image, and we are trying to get GPU acceleration working. However, we see the following errors in the cog logs:

libEGL warning: egl: failed to create dri2 screen
MESA: error: ZINK: vkCreateInstance failed (VK_ERROR_INCOMPATIBLE_DRIVER)

Looking further, Weston also seems to have trouble creating a correct EGL context. Strangely, with the torizon/weston-am62:stable-rc image (or version 4), it fails at some point and logs:

failed to query DRM device from EGL
...
GL version: OpenGL ES 3.2 Mesa 24.0.1
GLSL version: OpenGL ES GLSL ES 3.20
GL vendor: Mesa
GL renderer: llvmpipe (LLVM 15.0.6, 128 bits)
...
libEGL warning: egl: failed to create dri2 screen
Refusing to try glamor on llvmpipe

However, with a specific version of weston (torizon/weston-am62@sha256:ded14c38171722dd327ed8cea44310ad793e9e263451729701a99540fae27880), it seems to work better, as the renderer is now:

GL version: OpenGL ES 3.1 build 23.3@6512818
GLSL version: OpenGL ES GLSL ES 3.10 build 23.3@6512818
GL vendor: Imagination Technologies
GL renderer: PowerVR A-Series AXE-1-16M
GL extensions: GL_APPLE_texture_format_BGRA8888

So, what could explain the difference in renderer, and therefore in GL version, between those images?

And, assuming I stay with this specific version of weston, what should be done to make cog use the GPU and get better performance? (The fishtank demo currently runs at 2 fps.)

Here is the docker-compose file I am using:

services:
  weston:
    image: torizon/weston-am62@sha256:ded14c38171722dd327ed8cea44310ad793e9e263451729701a99540fae27880
    cap_add:
      - CAP_SYS_TTY_CONFIG
    command:
      - --tty=/dev/tty7
    device_cgroup_rules:
      - c 4:* rmw
      - c 13:* rmw
      - c 226:* rmw
    network_mode: host
    restart: always
    platform: linux/arm64
    volumes:
      - /tmp:/tmp
      - /dev:/dev
      - /run/udev:/run/udev
      - /var/sota/storage/docker-compose/weston:/etc/xdg/weston/
  cog:
    image: torizon/cog-am62:stable-rc
    command:
      - https://webglsamples.org/fishtank/fishtank.html
    depends_on:
      - weston
    device_cgroup_rules:
      - c 226:* rmw
    platform: linux/arm64
    restart: always
    security_opt:
      - seccomp:unconfined
    shm_size: 256mb
    environment:
      - COG_PLATFORM_WL_VIEW_FULLSCREEN=1
    volumes:
      - /tmp:/tmp
      - /var/run/dbus:/var/run/dbus
      - /dev/dri:/dev/dri

And the current weston.ini:

[core]
idle-time=0
require-input=false
xwayland=false
modules=screen-share.so

[shell]
background-image=
background-color=0x00000000
panel-position=none
locking=false
allow-zap=false
num-workspaces=1

[keyboard]
vt-switching=false

[output]
name=LVDS-1
transform=rotate-270

#For multi-display support and other configurations, please read the article: https://developer.toradex.com/knowledge-base/working-with-weston-on-torizoncore

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.


Hello,

Thank you for your detailed description and configurations. Let’s break down the situation and steps to achieve hardware-accelerated performance on the Verdin AM62 with Torizon:

Renderer Difference Explained

  • llvmpipe Renderer: When you see GL renderer: llvmpipe, it means Mesa’s software rasterizer is being used—no GPU acceleration, resulting in poor performance.
  • PowerVR Renderer: When you see GL renderer: PowerVR A-Series AXE-1-16M, hardware acceleration is active. This confirms GPU drivers and configurations are correctly applied.

The difference occurs because only some Weston (and container) images initialize the GPU stack correctly. Subsequent containers (like cog) must also be properly configured to access the GPU.
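To make the llvmpipe versus PowerVR distinction easy to check, here is a minimal sketch that classifies a log by its "GL renderer:" line. The function name and the list of software renderer names are assumptions for illustration; anything that is not a known software rasterizer is simply treated as hardware.

```python
import re

# Names of Mesa software rasterizers (an assumed, non-exhaustive list).
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def classify_renderer(log_text):
    """Return 'software', 'hardware', or None based on 'GL renderer:' lines.

    The last 'GL renderer:' line in the log wins, since a compositor may
    print the block more than once.
    """
    renderers = re.findall(r"GL renderer:\s*(.+)", log_text)
    if not renderers:
        return None  # the log never reported a renderer
    last = renderers[-1].strip().lower()
    if any(name in last for name in SOFTWARE_RENDERERS):
        return "software"
    # Assumption: anything else is a real GPU driver.
    return "hardware"
```

You could feed it the text captured from the Weston container logs (for example, the output of docker logs for your Weston container) and treat a "software" result as a sign that the GPU stack did not initialize.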

Checklist for Hardware GPU Acceleration

1. Environment Variables

The Verdin AM62’s GPU uses binaries under an EULA. When starting the Weston container, ensure the following environment variable is set:

environment:
  - ACCEPT_FSL_EULA=1

This is critical for loading the GPU’s binary drivers.

2. Use Proper Container Images

Ensure you are using the latest torizon/weston-am62 and torizon/cog-am62 images. Some images (like chromium-am62) may not support full GPU acceleration or may be missing vital drivers.

3. Device Access and Permissions

Both Weston and Cog must have access to /dev/dri and appropriate udev device permissions:

device_cgroup_rules:
  - c 226:* rmw

And mount:

volumes:
  - /dev/dri:/dev/dri

(You have these lines—good!)

4. Order and Dependencies

Start Weston before launching Cog, and consider adding a short delay or health check to ensure Weston is ready before Cog starts.

5. Verify With GL Strings

Check that both Weston and Cog report the PowerVR renderer in their logs, not llvmpipe.
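The readiness check suggested in step 4 could be sketched as a small polling helper that waits for Weston's Wayland socket before the browser starts. This is only an illustration: the helper name is hypothetical, and the socket path in the usage note is an assumption about where the Weston container creates its socket under the shared /tmp volume.

```python
import os
import time


def wait_for_path(path, timeout_s=30.0, poll_s=0.5):
    """Poll until `path` exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, wait_for_path("/tmp/1000-runtime-dir/wayland-0") could run in a wrapper script before launching Cog, assuming that is the socket location in your setup. Note that a plain depends_on entry only orders container start; it does not wait for Weston to be ready.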

Suggested Changes for Your Compose File

Here’s what I recommend based on your current setup:

  • Add ACCEPT_FSL_EULA to Weston and Cog:
environment:
  - ACCEPT_FSL_EULA=1
  • Expose /dev/dri and udev to both containers
    (You already have this for Cog; add it to Weston if it is missing.)

Example for Weston Service:

weston:
  image: torizon/weston-am62@sha256:ded14c38171722dd327ed8cea44310ad793e9e263451729701a99540fae27880
  cap_add:
    - CAP_SYS_TTY_CONFIG
  command: [--tty=/dev/tty7]
  device_cgroup_rules:
    - c 4:* rmw
    - c 13:* rmw
    - c 226:* rmw
  volumes:
    - /dev:/dev
    - /tmp:/tmp
    - /run/udev:/run/udev
    - /var/sota/storage/docker-compose/weston:/etc/xdg/weston/
    - /dev/dri:/dev/dri
  network_mode: host
  environment:
    - ACCEPT_FSL_EULA=1
  restart: always
  platform: linux/arm64

Example for Cog Service:

cog:
  image: torizon/cog-am62:stable-rc
  command: [https://webglsamples.org/fishtank/fishtank.html]
  depends_on:
    - weston
  device_cgroup_rules:
    - c 226:* rmw
  environment:
    - ACCEPT_FSL_EULA=1
    - COG_PLATFORM_WL_VIEW_FULLSCREEN=1
  volumes:
    - /tmp:/tmp
    - /var/run/dbus:/var/run/dbus
    - /dev/dri:/dev/dri
  platform: linux/arm64
  restart: always
  security_opt:
    - seccomp:unconfined
  shm_size: 256mb

Additional Considerations

  • Stable Image Tags: If you experience issues with the latest images, using the specific SHA-based image as you did is valid, but always try the latest first.
  • Performance Testing: Once everything is set up, review the cog and weston logs for a PowerVR renderer, and test with demos like the Fishtank.

Apply these changes, then retest Cog performance. You should observe improved frame rates and hardware acceleration indicators in your logs.

Best regards,
Toradex AI Assistant.

@ToradexAI
From my understanding of your documentation, version 3 images correspond to Torizon 6 and version 4 images to Torizon 7. I tried the stable tag of both weston and cog (which seems to be a version 3 tag, and therefore Torizon 6) and it seems to work: I no longer see any EGL-related errors in either container, and my demo went from 2 fps to 9, which is an improvement.

So the conclusion seems to be that GPU acceleration works on the AM62 with the Torizon 6 weston and cog containers, but not with the latest images for Torizon 7. Is that right?

Also, I've read that:

Just a note: Using ACCEPT_FSL_EULA = "1" has no effect for the Verdin AM62, only for modules using an NXP SoC.

Thanks for your response.


Hello,

You are correct in your understanding:

  • Container images with major version 3 (e.g., torizon/weston-am62:3.x or :stable) are for Torizon OS 6.
  • Major version 4 images (e.g., torizon/weston-am62:4.x) are for Torizon OS 7.
    Reference: Platform-specific Torizon Debian Containers
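As a quick illustration of the version mapping above, the sketch below derives the Torizon OS major version from an image reference. The function name is hypothetical, and only the two mappings stated in this thread (3 to 6, 4 to 7) are encoded; named tags such as stable and digest-pinned references return None because their version cannot be read from the string.

```python
import re

# Mapping stated in this thread: container major 3 -> Torizon OS 6, 4 -> Torizon OS 7.
CONTAINER_MAJOR_TO_TORIZON_OS = {3: 6, 4: 7}


def torizon_os_for_image(image_ref):
    """Return the Torizon OS major for e.g. 'torizon/weston-am62:3.1', else None."""
    if "@" in image_ref:
        return None  # digest-pinned: the major version is not visible
    _, sep, tag = image_ref.rpartition(":")
    if not sep:
        return None  # no tag given
    match = re.match(r"(\d+)(?:[.-]|$)", tag)
    if match is None:
        return None  # named tag such as 'stable' or 'stable-rc'
    return CONTAINER_MAJOR_TO_TORIZON_OS.get(int(match.group(1)))
```

A check like this can help catch a compose file that mixes version 3 and version 4 images on the same board.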

Based on your testing and the current documentation:

  • GPU acceleration and browser performance on Verdin AM62 are currently more reliable with Torizon 6 (using version 3 containers).
  • In Torizon 7 (using version 4 containers), browser support and hardware acceleration for AM62 are still under development and may not yet be available or properly functional.
    Reference: Web Browser Kiosk Mode with TorizonCore

Regarding the environment variable:

  • You are correct: the ACCEPT_FSL_EULA=1 variable is necessary only for modules with NXP SoCs to accept the Freescale EULA and has no effect on Texas Instruments-based SoCs like the Verdin AM62.

Summary:

  • For now, stick with Torizon 6 (version 3 containers) for the Verdin AM62 if you require hardware GPU acceleration in browser-based applications.
  • Watch for future updates on Torizon 7 as hardware acceleration support for browsers is being actively developed for the AM62.

If you need help identifying specific container images that work best with your setup, let me know!

Best regards,
Toradex AI Assistant.

We still haven’t managed to observe real-time GPU usage. It seems that the GPU (PVR) drivers are neither present on the board nor activated via overlays.

Before we start customizing Weston or OS configurations, we would like to understand what level of performance can realistically be expected when running:

  • A Weston container,

  • A cog (or chromium) container,

  • A front-end container running our app (built with Vue 3 and Vite).

Is it feasible to achieve, for example, under 500 ms latency on user actions? Or is this type of setup primarily designed for native applications (C++, .NET), so that web apps will run poorly?

For reference, I’m attaching an example docker-compose file we use for testing this setup, which currently results in poor performance.

Secondly, if acceptable performance is achievable, could you please advise on the best steps to optimize towards that goal?

services:
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.test
    container_name: frontend
    restart: always
    ports:
      - 80:80
    networks:
      - custom_net
  weston:
    image: torizon/weston-am62:3
    cap_add:
      - CAP_SYS_TTY_CONFIG
    command:
      - --tty=/dev/tty7
    device_cgroup_rules:
      - c 4:* rmw
      - c 13:* rmw
      - c 226:* rmw
    network_mode: host
    restart: always
    platform: linux/arm64
    volumes:
      - /tmp:/tmp
      - /dev:/dev
      - /run/udev:/run/udev
      - /var/sota/storage/docker-compose/weston:/etc/xdg/weston/
  cog:
    image: torizon/cog-am62:3
    command:
      - http://frontend:80
    depends_on:
      - weston
    device_cgroup_rules:
      - c 226:* rmw
    platform: linux/arm64
    restart: always
    security_opt:
      - seccomp:unconfined
    shm_size: 256mb
    volumes:
      - /tmp:/tmp
      - /var/run/dbus:/var/run/dbus
      - /dev/dri:/dev/dri
    networks:
      - custom_net

networks:
  custom_net:
    driver: bridge

@ToradexAI

Thanks for your response.


Hello,

Your use case—serving a web app (Vue 3/Vite) via Weston and Cog/Chromium containers with GPU acceleration on the Verdin AM62—is feasible, but smooth performance depends on several key points.


Performance Expectations for Verdin AM62

The Verdin AM62 SoM features up to four Cortex-A53 cores at 1.4GHz and a PowerVR Rogue AXE-1-16M GPU (Verdin AM62 Product Page). The hardware is designed for interactive industrial and commercial applications—real-time web UIs are possible.

  • User action latency under 500ms is achievable for typical web UIs, provided GPU acceleration is active and the rest of the software stack is optimized.
  • For heavy or highly dynamic frontends (e.g., complex animations, WebGL), the performance could be limited by browser and GPU driver maturity on this platform.
  • Hardware-accelerated rendering is crucial; software rendering (llvmpipe) will give poor results.

Native C++/.Net apps typically perform better due to lower overhead and more direct use of hardware acceleration, but well-optimized browser apps can still provide responsive UIs.


Key Issues in Your Current Setup

  • GPU Driver Status: If the Weston or Cog logs report llvmpipe or another software renderer, GPU acceleration is not active, and poor performance is expected.
  • GPU Drivers/Weston: The container images you use must include the correct PVR (PowerVR) userspace libraries and the kernel must have the corresponding kernel module loaded. On TorizonCore 6/containers version 3, this is known to work for the AM62. On TorizonCore 7, PowerVR driver and browser integration is still in development.
  • Networking: The frontend and cog containers are using a custom Docker bridge network, while weston uses host networking. This could introduce minor network latency. For optimal performance, running all containers with network_mode: host is recommended when not using container-level port mapping.
  • Shared Memory: For web browsers with demanding UIs, you may need to further increase shm_size.

Optimization & Troubleshooting Steps

  1. Verify GPU Acceleration

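    • Inside the cog container, check for GL-related environment overrides (this lists variables only; it does not print the active renderer):
      printenv | grep -i gl
      # to see the renderer itself, run a browser demo and inspect the logs for: GL renderer: PowerVR...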
      
    • Also, examine Weston’s startup logs—look for ‘PowerVR’ or similar (not ‘llvmpipe’).
  2. Ensure PowerVR Drivers Are Active

    • PVR (PowerVR) userspace libraries must match your TorizonCore kernel and be present in the weston and cog containers.
    • On TorizonCore 6 (version 3 containers), the included torizon/weston-am62 and torizon/cog-am62 images should handle this as long as you don’t modify kernel, OS, or container.
    • Confirm no device tree overlay disables the GPU.
  3. Improve Network Performance

    • If possible, run cog and frontend with network_mode: host to eliminate Docker’s bridge overhead.
    • Example:
      network_mode: host
      
    • Then, point Cog to the correct URL (http://localhost:80 if frontend listens on all interfaces).
  4. Monitor System Resources

    • Use tools like htop, iotop on the host and docker stats for per-container resource usage.
    • Ensure you are not hitting memory or CPU bottlenecks.
  5. Container Resources

    • Increase shm_size if the browser complains about lack of shared memory. 512mb or more may help.
  6. Browser Image Choice

    • Cog (WebKit-based) is usually lighter/faster than Chromium for embedded use on lower-end ARM platforms.
    • For most performant UIs, use Cog unless you have strict requirements for Chromium-specific features.
    • Reference: Web Browser Kiosk Mode on TorizonCore
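For the resource monitoring in step 4, a minimal sketch is shown below that reads the standard MemTotal and MemAvailable fields from /proc/meminfo text and reports the fraction of memory still available. The function names are illustrative, not part of any Torizon tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_memory_fraction(text):
    """Return MemAvailable / MemTotal, or None if either field is missing."""
    info = parse_meminfo(text)
    total = info.get("MemTotal")
    available = info.get("MemAvailable")
    if not total or available is None:
        return None
    return available / total
```

On the board you could call available_memory_fraction(open("/proc/meminfo").read()) while the browser is running; a consistently low value suggests memory pressure rather than a GPU problem.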

Example: Improved Compose Snippet

services:
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.test
    container_name: frontend
    restart: always
    # no ports: mapping here; published ports do not apply with host networking
    network_mode: host # unified
  weston:
    image: torizon/weston-am62:3
    cap_add:
      - CAP_SYS_TTY_CONFIG
    command:
      - --tty=/dev/tty7
    device_cgroup_rules:
      - c 4:* rmw
      - c 13:* rmw
      - c 226:* rmw
    volumes:
      - /tmp:/tmp
      - /dev:/dev
      - /run/udev:/run/udev
      - /var/sota/storage/docker-compose/weston:/etc/xdg/weston/
    network_mode: host
    restart: always
    platform: linux/arm64
  cog:
    image: torizon/cog-am62:3
    command:
      - http://localhost:80
    depends_on:
      - weston
    device_cgroup_rules:
      - c 226:* rmw
    platform: linux/arm64
    restart: always
    security_opt:
      - seccomp:unconfined
    shm_size: 512mb
    volumes:
      - /tmp:/tmp
      - /var/run/dbus:/var/run/dbus
      - /dev/dri:/dev/dri
    network_mode: host # unified
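One thing to watch when moving services to host networking: published ports do not apply in that mode, and depending on the Docker or Compose version a ports entry is either ignored with a warning or rejected. The sketch below flags such services in an already-parsed compose file. It assumes the file has been loaded into a plain dict by a YAML parser of your choice; the function name is hypothetical.

```python
def host_network_port_conflicts(services):
    """Return sorted names of services that set both network_mode: host and ports."""
    return sorted(
        name
        for name, config in services.items()
        if config.get("network_mode") == "host" and config.get("ports")
    )
```

For a services mapping where frontend has both network_mode: host and ports, the function returns ["frontend"], which is a hint to drop the ports entry for that service.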

Conclusion

  • Achieving sub-500ms UI latency with a GPU-accelerated browser is realistic on Verdin AM62 if GPU drivers are correctly activated and web app frontend is well-optimized.
  • If you still see poor performance with correct drivers, consider profiling the frontend app for heavy JavaScript or excessive DOM updates, and try lightweight web UI demos for baseline comparison.
  • For use cases requiring the absolute lowest latency (e.g., complex real-time visualization), native apps will generally outperform browser-based setups.

Let me know if you need help with GPU driver activation or further performance profiling.

Best regards,
Toradex AI Assistant.