Verdin iMX8MP + Qt6 Performance Issues

Hi, I’m attempting to run a Qt6 application on a Verdin iMX8M Plus module with a custom-built Yocto image, but I’m running into some performance issues.

Currently, I’m only achieving roughly 18-25 FPS instead of the target minimum of 30+ FPS (60 FPS would be ideal but seems less feasible at this point), with all UI elements enabled and running.

Hardware:

  • Verdin iMX8M Plus 2GB v1.1 (on a Mallow carrier board)

  • 1920x720@60Hz display via HDMI

Software:

  • TorizonCore/TDX Wayland 7.4.0

  • Qt 6.7.3 (custom ARM64 cross-compile with Wayland support)

  • Qt Quick application with OpenGL ES rendering

  • EGLFS backend with KMS/GBM integration

Root cause analysis:

  1. Rendering performance:

    • Scene graph rendering: 3-6ms

    • GPU utilization: ~10% on GC7000UL

  2. Buffer swap

    • eglSwapBuffers() blocking for 2 vsync periods (31-35ms)

    • At 60Hz: 1 vsync = 16.67ms, so 2 vsyncs = 33.33ms

    • Total frame time: render (4ms) + swap (32ms) = 36-40ms, i.e. 25-28 FPS by this arithmetic; the measured rate is closer to 20 FPS

  3. Buffer Allocation: 3 buffers available

    • Verified via /sys/kernel/debug/dri/1/framebuffer:

      • framebuffer[48], [43], [46] allocated by QSGRenderThread

      • All have refcount > 0

    • This disproves initial theory that driver only allocates 2 buffers

  4. Swap synchronization policy

    • Despite having 3 buffers, eglSwapBuffers() waits for previous frame scanout to complete before queueing next page flip

    • Should be: Render → Queue flip immediately → Return (non-blocking)

    • Actually: Render → Wait for previous flip to complete → Queue flip → Return

    • This artificial serialization wastes 1 vsync period per frame

  5. GPU utilization:

    • Noticed that only the 3D GPU (GC7000UL) is being utilized; the 2D GPU (GC520L) is sitting idle

Current Timing Breakdown (from Qt logs):

qt.scenegraph.time.renderloop: frame rendered in 36-42ms
sync=0-1ms
render=3-6ms
swap=31-35ms
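
As a sanity check on the timing figures above, the arithmetic can be written down as a few compile-time functions. This is only a rough sketch under stated assumptions: the function names are hypothetical, the "serialized" model simply adds the observed 2-vsync block to the render time, and the "pipelined" model assumes rendering fully overlaps scanout. Neither is taken from Qt or the driver.

```cpp
#include <cstdint>

// One vsync period in microseconds for a given refresh rate (rounded to nearest).
constexpr std::int64_t vsync_period_us(std::int64_t refresh_hz) {
    return (1000000 + refresh_hz / 2) / refresh_hz;
}

// Assumed model of the observed behavior: render, then the swap call
// blocks for `blocked_vsyncs` whole vsync periods before returning.
constexpr std::int64_t serialized_frame_us(std::int64_t render_us,
                                           std::int64_t vsync_us,
                                           std::int64_t blocked_vsyncs) {
    return render_us + blocked_vsyncs * vsync_us;
}

// Assumed model of the desired behavior: rendering overlaps scanout, so the
// frame period is the render time rounded up to a whole number of vsyncs.
constexpr std::int64_t pipelined_frame_us(std::int64_t render_us,
                                          std::int64_t vsync_us) {
    return ((render_us + vsync_us - 1) / vsync_us < 1
                ? 1
                : (render_us + vsync_us - 1) / vsync_us) * vsync_us;
}

// Frames per second multiplied by 10 (integer math, truncated).
constexpr std::int64_t fps_x10(std::int64_t frame_us) {
    return 10000000 / frame_us;
}

// 60 Hz display, 4 ms render, swap blocked for 2 vsyncs (figures from the post).
static_assert(vsync_period_us(60) == 16667, "1 vsync at 60 Hz is ~16.67 ms");
static_assert(serialized_frame_us(4000, 16667, 2) == 37334, "~37.3 ms per frame");
static_assert(fps_x10(37334) == 267, "~26.7 FPS when serialized");
static_assert(fps_x10(pipelined_frame_us(4000, 16667)) == 599, "~59.9 FPS when pipelined");
```

Under these assumptions a 4 ms render plus a 2-vsync block gives roughly 37 ms per frame (about 27 FPS), while the same render time with an overlapping flip would be bounded only by the 60 Hz refresh.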

What we’ve tried:

Environment Variables (no effect):

QT_QPA_EGLFS_SWAPINTERVAL=0
QT_QPA_EGLFS_FORCEVSYNC=0
QSG_RENDER_LOOP=threaded
QSG_RHI_BACKEND=opengl

Kernel-Level Modifications (made it worse):

  • Applied DRM vblank disable patches to kernel

  • Result: FPS decreased from 20-22 to 18-21 FPS

  • Conclusion: Qt’s swap path (Mesa EGL → GBM → drmModePageFlip) bypasses kernel vblank helpers

Mesa GBM Patching (incompatible):

  • Attempted to patch Mesa GBM for triple-buffering

  • Discovered device uses proprietary NXP imx-gpu-viv driver, not open-source Mesa

  • Patches cannot apply

Qt Wayland Rebuild (marginal gain, critical issues):

  • Rebuilt Qt 6.7.3 with full Wayland support (9848 tasks, ~2 hours)

  • Deployed to device with Weston 12.0.4 compositor

  • Tested configurations:

    • Weston GL renderer (gl-renderer.so):

      • 23-24 FPS (only 3-4 FPS improvement)

      • Boot animation broken (video not displaying)

      • Gradients broken (accordion effect)

      • OOM crashes within 1-2 minutes (2GB RAM insufficient)

    • Weston G2D renderer (g2d-renderer.so):

      • Black screen, nothing displays

      • Swap times 165-275ms

      • G2D blitter cannot composite Qt OpenGL ES surfaces (needs simple 2D buffers, not GPU textures)

  • Conclusion: Wayland adds complexity/overhead without solving core swap synchronization issue

UI Optimization:

  • Replaced QtQuick.Shapes SVG rendering with pre-rendered PNG images, simplified effects, removed unnecessary background rendering and FBOs

  • Slight improvement (+4-8 FPS)

  • Only other way to get past 30 FPS is by disabling most UI visual components entirely, rendering the application (and entire project) useless

  • I understand that my app may be a bit visually heavy, but I was under the impression that this module could handle what I was going to throw at it. If the only way to achieve adequate performance right now is to eliminate half of my UI, then it’s worth checking that the custom build was done correctly and that we’re taking advantage of all available resources before scrapping UI features/visuals or upgrading hardware.

What we need:

  1. Is there a Qt configuration or driver parameter to make eglSwapBuffers() non-blocking when 3 buffers are available?

  2. Can the NXP imx-gpu-viv driver be configured to allow queueing page flips without waiting for previous scanout completion?

  3. Is there a Qt patch or environment variable to use DRM_MODE_PAGE_FLIP_ASYNC or equivalent non-blocking page flip mode?

  4. Should we patch Qt’s EGLFS KMS backend (qeglfskmsgbmscreen.cpp) to modify flip synchronization behavior? If so, what specific changes are recommended?

  5. Is this a known limitation of the GC7000UL/imx-gpu-viv stack with Qt EGLFS, and if so, are there any workarounds?

  6. With 3 buffers allocated, why is Qt/EGL/driver forcing a 2-vsync wait on swap instead of allowing immediate page flip queueing for smooth triple-buffered rendering?

  7. Is there something we’re missing that’s causing the 2D GPU to be sitting idle, when it could be utilized for additional rendering tasks?

  8. Should we be going an entirely different route for this build altogether (hardware, Torizon/Wayland/Qt versions, build recipe, config, etc.)?

Hi, I’m the Toradex AI Assistant.

I might make mistakes or occasionally get you wrong. Our support team will step in if needed.


Qt6 Performance Optimization on Verdin iMX8M Plus

Based on your analysis, you’ve identified the core issue: eglSwapBuffers() blocking for 2 vsync periods despite having 3 buffers available. This is a known limitation with the Qt EGLFS backend on iMX8M Plus hardware.

Hardware Configuration Validation

Your hardware setup is appropriate for Qt graphics acceleration (1). The Verdin iMX8M Plus supports OpenGL ES acceleration through the Wayland graphics backend (2). However, there are some important considerations for your specific use case.

Graphics Backend Recommendations

According to Toradex documentation, Wayland is the preferred graphics backend for iMX 8 Series modules (2). The documentation states:

Wayland: Supported on i.MX 8 Series modules as well as BSPs starting from 5.0. You should prefer to use Wayland whenever possible, but if X support is absolutely required, you may evaluate using the XWayland compatibility layer (2).

Supported GUI Frameworks

Toradex validates several GUI frameworks that support OpenGL ES acceleration (2):

  • Qt - Specifically documented as supported
  • Crank Storyboard
  • DiSTI GL Studio

Performance Benchmarking Context

For reference, Toradex validates OpenGL ES performance using glmark2 benchmarks. On Verdin iMX8M Plus, the documented performance is (2):

  • Test conditions: BSP 5.2.0, HDMI screen 1920x1080
  • glmark2 OpenGL ES 2, Wayland score: 953 (2)

Addressing Your Specific Questions

1. Buffer Management and Swap Synchronization

The issue you’re experiencing with eglSwapBuffers() blocking for 2 vsync periods is likely related to the interaction between Qt’s EGLFS backend and the NXP imx-gpu-viv driver. This is a driver-level synchronization behavior that cannot be easily modified through Qt environment variables.

2. 2D GPU Utilization

You mentioned the 2D GPU (GC520L) sitting idle. The iMX8M Plus has separate 2D and 3D GPU cores, but Qt’s OpenGL ES rendering pipeline primarily utilizes the 3D GPU. The 2D GPU is typically used for different rendering paths that Qt doesn’t automatically leverage.

3. Alternative Approaches

Given your performance requirements and the limitations you’ve encountered, consider these approaches:

Option 1: Optimize Qt Application Architecture

  • Profile your Qt Quick application to identify specific bottlenecks
  • Consider using Qt’s threaded render loop more effectively
  • Implement level-of-detail (LOD) rendering for complex UI elements

Option 2: Custom Yocto Build Optimization

If you need to customize your Yocto build further, you can create custom recipes (3). The documentation shows how to:

$ cd ../layers/meta-customer/
$ mkdir -p recipes-kernel/linux/linux-toradex
$ touch recipes-kernel/linux/linux-toradex/linux-toradex%.bbappend

(3)

Option 3: Hardware Considerations

Consider upgrading to a higher-memory variant if memory pressure is contributing to performance issues. The Verdin iMX8M Plus is available in configurations up to 8GB (1).

Recommendations

  1. Stick with Wayland: Despite the challenges you experienced, Wayland remains the recommended graphics backend for iMX8M Plus (2)

  2. Profile Application-Level Performance: Focus on Qt Quick application optimization rather than low-level driver modifications

  3. Consider Memory Upgrade: The 2GB variant may be limiting performance for complex Qt applications

  4. Validate Against Reference: Compare your performance against Toradex’s validated Qt framework implementation (2)

The core issue appears to be the interaction between Qt’s rendering pipeline and the NXP GPU driver’s synchronization behavior, which is not easily configurable through standard Qt or driver parameters.

Best regards, Toradex AI Assistant.



Hi! Considering you went all the way here… eglSwapBuffers() is only called in two pieces of code within Qt (I’m looking at 6.9.2 though):

qtbase’s qt6-base-6.9.2+dfsg/src/gui/opengl/platform/egl/qeglplatformcontext.cpp

qtwayland’s qt6-wayland-6.9.2/src/hardwareintegration/client/wayland-egl/qwaylandglcontext.cpp and also mentioned in qt6-wayland-6.9.2/src/client/qwaylandwindow.cpp:1728

Now Qt Wayland’s code has this interesting snippet:

int swapInterval = m_supportNonBlockingSwap ? 0 : format().swapInterval();
eglSwapInterval(eglDisplay(), swapInterval);
if (swapInterval == 0 && format().swapInterval() > 0) {
    // Emulating a blocking swap
    glFlush(); // Flush before waiting so we can swap more quickly when the frame event arrives
    window->waitForFrameSync(100);
}

In order to use this you would need to use Qt Wayland’s compositor instead of Weston, which kind of makes sense: Weston is only a demo compositor, which might be enough for many situations, but not really something you would expect high efficiency from. Of course, you will need to set m_supportNonBlockingSwap and format().swapInterval() according to your needs.
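
To make the cases of that snippet explicit, its branch logic can be restated as a small standalone function. This is only a sketch: `SwapDecision`, `effective_interval` and `decide_swap` are hypothetical names that are not part of Qt, and the code models the quoted lines only, not the rest of the swap path.

```cpp
// Outcome of the branch in the quoted Qt Wayland snippet (hypothetical type).
struct SwapDecision {
    int egl_swap_interval;  // value that would be passed to eglSwapInterval()
    bool waits_for_frame;   // true when a blocking swap is emulated by waiting
                            // for the compositor's frame event
};

// Mirrors: m_supportNonBlockingSwap ? 0 : format().swapInterval()
constexpr int effective_interval(bool support_non_blocking_swap,
                                 int requested_interval) {
    return support_non_blocking_swap ? 0 : requested_interval;
}

// Mirrors: if (swapInterval == 0 && format().swapInterval() > 0) { wait }
constexpr SwapDecision decide_swap(bool support_non_blocking_swap,
                                   int requested_interval) {
    return SwapDecision{
        effective_interval(support_non_blocking_swap, requested_interval),
        effective_interval(support_non_blocking_swap, requested_interval) == 0
            && requested_interval > 0};
}

// Non-blocking swap supported, app asks for vsync: EGL interval 0, wait emulated.
static_assert(decide_swap(true, 1).egl_swap_interval == 0, "interval forced to 0");
static_assert(decide_swap(true, 1).waits_for_frame, "blocking swap is emulated");
// Non-blocking swap supported, app asks for interval 0: no wait at all.
static_assert(!decide_swap(true, 0).waits_for_frame, "fully non-blocking");
// Non-blocking swap not supported: the requested interval goes straight to EGL.
static_assert(decide_swap(false, 1).egl_swap_interval == 1, "driver-side blocking");
```

Read this way, the only combination in the snippet that neither blocks inside EGL nor waits for the frame event is a requested swap interval of 0.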

I also understand that the GPU’s drivers expect you to use Wayland… but don’t take my word for it.

Hope that helps, Lisandro.