Hi, I’m attempting to run a Qt6 application on a Verdin iMX8M Plus module with a custom-built Yocto image, but I’m running into some performance issues.
Currently, I’m only achieving roughly 18-25 FPS with all UI elements enabled and running, instead of the target minimum of 30+ FPS (60 FPS would be ideal but seems less feasible at this point).
Hardware:
- Verdin iMX8M Plus 2GB v1.1 (on a Mallow carrier board)
- 1920x720@60Hz display via HDMI
Software:
- TorizonCore/TDX Wayland 7.4.0
- Qt 6.7.3 (custom ARM64 cross-compile with Wayland support)
- Qt Quick application with OpenGL ES rendering
- EGLFS backend with KMS/GBM integration
Root cause analysis:
- Rendering performance:
  - Scene graph rendering: 3-6ms
  - GPU utilization: ~10% on GC7000UL
- Buffer swap:
  - eglSwapBuffers() blocking for 2 vsync periods (31-35ms)
  - At 60Hz: 1 vsync = 16.67ms, so 2 vsyncs = 33.33ms
  - Total frame time: render (4ms) + swap (32ms) = 36-40ms (~20 FPS)
- Buffer allocation: 3 buffers available
  - Verified via /sys/kernel/debug/dri/1/framebuffer:
    - framebuffer[48], [43], [46] allocated by QSGRenderThread
    - All have refcount > 0
  - This disproves the initial theory that the driver only allocates 2 buffers
- Swap synchronization policy:
  - Despite having 3 buffers, eglSwapBuffers() waits for the previous frame's scanout to complete before queueing the next page flip
  - Should be: Render → Queue flip immediately → Return (non-blocking)
  - Actually: Render → Wait for previous flip to complete → Queue flip → Return
  - This artificial serialization wastes 1 vsync period per frame
- GPU utilization:
  - Only the 3D GPU (GC7000UL) is being utilized; the 2D GPU (GC520L) is sitting idle
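For anyone who wants to repeat the buffer-allocation check, here is a minimal Python sketch that counts the live framebuffers owned by the render thread. It assumes the debugfs dump uses `framebuffer[N]:` headers followed by `allocated by = ...` and `refcount=...` lines, which is how the DRM debugfs output looked on our device; the format may differ on other kernels. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Path taken from the post; the DRM minor number (1) may differ on other setups.
DEBUGFS_FB = Path("/sys/kernel/debug/dri/1/framebuffer")


def parse_framebuffers(text):
    """Return one dict per 'framebuffer[N]' entry found in a debugfs dump.

    Assumed layout (not guaranteed across kernel versions):
        framebuffer[48]:
            allocated by = QSGRenderThread
            refcount=2
    """
    entries = []
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*framebuffer\[(\d+)\]", line)
        if header:
            current = {"id": int(header.group(1)), "owner": None, "refcount": None}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first header
        owner = re.match(r"\s*allocated by\s*=\s*(.+)", line)
        if owner:
            current["owner"] = owner.group(1).strip()
        ref = re.match(r"\s*refcount\s*=\s*(\d+)", line)
        if ref:
            current["refcount"] = int(ref.group(1))
    return entries


def live_buffers(entries, owner="QSGRenderThread"):
    """Keep only entries allocated by `owner` that still hold a reference."""
    return [e for e in entries if e["owner"] == owner and (e["refcount"] or 0) > 0]


def report(path=DEBUGFS_FB):
    """Read the dump (needs root and a mounted debugfs) and print live buffers."""
    for fb in live_buffers(parse_framebuffers(Path(path).read_text())):
        print(fb)
```

Calling `report()` as root on the target while the application is running should list one entry per buffer held by the Qt render thread.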
Current Timing Breakdown (from Qt logs):
qt.scenegraph.time.renderloop: frame rendered in 36-42ms
sync=0-1ms
render=3-6ms
swap=31-35ms
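As a sanity check on the breakdown above, this small Python sketch adds up the per-stage ranges and converts them to frame rates and vsync periods. It only uses the numbers from the Qt logs plus the 60 Hz refresh rate. The `fps_if_swap_took` helper is a hypothetical "what if" and not a measurement. Summing the stages gives roughly 34-42 ms per frame, which is a little better than the ~20 FPS we actually see, so some time may be spent outside the stages Qt reports.

```python
VSYNC_HZ = 60.0
VSYNC_MS = 1000.0 / VSYNC_HZ  # one refresh period, about 16.67 ms at 60 Hz

# (min, max) per stage in milliseconds, copied from the Qt log breakdown.
STAGES = {"sync": (0, 1), "render": (3, 6), "swap": (31, 35)}


def fps(frame_ms):
    """Frame rate implied by a given frame time in milliseconds."""
    return 1000.0 / frame_ms


def frame_range(stages):
    """Best-case and worst-case frame time when the stages run back to back."""
    best = sum(lo for lo, _ in stages.values())
    worst = sum(hi for _, hi in stages.values())
    return best, worst


def fps_if_swap_took(vsyncs, stages=STAGES):
    """Hypothetical: worst-case frame rate if swap blocked for `vsyncs` periods."""
    other = sum(hi for name, (_, hi) in stages.items() if name != "swap")
    return fps(other + vsyncs * VSYNC_MS)


best, worst = frame_range(STAGES)
print(f"frame time: {best}-{worst} ms -> {fps(worst):.1f}-{fps(best):.1f} FPS")
print(f"swap spans {STAGES['swap'][0] / VSYNC_MS:.2f}-"
      f"{STAGES['swap'][1] / VSYNC_MS:.2f} vsync periods")
print(f"swap share of worst-case frame: {STAGES['swap'][1] / worst:.0%}")
print(f"if swap blocked for 1 vsync: {fps_if_swap_took(1):.1f} FPS (worst case)")
```

The point of the last line is only to show how much headroom would appear if the swap cost dropped to a single refresh period; render and sync together are a small fraction of the frame.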
What we’ve tried:
Environment variables (no effect):
QT_QPA_EGLFS_SWAPINTERVAL=0
QT_QPA_EGLFS_FORCEVSYNC=0
QSG_RENDER_LOOP=threaded
QSG_RHI_BACKEND=opengl
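To make the test setup reproducible, here is a rough Python sketch of how the application can be launched with the variables above plus render-loop timing logs. The `QT_LOGGING_RULES` and `QSG_INFO` entries are my assumption of a reasonable logging setup for getting the timing output; the binary path passed to `run_app` is a placeholder, not our real application.

```python
import os
import subprocess

# Variables listed above, with the values we tested.
TESTED_ENV = {
    "QT_QPA_EGLFS_SWAPINTERVAL": "0",
    "QT_QPA_EGLFS_FORCEVSYNC": "0",
    "QSG_RENDER_LOOP": "threaded",
    "QSG_RHI_BACKEND": "opengl",
}

# Assumption: logging settings that produce the per-frame timing lines.
LOGGING_ENV = {
    "QT_LOGGING_RULES": "qt.scenegraph.time.renderloop=true",
    "QSG_INFO": "1",
}


def build_env(base=None, overrides=None):
    """Merge the tested variables into a copy of the base environment."""
    env = dict(os.environ if base is None else base)
    env.update(TESTED_ENV)
    env.update(LOGGING_ENV)
    env.update(overrides or {})  # lets a single run flip one variable
    return env


def run_app(binary, *args, overrides=None):
    """Launch a Qt binary on the eglfs platform (binary path is a placeholder)."""
    cmd = [binary, "-platform", "eglfs", *args]
    return subprocess.run(cmd, env=build_env(overrides=overrides), check=False)
```

For example, `run_app("/path/to/app", overrides={"QT_QPA_EGLFS_SWAPINTERVAL": "1"})` would rerun the same setup with only the swap interval changed, which makes it easier to compare one variable at a time.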
Kernel-Level Modifications (made it worse):
- Applied DRM vblank disable patches to kernel
- Result: FPS decreased from 20-22 to 18-21 FPS
- Conclusion: Qt’s swap path (Mesa EGL → GBM → drmModePageFlip) bypasses kernel vblank helpers
Mesa GBM Patching (incompatible):
- Attempted to patch Mesa GBM for triple-buffering
- Discovered device uses proprietary NXP imx-gpu-viv driver, not open-source Mesa
- Patches cannot apply
Qt Wayland Rebuild (marginal gain, critical issues):
- Rebuilt Qt 6.7.3 with full Wayland support (9848 tasks, ~2 hours)
- Deployed to device with Weston 12.0.4 compositor
- Tested configurations:
  - Weston GL renderer (gl-renderer.so):
    - 23-24 FPS (only 3-4 FPS improvement)
    - Boot animation broken (video not displaying)
    - Gradients broken (accordion effect)
    - OOM crashes within 1-2 minutes (2GB RAM insufficient)
  - Weston G2D renderer (g2d-renderer.so):
    - Black screen, nothing displays
    - Swap times 165-275ms
    - G2D blitter cannot composite Qt OpenGL ES surfaces (needs simple 2D buffers, not GPU textures)
- Conclusion: Wayland adds complexity/overhead without solving the core swap synchronization issue
UI Optimization:
- Replaced QtQuick.Shapes SVG rendering with pre-rendered PNG images, simplified effects, removed unnecessary background rendering and FBOs
- Slight improvement (+4-8 FPS)
- The only other way to get past 30 FPS is to disable most UI visual components entirely, which would render the application (and the entire project) useless

I understand that my app may be a bit visually heavy, but I was under the impression that this module could handle what I was going to throw at it. If the only way to reach adequate performance is to eliminate half of my UI, then it’s worth checking whether the custom build was actually done correctly and whether we’re taking advantage of all available resources before scrapping UI features/visuals or upgrading hardware.
What we need:
- Is there a Qt configuration or driver parameter to make eglSwapBuffers() non-blocking when 3 buffers are available?
- Can the NXP imx-gpu-viv driver be configured to allow queueing page flips without waiting for previous scanout completion?
- Is there a Qt patch or environment variable to use DRM_MODE_PAGE_FLIP_ASYNC or an equivalent non-blocking page flip mode?
- Should we patch Qt’s EGLFS KMS backend (qeglfskmsgbmscreen.cpp) to modify flip synchronization behavior? If so, what specific changes are recommended?
- Is this a known limitation of the GC7000UL/imx-gpu-viv stack with Qt EGLFS, and if so, are there any workarounds?
- With 3 buffers allocated, why is Qt/EGL/driver forcing a 2-vsync wait on swap instead of allowing immediate page flip queueing for smooth triple-buffered rendering?
- Is there something we’re missing that’s causing the 2D GPU to sit idle when it could be utilized for additional rendering tasks?
- Should we be going an entirely different route for this build altogether (hardware, Torizon/Wayland/Qt versions, build recipe, config, etc.)?