GStreamer and DMABuf for GPU-VPU operations

Hi all,

I’m working on a Verdin iMX8M Plus board which is connected to a camera. I’m testing a GStreamer pipeline to evaluate whether we can do some image processing on the GPU side and then stream the result. The pipeline is the following:

GST_GL_API=gles2 GST_GL_PLATFORM=egl GST_GL_WINDOW=surfaceless gst-launch-1.0 -v v4l2src device=/dev/video2 io-mode=dmabuf ! video/x-raw,format=NV12,width=1920,height=1080,framerate=24/1 ! glupload ! gleffects effect=identity ! gldownload ! v4l2h264enc ! fpsdisplaysink video-sink=fakesink sync=true text-overlay=false

I’m using identity just to place an operation there. The command works, but there is a bottleneck between gldownload and v4l2h264enc: gldownload seems to be falling back to SystemMemory, which drops the framerate to almost 10 fps. Chances are I’m missing something important. If I remove the encoding part, the framerate is basically 24 fps.

I was considering moving to a newer version of the BSP, but I don’t know if that would help.

Any help or suggestions will be highly appreciated.

Software summary

Bootloader: U-Boot
Kernel version: 5.15.148-6.8.0-devel+git.8c5c2dcbf6ba #1 SMP PREEMPT Tue Aug 6 10:01:59 UTC 2024
Kernel command line: root=PARTUUID=1415262f-02 ro rootwait console=tty1 console=ttymxc2,115200 consoleblank=0 earlycon
Distro name: NAME="TDX Wayland with XWayland"
Distro version: VERSION_ID=6.8.0-devel-20250527114751-build.0
Distro variant: -
Hostname: sebscam

Hardware info

HW model: Toradex Verdin iMX8M Plus on Verdin Development Board
Toradex version: 0063 V1.1B
Serial number: 15677588
Processor arch: aarch64

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.


GPU-to-VPU Pipeline Bottleneck on Verdin iMX8M Plus

The issue you’re experiencing is related to the memory transfer bottleneck between the GPU and VPU components. The Verdin iMX8M Plus supports hardware-accelerated graphics processing, but there are specific considerations for efficient pipeline configuration.

Graphics Backend Support

Toradex BSPs support Wayland as the graphics backend on i.MX 8 Series modules starting from BSP 5.0(1). Your current BSP 6.8.0 should have proper Wayland support for OpenGL ES acceleration(1).

Hardware Acceleration Validation

Toradex validates OpenGL ES using glmark2 benchmarks on the Verdin iMX8M Plus, achieving a score of 953 under specific test conditions(1). This confirms that GPU acceleration is functional on your hardware.

Pipeline Optimization Recommendations

  1. Direct Memory Buffer Usage: Your pipeline already uses io-mode=dmabuf for the v4l2src, which is good for avoiding memory copies.

  2. Format Considerations: The gldownload element falling back to SystemMemory suggests a format or memory allocation mismatch. Try using compatible formats that both GPU and VPU can handle efficiently.

  3. Alternative Pipeline Structure: Consider bypassing the GPU processing if it’s causing bottlenecks. A direct path from the camera avoids the GL elements entirely. Note that the example below only previews the camera on a Wayland display and does not include the encoder:

gst-launch-1.0 -v v4l2src device=/dev/video2 ! video/x-raw ! videoconvert ! waylandsink

(2)
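
For the format and memory point above, one way to see where the pipeline leaves GPU or DMABuf memory is to look at the caps that gst-launch-1.0 -v prints for each pad. Caps that carry a feature such as memory:GLMemory or memory:DMABuf use that memory type, and caps without a feature use system memory. The helper below is only a sketch: classify_caps is a hypothetical name, and it assumes the verbose output contains lines with "caps = " as recent GStreamer versions print them.

```shell
# Hypothetical helper: reads `gst-launch-1.0 -v` output on stdin and prints
# each negotiated caps line, prefixed with the memory type it advertises.
classify_caps() {
    grep 'caps = ' | while IFS= read -r line; do
        case "$line" in
            *memory:GLMemory*) echo "GLMemory: $line" ;;
            *memory:DMABuf*)   echo "DMABuf:   $line" ;;
            *)                 echo "System:   $line" ;;  # no memory feature shown
        esac
    done
}
```

Usage would be to append `2>&1 | classify_caps` to the gst-launch-1.0 -v command from the first post and check which prefix appears on the source pad of gldownload and the sink pad of the encoder.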

BSP Version Considerations

Based on community feedback, there are known issues with VPU encoding in BSP 7.x versions. A community discussion indicates that v4l2h264enc has problems in recent BSP versions, with users being advised to use BSP 6 until fixes are available(3).

Since you’re already on BSP 6.8.0, upgrading to BSP 7 might introduce additional VPU-related issues rather than solving your current problem.

Suggested Troubleshooting Steps

  1. Test VPU independently: Verify that v4l2h264enc works properly without GPU processing
  2. Check memory allocation: Use GST_DEBUG=*:3 to identify memory allocation issues
  3. Try alternative formats: Experiment with different video formats between gldownload and v4l2h264enc
  4. Consider direct encoding: If GPU processing isn’t essential, use direct camera-to-encoder pipeline for better performance
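
Steps 1 and 2 above could be sketched as follows. This reuses the device, caps, and sink settings from the original pipeline. The function names, the GST_LAUNCH override, and the log path /tmp/gst-debug.log are assumptions made for illustration. Whether v4l2src and v4l2h264enc negotiate a zero-copy path on this BSP is not confirmed here.

```shell
# GST_LAUNCH is a hypothetical override so the commands can be dry-run
# (GST_LAUNCH=echo prints the command line instead of running it).
GST_LAUNCH=${GST_LAUNCH:-gst-launch-1.0}
CAPS='video/x-raw,format=NV12,width=1920,height=1080,framerate=24/1'

# Step 1: camera straight into the encoder, no GL elements in between.
test_encoder_only() {
    "$GST_LAUNCH" -v v4l2src device=/dev/video2 io-mode=dmabuf ! "$CAPS" ! \
        v4l2h264enc ! \
        fpsdisplaysink video-sink=fakesink sync=true text-overlay=false
}

# Step 2: the original pipeline with errors, warnings and FIXMEs (level 3)
# written to a log file (assumed path) instead of the terminal.
test_full_with_debug() {
    GST_DEBUG='*:3' GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=/tmp/gst-debug.log \
    GST_GL_API=gles2 GST_GL_PLATFORM=egl GST_GL_WINDOW=surfaceless \
    "$GST_LAUNCH" -v v4l2src device=/dev/video2 io-mode=dmabuf ! "$CAPS" ! \
        glupload ! gleffects effect=identity ! gldownload ! v4l2h264enc ! \
        fpsdisplaysink video-sink=fakesink sync=true text-overlay=false
}
```

If test_encoder_only holds 24 fps while test_full_with_debug does not, the slowdown is more likely in the hand-off from gldownload to the encoder than in the encoder itself.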

The Verdin iMX8M Plus features a Neural Processing Unit (NPU) and Video Processing Unit (VPU) designed to offload processing tasks(4), but efficient pipeline configuration is crucial for optimal performance.

Best regards, Toradex AI Assistant.

Hello @sebsacuna,

Sorry for the delay on this topic.
Could you try with vpuenc_h264 instead of v4l2h264enc?
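
A sketch of what that swap could look like, keeping everything else from the original pipeline. It assumes vpuenc_h264 is present on the image (gst-inspect-1.0 vpuenc_h264 should confirm this and list the formats it accepts). The function name and the GST_LAUNCH override are hypothetical, and it is not confirmed here that vpuenc_h264 accepts the format gldownload produces.

```shell
# GST_LAUNCH is a hypothetical override so the command can be dry-run
# (GST_LAUNCH=echo prints the command line instead of running it).
GST_LAUNCH=${GST_LAUNCH:-gst-launch-1.0}

# Runs the original pipeline with the encoder element given as $1
# (defaults to vpuenc_h264, the element suggested above).
run_with_encoder() {
    encoder=${1:-vpuenc_h264}
    GST_GL_API=gles2 GST_GL_PLATFORM=egl GST_GL_WINDOW=surfaceless \
    "$GST_LAUNCH" -v v4l2src device=/dev/video2 io-mode=dmabuf ! \
        'video/x-raw,format=NV12,width=1920,height=1080,framerate=24/1' ! \
        glupload ! gleffects effect=identity ! gldownload ! "$encoder" ! \
        fpsdisplaysink video-sink=fakesink sync=true text-overlay=false
}
```

Running run_with_encoder and then run_with_encoder v4l2h264enc gives a like-for-like framerate comparison between the two encoder elements.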

Best Regards,
Bruno