Getting the Most Out of VPU and DMA

Good day,
My team and I are looking at ways to reduce the overhead of our GStreamer pipeline on the Verdin iMX8MP.

We have a Yocto build based on

recipes-images/images/tdx-reference-minimal-image.bb

with the following GStreamer and i.MX imaging-related recipes:

gstreamer1.0 \
    gstreamer1.0-plugins-base \
    gstreamer1.0-plugins-good \
    gstreamer1.0-plugins-imx \
    imx-gst1.0-plugin \
    gstreamer1.0-vaapi \
    gstreamer1.0-plugins-bad \
    gstreamer1.0-plugins-ugly \
    imx-codec \
    imx-dsp \
    imx-vpu-hantro \
    imx-vpu-hantro-daemon \
    imx-vpuwrap \
   imx-codec \
    packagegroup-fsl-gstreamer1.0-full \
    packagegroup-fsl-tools-gpu \
    packagegroup-fsl-tools-gpu-external \
    packagegroup-fsl-tools-benchmark \
IMAGE_INSTALL:remove += "packagegroup-fsl-isp"

The pipeline is split into two parts so that we can interface it with our software. This is the first part, with fakesink replacing appsink, run with GST_DEBUG level 3:

v4l2src device=/dev/video2 do-timestamp=true ! video/x-raw,format=UYVY,framerate=30/1,width=1920,height=1080 ! videoconvert ! video/x-raw,format=BGR ! fakesink

We initially see the following errors:

ERROR              videometa gstvideometa.c:242:default_map: plane 0, no memory at offset 0
ERROR                default video-frame.c:168:gst_video_frame_map_id: failed to map video frame plane 0
WARN             videofilter gstvideofilter.c:296:gst_video_filter_transform:<videoconvert0> warning: invalid video buffer received
WARNING: from element /GstPipeline:pipeline0/GstVideoConvert:videoconvert0: Internal GStreamer error: code not implemented.  Please file a bug at https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/new.
Additional debug info:
/usr/src/debug/gstreamer1.0-plugins-base/1.22.5.imx/gst-libs/gst/video/gstvideofilter.c(296): gst_video_filter_transform (): /GstPipeline:pipeline0/GstVideoConvert:videoconvert0:
invalid video buffer received
Redistribute latency...
ERROR              videometa gstvideometa.c:242:default_map: plane 0, no memory at offset 0
ERROR                default video-frame.c:168:gst_video_frame_map_id: failed to map video frame plane 0
WARN             videofilter gstvideofilter.c:296:gst_video_filter_transform:<videoconvert0> warning: invalid video buffer received

However, the pipeline keeps running, with

WARN                 v4l2src gstv4l2src.c:1353:gst_v4l2src_create:<v4l2src0> lost frames detected: count = 2 - ts: 0:00:03.607138167

roughly every 0.1 seconds. The images it delivers are usable, but the overhead is large.
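
To put a number on the dropped frames, the warnings can be counted from a saved debug log. This is a minimal sketch, assuming the log has been captured as text and that every warning matches the line shown above. `summarize_lost_frames` is a hypothetical helper name, not part of GStreamer.

```python
import re

# Matches the v4l2src warning text shown in the log above.
LOST_FRAMES = re.compile(
    r"lost frames detected: count = (\d+) - ts: (\d+):(\d{2}):(\d{2}\.\d+)"
)


def summarize_lost_frames(log_text):
    """Return (warning_count, total_lost_frames, last_timestamp_seconds).

    Hypothetical helper: it only scans text, it does not talk to GStreamer.
    """
    warnings = 0
    total = 0
    last_ts = None
    for match in LOST_FRAMES.finditer(log_text):
        count, hours, minutes, seconds = match.groups()
        warnings += 1
        total += int(count)
        # Convert the h:mm:ss.nnnnnnnnn running time to seconds.
        last_ts = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return warnings, total, last_ts


# Sample line copied from the log above.
sample = (
    "WARN v4l2src gstv4l2src.c:1353:gst_v4l2src_create:<v4l2src0> "
    "lost frames detected: count = 2 - ts: 0:00:03.607138167\n"
)
print(summarize_lost_frames(sample))
```

Running this over logs from before and after a pipeline change gives a rough comparison of how many frames v4l2src reports as lost.
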
Looking at the i.MX 8 GStreamer User Guide, we noticed that we can leverage DMABUF to reduce the strain on the iMX8MP. However, that produces the same initial errors and does not deliver any usable data to the second half of the pipeline, where we encode:

appsrc emit-signals=false ! videoscale ! video/x-raw, format=BGR, height=1080, width=1920 ! videoconvert ! video/x-raw, format=BGRx ! queue ! vpuenc_h264 ! h264parse ! matroskamux ! filesink location=test.mkv

Is there any way for us to leverage DMA and VPU encoding so that we minimise the strain on the iMX8MP's CPU?

Software summary

Bootloader: U-Boot
Kernel version: 5.15.148-6.7.0-devel-7.0.0-devel+git.284966e8d6ee #1 SMP PREEMPT Wed Jun 26 17:51:19 UTC 2024
Kernel command line: root=PARTUUID=67a02fe7-02 ro rootwait console=tty1 console=ttymxc2
Distro name: NAME="POTENTIAL-OS"
Distro version: VERSION_ID=7.0.0-devel-20240808141945-build.0
Hostname: verdin-imx8mp-15289169

Hardware info

HW model: Custom Device tree for potential motors’ Mallow Breakout Board
Toradex version: 0058 V1.1B
Serial number: 15289169
Processor arch: aarch64

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!

To reduce the overhead from your GStreamer pipeline on the Verdin iMX8MP, you can consider the following strategies:

  1. Use Hardware Acceleration: Ensure that you are leveraging the hardware acceleration capabilities of the iMX8MP. This includes using the VPU for video encoding and decoding. The vpuenc_h264 element in your pipeline is a good start, but make sure that all parts of your pipeline that can be hardware-accelerated are using the appropriate plugins.

  2. DMABUF Usage: You mentioned trying to use DMABUF to reduce strain. Ensure that your pipeline is correctly set up to use DMABUF. This involves making sure that all elements in the pipeline support DMABUF and that the memory is being shared correctly between elements. You might need to adjust the pipeline to ensure compatibility with DMABUF.

  3. Optimize Pipeline Elements: Review the elements in your pipeline to ensure they are necessary and optimized. For example, videoconvert can be CPU-intensive, so ensure it’s needed or see if it can be replaced with a hardware-accelerated alternative.

  4. Debugging and Profiling: Use GStreamer debugging tools to identify bottlenecks in your pipeline. The errors you are seeing might indicate issues with memory handling or element compatibility. Addressing these errors could improve performance.

  5. Check for Updates: Ensure that you are using the latest versions of the GStreamer plugins and the NXP BSP, as updates might include performance improvements or bug fixes.

  6. Consult Documentation: Refer to the iMX8 GStreamer User Guide and other relevant documentation for specific instructions on optimizing pipelines for the iMX8MP.

If these strategies do not resolve the issue, you may need to delve deeper into the specific errors and warnings you are encountering, possibly with the help of the GStreamer community or Toradex support.
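
As an illustration of points 2 and 3, the capture half of the pipeline can be assembled from a few parameters so that variants are easy to compare. This is only a sketch under stated assumptions: `build_capture_pipeline` is a hypothetical helper, `io-mode=dmabuf` is a v4l2src property value that the camera driver may or may not accept, and `imxvideoconvert_g2d` is assumed to be provided by imx-gst1.0-plugin on the image. Whether that converter accepts UYVY input and produces the format the application needs should be checked with gst-inspect-1.0 before relying on it.

```python
def build_capture_pipeline(device="/dev/video2", width=1920, height=1080,
                           fps=30, io_mode=None, converter="videoconvert",
                           out_format="BGR", sink="fakesink"):
    """Assemble a gst-launch-1.0 style description for the capture half.

    Hypothetical helper; the defaults reproduce the pipeline from the question.
    """
    src = f"v4l2src device={device} do-timestamp=true"
    if io_mode:
        # Assumption: the capture driver supports this v4l2src io-mode.
        src += f" io-mode={io_mode}"
    caps_in = (f"video/x-raw,format=UYVY,framerate={fps}/1,"
               f"width={width},height={height}")
    caps_out = f"video/x-raw,format={out_format}"
    return " ! ".join([src, caps_in, converter, caps_out, sink])


# Baseline: the software-converted pipeline from the question.
print(build_capture_pipeline())

# Variant to try: DMABUF capture with a hardware converter (assumed available).
# BGRx is used because the encode half already converts to BGRx.
print(build_capture_pipeline(io_mode="dmabuf",
                             converter="imxvideoconvert_g2d",
                             out_format="BGRx"))
```

Each printed string can be passed to gst-launch-1.0 on the target with GST_DEBUG set, which makes it easier to see which variant still triggers the mapping errors.
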

The problem is now more or less managed. While the deployed GStreamer pipeline could very well be optimised further, it appears that the nxp-imx GStreamer version from meta-freescale is currently 1.22 on the scarthgap branch. Creating a bbappend so that GStreamer and the relevant plugins point to their 1.24 counterparts has greatly improved performance and reduced the frames dropped on the v4l2src side.
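
To confirm on the target that the bbappend took effect, the reported GStreamer version can be compared against 1.24. This is a minimal sketch, assuming the text comes from `gst-launch-1.0 --version` and contains a line of the form `GStreamer X.Y.Z`. `parse_gst_version` and `is_at_least` are hypothetical helper names, and the sample strings are illustrative rather than captured output.

```python
import re


def parse_gst_version(version_output):
    """Extract (major, minor, micro) from text such as 'GStreamer 1.22.5'."""
    match = re.search(r"GStreamer (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no GStreamer version found in output")
    return tuple(int(part) for part in match.groups())


def is_at_least(version_output, minimum=(1, 24, 0)):
    """True when the reported version is at or above the minimum tuple."""
    return parse_gst_version(version_output) >= minimum


# Illustrative strings (assumed format, not captured from the board).
before = "gst-launch-1.0 version 1.22.5\nGStreamer 1.22.5\n"
after = "gst-launch-1.0 version 1.24.0\nGStreamer 1.24.0\n"
print(is_at_least(before), is_at_least(after))
```
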

Hey @Mackenzie_Savoy,

Good to hear. While not officially released, it is possible to build on scarthgap (BSP 7). You can see the posted scarthgap-7.x.y manifest. I was able to build the tdx-reference-minimal-image without issues. Of course, there would be more to check for your build.

-Eric