VPU Performance Issues on iMX8DX with Torizon OS 6

Hi,

I’m using an iMX8DX with Viola, Torizon OS 6, and the wayland-base-vivante:3 image. I’ve been trying to use the VPU, but I’m still facing performance issues, and CPU usage is very high.

I’m using a USB camera with MJPEG video format.

Here is the pipeline I’m currently using:


gst-launch-1.0 v4l2src device=/dev/video4 ! image/jpeg, width=1280, height=720, framerate=30/1 ! v4l2jpegdec ! videoconvert ! video/x-raw, format=NV12 ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=output.mp4

Here’s the output from htop during execution:

And here’s the output before execution:

The result is poor video at around 16 fps.

I also tried using dmabuf, but it didn’t work:

root@colibri-imx8x-07202809:~/app# gst-launch-1.0 v4l2src device=/dev/video4 io-mode=dmabuf ! v4l2jpegdec output-io-mode=dmabuf-import ! videoconvert ! video/x-raw, format=NV12 ! v4l2h264enc output-io-mode=dmabuf-import ! h264parse ! mp4mux ! filesink location=output.mp4
Setting pipeline to PAUSED ...

====== V4L2ENC: 1.20.0 build on Mar 21 2023 18:43:32. ======

====== V4L2DEC: 1.20.0 build on Mar 21 2023 18:43:32. ======
Pipeline is live and does not need PREROLL ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Redistribute latency...
Redistribute latency...
ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Internal data stream error.

Additional debug info:

../libs/gst/base/gstbasesrc.c(3132): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming stopped, reason error (-5)
Execution ended after 0:00:01.219342288
Setting pipeline to NULL ...
Freeing pipeline ...
root@colibri-imx8x-07202809:~/app# 

I based this setup on the iMX8 VPU example, although it seems that this example may not apply to the iMX8DX specifically.

Is there VPU support for the iMX8DX in Torizon 7?

Could you please help me troubleshoot this issue?

Dockerfile (8.0 KB)

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.



Hello, and thank you for sharing a detailed description of your setup and issue.

1. VPU Support on iMX8DX with Torizon OS 6

Based on the currently available documentation, there is no explicit confirmation that the i.MX8DX’s VPU is officially supported for hardware video encoding/decoding in Torizon OS 6 or 7. While Toradex provides VPU-accelerated GStreamer pipelines for i.MX8 families, specific support or stability for the i.MX8DX variant is not documented. This can result in software fallbacks and high CPU usage (Video Processing GStreamer), (How to Use GStreamer on TorizonCore).

2. Pipeline Analysis and Troubleshooting

Your GStreamer pipeline:

gst-launch-1.0 v4l2src device=/dev/video4 ! image/jpeg, width=1280, height=720, framerate=30/1 ! v4l2jpegdec ! videoconvert ! video/x-raw, format=NV12 ! v4l2h264enc ! h264parse ! mp4mux ! filesink location=output.mp4

Observations:

  • Both v4l2jpegdec and v4l2h264enc depend on the underlying VPU driver support and correct device mapping in the container.
  • High CPU usage usually indicates that decoding/encoding is happening in software due to missing or unsupported hardware acceleration.

Key checks and suggestions:

A. VPU-Backed Elements

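  • The recommended hardware-accelerated encoder on the i.MX 8M Mini/Plus is vpuenc_h264. On the i.MX 8X, the VPU is normally exposed through V4L2 elements such as v4l2h264enc, so confirm which of the two your image actually provides. Example usage with vpuenc_h264: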
    ... ! videoconvert ! vpuenc_h264 ! video/x-h264 ! h264parse ! ...
    
  • Make sure the necessary VPU-related GStreamer plugins (e.g., gstreamer1.0-imx or equivalent) are available in your container. The wayland-base-vivante image is designed for this.

B. Pipeline Optimization

  • Try running with vpuenc_h264 if available:
    gst-launch-1.0 v4l2src device=/dev/video4 ! image/jpeg, width=1280, height=720, framerate=30/1 ! jpegdec ! videoconvert ! vpuenc_h264 ! h264parse ! mp4mux ! filesink location=output.mp4
    
  • Replace jpegdec with v4l2jpegdec only if hardware-accelerated decoding is confirmed to be available.

C. Container Device Access

  • Ensure the container is running with access to /dev/video4, /dev/video-dec0, and /dev/video-enc0 (or similarly named devices, depending on your kernel, usually exposed by the VPU).
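
One way to check this from a shell inside the container is a small helper that tests each node for existence and read/write access. This is only a sketch: check_nodes is a hypothetical helper, and the node names in the usage comment are the examples from the bullet above, which may be named differently on your kernel.

```shell
#!/bin/sh
# check_nodes: hypothetical helper, not part of any Toradex or NXP tooling.
# Prints one line per node and returns the number of unusable nodes.
check_nodes() {
    missing=0
    for node in "$@"; do
        # A usable node is a character device that we can read and write.
        if [ -c "$node" ] && [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok       $node"
        else
            echo "missing  $node"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Example call (node names are assumptions, adjust to your system):
#   check_nodes /dev/video4 /dev/video-dec0 /dev/video-enc0
```

A non-zero return value means at least one node is absent or inaccessible in the container, which would point at the device mapping rather than at GStreamer.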

D. dmabuf Mode

  • Failure with dmabuf usually suggests that the kernel, driver, or GStreamer plugins lack the necessary support or compatibility for zero-copy operation on your current OS and hardware combination.

3. Next Steps

  • Double-check the presence of VPU device nodes (e.g., ls /dev/video*).
  • List the available GStreamer encoders:
    gst-inspect-1.0 | grep 264enc
    gst-inspect-1.0 | grep vpu
    
  • If vpuenc_h264 is available, use it as in the official example:
    GStreamer Hardware Encoding Example
  • If it is not available, or the pipeline still falls back to software, then the current Torizon OS version may not support the hardware VPU on the iMX8DX, and performance will be CPU-bound.
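
The availability check above can also be scripted. The sketch below relies on gst-inspect-1.0 --exists, which exits with status 0 when the named element is installed. pick_encoder and the GST_INSPECT override variable are hypothetical names introduced here for illustration, and the candidate list in the usage comment is an assumption.

```shell
#!/bin/sh
# pick_encoder: hypothetical helper that prints the first element name
# from its arguments that gst-inspect-1.0 reports as installed.
# GST_INSPECT (an assumption, used for testing) overrides the command.
pick_encoder() {
    inspect="${GST_INSPECT:-gst-inspect-1.0}"
    for element in "$@"; do
        # --exists prints nothing and only sets the exit status.
        if "$inspect" --exists "$element" >/dev/null 2>&1; then
            echo "$element"
            return 0
        fi
    done
    return 1
}

# Example call (candidate order is an assumption):
#   pick_encoder vpuenc_h264 v4l2h264enc || echo "no hardware H.264 encoder found"
```

If neither candidate is found, the pipeline can only encode in software and the CPU load described earlier is expected.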

If you need help interpreting your pipeline, please paste the output of gst-inspect-1.0 | grep vpu and any error logs from a pipeline invocation using vpuenc_h264, if available.

Best regards,
Toradex AI Assistant.

Greetings @AhaUhu,

“I based this setup on the iMX8 VPU example, although it seems that this example may not apply to the iMX8DX specifically.”

I assume you mean this example: torizon-samples/vpu/generic/vpuGeneric at bookworm-new · toradex/torizon-samples · GitHub

If that is correct, then this example was made only for the VPUs on the Verdin i.MX8M Mini, Verdin i.MX8M Plus, and Apalis i.MX8. No work was done on this example for the Colibri i.MX8X.

“Is there VPU support for the iMX8DX in Torizon 7?”

At the moment, there is no official VPU support for any SoM on Torizon OS 7.x. We do plan to add it, and I can bring your case to our team to try to get this prioritized sooner.

Do you have a rough timeline of when you would need this capability for your product?

Best Regards,
Jeremias

Hello @jeremias.tx ,

Apologies for the late reply, I’ve been quite busy recently.

Yes, the example I referred to is exactly the one from the torizon-samples/vpu/generic/vpuGeneric repository. I’d like to check if there is any similar example or recommended approach that works with the Colibri i.MX8DX running Torizon OS 6, since that’s the platform I’m currently working with.

Additionally, I recently acquired an i.MX6DualLite and would like to experiment with the same video capabilities on that platform as well.

For context, I have two main goals for my projects:

  • One is to capture MJPEG video from a USB camera and run some real-time analysis on the frames. This can be done on either the i.MX8DX or the i.MX6DL, depending on which performs better or is easier to implement.
  • The second is to perform real-time video overlay (OSD) and then encode the output stream — this specifically needs to be done on the i.MX8DX.

Given that, I need to have the encode and decode functionality working within approximately one month. Considering this timeline, what would you recommend:

  • Stick with Torizon OS 6,
  • Wait for VPU support in Torizon OS 7,
  • Or prepare a custom Yocto-based distribution with the required multimedia support?

I really appreciate your advice on this. Thank you very much!

Best regards

“Given that, I need to have the encode and decode functionality working within approximately one month.”

It’s not likely we can prioritize and schedule the work for the VPU in Torizon OS within the next month.

You can try to prepare a custom Yocto-based distribution. A build based on our multimedia reference image should have the basic libraries: Build a Reference Image with Yocto Project/OpenEmbedded | Toradex Developer Center

Any other packages and libraries your application requires will have to be added on top of that. If you’re not experienced with Yocto, this can be a bit of a learning curve.

Best Regards,
Jeremias

Hi Jeremias,

Thank you for the feedback.

After several tests, I found out that the issue was actually related to the camera itself. I was initially using a USB camera with an integrated microphone and autofocus (not sure if those features are relevant to the problem). After switching to another camera, I was able to achieve the required frame rate without any problems — decoding MJPEG at 60 FPS, 720p resolution.

Additionally, hardware-accelerated MJPEG decoding worked perfectly using v4l2jpegdec. With a monochrome camera I tested, I even reached 120 FPS at the same resolution.

Regarding the previous camera, I noticed several QoS warnings in the logs. Regardless of the video sink used, the frame rate was limited to around 8 FPS. When using autovideosink with sync=false, I could get up to 16 FPS. However, using applications like Cheese on Linux or the Camera app on Windows, I had no problem achieving the expected 30 FPS.

Do you have any suggestions regarding this? It seems to be a limitation or configuration issue in GStreamer, as I could reproduce the exact same behavior on my notebook — both with GStreamer and FFplay (16 FPS with sync=false, and 8 FPS otherwise).

I’m currently facing some performance issues when converting video formats. Using videoconvert or autovideoconvert leads to significant performance drops. While researching, I came across the i.MX8 GStreamer User Guide and found that the plugin imxvideoconvert_g2d should provide hardware-accelerated color space conversion.

However, when I try to use it, I get the following error:

g2d_open: 2D/VG PIPE not found!
g2d_open: fail with status -13

I tried modifying the Dockerfile I previously shared to manually download and build the following components:

  • imx-parser

  • imx-vpu-hantro

  • imx-vpu-hantro-vc (not entirely sure if this one is appropriate, seems more related to i.MX8MP)

  • imx-vpuwrap

  • imx-gst1.0-plugin

# imx-codec
RUN wget -q https://www.nxp.com/lgfiles/NMG/MAD/YOCTO/imx-codec-4.7.2.bin \
    && chmod +x imx-codec-4.7.2.bin \
    && ./imx-codec-4.7.2.bin --auto-accept \
    && cd imx-codec-4.7.2 \
    && ./configure --enable-armv8 --disable-vpu --prefix=/usr \
        --libdir=/usr/lib/aarch64-linux-gnu --includedir=/usr/include \
    && make -j$(nproc) \
    && make install DESTDIR=/vpusysroot \
    && rm -r /vpusysroot/usr/lib/aarch64-linux-gnu/imx-mm/video-codec \
    && mv /vpusysroot/usr/lib/aarch64-linux-gnu/imx-mm/audio-codec/lib* \
        /vpusysroot/usr/lib/aarch64-linux-gnu/ \
    && cp -r /vpusysroot/* / \
    && cd .. && rm -rf imx-codec-4.7.2.bin imx-codec-4.7.2

# imx-parser
RUN wget -q https://www.nxp.com/lgfiles/NMG/MAD/YOCTO/imx-parser-4.7.2.bin \
    && chmod +x imx-parser-4.7.2.bin \
    && ./imx-parser-4.7.2.bin --auto-accept \
    && cd imx-parser-4.7.2 \
    && ./configure --enable-armv8 --prefix=/usr \
        --libdir=/usr/lib/aarch64-linux-gnu --includedir=/usr/include \
    && make -j$(nproc) \
    && make install DESTDIR=/vpusysroot \
    && cp -r /vpusysroot/* / \
    && cd .. && rm -rf imx-parser-4.7.2.bin imx-parser-4.7.2

# imx-vpu-hantro and imx-vpu-hantro-vc
RUN wget https://www.nxp.com/lgfiles/NMG/MAD/YOCTO/imx-vpu-hantro-1.27.0.bin && \
    chmod +x imx-vpu-hantro-1.27.0.bin && \
    ./imx-vpu-hantro-1.27.0.bin --auto-accept && \
    cd imx-vpu-hantro-1.27.0 && \
    make -j$(nproc) PLATFORM=IMX8QXP all && \
    libdir=/usr/lib/aarch64-linux-gnu/ make DEST_DIR=/vpusysroot PLATFORM=IMX8QXP install && \
    cd .. && \
    cp -r /vpusysroot/* / && \
    rm -rf imx-vpu-hantro-1.27.0.bin imx-vpu-hantro-1.27.0 /vpusysroot && \
    \
    wget https://www.nxp.com/lgfiles/NMG/MAD/YOCTO/imx-vpu-hantro-vc-1.9.0.bin && \
    chmod +x imx-vpu-hantro-vc-1.9.0.bin && \
    ./imx-vpu-hantro-vc-1.9.0.bin --auto-accept && \
    mkdir -p /vpusysroot/usr/include/ /vpusysroot/usr/lib/aarch64-linux-gnu/ && \
    cp -r imx-vpu-hantro-vc-1.9.0/usr/include/* /vpusysroot/usr/include/ && \
    cp -r imx-vpu-hantro-vc-1.9.0/usr/lib/* /vpusysroot/usr/lib/aarch64-linux-gnu/ && \
    cp -r /vpusysroot/* / && \
    rm -rf imx-vpu-hantro-vc-1.9.0.bin imx-vpu-hantro-vc-1.9.0 /vpusysroot

# imx-vpuwrap
RUN git clone https://github.com/nxp-imx/imx-vpuwrap.git && \
    cd imx-vpuwrap && \
    git checkout MM_04.07.02_2210_L5.15.y && \
    autoreconf -Wcross --verbose --install --force && \
    ./configure --prefix=/usr --libdir=/usr/lib/aarch64-linux-gnu --disable-static && \
    make -j$(nproc) && \
    make install DESTDIR=/vpusysroot && \
    cp -r /vpusysroot/* / && \
    rm -rf /vpusysroot/usr/share && \
    cd .. && \
    rm -rf imx-vpuwrap /vpusysroot
    


# imx-gst1.0-plugin
RUN git clone https://github.com/nxp-imx/imx-gst1.0-plugin.git \
        -b MM_04.07.02_2210_L5.15.y \
    && cd imx-gst1.0-plugin \
    && meson -Dplatform=MX8 --prefix=/usr --libdir=/usr/lib/aarch64-linux-gnu \
        -Dc_args="-I/usr/include -I/usr/include/imx"  build \
    && ninja -v -j$(nproc) -C build \
    && DESTDIR=/vpusysroot ninja -v -j$(nproc) -C build install \
    && cp -r /vpusysroot/* / \
    && cd .. && rm -rf imx-gst1.0-plugin

I attempted different approaches, including forcing PLATFORM=IMX8QXP and testing variations with or without the -vc variant of the VPU drivers. Unfortunately, I couldn’t get the G2D-based plugins (imxvideoconvert_g2d, imxcompositor_g2d) to work.

Interestingly, I had access to another board based on the i.MX8DX with the latest Yocto BSP and the Reference Multimedia Image. On this device, the G2D-based plugins worked flawlessly — both imxvideoconvert and imxcompositor load and function without issues. That leads me to suspect it could be a kernel module, driver, or device node issue.

I’m also able to access the GPU through OpenGL, so it doesn’t seem to be a general GPU problem.

Here’s the relevant part of my docker-compose.yml setup, in case it’s related to permissions or device access:

version: '3.8'
services:
  hud:
    image: luccasparentex/hud:v1_t6_modified
    container_name: hud
    privileged: true
    network_mode: host
    cap_add:
      - SYS_TTY_CONFIG
    volumes:
      - /dev:/dev
      - /tmp:/tmp
      - /run/udev/:/run/udev/
    device_cgroup_rules:
      - 'c 4:* rmw'
      - 'c 253:* rmw'
      - 'c 13:* rmw'
      - 'c 226:* rmw'
      - 'c 10:223 rmw'
      - 'c 199:0 rmw'
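
For reference, the major numbers used in device_cgroup_rules can be read from the device nodes themselves. The sketch below assumes a stat that supports -c '%t' (GNU coreutils and BusyBox both do), and cgroup_rule_for is a hypothetical helper name. Note also that a container started with privileged: true is already granted access to all host devices, so a missing rule is probably not what blocks G2D in this setup.

```shell
#!/bin/sh
# cgroup_rule_for: hypothetical helper that prints a device_cgroup_rules
# entry covering every minor of the given character device's major number.
cgroup_rule_for() {
    node="$1"
    if [ ! -c "$node" ]; then
        echo "not a character device: $node" >&2
        return 1
    fi
    # %t is the device major number in hexadecimal; -L follows symlinks.
    major_hex="$(stat -L -c '%t' "$node")"
    echo "c $((0x$major_hex)):* rmw"
}

# Example call (the node name is an assumption):
#   cgroup_rule_for /dev/video4
```

Comparing the printed rules for the nodes the working Yocto image uses against the list above shows whether any major number is missing.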

Do you have any suggestions on what I could try? If you can share any reference on how the video acceleration components were set up in the original Dockerfile, I could check if there’s something similar for the NXP i.MX8DX. I’m not sure if this is a kernel, driver, or configuration issue. I’ve also tried running with sudo, but perhaps something like a missing cgroup_rule or permission is still blocking it.

Any suggestions or insights would be greatly appreciated.

Best regards

“Do you have any suggestions regarding this? It seems to be a limitation or configuration issue in GStreamer, as I could reproduce the exact same behavior on my notebook, both with GStreamer and FFplay (16 FPS with sync=false, and 8 FPS otherwise).”

Hmm, if it’s a limitation in GStreamer itself, then we’re not aware of it. This would most likely require a more in-depth investigation.
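
As a possible starting point for such an investigation, the camera's delivery rate can be measured with nothing downstream, which separates it from decoding and display. This is a sketch under assumptions: fps_probe_cmd is a hypothetical helper that only builds the command line, and it assumes fpsdisplaysink from gst-plugins-bad is installed. With -v, fpsdisplaysink reports the measured rate through its last-message property.

```shell
#!/bin/sh
# fps_probe_cmd: hypothetical helper that prints a gst-launch-1.0 command
# measuring how fast v4l2src delivers MJPEG frames, without decoding them.
# Arguments: device, width, height, frames per second.
fps_probe_cmd() {
    dev="$1"; width="$2"; height="$3"; fps="$4"
    # fakesink accepts the JPEG buffers as-is; sync=false avoids clock throttling.
    printf 'gst-launch-1.0 -v v4l2src device=%s ! image/jpeg,width=%s,height=%s,framerate=%s/1 ! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false\n' \
        "$dev" "$width" "$height" "$fps"
}

# Example: print the command, then run it by hand or with eval:
#   fps_probe_cmd /dev/video4 1280 720 30
```

If the reported rate is already below 30 fps here, the limit sits in the camera or its V4L2 settings rather than in the decode and convert stages.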

“On this device, the G2D-based plugins worked flawlessly: both imxvideoconvert and imxcompositor load and function without issues. That leads me to suspect it could be a kernel module, driver, or device node issue.”

Since this works on the reference image, I’m going to assume it’s an issue with the VPU container itself. The VPU container example was only a proof-of-concept. It wasn’t really meant to be a fully tested and validated solution.

“If you can share any reference on how the video acceleration components were set up in the original Dockerfile”

The only real reference for this proof-of-concept is the Dockerfile itself: torizon-samples/vpu/generic/vpuGeneric/Dockerfile at bookworm-new · toradex/torizon-samples · GitHub

In the Dockerfile you can see that we fetch the various VPU libraries from NXP sources. Again, though, this was done only for the three modules mentioned earlier. The Colibri i.MX8X wasn’t one of them, so some aspects will probably differ.

Best Regards,
Jeremias