TorizonCore IMX6Q Apalis video decode with hardware acceleration

Hello.

My question is: is it possible to play a video stream with hardware acceleration on an Apalis iMX6Q 1GB module running TorizonCore? Does it support using the VPU and GPU for video playback?
The best option for us would be to play the video stream in a browser, but I tried Chromium and couldn’t find a video format that played well: the CPU usage was always high and the playback performance was poor. I run the Chromium container as you describe here:
Web Browser / Kiosk Mode with TorizonCore | Toradex Developer Center

And I tried videos from here:
https://test-videos.co.uk/.

I also tried GStreamer, following How to use Gstreamer on TorizonCore | Toradex Developer Center. On the Reference Multimedia Image I can run this pipeline:

gst-launch-1.0 udpsrc port=5000 ! tsdemux ! queue ! "video/x-h264" ! vpudec ! imxg2dvideosink force-aspect-ratio=false

This works very well on the Reference Multimedia Image, but the same pipeline doesn’t work on TorizonCore; it fails with this error:
WARNING: erroneous pipeline: no element "vpudec".

What is the difference between the settings used in the description above and those used in the Reference Multimedia image?

I also tried this RTP pipeline:
gst-launch-1.0 -v udpsrc port=5000 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264, payload=(int)96" ! rtph264depay ! decodebin ! videoconvert ! autovideosink sync=false

but the decoding does not seem to be hardware-based here, because the CPU usage is high.
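For comparison, here is a minimal sketch of the same RTP receiver with the decoder element named explicitly instead of left to decodebin, which makes it obvious whether a hardware decoder is in use. The element name v4l2h264dec is an assumption for the mainline kernel: it is the name GStreamer’s video4linux2 plugin typically gives a V4L2 H.264 decoder such as CODA, and it only appears if the container can open /dev/video*. The helper build_rtp_h264_pipeline is hypothetical; it prints one gst-launch-1.0 argument per line so the caps string needs no extra shell quoting.

```shell
# Hypothetical helper: print the gst-launch-1.0 arguments for an RTP/H.264
# receiver, one per line. No argument contains spaces, so the caps string
# needs no extra quoting when the lines are passed back to gst-launch-1.0.
build_rtp_h264_pipeline() {
  local port="${1:-5000}"            # UDP port the sender streams to
  local decoder="${2:-v4l2h264dec}"  # assumed name of the mainline (CODA) decoder element
  printf '%s\n' \
    udpsrc "port=${port}" \
    "caps=application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96" \
    '!' rtph264depay \
    '!' h264parse \
    '!' "${decoder}" \
    '!' waylandsink sync=false
}

# Usage in bash, inside the GStreamer container with Weston running:
#   mapfile -t args < <(build_rtp_h264_pipeline 5000 v4l2h264dec)
#   gst-launch-1.0 "${args[@]}"
```

h264parse is placed before the decoder so the depayloaded stream is converted to whatever stream format the decoder asks for.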

We would like to play a network video stream at a minimum of 1280x720 and 25 fps. The video format is not fixed; we can use whatever the hardware supports.
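To test such a stream without a real camera, a sender along these lines could run on a development PC. This is a sketch, assuming the PC has gstreamer1.0-plugins-ugly (for x264enc) installed; the function name send_test_stream and the example IP address are hypothetical, while port 5000 and payload type 96 match the RTP receiver caps used earlier in this post.

```shell
# Hypothetical test sender for a development PC (not the Apalis module):
# streams a live 1280x720 @ 25 fps test pattern as RTP/H.264 over UDP.
send_test_stream() {
  local host="$1"          # IP address of the Apalis module
  local port="${2:-5000}"  # must match udpsrc port= on the receiver
  gst-launch-1.0 -v \
    videotestsrc is-live=true \
    ! video/x-raw,width=1280,height=720,framerate=25/1 \
    ! videoconvert \
    ! x264enc tune=zerolatency key-int-max=25 \
    ! rtph264pay config-interval=1 pt=96 \
    ! udpsink host="${host}" port="${port}"
}

# Usage (example address, replace with your module's IP):
#   send_test_stream 192.168.1.50 5000
```

key-int-max=25 requests a keyframe about once per second and config-interval=1 resends SPS/PPS every second, so a receiver started after the sender can begin decoding quickly.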
Environment:
TorizonCore: torizon-core-docker-apalis-imx6-Tezi_5.7.2+build.20
Module: Apalis iMX6Q 1GB V1.1B

Thank you for your help.

Greetings @vigh.balazs,

I was able to reproduce more or less the same behavior you describe. Allow me to check with our team internally and clarify what should and should not work in our containers with regard to hardware acceleration. I’ll get back to you once I have more information to share.

Best Regards,
Jeremias

Thank you for your answer.
I found a note on the TorizonCore Issue Tracker page for the 6.4.0-devel-202309 release: “Note that Apalis/Colibri iMX6 already have upstream support for VPU acceleration”. Is this feature also in version 5.7.2?
I read that I may need to install a GStreamer i.MX plugin. If so, which plugins should I use, and how can I install them in the Dockerfile? I added “gstreamer1.0-imx” to the package list of the GStreamer example Dockerfile and it builds successfully, but it has no effect: there is still no vpudec element.
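Instead of adding packages and hoping an element appears, you can ask gst-inspect-1.0 what the container actually provides. A minimal sketch, assuming gstreamer1.0-plugins-good (which contains the video4linux2 plugin) is installed; list_v4l2_decoders is a hypothetical helper that filters the plugin listing down to decoder element names.

```shell
# Hypothetical filter: read gst-inspect-1.0 output on stdin and print the
# V4L2 decoder element names it mentions (e.g. v4l2h264dec), one per line.
list_v4l2_decoders() {
  grep -oE 'v4l2[[:alnum:]]*dec' | sort -u
}

# Usage inside the GStreamer container. The v4l2*dec elements are only
# registered when the container can open the /dev/video* decoder nodes.
#   gst-inspect-1.0 video4linux2 | list_v4l2_decoders
```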

Thank you for your help.
Best Regards,
Balázs Vígh

I think I know what the issue is now. First of all, video decode hardware acceleration is available. The difference is that TorizonCore uses the mainline kernel, so the mechanisms and details behind hardware acceleration differ a bit. When you used the multimedia image here:

And this works very well on the Reference Multimedia Image.

That image uses a downstream kernel close to what NXP provides. That said, we also have a “Reference Multimedia Image (upstream)” that uses the upstream kernel, like TorizonCore. If you try your pipeline on that image, you’ll get the same error:

WARNING: erroneous pipeline: no element "vpudec".
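If the same scripts need to run on both kinds of images, one option is to probe for a decoder instead of hard-coding vpudec. A sketch under stated assumptions: vpudec is the downstream element from the pipeline above, v4l2h264dec is assumed to be the mainline CODA decoder’s element name, and avdec_h264 (from gstreamer1.0-libav) is the software fallback. The helpers has_element and pick_h264_decoder are hypothetical.

```shell
# Hypothetical helper: succeed if GStreamer knows the given element.
# gst-inspect-1.0 exits with a non-zero status when the element does not exist.
has_element() {
  gst-inspect-1.0 "$1" >/dev/null 2>&1
}

# Pick an H.264 decoder in order of preference: vpudec (downstream kernel),
# v4l2h264dec (assumed mainline CODA element), avdec_h264 (software).
pick_h264_decoder() {
  for candidate in vpudec v4l2h264dec avdec_h264; do
    if has_element "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no H.264 decoder found" >&2
  return 1
}

# Usage: dec="$(pick_h264_decoder)" && echo "using $dec"
```

Ending up with avdec_h264 is a strong hint that the container cannot reach the hardware decoder.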

To make use of hardware video decoding, you need to do the following:

  • Create a container image with the various GStreamer packages, as documented before.
  • Don’t forget to run a separate Weston container to provide a graphical compositor for your videos to play on.
  • Unlike in the documentation, you need some extra arguments when you run this GStreamer container. I used:
docker run -d --rm --name=weston-gstreamer --net=host --cap-add CAP_SYS_TTY_CONFIG \
             -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
             --device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 226:* rmw' --device-cgroup-rule='c 81:* rmw' \
             <name of your gstreamer container>
  • The important arguments here are the cgroup rules. They give the container access to the devices it needs; in particular, the c 81 rule covers the V4L2 device nodes (/dev/video*) through which the mainline kernel exposes the CODA video encoder and decoder. If you omit these rules, video decoding and encoding fall back to the CPU, which results in poor performance and high CPU usage.
  • At this point I tested the video decoding by downloading and playing a video file:
root@apalis-imx6-05228985:~# wget http://linode.boundarydevices.com/videos/trailer_1080p_h264_mp3.avi
--2023-10-11 20:44:02--  http://linode.boundarydevices.com/videos/trailer_1080p_h264_mp3.avi
Resolving linode.boundarydevices.com (linode.boundarydevices.com)... 173.255.200.20
Connecting to linode.boundarydevices.com (linode.boundarydevices.com)|173.255.200.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10527382 (10M) [video/x-msvideo]
Saving to: ‘trailer_1080p_h264_mp3.avi’

trailer_1080p_h264_mp3.avi              100%[============================================================================>]  10.04M  13.9MB/s    in 0.7s

2023-10-11 20:44:03 (13.9 MB/s) - ‘trailer_1080p_h264_mp3.avi’ saved [10527382/10527382]

root@apalis-imx6-05228985:~# gst-launch-1.0 filesrc location=trailer_1080p_h264_mp3.avi ! avidemux ! decodebin ! waylandsink
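The four --device-cgroup-rule options above map to standard Linux character-device major numbers. The sketch below, with hypothetical helper names, documents what each major covers and checks which major the /dev/video* nodes use on your module, so you can confirm that the c 81 rule is the one that exposes the CODA devices. It assumes a stat from coreutils or BusyBox that supports the %t format (major number in hex).

```shell
# What each major number in the --device-cgroup-rule options covers
# (standard Linux character-device allocations).
describe_major() {
  case "$1" in
    4)   echo "tty" ;;
    13)  echo "input" ;;
    81)  echo "video4linux (/dev/video*, incl. the CODA decoder/encoder)" ;;
    226) echo "DRM (/dev/dri/*)" ;;
    *)   echo "not covered by the rules above" ;;
  esac
}

# Print the decimal major number of every V4L2 node and which rule covers it.
check_video_nodes() {
  for dev in /dev/video*; do
    [ -c "$dev" ] || continue            # skip if the glob matched nothing
    major=$(( 0x$(stat -c '%t' "$dev") ))
    echo "$dev: major $major -> $(describe_major "$major")"
  done
}

# Usage on the TorizonCore host or inside the container: check_video_nodes
```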

In this test the video played with acceptable performance and modest CPU usage. Note that in the GStreamer pipeline I simply used decodebin to handle the selection of the decoding elements, since vpudec does not exist in mainline.
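To confirm which decoder decodebin actually selected, rather than inferring it from CPU load, you can run the same pipeline with gst-launch-1.0 -v and look for the decoder’s name in the verbose caps output. A minimal sketch: first_decoder_in_log is a hypothetical filter, and the names it matches (vpudec, v4l2*h264dec, avdec_h264) are assumptions about what may be plugged on the different images.

```shell
# Hypothetical filter: read gst-launch-1.0 -v output on stdin and print the
# first H.264 decoder element name found (e.g. v4l2h264dec0 or avdec_h264).
first_decoder_in_log() {
  grep -m 1 -oE '(vpudec|v4l2[[:alnum:]]*h264dec|avdec_h264)[0-9]*' | head -n 1
}

# Usage inside the GStreamer container (stop with Ctrl+C after a name appears):
#   gst-launch-1.0 -v filesrc location=trailer_1080p_h264_mp3.avi ! avidemux \
#     ! decodebin ! waylandsink 2>&1 | first_decoder_in_log
```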

As a side note, depending on your use case you may also need to adjust the default CMA value as described here: Contiguous Memory Allocator - CMA (Linux) | Toradex Developer Center

Otherwise you may not have enough memory to run your GStreamer pipelines. I had to do this in my test to play the test video.
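Besides dmesg, the CMA pool can be checked at runtime through the CmaTotal and CmaFree fields of /proc/meminfo (present when the kernel is built with CMA support). A small sketch with a hypothetical helper name that converts those kB values to MiB:

```shell
# Hypothetical helper: print a CMA field from a meminfo-style file in MiB.
# CmaTotal/CmaFree are reported in kB; the file defaults to /proc/meminfo.
cma_mib() {
  local field="${1:-CmaTotal}" file="${2:-/proc/meminfo}"
  awk -v f="${field}:" '$1 == f { printf "%d\n", $2 / 1024 }' "$file"
}

# Usage on the module, e.g. after setting cma=320MB on the kernel command line:
#   cma_mib CmaTotal   # size of the reserved pool
#   cma_mib CmaFree    # what is left while gst-launch-1.0 is running
```

Watching CmaFree while the pipeline runs shows how much headroom the decoder buffers leave.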

Best Regards,
Jeremias

Thank you for your reply.
I got a lot of useful information.
I tried your example and it works fine on TorizonCore 6.4.0, but on 5.7.2 it only works with high CPU usage. Which version did you test this video playback on?
So far I have used version 5.7.2; would it be more appropriate to use version 6.4.0?

Best Regards,
Balázs Vígh

Actually I explicitly did this test on 5.7.2. I didn’t even try it on 6.4.0.

I just double-checked on 5.7.2 and it still works as I documented. For reference, when the VPU is in use, the gst-launch-1.0 process uses about 10%-20% CPU according to top. If instead I force software decoding on the CPU (by omitting the cgroup rules from the docker run command), the same process uses 300%+ CPU, and video playback is drastically worse.
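For a number that is easy to compare between setups, docker stats can sample the container’s CPU usage from the TorizonCore host. A sketch with hypothetical helper names, assuming the container is named weston-gstreamer as in the docker run command earlier in this thread. Like top, docker stats reports percentages relative to one core, so on the quad-core i.MX6Q values above 100% are possible.

```shell
# Hypothetical helper: turn a docker stats CPU value such as "312.45%"
# into its integer part ("312").
cpu_percent_int() {
  echo "$1" | sed 's/%$//; s/\..*$//'   # drop the "%" and the fractional part
}

# Sample the CPU usage of the GStreamer container once, from the host.
sample_container_cpu() {
  local name="${1:-weston-gstreamer}"
  cpu_percent_int "$(docker stats --no-stream --format '{{.CPUPerc}}' "$name")"
}

# Usage: sample_container_cpu weston-gstreamer
```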

Are you sure you followed every step? Or what exactly are you defining as “high cpu usage”?

Best Regards,
Jeremias

Thank you for your reply. I think I am doing something wrong.

I tried two methods to run the wget and gst-launch-1.0 commands. The first was from inside the GStreamer container, but I may have done something differently, because in your example the prompt was “~#” while in my container it was “/#”. How or where did you run these commands?
This is what I ran in the container:

root@apalis-imx6-10678703:/# wget http://linode.boundarydevices.com/videos/trailer_1080p_h264_mp3.avi
--2023-10-17 11:40:55--  http://linode.boundarydevices.com/videos/trailer_1080p_h264_mp3.avi
Resolving linode.boundarydevices.com (linode.boundarydevices.com)... 173.255.200.20
Connecting to linode.boundarydevices.com (linode.boundarydevices.com)|173.255.200.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10527382 (10M) [video/x-msvideo]
Saving to: ‘trailer_1080p_h264_mp3.avi’

trailer_1080p_h264_mp3.avi     100%[==================================================>]  10.04M  3.74MB/s    in 2.7s

2023-10-17 11:40:58 (3.74 MB/s) - ‘trailer_1080p_h264_mp3.avi’ saved [10527382/10527382]

root@apalis-imx6-10678703:/# gst-launch-1.0 filesrc location=trailer_1080p_h264_mp3.avi ! avidemux ! decodebin ! waylandsink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Redistribute latency...
Redistribute latency...
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Redistribute latency...
WARNING: from element /GstPipeline:pipeline0/GstWaylandSink:waylandsink0: A lot of buffers are being dropped.
Additional debug info:
../libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstWaylandSink:waylandsink0:
There may be a timestamping problem, or this computer is too slow.

For the second method, I appended the gst-launch-1.0 command to the docker run arguments:

torizon@apalis-imx6-10678703:~$ wget -P /tmp http://linode.boundarydevices.com/videos/trailer_1080p_h264_mp3.avi
Connecting to linode.boundarydevices.com (173.255.200.20:80)
saving to '/tmp/trailer_1080p_h264_mp3.avi'
trailer_1080p_h264_m 100% |*****************************************************************| 10.0M  0:00:00 ETA
'/tmp/trailer_1080p_h264_mp3.avi' saved

torizon@apalis-imx6-10678703:~$ docker run -d --rm --name=weston-gstreamer --net=host --cap-add CAP_SYS_TTY_CONFIG \
    -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
    --device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' \
    --device-cgroup-rule='c 226:* rmw' --device-cgroup-rule='c 81:* rmw' \
    own_user/gst_example \
    gst-launch-1.0 filesrc location=/tmp/trailer_1080p_h264_mp3.avi ! avidemux ! decodebin ! waylandsink

I increased the CMA to 320MB:

torizon@apalis-imx6-10678703:~$ dmesg | grep cma
[    0.000000] cma: Reserved 320 MiB at 0x3c000000
[    0.000000] Kernel command line: enable_wait_mode=off vmalloc=400M root=LABEL=otaroot rootfstype=ext4 quiet logo.nologo vt.global_cursor_default=0 plymouth.ignore-serial-consoles splash fbcon=map:3 ostree=/ostree/boot.1/torizon/80b84d3908d9a18e712de5f31c58a5db3f8d2647b72698051cee740e1554abae/0 cma=320MB
[    0.000000] Memory: 686788K/1048576K available (8192K kernel code, 900K rwdata, 4196K rodata, 1024K init, 435K bss, 34108K reserved, 327680K cma-reserved, 90112K highmem)

I got the same result with both methods on TorizonCore 5.7.2: the CPU usage was high, over 300%.
The Weston container I used:
docker run -d --rm --name=weston --net=host --cap-add CAP_SYS_TTY_CONFIG \
    -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
    --device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' \
    --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
    torizon/weston:$CT_TAG_WESTON --developer weston-launch --tty=/dev/tty7 --user=torizon

The Dockerfile of the GStreamer container:

ARG BASE_NAME=wayland-base
ARG IMAGE_ARCH=linux/arm/v7
ARG IMAGE_TAG=3
ARG DOCKER_REGISTRY=torizon

FROM --platform=$IMAGE_ARCH $DOCKER_REGISTRY/$BASE_NAME:$IMAGE_TAG
ARG IMAGE_ARCH
 
RUN apt-get -y update && apt-get install -y --no-install-recommends \
	libgstreamer1.0-0 \
	gstreamer1.0-plugins-base \
	gstreamer1.0-plugins-good \
	gstreamer1.0-plugins-bad \
	gstreamer1.0-plugins-ugly \
	gstreamer1.0-libav \
	gstreamer1.0-tools \
	gstreamer1.0-x \
	gstreamer1.0-alsa \
	gstreamer1.0-gl \
	gstreamer1.0-gtk3 \
	gstreamer1.0-pulseaudio \
	v4l-utils \
	&& if [ "${IMAGE_ARCH}" = "linux/arm64/v8" ]; then \
		apt-get install -y --no-install-recommends \
		gstreamer1.0-qt5; fi \
	&& apt-get clean && apt-get autoremove && rm -rf /var/lib/apt/lists/*

On TorizonCore 6.4.0 both methods work well with your solution: it uses about 10% CPU with the cgroup rules and over 300% CPU without them.

Unfortunately, I don’t see where the problem might be. Please let me know if you can tell what I’m doing wrong.

Best Regards,
Balázs Vígh

Oh, I think I see the issue. In your Dockerfile, IMAGE_TAG is 3, which corresponds to our Debian bookworm containers. Our bookworm containers are meant to be used with TorizonCore 6.

The GStreamer packages in bookworm probably aren’t compatible with TorizonCore 5 in terms of hardware acceleration, which is why in your case it’s falling back to CPU decoding.

When you’re referencing our examples, make sure they match the version of TorizonCore you are using. For TorizonCore 5.x you want Debian bullseye containers. Our samples repository has a branch for bullseye: https://github.com/toradex/torizon-samples/blob/bullseye/gstreamer/bash/simple-pipeline/Dockerfile
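Since the Dockerfile declares IMAGE_TAG as a build argument, one way to avoid this mismatch is to choose the tag from the TorizonCore version at build time. A sketch under assumptions: 3 for bookworm/TorizonCore 6 follows the explanation above, while 2 for bullseye/TorizonCore 5 is an assumption to confirm against the linked bullseye Dockerfile. The helper names and the image name are hypothetical. Changing only the base tag may not be enough if the package list differs between branches, so the bullseye Dockerfile remains the reference.

```shell
# Hypothetical helper: map a TorizonCore version to the container tag major
# passed as IMAGE_TAG (assumed: 2 = Debian bullseye, 3 = Debian bookworm).
container_tag_for() {
  case "$1" in
    5|5.*) echo 2 ;;   # TorizonCore 5.x -> bullseye containers
    6|6.*) echo 3 ;;   # TorizonCore 6.x -> bookworm containers
    *) echo "unsupported TorizonCore version: $1" >&2; return 1 ;;
  esac
}

# Build the GStreamer image against the matching base by overriding the
# IMAGE_TAG build argument declared at the top of the Dockerfile.
build_gst_image() {
  local tc_version="$1" image="${2:-own_user/gst_example}" tag
  tag="$(container_tag_for "$tc_version")" || return 1
  docker build --build-arg IMAGE_TAG="$tag" -t "$image" .
}

# Usage, from the directory containing the Dockerfile: build_gst_image 5.7.2
```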

This is what I was referencing when I did my test.

Best Regards,
Jeremias

You’re right. That was the problem. Thanks for the help.

Glad I was able to help!