Video processing pipeline for i.MX8

alexxs · June 24, 2022, 12:17pm

Hi,

I’m currently using an i.MX8 QuadMax Apalis module for a product we’re considering. Have done a basic proof-of-concept with the following processing pipeline.

OpenCV captures video from the USB video camera using v4l2
A simple text overlay is added on the video
The resulting image is piped as a series of images to stdout and then into ffmpeg
Ffmpeg encodes it and streams it to YouTube

./apalis-opencv-test | ffmpeg -thread_queue_size 1024 -i pipe: -f lavfi -i anullsrc -f flv -c:v libx264 -c:a aac -preset ultrafast rtmp://a.rtmp.youtube.com/live2

Now, my main issue is that it’s slow. The pipeline above only gets ~18FPS, but I can’t determine which bit is causing the slowdown and whether or not it’s using the VPU to encode.
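One way to find the slow part is to time each step of the capture loop separately. Below is a minimal sketch of that idea in Python. The stage names and the `time.sleep` stand-ins are placeholders (the internals of apalis-opencv-test are not shown in this thread), so treat it as an illustration of the measurement rather than the actual app.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.total = defaultdict(float)  # seconds spent in each stage
        self.count = defaultdict(int)    # number of timed runs per stage

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total[name] += time.perf_counter() - start
            self.count[name] += 1

    def mean_ms(self, name):
        """Average time per run of a stage, in milliseconds."""
        return 1000.0 * self.total[name] / self.count[name]

    def slowest(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.total, key=self.total.get)


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        with timer.stage("capture"):
            time.sleep(0.002)  # placeholder for reading a frame
        with timer.stage("overlay"):
            time.sleep(0.001)  # placeholder for drawing the text
        with timer.stage("write"):
            time.sleep(0.010)  # placeholder for writing the frame to stdout
    for name in timer.total:
        print(f"{name}: {timer.mean_ms(name):.1f} ms/frame")
    print("slowest stage:", timer.slowest())
```

If the write-to-stdout stage dominates, the pipe into ffmpeg (or ffmpeg's software x264 encode behind it) is the likely limit rather than the capture.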

Thanks!

gclaudino.tx · June 27, 2022, 12:19pm

Dear @alexxs,

Welcome back to our community. Could you please share more information about your setup?

Which image and version are you using? Did you do any modifications to the base image packages/kernel configuration?
Do you intend to use stdout only, or have you looked at other solutions such as OpenCV VideoWriter? OpenCV: cv::VideoWriter Class Reference. You could arrive at a pipeline that looks something like: appsrc ! video/x-raw,format=BGR ! videoconvert ! v4l2h264enc ! flvmux streamable=true ! queue ! rtmpsink location='rtmp://<ip_address>/live live=true'. You may need to add rtmpsink to a custom image.
Do you need to use ffmpeg, or could you consider GStreamer (Video Encoding and Playback With GStreamer (Linux) | Toradex Developer Center) for your end application?
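As a rough sketch of the VideoWriter suggestion, the pipeline can be assembled as a string and handed to OpenCV's GStreamer backend. The helper names `build_writer_pipeline` and `open_writer` are hypothetical, the RTMP URL is a placeholder, and it assumes OpenCV was built with GStreamer support and that rtmpsink is present in the image.

```python
def build_writer_pipeline(rtmp_url, encoder="v4l2h264enc"):
    """Return the appsrc-based pipeline string from the suggestion above."""
    stages = [
        "appsrc",
        "video/x-raw,format=BGR",
        "videoconvert",
        encoder,
        "flvmux streamable=true",
        "queue",
        f"rtmpsink location='{rtmp_url} live=true'",
    ]
    return " ! ".join(stages)


def open_writer(rtmp_url, width, height, fps):
    """Open a cv2.VideoWriter on the pipeline (needs OpenCV with GStreamer)."""
    import cv2  # imported lazily so the string builder works without OpenCV

    # fourcc is 0 because the encoder is chosen inside the GStreamer pipeline.
    return cv2.VideoWriter(
        build_writer_pipeline(rtmp_url),
        cv2.CAP_GSTREAMER,
        0,
        float(fps),
        (width, height),
        True,
    )


if __name__ == "__main__":
    # Placeholder URL: substitute your own server and stream key.
    print(build_writer_pipeline("rtmp://example.invalid/live"))
```

Frames passed to `writer.write(frame)` would then need to be BGR and match the width and height given to `open_writer`.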

alexxs · June 27, 2022, 2:37pm

Thanks for the reply!

Will come back to that, it was built a while ago. It’s a Torizon-core-multimedia built with Yocto and with OpenCV and ffmpeg added to it.
The stdout was merely a test. I found this example on how to integrate OpenCV with a gstreamer pipeline and will give that a go.
I don’t necessarily need ffmpeg. I will try gstreamer and try to build on a simpler example to try and determine where the bottleneck is.

What I’m thinking right now is to start with only frame acquisition from the USB camera and confirm it’s able to produce 30/60 FPS. I’ll push that into a fakesink, then start adding encoding and see where the problem starts to appear.
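The incremental approach described here can be sketched as a small generator that emits one gst-launch-1.0 command per pipeline prefix, each terminated by a sink that discards the data. The stage list in the example (device path, caps, element names) is an assumption for illustration and needs adapting to the actual camera.

```python
def incremental_pipelines(stages, sink="fakesink"):
    """Yield one gst-launch-1.0 command per pipeline prefix, each ending in `sink`."""
    for n in range(1, len(stages) + 1):
        yield "gst-launch-1.0 -v " + " ! ".join(list(stages[:n]) + [sink])


if __name__ == "__main__":
    # Assumed stages for an MJPEG USB camera; adjust to your setup.
    stages = [
        "v4l2src device=/dev/video2",
        "image/jpeg,width=1920,height=1080,framerate=30/1",
        "v4l2jpegdec",
        "v4l2h264enc",
    ]
    for cmd in incremental_pipelines(stages):
        print(cmd)
```

Running the commands in order and noting the first one where the frame rate drops points at the stage that was just added. Passing `sink="fpsdisplaysink video-sink=fakesink text-overlay=false"` instead should report the measured rate in the `-v` output, assuming fpsdisplaysink is installed.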

gclaudino.tx · June 27, 2022, 3:32pm

Dear @alexxs,

Thanks for the update. Please keep us posted on your progress so that we can see how to help you.

Best regards,

alexxs · July 3, 2022, 9:09pm

So I managed to make some progress today, but as sometimes happens, one answer yielded a million questions. Please bear with me.

I have this gstreamer pipeline working fine and streaming to my test YT channel as a proof-of-concept.

gst-launch-1.0 -v videotestsrc pattern=ball is-live=true ! video/x-raw,width=640,height=480,framerate=30/1 ! v4l2h264enc ! h264parse ! flvmux name=mux audiotestsrc ! queue ! audioconvert ! avenc_aac ! aacparse ! mux. mux. ! tee name=t t. ! rtmpsink location="rtmp://a.rtmp.youtube.com/live2/

I am not able to use gstreamer to display the feed from my USB camera.

This works

gst-launch-1.0 -v videotestsrc ! autovideosink

This doesn’t:

gst-launch-1.0 -v v4l2src device=/dev/video2 ! autovideosink

root@apalis-imx8-06959030:~# gst-launch-1.0 -v v4l2src device=/dev/video2 ! autovideosink
Setting pipeline to PAUSED …
Pipeline is live and does not need PREROLL …
ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Internal data stream error.
Additional debug info:
…/git/libs/gst/base/gstbasesrc.c(3072): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming stopped, reason not-negotiated (-4)
ERROR: pipeline doesn’t want to preroll.
Setting pipeline to PAUSED …
Setting pipeline to READY …
Setting pipeline to NULL …
Freeing pipeline …
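For reading logs like the one above: `not-negotiated (-4)` is GStreamer's flow return for elements that could not agree on a common format (caps), so the useful facts are which element raised the error and what the reason was. The sketch below pulls those two facts out of gst-launch output; the function name and regular expressions are illustrative only, not part of any GStreamer tool.

```python
import re

ELEMENT_RE = re.compile(r"ERROR: from element (\S+): (.*)")
REASON_RE = re.compile(r"streaming stopped, reason ([\w-]+) \((-?\d+)\)")

# Excerpt copied from the log above.
SAMPLE = """\
ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Internal data stream error.
Additional debug info:
…/git/libs/gst/base/gstbasesrc.c(3072): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming stopped, reason not-negotiated (-4)
"""


def summarize_gst_error(log_text):
    """Return the first failing element, its message, and the flow reason."""
    summary = {"element": None, "message": None, "reason": None, "code": None}
    for line in log_text.splitlines():
        m = ELEMENT_RE.search(line)
        if m and summary["element"] is None:
            summary["element"], summary["message"] = m.group(1), m.group(2)
        m = REASON_RE.search(line)
        if m and summary["reason"] is None:
            summary["reason"], summary["code"] = m.group(1), int(m.group(2))
    return summary


if __name__ == "__main__":
    print(summarize_gst_error(SAMPLE))
```

When the reason is not-negotiated, a usual next step is to compare the source's output formats with the sink's accepted formats and add an explicit caps filter or a converter between them.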

Similarly, if I try to display the output of the v4l2h264enc encoder directly, that also doesn’t work:

gst-launch-1.0 -v videotestsrc ! v4l2h264enc ! autovideosink

If I decode the h264 back to raw, it works.

gst-launch-1.0 -v videotestsrc ! v4l2h264enc ! v4l2h264dec ! autovideosink

By extension, streaming the webcam to YouTube doesn’t work either. It would be great if someone could point me towards a tutorial for better understanding the syntax of the GStreamer pipeline and, more importantly, how to debug various problems. I’ve been looking at their online tutorial, but it’s aimed more at the next step of development: integrating it all into an application that uses the GStreamer libraries.

alexxs · July 3, 2022, 9:51pm

I managed to get the webcam displaying relatively fluently (20 FPS with the fpsdisplaysink element), but I can’t pipe the output of the MJPEG decoder into the H264 encoder:


GST_DEBUG=4 gst-launch-1.0 -v v4l2src device=/dev/video2 ! v4l2jpegdec ! v4l2h264enc ! v4l2h264dec ! autovideosink

Output from running the command is attached as a text file.
gstreamer_log (73.8 KB)

gclaudino.tx · July 4, 2022, 2:35pm

Dear @alexxs,

There may be a mismatch between the formats required at the inputs and outputs of the GStreamer elements. I’d suggest having a look at gst-inspect. This tool will help you analyse the formats required at every step in the pipeline.
https://gstreamer.freedesktop.org/documentation/tools/gst-inspect.html

You may also want to have a look at videoconvert to change the formats so that the pipelines work properly. This could also help you achieve a higher FPS.

Please tell me if this helps.

Best regards,

alexxs · July 4, 2022, 4:42pm

Hi. I’ve carefully inspected the two elements using gst-inspect, and the source pad of the JPEG decoder and the sink pad of the encoder match nicely. I also used a caps filter between them to make sure it’s all OK. I have tried videoconvert and it does run, but it just produces green artifacts.

Trying to think about my end goal here, I have two questions:

As I’ll be putting an OpenCV app between the webcam and the H264 encoding, is there a good chance of not running into the same problem then?
Is there enough processing power on the i.MX8QM to decode the MJPEG from two cameras, do a small bit of processing and then encode in H264?

Thanks!

gclaudino.tx · July 4, 2022, 6:25pm

Hi @alexxs,

I’ll check internally to see if I can get better answers for these two questions and I’ll come back to you soon.

Best regards,

alexxs · July 5, 2022, 7:04am

Thank you! Much appreciated.

gclaudino.tx · July 5, 2022, 2:49pm

Dear @alexxs,

This should not be a problem with GStreamer. We’ll try to reproduce this issue on our side in the next few days. It would be really helpful if you could test with OpenCV in the meantime.

It should work. According to the chip datasheet from NXP, it should be able to work with up to 4 line streams. If you want more information, you can look at their datasheet directly.

Best regards,

alexxs · July 7, 2022, 6:28am

Please let me know if you have any success.

I might be mistaken, but it seems it only supports two H264 encoding pipelines at the same time? Just trying to make sure I’m fully understanding everything and more importantly that I’m on the right platform for the job.

Thanks!

alexxs · July 7, 2022, 7:53pm

Ok, managed to test with OpenCV. Got it working, but there are big issues. First of all, this is the pipeline needed:

capture = VideoCapture("v4l2src device=/dev/video2 io-mode=2 ! image/jpeg, width=1920 ! v4l2jpegdec ! video/x-raw ! imxvideoconvert_g2d ! video/x-raw, format=BGRx ! queue ! videoconvert ! video/x-raw,format=BGR ! appsink", CAP_GSTREAMER);

Up until the SW videoconvert element, the FPS is excellent (15ms for processing, so running in realtime). As soon as I add that element, which essentially only needs to remove the 4th channel (from BGRx to BGR), the processing time per frame goes up to 400-500ms. Horrendous.
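To make concrete what "remove the 4th channel" means, here is a minimal pure-Python sketch of the BGRx to BGR repacking on a raw byte buffer. It assumes tightly packed rows with no padding and is only an illustration of the data movement, not how videoconvert is implemented.

```python
def bgrx_to_bgr(bgrx):
    """Drop every 4th byte (the unused x channel) from a packed BGRx buffer."""
    if len(bgrx) % 4:
        raise ValueError("BGRx buffer length must be a multiple of 4")
    pixels = len(bgrx) // 4
    bgr = bytearray(pixels * 3)
    bgr[0::3] = bgrx[0::4]  # blue
    bgr[1::3] = bgrx[1::4]  # green
    bgr[2::3] = bgrx[2::4]  # red
    return bytes(bgr)


if __name__ == "__main__":
    two_pixels = bytes([10, 20, 30, 255, 40, 50, 60, 255])
    print(list(bgrx_to_bgr(two_pixels)))

    # Buffer sizes for one 1920x1080 frame before and after the conversion.
    print(1920 * 1080 * 4, "->", 1920 * 1080 * 3, "bytes")
```

At 1920x1080 this is roughly 8.3 MB read and 6.2 MB written per frame, which is a plain memory copy. That suggests the 400-500ms is more likely overhead in how the conversion is being performed than an inherent cost of the operation.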

Did you get a chance to test at your end? What alternatives are there for OpenCV for simple 2D image manipulation?

LE: On the other hand, if I use the V4L2 subsystem to capture the frames, I’m getting between 40 and 100ms per decoded frame. Not perfect, but still much better.

capture = VideoCapture(2, CAP_V4L2); // /dev/video2
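For comparison, the per-frame times quoted above can be converted to frame rates. This is plain arithmetic on the numbers from this post, with nothing measured anew:

```python
def fps_from_frame_time(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


# Per-frame times reported in the post above, in milliseconds.
reported = {
    "GStreamer, before videoconvert": [15],
    "GStreamer, with videoconvert": [400, 500],
    "OpenCV V4L2 backend": [40, 100],
}

for label, times in reported.items():
    rates = ", ".join(f"{fps_from_frame_time(t):.1f}" for t in times)
    print(f"{label}: {rates} FPS")
```

So the videoconvert path works out to about 2 to 2.5 FPS and the V4L2 backend to about 10 to 25 FPS, while 30 FPS needs at most about 33ms per frame and 60 FPS about 16.7ms.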

Thank you!

gclaudino.tx · July 8, 2022, 4:21pm

Dear @alexxs,

Thanks for the updates.

I think you’re right. The 4 streamlines seem to include encoding and decoding features as you can see here from the chip reference manual:

However, I’d suggest you ask on NXP community to see if they are available to give more information on this specific topic.

This is indeed not the desired processing time. I haven’t tested it properly yet but I’ll return to you early next week. The main alternative would really be using GStreamer but we need to understand what may be causing your issues.

From what we know, the problem may be linked to the videoconvert element in the pipeline. So here are a few suggestions for what you could test:

Can you try passing additional parameters to videoconvert, such as:
- n-threads=4
- primaries-mode=fast
Have you tried receiving in BGRA with OpenCV?
Have you confirmed that imxvideoconvert_g2d cannot output BGR directly?

Best regards,

alexxs · July 10, 2022, 9:14pm

Hi and thanks again for your help. After a busy weekend debugging this, I think I have some better conclusions and a decent amount of questions for you and your colleagues.

I was able to narrow the problem down to the JPEG to Raw decoder. How did I do that? First of all I confirmed that the camera itself is able to output 60FPS at 1920x1080 on both Windows (with a test app) and Linux. I then modified the GStreamer pipeline to not decode the frames and just retrieve them from the camera. That yielded the expected 60 FPS as well.

Moving on to the JPEG decoder. I tested the gstreamer pipeline on my Ubuntu 20.04 machine and it was able to output the expected 60FPS, even after decoding the JPEG stream. When I run the same pipeline on the device, I’m only getting 20FPS at most, so that surely points to the v4l2jpegdec pipeline element, right? Similarly, using the V4L2 backend in OpenCV yielded very similar results.

Now, I’m running an image based on the Torizon 5.5.0 (only Gstreamer RTP and opencv added). From my research, that image ships out with NXP BSP version 5.4.70-2.3.3. I opened NXP’s release notes for that BSP and, lo and behold, it seems like the MJPEG decoder is not actually done properly through the VPU. Which would at least confirm why I’m not able to get at least 30 FPS (normally 60).

This section clearly shows that the i.MX8QuadMax that I’m running doesn’t actually have MJPEG VPU decoding support in that version, but the i.MX6 does.

As such, can you please:

Confirm whether or not this release has full VPU support for MJPEG decoding. For me, going by that document, it would at least be an explanation for the subpar performance.
I’d appreciate it if you could reproduce it in your labs. If you’re somehow able to get a gstreamer pipeline to run even close to 30FPS at 1920x1080 resolution on a MJPEG camera, please post the steps taken to get there.

i.MX_Linux_Release_Notes-L5.4.70_2.3.3.pdf (364.9 KB)

gclaudino.tx · July 11, 2022, 5:24pm

Hi @alexxs,

Thanks for the reply.

Please note that the plugins that work with the VPU usually have imx in their name, such as the imxvideoconvert_g2d mentioned before. Software plugins will have lower performance on the modules.

It would be better to confirm it directly with NXP as the reference manual seemed to point out that it was available. You can find their community at: community.nxp.com

I’ve been made aware of some problems on TorizonCore regarding VPU usage on i.MX8 modules. There are currently some investigations and efforts to migrate additional things to the image. New Debian packages have been created that you should add to your container for things to work better, such as linux-imx-headers, imx-codec and imx-parser, which you can install with apt in your container. Others, such as imx-gst1.0-plugins, are still being migrated. Therefore, it’s not guaranteed that you’ll be able to get the performance you want as of today, but you could try adding the packages that are already available to see if this helps.

Otherwise, the video capabilities work properly on the BSP. Would this be an option for you at least until everything is migrated to Torizon? You could use our reference multimedia image for testing and then generate a custom one with Yocto.

alexxs · July 11, 2022, 5:45pm

@gclaudino.tx , thank you for the reply!

I have several threads in their community as well, but suffice it to say they are not nearly as quick to reply as you and your colleagues (one of the main reasons I chose Toradex was the support and documentation). In short, I’m waiting for their confirmation as well.

Not sure I understand everything you’ve said. My understanding based on NXP’s own release notes is that although the hardware inside the chip can indeed decode up to 4x MJPEG streams, NXP’s own BSP doesn’t fully support the functionality yet.

Similarly, I’m currently compiling my own Yocto image (based on your reference multimedia image, of course), so I guess the only way to add those packages would be to rebuild the whole image with Yocto. That’s not a problem, but from what I understand, you’re unsure that doing so would enable HW MJPEG decoding? As I understand the situation, NXP releases the BSP first, then it’s pulled over into TorizonCore. Is that correct?

Would it be correct to assume that the i.MX6 modules have everything working and, at this stage, would be a better choice to get everything working?

Thank you!
Alexandru

gclaudino.tx · July 11, 2022, 6:26pm

Hi @alexxs!

Thanks for the feedback on our support. We’re always here to help! Please update us if you find something new from the NXP Community.

The packages I mentioned should come by default in the BSP reference multimedia image (for instance, GStreamer is already available - Video Encoding and Playback With GStreamer (Linux) | Toradex Developer Center), so you’d have less trouble setting this up. On TorizonCore, however, work on that topic is still ongoing.

For the first question, the short answer is yes. On TorizonCore, even if you add the libraries I mentioned to your container, this has not been fully tested. On our BSP, it may depend on whether the chip is actually able to do so, as stated in the reference manual. About the workflow: NXP releases their BSP, and then we create our own BSP and TorizonCore based on that distro, with lots of customizations to better adapt it to our end products. In the end, it works more like an “include”.

Indeed, that’s a valid assumption as far as TorizonCore is concerned. Please note, however, that the i.MX6 modules have lower performance than the i.MX8-based modules in general.

benjamin.tx · July 12, 2022, 2:28am

I previously tried to encode the MJPEG stream from a USB camera into an H.264 file but got an error. Here is the thread on NXP community. Unfortunately, I tested it again on Linux BSP v5.6.0 today and still couldn’t get it to work. The jpegdec element works fine, but it has low performance because it is CPU-based rather than VPU-based.

alexxs · July 12, 2022, 6:17am

Interesting. Did you try the v4l2jpegdec and v4l2videojpegdec elements?