iMX8 VPU dewarp


I have a gstreamer pipeline as follows:

v4l2src device=/dev/camera ! video/x-raw,width=1920,height=1080,framerate=30/1 ! videoconvert ! v4l2h264enc ! h264parse ! tee name=t ! queue ! matroskamux ! tcpserversink host= port=9001 t. ! queue ! rtph264pay ! udpsink host= port=5005

This pipeline works smoothly. However, I would like to dewarp the image, and I can do so by modifying the pipeline as follows:

v4l2src device=/dev/camera ! video/x-raw,width=1920,height=1080,framerate=8/1 ! videoconvert ! fisheyeptz ! videoflip video-direction=0 ! videoconvert ! v4l2h264enc ! h264parse ! tee name=t ! queue ! matroskamux ! tcpserversink host= port=9001 t. ! queue ! rtph264pay ! udpsink host= port=5005

When I add the fisheyeptz and videoflip elements, the stream really bogs down the SOM and becomes choppy, unstable, and very laggy (~20-30 second lag). With the former pipeline, it is not this way.

Is there any possibility to move the dewarp processing to the hardware (VPU) on the iMX8, or is that not possible? Any other thoughts?

@grh , the first pipeline mostly uses elements that are optimized for the hardware and rely on zero-copy strategies. The second pipeline adds plugins that are not optimized for the hardware, so those elements spend a long time copying memory from one location to another. If the elements process frames of 1920x1080 pixels in 24-bit RGB (3 bytes per pixel) at 30 fps, then for each GStreamer software element the CPU has to copy at least 186,624,000 bytes/second (1920 * 1080 * 3 * 30). So most of the CPU time is spent on memory copies. That’s the main reason the iMX SoCs implement “zero-copy” strategies, such as DMA or dedicated image buses, which offload the memory transfer task from the CPU.
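To make the arithmetic above easy to rerun for other settings, here is a small Python sketch. The function name copy_bandwidth and its copies_per_frame parameter are made up for illustration; it only multiplies out the frame size and rate, and it assumes packed 24-bit RGB with no row padding.

```python
def copy_bandwidth(width, height, bytes_per_pixel, fps, copies_per_frame=1):
    """Bytes per second moved when every frame is copied copies_per_frame times.

    Hypothetical helper: assumes tightly packed pixels (no stride padding).
    """
    return width * height * bytes_per_pixel * fps * copies_per_frame


# The case from the post: 1080p, RGB (3 bytes per pixel), 30 fps, one copy.
print(copy_bandwidth(1920, 1080, 3, 30))   # 186624000

# The same frame size at the 8 fps used in the dewarp pipeline.
print(copy_bandwidth(1920, 1080, 3, 8))    # 49766400

# A smaller resolution (720p) at 30 fps, for comparison.
print(copy_bandwidth(1280, 720, 3, 30))    # 82944000
```

Each additional software element that copies the frame adds roughly the same amount again, which is what copies_per_frame is meant to model.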
As for dewarping, it is typically handled by an ISP. The i.MX8 does not have ISP hardware (the new Verdin iMX8M Plus does). Therefore, the easiest solution may be to check whether you can migrate to the i.MX8M Plus, to find a camera with an integrated ISP that supports dewarping, or to use external ISP hardware.

Thanks. Is there an API or some kind of tie-in to access the hardware for optimization in custom code? I seem to remember reading somewhere that this is not possible, but I can’t for the life of me figure out where I read that.

NXP provides documentation for the VPU and other elements of the SoC, so developers can attempt optimizations at the Linux driver level. An alternative might be to reduce the image resolution from 1080p to something smaller and check whether the performance is good enough.

Got it. Looks like there is an API to do some VPU programming using RPC calls. Not sure if a) we would be able to wrap our dewarp code with this and offload, and b) how much of a lift that would be for us.

From the NXP website:

iMX VPU Application Programming Interface Linux Reference Manual

@grh just for your reference, here is an NXP application note discussing dewarping for the Surround View application.

Thanks @daniel.tx

@denis.tx has highlighted the importance of zero-copy strategies, which makes sense from a programmatic perspective. In looking at the code we inherited, there is a memcpy to load a Gstreamer buffer into an OpenCV matrix frame, then it does some picture-in-picture stuff, then another memcpy to load from the OpenCV matrix back into the Gstreamer buffer.

That’s two memcpy’s per frame, which I’m sure is less than ideal…?
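As a rough illustration of the difference, here is a Python sketch that contrasts the copy-in/copy-out pattern with working directly on the buffer memory. It uses no GStreamer or OpenCV: the bytearray is only a stand-in for a mapped, writable frame buffer, and fill_rect, process_with_copies and process_in_place are made-up names. In the real C/C++ code, the equivalent would presumably be mapping the buffer with gst_buffer_map using GST_MAP_READWRITE and constructing the cv::Mat over the mapped data pointer instead of copying into it.

```python
WIDTH, HEIGHT, BPP = 1920, 1080, 3  # packed RGB, no row padding assumed


def fill_rect(pixels, x0, y0, w, h, value):
    """Overwrite a w x h rectangle, row by row, in a flat RGB byte buffer."""
    row_bytes = bytes([value]) * (w * BPP)
    for row in range(y0, y0 + h):
        start = (row * WIDTH + x0) * BPP
        pixels[start:start + w * BPP] = row_bytes


def process_with_copies(buf):
    """Pattern from the inherited code: copy out, process, copy back."""
    work = bytearray(buf)                    # copy 1: buffer -> working frame
    fill_rect(work, 0, 0, 480, 270, 255)     # stand-in for the picture-in-picture step
    buf[:] = work                            # copy 2: working frame -> buffer


def process_in_place(buf):
    """Same processing through a view that shares the buffer's memory."""
    view = memoryview(buf)                   # no pixel data is copied here
    fill_rect(view, 0, 0, 480, 270, 255)


frame_a = bytearray(WIDTH * HEIGHT * BPP)    # stand-in for mapped buffer memory
frame_b = bytearray(WIDTH * HEIGHT * BPP)
process_with_copies(frame_a)
process_in_place(frame_b)
print(frame_a == frame_b)                    # True: same result, two fewer full-frame copies
```

This only removes the two full-frame copies; the dewarp itself would still run on the CPU.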

@grh , thanks for letting us know. As a reference, please take a look at NXP’s GStreamer repo of zero-copy plugins: plugins - imx-gst1.0-plugin - i.MX Gstreamer 1.0 Plugin . You will probably need to create a custom dewarp GStreamer plugin based on it.