Only 8 fps with Linux DRM framebuffer grab on Apalis iMX8

We have an Apalis iMX8 and are trying to read the framebuffer continuously via the Linux kernel's DRM API.
That works, but it’s slower than we’d expect. We get about 8 fps and expect something close to 60 fps (we would still be happy with ~20 fps, though).

We are open to trying other methods of grabbing the display, too. In fact, we tried several others, such as going through OpenGL, but we always seem to hit the same throughput wall of roughly 100-120 MB/s.
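As a rough sanity check on these numbers, the relation between frame rate, frame size and copy throughput can be sketched as below. This is only back-of-the-envelope arithmetic, not a measurement. The function names are made up for illustration, MB is taken as 10**6 bytes, and the 1920x1080 mode at 4 bytes per pixel is an assumption, since the actual display mode is not stated in this thread.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def required_mb_per_s(width, height, bytes_per_pixel, fps):
    """Throughput in MB/s (10**6 bytes) needed to copy every frame at `fps`."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e6


def implied_frame_mb(throughput_mb_per_s, fps):
    """Frame size in MB implied by a measured throughput and frame rate."""
    return throughput_mb_per_s / fps


if __name__ == "__main__":
    # Figures from this thread: about 8 fps at roughly 100-120 MB/s.
    for mb_per_s in (100, 120):
        size = implied_frame_mb(mb_per_s, 8)
        print(f"{mb_per_s} MB/s at 8 fps -> {size:.1f} MB per frame")

    # ASSUMPTION: 1920x1080 at 4 bytes per pixel; the real mode may differ.
    for fps in (20, 60):
        need = required_mb_per_s(1920, 1080, 4, fps)
        print(f"1920x1080x4 at {fps} fps needs about {need:.0f} MB/s")
```

With the figures above, 8 fps at 100-120 MB/s corresponds to 12.5 to 15 MB copied per frame. Plugging in the real resolution and pixel format shows how much throughput the 20 fps and 60 fps targets would need.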

root@ARGUS-M0000:~# sh /tmp/tdx-info 

Software summary
Bootloader:               U-Boot
Kernel version:           5.15.129-6.3.0-devel+git.67c3153d20ff #1 SMP PREEMPT Wed Sep 27 12:30:36 UTC 2023
Kernel command line:      rootfstype=ext4 ro rootwait pci=nomsi panic=3 fsck.mode=force console=ttyLP1 consoleblank=0 no_console_suspend=1 vt.global_cursor_default=0 quiet loglevel=2 rauc.slot=B root=/dev/mmcblk0p4 ro rootwait console=tty1 console=ttyLP1,115200 consoleblank=0 earlycon
Distro name:              NAME="HMI-01"
Distro version:           VERSION_ID=0.2.3
Distro variant:           -
Hostname:                 ARGUS-M0000

Hardware info
HW model:                 Toradex Apalis iMX8QP V1.1 on Apalis Evaluation Board
Toradex version:          0049 V1.1C
Serial number:            12345678
Processor arch:           aarch64

As for the distro, we are running a custom Yocto setup based on BSP 6.4.0.

While getting the data (via DMA, drmPrimeHandleToFD) works, the bandwidth is very limited, at roughly 100-120 MB/s.
This can be reproduced with stock software too, for example with ffmpeg and the kmsgrab input device:

ffmpeg -vsync 2 -f kmsgrab -i - -vf 'hwdownload,format=bgr0' /tmp/screen_%03d.png

We could not figure out what actually limits the bandwidth. Are we overlooking something we need to configure or is there a hard limitation from the hardware or the graphics driver?

Dear @Lars1,

Thank you for contacting us.
The readback path from the GPU might indeed not be fast enough. We will test that on our side.
From what I can gather right now, it might be useful to use MJPEG instead of PNG, since PNG encoding tends to have a large overhead.

ffmpeg -vsync 2 -f kmsgrab -i - -vf 'hwdownload,format=bgr0' -f mpjpeg /tmp/screen.mp4

However, this might not increase the frame rate significantly, since you are using gstreamer. In case you are using Wayland/Weston, you could consider grabbing the screen through the compositor, which might be more efficient:

Thanks for the answer.

It’s unlikely that changing the encoding will increase the performance here, because the problem is the framebuffer grab from the DRM subsystem itself. If I understand it correctly, the mpjpeg flag only changes the encoding that ffmpeg applies once it has grabbed the frame. The raw calls to DRM (we made sure to do exactly the same thing as ffmpeg) are already limited to 8 fps. To be more precise, the DRM calls themselves are fast, but they only set up a memory map. Actually copying the bytes out of that map into another buffer with memcpy is the slow operation. ffmpeg and our raw DRM grab perform identically, which means the encoding cost is negligible.

The biggest mystery to me is why that memcpy from DRM is so much slower than a “normal” memcpy to and from CPU memory. Yes, we have to bridge the GPU<->CPU gap, but as far as I know GPU and CPU memory are located on the same chip. Indeed, we can control the allocation sizes at boot time. That’s why I would expect a throughput in roughly the same ballpark as CPU memory.
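To put a number on that “normal” memcpy baseline, something like the following sketch could be run on the module. It times one bulk copy out of a memory map into an ordinary buffer and reports MB/s (10**6 bytes). An anonymous mmap stands in for the source here. The helper name and the 16 MiB buffer size are made up for illustration, and mapping the actual DRM buffer through its PRIME fd is not shown.

```python
import mmap
import time


def copy_throughput_mb_per_s(src, dst, repeats=5):
    """Copy src into dst `repeats` times; return the best rate in MB/s."""
    size = len(src)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    return size / best / 1e6


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # ASSUMPTION: hypothetical frame-sized buffer
    with mmap.mmap(-1, size) as mm:  # anonymous map in normal CPU memory
        mm.write(b"\xff" * size)  # touch every page before timing
        dst = bytearray(size)
        with memoryview(mm) as src:
            rate = copy_throughput_mb_per_s(src, dst)
    print(f"plain memory copy: {rate:.0f} MB/s")
```

Comparing this figure with the roughly 100-120 MB/s seen when copying out of the DRM map would show how large the gap really is on this board. The Python overhead per copy is small next to a multi-megabyte transfer, but a C version of the same loop would give the cleanest number.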

Also, I don’t understand how Wayland and Weston could be any faster, since they should also map to the same kernel APIs (either directly to DRM or indirectly via OpenGL and Mesa).
We tried doing it via OpenGL calls, but the performance is identical to the DRM calls.
That being said, thanks for the suggestion. Trying it via Wayland is something we should evaluate.

Oh and thanks for the link to wcap, that seems really interesting.

Thank you for your reply @Lars1
In that case we will await your evaluation of Wayland.
Should further findings in this area arise, I will let you know.