Performance issues after BSP Update

Greetings everyone

We are using the Toradex Colibri modules as a platform for a product.
In recent years we have already tried to update to newer BSP versions without success.
We have now reached the point where we need the update for security and feature reasons.

The product uses two displays to show the HMI, which is an AngularJS-based single-page web app.

BSP:
In general the BSP is based on the MinimalConsole image from Toradex.
The displays are driven through EGLFS on framebuffers.
The browser is built on QtWebEngine.

Problem:
We have now updated from Toradex version 2.6b to 5.7.0.
We managed to include and update all our software and to run the HMI software.
Unfortunately we are now experiencing major issues regarding performance.
No matter what we try, the new BSP is never as snappy as the old one, especially when showing animations on the page.
The results of Google's Octane 2.0 benchmark lead us to suspect that something is wrong with memory handling or garbage collection on the new BSP/QtWebEngine.
We would be glad for any help or pointers any of you could give us.

Additional:
Interestingly, on the old BSP we can see all the arguments of the main process on its child process (QtWebEngineProcess) in htop. On the new BSP this is no longer the case. Has there been a major change?

Specification Hardware:
Toradex Colibri iMX6DL 512MB IT
CMA Size 192MB
2x 1280x800 Touch-Displays

Specification old BSP:
Toradex BSP Version: 2.6b2 (Linux Kernel 3.14)
Kernelargs:

enable_wait_mode=off galcore.contiguousSize=0x7000000 galcore.physSize=0xE600000 galcore.showArgs=1 cma=192M consoleblank=0 no_console_suspend=1 console=ttymxc0,115200n8 video=mxcfb0:dev=lcd,AMPIRE-WXGA,if=RGB24,fbpix=BGR32 video=mxcfb1:dev=hdmi,1280x800M@60,if=RGB24,fbpix=BGR32 fbmem=8M,8M mxc_hdmi.only_cea=0 vt.global_cursor_default=0 root=/dev/mmcblk0p2 rw,noatime rootfstype=ext3 rootwait ip=off

QtWebEngine 5.6 (Chromium Version 45):
Additional video hardware acceleration patch from O.S. Systems (O.S. Systems Software LTDA. on GitHub)

Env variables for Browser:
QT_QPA_EGLFS_HIDECURSOR=1
QT_QPA_EGLFS_FB=/dev/fb0 // or fb2
QT_QPA_EVDEV_TOUCHSCREEN_PARAMETERS=$TOUCH_1 // or 2. path to touchscreen device
QT_QPA_EGLFS_PHYSICAL_WIDTH=150
QT_QPA_EGLFS_PHYSICAL_HEIGHT=150
Args for Browsers:
    --ignore-gpu-blacklist
    --force-gpu-mem-available-mb=48
    --disable-low-res-tiling
    --js-flags=\"--expose-gc\"
Args for QtWebEngineProcess: (Found on page chrome://gpu)
    --no-sandbox
    --enable-delegated-renderer
    --enable-threaded-compositing
    --in-process-gpu
    --enable-overlay-scrollbar
    --enable-pinch
    --enable-viewport
    --enable-viewport-meta
    --main-frame-resizes-are-orientation-changes
    --disable-gpu-shader-disk-cache
    --disable-canvas-aa
    --disable-composited-antialiasing
    --profiler-timing=0
    --use-gl=egl
    --disable-gpu-watchdog
    --use-gl=egl
    --supports-dual-gpus=false
    --gpu-driver-bug-workarounds=2,45,57
    --gpu-vendor-id=0x0000
    --gpu-device-id=0x0000
    --gpu-driver-vendor
    --gpu-driver-version

Specification new BSP:
Toradex BSP Version: 5.7.0 (Linux Kernel 5.4)
Kernelargs:

enable_wait_mode=off galcore.contiguousSize=0x7000000 root=PARTUUID=08fe777d-02 rw rootfstype=ext4 rootwait consoleblank=0 no_console_suspend=1 console=tty1 console=ttymxc0,115200n8 video=mxcfb0:dev=lcd,AMPIRE-WXGA,if=RGB24,fbpix=BGR32 video=mxcfb1:dev=hdmi,1280x800M@60,if=RGB24,fbpix=BGR32 fbmem=8M,8M mxc_hdmi.only_cea=0 vt.global_cursor_default=0
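
Comparing the two kernel command lines token by token makes it easier to see which arguments were not carried over. The following is a minimal sketch, assuming the two argument strings listed in this post are pasted into shell variables; the names `OLD_ARGS`, `NEW_ARGS` and the helper `only_in_first` are illustrative and not part of any BSP.

```shell
#!/bin/sh
# Print each whitespace-separated token of $1 that is absent from $2.
only_in_first() {
    for tok in $1; do
        case " $2 " in
            *" $tok "*) ;;                  # also present in the second list
            *) printf '%s\n' "$tok" ;;      # missing from the second list
        esac
    done
}

# Kernel arguments copied from the two specifications in this post.
OLD_ARGS='enable_wait_mode=off galcore.contiguousSize=0x7000000 galcore.physSize=0xE600000 galcore.showArgs=1 cma=192M consoleblank=0 no_console_suspend=1 console=ttymxc0,115200n8 video=mxcfb0:dev=lcd,AMPIRE-WXGA,if=RGB24,fbpix=BGR32 video=mxcfb1:dev=hdmi,1280x800M@60,if=RGB24,fbpix=BGR32 fbmem=8M,8M mxc_hdmi.only_cea=0 vt.global_cursor_default=0 root=/dev/mmcblk0p2 rw,noatime rootfstype=ext3 rootwait ip=off'
NEW_ARGS='enable_wait_mode=off galcore.contiguousSize=0x7000000 root=PARTUUID=08fe777d-02 rw rootfstype=ext4 rootwait consoleblank=0 no_console_suspend=1 console=tty1 console=ttymxc0,115200n8 video=mxcfb0:dev=lcd,AMPIRE-WXGA,if=RGB24,fbpix=BGR32 video=mxcfb1:dev=hdmi,1280x800M@60,if=RGB24,fbpix=BGR32 fbmem=8M,8M mxc_hdmi.only_cea=0 vt.global_cursor_default=0'

echo 'Only in the BSP 2.6b2 command line:'
only_in_first "$OLD_ARGS" "$NEW_ARGS"
echo 'Only in the BSP 5.7.0 command line:'
only_in_first "$NEW_ARGS" "$OLD_ARGS"
```

With the values above, the first list contains `galcore.physSize=0xE600000`, `galcore.showArgs=1` and `cma=192M` next to the expected root filesystem changes. Whether these settings are configured elsewhere on BSP 5 (device tree or kernel config) is not stated in this thread, so the output is only a pointer for further checking. On the target, the running command line can be read from `/proc/cmdline`.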

QtWebEngine 5.14 (Chromium Version 77):

Env variables for Browsers:
QT_QPA_EGLFS_HIDECURSOR=1
QT_QPA_EGLFS_FB=/dev/fb0 // or fb2
QT_QPA_EVDEV_TOUCHSCREEN_PARAMETERS=$TOUCH_1 // or 2. path to touchscreen device
QT_QPA_EGLFS_PHYSICAL_WIDTH=150
QT_QPA_EGLFS_PHYSICAL_HEIGHT=150
QT_QPA_PLATFORM=eglfs
QT_QPA_EGLFS_INTEGRATION=eglfs_viv
Args for Browser:
    --no-sandbox
    --ignore-gpu-blacklist
    --force-gpu-mem-available-mb=48
    --disable-low-res-tiling
    --js-flags=\"--expose-gc\"
Args for QtWebEngineProcess: (Found on page chrome://gpu)
    --disable-setuid-sandbox
    --enable-threaded-compositing
    --disable-zero-copy
    --disable-gpu-memory-buffer-compositor-resources
    --disable-gpu-memory-buffer-video-frames
    --enable-viewport
    --main-frame-resizes-are-orientation-changes
    --disable-composited-antialiasing
    --disable-features=MojoVideoCapture,NetworkServiceNotSupported,BlinkGenPropertyTrees,BackgroundFetch,OriginTrials,SmsReceiver,WebAuthentication,WebAuthenticationCable,WebPayments,WebUSB
    --enable-features=AllowContentInitiatedDataUrlNavigations,TracingServiceInProcess,OverlayScrollbar
    --use-gl=egl
    --in-process-gpu
    --gpu-preferences=KAAAAAAAAAAoAACAAAAAAwAAYAAAAAAAEAAAAAAAAAAAAAAAAAAAAAgAAAAAAAAA
    --use-gl=egl
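
The two QtWebEngineProcess argument lists can be compared by flag name to see which switches disappeared or were added between the Chromium versions. This is a sketch only: the helper names `flag_names` and `flags_only_in_first` and the file names `old_flags.txt` and `new_flags.txt` are hypothetical, and the example uses a shortened subset of the lists above. In practice the full lists from chrome://gpu would be saved to the two files.

```shell
#!/bin/sh
# Reduce a saved flag list to sorted, unique flag names (drops "=value" parts).
flag_names() {
    sed -e 's/^[[:space:]]*//' -e 's/=.*$//' -e '/^$/d' "$1" | LC_ALL=C sort -u
}

# Print flag names present in file $1 but absent from file $2.
flags_only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    flag_names "$1" > "$a"
    flag_names "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example with a shortened subset of the two lists.
work=$(mktemp -d)
cat > "$work/old_flags.txt" <<'EOF'
    --enable-threaded-compositing
    --in-process-gpu
    --disable-gpu-watchdog
    --gpu-driver-bug-workarounds=2,45,57
    --use-gl=egl
EOF
cat > "$work/new_flags.txt" <<'EOF'
    --enable-threaded-compositing
    --disable-zero-copy
    --use-gl=egl
    --in-process-gpu
EOF
echo 'Only in the old process:'
flags_only_in_first "$work/old_flags.txt" "$work/new_flags.txt"
echo 'Only in the new process:'
flags_only_in_first "$work/new_flags.txt" "$work/old_flags.txt"
rm -rf "$work"
```

Comparing by name rather than by full string keeps switches such as `--disable-features=...` from showing up as differences only because their value changed.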

Screenshots and additional information

Screenshots old BSP: [images not available]

Screenshots new BSP: [images not available]

Greetings Pascal

Hello Pascal,

Thanks for reaching out to us. I would like to ask you a few more questions regarding your project:

What version of the Colibri iMX6DL are you using? Did you try with the exact same module on both the old and new versions of the BSPs?

MinimalConsole is a recipe from BSP 2. To create your BSP 5 image, did you port MinimalConsole from BSP 2, or did you start over from the recipes of the BSP 5 reference images?

Also, have you researched possible optimizations you could do on QtWebEngine? You might get some help from the maintainers of the meta-qt5 layer repository (meta-qt5/meta-qt5 on GitHub, the Qt5 layer for OpenEmbedded).

Besides the browser, do you face any other performance issues when comparing BSP 5 with BSP 2?

This topic is quite complex to answer right away. So please allow us some time to do some research internally. In the meantime, if you have any updates, please let us know.

Hello Rudhi

Thank you for your reply

We are using the hardware version 1.0b for both BSPs.
We also have 1.1a modules in stock, which need to be tested once the BSP 5 image is finished.

We developed our original layer on top of the MinimalConsole image on BSP 2.

For BSP 5 we rebuilt our layer structure (distro / machine / image) using examples from Toradex.
The basis for our image is the tdx-reference-minimal-image.

For the distro we included tdx-base.inc with the following settings:

DISTRO_FEATURES_remove = "wayland x11 vulkan directfb bluetooth pcmcia usbgadget wifi nfs zeroconf pci 3g nfc bluez5 pam  gobject-introspection-data alsa pulseaudio"
IMX_DEFAULT_BSP = "nxp"
DISTRO_FEATURES_BACKFILL_CONSIDERED = "gobject-introspection-data"

For the machine we used colibri-imx6.conf from meta-freescale-3rdparty as the basis for our own machine.conf.
Notable changes:

MACHINE_FEATURES += "screen usbhost vfat ext2 alsa touchscreen"
MACHINE_FEATURES_remove = "usbgadget"

For the devicetree we used the imx6dl-colibri-eval-v3.dtb as basis and ported our changes from BSP 2.6.

Currently we have not yet tested the performance of other applications.
We only observe the issues on the (Web-) GUI.
As a next step we could build some synthetic test and benchmarking tools for both BSPs and compare them.

We have now run some benchmarks with the sysbench tool.

It consists of cpu, memory and file I/O tests.
For the CPU test, nothing unusual was detected.
The numbers are nearly identical (228 s vs 230 s for 10000 prime numbers).

The file I/O test shows that the new BSP performs significantly better
(1.37 GB vs 2.42 GB transferred in 5 min).
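
To put the two results into proportion, the relative change can be computed directly. A small sketch, assuming both pairs are given in the order BSP 2, BSP 5 (the post states this only implicitly for the file I/O figures); the helper name `pct_change` is illustrative.

```shell
#!/bin/sh
# Percentage change from a baseline value ($1) to a new value ($2).
pct_change() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - base) / base * 100 }'
}

echo "CPU run time change:    $(pct_change 228 230) %"
echo "File I/O volume change: $(pct_change 1.37 2.42) %"
```

Under that assumption, the CPU run time differs by roughly 0.9 %, while the transferred file I/O volume is about 76.6 % higher on the new BSP.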

The memory test shows different values for various blocksizes.
While bigger blocks are performing better on BSP 5, smaller blocks performed better on BSP 2.
I have added a small graph as well as the results of the memory tests.


sysbench_memory_bsp2_results.txt (10.7 KB)
sysbench_memory_bsp5_results.txt (10.7 KB)

We have now built and installed the cinematic experience example.
It runs very smoothly on both BSPs.
This is a further indication that QtWebEngine is the source of the problem.

Hello @cicor.dev,

Thanks for your update and for sending us your test results. As you have already figured out, the CPU test results show no issues. Memory performance is good on both BSPs, although BSP 5 performs worse on smaller blocks. I am not sure that this difference in memory performance explains why the browser benchmark differs so much between BSP 2 and BSP 5.
I agree with you that the main reason for the difference in the browser benchmark results is most probably the large gap between the browser versions themselves.
In this case, if you like, we could refer you to a partner who will be able to help you with the browser for your specific use case.

Hello @rudhi.tx

We are now testing and comparing what might be the cause.
We have enabled and disabled some graphics/GPU features.

Further tests with graphic features:
For this test we also installed a simple single page application found here:
Single Page App
We have also extended the transition period to 4s to be able to observe the animation longer.

I can only describe what happens:

BSP 2:
Clicking through the pages results in a smooth transition animation.
They are probably not rendered at 60 Hz, but smoothly enough to be recognizable as an animation.
There are some slight drops in the frame rate.

BSP 5:
Same flags as on BSP 2:
Clicking through the pages results in a smooth transition animation at first.
But the drops in the frame rate are very noticeable, and the animation stalls and then jumps ahead by a larger distance.

--enabled-gpu-rasterization
Clicking through the pages works as follows:
It takes a few milliseconds for a click to register.
The animation seems to start in the middle, with more freezes and jumps than before.

--disable-viz-display-compositor
Nearly identical behavior to when the feature is left enabled.

--disable-accelerated-2d-canvas
Nearly identical behavior to when the feature is left enabled.
However, it seems to have a small positive effect on performance on BSP 5,
though nowhere near the level of BSP 2.

Scaling governors:
What we forgot to mention in the initial post are the scaling governors.
On BSP 5 we have switched from conservative to performance.
On BSP 2 we are running the interactive governor.

Unfortunately, the same scaling governors are not available on the new BSP.
Could the interactive governor be exactly what prevents these frame-rate drops?
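
To compare the governor setup on both BSPs, the active and available governors can be read from the standard cpufreq sysfs files. A minimal sketch; the function name `show_governors` is illustrative, and the optional directory argument exists only so the function can be pointed at a copy of the sysfs tree.

```shell
#!/bin/sh
# Print "<cpuN>: <active governor> (available: ...)" for each CPU found.
# $1 is the sysfs CPU directory; the default is the standard location.
show_governors() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_governor" ] || continue
        cpu=${dir%/cpufreq}
        cpu=${cpu##*/}
        printf '%s: %s (available: %s)\n' "$cpu" \
            "$(cat "$dir/scaling_governor")" \
            "$(cat "$dir/scaling_available_governors" 2>/dev/null)"
    done
}

show_governors
```

Running this on both images shows whether a governor comparable to the one used on BSP 2 is offered by the BSP 5 kernel at all, before any further tuning is attempted.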

Additional help:
Yes, we would appreciate it if you could refer us to your partner for browser-specific problems.

Best regards
Pascal

Hey @cicor.dev,

Thanks again for sharing your extensive test results with us. We are currently discussing your performance issue internally. Before referring you to our partner, I would like to ask for your patience until the middle of next week. Does that work for you?

Hello @rudhi.tx

This works for us.

Also a little update on our side.

We searched and compared the chromium sources on both BSPs for “imx” strings.
On BSP 2 we found lots of references, probably due to our patch.
On BSP 5 we found nothing imx-related.
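
A search of this kind can be reproduced with a recursive, case-insensitive grep over each source tree. The sketch below counts matching files per directory; the function name `count_imx_files` is illustrative, and the source directories are passed as arguments because their actual locations in the two builds are not given in this thread.

```shell
#!/bin/sh
# Count the files below directory $1 that mention "imx" (case-insensitive).
count_imx_files() {
    grep -ril -e 'imx' "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Usage: sh count_imx.sh <chromium-source-dir> [<chromium-source-dir> ...]
for src in "$@"; do
    printf '%s: %s file(s) mention imx\n' "$src" "$(count_imx_files "$src")"
done
```

Since this is a plain substring match, a few unrelated hits are possible; listing the file names with `grep -ril` alone helps to judge whether the matches really belong to i.MX-specific code.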

This prompted us to investigate why.
We are currently reviewing patches and ports from different sources that enable i.MX VPU support in Chromium,
to figure out whether they are still necessary and whether they can still be applied and tested.

chromium-imx project
On our BSP 2 we added this patch, along with a hack to actually enable it for Chromium 45.
According to this email, the patch appears to have worked up to Chromium 53, or possibly 66(?).
https://www.yoctoproject.org/pipermail/meta-freescale/2018-June/022604.html

An attempt was made for chromium 72 to include imx vpu support, but without success.
https://community.nxp.com/t5/i-MX-Processors/Chromium-browser-72-i-MX6-HW-acceleration/m-p/1066545

I will let you know when we find out more about this and whether we get it to work.

Best regards
Pascal

Hello @cicor.dev,

As promised, we have discussed this issue internally. There might be several possibilities why the performance is lower in general and it is hard to point you to one “answer”. Here are some suggestions:

  1. Newer kernels include additional security features which might cause some delays. One option is to disable them by turning the mitigations off in U-Boot, like this:
    setenv tdxargs mitigations=off
    A related discussion: performance - Disable Spectre and Meltdown mitigations - Unix & Linux Stack Exchange

  2. You could try to shrink the kernel in general by removing all debug-related stuff. This can sometimes help to speed up context switching (which could be the cause for the slow garbage collection).

  3. Another idea could be to reduce the niceness level of your application so that it gets more CPU processing time:
    nice (Unix) - Wikipedia

  4. You could also check whether performance improves with the upstream kernel (e.g. with BSP 6).
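
Before measuring again, it can be worth confirming on the target that suggestions 1 and 3 actually took effect. The following sketch checks the running kernel command line for `mitigations=off`, prints the per-issue status if the kernel exposes it under sysfs, and reads a nice value from a `/proc/<pid>/stat` file. The function names `mitigations_disabled` and `nice_from_stat` are illustrative.

```shell
#!/bin/sh
# Succeeds if the kernel command line in file $1 contains mitigations=off.
mitigations_disabled() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'mitigations=off'
}

# Print the nice value (field 19) from a /proc/<pid>/stat style file.
nice_from_stat() {
    stat=$(cat "$1") || return 1
    set -- ${stat##*') '}    # drop "pid (comm) "; $1 is now the state field
    printf '%s\n' "${17}"
}

if [ -r /proc/cmdline ]; then
    if mitigations_disabled; then
        echo 'mitigations=off is on the kernel command line'
    else
        echo 'mitigations=off is NOT on the kernel command line'
    fi
fi

# Per-issue mitigation status, if this kernel exposes it.
for f in /sys/devices/system/cpu/vulnerabilities/*; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    fi
done

# Nice value of the calling process; use /proc/<browser pid>/stat instead.
if [ -r /proc/self/stat ]; then
    echo "nice: $(nice_from_stat /proc/self/stat)"
fi
```

A running process can be given a lower nice value with `renice -n -5 -p <pid>`; negative values normally require root privileges, and the value -5 is only an example.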

The performance could go down in general when you use HTML5 for your UI. So it gets a little tricky to say exactly what causes the problem. However, please let us know if you try out the suggested workarounds and if they could improve the situation.

Hello @cicor.dev,

I hope you are doing well. Did you already have some time to work on this again or to experiment with the workarounds I suggested before? I’m curious to hear about your results. Also, please let me know if you need more support in this regard.

Hello @rudhi.tx

Status on our side:

  1. Tried, without any impact.
  2. We have not tried this yet.
  3. We tried that on earlier versions, with BSP 5.1.0 and 5.5.0.
  4. We first tried the newest Torizon platform 6 with a browser container, without any performance improvement. We then tried to migrate our layer to BSP 6. Unfortunately, we were not able to make our DTS compatible with the new mainline BSP 6.

Currently we are in contact with O.S. Systems again.

We will let you know if we learn more about this issue.