CAN packets dropped under high bus load on Apalis iMX8 Boot2Qt image with the GUI running

Hi, @jaski.tx

OK, I’ll wait.

I have created a bug report in the Qt community.
It seems that you (or I) have to update the Linux kernel to get an improved FlexCAN driver.
Unfortunately, I am not able to build the Qt image from Qt’s Yocto layers because the build fails, and Qt did not help me build the “Boot2Qt image for Apalis iMX8”.

Best regards,
Vitaliy

@Vitaliy Something else you might want to try is essentially disabling DVFS (dynamic voltage and frequency scaling). Frequency changes can introduce delays, which in turn can lead to dropped packets. I think the performance governor should essentially disable all scaling (as the SoC will run at maximum frequency at all times). Make sure you have a good cooling solution!

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor 
ondemand
# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
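
The command above only changes policy0. On SoCs with more than one CPU cluster there may be additional policy directories (a later post in this thread sets cpu0 and cpu4 separately), so a small loop can apply the governor to every policy. This is a minimal sketch, assuming the standard cpufreq sysfs layout; the function name set_governor_all and its optional second argument are illustrative and not part of any existing tool.

```sh
#!/bin/sh
# Hypothetical helper: write one governor name into every cpufreq policy.
# $1 = governor (e.g. performance)
# $2 = cpufreq sysfs root (optional, defaults to the usual location)
set_governor_all() {
    gov=$1
    root=${2:-/sys/devices/system/cpu/cpufreq}
    for policy in "$root"/policy*; do
        # Skip if the glob matched nothing or the file is not writable.
        [ -w "$policy/scaling_governor" ] || continue
        echo "$gov" > "$policy/scaling_governor"
        # Read the value back so a rejected governor is visible.
        echo "$(basename "$policy"): $(cat "$policy/scaling_governor")"
    done
}
```

Run as root, `set_governor_all performance` should print one line per policy with the governor that is now active.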

@Vitaliy During the tests with the Qt demo UI running, did you receive kernel messages on the serial console? If yes, those could be the culprit, as they cause a considerable amount of latency and jitter. Can you try with only emergency messages enabled?
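
One way to try this is to lower the console log level so that only emergency messages (priority 0) reach the serial console; messages still land in the kernel ring buffer and remain readable with dmesg. The sketch below assumes the standard /proc/sys/kernel/printk interface, whose first field is the console log level. The helper name set_console_loglevel and its optional file argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: set the console log level.
# $1 = level (1 lets only emergency messages, priority 0, through)
# $2 = printk control file (optional, defaults to the real one)
set_console_loglevel() {
    level=$1
    file=${2:-/proc/sys/kernel/printk}
    # Writing a single number updates the first field (console log level).
    echo "$level" > "$file"
    # Print the first field back so the change can be checked.
    awk '{ print $1; exit }' "$file"
}
```

Run as root, `set_console_loglevel 1` has the same intent as `dmesg -n 1`. The setting does not persist across reboots.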

Hi @Vitaliy

I tried to reproduce the issue on my side.
I used the latest image.

Boot to Qt for Embedded Linux 3.0.2 b2qt-apalis-imx8 ttyLP1
root@b2qt-apalis-imx8:~# uname -a
Linux b2qt-apalis-imx8 4.14.159-0+git.fff496c2a1bd #1 SMP PREEMPT Tue Apr 14 17:50:14 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

Using our USB CAN adapter, I send RTR frames with the minimum specified interframe spacing. The timing between frames is 47.48 us, confirmed with an oscilloscope.
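
As a rough cross-check of that timing: a classical CAN remote frame with an 11-bit identifier is 44 bits long before bit stuffing, and the minimum intermission adds 3 more bits. The sketch below assumes a standard (11-bit) identifier and ignores stuff bits, whose number depends on the identifier; both are assumptions, since the post does not state the frame format.

```sh
#!/bin/sh
# Bit count of a classical CAN remote frame, standard (11-bit) ID, no stuff bits.
sof=1; id=11; rtr=1; ide=1; r0=1; dlc=4
crc=15; crc_del=1; ack=1; ack_del=1; eof=7
ifs=3                      # minimum intermission between frames

frame_bits=$((sof + id + rtr + ide + r0 + dlc + crc + crc_del + ack + ack_del + eof))
period_bits=$((frame_bits + ifs))

bitrate=1000000            # bit/s, as configured with "ip link"
frames_per_s=$((bitrate / period_bits))   # integer division, rounds down

echo "frame_bits=$frame_bits period_bits=$period_bits frames_per_s=$frames_per_s"
```

At 1 Mbit/s one bit takes 1 us, so the nominal frame period is 47 us, which is consistent with the oscilloscope measurement and corresponds to roughly 21,000 frames per second for the receiver to handle. Stuff bits lengthen a frame slightly.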

On the Apalis iMX8, the image is freshly booted: the Qt demo launcher shows the available demos, the can0 interface is up, and no userspace application consumes the CAN frames, so SocketCAN silently discards each frame at its highest layer. CAN is set up with ‘ip link set can0 up type can bitrate 1000000’.

The can0 interface statistics show all error counters at 0, i.e. no overruns in hardware and no dropped frames between the driver and the higher SocketCAN layers.

If I repeat the test but pass the frames to userspace with ‘candump can0 > /dev/null’, the dropped counter counts up, by maybe 10…50 frames per 1’000’000 frames received.

If I increase candump’s priority and keep the CPU clocks at their highest frequency, I do not see any dropped messages.

nice -n -20 candump can0 > /dev/null
echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 1200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
echo userspace > /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
echo 1596000 > /sys/devices/system/cpu/cpu4/cpufreq/scaling_setspeed
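
Whether these settings took effect can be checked by reading the values back. This is a minimal sketch, assuming the per-CPU cpufreq directories used above expose scaling_governor and scaling_cur_freq (standard cpufreq sysfs attributes, with the frequency in kHz). The function name show_cpufreq and its optional root argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print governor and current frequency for each CPU.
# $1 = CPU sysfs root (optional, defaults to the usual location)
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without a readable cpufreq directory.
        [ -r "$dir/scaling_governor" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        echo "$cpu $(cat "$dir/scaling_governor") $(cat "$dir/scaling_cur_freq") kHz"
    done
}
```

Calling `show_cpufreq` after the commands above should list userspace as the governor for the CPUs that were configured, together with the frequency currently in use.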

If, instead of the RTR frames used in the tests above, I send data frames with a 1-byte payload, I do not see any dropped frames, even without giving candump a higher priority or keeping the CPUs at their highest operating point.


Which exact software and hardware versions are you using?
Do you really have no userspace application reading the received CAN frames?
What happens if you send CAN frames with a 1-byte payload instead of RTR frames?

Max

Hi, @max.tx
Today I set up my Apalis Evaluation Board v1.0A with the Apalis iMX8QM 4GB WB v1.0B SOM. I connected a Full HD monitor through the DVI connector and attached a standard USB mouse and keyboard.
From TEZI, I flashed the new boot2qt demo image to the SOM. It is the same as the one you described:

root@b2qt-apalis-imx8:~# uname -a
Linux b2qt-apalis-imx8 4.14.159-0+git.fff496c2a1bd #1 SMP PREEMPT Tue Apr 14 17:50:14 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

Right after flashing, I connected to the Apalis over ssh, connected the CAN cable to the other node (which generates the CAN traffic), and set up CAN with the command ‘ip link set can0 up type can bitrate 1000000’. The CAN cable from the Apalis Evaluation Board to the RM48USB (the other CAN node) is 2.8 m long. I also soldered a 120 Ohm termination resistor between the CANH and CANL pins of connector X32.
Next, I started generating CAN traffic on the RM48USB with SFF ID 0x565 and a 0-byte payload.
Then I started the Qt E-Bike demo and interacted with it for a while.
After that, I stopped the CAN traffic on the RM48USB and read the number of packets generated by the RM48USB MCU: 1’358’009.
Finally, I checked the status of the CAN bus on the Apalis:

root@b2qt-apalis-imx8:~# ip -s -s -d -c link show can0
2: can0:  mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0 
	  bitrate 1000000 sample-point 0.750 
	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
	  flexcan: tseg1 2..64 tseg2 1..32 sjw 1..32 brp 1..1024 brp-inc 1
	  flexcan: dtseg1 1..39 dtseg2 1..8 dsjw 1..8 dbrp 1..1024 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    0          1357924  0       85      0       0       
    RX errors: length   crc     frame   fifo    missed
               0        0       0       85      0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       1       
root@b2qt-apalis-imx8:~# 

The RM48USB uses an SN65HVD232Q CAN transceiver. The Apalis board uses ADUM1301+MCP2551. I assume the hardware is all OK.

So, after a short run of the E-Bike demo, I saw packet loss without any userspace application working with the CAN bus.
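
To put those counters in proportion, the loss rate can be worked out from the numbers above: 1358009 frames sent, 1357924 received, and 85 counted as dropped. This is a small sketch using integer shell arithmetic; the variable names are illustrative.

```sh
#!/bin/sh
sent=1358009        # frames generated by the RM48USB
received=1357924    # RX packets reported for can0
dropped=85          # RX dropped reported for can0

missing=$((sent - received))
# Parts per million, rounded down by integer division.
ppm=$((dropped * 1000000 / sent))

echo "missing=$missing dropped=$dropped ppm=$ppm"
```

The difference between sent and received frames equals the dropped counter, so every missing frame is accounted for by that counter. The rate is roughly 62 frames per million, slightly above the 10…50 per million reported earlier for the candump test.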

Is there something I can do? I think it would be a good idea to move the CAN bus handling to the standalone Cortex-M4 core and use IPC to pass the CAN data from one core to the other.

Hello, @stefan.tx

Using your last command, I see fewer errors.
After a little stress testing (flooding Ethernet with “ping -f” while running the Qt demo software), I sent 5729317 packets and got only 5 errors:

root@b2qt-apalis-imx8:~# ip -s -s -d -c link show can0
2: can0:  mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0 
	  bitrate 1000000 sample-point 0.750 
	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
	  flexcan: tseg1 2..64 tseg2 1..32 sjw 1..32 brp 1..1024 brp-inc 1
	  flexcan: dtseg1 1..39 dtseg2 1..8 dsjw 1..8 dbrp 1..1024 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    0          5729312  5       0       5       0       
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       1       
root@b2qt-apalis-imx8:~# 

There are no additional messages in the “dmesg” output. Should I connect over a serial cable instead of ssh?
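
For repeated runs like these, the RX counters can be pulled out of the ip output with a short filter instead of being read by eye. The sketch below assumes the column layout shown in the outputs in this thread (bytes, packets, errors, dropped, overrun, mcast); newer iproute2 releases arrange the columns differently, so the field positions would need adjusting there. The function name rx_counters is hypothetical.

```sh
#!/bin/sh
# Hypothetical filter: read "ip -s link show <dev>" output on stdin and
# print the RX counters from the line that follows the "RX: bytes" header.
rx_counters() {
    awk '/RX: bytes/ {
        if ((getline line) > 0) {
            # Fields: bytes packets errors dropped overrun mcast
            split(line, f, " ")
            print "packets=" f[2], "errors=" f[3], "dropped=" f[4], "overrun=" f[5]
        }
        exit
    }'
}
```

A typical call would be `ip -s link show can0 | rx_counters`, without the -c option so that no color escape sequences end up in the input.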

Hi, @max.tx

Here are further statistics after 22 minutes of running the standard Qt demo without any interaction from me.
For this run, I ran the command “echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor” before generating CAN traffic.

root@b2qt-apalis-imx8:~# ip -s -s -d -c link show can0
2: can0:  mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0 
	  bitrate 1000000 sample-point 0.750 
	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
	  flexcan: tseg1 2..64 tseg2 1..32 sjw 1..32 brp 1..1024 brp-inc 1
	  flexcan: dtseg1 1..39 dtseg2 1..8 dsjw 1..8 dbrp 1..1024 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    0          28885701 101     1729    101     0       
    RX errors: length   crc     frame   fifo    missed
               0        0       0       1729    0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       1       
root@b2qt-apalis-imx8:~# 

In total, 28889042 packets were sent.
Regards,
Vitaliy

Hi @Vitaliy

I don’t see any errors in the statistics. Did you implement an error handler in your application?

Best regards,
Jaski

Hi @jaski.tx What do you mean? I don’t have an application of my own. I ran the standard Qt demo program, which does not use the CAN bus at all. Regards, Vitaliy

Thanks for your input. Is the issue solved now?

Best regards,
Jaski

Thank you for the clue. In our case, we were seeing CAN FD frame overruns on an imx8qxp with SCFW debug enabled. With the SCFW debug UART disabled, everything works fine. Changing scaling_governor to performance also worked around the issue, though disabling debug was the actual fix.