Hi,
We have a customer who claims to be losing too many packets on CAN receive with the Apalis T30. His first test was to send messages from can0 to can1:
ip link set dev can0 txqueuelen 1000 up type can bitrate 250000 loopback off listen-only off
ip link set dev can1 txqueuelen 1000 up type can bitrate 250000 loopback off listen-only off
candump can1 --error -o can1dump.txt -d
cansend can0 -i 0x01ff0008 -e -v --loop=100000 0x11 0x22 0x33 0x44 0x55 0x66 0x77 0x88
MSG=$(grep "0x01ff0008" can1dump.txt | wc -l)
ERR=$(grep "0x004" can1dump.txt | wc -l)
echo "Received $MSG can messages and $ERR errors"
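For reference, the counting step can be exercised offline against a synthetic dump file (the log line format below is made up purely for illustration; only the grep patterns matter, and `grep -c` is equivalent to `grep | wc -l`):

```shell
# Build a small synthetic candump log: 3 data frames, 1 error frame
cat > /tmp/fake_can1dump.txt <<'EOF'
<0x01ff0008> [8] 11 22 33 44 55 66 77 88
<0x01ff0008> [8] 11 22 33 44 55 66 77 88
<0x01ff0008> [8] 11 22 33 44 55 66 77 88
<0x004> error frame
EOF

# Count received data frames and error frames
MSG=$(grep -c "0x01ff0008" /tmp/fake_can1dump.txt)
ERR=$(grep -c "0x004" /tmp/fake_can1dump.txt)
echo "Received $MSG can messages and $ERR errors"
```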
It was noticed that writing the candump log to tmpfs prevented some messages from being lost, so the test was run again, slightly modified to log to /tmp/can2dump.txt, five times in each of four scenarios: 1) no stress; 2) CPU stress; 3) CPU and I/O stress; 4) CPU, I/O and VM stress. Please see the test script. The results were:
CPU, I/O and VM idle:
Test 1 - 99999 lines - 99998 success - 1 error
Test 2 - 100000 lines - 100000 success - 0 errors
Test 3 - 100001 lines - 100000 success - 1 error
Test 4 - 100000 lines - 100000 success - 0 errors
Test 5 - 100000 lines - 100000 success - 0 errors
CPU under stress:
Test 1 - 100000 lines - 100000 success - 0 errors
Test 2 - 100000 lines - 100000 success - 0 errors
Test 3 - 100013 lines - 99997 success - 16 errors
Test 4 - 100014 lines - 99997 success - 17 errors
Test 5 - 100014 lines - 99993 success - 21 errors
CPU and I/O under stress:
Test 1 - 100017 lines - 99950 success - 67 errors
Test 2 - 100014 lines - 99954 success - 60 errors
Test 3 - 99999 lines - 99931 success - 68 errors
Test 4 - 100008 lines - 99903 success - 105 errors
Test 5 - 100016 lines - 99952 success - 64 errors
CPU, I/O and VM under stress:
Test 1 - 100015 lines - 99928 success - 87 errors
Test 2 - 100031 lines - 99935 success - 96 errors
Test 3 - 100029 lines - 99920 success - 109 errors
Test 4 - 100010 lines - 99917 success - 93 errors
Test 5 - 100025 lines - 99897 success - 128 errors
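The stress loads above were presumably produced with something like the stress(1) tool; the exact parameters are in the customer's test script, which we don't have, so the worker counts below are only illustrative:

```shell
stress --cpu 4                 # scenario 2: CPU load only
stress --cpu 4 --io 2          # scenario 3: CPU + I/O load
stress --cpu 4 --io 2 --vm 2   # scenario 4: CPU + I/O + VM load
```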
After that the customer rebuilt the kernel with the CAN driver as a module and claims that kernel interrupt latency is responsible for the lost messages. He changed the driver to poll continuously with a 10 us sleep while idle, which reduced message loss from 30% to 0.072%, at the cost of heavily loading the CPU.
Is there a better way to reduce the number of dropped messages that would not compromise CPU performance so heavily, such as adjusting a CAN buffer size?
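One thing that might be worth checking before patching the driver: candump receives frames through a SocketCAN raw socket, and frames are silently dropped when that socket's receive queue overflows while userspace is held off (which would match the correlation with I/O stress and with logging to tmpfs). The per-socket queue is bounded by the net.core.rmem_* sysctls, so enlarging them may absorb the latency spikes; the values below are illustrative, not tuned:

```shell
# Raise the ceiling and default for socket receive buffers (run as root)
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.rmem_default=1048576

# Also check whether the drops happen at the interface/driver level instead
ip -details -statistics link show can1
```

If the interface statistics show RX overruns, the loss is in the controller/driver rather than the socket queue, and the socket buffer tuning will not help.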
Best regards