Hi,
We have a customer who claims to be losing too many packets on CAN receive with the Apalis T30. His first test was to send messages from can0 to can1:
ip link set dev can0 txqueuelen 1000 up type can bitrate 250000 loopback off listen-only off
ip link set dev can1 txqueuelen 1000 up type can bitrate 250000 loopback off listen-only off
candump can1 --error -o can1dump.txt -d
cansend can0 -i 0x01ff0008 -e -v --loop=100000 0x11 0x22 0x33 0x44 0x55 0x66 0x77 0x88
MSG=$(grep "0x01ff0008" can1dump.txt | wc -l)
ERR=$(grep "0x004" can1dump.txt | wc -l)
echo "Received $MSG can messages and $ERR errors"
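For reference, the counting step can be exercised offline against a synthetic dump file (the log line format below is made up purely for illustration; only the grep patterns matter, and `grep -c` is equivalent to `grep | wc -l`):

```shell
# Build a small synthetic candump log: 3 data frames, 1 error frame
cat > /tmp/fake_can1dump.txt <<'EOF'
<0x01ff0008> [8] 11 22 33 44 55 66 77 88
<0x01ff0008> [8] 11 22 33 44 55 66 77 88
<0x01ff0008> [8] 11 22 33 44 55 66 77 88
<0x004> error frame
EOF

# Count received data frames and error frames
MSG=$(grep -c "0x01ff0008" /tmp/fake_can1dump.txt)
ERR=$(grep -c "0x004" /tmp/fake_can1dump.txt)
echo "Received $MSG can messages and $ERR errors"
```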
It was noticed that writing the candump log to tmpfs prevented some messages from being lost, so the test was run again, slightly modified to log to /tmp/can2dump.txt, five times in each of four scenarios: 1) no stress; 2) CPU stress; 3) CPU and I/O stress; 4) CPU, I/O and VM stress. Please see the test script. The results were:
CPU, I/O and VM idle:
Test 1 - 99999 lines - 99998 success - 1 error
Test 2 - 100000 lines - 100000 success - 0 errors
Test 3 - 100001 lines - 100000 success - 1 error
Test 4 - 100000 lines - 100000 success - 0 errors
Test 5 - 100000 lines - 100000 success - 0 errors
CPU under stress:
Test 1 - 100000 lines - 100000 success - 0 errors
Test 2 - 100000 lines - 100000 success - 0 errors
Test 3 - 100013 lines - 99997 success - 16 errors
Test 4 - 100014 lines - 99997 success - 17 errors
Test 5 - 100014 lines - 99993 success - 21 errors
CPU and I/O under stress:
Test 1 - 100017 lines - 99950 success - 67 errors
Test 2 - 100014 lines - 99954 success - 60 errors
Test 3 - 99999 lines - 99931 success - 68 errors
Test 4 - 100008 lines - 99903 success - 105 errors
Test 5 - 100016 lines - 99952 success - 64 errors
CPU, I/O and VM under stress:
Test 1 - 100015 lines - 99928 success - 87 errors
Test 2 - 100031 lines - 99935 success - 96 errors
Test 3 - 100029 lines - 99920 success - 109 errors
Test 4 - 100010 lines - 99917 success - 93 errors
Test 5 - 100025 lines - 99897 success - 128 errors
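The stress loads above were presumably produced with something like the stress(1) tool; the exact parameters are in the customer's test script, which we don't have, so the worker counts below are only illustrative:

```shell
stress --cpu 4                 # scenario 2: CPU load only
stress --cpu 4 --io 2          # scenario 3: CPU + I/O load
stress --cpu 4 --io 2 --vm 2   # scenario 4: CPU + I/O + VM load
```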
After that the customer rebuilt the kernel with the CAN driver as a module and claims that kernel interrupt latency is responsible for the lost messages. He changed the driver to poll continuously with a 10 us sleep while idle, which reduced message loss from 30% to 0.072%, at the cost of heavily loading the CPU.
Is there a better way to reduce the number of dropped messages that would not compromise CPU performance so heavily, such as adjusting a CAN buffer size?
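One thing that might be worth checking before patching the driver: candump receives frames through a SocketCAN raw socket, and frames are silently dropped when that socket's receive queue overflows while userspace is held off (which would match the correlation with I/O stress and with logging to tmpfs). The per-socket queue is bounded by the net.core.rmem_* sysctls, so enlarging them may absorb the latency spikes; the values below are illustrative, not tuned:

```shell
# Raise the ceiling and default for socket receive buffers (run as root)
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.rmem_default=1048576

# Also check whether the drops happen at the interface/driver level instead
ip -details -statistics link show can1
```

If the interface statistics show RX overruns, the loss is in the controller/driver rather than the socket queue, and the socket buffer tuning will not help.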
Best regards