After calling ifdown can1, the FlexCAN driver keeps repeatedly broadcasting the last unacknowledged message, which in some cases causes our peripherals to freeze. Is there a way to force the FlexCAN driver to stop sending messages completely while the CAN interface is down?
Apalis IMX8QM
Debian 10.12 w/ kernel from the "linux-toradex" repository ("toradex_5.4-2.3.x-imx" branch)
Default FlexCAN driver included with the kernel
CAN bus terminated from both sides (120 Ohm)
We have our own custom carrier board. The CAN device tree configuration was not changed.
Detailed description
The problem:
The FlexCAN driver/controller enters an error state after invoking “ifdown can1” and then rebooting the system without running “ifup can1” before the reboot
Before the reboot, an endless chain of unacknowledged messages is generated while the interface is down. I presume that this is the expected behavior
After the reboot, sending a message (and thus initializing the interface) causes it to send the same message (the oldest one in the queue) over and over again, every 120 microseconds
This is the exact opposite of the behavior described above: the unacknowledged messages are generated while the interface is up, not down
All CAN communication on the interface is stuck in this state. No other messages can be sent over the bus (analyzed using a Saleae logic analyzer and an oscilloscope). The only visible message is the one being sent repeatedly.
This most likely causes some of our CAN-based peripherals to freeze (the FEIG VEK S4C does not even react to its physical mode/reset button)
Invoking “ifdown can1 && reboot” is enough to reproduce the issue
A software restart doesn’t fix the issue; one of the devices has to be powered off and back on (either the peripheral or the SBC)
I also tried using different kernel/FlexCAN driver versions
Performing “ifdown can1 && sleep 10 && ifup can1 && reboot” does generate unacknowledged messages, but after the reboot, the system works correctly
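The last observation suggests a stopgap: make sure can1 is up before rebooting. Below is a minimal sketch of such a pre-reboot check, assuming it runs as root on the Debian image and that can1 is configured for ifup in /etc/network/interfaces. It reads the interface flags from sysfs and calls ifup only when the IFF_UP bit is clear. The function names are hypothetical, and this only mirrors the working sequence above; it does not address why the bus misbehaves.

```python
import subprocess
from pathlib import Path

IFF_UP = 0x1  # "interface is up" bit, as defined in <linux/if.h>


def interface_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the IFF_UP flag is set for `iface` (read from sysfs)."""
    flags_text = Path(sysfs_root, iface, "flags").read_text().strip()  # e.g. "0xc1"
    return bool(int(flags_text, 16) & IFF_UP)


def ensure_up_before_reboot(iface: str = "can1",
                            sysfs_root: str = "/sys/class/net",
                            run=subprocess.run) -> bool:
    """Bring `iface` up with ifup if it is currently down.

    Returns True if ifup was invoked. `run` is injectable so the logic can be
    tested without touching the real interface.
    """
    if interface_is_up(iface, sysfs_root):
        return False
    run(["ifup", iface], check=True)  # raises CalledProcessError if ifup fails
    return True
```

Usage would be calling ensure_up_before_reboot("can1") from whatever shutdown hook you already have, right before the reboot command.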
Potential issues:
There is a chain of unacknowledged messages before the system fully boots up and the network interfaces go up
This is the same as if “ifdown can1” was invoked
A short logic-level pulse appears on the CAN N wire 60 milliseconds after shutdown (Low → High (~2.5 ms) → Low)
The only related messages in “dmesg” are:
“flexcan 5a8e0000.can can1: bit-timing not yet defined”
“flexcan 5a8e0000.can can1: New error state: 1”
“flexcan 5a8e0000.can can1: New error state: 2”
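To read these log lines, here is a small sketch that decodes the numbers, assuming they are values of the kernel's enum can_state (include/uapi/linux/can/netlink.h: 0 = error-active, 1 = error-warning, 2 = error-passive, 3 = bus-off, 4 = stopped, 5 = sleeping). If that assumption holds, can1 went error-warning and then error-passive. Under ISO 11898-1 fault confinement, an error-passive transmitter whose frames simply go unacknowledged normally stops incrementing its transmit error counter, so it never reaches bus-off and keeps retransmitting, which would be consistent with the endless repetition described above. The function name is hypothetical.

```python
import re

# Values of enum can_state (assumed to be what the driver prints)
CAN_STATES = {
    0: "ERROR-ACTIVE",
    1: "ERROR-WARNING",
    2: "ERROR-PASSIVE",
    3: "BUS-OFF",
    4: "STOPPED",
    5: "SLEEPING",
}

# Matches "<device> <iface>: New error state: <n>" anywhere in a dmesg line
_STATE_RE = re.compile(r"(\S+) (\S+): New error state: (\d+)")


def decode_error_states(dmesg_text: str):
    """Return a list of (device, interface, state_name) for each state change."""
    result = []
    for match in _STATE_RE.finditer(dmesg_text):
        device, iface, value = match.group(1), match.group(2), int(match.group(3))
        result.append((device, iface, CAN_STATES.get(value, f"UNKNOWN({value})")))
    return result
```

For example, feeding it the three dmesg lines above yields an ERROR-WARNING entry followed by an ERROR-PASSIVE entry for can1; the bit-timing line is ignored.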
I suspect that you don’t have a pull-up resistor on CANTX and that your i.MX8QM perhaps pulls CANTX down by default. This puts a long dominant level on the bus. Isn’t this your chain of unacknowledged messages? At least in your waveform I see a long low level after startup, after the reboot and before the reboot. This is wrong and should be fixed.
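One way to check a capture for this is to search it for dominant (low) stretches on CANTX that are far longer than anything the protocol produces. Below is a minimal sketch, assuming the Saleae capture has already been converted into a sorted list of (timestamp in seconds, level) edges with 0 = dominant and 1 = recessive; that format, the function name and the bitrate you pass are assumptions, since the bitrate is not stated in the thread. The default threshold of 12 bit times is a rough assumption based on an overlapped error flag being 6 to 12 dominant bits long.

```python
from typing import List, Tuple


def long_dominant_runs(edges: List[Tuple[float, int]], end_time: float,
                       bitrate: float, max_bits: float = 12.0) -> List[Tuple[float, float]]:
    """Return (start, duration) of dominant intervals longer than max_bits bit times.

    edges: chronologically sorted (timestamp_s, level) pairs, one per level change,
           where level 0 = dominant and 1 = recessive on CANTX.
    end_time: timestamp at which the capture ends (closes the last interval).
    """
    limit = max_bits / bitrate  # longest dominant run we tolerate, in seconds
    runs = []
    for i, (start, level) in enumerate(edges):
        stop = edges[i + 1][0] if i + 1 < len(edges) else end_time
        if level == 0 and stop - start > limit:
            runs.append((start, stop - start))
    return runs
```

With a missing pull-up, one would expect a long run to show up right after power-on, lasting until the pin is configured.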
Also, if you look at the module datasheet (https://docs.toradex.com/105526-apalis-imx8-datasheet.pdf), you can see that the pin reset state is marked as PD (pull-down) for SODIMM 14, for instance. Could you please have a look at it and check whether this helps with your issue?
Thanks again, and thanks @Edward for your valuable help once more.
Thanks to both @Edward and @gclaudino.tx for your comments and advice. I have sent a request to our HW designers to check this and I will let you know.
In the meantime, we looked at the documentation (i.MX Device Tree Pinmux Settings Guide for System on Module) and tried setting the Pull field (bits 6-5) to the UP state.
Previously the field was set to Down. After changing it to UP, we did not see any improvement.
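For reference, "Field 6-5" refers to a two-bit field inside the pad configuration value of the pinctrl entry. Below is a minimal sketch of reading and replacing such a field; the helper names are hypothetical, and the numeric code that selects UP (or Down) must be taken from the Toradex pinmux guide, since it is not given here.

```python
def get_bit_field(value: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of `value`, shifted down to bit 0."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def set_bit_field(value: int, high: int, low: int, field: int) -> int:
    """Return `value` with bits high..low (inclusive) replaced by `field`."""
    width = high - low + 1
    if field >> width:
        raise ValueError(f"field {field:#x} does not fit in {width} bits")
    mask = ((1 << width) - 1) << low
    return (value & ~mask) | (field << low)
```

Usage would look like new_cfg = set_bit_field(old_cfg, 6, 5, PULL_UP_CODE), where PULL_UP_CODE is a placeholder for the value the guide specifies for UP.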
The device tree won’t solve the missing pull-up issue. If the default pad pull direction is down, it still takes quite a long time to start the kernel and configure the pins as specified in the device tree, and CANTX stays at 0 for that entire time. Software-wise, you could try to shorten the unwanted CANTX=0 period at power-up by enabling the pull-up early in U-Boot, but this still won’t eliminate the CANTX=0 pulse.