Apalis iMX8QM FlexCAN issue

mach · June 23, 2022, 2:30pm

After calling ifdown can1, the FlexCAN driver keeps repeatedly broadcasting the last unacknowledged message, which in some cases causes our peripherals to freeze. Is there a way to force the FlexCAN driver to stop sending messages completely while the CAN interface is down?

Apalis IMX8QM
Debian 10.12 w/ kernel from the "linux-toradex" repository ("toradex_5.4-2.3.x-imx" branch)
Default FlexCAN driver included with the kernel
CAN bus terminated from both sides (120 Ohm)

gclaudino.tx · June 23, 2022, 4:39pm

Dear @mach, how are you?

Welcome to our community! Please feel free to roam around.

Can you please share with us more details about what you have done?

Which exact Apalis iMX8QM version are you using?
Which BSP and version are you using?
Are you using one of our carrier boards or a custom one? If it’s one of our boards, did you change the device tree?
What tests have you performed so far?

Best regards,

mach · June 27, 2022, 6:47am

Dear @gclaudino.tx,
Here are my answers:

Apalis iMX8QM ver. 1.1.b
We do not use a specific BSP. After installing Debian 10.12, we use the kernel from the linux-toradex repo (linux-toradex.git - Linux kernel for Apalis, Colibri and Verdin modules).
We have our own custom carrier board. The CAN device tree was not changed.

Detailed description:
The problem:

The FlexCAN driver/controller enters an error state after invoking “ifdown can1” and rebooting the system without running “ifup can1” before the reboot
Before the reboot, an endless chain of unacknowledged messages is being generated while the interface is down. I presume that this is the expected behavior
After the reboot, sending a message (and thus initializing the interface) causes it to keep sending the same message (the oldest one in the queue) over and over again, every 120 microseconds
This is the exact opposite of the behavior described above, as the unacknowledged messages are being generated while the interface is up, not down
All CAN communication on the interface is stuck in this state. No other messages can be sent over the bus (analyzed using saleae and an oscilloscope). The only visible message is the one being sent repeatedly.
This most likely causes some of our CAN-based peripherals to freeze (the FEIG VEK S4C does not even react to its physical mode/reset button)
Invoking “ifdown can1 && reboot” is enough to reproduce the issue
A software restart doesn’t fix the issue; one of the devices has to be powered off and back on (either the peripheral or the SBC)
I also tried using different kernel/FlexCAN driver versions
Performing “ifdown can1 && sleep 10 && ifup can1 && reboot” does generate unacknowledged messages, but after the reboot, the system works correctly

Potential issues:

There is a chain of unacknowledged messages before the system fully boots up and the network interfaces go up
This is the same as if “ifdown can1” were invoked
A short logic-level pulse occurs on the CAN N wire 60 milliseconds after shutdown (Low → High (~2.5 ms) → Low)

The only related messages in “dmesg” are:
“flexcan 5a8e0000.can can1: bit-timing not yet defined”
“flexcan 5a8e0000.can can1: New error state: 1”
“flexcan 5a8e0000.can can1: New error state: 2”

Edward · June 27, 2022, 7:51am

Hello @mach,

I suspect that you don’t have a pull-up resistor on CANTX and that your iMX8QM perhaps pulls CANTX down by default. This puts a long dominant level on the bus. Isn’t this your chain of unacknowledged messages? At least in your waveform I see a long low level after startup, after reboot and before reboot. This is wrong and should be fixed.
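Edward's point rests on how CAN resolves bus levels: a dominant bit (logical 0) overrides a recessive bit (logical 1), so a single node holding its TX line low makes the whole bus read dominant. The Python sketch below models this wired-AND behavior under idealized assumptions (perfect transceivers, one bit time at a time); the names `bus_level` and `visible_bits` are hypothetical and not part of any CAN library.

```python
# Idealized model of CAN bus levels (assumes perfect transceivers and
# looks at one bit time at a time).
DOMINANT = 0   # logical 0: wins whenever any node drives it
RECESSIVE = 1  # logical 1: only seen if every node is recessive


def bus_level(tx_levels):
    """Resolve the bus level from the TX levels of all nodes (wired-AND)."""
    return DOMINANT if DOMINANT in tx_levels else RECESSIVE


def visible_bits(frame_bits, som_tx_level):
    """Bits observed on the bus when a peripheral sends `frame_bits` while the
    SoM's CANTX is held at a fixed `som_tx_level` (hypothetical scenario)."""
    return [bus_level([bit, som_tx_level]) for bit in frame_bits]


frame = [0, 1, 1, 0, 1]
print(visible_bits(frame, RECESSIVE))  # [0, 1, 1, 0, 1]  CANTX pulled up
print(visible_bits(frame, DOMINANT))   # [0, 0, 0, 0, 0]  CANTX held low
```

With CANTX held low, every recessive bit a peripheral sends is masked and the bus stays dominant, which is consistent with the long low level Edward points out in the waveform.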

gclaudino.tx · June 27, 2022, 4:35pm

Hi @mach, how are you?

As @Edward said, you should check that you have a pull-up resistor on CAN TX, as shown in our carrier board design guide (https://docs.toradex.com/101123-apalis-arm-carrier-board-design-guide.pdf).
The following screenshot shows the 3.3 V pull-up resistor on CAN1_TX.

Also, if you look at the module datasheet (https://docs.toradex.com/105526-apalis-imx8-datasheet.pdf), you can see that the pin reset state is marked as PD (pull-down) for SODIMM pin 14, for instance. Could you please have a look and check whether this helps with your issue?

Thanks again and thanks @Edward for another precious help.

Best regards,

mach · June 28, 2022, 6:32am

Thanks to both of you, @Edward and @gclaudino.tx, for your comments and advice. I have sent a request to our HW designers to check this and will let you know.

In the meantime, we looked at the documentation (i.MX Device Tree Pinmux Settings Guide for System on Module) and tried to set the Pull field (bits 6-5) to the UP state.
Previously the field was set to Down. After changing it to UP we did not see any improvement.
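The pinmux change described above amounts to rewriting a 2-bit field (bits 6-5) inside the pad configuration value. The Python sketch below shows that bit manipulation. The pull encodings used (00 keeper, 01 up, 10 down, 11 none) are an assumption based on the i.MX8 SCFW pad pull-select values and should be checked against the pinmux settings guide; the example value 0x41 is hypothetical.

```python
# Sketch of editing the Pull field (bits 6-5) of a pad configuration word.
# ASSUMPTION: the 2-bit encoding below mirrors the i.MX8 SCFW pad pull
# select values; verify it against the pinmux settings guide before use.
PULL_SHIFT = 5
PULL_MASK = 0b11 << PULL_SHIFT  # 0x60, i.e. bits 6-5

PULL_KEEPER = 0b00
PULL_UP = 0b01
PULL_DOWN = 0b10
PULL_NONE = 0b11


def get_pull(pad_config: int) -> int:
    """Extract the 2-bit Pull field from a pad configuration value."""
    return (pad_config & PULL_MASK) >> PULL_SHIFT


def set_pull(pad_config: int, pull: int) -> int:
    """Return pad_config with bits 6-5 replaced by `pull`, other bits untouched."""
    if not 0 <= pull <= 0b11:
        raise ValueError("pull must be a 2-bit value")
    return (pad_config & ~PULL_MASK) | (pull << PULL_SHIFT)


# Hypothetical pad value with the Pull field set to Down.
old = 0x41
new = set_pull(old, PULL_UP)
print(hex(old), "->", hex(new))  # 0x41 -> 0x21
```

Note that this setting only takes effect once the pins have actually been configured during boot, so it cannot change the pad's reset-state pull on its own.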

Best regards,
Vaclav

Edward · June 28, 2022, 7:10am

The device tree won’t solve the missing pull-up issue. If the default pad pull direction is down, it will still take quite a long time to launch the kernel and configure the pins as instructed in the DT, and all that time CANTX=0. Software-wise, you could try to shorten the unwanted CANTX=0 period at power-up by enabling the pull-up early in U-Boot, but this still won’t eliminate the CANTX=0 pulse.

gclaudino.tx · July 7, 2022, 11:23am

Dear @mach,

Do you have any news on this topic?

mach · July 18, 2022, 1:38pm

Hello,
thank you for sharing all the ideas. We have added the pull-up resistor on CANTX and the problem is solved.
Best regards!

gclaudino.tx · July 18, 2022, 1:44pm

Hi @mach,

I’m glad it’s working! Also, thanks again @Edward for sharing your suggestion.

Best regards,