Colibri IMX8X receiving some CAN messages but not others

Hi,

I have a very confusing issue and I was wondering if anyone else has seen anything similar.

We have a custom Colibri carrier board that has used the Colibri IMX6 for years. I was recently doing some testing with the Colibri IMX8X on the same board for potential future upgrades. On our board, SODIMM pins 55 and 63 go to an external CAN transceiver that talks to a variety of devices.

With the IMX6 installed, it can talk to two test devices fine. With the IMX8 installed, it will only receive data from one of the two devices. I have scoped out SODIMM pin 63 and also used a CAN analyzer and have seen that both external devices are definitely sending, but only one of the two will show any increase in RX packets on ifconfig can0. Both external test devices send a large variety of messages, and the IMX8 will receive all messages from A but zero from B.

Trace of device A sending (reads fine in linux) at SODIMM pin 63:

Trace of device B sending (never reads in linux) at SODIMM pin 63:

Note that in the latter image I also have a CAN analyzer attached. Without the analyzer running, the IMX will not pull the ACK bit low for device B (though it will ACK for device A); with the analyzer attached, the analyzer itself pulls the ACK bit low.

In the second screenshot we can see there is clearly valid CAN data on pin 63, but for some reason the driver is not receiving it. In ifconfig, none of the RX packets, bytes received, errors, or dropped frames ever increase with device B attached. I see in the IMX8 reference manual that there is some option for peripheral-level filtering, but I don’t see why this would be turned on with a reference image, and I’m not sure how to check. As seen, at a low level the working messages are very similar to the non-working messages.

HW version: IMX8QXP 2GB IT V1.0D
Distro version: 6.4.0
Image derived from “tdx-reference-multimedia-image”

I did some further testing with the CAN analyzer and found that the IMX8X would receive the same message content as was sent by the ‘bad’ device if the analyzer sent it, so this must be a hardware/baud issue on the remote device and not a filtering issue on the IMX.

I saw that the ‘bad’ device was running the bus at a slightly higher rate, and sure enough, when I changed the IMX8X CAN bitrate from 250 kHz to 253 kHz it started to read the messages. The frequency tolerance must be very narrow.

A final note in case people in the future have similar issues with CAN bit timing.

The CAN protocol divides a bit time into four segments: SYNC, PROP, PHASE1, PHASE2. The edge is supposed to fall within the SYNC segment; otherwise the receiver can try to resynchronize, depending on how far off the edge is.

On the IMX8, if you set the CAN bitrate to 250,000, the timings will automatically become:
Tq (time quanta) 50 ns
Prop 37 tq
Phase1 32 tq
Phase2 10 tq

Sync is required to be 1 tq based on the CAN standard, so the total is 80 tq = 4000 ns, i.e. 250 kHz as expected. However, in this case the tq is so short that the window in which the edge is expected becomes very narrow.
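For anyone who wants to check this arithmetic on their own settings, here is a minimal Python sketch. The function name `bit_timing` is made up for illustration; the inputs are the values quoted above, and the sample point is computed on the usual assumption that the bit is sampled at the end of PHASE1.

```python
def bit_timing(tq_ns, prop_seg, phase_seg1, phase_seg2, sync_seg=1):
    """Return (total_tq, bit_time_ns, bitrate_hz, sample_point) for one CAN bit."""
    total_tq = sync_seg + prop_seg + phase_seg1 + phase_seg2
    bit_time_ns = total_tq * tq_ns
    bitrate_hz = 1e9 / bit_time_ns
    # Assumption: the sample point sits at the end of PHASE1,
    # expressed here as a fraction of the whole bit time.
    sample_point = (sync_seg + prop_seg + phase_seg1) / total_tq
    return total_tq, bit_time_ns, bitrate_hz, sample_point


# Kernel-assigned values reported above for the IMX8 at 250 kbit/s.
imx8_default = bit_timing(tq_ns=50, prop_seg=37, phase_seg1=32, phase_seg2=10)
print(imx8_default)  # (80, 4000, 250000.0, 0.875)
```

The same function can be fed the values from `ip -details link show can0` on any board to compare what the kernel actually picked.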

I checked on an IMX6 and it was using a default tq of 266 ns for the same 250 kHz bitrate. The latter three segments were similar, but the longer tq made the sync segment much longer.

On the IMX8, by bringing up the CAN interface with this command:

ip link set can0 up type can tq 250 prop-seg 7 phase-seg1 6 phase-seg2 2

we get similar segments to the original example but a much longer sync segment, and still a bitrate of 250 kHz. With this setup, I found that the various devices I have all communicate fine with the board, even with bitrate variances of around 1.5%.
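When hand-picking timings like this it is easy to end up with segments that do not add up to the intended bitrate. Below is a rough Python sketch that rebuilds the command above and refuses combinations that miss the target. `can_up_command` is a hypothetical helper, not part of iproute2, and the exact-match check is deliberately strict.

```python
def can_up_command(iface, target_bitrate, tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Build the `ip link` command, rejecting timings that miss the target bitrate."""
    total_tq = 1 + prop_seg + phase_seg1 + phase_seg2  # SYNC is always 1 tq
    bit_time_ns = total_tq * tq_ns
    # Integer check: bit time (ns) times bitrate (bit/s) must equal one second.
    if bit_time_ns * target_bitrate != 1_000_000_000:
        raise ValueError(
            f"{total_tq} tq x {tq_ns} ns = {bit_time_ns} ns does not give "
            f"{target_bitrate} bit/s"
        )
    return (
        f"ip link set {iface} up type can tq {tq_ns} prop-seg {prop_seg} "
        f"phase-seg1 {phase_seg1} phase-seg2 {phase_seg2}"
    )


cmd = can_up_command("can0", 250_000, tq_ns=250, prop_seg=7, phase_seg1=6, phase_seg2=2)
print(cmd)
```

Note that a tq like the IMX6's 266 ns would be rejected by this check, since 15 x 266 ns is 3990 ns rather than exactly 4000 ns; loosen the comparison if you want to allow that kind of rounding.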

The width of the SYNC segment doesn’t matter. The sampling point may matter more, especially on relatively long buses. Your devices still need a clock mismatch much better than 1%. You’d better check your device’s bitrate with a scope. Let it send something alone on the bus, so that no other nodes are ACKing or sending anything. The minimum pulse width you see is your device’s CAN bit time.
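As a rough illustration of the scope method, the Python sketch below turns a list of measured pulse widths into a bitrate estimate and a mismatch against the nominal rate. The readings are invented for the example (they are not from this thread), and the function names are hypothetical.

```python
def bitrate_from_min_pulse(pulse_widths_ns):
    """Estimate the bitrate in bit/s from pulse widths measured on a scope."""
    bit_time_ns = min(pulse_widths_ns)  # the shortest pulse is one bit time
    return 1e9 / bit_time_ns


def mismatch_percent(measured_hz, nominal_hz):
    """Signed deviation of the measured bitrate from the nominal one, in percent."""
    return 100.0 * (measured_hz - nominal_hz) / nominal_hz


# Hypothetical scope readings in ns: two single bits and two longer runs.
readings = [7936.0, 3968.0, 11904.0, 3968.0]
measured = bitrate_from_min_pulse(readings)
mismatch = mismatch_percent(measured, 250_000)
print(round(measured), round(mismatch, 2))  # 252016 0.81
```

The estimate is only as good as the scope's time resolution, so measuring a long run of bits and dividing by the bit count would likely give a tighter number than a single pulse.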
SocketCAN is not always enough to be confident about the clock. Specifying Tq as an integer number of ns is not always enough. Several major kernel versions ago I had FlexCAN clocked from an odd clock of 83.3 MHz or something like that, which resulted in a <1% but nasty bit timing error. Patching the driver to use the even 24 MHz crystal clock fixed the issue. A physical timing measurement lets you check which device is lying.
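To see how an awkward controller clock can produce that kind of error, here is a simplified Python sketch. It assumes the controller can only make a tq out of a whole number of clock cycles and that the prescaler is simply rounded to the nearest integer; the real driver logic may differ. The 83.3 MHz figure is the approximate value mentioned above, and the 250 ns tq with 16 tq per bit is borrowed from the command earlier in the thread.

```python
def actual_bitrate(clock_hz, requested_tq_ns, tq_per_bit):
    """Bitrate produced when tq must be a whole number of clock cycles."""
    # Assumption: the prescaler is the requested tq rounded to whole cycles.
    prescaler = max(1, round(requested_tq_ns * 1e-9 * clock_hz))
    return prescaler, clock_hz / (prescaler * tq_per_bit)


for clock_hz in (24_000_000, 83_300_000):
    prescaler, bitrate = actual_bitrate(clock_hz, requested_tq_ns=250, tq_per_bit=16)
    error = 100.0 * (bitrate - 250_000) / 250_000
    print(f"{clock_hz / 1e6:.1f} MHz: prescaler {prescaler}, "
          f"{bitrate:.0f} bit/s, {error:+.2f}%")
```

Under these assumptions the 24 MHz clock divides evenly, while the 83.3 MHz clock lands a bit under 1% off even though the requested tq looks like a clean 250 ns.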

I did check the device bitrates and found that one was running fast: about 252 kHz instead of 250 kHz. Fixing the device bitrate did allow the IMX to read messages fine with either Tq. Still, having a longer Tq gives more margin and causes the IMX to work when the device’s baud rate is a bit off, even though ideally it shouldn’t be necessary. Technically, according to the CAN spec, the prop and phase1 segments aren’t supposed to be over 8 tq, which they are with the kernel-assigned timings.

Longer Tq doesn’t give longer margin. Take any decent bit timing calculator and see.
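For anyone without a calculator at hand, the two commonly quoted oscillator-tolerance conditions for CAN can be evaluated in a few lines of Python. This is only a sketch: the SJW values are assumptions (the thread never states what SJW was in use), and `oscillator_tolerance` is a made-up name. The formulas only involve tq counts, so the tq duration itself does not appear.

```python
def oscillator_tolerance(prop_seg, phase_seg1, phase_seg2, sjw):
    """Per-node clock tolerance (as fractions) from the two textbook conditions."""
    nbt = 1 + prop_seg + phase_seg1 + phase_seg2  # tq per nominal bit time
    # Condition 1: resynchronization must absorb drift over 10 bit times.
    resync_limit = sjw / (20 * nbt)
    # Condition 2: correct sampling of the bit following an error flag.
    error_flag_limit = min(phase_seg1, phase_seg2) / (2 * (13 * nbt - phase_seg2))
    return resync_limit, error_flag_limit


# SJW values are assumed for illustration, not taken from the thread.
configs = {
    "80 tq (kernel-assigned), sjw=1": (37, 32, 10, 1),
    "16 tq (manual), sjw=1": (7, 6, 2, 1),
    "80 tq (kernel-assigned), sjw=5": (37, 32, 10, 5),
}
for name, args in configs.items():
    resync, err = oscillator_tolerance(*args)
    print(f"{name}: resync {100 * resync:.4f}%, error-flag {100 * err:.4f}%")
```

With these inputs the error-flag limit comes out the same for both timings, and the resync limit differs only when SJW stays at 1 tq while the number of tq per bit changes. If SJW is scaled along with the segments, the two setups give the same figures, so it may be worth checking the SJW that `ip -details link show can0` reports before drawing conclusions about Tq.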