Communication issues between CAN controllers

Hello,
In our device we are using two CAN controllers: one on the SoM (Verdin iMX8M Mini), the other on our carrier board.
The controller on the carrier board is the same Microchip MCP2518FD used on the SoM.

When we were using rev 1.1B of the SoM, both CAN controllers were clocked by 20 MHz crystals. In that configuration my loop test between the two CAN controllers worked fine (send a fixed packet from one controller to the other and parse the expected packet).

Recently we started using rev 1.1D of the SoM, on which the crystal has been changed to 40 MHz.
I have changed the settings in the device tree, and I can communicate properly with devices connected to the CAN controller on the SoM, but my loop test with the CAN controller on the carrier board (which still uses a 20 MHz crystal) is now failing.
I guess changing the clock frequency on the SoM has affected the bit timing, but how do I accommodate the change in the device tree?
These are the current settings for both controllers in my device tree:

/* Verdin SPI_1 */
&ecspi2 {
	#address-cells = <1>;
	#size-cells = <0>;
	pinctrl-names = "default";
	pinctrl-0 = <&pinctrl_ecspi2>;
	cs-gpios = <&gpio5 13 GPIO_ACTIVE_LOW>;
	status = "okay";

	can3: can3@0 {
		compatible = "microchip,mcp2518fd";
		clocks = <&clk20m>;
		gpio-controller;
		interrupt-parent = <&gpio4>;
		interrupts = <14 IRQ_TYPE_LEVEL_LOW>;
		microchip,clock-allways-on;
		microchip,clock-out-div = <0>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_can3_int>;
		reg = <0>;
		spi-max-frequency = <8500000>;
		status = "okay";
	};
};

/* On-module CAN controller 1 & 2 */
&ecspi3 {
	#address-cells = <1>;
	#size-cells = <0>;
	cs-gpios = <&gpio5 25 GPIO_ACTIVE_LOW>,
		       <&gpio1 5 GPIO_ACTIVE_LOW>;
	/* This property is required, even if marked as obsolete in the doku */
	fsl,spi-num-chipselects = <2>;
	pinctrl-names = "default";
	pinctrl-0 = <&pinctrl_ecspi3>;
	status = "okay";

	can1: can@0 {
		compatible = "microchip,mcp2518fd";
		clocks = <&clk40m>;
		gpio-controller;
		interrupt-parent = <&gpio1>;
		interrupts = <6 IRQ_TYPE_LEVEL_LOW>;
		microchip,clock-allways-on;
		microchip,clock-out-div = <0>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_can1_int>;
		reg = <0>;
		spi-max-frequency = <8500000>;
		status = "okay";
	};

	can2: can@1 {
		compatible = "microchip,mcp2518fd";
		clocks = <&clk40m>;
		gpio-controller;
		interrupt-parent = <&gpio1>;
		interrupts = <7 IRQ_TYPE_LEVEL_LOW>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_can2_int>;
		reg = <1>;
		spi-max-frequency = <8500000>;
		/* not assembled */
		status = "disabled";
	};
};

Any help will be highly appreciated!
Regards,
Rocco

ip -details link show can0 type can
should confirm the clock specified in the DT.
Do you have issues with the arbitration bit rate or the data bit rate? For frames with BRS=1, tq and dtq should be the same.

The spi-max-frequency of 8500000 is the top value for a 20 MHz clock (20/2 * 0.85 = 8.5 MHz). With a 40 MHz clock it can be 40/2 * 0.85 = 17 MHz.
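
The rule of thumb above can be written out as a short calculation. This is only a sketch: the divide-by-two and the 0.85 factor are taken from the figures in this post, and the function name `max_spi_hz` is made up for illustration.

```python
def max_spi_hz(can_clock_hz: int, margin_percent: int = 85) -> int:
    """Upper bound for spi-max-frequency: half the CAN clock, derated to margin_percent."""
    # Integer arithmetic avoids floating-point rounding: clock / 2 * (percent / 100).
    return can_clock_hz * margin_percent // 200


print(max_spi_hz(20_000_000))  # 8500000, the value used in the device tree above
print(max_spi_hz(40_000_000))  # 17000000
```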

Edward

Hello @RoccoBr,

Can you clarify the following topics:

  • Can you communicate with devices connected to the CAN controller on the carrier board?
  • How does your loop test fail? What error do you get?

Best regards,
Bruno

Hi @Edward,

ip -details link show can0 type can

actually shows that the CAN controller on the carrier board also has its clock set to 40 MHz instead of 20 MHz:

$ ip -details link show can1 type can
5: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state ERROR-PASSIVE (berr-counter tx 128 rx 1) restart-ms 0
          bitrate 250000 sample-point 0.875
          tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
          mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

$ ip -details link show can2 type can
4: can2: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state ERROR-PASSIVE (berr-counter tx 0 rx 128) restart-ms 0
          bitrate 250000 sample-point 0.875
          tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
          mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

How is this possible? Did I define the external clock in the wrong way in the device tree?
Here is how the clocks are defined:

	clk20m: oscillator {
		compatible = "fixed-clock";
		#clock-cells = <0>;
		clock-frequency = <20000000>;
	};
	
	clk40m: oscillator {
		compatible = "fixed-clock";
		#clock-cells = <0>;
		clock-frequency = <40000000>;
	};

Regards,
Rocco

I’ve just realized that the labels &clk20m and &clk40m are pointing to nodes with the same name, “oscillator”.

I have changed the names like this:

	clk20m: oscillator_20 {
		compatible = "fixed-clock";
		#clock-cells = <0>;
		clock-frequency = <20000000>;
	};
	
	clk40m: oscillator_40 {
		compatible = "fixed-clock";
		#clock-cells = <0>;
		clock-frequency = <40000000>;
	};

and now everything is working as expected:

$ ip -details link show can1 type can
5: can_1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 250000 sample-point 0.875
          tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
          mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

$ ip -details link show can2 type can
4: can_2: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 250000 sample-point 0.875
          tq 50 prop-seg 34 phase-seg1 35 phase-seg2 10 sjw 1
          mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 20000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

This means your changes to the .dts files didn’t end up in the DT. Perhaps the dts files weren’t patched by Yocto, an old *.dtb is installed on the target, or there is some similar issue.
As you can see, since the driver uses the same 40 MHz clock for both, you get an identical tq (time quantum) on both controllers and the same number of time quanta in every bit time segment.
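
To make the consequence concrete, here is a rough sketch of the arithmetic behind the `ip -details` output. It assumes the usual CAN bit layout (one sync quantum plus prop-seg, phase-seg1 and phase-seg2); the function names are invented for this example. It also estimates what a controller really puts on the bus when the driver believes the clock is 40 MHz but the crystal is 20 MHz.

```python
NS_PER_S = 1_000_000_000


def bit_timing(clock_hz, tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Return (brp, bitrate, sample_point) from values printed by `ip -details`."""
    brp = clock_hz * tq_ns // NS_PER_S                   # clock cycles per time quantum
    tq_per_bit = 1 + prop_seg + phase_seg1 + phase_seg2  # the 1 is the sync segment
    bitrate = NS_PER_S // (tq_per_bit * tq_ns)
    sample_point = (1 + prop_seg + phase_seg1) / tq_per_bit
    return brp, bitrate, sample_point


def actual_bitrate(real_clock_hz, brp, prop_seg, phase_seg1, phase_seg2):
    """Bit rate on the wire when the programmed brp runs from the real crystal."""
    tq_per_bit = 1 + prop_seg + phase_seg1 + phase_seg2
    return real_clock_hz // (brp * tq_per_bit)


# Values reported for a 40 MHz clock: tq 25, prop-seg 69, phase-seg1 70, phase-seg2 20
print(bit_timing(40_000_000, 25, 69, 70, 20))      # (1, 250000, 0.875)
# Values reported for a 20 MHz clock: tq 50, prop-seg 34, phase-seg1 35, phase-seg2 10
print(bit_timing(20_000_000, 50, 34, 35, 10))      # (1, 250000, 0.875)
# 40 MHz settings applied to a controller that really has a 20 MHz crystal
print(actual_bitrate(20_000_000, 1, 69, 70, 20))   # 125000
```

Under these assumptions the carrier-board controller would run at half the intended bit rate, which is consistent with both interfaces going ERROR-PASSIVE in the loop test.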

Hm, since cat /sys/kernel/debug/clk/clk_summary reports fixed clocks by their node name in the DT (“oscillator” in your case), not by the reference name “clk40m”, perhaps the second oscillator instance is just the same instance with an overwritten clock-frequency? Please check your clk_summary; you should see two oscillator instances if both the 20 MHz and 40 MHz clocks are used on the same board. If the instance really is overwritten (I’m not really sure and hope it’s not so), then the DT compiler clearly should not allow multiple reference names for the same object.

Hi Edward,
I’m using TorizonCore Builder, and I didn’t see any messages, warnings or errors raised by it when parsing the device tree.

Regards,
Rocco

Looks like this is a nice time bomb for inexperienced DT programmers: no DT compiler warnings, no errors, and a very good chance of hours of hair pulling.

The code below results in a 40 MHz clock for the single MCP2518FD instance, though at first glance 20 MHz is used in the code. clk_summary reports a single “oscillator” instance, so one oscillator is overwritten by the other. Interestingly, you can’t do something like `clk20m: clk40m: oscillator {};`, but you can cheat on purpose this way if you want two names for the same thing.

/{
	clk20m: oscillator {
		compatible = "fixed-clock";
		#clock-cells = <0>;
		clock-frequency = <20000000>;
	};
	
	clk40m: oscillator {
		compatible = "fixed-clock";
		#clock-cells = <0>;
		clock-frequency =  <40000000>;
	};
};

&mcp2518fd {

    clocks = <&clk20m>;
};
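
The behaviour observed above can be mimicked with a toy model. This is not how the DT compiler is implemented; it is a minimal sketch that assumes sibling nodes with the same name are merged into one node, later property values override earlier ones, and every label ends up attached to the merged node. The names `merge_nodes` and `resolve` are hypothetical.

```python
def merge_nodes(definitions):
    """Toy model: merge (label, node_name, properties) entries by node name."""
    nodes = {}   # node name -> merged properties
    labels = {}  # label -> node name
    for label, name, props in definitions:
        nodes.setdefault(name, {}).update(props)  # later definitions win
        labels[label] = name
    return nodes, labels


def resolve(nodes, labels, label, prop):
    """Look up a property through a label, as a phandle reference would."""
    return nodes[labels[label]][prop]


# Both clocks named "oscillator": a single node survives, holding 40 MHz.
nodes, labels = merge_nodes([
    ("clk20m", "oscillator", {"clock-frequency": 20_000_000}),
    ("clk40m", "oscillator", {"clock-frequency": 40_000_000}),
])
print(len(nodes), resolve(nodes, labels, "clk20m", "clock-frequency"))  # 1 40000000

# Unique node names: two nodes, each label resolves to its own frequency.
nodes, labels = merge_nodes([
    ("clk20m", "oscillator_20", {"clock-frequency": 20_000_000}),
    ("clk40m", "oscillator_40", {"clock-frequency": 40_000_000}),
])
print(len(nodes), resolve(nodes, labels, "clk20m", "clock-frequency"))  # 2 20000000
```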

Hello @RoccoBr and @Edward,

This behavior from TorizonCore Builder is not desirable and could lead to devicetree problems such as this one being missed.

To address this, I have submitted a feature request with the TorizonCore Builder team so that DT compiler warnings are shown if they happen.

Thanks for the productive discussion on this thread.

Best Regards,
Bruno
