i.MX7D spidev

We have an issue with writing data through the SPI port using spidev that has been exposed to userspace by modifying the device tree.

We have some software that writes two blocks of data out to an FPGA using SPI port 2, which is exposed to user space as spidev1.0. I can see that spidev1.0 exists in the /dev directory in embedded Linux as it should. Initially, two blocks of data are written out through the SPI port. Once this is done, the FPGA toggles GPIO line 128, which generates an interrupt. We have an interrupt handler in place that uses the Linux poll() function to wait for the interrupt to occur and then sends a block of data out to the FPGA. This happens on each interrupt.
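The flow described above (wait in poll() for the FPGA's GPIO edge, then push one block through spidev) can be sketched roughly as follows. This is an illustration only, not the poster's code: it assumes the GPIO is used through the sysfs interface (exported, with its "edge" attribute already set), and the helper names gpio_value_path, wait_gpio_edge and service_fpga are hypothetical.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Build the sysfs path of a GPIO value file. Assumes the line has been
 * exported and its "edge" attribute configured beforehand.
 * Returns the path length, or -1 if the buffer is too small. */
static int gpio_value_path(char *out, size_t out_len, unsigned gpio)
{
    int n = snprintf(out, out_len, "/sys/class/gpio/gpio%u/value", gpio);
    return (n > 0 && (size_t)n < out_len) ? n : -1;
}

/* Wait for one edge on an open sysfs GPIO value descriptor.
 * The caller should read the file once after open() so that the first
 * poll() does not return immediately.
 * Returns 1 if an edge was signalled, 0 on timeout, -1 on error. */
static int wait_gpio_edge(int gpio_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = gpio_fd, .events = POLLPRI | POLLERR };
    char value[8];

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc; /* 0 = timeout, -1 = poll() failed */

    /* Re-read from offset 0 to acknowledge the event. */
    if (lseek(gpio_fd, 0, SEEK_SET) < 0 ||
        read(gpio_fd, value, sizeof value) < 0)
        return -1;
    return 1;
}

/* Service loop: after each edge, write one block to the FPGA via spidev.
 * Stops after max_events iterations so the sketch terminates. */
static int service_fpga(int gpio_fd, int spi_fd,
                        const void *block, size_t len, int max_events)
{
    for (int i = 0; i < max_events; i++) {
        int rc = wait_gpio_edge(gpio_fd, 1000);
        if (rc < 0)
            return -1;
        if (rc == 0)
            continue; /* timed out, no interrupt seen */
        if (write(spi_fd, block, len) != (ssize_t)len)
            return -1;
    }
    return 0;
}
```

With sysfs edge events, several edges that arrive while the loop is still busy writing are reported as a single wakeup, so a loop of this shape cannot count individual interrupts.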

When we run our software and watch the bus with a logic analyzer, we see one block of data sent out to the FPGA using SPI port 3 as it should, but it takes a long time before the second block is written. Also, the interrupt handler detects an interrupt and sends out a block of data as it should, but then nothing else seems to happen, and occasionally the embedded Linux OS locks up and we have to power cycle our device.

Here is how we have the device tree set up to use SPI port 2 and SPI port 3 from user space:

&ecspi3 {
	status = "okay";

	mcp258x0: mcp258x@0 {
		compatible = "microchip,mcp2515";
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_can_int>;
		reg = <0>;
		clocks = <&clk16m>;
		interrupt-parent = <&gpio5>;
		interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
		spi-max-frequency = <10000000>;
		
		/* 4/26/2017 KJC - To enable the CAN bus device driver set: status = "okay".  To 
		 *	           disable the CAN bus device driver set: status = "disabled".
		 */
		/* status = "okay"; */
		status = "disabled";
		
	};

	spidev1: spidev@1 {
		compatible = "toradex,evalspi";
		reg = <0>;
		spi-max-frequency = <23000000>;

		/* 4/26/2017 KJC - To enable the SPI device driver set: status = "okay".  To disable
		                   the SPI device driver set: status = "disabled".
 		 */
		/* status = "disabled"; */
		status = "okay";
	};
};

&ecspi2 {
	status = "okay";

	spidev0: spidev@0 {
		compatible = "toradex,evalspi";
		reg = <0>;
		spi-max-frequency = <23000000>;
		status = "okay";
	};
};

  
/* Colibri SPI */
&ecspi2 {
	fsl,spi-num-chipselects = <1>;
	cs-gpios = <&gpio7 9 GPIO_ACTIVE_HIGH>;
	pinctrl-names = "default";
	pinctrl-0 = <&pinctrl_ecspi2 &pinctrl_ecspi2_cs>;
};

&iomuxc {
	imx7d-p1100 {

		pinctrl_ecspi2_cs: ecspi2_cs_grp {
			fsl,pins = <
				MX7D_PAD_ENET1_RGMII_TD3__GPIO7_IO9	0x14 // 194  
			>;
		};

		pinctrl_ecspi2: ecspi2grp {
			fsl,pins = <
				MX7D_PAD_ENET1_RGMII_TD2__ECSPI2_MISO	0x2  // 196
				MX7D_PAD_ENET1_RGMII_RD3__ECSPI2_MOSI	0x2  // 55
				MX7D_PAD_ENET1_RGMII_RD2__ECSPI2_SCLK	0x2  // 63
			>;
		};
	};
};

SPI port 3's GPIO lines are already configured in the device tree that we received from Toradex, so we just disabled mcp258x0: mcp258x@0 and enabled spidev1: spidev@1, as seen in the ecspi3 device node above.

When running our software we do see the following serial output:

[   32.391557] spi_master spi1: I/O Error in DMA RX:dc
[   32.396519] spi_master spi1: failed to transfer one message from queue

We don't understand the above, since these error messages come from the spi-imx.c file. We can see in the imx7s.dtsi file that the ecspi3 node's compatible property is set to "fsl,imx7d-ecspi", "fsl,imx6sx-ecspi", "fsl,imx51-ecspi", which would indicate that SPI port 3 uses the SPI DMA device driver built from the spi-imx.c file. However, the spidev1: spidev@1 node has a compatible property of "toradex,evalspi", which is defined in the spidev.c file.

So, when accessing spidev from user space shouldn’t that be accessing the non-dma device driver because of the compatibility setting?

If that is not the case then how would we use the non-dma SPI device driver since we are not configuring or using DMA to transfer data to our FPGA?

From what I’m seeing, and the errors that are generated, the issue is that the wrong DMA device driver is being used to transfer data, but maybe I’m not fully understanding what is going on and something else is the cause of our problem.

Thank You

spidev is the generic user space interface to the SPI driver in the kernel. toradex,evalspi is used to activate spidev while the compatible ecspi property would activate the iMX SPI driver specific to iMX7. spidev will use the iMX SPI kernel driver which does use DMA. What exact version of Linux image are you using? What is the output of cat /etc/issue and uname -a?
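To make the layering concrete, here is a minimal sketch of a user-space transfer through spidev. The ioctl only describes the transfer; spidev hands it to the SPI controller driver (spi-imx on this module), and the spidev API has no field for choosing between PIO and DMA. The function name spidev_send_block is hypothetical, and the speed and word size passed in are whatever the caller chooses.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Send one block through an already opened /dev/spidevX.Y descriptor.
 * Returns the number of bytes transferred, or -1 on error (errno set). */
static int spidev_send_block(int fd, const void *tx, uint32_t len,
                             uint32_t speed_hz, uint8_t bits_per_word)
{
    struct spi_ioc_transfer xfer;

    memset(&xfer, 0, sizeof xfer);
    xfer.tx_buf = (uintptr_t)tx;        /* rx_buf stays 0: received data is dropped */
    xfer.len = len;                     /* length in bytes, not words */
    xfer.speed_hz = speed_hz;
    xfer.bits_per_word = bits_per_word;

    /* One message consisting of a single transfer. */
    return ioctl(fd, SPI_IOC_MESSAGE(1), &xfer);
}
```

A call for the 240-word case discussed in this thread might look like spidev_send_block(fd, buf, 480, 23000000, 16), with fd obtained from open("/dev/spidev1.0", O_RDWR).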

Hi sanchayan.tx,

Thanks for the response and for clearing things up regarding the device driver being used.

Here is the output that I see from cat /etc/issue:

The Angstrom Distribution \n \l

Angstrom v2016.12 - Kernel \r

Colibri-iMX7_LXDE-Image 2.7.2 20170428

root@colibri-imx7:~#

and here is the output from uname -a:

Linux colibri-imx7 4.1.39-svn150 #5 SMP Wed Jun 28 20:51:51 EDT 2017 armv7l GNU/Linux

Thank You.

Also,

I see the following serial terminal output when the OS is loading for the two spi ports we are using:

[    1.157665] spi_imx 30830000.ecspi: probed
[    1.164544] spi_imx 30840000.ecspi: probed

I do not see any error messages. Is there any additional output I should expect to see related to the spi port when the OS boots up?

Thank You.

The driver looks to be correctly probed. Can you share some small independent test code which I can use to reproduce the issue at my end? It will help me to trace and fix the issue more quickly.

Hello sanchayan.tx,

Sorry for the late response. I've been busy working on other things, but I was finally able to create a small QtWidget test application in QtCreator which simulates how we're using the SPI port to write data to an FPGA whenever an FPGA interrupt occurs on GPIO line 63, as you can see in the code. Interrupts on the GPIO line can arrive as often as every 200 microseconds. If we write a buffer that is 240 words (480 bytes) in size, we can see that it takes a large amount of time, around 3 seconds, between SPI writes. We also see continuous error messages on the serial terminal, as stated above:

spi_master spi1: I/O Error in DMA RX:dc
spi_master spi1: failed to transfer one message from queue

But, if we use a buffer size for the SPI write that is 198 words, or 396 bytes, the write works as it should, and we see interrupts occurring at a rate of around 13.5 milliseconds.
The longer time between interrupts is due to the fact that the FPGA FIFO is receiving the data it needs at the proper rate and does not need to make interrupt requests for data as often. I do occasionally see the error output above from the SPI driver only once on startup, but the data output from the SPI port seems to be correct.

To test the above you can change the value of the line:

static const uint32_t bufferSize = 240;

in the file: interrupt.h to 198 and 240.
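For scale, the time each buffer occupies the bus can be estimated with a one-line calculation. The sketch below assumes the clock actually runs at the 23 MHz given as spi-max-frequency in the device tree and that there are no gaps between words; the real clock may be lower, and the function name spi_wire_time_us is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Ideal time on the wire, in microseconds, for `bytes` bytes at `sclk_hz`.
 * Ignores chip-select setup, inter-word gaps and driver overhead. */
static double spi_wire_time_us(size_t bytes, uint32_t sclk_hz)
{
    if (sclk_hz == 0)
        return 0.0; /* invalid clock, nothing sensible to report */
    return (double)bytes * 8.0 * 1e6 / (double)sclk_hz;
}
```

Under those assumptions, 480 bytes take about 167 microseconds and 396 bytes about 138 microseconds. Both fit inside a 200 microsecond interrupt period, so raw bus time alone would not account for a gap of around 3 seconds between writes.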

I have attached the QtCreator project and the source files so that you can use them to run the test. If there is anything else you need please let me know.

Thank You.

Interrupt-Test

Just running the application you provide reproduces the issue easily. I will look into this and get back to you once I have an update.

@Gage05 It seems the problem you report has been fixed in the mainline kernel at least. A quick way to try this would be to replace the spi-imx.c SPI driver with the one from here and check. Can you please also check this at your end and confirm that it fixes the issue for you? I checked at my end using the test code you provided, and the mainline kernel driver does not trigger the RX DMA error. In the meantime, I will be looking into exactly which patches we need to backport to fix this. The issue is probably related to the DMA timeout being hardcoded in the downstream NXP kernel, while the mainline kernel calculates it dynamically.
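To illustrate the difference between a hard-coded and a size-dependent DMA timeout, here is a small user-space sketch. It is not copied from either kernel tree: the function name dma_timeout_ms, the allowance of 12 clock periods per byte, the one-second margin and the doubling factor are all assumptions chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical size-dependent timeout, in milliseconds.
 * A hard-coded timeout would return the same value for every transfer. */
static uint32_t dma_timeout_ms(uint32_t len_bytes, uint32_t bus_hz)
{
    if (bus_hz == 0)
        return 0; /* invalid clock, caller must handle this */

    /* Whole seconds on the wire: 8 data bits plus an assumed
     * 4 clock periods of per-byte overhead. */
    uint64_t secs = (uint64_t)12 * len_bytes / bus_hz;

    secs += 1; /* assumed margin for scheduling latency */

    /* Assumed safety factor of two, converted to milliseconds. */
    return (uint32_t)(2 * secs * 1000);
}
```

With these assumed constants, a 480-byte transfer at 23 MHz gets a 2000 ms budget, and the budget grows once a transfer needs more than a second on the wire.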

Hello sanchayan.tx,

Thank you for this! I got the new spi-imx.c device driver code from the link that you provided, built it into our embedded Linux OS, and updated the Colibri i.MX7D module.

From my initial testing it seems to be working well. We updated one of the more complete test setups that we have and will check things out to see if everything is good there.

We’ll let you know how that goes.

Thanks Again! :).

From my initial testing it seems to be working well.

Glad to hear that.

We’ll let you know how that goes.

Yes, please do let us know if your testing goes well or otherwise. I will then arrange for all the backported patches to be pushed to our -next branch.

@Gage05 Did you observe any further anomalies in your testing of the SPI driver taken from mainline? If not, then I will arrange for all the backported patches to be pushed.

Hi sanchayan.tx,

Thanks for responding and checking in on the state of things.

Actually, I tried the driver code that you linked in your post above. When I built it into the embedded Linux OS and performed my initial testing, it appeared to be working great and the issue no longer occurred. I could specify any buffer size, and the long delays we were seeing in writing data out using SPI were gone.

However, when this new driver was used with our device’s application, problems occurred.

We saw that SPI communication worked fine for a time, but after a while everything became desynchronized and the communication no longer worked. When viewing the SPI clock line, and MOSI data line on a logic analyzer these lines were not doing what they were supposed to. Power cycling the device corrected the problem, but the desynchronization would eventually occur every time while using the SPI port.

I reverted back to using the original SPI device driver code that we have and it works better. The only issue that we’re having is that we occasionally see a missed interrupt. That missed interrupt causes a discontinuity in our output from the SPI port data, but it’s not something that will be noticed by the end user.

So, for now the original SPI device driver works for our needs, but we will likely need to figure out what is going on in the future when we have more time to investigate the issue. I just haven’t had time to sit down and compare the files to see what might be going on.

There's one thing that I did notice between the original spi-imx.c file contents and the contents that you provided in the link. Our original spi-imx.c file contains the dates 2004-2007 AND 2016 at the top of the file in the comments section. The spi-imx.c file contents that you provided in the link only contain the date range 2004-2007 but NOT 2016.

Is the code you provided an older version of the SPI device driver?

Thank You.

The spi-imx.c file contents that you provided in the link only contains the date range 2004-2007 but NOT 2016.

That’s only the Freescale copyright line. The link was to the latest spi-imx driver in mainline.

Well, it turns out that the issue with the current SPI device driver is a problem for us. With the current SPI device driver interrupts are occasionally missed and the output we see from the SPI port is not correct as a result. There are inconsistencies in the output.

I'm currently comparing the source of the original SPI device driver with the new spi-imx.c code that you provided, in an attempt to determine why this problem is occurring.

With the original SPI device driver code that we have everything seems to be working as it should, but occasionally it appears that the interrupt received is not serviced.

Do you have any idea why this might be the case?

What is the CPU load when your application runs? At least with the mainline kernel driver, I was not able to reproduce the issue with the test Qt application you provided or with the spidev test code. Can you also try disabling DMA by adding the line below to the ecspi3 node in your carrier board dts file?

dma-names  =  " ", " ";

Can you also check with 8 bit mode for transfers? The issue seems to be more prevalent in 16 bit mode.
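For the 8 bit test suggested above, the word size can be changed from user space through the spidev ioctl interface. A minimal sketch, with the hypothetical helper name spidev_set_word_size:

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Set the default word size on an open spidev descriptor and read it back.
 * Returns the word size now in effect, or -1 on error (errno set). */
static int spidev_set_word_size(int fd, uint8_t bits)
{
    uint8_t readback = 0;

    if (ioctl(fd, SPI_IOC_WR_BITS_PER_WORD, &bits) < 0)
        return -1;
    if (ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &readback) < 0)
        return -1;
    return readback;
}
```

Switching from 16 bit to 8 bit words can change the byte order seen on the wire: in 8 bit mode the buffer is sent in memory order, whereas 16 bit words are taken in CPU-native order, so the FPGA side or the buffer packing may need adjusting for the comparison to be meaningful.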

Do you have any idea why this might be the case?

Sorry, but not at the moment. We will have to reproduce it reliably and investigate.

sanchayan.tx,

Thank you for these suggestions. I will try them and get back with you on the results.

The new SPI device driver code is better in one way. It allows us to choose buffer transfer sizes that, before, were causing issues. Now these buffer transfer sizes seem to work better and there is no longer a pause between writing out sections of the buffer over the SPI port. We still have the occasional unserviced interrupt signal that is an issue for us, but as I said I will try your suggestions and let you know how things turn out…

The unserviced interrupt is related to the frequency of the interrupt signals sent by the SPI device driver. At lower frequencies we never see the issue that I'm referring to. At higher frequencies, the ones at which we will need the SPI port to operate and send data, we notice the problem.

Thanks Again.

Hello sanchayan.tx,

We tried adding the line above that you provided:

dma-names  =  " ", " ";

But we are still having the issue. We also noticed that we are still seeing the serial output that I reported originally:

spi_master spi1: I/O Error in DMA RX:dc
spi_master spi1: failed to transfer one message from queue

This output seems to be indicating that the DMA is still being used in relation to the SPI port we’re using.

In the &ecspi3 node I added the line that you provided, dma-names = " ", " ";, to the main part of the node, and then we tried running the test with the line moved into the &ecspi3 sub node spidev1: spidev@1. I don't believe that the location should make a difference in this case, since the sub node spidev1: spidev@1 is what is enabled for use anyway.

If the line that you provided indeed disabled the use of DMA by the SPI port device driver then I would not have expected to see the DMA error above.

Looking at the SPI port device driver code, I can see that the "I/O Error in DMA RX" message is emitted in the spi_imx_dma_transfer function in spi-imx.c, at the point where wait_for_completion_timeout is called to wait for data to be received. The timeout is occurring and generating the output that we are seeing.

One thing about our tests is that if you run the test code I gave you, it will work fine and normally you will not see an issue. It is when you increase the frequency of the interrupts to the processor that the problems we're seeing start to occur. I believe that you would see this problem on your end if you increased the frequency of the interrupts occurring on the GPIO line.

Also, what tool can we use to see the total load on the CPU that you’re talking about?

Thanks

The new SPI device driver code is better in one way. It allows us to choose buffer transfer sizes that, before, were causing issues. Now these buffer transfer sizes seem to work better and there is no longer a pause between writing out sections of the buffer over the SPI port.

That is because of the following commit to the mainline driver.

If you still see the "I/O Error in DMA" message then DMA is not disabled; the updated device tree was not deployed correctly. Below is the expected set of changes:

diff --git a/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi b/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi
index 6d349413b193..d73461f03ffe 100644
--- a/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi
+++ b/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi
@@ -91,6 +91,7 @@
 };
 
 &ecspi3 {
+       dma-names = "","";
        status = "okay";
 
        mcp258x0: mcp258x@0 {
@@ -102,14 +103,14 @@
                interrupt-parent = <&gpio5>;
                interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
                spi-max-frequency = <10000000>;
-               status = "okay";
+               status = "disabled";
        };
 
        spidev0: spidev@0 {
                compatible = "toradex,evalspi";
                reg = <0>;
                spi-max-frequency = <23000000>;
-               status = "disabled";
+               status = "okay";
        };
 };

The dmesg logs should show this

root@colibri-imx7:~# dmesg | grep -i "spi"
[    1.170081] spi_imx 30840000.ecspi: dma setup error -19, use pio
[    1.178630] spi_imx 30840000.ecspi: probed
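If this check needs to be automated on the target, the kernel log can be scanned for that fallback message. The sketch below is only an illustration: it matches the message text exactly as quoted above, which may differ in other kernel versions, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a kernel log line reports that spi_imx fell back to PIO. */
static bool line_reports_pio_fallback(const char *line)
{
    return strstr(line, "spi_imx") != NULL &&
           strstr(line, "use pio") != NULL;
}

/* Count fallback messages in a log stream, for example one obtained
 * with popen("dmesg", "r"). */
static int count_pio_fallbacks(FILE *log)
{
    char line[512];
    int count = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        if (line_reports_pio_fallback(line))
            count++;
    }
    return count;
}
```

A count of zero for the controller in question would suggest that the modified device tree is not the one the board actually booted.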

Also, what tool can we use to see the total load on the CPU that you’re talking about?

htop
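If the load has to be logged from inside the application rather than watched interactively, the same figure can be derived from two samples of the first line of /proc/stat. A minimal sketch, with hypothetical names, that treats idle and iowait time as not busy:

```c
#include <stdio.h>

struct cpu_sample {
    unsigned long long busy;  /* jiffies spent doing work */
    unsigned long long total; /* all jiffies accounted for */
};

/* Parse the aggregate "cpu ..." line (the first line of /proc/stat).
 * Do not pass the per-core "cpu0", "cpu1" lines. Returns 0 on success. */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    unsigned long long v[8] = {0};
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n < 4)
        return -1;

    unsigned long long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];

    s->total = total;
    s->busy = total - v[3] - v[4]; /* subtract idle and iowait */
    return 0;
}

/* Busy percentage between an earlier and a later sample. */
static double cpu_busy_percent(struct cpu_sample a, struct cpu_sample b)
{
    unsigned long long dt = b.total - a.total;
    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(b.busy - a.busy) / (double)dt;
}
```

Taking one sample before and one after a burst of SPI traffic gives the average load over that interval.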

I have started looking into this issue now. I do not have a setup to reproduce the RX side of things you have, though the issue seems to be easily reproducible for TX on our/NXP downstream kernel. The DMA implementation does not seem to be correct.