SPI fails with mainline kernel 6.6.1

StefanB · November 16, 2023, 11:31am

Hi

I’m running a mainline kernel on the Verdin imx8mm.
We tried to update from 6.5.11 to 6.6.1.
The following SPI devices are connected on our board:

TPM2 (infineon,slb9670)
SPI-RAM (microchip,mchp23lcv1024)

With 6.5.11 everything was fine, but after the update to 6.6.1 access to all SPI devices failed.
When we measured the SPI bus, we noticed that the sent and received data lengths did not match the expected size, and the transfer failed.

I think we have identified the commits that make SPI fail:

spi: Increase imx51 ecspi burst length based on transfer length
15a6af94a2779d5dfb42ee4bfac858ea8e964a3f
https://lore.kernel.org/r/20230628125406.237949-1-stefan.moring@technolution.nl

spi: imx: Take in account bits per word instead of assuming 8-bits
5f66db08cbd3ca471c66bacb0282902c79db9274
https://lore.kernel.org/all/20230917164037.29284-1-stefanmoring@gmail.com/

When we revert these commits, SPI works again.
It seems that enabling the bursts breaks SPI on the imx8mm.

I did some measurements with the SPI-RAM. When I write data to the mtd device,
we expect to see a 4-byte write command followed by the bytes to write.
With burst mode enabled, I see that the sent length is too long, and in many tests the data that I tried to send is interleaved with zeros. The sent pattern should be incrementing from 0x01 (0x01, 0x02, 0x03, …).

The length is 504 bytes instead of 127.

We see 0x00,0x00,0x01 0x00,0x00,0x00,0x02 0x00,0x00,0x00,0x03
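
The captured pattern looks as if each payload byte were sent as its own 32-bit word, padded with zero bytes. Here is a minimal Python sketch of that interpretation. The function names are hypothetical, and the padding model is an assumption based only on the bytes seen on the bus, not on the driver code. Note that this simple model gives 508 bytes for a 127-byte payload, so it does not fully explain the 504 bytes that were measured.

```python
def pad_bytes_to_words(payload: bytes, word_size: int = 4) -> bytes:
    """Model each payload byte as one zero-padded word (assumption)."""
    out = bytearray()
    for b in payload:
        out += bytes(word_size - 1) + bytes([b])
    return bytes(out)


def looks_zero_interleaved(capture: bytes, word_size: int = 4) -> bool:
    """Return True if every word in the capture carries data only in its last byte."""
    if not capture or len(capture) % word_size:
        return False
    words = (capture[i:i + word_size] for i in range(0, len(capture), word_size))
    return all(w[:-1] == bytes(word_size - 1) for w in words)


payload = bytes(range(1, 128))        # 0x01, 0x02, ... 0x7f (127 bytes)
on_wire = pad_bytes_to_words(payload)

print(on_wire[:12].hex(" "))          # 00 00 00 01 00 00 00 02 00 00 00 03
print(len(payload), "->", len(on_wire))  # 127 -> 508
```

`looks_zero_interleaved` can be run on bytes exported from a logic analyzer to check whether a capture matches this pattern.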

Questions:

Has Toradex also seen this behavior on the imx8m or even on the imx6 (the change applies to both)?
Do you have any clue what is going wrong?

I tried to understand the behavior by tracing the code, but I do not see any obvious bugs.
PS: The bad behavior I posted here is with 127 bytes; we see different errors with other lengths and other alignments.

Best Regards
Stefan

lucas_a.tx · November 16, 2023, 6:22pm

Hi @StefanB ,

I contacted the team here and they’re not aware of this behavior in kernel 6.6 upstream for the Verdin iMX8M Mini.

I took a look at our latest automated tests (kernel 6.7.0 and 6.6.0) related to our future BSP reference multimedia images for the Verdin iMX8M Mini and Colibri iMX6, and they don’t check SPI burst-mode specifically, so that’s probably why we didn’t reproduce it.

As for your second question, it’s hard to tell. You can try bringing up this issue directly to the upstream kernel to see if the committer you referenced and the related maintainers have a better idea on why this is happening.

Best regards,
Lucas Akira

StefanB · November 17, 2023, 7:17pm

Thanks for your reply.
I would like to point out that no special burst testing is needed to detect the SPI error.
Probing the TPM2 already shows the error, see below.
The failing self-test is normal behavior, but the request for a firmware upgrade shows
the problem: the capabilities of the TPM cannot be read successfully.

[   15.182512] tpm_tis_spi spi3.1: 2.0 TPM (device-id 0x1B, rev-id 22)
[   15.191079] tpm tpm0: A TPM error (256) occurred attempting the self test
[   15.197928] tpm tpm0: starting up the TPM manually

[   15.777121] tpm tpm0: TPM in field failure mode, requires firmware upgrade
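
A quick check for this symptom could look like the following Python sketch. It only searches a kernel log for the "field failure mode" message quoted above; the function name is hypothetical, and treating that single message as the indicator is an assumption taken from this one log.

```python
FIELD_FAILURE_MARKER = "TPM in field failure mode"  # message text from the log above


def find_field_failure_lines(log_text: str) -> list:
    """Return the log lines that report the TPM field failure mode."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if FIELD_FAILURE_MARKER in line
    ]


sample = """\
[   15.182512] tpm_tis_spi spi3.1: 2.0 TPM (device-id 0x1B, rev-id 22)
[   15.191079] tpm tpm0: A TPM error (256) occurred attempting the self test
[   15.197928] tpm tpm0: starting up the TPM manually
[   15.777121] tpm tpm0: TPM in field failure mode, requires firmware upgrade
"""

hits = find_field_failure_lines(sample)
print(len(hits))   # 1
print(hits[0])
```

On a target, the text could come from the output of `dmesg` instead of the `sample` string. The self-test error line is deliberately not matched, since it also appears when SPI works.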

I also ran the test on a colibri-imx6dl and encountered the same bad behavior.

As you suggested, I will post it on the Linux mailing list.
Regards Stefan

StefanB · November 23, 2023, 4:32pm

Hi

I discussed the issue on the mailing list, but we did not find the cause.

The regression testing for SPI is done with a loopback (MOSI → MISO).
My tests showed that sending and receiving have the same problem, which masks the mistake.
So the data on the line is too long, but the received data is correct again.
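
To illustrate why a loopback test cannot catch this, here is a small Python sketch. It assumes a hypothetical fault that pads each byte to a 4-byte word on transmit and strips the padding again on receive; both functions are illustrative models, not the driver's actual behavior.

```python
def mangle_tx(data: bytes, word_size: int = 4) -> bytes:
    """Hypothetical TX fault: each byte goes out as one zero-padded word."""
    return b"".join(bytes(word_size - 1) + bytes([b]) for b in data)


def mangle_rx(wire: bytes, word_size: int = 4) -> bytes:
    """Hypothetical RX fault mirroring TX: keep only the last byte of each word."""
    return wire[word_size - 1::word_size]


data = bytes(range(1, 65))      # 64-byte incrementing test pattern
wire = mangle_tx(data)          # what a scope would see on MOSI
received = mangle_rx(wire)      # what the loopback test compares against data

print(received == data)         # True: the loopback comparison passes
print(len(wire) == len(data))   # False: the traffic on the bus is too long
```

A real SPI device only sees `wire`, so it fails even though the loopback comparison succeeds.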

The maintainers were not able to reproduce the problem (even with a scope).
I do not know which imx8 variant they used.
Updating to the latest imx-sdma firmware 4.6 from NXP did not help either.

My analysis showed that the problem only occurs if the transmit length is >= 64 bytes and DMA is used for the transfer. The reason for the misbehavior is still unclear.
So I reverted the commits on my side, and I hope to find someone else who also has the problem.
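
For anyone who wants to try to reproduce this, the condition above can be written down as a small Python sketch that also lists transfer lengths around the boundary. The 64-byte threshold comes from my observation, not from the driver source, and the function names are hypothetical.

```python
DMA_THRESHOLD_BYTES = 64  # observed boundary (assumption, not read from the driver)


def expected_to_fail(length: int, dma_enabled: bool) -> bool:
    """Predict a failure according to the observation: DMA in use and length >= 64."""
    return dma_enabled and length >= DMA_THRESHOLD_BYTES


def boundary_lengths(threshold: int = DMA_THRESHOLD_BYTES, extra=(127, 128, 256)) -> list:
    """Transfer lengths just below, at, and above the threshold, plus a few larger ones."""
    return sorted({1, threshold - 1, threshold, threshold + 1, *extra})


for n in boundary_lengths():
    print(n, expected_to_fail(n, dma_enabled=True), expected_to_fail(n, dma_enabled=False))
```

Sending an incrementing pattern of each listed length and checking the bus with a scope should show whether a given board follows the same rule.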

For details, see the thread on the kernel mailing list:
https://lore.kernel.org/lkml/8a415902c751cdbb4b20ce76569216ed@mail.infomaniak.com/

Regards Stefan

lucas_a.tx · November 24, 2023, 12:59pm

Hi @StefanB ,

Thank you for the update. I’m sure the information you posted here will be useful to other people.

The regression testing for SPI is done with a loopback (MOSI → MISO).
My test showed sending and receiving has the same problem and mask the mistake.

Our automated SPI tests only do a simple loopback test with spidev_test, so this could explain why we didn’t notice it.

Given that currently our most recent BSP (6 at the time of writing this) doesn’t support kernel 6.6 yet, we cannot guarantee we’ll actively look at this issue right now.

Let us know if you need anything else on our side. Otherwise, feel free to continue posting updates about this issue.

Best regards,
Lucas Akira

StefanB · November 24, 2023, 1:22pm

Thanks Lucas

I will post any news on this ticket.
By the way, with your new Mallow carrier board you would have the possibility to see the problem with the TPM2.
Thanks
Stefan

StefanB · January 15, 2024, 7:17am

Hi

Meanwhile we managed to fix the problem, and the fix has been merged into mainline:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e9b220aeacf109684cce36a94fc24ed37be92b05

It has also been backported to 6.6.y:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.6.y&id=a8b655ac35be588df0bec9c56b565bc72a188682

Best Regards
Stefan

lucas_a.tx · January 16, 2024, 12:51pm

Hi @StefanB ,

That’s great to hear. Thank you very much for your contribution to the kernel, we really appreciate it.

Best regards,
Lucas Akira