eCSPI hardware CS and DMA

Hi,

Some time ago I posted an incomplete patch for the spi-imx.c driver here. It demonstrated that eCSPI can remove the gaps in the SPI stream every 8 / 16 / 32 bits and avoid unexpected toggling of the hardware-controlled CS pin. At least up to the maximum eCSPI burst length (4k bits or 512 bytes), SCK can be uniform without any gaps.

I’m unable to find that old message of mine. I had no chance to come back to my experiments with the Linux eCSPI driver until now. Since the question about those eCSPI SCK gaps comes up again from time to time, I hope there is community interest in fixing it?

Patch details:
For a single CS pulse per transfer, the eCSPI TX buffer has to be filled one 32-bit word at a time. There is no way to fill the TX FIFO 8 bits at a time without breaking the desired single CS pulse per message. When the length is not a multiple of 4 bytes, the leftover bytes (length modulo 4) should be written to the TX register first, and every remaining write should push 4 bytes to the TX FIFO. The problem with the original driver is that the can_dma() callback lets the upper-layer spi.c driver DMA-map the message tx and rx buffers, but we need to insert those leftover bytes first. For this reason the can_dma() routine now returns false, and DMA map/unmap is moved into spi-imx.c. Since the DMA part of the original spi-imx.c followed the bits_per_word setting, the byte order had to be changed for (spidev_test) -b 8 and -b 16 transfers. Since some people need transfers longer than 512 bytes, I made the patched driver behave the same as the original one on long transfers.
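The write order described above can be sketched in plain C. This is only an illustration, not code from the patch: `pack_tx_words` is a hypothetical helper, and the MSB-first packing of bytes inside each 32-bit word is an assumption about how the words are laid out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from spi-imx.c): packs `len` bytes into the
 * sequence of 32-bit words that would be written to the TX FIFO.
 * Assumption: bytes are packed MSB first within a word.
 * `fifo` must have room for (len + 3) / 4 words.
 * Returns the number of words produced.
 */
size_t pack_tx_words(const uint8_t *buf, size_t len, uint32_t *fifo)
{
    size_t head = len % 4;  /* leftover bytes go into the first word */
    size_t i = 0;
    size_t n = 0;

    if (head) {
        uint32_t w = 0;

        for (; i < head; i++)
            w = (w << 8) | buf[i];
        fifo[n++] = w;
    }

    /* Every remaining write carries exactly 4 bytes. */
    while (i < len) {
        fifo[n++] = ((uint32_t)buf[i] << 24) |
                    ((uint32_t)buf[i + 1] << 16) |
                    ((uint32_t)buf[i + 2] << 8) |
                    (uint32_t)buf[i + 3];
        i += 4;
    }

    return n;
}
```

For a 6-byte message this yields two words: the first holds the 2 leftover bytes, the second holds the remaining 4.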

spi-imx.diff (6.7 KB)

Hi @Edward ,

Hope you’re doing well :slight_smile:

Sorry for the delayed answer. Thank you for the information, we’ll forward that to the responsible team to have a look.

Can you share what version of BSP you’re using here?

We’ll answer that one as soon as possible.

Best Regards
Kevin

Hi @Edward, thank you for the patch!
Could you please bring back some of the context around this? Is it that the SPI HW misbehaves in these big transfers or do you think the IMX driver implementation is wrong according to the Reference Manual?

Best regards,
Rafael Beims

Hi @kevin.tx

I’m fine, hope you’re well too.

Thank you for your interest. Sorry for not mentioning BSP, it’s 5.7.0.

Yet another update. Instead of the expected small reduction in CPU load, hardware/native CS was noticeably increasing it, by roughly 1 to 2% at high load. The problem is that the spi-imx driver uses the spi-bitbang driver. When spi.c finds an invalid CS GPIO, it calls the spi_cs() routine from spi-bitbang, which makes two unnecessary ndelay() calls that increase CPU load. The new patch fixes this problem as well: spi_cs() from spi-bitbang is eliminated. At very high CPU load, caused by high CAN bus load on an MCP25xxFD, I saw at least a 2% CPU load reduction compared to GPIO CS on iMX6ULL and iMX7D. A mix of native and GPIO CS’s is still OK, verified on iMX6ULL + Aster.

spi-imx.diff (7.6 KB)

BTW, when posting a new question it seems impossible to add tags for more than one iMX flavor, is that right? Perhaps that happens just on Edge?

Regards
Edward

Hi @rafael.tx

In short: the stock eCSPI driver supports HW/native CS only for transfers of up to 64 bytes. This patch extends that limit to 512 bytes, which covers many more devices than the former limit. Benefits of HW CS:

  • shorter transfer time
  • slightly lower CPU usage (minimal, but still lower)
  • uniform SCK clock without any modulation every 8/16/32 bits (the bits per word transfer setting)
  • easier sharing of a 32-pin GPIO module with the M4, since eCSPI can avoid using GPIO at all

Unfortunately, going further is problematic without limiting bits per word to 32 only, and of course without HW CS. NXP/FSL should have stopped using eCSPI in new devices long ago, or fixed it, though a fixed one wouldn’t be called eCSPI anyway.
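The length limit can be pictured as a simple check. This is a minimal sketch, not the patch itself: `choose_path`, the enum, and the macro name are hypothetical, and only the 512-byte (4k bits) figure comes from this thread.

```c
#include <stddef.h>

/* 4k bits = 512 bytes, the eCSPI burst limit quoted in this thread. */
#define MAX_BURST_BYTES 512u

enum xfer_path {
    XFER_SINGLE_BURST,  /* one CS pulse, uniform SCK */
    XFER_ORIGINAL       /* behave like the unpatched driver */
};

/* Hypothetical helper: picks the path for a transfer of len_bytes. */
enum xfer_path choose_path(size_t len_bytes)
{
    if (len_bytes > 0 && len_bytes <= MAX_BURST_BYTES)
        return XFER_SINGLE_BURST;

    return XFER_ORIGINAL;
}
```

So a 512-byte transfer still fits in one burst, while a 513-byte one falls back to the original behaviour.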

Regards
Edward