SPI timing between consecutive transfers is more than 150us

Hi,

We need to use the SPI of the VF61 at a high transfer rate.
Unfortunately, we are facing an issue where the idle time between consecutive SPI transfers is too long for our requirements.

In our tests, with direct SPI access from a C program, there is a gap of about 150us between consecutive SPI transfers.

In the picture below, the time between two SPI transfers is 450us: we set delay_usecs to 300us for the test, and that adds to the 150us baseline gap (150 + 300 = 450).

Is there any way to decrease this time in a controlled way?
@stefan.tx, could you help us in this case?

Best,
Andre Curvello

Hi Andre

I would like more information about your use case.

Regarding "timing between consecutive SPI Transfers":

How are you doing the SPI transfer? Are you writing to SPI in a loop, or is it done in a dedicated task? Is the time between consecutive SPI transfers always constant, even if you stress the CPU?
Could you send a small code snippet so we can reproduce the issue on our side?

Best regards, Jaski
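To check whether the timing stays constant under CPU load, one option is to time each call from user space. The following is only a sketch: measure_latency and count_calls are hypothetical names, and in the SPI case the callback would wrap ioctl(fd, SPI_IOC_MESSAGE(1), &tr). It measures how long each call takes, not the idle gap on the bus, so a logic analyzer or scope is still needed for the bus-level picture.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Summary of how long each call took, in nanoseconds. */
struct latency_stats {
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t avg_ns;
};

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Call op(ctx) `iterations` times and record the duration of each call.
 * Returns -1 on bad arguments or as soon as op returns a negative value.
 */
int measure_latency(int (*op)(void *), void *ctx,
                    unsigned int iterations, struct latency_stats *out)
{
    uint64_t total = 0;

    if (!op || !out || iterations == 0)
        return -1;

    out->min_ns = UINT64_MAX;
    out->max_ns = 0;

    for (unsigned int i = 0; i < iterations; i++) {
        uint64_t start = now_ns();
        if (op(ctx) < 0)
            return -1;
        uint64_t elapsed = now_ns() - start;

        if (elapsed < out->min_ns)
            out->min_ns = elapsed;
        if (elapsed > out->max_ns)
            out->max_ns = elapsed;
        total += elapsed;
    }
    out->avg_ns = total / iterations;
    return 0;
}

/* Stand-in operation for trying the harness without SPI hardware. */
int count_calls(void *ctx)
{
    (*(unsigned int *)ctx)++;
    return 0;
}
```

Running it once on an idle system and once while a CPU stress tool is active, then comparing min_ns and max_ns, would show how much the call time varies with load.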

Hi @jaski.tx,

I’m working on the same issue as @andrecurvello. The problem is that between two consecutive SPI transfers the bus always sits idle for around 150us, even with delay_usecs set to zero. We need to perform thousands of consecutive SPI transfers, so an idle interval of 150us between messages is a real problem for us.

You can use the following code to reproduce the problem:

#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/spi/spidev.h>

#define FILE_DESCRIPTOR "/dev/spidev3.0"
#define VERBOSE 0

enum transfer_sequence {
 ADDR_HIGH = 0,
 ADDR_LOW,
 STATUS,
 DATA_HIGH,
 DATA_LOW,
 TRANSFER_SIZE
};

int main(int argc, char *argv[])
{
    int ret = 0;
    int fd;
    uint16_t tx[TRANSFER_SIZE];
    uint16_t rx[TRANSFER_SIZE];
    uint32_t data = 0;
    uint8_t verbose = VERBOSE;

    const char *device = FILE_DESCRIPTOR;
    uint32_t mode = 0;
    uint8_t bits = 16;
    uint32_t speed = 5000000;
    uint16_t delay = 0;
    uint8_t len_msg = sizeof(__u16)*TRANSFER_SIZE;

    struct spi_ioc_transfer tr = {
        .tx_buf = (unsigned long)&tx,
        .rx_buf = (unsigned long)&rx,
        .len = len_msg,
        .delay_usecs = delay,
        .speed_hz = speed,
        .bits_per_word = bits,
    };
    
    fd = open(device, O_RDWR);
    if (fd < 0) {
        /* Without a valid descriptor every ioctl below would fail. */
        perror("can't open device");
        return 1;
    }

    /*
     * spi mode
     */
    ret = ioctl(fd, SPI_IOC_WR_MODE32, &mode);
    if (ret == -1)
        printf("can't set spi mode");

    ret = ioctl(fd, SPI_IOC_RD_MODE32, &mode);
    if (ret == -1)
        printf("can't get spi mode");

    /*
     * bits per word
     */
    ret = ioctl(fd, SPI_IOC_WR_BITS_PER_WORD, &bits);
    if (ret == -1)
        printf("can't set bits per word");

    ret = ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits);
    if (ret == -1)
        printf("can't get bits per word");

    /*
     * max speed hz
     */
    ret = ioctl(fd, SPI_IOC_WR_MAX_SPEED_HZ, &speed);
    if (ret == -1)
        printf("can't set max speed hz");

    ret = ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed);
    if (ret == -1)
        printf("can't get max speed hz");

    printf("spi mode: 0x%x\n", mode);
    printf("bits per word: %d\n", bits);
    printf("max speed: %u Hz (%u MHz)\n", speed, speed/1000000);

    tx[ADDR_HIGH] = 0x0000;
    tx[ADDR_LOW]  = 0x0000;
    tx[STATUS]    = 0x0000;
    tx[DATA_HIGH] = 0x0000;
    tx[DATA_LOW]  = 0x0000;

    /* Send the same message back to back; the gap is measured on the bus. */
    while (1) {
        ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
        if (ret < 1) {
            printf("can't send spi message\n");
        }

        if (verbose) {
            data = (uint32_t)rx[DATA_LOW];
            /* Widen before shifting so the high half-word cannot overflow int. */
            data |= (uint32_t)rx[DATA_HIGH] << 16;
            printf("RX = 0x%08x\n", data);
        }
    }

    /* Not reached: the loop above runs until the process is killed. */
    close(fd);

    return ret;
}

This is the SPI definition in the device tree:

spidev30: spidev3@0 {
    compatible = "toradex,evalspi";
    reg = <0>;
    spi-max-frequency = <50000000>;
    fsl,spi-cs-sck-delay = <100>;
    fsl,spi-sck-cs-delay = <50>;
    status = "okay";
};

The following picture shows samples from SPI bus:

Do you have any suggestions to reduce this time?

Best regards, Gustavo
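For reference, the cost of the gap can be estimated from the numbers in the test program above (5 words of 16 bits at 5 MHz). The sketch below only does the arithmetic; the function names are made up for illustration, and the chip-select delays from the device tree are ignored.

```c
#include <stdio.h>

/* Time one transfer spends on the wire, in microseconds. */
double wire_time_us(unsigned int words, unsigned int bits_per_word,
                    double clock_hz)
{
    return (double)words * bits_per_word / clock_hz * 1e6;
}

/* Fraction of time the bus is busy when each transfer is followed by gap_us of idle. */
double bus_utilization(double wire_us, double gap_us)
{
    return wire_us / (wire_us + gap_us);
}

/* Print the budget for one candidate gap, using the test program's settings. */
void print_budget(double gap_us)
{
    double wire = wire_time_us(5, 16, 5e6);

    printf("gap %.0f us: wire %.1f us, utilization %.1f%%, %.0f transfers/s\n",
           gap_us, wire, 100.0 * bus_utilization(wire, gap_us),
           1e6 / (wire + gap_us));
}
```

With these settings one message takes 80 bits / 5 MHz = 16us on the wire, so with a 150us gap the bus is busy for roughly 16 / 166, or about 9.6%, of the time. Calling print_budget with other gap values (for example 50) shows what a shorter gap would buy.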

Hi Gustavo

Thanks for your code. I will try it and give you feedback by the morning of 22nd January.

Hi @jaski.tx,

Thank you, we look forward to your feedback.

Hi Gustavo and Andre

I ran Gustavo’s test program on BSP 2.7b4 with a device tree modified for SPI and without the RT patch. I measured a transfer every 70-80us, so the delay appears to become bigger with the RT patch. This delay depends on a lot of factors, such as whether your application runs at a higher priority and the cost of passing data from user space to the kernel and down to the hardware. Which delay would you expect for your application?

Best regards, Jaski
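Since application priority is one of the factors mentioned, here is a minimal sketch of how a test program could request a real-time scheduling class before entering its transfer loop. make_realtime is a hypothetical helper name; the calls themselves (sched_setscheduler, mlockall) are standard Linux APIs. Whether this actually shortens the gap on the VF61 has not been verified here and would need to be measured.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Try to switch the calling process to SCHED_FIFO at the given priority
 * (clamped to the valid range) and lock its memory to avoid page faults.
 * Returns 0 on success and -1 if either step fails, which is the usual
 * outcome when running without root or the matching capabilities.
 */
int make_realtime(int priority)
{
    struct sched_param sp;
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);

    if (priority < min)
        priority = min;
    if (priority > max)
        priority = max;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }

    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A call such as make_realtime(50) would go near the top of main(), before the while loop. On a single-core system a SCHED_FIFO task that never blocks can starve everything else, so this should be tried with care.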

One thing that might be worth trying is running without DMA, since setting up a DMA transfer might take time:

&dspi1 {
    /delete-property/dma-names;
};

(you should see “can’t get dma channels” in the kernel boot log).

Hi @jaski.tx and @stefan.tx,

In the last test we sent you, we were using kernel 4.4.107 without the RT patch, and the delay between transfers was around 100-120us, lower than with the previous kernel version with the RT patch. A delay of less than 50us would probably meet our requirements.

We tried disabling DMA, as @stefan.tx suggested, using the following device tree:

&dspi3 {
    status = "okay";
    /delete-property/dma-names;

    /* This will create /dev/spidev3.0 */
    spidev30: spidev3@0 {
        compatible = "toradex,evalspi";
        reg = <0>;
        spi-max-frequency = <50000000>;
        fsl,spi-cs-sck-delay = <100>;
        fsl,spi-sck-cs-delay = <50>;
        status = "okay";
    };

    /* This will create /dev/spidev3.1 */
    spidev31: spidev3@1 {
        compatible = "toradex,evalspi";
        reg = <1>;
        spi-max-frequency = <50000000>;
        fsl,spi-cs-sck-delay = <100>;
        fsl,spi-sck-cs-delay = <50>;
        status = "okay";
    };
};

But with this device tree, the driver crashes as soon as we try to execute an SPI transaction. The log follows:

[  126.869618] Unable to handle kernel NULL pointer dereference at virtual address 
[  126.878031] pgd = 8cb18000
[  126.880796] [00000000] *pgd=8cb01831, *pte=00000000, *ppte=00000000
[  126.888427] Internal error: Oops: 817 [#1] PREEMPT ARM
[  126.893625] Modules linked in: can_raw can flexcan can_dev pmbus zl6100 pmbus_core lm75
[  126.901868] CPU: 0 PID: 514 Comm: test_spi Not tainted 4.4.107 #1
[  126.908024] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[  126.914530] task: 8e3257c0 ti: 8cb5e000 task.ti: 8cb5e000
[  126.920015] PC is at dspi_transfer_one_message+0x1e8/0x5e8
[  126.925560] LR is at dspi_transfer_one_message+0x1d4/0x5e8
[  126.931105] pc : [<803ab27c>]    lr : [<803ab268>]    psr: 80030013
[  126.931105] sp : 8cb5fd58  ip : 00000008  fp : 8cb5fdd4
[  126.942689] r10: 803a73c4  r9 : 00000000  r8 : 8eb182a0
[  126.947970] r7 : 8cb5fec0  r6 : 8cb5fec0  r5 : 8cb5fd94  r4 : 00000000
[  126.954561] r3 : 00000000  r2 : 00000001  r1 : 0000000c  r0 : 00000005
[  126.961151] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  126.968355] Control: 10c5387d  Table: 8cb18059  DAC: 00000051
[  126.974162] Process test_spi (pid: 514, stack limit = 0x8cb5e208)
[  126.980318] Stack: (0x8cb5fd58 to 0x8cb60000)
[  126.984733] fd40:                                                       ffffffff 8c8ca000
[  126.993014] fd60: 8eb18000 8eb18400 8e8edc10 8cb5fec0 803aafdc 803ab078 00000002 00000000
[  127.001293] fd80: 8e0af900 0000000a 80011d10 80013304 00000000 78910002 8cb5fe5c 8cb5fe18
[  127.009573] fda0: 80042280 803a8ca8 8001331c 60030013 8eb18000 8cb5fec0 8cb5fec0 8eb18000
[  127.017851] fdc0: 00000000 803a73c4 8cb5fe14 8cb5fdd8 803a87d4 803ab0a0 8cb5fe6c 8cb5fe60
[  127.026131] fde0: 80042280 803a8ce8 8cb5fe14 60030013 8cb5fe24 8eb18400 8cb5fec0 8eb18000
[  127.034410] fe00: 00000000 803a73c4 8cb5fe5c 8cb5fe18 803a8ca8 803a84b8 8cb5fe54 60030013
[  127.042689] fe20: 80011d10 00000000 8cb5fe28 8cb5fe28 8cb5ff1c 8eb18400 8cb5fec0 8eb18400
[  127.050968] fe40: 8eaf1540 8eaf1550 0000000a 807332a4 8cb5fe6c 8cb5fe60 803a8ce8 803a8a3c
[  127.059247] fe60: 8cb5fe94 8cb5fe70 803a9e2c 803a8ce0 00000000 00000000 8cb5fe78 8cb5fe78
[  127.067527] fe80: 8e0af93c 8e0afee0 8cb5ff1c 8cb5fe98 803aa880 803a9dc0 8e0af900 8e0afec0
[  127.075806] fea0: 00000051 8eaf1550 8eb18400 00000001 8eaf1540 8cb5fe98 8e12d00a 8e12a00a
[  127.084084] fec0: 8e0af934 8e0af934 8eb18400 00000000 803a72ac 8cb5fe24 0000000a 00000000
[  127.092364] fee0: ffffff8d 8cb5fee4 8cb5fee4 00000000 8cb5ffac 7e836ba0 8eb04688 8c8a36c0
[  127.100643] ff00: 00000003 7e836ba0 8cb5e000 00000000 8cb5ff7c 8cb5ff20 800f10d8 803aa098
[  127.108922] ff20: 8cb59010 8e0820c0 0000001e 00000000 8e71b378 00000002 8e0820c8 00000000
[  127.117201] ff40: 8cb5ff7c 8cb5ff50 800df994 8011da0c 00000000 00000000 8c8a36c0 00000003
[  127.125480] ff60: 8c8a36c0 40206b00 7e836ba0 8cb5e000 8cb5ffa4 8cb5ff80 800f1360 800f0c18
[  127.133760] ff80: 00140000 00000000 00010351 00000036 8000fb84 8cb5e000 00000000 8cb5ffa8
[  127.142039] ffa0: 8000f9c0 800f1330 00140000 00000000 00000003 40206b00 7e836ba0 7e836ba0
[  127.150317] ffc0: 00140000 00000000 00010351 00000036 00000000 00000000 76fb2000 00000000
[  127.158597] ffe0: 00020914 7e836b94 00010663 76f0d94c 60030010 00000003 8eff6861 8eff6c61
[  127.166859] Backtrace: 
[  127.169388] [<803ab094>] (dspi_transfer_one_message) from [<803a87d4>] (__spi_pump_messages+0x328/0x568)
[  127.178967]  r10:803a73c4 r9:00000000 r8:8eb18000 r7:8cb5fec0 r6:8cb5fec0 r5:8eb18000
[  127.186986]  r4:60030013
[  127.189593] [<803a84ac>] (__spi_pump_messages) from [<803a8ca8>] (__spi_sync+0x278/0x2a4)
[  127.197861]  r10:803a73c4 r9:00000000 r8:8eb18000 r7:8cb5fec0 r6:8eb18400 r5:8cb5fe24
[  127.205879]  r4:60030013
[  127.208485] [<803a8a30>] (__spi_sync) from [<803a8ce8>] (spi_sync+0x14/0x18)
[  127.215596]  r10:807332a4 r9:0000000a r8:8eaf1550 r7:8eaf1540 r6:8eb18400 r5:8cb5fec0
[  127.223617]  r4:8eb18400
[  127.226223] [<803a8cd4>] (spi_sync) from [<803a9e2c>] (spidev_sync+0x78/0x9c)
[  127.233437] [<803a9db4>] (spidev_sync) from [<803aa880>] (spidev_ioctl+0x7f4/0x8a4)
[  127.241182]  r5:8e0afee0 r4:8e0af93c
[  127.244864] [<803aa08c>] (spidev_ioctl) from [<800f10d8>] (do_vfs_ioctl+0x4cc/0x718)
[  127.252698]  r10:00000000 r9:8cb5e000 r8:7e836ba0 r7:00000003 r6:8c8a36c0 r5:8eb04688
[  127.260717]  r4:7e836ba0
[  127.263325] [<800f0c0c>] (do_vfs_ioctl) from [<800f1360>] (SyS_ioctl+0x3c/0x64)
[  127.270721]  r9:8cb5e000 r8:7e836ba0 r7:40206b00 r6:8c8a36c0 r5:00000003 r4:8c8a36c0
[  127.278678] [<800f1324>] (SyS_ioctl) from [<8000f9c0>] (ret_fast_syscall+0x0/0x48)
[  127.286335]  r9:8cb5e000 r8:8000fb84 r7:00000036 r6:00010351 r5:00000000 r4:00140000
[  127.294282] Code: e3a0100c 951b3058 83a03b01 851b2058 (95830000) 
[  127.306323] ---[ end trace f1ca6e58c6e7e596 ]---

Do you know how we can disable DMA without crashing the driver?

We tested SPI using the code that we sent previously.

Regards, Gustavo.

We are not able to reproduce the issue here. Is your driver unmodified?

Can you reproduce it on the dspi1 instance? Is the test utility test_spi built from the exact source provided above?

Hi @stefan.tx

The driver code (spidev.c) is the same as in the Toradex repository.

We ran the same test using dspi1 with DMA enabled and disabled, and the results were the same: there is a delay of around 100-120us between SPI transfers.

The only difference is that when we disabled DMA on dspi1, the driver didn’t crash.

Regards, Gustavo

Hi
We measured a delay of 50-60us. Could you send a screenshot?

Hi @jaski.tx,

This is the screenshot of the last test (dspi1 with DMA disabled).

@gustavossfilho Let us handle one issue at a time. Above you posted a kernel trace, wrote that the driver crashes with that device tree as soon as you execute an SPI transaction, and asked how to disable DMA without crashing the driver.

I tried to reproduce that issue but was not able to. Then suddenly you report results… So has the kernel crash been resolved?

Hi @jaski.tx and @stefan.tx,

We are trying to find a solution for the dead time between SPI transfers. We think the M4 might be able to communicate over SPI with a much shorter time between transfers. If that is correct, we could use the M4 as an SPI proxy and solve our problem.

Do you have an estimate of the SPI performance on the M4, or of the time between transfers in that case?

Regards, Gustavo

Hi @gustavossfilho

I would expect that the time between transfers would be much shorter. After all, FreeRTOS is a real-time OS… You also write a monolithic firmware without user-space/kernel space transitions and bypass the Linux SPI stack. On top of that, you might be able to use features of the NXP SPI IP which are not supported in the Linux SPI driver.

However, this comes with the downside of added complexity (firmware, RPmsg communication etc…).

If it is only about the lowest time between transfers, it should be possible to just send longer transfers in Linux… Then the driver should set up a single buffer and let DMA handle the data transfer.

However, if you have a closed loop which needs to be handled in a timely manner, that is not an option. In such cases using the M4 might be the better option.
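One way to try the "longer transfers" idea from user space is to queue several frames in a single SPI_IOC_MESSAGE(N) call, so the user-space/kernel round trip is paid once per batch instead of once per frame. The sketch below assumes the 5-word, 16-bit frame from the test program; build_batch, send_batch and BATCH are illustrative names. The gap that remains between frames inside a batch depends on the DSPI driver and would have to be measured.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define WORDS_PER_FRAME 5   /* matches TRANSFER_SIZE in the test program */
#define BATCH 8             /* frames per ioctl; an arbitrary example value */

/*
 * Fill tr[0..count-1] so that entry i sends tx[i] and receives into rx[i].
 * cs_change = 1 on all but the last entry asks the SPI core to deassert
 * chip select between frames, so the device still sees separate frames.
 */
void build_batch(struct spi_ioc_transfer *tr, unsigned int count,
                 uint16_t (*tx)[WORDS_PER_FRAME],
                 uint16_t (*rx)[WORDS_PER_FRAME],
                 uint32_t speed_hz)
{
    memset(tr, 0, count * sizeof(*tr));
    for (unsigned int i = 0; i < count; i++) {
        tr[i].tx_buf = (uintptr_t)tx[i];
        tr[i].rx_buf = (uintptr_t)rx[i];
        tr[i].len = sizeof(tx[i]);          /* 5 words * 2 bytes */
        tr[i].speed_hz = speed_hz;
        tr[i].bits_per_word = 16;
        tr[i].cs_change = (i + 1 < count);
    }
}

/* Submit the whole batch with one system call; returns what ioctl returns. */
int send_batch(int fd, struct spi_ioc_transfer tr[BATCH])
{
    return ioctl(fd, SPI_IOC_MESSAGE(BATCH), tr);
}
```

If the device tolerates chip select staying asserted, a single transfer with a larger len is simpler still. Note that spidev limits the total buffer size per message (the bufsiz module parameter, 4096 bytes by default as far as I know), so very large batches may be rejected.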

Hello,
I am Fabricio and I work with the developers Gustavo and Curvello on the same project. I would like to know your position on the last question sent by Gustavo.
Performance is critical in this project, and we need to know which way is better: continue with the A5 or change to the M4.

@jaski.tx
@stefan.tx

Best regards.
Fabricio

Hi @fpsantos

I did answer the last comment (you have to press the “Show more comments” link).

Hi @stefan.tx, how are you?

Now I’ll be conducting a test using the M4 microcontroller of the VF61.

Just to be sure, the steps I must follow are:

  1. Disable (or remove?) the corresponding SPI in the Linux (A5) device tree
  2. Set up the corresponding SPI in the VF6XX headers of the FreeRTOS-Toradex base for the VF61
  3. Set up the initialization and configuration of the SPI clock, prescalers, mode and operation
  4. Handle the overall operation using RPMSG between the A5 and the M4, to bypass the Linux SPI stack.

I saw a few references to SPI0 and SPI1 for the VF61 in the FreeRTOS base, and I’ll use some of the structured examples made for the iMX7D as a guide.

If you have anything to add that could help, I’d appreciate it.

I’m using the VFxxx Controller Reference Manual.

Best,
Andre Curvello
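For step 4, the A5 and the M4 would need to agree on a message layout for the proxied frames. The following is a purely hypothetical sketch of such a layout (a sequence number plus the five 16-bit words from the test program), serialized byte by byte so it does not depend on struct padding. None of these names come from the Toradex FreeRTOS base or from an RPMsg API; sending the resulting buffer over RPMsg is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define PROXY_WORDS 5                       /* words per SPI frame */
#define PROXY_WIRE_SIZE (2 + 2 * PROXY_WORDS)

/* Hypothetical request: one SPI frame the A5 asks the M4 to send. */
struct proxy_request {
    uint16_t seq;                   /* lets the A5 match replies to requests */
    uint16_t words[PROXY_WORDS];
};

/* Serialize as little-endian bytes; returns bytes written, or 0 if buf is too small. */
size_t proxy_encode(const struct proxy_request *req, uint8_t *buf, size_t cap)
{
    if (cap < PROXY_WIRE_SIZE)
        return 0;

    buf[0] = (uint8_t)(req->seq & 0xFF);
    buf[1] = (uint8_t)(req->seq >> 8);
    for (size_t i = 0; i < PROXY_WORDS; i++) {
        buf[2 + 2 * i] = (uint8_t)(req->words[i] & 0xFF);
        buf[3 + 2 * i] = (uint8_t)(req->words[i] >> 8);
    }
    return PROXY_WIRE_SIZE;
}

/* Parse a buffer produced by proxy_encode; returns 0 on success, -1 on bad length. */
int proxy_decode(const uint8_t *buf, size_t len, struct proxy_request *out)
{
    if (len != PROXY_WIRE_SIZE)
        return -1;

    out->seq = (uint16_t)(buf[0] | (buf[1] << 8));
    for (size_t i = 0; i < PROXY_WORDS; i++)
        out->words[i] = (uint16_t)(buf[2 + 2 * i] | (buf[3 + 2 * i] << 8));
    return 0;
}
```

The same encode/decode pair could be compiled into both the Linux application and the M4 firmware, which keeps the two sides from drifting apart. Batching several frames per RPMsg message would be a natural extension if the per-message overhead turns out to matter.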

This seems like a reasonable approach and covers all the necessary steps. Also make sure to pass clk_ignore_unused on the Linux kernel command line so that Linux does not turn off the SPI clock again.

That said, I still think using just Linux might be preferable in case you do not have a tight control loop to fulfill. Using longer transfers in one go should make it possible to achieve smaller gaps between the transfers.

While we have RPmsg running on Vybrid and customers using it, it is certainly not as well tested as the i.MX7 implementation…

Hi @jaski.tx,

I’m doing a clean setup of a BSP 2.7 image without the RT patch to check the 70-80us timing that you managed to get. To match your setup, I’m using a Colibri Evaluation Board V3.2 that we have here.

@stefan.tx, do you think it’s possible to reduce this window of 70-80us between SPI transfers in Linux any further (I also read that @jaski.tx managed to get 50-60us)?
Or is this the minimum time possible, due to system calls, Linux SPI driver/stack operations, etc.?

I’m asking to see whether it’s feasible to keep the SPI work in Linux or whether we should move to the M4 as we discussed earlier.

We are facing two paths:

  1. Try to reduce the time-window between consecutive SPI transfers in Linux.
  2. Move the SPI work to M4 and glue the communication with A5 application using RPMSG.

It is preferable to keep working on Linux/A5, but we need to reduce the time between SPI transfers as much as we can.

If this can be achieved in Linux, that would be great; otherwise, I’ll have to go with option 2.