iMX8MM Verdin interrupt latency ~1ms

Hi!

I have been working on a solution for gathering data via SPI, using a GPIO pin as a data ready interrupt.

The previous application was userspace based, using gpiod and spidev, and it managed to gather around 500 bytes per transaction at a 1KHz rate on a Raspberry Pi 3 running default Debian.
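For context, the core of the old userspace loop looked roughly like this (a sketch rather than the exact code; the gpiochip path, line offset and spidev node are placeholders, and error handling is omitted):

/* libgpiod v1 + spidev: block on a falling DRDY edge, then read one chunk */
#include <gpiod.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

int main(void)
{
  struct gpiod_chip *chip = gpiod_chip_open("/dev/gpiochip2"); /* placeholder */
  struct gpiod_line *drdy = gpiod_chip_get_line(chip, 4);      /* placeholder */
  gpiod_line_request_falling_edge_events(drdy, "spi_fpga");

  int spi = open("/dev/spidev0.0", O_RDWR);                    /* placeholder */
  uint8_t rx[500];
  struct spi_ioc_transfer xfer;
  memset(&xfer, 0, sizeof(xfer));
  xfer.rx_buf = (unsigned long)rx;
  xfer.len = sizeof(rx);

  for (;;) {
    struct gpiod_line_event ev;
    gpiod_line_event_wait(drdy, NULL);      /* block until DRDY falls */
    gpiod_line_event_read(drdy, &ev);       /* consume the event      */
    ioctl(spi, SPI_IOC_MESSAGE(1), &xfer);  /* one ~500-byte read     */
  }
}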

When porting the code over to the iMX, it became apparent that the latency for the same code was worse than on the RPi, so I have written a kernel module to try to improve performance. This appears to have improved SPI latency somewhat, but it seems the GPIO IRQ doesn't trigger for about 1ms after the DRDY signal goes low.

Here is the irq setup:

// GPIO3_IO04 = 32 * (3 - 1) + 4 (datasheet bank 3, line 4)
#define GPIO_DRDY 68
<...>
static int __init spi_fpga_init(void)
{
<...>
  if (!gpio_is_valid(GPIO_DRDY)) {
    pr_err("DRDY %d not valid\n", GPIO_DRDY);
    goto fail_cleanup;
  }
  if (gpio_request(GPIO_DRDY, "GPIO_DRDY") < 0) {
    pr_err("Failed to request DRDY GPIO\n");
    goto fail_cleanup;
  }
  if (gpio_direction_input(GPIO_DRDY) != 0) {
    pr_err("Failed to set DRDY direction\n");
    goto fail_cleanup;
  }
  // interrupt request
  GPIO_irq_num = gpio_to_irq(GPIO_DRDY);
  pr_info("Requesting irq %d\n", GPIO_irq_num);
  ret = request_irq(GPIO_irq_num,
                    gpio_interrupt_handler, // no cast needed: handler has the irq_handler_t signature
                    IRQF_TRIGGER_FALLING,
                    "spi_fpga_device",
                    NULL);
  if (ret != 0) {
    pr_err("Error %d: could not request irq %d\n", ret, GPIO_irq_num);
    goto fail_cleanup;
  }
<...>
}

GPIO_irq_num is returned as 139, and gpio_interrupt_handler is called on a falling edge, from which a call to spi_async is issued.
But signal analysis shows a gap of about 1ms between the DRDY line going low and CS being asserted.
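For reference, the handler itself is minimal, roughly along these lines (simplified; fpga_spi, drdy_msg and rx_buf stand in for the real names, and the message is prepared once at init):

static struct spi_device *fpga_spi;   /* bound to the FPGA device at init     */
static struct spi_message drdy_msg;   /* prepared once, reused for every edge */
static struct spi_transfer drdy_xfer;
static u8 *rx_buf;                    /* kmalloc'd so it is DMA-safe          */

static void drdy_msg_setup(void)
{
  spi_message_init(&drdy_msg);
  drdy_xfer.rx_buf = rx_buf;
  drdy_xfer.len = 432;                /* bytes per DRDY frame */
  spi_message_add_tail(&drdy_xfer, &drdy_msg);
}

static irqreturn_t gpio_interrupt_handler(int irq, void *dev_id)
{
  /* spi_async() is safe from hard-IRQ context; the transfer completes
   * later via drdy_msg.complete. */
  spi_async(fpga_spi, &drdy_msg);
  return IRQ_HANDLED;
}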

Is there any way to increase the priority of the GPIO interrupt? The iMX is running the latest version of the tdx-reference-minimal-image with the RT patches provided by tdx-xwayland-rt.

Thanks
Will

Hi @WillG !

Welcome to Toradex Community! :partying_face:

Please feel free to browse around :smiley:

That's really interesting. Could you please share which Debian and kernel versions you were using? Also, were you executing only your application?

> the latency for the same code was worse than on the RPi

It would be great if you could share more details about how you measured the difference and what the measured values were.

> it seems the GPIO IRQ doesn't trigger for about 1ms after the DRDY signal goes low.

Could you please share how you measured this?

> signal analysis shows a gap of about 1ms between the DRDY line going low and CS being asserted.

Could you share more details about this signal analysis?

We are asking all those questions because maybe the culprit is not the GPIO (or maybe not only the priority of the GPIO interrupt).

After all that, some more questions: how many bytes per transaction do you actually need, how strict is the 1KHz requirement, and can you tolerate dropped samples?

Morning Henrique,

Thanks for getting back to me. I've added comments inline below (apologies if the red text is hard to read):

Hi @WillG !

Your comments are not visible on Community.

As you answered via email, I assume you added your comments inline in my message; it seems the Community doesn't display them.

Could you send your answers again?

Thanks for your understanding.

Best regards,

Hi Henrique, sorry I didn’t realise:

> The previous application was userspace based, using gpiod and spidev, and it managed to gather around 500 bytes per transaction at a 1KHz rate on a Raspberry Pi 3 running default Debian.

The Pi is actually running Raspbian Bullseye (11) with the default kernel provided by the Raspbian imaging tool, not Debian; my mistake.

And yes, it is the only manually started userspace program, though there may be some other system processes running by default. But I think I may have mischaracterised the Pi as performing better than it does…

> When porting the code over to the iMX, it became apparent that the latency for the same code was worse than on the RPi,

It's been a bit of a piecemeal discovery process so far, as I wasn't aware of any latency problems until I sent the code to my colleague (Connor, cc'd) who is working on the FPGA, and he did his own signal analysis.

That is where I got the '>1ms' figure between DRDY and the SPI transaction, but having done more analysis on my end, I think the average time for the full cycle to complete is actually a lot lower, with only a few loops showing this lag.

> it seems the GPIO IRQ doesn't trigger for about 1ms after the DRDY signal goes low.

I still don't have a proper setup to accurately measure the delta between DRDY falling and SPI CS going low, as I am generating the data-ready pulse with the same analyser I am using to measure the signals. I will try to get it set up so I can see both DRDY and SPI CS on the same capture.

> signal analysis shows a gap of about 1ms between the DRDY line going low and CS being asserted.

WIP

> Is there any way to increase the priority of the GPIO interrupt?

I appreciate the methodical approach; I was a bit hasty in posting with the information I had, and I agree that there are probably other causes. See my answers below.

The actual number is 432 bytes, but this will eventually increase to 864 for a complete 256-channel system. The 1KHz requirement comes from the ADC sampling rate, which should have little to no variance. Ideally there will be no dropped samples, but I have to discuss this with my colleague to see what the Verilog design can currently handle.
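(As a sanity check on the numbers: 432 bytes at 1KHz is 432 kB/s, about 3.5 Mbit/s. At the ~20MHz SPI clock mentioned below, a 432-byte transfer occupies roughly (432 * 8) / 20 MHz ≈ 173us of each 1ms window, and 864 bytes ≈ 346us, so the budget is tight but feasible, ignoring per-transfer setup overhead.)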

Software summary
------------------------------------------------------------
Bootloader: U-Boot
Kernel version: 5.15.129-rt67-6.4.0-devel+git.67c3153d20ff #1 SMP PREEMPT_RT Wed Sep 27 12:30:36 UTC 2023
Kernel command line: root=PARTUUID=402c09ac-02 ro rootwait console=tty1 console=ttymxc0,115200 consoleblank=0 earlycon
Distro name: NAME="TDX Wayland with XWayland RT"
Distro version: VERSION_ID=6.4.0-devel-20231012131104-build.0
Hostname: verdin-imx8mm-14947563
------------------------------------------------------------

Hardware info
------------------------------------------------------------
HW model: Toradex Verdin iMX8M Mini on Verdin Development Board
Toradex version: 0059 V1.1C
Serial number: 14947563
Processor arch: aarch64
------------------------------------------------------------

(I am running on a Yavia carrier)

There is other load on the system, the main one being transmission of the data via TCP over an Ethernet connection. Currently this is deferred until there are 8 * 432-byte packets, which seems to give a fair overhead tradeoff. I have tried disabling the network transmission, but strangely it doesn't appear to have had much effect on the worst-case kernel module loop time.
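(For scale, that batching works out to one ~3.5 kB send every 8ms, i.e. 125 sends per second instead of 1000.)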

In terms of improvements, I have managed to increase performance quite significantly by configuring the kernel with CONFIG_NO_HZ_FULL, and am currently experimenting with the different CPU governor options and kernel timer speeds.
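For reference, the isolation setup I'm experimenting with boils down to kernel command-line arguments along these lines (CPU 3 as an example core; nohz_full requires CONFIG_NO_HZ_FULL):

isolcpus=3 nohz_full=3 rcu_nocbs=3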

I have also experimented with running the userspace application on an isolated CPU with SCHED_FIFO at priorities of 99 and 49, but this has had little effect so far.
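For completeness, I'm launching it along these lines (the binary name is a placeholder; chrt selects SCHED_FIFO, taskset pins to the isolated core):

taskset -c 3 chrt -f 99 ./spi_gather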

I have also noted that, given a faster DRDY signal, the average time for a transfer seems to decrease.

This is probably because of the way the kernel module currently works, which is to ignore any interrupts generated while a transfer is still in progress.

To that end, I'm wondering if there is a way to mask and then unmask the GPIO interrupt, so that it fires again immediately after a slow transaction rather than, in the worst case, waiting just under 2ms for the next falling edge.
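Something like the following is what I have in mind (a sketch; my understanding is that the genirq core latches an edge that arrives while the line is disabled and replays it on enable_irq(), which would give exactly that immediate re-fire):

static irqreturn_t gpio_interrupt_handler(int irq, void *dev_id)
{
  /* Mask our own line so further edges don't re-enter while the
   * transfer is in flight; the _nosync variant is required because
   * we are inside the handler itself. */
  disable_irq_nosync(irq);
  spi_async(fpga_spi, &drdy_msg);
  return IRQ_HANDLED;
}

static void drdy_msg_complete(void *ctx)
{
  /* Re-arm as soon as the transaction finishes; a falling edge that
   * occurred meanwhile should be replayed now rather than waiting up
   * to ~2ms for the next one. */
  enable_irq(GPIO_irq_num);
}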

Sorry for the lack of rigour and proper measurements here; I hope to get some clearer measurements today and to continue characterising the system as a whole.

EDIT: I have just compiled a version of the kernel module that raises a GPIO when the interrupt is serviced and lowers it when the SPI transaction callback is received. This should give an estimate of the main time deltas; I will update here when I have some good captures.
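Concretely, the instrumentation is just two writes on top of the sketch above (GPIO_DEBUG is a placeholder for whichever spare pin is wired to the analyser, requested and set as an output at init):

static irqreturn_t gpio_interrupt_handler(int irq, void *dev_id)
{
  gpio_set_value(GPIO_DEBUG, 1);  /* interrupt serviced: debug pin high */
  disable_irq_nosync(irq);
  spi_async(fpga_spi, &drdy_msg);
  return IRQ_HANDLED;
}

static void drdy_msg_complete(void *ctx)
{
  gpio_set_value(GPIO_DEBUG, 0);  /* SPI callback received: debug pin low */
  enable_irq(GPIO_irq_num);
}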

DOUBLE EDIT: I've managed to get some captures:

[scope capture: worst case]

This is one of the worst cases, usually seen on the very first interrupt but occurring throughout; 300us seems to be the absolute worst case between DRDY going low and the soft IRQ handler executing.

[scope capture: best case]

This is one of the best cases: the delta between DRDY and the IRQ is 6us.

The SPI transaction time is also variable; we might be able to mitigate that by increasing the clock speed, but we are already at around 20MHz, so noise/data loss might become an issue. It would be good to know if we can ensure that the GPIO interrupt latency is consistently sub-10us, perhaps by running the module on an isolated CPU.
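One thing I plan to try on the isolated-CPU front is steering the interrupt from userspace, without touching the module (139 is the IRQ number reported above; CPU 3 as the example core):

echo 8 > /proc/irq/139/smp_affinity   # bitmask 0x8 = CPU 3 only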

Thanks for your time,

Will


Hi @WillG

Is the RT kernel a requirement? Normally it increases interrupt latency a bit.

However, you now have 300 us as interrupt latency, which doesn't seem far off. What exactly are you doing in the IRQ? Can you share the full driver with us? It could be that sending something through SPI adds another context switch, so you would have to increase that task's priority as well. However, here I'm not 100% sure. Can you share the output of:

ps -eo pid,rtprio,pri,cmd
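(If an SPI-related kernel thread shows up there, e.g. an irq/... or spi... thread on the RT kernel, you can raise its priority with chrt; the PID is a placeholder:)

chrt -f -p 90 <pid-of-spi-thread>   # SCHED_FIFO priority 90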

Also, if you want a hard IRQ, you could try the following flag:

request_irq(GPIO_irq_num,
            gpio_interrupt_handler,
            IRQF_TRIGGER_FALLING | IRQF_NO_THREAD,
            "spi_fpga_device",
            NULL);

But keep in mind this can increase the jitter for the RT-Kernel. See this link for more information:
https://wiki.linuxfoundation.org/realtime/documentation/technical_details/threadirq

Regards,
Stefan


Hi Stefan,

Thanks for the suggestions. We initially intended to drive everything from userspace, but having read your advice and thought about it a bit, it seems the RT requirement is not particularly strong. I have switched back to the standard tdx-xwayland kernel recipe and am now seeing far more regularity in interrupt servicing in the module.

I have also added the IRQF_NO_THREAD flag to the interrupt request, and now the interrupt is serviced within 20us of the external signal going low the majority of the time. It's still out for testing with the FPGA engineer; I will update here once I have feedback.

Thanks,
Will


Hi @WillG

Thanks a lot for the feedback. I’m crossing my fingers that it keeps working :slight_smile:

Regards,
Stefan