Imx6/linux: Large and inconsistent latency when issuing an irq triggered spi read

gardarh · March 8, 2016, 11:03am

Hello,

I’m using imx6dl/linux to fetch data from an ADC via SPI at a sampling rate of 250Hz (i.e. 4ms between samples), where the ADC signals each new sample by setting a DRDY pin low. To sum up, this is what I need to do within a 4ms window:

1. Wait for DRDY pin to go low (4ms window opens)
2. Issue an SPI read/write to ADC1 to acquire samples
3. Issue an SPI read/write to ADC2 to acquire samples (both ADCs are triggered with the same gpios and one DRDY triggers both of them)

If I can’t complete these steps within the window I lose a sample. My first approach at this was to write a userspace program:

1. poll() the DRDY gpio pin using the gpio sysfs driver
2. Issue an SPI read/write via the generic spi driver (using ioctl()), twice (once per ADC)
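The two userspace steps above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual program: the function names wait_drdy and adc_transfer are made up for this example, the GPIO is assumed to be already exported through sysfs with its edge file set to "falling", and the SPI device is assumed to be an already opened and configured spidev node.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: block until the sysfs GPIO "value" file reports
 * an edge. Returns 1 on an edge, 0 on timeout, -1 on error. */
static int wait_drdy(int gpio_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = gpio_fd, .events = POLLPRI | POLLERR };
    char buf[8];
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n;               /* 0 = timeout, -1 = poll() failed */

    /* sysfs GPIO needs a seek + read before the next edge can be polled. */
    if (lseek(gpio_fd, 0, SEEK_SET) < 0 ||
        read(gpio_fd, buf, sizeof(buf)) < 0)
        return -1;
    return 1;
}

/* Hypothetical helper: one full-duplex transfer of len bytes on an open
 * spidev file descriptor. Returns the ioctl() result (-1 on error). */
static int adc_transfer(int spi_fd, const uint8_t *tx, uint8_t *rx, size_t len)
{
    struct spi_ioc_transfer tr;

    memset(&tr, 0, sizeof(tr));
    tr.tx_buf = (unsigned long)tx;
    tr.rx_buf = (unsigned long)rx;
    tr.len = (uint32_t)len;
    return ioctl(spi_fd, SPI_IOC_MESSAGE(1), &tr);
}
```

A sampling loop would call wait_drdy() once and then adc_transfer() once per ADC. Every one of those calls is a separate trip through the scheduler and the SPI message queue, which is where the latency discussed in this thread can accumulate.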

This works reasonably well when the system is idle, with some jitter (which is OK since the ADC is driven by an external clock), but when a load hits the system I see DRDY->SPI CS DOWN latency spikes that exceed 4ms. The average DRDY->SPI CS DOWN latency is around 100us.

To remedy this I proceeded to write a kernel module to perform this task, using the iio driver subsystem. The disappointing result of that work is that the driver has pretty much the same performance as the userspace program, so my problem persists: I’m still losing samples under load. 100us strikes me as an oddly long period, especially considering this post ( Writing a Linux Kernel Module — Part 1: Introduction | derekmolloy.ie ) from Derek Molloy where he manages a <20us turnaround time for what is admittedly a simpler task and on a different (but similar) cpu.

Here is a snippet from my dts:

&ecspi4 {
    status = "okay";
    ti_ads1198_0: ads1198@0 {
        spi-max-frequency = <20000000>;
        compatible = "ti_ads1198";
        reg = <0>;
        enable-gpios = <&gpio5 6 0 &gpio5 5 0 &gpio4 16 0 &gpio5 7 0>;
        spi-cpha;
        gpios = <&gpio5 6 0 /* ADS1198 RESET */
             &gpio5 5 0 /* ADS1198 START */
             &gpio4 16 0 /* ADS1198-0 DRDY */
             &gpio5 7 0 /* ADS1198-0 PWDN */
            >;
        interrupt-parent = <&gpio4>;
        interrupts = <16 IRQ_TYPE_EDGE_FALLING>; /* DRDY */
    };
    ti_ads1198_1: ads1198@1 {
        spi-max-frequency = <20000000>;
        compatible = "ti_ads1198";
        reg = <1>;
        spi-cpha;
    };
};

I have modeled my driver mostly after the MPU6050 driver Stefan mentioned in another post on this message board with this trigger setup:

ret = request_irq(st->irq_drdy, &iio_trigger_generic_data_rdy_poll,
         IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
         dev_name(&st->spi->dev),
         st->trig);

I attempted to put an spi_async() in the irq_handler function but without noticeable success.

One thing I notice is the following debug messages on boot:

[    0.187883] of_dma_request_slave_channel: dma-names property of node '/soc/aips-bus@02000000/spba-bus@02000000/ecspi@02014000' missing or empty
[    0.200817] spi_imx 2014000.ecspi: cannot get the TX DMA channel!
[    0.206944] spi_imx 2014000.ecspi: dma setup error,use pio instead

To remedy this I tried adding the dmas and dma-names properties to the imx6qdl.dtsi (essentially copy-pasted from the imx6sx.dtsi), yielding the following entry for my spi peripheral:

ecspi4: ecspi@02014000 {
    #address-cells = <1>;
    #size-cells = <0>;
    compatible = "fsl,imx6q-ecspi", "fsl,imx51-ecspi";
    reg = <0x02014000 0x4000>;
    interrupts = <0 34 IRQ_TYPE_LEVEL_HIGH>;
    clocks = <&clks IMX6QDL_CLK_ECSPI4>,
         <&clks IMX6QDL_CLK_ECSPI4>;
    clock-names = "ipg", "per";
    status = "disabled";
    dmas = <&sdma 9 7 1>, <&sdma 10 7 2>;
    dma-names = "rx", "tx";
};

This did not fix the latency issues, but the warnings disappeared.

So my questions are:

Is the 100us latency as I described above “normal” or am I possibly doing something wrong?
Can you think of anything that can get me reliably within the 4ms window or do I just have to accept the fact that I’m putting realtime constraints on a non-realtime OS and fix the problem with a messier solution (i.e. sample padding)?
Applying PREEMPT_RT patches would be one potential solution, I’m just a little concerned that applying that patch might compromise the stability of the system, is that a route that you have researched?

brandon.tx · March 10, 2016, 1:08am

Hello,

Without going really deep into this subject - what you’re experiencing is to be expected - although perhaps it can be improved even without the use of PREEMPT_RT patches. Although this other fellow may achieve a relatively consistent 20us response time on his interrupt without any load on the CPU, I expect that:

Loading the CPU and exercising other interrupts will blow these numbers up
Even without any CPU load or any other interrupts active, there’s likely to be occasional jitter much greater than 20us - maybe very infrequently, but still problematic for most real-time systems

Is the 100us latency as I described above “normal” or am I possibly doing something wrong?

Yes. I would expect average latency to be lower than 100us, but you will certainly experience 100+us latency from time-to-time (even 4ms is probable). I also expect that there is room for improvement without going the PREEMPT_RT route.

Can you think of anything that can get me reliably within the 4ms window or do I just have to accept the fact that I’m putting realtime constraints on a non-realtime OS and fix the problem with a messier solution (i.e. sample padding)?

How “hard” are your real-time requirements - can you accept any loss of data? If not, then you probably need to consider either a true RTOS, a secondary controller (or a heterogeneous multicore solution such as iMX7) or a hardware buffer. Even PREEMPT_RT will not truly guarantee anything. If there is any “softness” to this real-time requirement then you have options with Linux.

Applying PREEMPT_RT patches would be one potential solution, I’m just a little concerned that applying that patch might compromise the stability of the system, is that a route that you have researched?

PREEMPT_RT certainly helps, but it’s not a perfect solution - it still requires that all kernel code including drivers/modules/patches/etc play by the PREEMPT_RT rules (i.e. no preemption disabling, etc). With PREEMPT_RT and some tuning of interrupt priorities, you should be able to bring worst case latency down well below 4ms (I measured ~200us worst case in stress tests on our Apalis iMX6 with our 3.14.28 kernel patched with PREEMPT_RT) - again this is still not a guarantee.

A brute force option - albeit very ugly - would be to create a SPI kernel module which disables preemption & scheduling for the core it runs on and occupies the CPU 100% of the time such that it doesn’t require any interrupts or preemption. This is sorta silly, so I don’t recommend it.

gardarh · March 10, 2016, 9:44am

Thanks for the answer Brandon. What I have resorted to is simply repeating samples where necessary to “fill in the holes” and fulfill my promise that I’ll deliver 250 samples/second. This means I rely on the CPU time being accurate enough (as opposed to the external xtal); it feels like a reasonable compromise.
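The "fill in the holes" approach described above could look roughly like this. It is only a sketch under stated assumptions: the names SAMPLE_PERIOD_US, periods_elapsed and emit_padded are invented for this example, timestamps are assumed to come from the CPU clock in microseconds, and samples are assumed to fit in an int32_t.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_PERIOD_US 4000u  /* 250 Hz, i.e. 4ms between samples */

/* Number of sample periods between two successful reads, rounded to the
 * nearest whole period so that ordinary jitter still counts as 1. */
static unsigned periods_elapsed(uint64_t prev_us, uint64_t now_us)
{
    return (unsigned)((now_us - prev_us + SAMPLE_PERIOD_US / 2)
                      / SAMPLE_PERIOD_US);
}

/* Write the new sample to out[], preceded by one copy of the previous
 * sample for every period that was missed. Returns the number of values
 * written (never more than cap). */
static size_t emit_padded(int32_t last, int32_t sample, unsigned periods,
                          int32_t *out, size_t cap)
{
    size_t n = 0;

    for (unsigned i = 1; i < periods && n < cap; i++)
        out[n++] = last;        /* repeat the last good sample */
    if (n < cap)
        out[n++] = sample;
    return n;
}
```

For example, if two reads are 12050us apart, periods_elapsed() returns 3 and emit_padded() writes the previous sample twice followed by the new one, so the output stream keeps its nominal 250 samples/second.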

So in effect, the >4ms jitter is rare enough (~1 time/sec on average) that I don’t think anyone is going to notice, and therefore my realtime requirement is not that hard. However, I’d like this to happen as rarely as possible.

What bothers me the most is the 100us average latency (it feels too long and it indicates to me that I’m doing something wrong). Do you have known examples of lower latency on a system doing an spi read triggered by an external interrupt pin? Is there anything off the top of your head that you imagine might be causing this (e.g. in terms of device tree hookup, since you don’t have my code)?

stefan.tx · March 10, 2016, 5:32pm

@gardarh, maybe you have already done that but I did not find it mentioned in the initial question: the first thing I would do is turn the priority knobs. Use sched_setscheduler in your application and choose one of the real-time schedulers (SCHED_FIFO/SCHED_RR). The second thing you need to do is to identify the kernel threads involved in the communication (e.g. by name, or by the IRQ numbers associated with SPI in /proc/interrupts).

gardarh · March 11, 2016, 10:35am

@stefan.agner thanks for the suggestion, I had not stumbled upon those system calls. As promising as they sound, they did not improve my situation. The working theory is that the imx spi driver is causing the latency, but for the moment I have a sufficiently good solution (smearing samples where I miss out) even though it doesn’t make me a model student in the school of signal processing. Right now the release date is approaching and I have to start focusing on other problems, but I’ll keep the thread updated if I do further research into this later on.

Garyio · March 14, 2016, 7:00pm

I have also been working on an imx6 linux board where I am using spi to communicate with an FPGA. I found the same latency issues you are seeing. One thing I found to help is to enable real time priorities in the spi driver. You can set master->rt = 1 in the spi-imx.c file in drivers/spi, in the probe function where other master parameters are set. This gets back to the spi core code where it has:

/*
 * Master config will indicate if this controller should run the
 * message pump with high (realtime) priority to reduce the transfer
 * latency on the bus by minimising the delay between a transfer
 * request and the scheduling of the message pump thread. Without this
 * setting the message pump thread will remain at default priority.
 */
if (master->rt) {
	dev_info(&master->dev,
		"will run message pump with realtime priority\n");
	sched_setscheduler(master->kworker_task, SCHED_FIFO, &param);
}

I’m not sure this helps much but I found some improvement. You will have to recompile the kernel if you’re not doing so already. Hope this helps.

brandon.tx · March 15, 2016, 12:46am

@Garyio, this is very interesting. I’d be interested to see any comparisons of latency measurements you made between these two configurations.

Garyio · March 15, 2016, 1:35am

Changing this did not necessarily speed things up, it just seemed to help (maybe) get rid of those random extra long latency issues like you were seeing. I’m still testing. I also turned on the preemptible option in the kernel. I am using the 3.14.28 kernel. Actually I think it was already on when I started. You can tell if it’s enabled when you cat /proc/version. It will say preemptible or something. I also tried to apply an RT patch to my kernel but it didn’t seem very stable. I had to apply a yocto imx6 recipe patch by hand. I was able to run cyclictest overnight for a couple of nights but when I would run my app it would sometimes crash. Could have been my app.

gardarh · March 15, 2016, 11:38am

This should’ve been a comment to the post above…

gardarh · March 15, 2016, 11:37am

Hey @Garyio. I modified the PREEMPT settings, essentially by adding the following to my defconfig:

# Comment out the following line
# CONFIG_PREEMPT_VOLUNTARY=y
# Add the following lines
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y

Adding this under probe() of the spi-imx driver:

master->rt = 1;

and modifying my userspace driver, adding the following:

struct sched_param sched_params;
sched_params.sched_priority = sched_get_priority_max(0);
sched_setscheduler(0, SCHED_FIFO, &sched_params);

However this does not eliminate the large delays which mostly occur when I try using some peripherals, e.g. fetch something over ethernet or start a Bluetooth connection. Thanks for the suggestion though!
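One detail of the userspace snippet above may be worth double-checking. On Linux, policy 0 is SCHED_OTHER, for which sched_get_priority_max() returns 0, while SCHED_FIFO only accepts priorities from 1 to 99. If the snippet is read literally, sched_setscheduler() would therefore fail with EINVAL, and since the return value is not checked this would go unnoticed. Whether this was the case in the actual program cannot be told from the thread. A minimal sketch that queries the range for SCHED_FIFO and reports errors (the helper names fifo_max_priority and make_realtime are invented for this example) could look like this:

```c
#include <errno.h>
#include <sched.h>

/* Highest valid priority for SCHED_FIFO (99 on Linux), or -1 on error. */
static int fifo_max_priority(void)
{
    return sched_get_priority_max(SCHED_FIFO);
}

/* Switch the calling process to SCHED_FIFO at the highest priority.
 * Returns 0 on success or a negative errno value on failure. */
static int make_realtime(void)
{
    struct sched_param sp = { .sched_priority = fifo_max_priority() };

    if (sp.sched_priority < 0)
        return -errno;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;          /* e.g. EPERM without sufficient privileges */
    return 0;
}
```

Checking the return value (or inspecting the process with a tool such as chrt -p) confirms whether the real-time policy was actually applied before drawing conclusions from the latency measurements.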

brandon.tx · March 15, 2016, 2:41pm

@gardarh, most drivers are not optimized for real-time and I assume that most of the iMX6 drivers are no different. The SPI driver is part of the critical path in your latency tests. So it can ruin your timing regardless of the OS & scheduler’s real-time configuration. The reverse is also true - that the scheduler is also on the critical path and must also be highly responsive. That said, if this is a critical issue for you, then you’ll want to try to profile & optimize the various pieces that make up the critical path.
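As a starting point for profiling one piece of the critical path from userspace, a small min/avg/max tracker in the style of the cyclictest columns can be wrapped around the code under test. This is a sketch only: the names lat_stats, now_us, lat_add and lat_avg are invented for this example, and CLOCK_MONOTONIC is assumed to be an adequate time source for microsecond-level measurements.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Running latency statistics, all values in microseconds. */
struct lat_stats {
    uint64_t count, sum_us, min_us, max_us;
};

/* Current monotonic time in microseconds. */
static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Record one measured latency. */
static void lat_add(struct lat_stats *s, uint64_t us)
{
    if (s->count == 0 || us < s->min_us)
        s->min_us = us;
    if (us > s->max_us)
        s->max_us = us;
    s->sum_us += us;
    s->count++;
}

/* Integer average of all recorded latencies (0 if none recorded). */
static uint64_t lat_avg(const struct lat_stats *s)
{
    return s->count ? s->sum_us / s->count : 0;
}
```

Taking now_us() right after the DRDY wakeup and again after the SPI transfer returns, then passing the difference to lat_add(), separates the time spent in the SPI path from the wakeup latency, which a logic analyzer on DRDY and CS can cross-check.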

brandon.tx · March 15, 2016, 2:41pm

You can profile the responsiveness of the OS/scheduler by using cyclictest (available in the package ‘rt-tests’). These are my results from a 3.14.28 preempt_rt kernel on Apalis iMX6Q @ 800MHz after 30 minutes while executing a few different stress scripts to load the CPU cores & exercise various interrupts:

cyclictest -n -p 80 -t -D30m:
policy: fifo: loadavg: 4.62 4.52 3.98 4/163 27123           

T: 0 (12907) P:80 I:1000 C:1799985 Min:     10 Act:   24 Avg:   26 Max:     166
T: 1 (12908) P:80 I:1500 C:1199994 Min:      9 Act:   27 Avg:   27 Max:     187
T: 2 (12909) P:80 I:2000 C: 899996 Min:     11 Act:   21 Avg:   25 Max:     135
T: 3 (12910) P:80 I:2500 C: 719996 Min:     10 Act:   30 Avg:   28 Max:     129

However, note that using CONFIG_PREEMPT or applying the PREEMPT_RT patches won’t touch any drivers that aren’t part of the mainline kernel, nor will it make any drivers more deterministic - it simply adds more preemption to the kernel.

If you’d like to test a preempt-rt kernel on iMX6, you can obtain it from our feeds server using the instructions provided here: How to build real-time kernel for Apalis IMX6 - Technical Support - Toradex Community

gardarh · March 15, 2016, 3:18pm

Thanks, it’s good to be aware of those tools, I might take another round at this problem later. I’m currently using a workaround that allows for a missing sample every now and then. It’s not optimal but a reasonable compromise, and launch date is approaching fast.

brandon.tx · March 15, 2016, 3:28pm

For comparison, these are the results for the standard kernel with voluntary preemption:

cyclictest -n -p 80 -t -D30m:
policy: fifo: loadavg: 4.35 4.40 3.79 6/115 31966           

T: 0 ( 6531) P:80 I:1000 C:1726260 Min:      8 Act:   40 Avg:   60 Max:   32383
T: 1 ( 6532) P:80 I:1500 C:1150840 Min:      9 Act:   35 Avg:   62 Max:   33290
T: 2 ( 6533) P:80 I:2000 C: 863130 Min:     10 Act:   42 Avg:   80 Max:   33049
T: 3 ( 6534) P:80 I:2500 C: 690504 Min:     12 Act:   97 Avg:   78 Max:   32086

Note that average latency is still below 100us, but max latency reaches ~33ms.

gardarh · March 29, 2016, 11:34am

I added the flag

CONFIG_HZ_1000=y

to my defconfig. It appeared to improve the situation somewhat (it reduced the number of lost samples significantly but did not eliminate them).

modonovan · March 7, 2017, 10:47am

Possibly relevant:

gp_lyakh · December 13, 2017, 5:25am

My current solution is PREEMPT_RT, with recursive tasklet pooling of the deferred IRQ status flag. Best Regards.

rdonio · August 1, 2020, 11:31am

Hi,
I see the device tree for the ti_ads1198 in this thread. I am trying to locate the source code for it.
Thanks in advance, Ron

diego_b.tx · August 3, 2020, 6:52am

Hi @rdonio and welcome to the Toradex Community!

Please do not hijack this thread. If you have a new question, please start a new thread in the Toradex Community.

Sorry but I don’t understand your question. You can find all our sources here. Please always try to be as precise as possible and provide information about the SoM, carrier board, OS and BSP version you are using. If you were looking for something other than our Git repositories, I would suggest starting a new thread with a more detailed question as the next step.

Thank you for your understanding and best regards
Diego

rdonio · August 3, 2020, 8:10am

Hi, sorry, I am not trying to hijack anything.
The question is related to this thread.
I see the device tree for the ti_ads1198 in this thread. I am trying to locate the source code for it. Thanks in advance, Ron