UART input overrun

CornStarch · November 12, 2018, 10:44pm

Dear Toradex.
We are experiencing input overruns on ttyS0 using the Tegra drivers from kernel 4.19.1. This didn’t happen back when we were still using the Toradex kernel. We increased N_TTY_BUF_SIZE from 4096 to 131072 in /include/linux/tty.h, but it didn’t seem to do the trick.
While checking /proc/interrupts I noticed that the interrupt count seems to stay the same, which partly explains the problem. If no interrupts are raised, the buffer will probably fill up, and a read after some interval will not be fast enough.
Could this be connected to the DMA mode issues? In a new version we removed DMA support for UART from the kernel. Should this be enough, or would you still recommend putting dma-names = "", ""; in the device tree?
Can I somehow check from user space whether I’m using DMA or PIO?
Can you think of any other reasons for input overruns? Thank you very much.
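
The /proc/interrupts check described above can be automated by sampling the counter twice while data is arriving. The following is only a sketch: the helper names irq_line_total and irq_total_for are made up for illustration, and the label to search for ("serial" here) is an assumption that depends on the driver and kernel version.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line.
 * Returns -1 if the line has no "label:" prefix (e.g. the CPU header). */
long long irq_line_total(const char *line)
{
    const char *p = strchr(line, ':');
    long long total = 0;

    if (p == NULL)
        return -1;
    p++;
    for (;;) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* reached the chip name column */
        unsigned long long v = strtoull(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\0')
            break;                      /* token only starts with digits */
        total += (long long)v;
        p = end;
    }
    return total;
}

/* Find the first line of `text` containing `name` and return its total.
 * Returns -1 if no line matches. */
long long irq_total_for(const char *text, const char *name)
{
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[512];

        if (len < sizeof buf) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (strstr(buf, name) != NULL)
                return irq_line_total(buf);
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

Reading /proc/interrupts into a buffer twice, a few seconds apart, and comparing the two results of irq_total_for() shows whether the UART interrupt is firing at all during the transfer.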

jaski.tx · November 13, 2018, 3:30pm

hi @CornStarch

We are experiencing input overruns of ttyS0 using the Tegra drivers from kernel 4.19.1. This didn’t happen back when we were still using the Toradex kernel.

Why did you stop using the Toradex kernel?

We increased N_TTY_BUF_SIZE from 4096 to 131072 in /include/linux/tty.h, but it didn’t seem to do the trick.

Increasing the buffer size to such a big value may not help you. You should introduce flow control in your UART driver/application.
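
As a rough illustration of the flow-control suggestion, hardware RTS/CTS handshaking can be requested from user space through termios. This is a minimal sketch, assuming the RTS and CTS lines are actually wired on the board and that the remote side honours them; the function names use_rtscts and enable_rtscts are invented for this example.

```c
#define _DEFAULT_SOURCE         /* exposes CRTSCTS in glibc's <termios.h> */
#include <termios.h>

/* Request RTS/CTS handshaking and switch off XON/XOFF in a termios struct. */
void use_rtscts(struct termios *tio)
{
    tio->c_cflag |= CRTSCTS;
    tio->c_iflag &= ~(IXON | IXOFF | IXANY);
}

/* Apply the setting to an already opened serial port.
 * Returns 0 on success, -1 on error. */
int enable_rtscts(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;
    use_rtscts(&tio);
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Note that tcsetattr() can report success even when only part of the request was applied, so it is worth reading the settings back with tcgetattr() to confirm that CRTSCTS really stuck.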

While checking /proc/interrupts I noticed that the interrupt count seems to stay the same, which partly explains the problem. If no interrupts are raised, the buffer will probably fill up, and a read after some interval will not be fast enough.

Did you check whether the buffer actually fills up?

Could this be connected to the DMA mode issues? In a new version we removed DMA support for UART from the kernel. Should this be enough, or would you still recommend putting dma-names = "", ""; in the device tree?

We don’t have experience with the kernel on T30. Usually, putting dma-names = "", ""; in the node should disable DMA transfers for that specific UART port, so you would not need to disable DMA for the whole UART interface. Please have a look here too.

Can I somehow check from user space whether I’m using DMA or PIO?

Usually DMA should be disabled if you specify this correctly in the device tree.

Can you think of any other reasons for input overruns?

Unfortunately no.

Thank you very much.

You are welcome.

jaski.tx · November 16, 2018, 3:47pm

Hi
You are welcome. We are currently looking into this issue and will get back to you next week.

kswain · December 7, 2018, 9:42pm

I actually still have this issue. You can disable the DMA driver in the device tree as mentioned and use /dev/ttyS1 instead of /dev/ttyTHS1 (Tegra High Speed). That helped us along a little further, but we still had uart1 crash. We fixed that here:

I’m now back to the high-speed DMA UART, and all seems to be fine except for one application. We try to send a 4096-byte block of data, and we don’t get the expected response. This app works on a PC and on our Cortex-M3 platform, so it’s probably related to the UART buffer size, latency, or the Tegra platform. I’ll be spending more time on this next week.

jaski.tx · December 11, 2018, 10:08am

Did you try this using the 4.19 kernel? Did you try sending a block size other than 4096?

kswain · December 11, 2018, 6:44pm

This is using the 4.19.y mainline kernel. I made it a little further today:

fprintf(stdout, "Length is %d\n", length);
n = write(device, teststring, length);
fprintf(stdout, "Wrote n is %d\n", n);

length is 4096, but write() returns n = 4095. If I increase length beyond 4096, it still returns 4095. From the write() manual:
On success, the number of bytes written is returned (zero indicates
nothing was written). It is not an error if this number is smaller
than the number of bytes requested; this may happen for example
because the disk device was filled.

So I guess there is a buffer limit of 4095 with DMA driver? What is the safest way to increase this buffer size? The largest I’ll be sending is 4k.
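
Independent of the driver's buffer size, a short write like the one above is legal, so the application can loop until everything has been accepted. The 4095 figure is consistent with a 4096-byte circular transmit buffer that always keeps one slot free, although that is an inference and not something confirmed in this thread. Below is a minimal sketch; write_all is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Write exactly `len` bytes, retrying after short writes.
 * Returns 0 on success, -1 on error (errno is left as set by write/poll). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n > 0) {                    /* partial or full progress */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted by a signal, retry */
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* real error */

        /* Non-blocking fd with a full buffer: wait until it is writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

With this in place the test above would become write_all(device, teststring, length), and the remaining byte is sent as soon as the driver has room for it.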

jaski.tx · December 17, 2018, 9:22am

hi, sorry for the late reply.

What buffer sizes did you set for N_TTY_BUF_SIZE and UART_XMIT_SIZE?

mirza.krak · March 20, 2019, 7:53pm

Curious whether you ever found a workaround for this?

I have recently experienced the same problems when running a 4.14 Linux kernel on a Colibri T20. It is very easy to generate input overruns on ttyS0, which is used as the serial console in our case.

kswain · March 20, 2019, 8:12pm

I don’t know if I fixed the overruns specifically. I am running an Apalis T30 with a 4.14.y kernel. We disabled the Tegra high-speed UART in the kernel config. We changed the device tree to use the PIO mode drivers only (/dev/ttySx instead of /dev/ttyTHSx). We also had one further bug with UARTB, which was due to the IR controller sharing a clock with that UART. With that bug, that UART would just crash.

Other than that, our application had to be careful when using tcflush(device,TCIOFLUSH);, as it is possible to flush a buffer at the wrong time and lose data. Since making these changes, we have had no issues.
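
One way to reduce the risk described above is to wait for pending output with tcdrain() and then discard only unread input with TCIFLUSH, instead of calling TCIOFLUSH, which also throws away output that has not been transmitted yet. This is only a sketch of that idea and not the code from the attached patch; drain_then_flush_input is a made-up name, and whether discarding input is acceptable at all depends on the protocol.

```c
#include <termios.h>

/* Wait until queued output has been transmitted, then drop unread input.
 * Returns 0 on success, -1 on error (for example if fd is not a tty). */
int drain_then_flush_input(int fd)
{
    if (tcdrain(fd) != 0)
        return -1;
    return tcflush(fd, TCIFLUSH);
}
```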

Patch we used is attached

jaski.tx · March 21, 2019, 9:58pm

@mirza.krak: Did the answer from @kswain help you?

mirza.krak · March 22, 2019, 7:25am

I am grateful to @kswain for taking the time to write a comment. Unfortunately, this has not resolved the issue.

We are already using the PIO (ttyS0) driver, so there is not much we can change. We still see overruns when we try to send larger chunks of data (more than 32 bytes, which is the FIFO size).

We have not experienced this issue on downstream L4T kernels, only on 4.14.

jaski.tx · March 22, 2019, 4:07pm

Hi @mirza.krak, Hi @kswain

Thanks for your messages. This issue is solved in the 4.19 kernel. Please consider using this kernel.

Best regards,
Jaski

mirza.krak · March 22, 2019, 4:15pm

Thanks for the update.

Do you happen to know which specific commit fixes it?

Right now updating is not really an option, and we would like to patch 4.14 to make it work.

kswain · March 25, 2019, 6:38pm

@mirza.krak If this is solved in the 4.19 kernel, then the fix is probably in this list (one or more patches?).

http://git.toradex.com/cgit/linux-toradex.git/log/drivers/tty?h=toradex_4.19.y

There are more patches in 4.20.y as well, if you want to look there:

http://git.toradex.com/cgit/linux-toradex.git/log/drivers/tty?h=toradex_4.20.y

mirza.krak · April 28, 2019, 7:28pm

I had a chance to test on a 4.19 kernel. The issues still remain.

Now that I have tested a bit more, I am also seeing poor IRQ performance on other interfaces, e.g. CAN, where I also see a lot of overruns that I have not experienced on the downstream kernel.

I also ran a test on a 4.1 upstream kernel; this worked just fine, with no overruns.

Any ideas?

jaski.tx · April 29, 2019, 12:49pm

Hi @mirza.krak

As far as we know, this issue should not appear on the mainline kernel, since frequency scaling is not enabled there. If frequency scaling is enabled, however, the issue can occur. Could you check whether you still see the issue when frequency scaling is disabled?

mirza.krak · April 30, 2019, 8:50am

@jaski.tx, thanks for the pointer.

I am running with frequency scaling enabled in the kernel, but during my tests I typically set it to a fixed rate using a “userspace” governor.

I will try to disable it completely in the kernel.

Oh, I also see that I hijacked a thread about these problems on T30; I am running on a Colibri T20.

mirza.krak · April 30, 2019, 11:09am

I have tried removing frequency scaling from the kernel; I still see the same problems.

mirza.krak · May 3, 2019, 7:33am

With some additional troubleshooting I found out that setting

CONFIG_CPU_IDLE=n

resolves the problems I was having. I would still need to investigate why, though, as it seems that the latency of exiting idle states is terrible.
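
To test the idle-latency theory without rebuilding the kernel, one option is the PM QoS interface at /dev/cpu_dma_latency: a process writes a 32-bit latency limit in microseconds and the limit stays in force only while the file descriptor remains open. The sketch below assumes that interface is available on the target and usually needs root; hold_cpu_latency is a hypothetical helper name, and the path is passed in as a parameter so the function can be exercised against any file.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Write a latency limit (microseconds) to a PM QoS file and return the fd.
 * The caller must keep the fd open for as long as the limit should apply.
 * Returns -1 on error. */
int hold_cpu_latency(const char *path, int32_t max_usec)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    if (write(fd, &max_usec, sizeof max_usec) != (ssize_t)sizeof max_usec) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling hold_cpu_latency("/dev/cpu_dma_latency", 0) before starting the serial test and closing the descriptor afterwards should keep the CPU out of deep idle states for the duration. If the overruns disappear, that points at idle exit latency and not at the UART driver itself.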

CornStarch · November 13, 2018, 4:09pm

Thank you for your reply

Why did you stop using the Toradex kernel?
We want to use some features like overlay-fs and persistent VLANs (possible with the new systemd).

Increasing buffer size to such a big value may not help you.
My bad, we use 32768 (taken from the Toradex kernel), not 131072.

Did you check whether the buffer actually fills up?
How would I do that?
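
One possible way to check this from user space is the TIOCGICOUNT ioctl, which returns the driver's error counters. To the best of my knowledge, `overrun` counts receiver overruns reported by the UART itself, while `buf_overrun` counts characters dropped because the tty buffer had no room, so comparing two snapshots hints at which buffer is overflowing. The helper names read_icount and overrun_kind are invented for this sketch, and not every driver necessarily supports the ioctl.

```c
#include <linux/serial.h>   /* struct serial_icounter_struct */
#include <sys/ioctl.h>      /* ioctl(), TIOCGICOUNT */

/* Fetch the driver's counters for an open serial port.
 * Returns 0 on success, -1 if the ioctl fails. */
int read_icount(int fd, struct serial_icounter_struct *ic)
{
    return ioctl(fd, TIOCGICOUNT, ic) == 0 ? 0 : -1;
}

/* Compare two snapshots and report which overrun counters increased:
 * bit 0 set = UART receiver overrun, bit 1 set = tty buffer overrun. */
int overrun_kind(const struct serial_icounter_struct *before,
                 const struct serial_icounter_struct *after)
{
    int kind = 0;

    if (after->overrun > before->overrun)
        kind |= 1;
    if (after->buf_overrun > before->buf_overrun)
        kind |= 2;
    return kind;
}
```

Taking one snapshot before the transfer and one after it, then passing both to overrun_kind(), separates a hardware FIFO problem (interrupts serviced too late) from a full software buffer (the application reading too slowly).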

Please have a look here too.
OK, thank you. Our device is already ttyS0, so maybe we were using PIO all along. This would set us back to square one (not having a clue why the interrupts stop being raised).

Best regards