[iMX6] CPU overhead for serial port read

Introduction

The SoM used is an iMX6ULL with an OS built using Yocto, following the steps described in the Toradex manuals.

The DTSI file does not configure the UARTs to use DMA.

SoM: Colibri iMX6 ULL 256MB v1.1 A
Kernel version: 5.4.193-5.7.0-devel+git.90bfeac00dbe

Problem statement

We are facing an issue with UART (UART2) communication, mostly on reception: the CPU load is higher than expected.

Experiment 1: Read from physical UART

The UART configuration is:

  • Baud Rate: 1,000,000 bps
  • Bits: 8
  • Parity: none
  • Stop bits: 1
  • Flow control: none
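For reference, the settings listed above could be applied to a port roughly as follows. This is only a sketch, not the application used in the test: the function name configure_8n1_raw is made up for illustration, and termios.B1000000 is assumed to be available (it is on Linux builds of Python).

```python
import termios


def configure_8n1_raw(fd, speed=termios.B1000000):
    """Put an open tty into raw 8N1 mode without flow control (illustrative helper)."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)

    # Raw input: no break handling, no CR/NL translation, no XON/XOFF.
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK | termios.ISTRIP
               | termios.INLCR | termios.IGNCR | termios.ICRNL
               | termios.IXON | termios.IXOFF | termios.IXANY)
    # Raw output: no post-processing.
    oflag &= ~termios.OPOST
    # No echo, no line editing, no signal characters.
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON
               | termios.ISIG | termios.IEXTEN)
    # 8 data bits, no parity, 1 stop bit, no RTS/CTS.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB | termios.CRTSCTS)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL

    # Block until at least one byte is available, no inter-byte timer.
    cc[termios.VMIN] = 1
    cc[termios.VTIME] = 0

    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, speed, speed, cc])
```

On the target this would be called with a file descriptor obtained from something like os.open("/dev/ttymxc1", os.O_RDWR | os.O_NOCTTY).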

The test application was a simple terminal application that receives data from the serial port (ttymxc1). It only reads the data into memory and does nothing further with it (no storage or processing).

The CPU load for a data rate of 1000 kbit/s was around 35%.
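As a sanity check on the numbers, the following sketch works out what the link in Experiment 1 can carry. With 8 data bits, no parity and 1 stop bit, each byte takes 10 bit times including the start bit. The bytes_per_irq parameter is purely hypothetical: the thread does not say how many bytes the driver collects per interrupt when DMA is off.

```python
def max_payload_bytes_per_s(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Upper bound on payload throughput of an asynchronous serial link."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return baud / bits_per_frame


def interrupts_per_s(bytes_per_s, bytes_per_irq):
    """Interrupt rate if every interrupt drains bytes_per_irq bytes (assumed value)."""
    return bytes_per_s / bytes_per_irq


rate = max_payload_bytes_per_s(1_000_000)
print(rate)          # bytes per second at 1,000,000 baud, 8N1
print(rate / 1024)   # the same figure in KiB/s
print(interrupts_per_s(rate, bytes_per_irq=1))  # worst case: one interrupt per byte
```

At 1,000,000 baud this gives 100,000 bytes/s, or roughly 97.7 KiB/s, so the per-byte or per-FIFO interrupt rate without DMA can be substantial even though the payload rate looks modest.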

The data read was done in two ways:

1. IO Channel

  • Here, the read function from iochannel.h was used.

2. Thread

  • Here, the UNIX read function was used.

In both cases, the CPU usage was similar.

Experiment 2: Read from virtual serial device

The same application was used with the same device, this time over USB (ttyUSB1) through a UART-USB converter.

The CPU usage was around 2% for the same load as ‘Experiment 1’.

DMA status

The following files in /sys/class/dma/ contain zero, which I think indicates that DMA is disabled for UART2:

  • dma2chan27
  • dma2chan28

According to the iMX6ULL reference manual, these channels are used for UART2 RX and TX respectively.
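To make the check above repeatable, a small helper like the one below can dump whatever attribute files a channel directory exposes. It is a sketch under assumptions: read_dma_channel is a made-up name, and the attribute names one typically sees there (such as in_use or bytes_transferred) are not listed in this thread, so the helper simply reads every regular file it finds.

```python
from pathlib import Path


def read_dma_channel(channel, base="/sys/class/dma"):
    """Return {attribute name: contents} for one DMA channel directory."""
    chan_dir = Path(base) / channel
    values = {}
    for entry in sorted(chan_dir.iterdir()):
        if not entry.is_file():
            continue  # skip sub-directories and links to directories
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            values[entry.name] = None  # attribute exists but is not readable
    return values
```

For example, read_dma_channel("dma2chan27") and read_dma_channel("dma2chan28") would show the attributes for the two channels named above.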


What could be the reason for this increased CPU usage when using the on-board UART?
I would try enabling DMA, if it weren’t for this line in the Toradex FAQ:

Sometimes the DMA peripheral may affect the performance of the UART interface. Try disabling the DMA for RX by editing the device tree according to:

Yes, for some reason none of the imx6ul/ull UARTs have DMA channels defined in imx6ul.dtsi. This has been the case at least since kernel 4.14. The upcoming 5.15 kernel already has them defined. I tried adding them from here, and it seems to work, though I didn’t check CPU usage at a high bitrate. Please diff imx6ul.dtsi. Since uart8 is at a different address on imx6ull than on imx6ul, it is defined in imx6ull.dtsi, so please add the uart8 DMA settings in imx6ul.dtsi. The UART DMA channels are correct in the 5.15 DT.

I can’t answer your question but I can say that on Toradex hardware I use this library

and I thread off the read/write logic. One thread per I/O direction.

I haven’t gotten back to working on that library with the creator. I wanted to add some features from the old DOS GreenLeaf CommLib. That used ring buffers and it had logic for packet definition. You could define it by start char and length, start/stop characters plural, etc.

The “reading logic” was in its own thread just grabbing bytes from the UART, stuffing them in the ring buffer, then performing a quick scan of the buffer (if it was long enough) to see if a packet was ready. Then it waited for another interrupt from the UART.

When a packet was ready it signaled out to the real application.
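The buffering and packet scan described above might look something like the sketch below, for the "start char and length" case only. This is not the GreenLeaf CommLib API or the library mentioned earlier; PacketScanner and its parameters are invented for illustration, and a bounded bytearray stands in for a true ring buffer.

```python
class PacketScanner:
    """Collect raw bytes and cut out fixed-length packets that begin with a start byte."""

    def __init__(self, start_byte, packet_len, capacity=4096):
        self.start = bytes([start_byte])
        self.packet_len = packet_len
        self.capacity = capacity
        self.buf = bytearray()

    def feed(self, data):
        """Append newly received bytes and return every complete packet found."""
        self.buf += data
        # Bounded like a ring buffer: drop the oldest bytes on overflow.
        if len(self.buf) > self.capacity:
            del self.buf[:len(self.buf) - self.capacity]

        packets = []
        while True:
            idx = self.buf.find(self.start)
            if idx < 0:
                self.buf.clear()      # no start byte anywhere: discard noise
                break
            del self.buf[:idx]        # drop bytes before the start byte
            if len(self.buf) < self.packet_len:
                break                 # packet not complete yet
            packets.append(bytes(self.buf[:self.packet_len]))
            del self.buf[:self.packet_len]
        return packets
```

The reader thread would call feed() with each chunk it gets from the port and hand any returned packets to the application, for instance through a queue.Queue.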

Too many serial I/O applications perform a blocking read in a while loop rather than operating as an interrupt driven TSR type application. On many platforms this causes an amazing amount of CPU consumption. I never figured out why, I just quit doing that and then I didn’t have to figure out why.
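As a contrast to a tight blocking-read loop, here is a minimal sketch of a readiness-driven reader in user space: the process sleeps in the selector until the kernel reports data, then drains it in large chunks. The names pump and on_data are hypothetical, and nothing in this thread establishes that this pattern lowers CPU usage on the iMX6ULL.

```python
import os
import selectors


def pump(fd, on_data, chunk_size=4096, timeout=1.0):
    """Read from fd whenever it becomes readable; return the number of bytes handled.

    Stops on end-of-file or when no data arrives within `timeout` seconds.
    """
    os.set_blocking(fd, False)
    total = 0
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        while True:
            if not sel.select(timeout):
                break                 # nothing arrived within the timeout
            try:
                chunk = os.read(fd, chunk_size)
            except BlockingIOError:
                continue              # spurious wake-up, wait again
            if not chunk:
                break                 # end of file / peer closed
            total += len(chunk)
            on_data(chunk)
    return total
```

A long-running reader would pass timeout=None so that the selector waits indefinitely.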

Roland

Enabling DMA

I tried enabling DMA by patching the DTSI as was done in 5.15. It seems to be working.
The DMA files in /sys/class/dma still read 0. Can you confirm this on your device?

Observations

The CPU usage has dropped considerably, almost to the level seen with USB. Thank you!

However, I have observed spikes of up to 50% almost every 5 seconds. Do you know of anything that could cause this?
“Spikes” are observed with the USB method (experiment 2) as well, but only to about 5%. These “spikes” have the same pattern as that with UART+DMA.
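One way to narrow down where such periodic spikes come from is to sample the first line of /proc/stat at a fixed interval and look at how the time splits between user, system, irq and softirq. The sketch below only does the arithmetic between two samples; the function names are made up, and the field order is the standard one for the aggregate "cpu" line.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    names = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")
    fields = line.split()
    return dict(zip(names, map(int, fields[1:9])))


def cpu_breakdown(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total == 0:
        return {name: 0.0 for name in delta}
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}
```

Reading the first line of /proc/stat once per second and logging cpu_breakdown() for consecutive samples would show whether the spike is dominated by irq/softirq time or by user/system time.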

At the moment, there is just one port opened. Most of our traffic is in receive, very little is transmitted except for the initial handshake.
I did a quick perusal of the library and it looks similar to what we have implemented.
I will keep this in mind. Thanks!

Hi @nandanv ,

Is the issue solved?

Hi @sahil.tx ,

No, I am waiting for a response to this query.

Hi @nandanv,
How much data are you sending at once?
Is it possible to share the “simple terminal application” that you tested with?

Hello @nandanv,
I use an iMX6ULL for serial communication too. SDMA is enabled and I see the same spikes in CPU load.
The same application on a SAMA5 has a higher average CPU load but no spikes.
Are you still investigating this?
I used “htop”, “ftrace” and “latencytop” without learning anything new. The IRQs appear to take only a small amount of time. I’m now tracing the workqueue latency.

@sahil.tx I am unable to track down the application.
Apologies for the late reply.

@mchalain
I did not investigate further.
The DMA for UART is disabled.

Hi @nandanv ,
Can you update me on the status of the issue? Do you still want us to investigate it?
Please test with the latest BSP and see if the issue still exists.
If you find the issue again, could you create a new query with all the necessary details?