Colibri UART-A and UART-C: high read() latency

Hello,

We’re facing issues using the UART on the Colibri-IMX8X.
We use a custom carrierboard which connects all 3 UARTs to different RS-485 transeivers.
(In production, a baudrate of 38400 is used, I tested with 115200, but there isn’t a big difference when changing it, regarding this problem).
We don’t use RTS/CTS functionality on the UART and we don’t configure it as RS-485, this part is handled by some logic on the board directly.

When reading from the UART-A and UART-C (ttyLP3 and ttyLP2), it takes very long (~130ms) for the read() function to return data. On UART-B (ttyLP0) this is not an issue, latency is much better here (around 10ms).
On the RS-485, I have a test device that sends out the following data with a delay of ~10ms in alternating order and 1 byte counting up each time:

uint8_t Msg1[] = {0x08, 0x00, 0x00, 0x64, 0x02, 0x10, 0x11, **counter**};
uint8_t Msg2[] = {0x1B, 0x30, 0x00, 0x00, 0x65, 0x05, 0x31, **counter**, 0x32, 0x33, 0x34, 0x35, 0x36};

Generally, on the positve side, there’s no data lost on all uarts, but the read delay of 130ms is too long for our application.

There’s no other load on the system that would cause an influence that high.
I wrote some test code (see below), so others here might be able to reproduce this.

Result for UART-A+C:
Console:

main:272 (6): recvd 126 bytes
main:274 (6): read() took t1->t2=131856us
main:272 (6): recvd 126 bytes
main:274 (6): read() took t1->t2=131798us
main:272 (6): recvd 127 bytes
main:274 (6): read() took t1->t2=142043us
main:272 (6): recvd 133 bytes
main:274 (6): read() took t1->t2=132169us
main:272 (6): recvd 126 bytes
main:274 (6): read() took t1->t2=131828us
main:272 (6): recvd 126 bytes
main:274 (6): read() took t1->t2=131837us
main:272 (6): recvd 127 bytes
main:274 (6): read() took t1->t2=142492us
main:272 (6): recvd 128 bytes
main:274 (6): read() took t1->t2=132014us
main:272 (6): recvd 128 bytes
main:274 (6): read() took t1->t2=132020us
main:272 (6): recvd 134 bytes
main:274 (6): read() took t1->t2=132265us

I also made some tests with an oscilloscope to verify that my measurements are correct, just using a Pin as a trigger signal (Yellow: Trigger pin, low after read, Blue: RS485 data).


As you can see, the delay is indeed very high and corresponds with the measured delay in software.

Result for UART-B:
Console:

main:274 (6): read() took t1->t2=10952us
main:272 (6): recvd 5 bytes
main:274 (6): read() took t1->t2=757us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10181us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10965us
main:272 (6): recvd 5 bytes
main:274 (6): read() took t1->t2=758us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10187us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10953us
main:272 (6): recvd 5 bytes
main:274 (6): read() took t1->t2=752us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10181us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10956us
main:272 (6): recvd 5 bytes
main:274 (6): read() took t1->t2=756us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10193us
main:272 (6): recvd 8 bytes
main:274 (6): read() took t1->t2=10958us

Here the trace with the oscilloscope (Yellow: Trigger pin, low after read, Blue: RS485 data)

And some closer lookup, you can see the data is read quite quickly as soon as it is received (~700-800us after receiving the first bit):


The 10ms are because of the send delay of the test-device on the bus (see above). So this port works without issues.

Here’s my test code, compile it using the sdk (or on the target, if build tools are installed).
serial-test-minimal.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <inttypes.h>
#include <time.h>
#include <unistd.h>
#include <poll.h>
#include <fcntl.h>
#include <termios.h>
#include <linux/serial.h>
#include <sys/ioctl.h>


//#define USE_GPIOLIB_FOR_TRIGGER

#ifdef USE_GPIOLIB_FOR_TRIGGER
#include "noxgpio.h"
#include "nox_global.h"
#include "nox_internal.h"

#define GPIO_TRIGGER(f_High) noxGPIO_SetDO_NXB3TxDisable(f_High)

#else

#define LOG_TEXT(prio,txt,...) printf("%s:%d (%d): " txt "\n",__func__,__LINE__,prio,##__VA_ARGS__)

enum ReturnCodes
{
    /* General error return values */
    ERR_NONE                                 =0,  /* No error, success */
    ERR_NOT_INITIALISED                      =-1, /* Unit has not been initialised */
    ERR_INVALID_PARAM                        =-2, /* Invalid paremeter passed to function */
    ERR_BUF_TOO_SMALL                        =-3, /* Buffer is too small to hold the result */
    ERR_NONMEM                               =-4, /* Failed to allocate memory */
    ERR_IO                                   =-5, /* Some Error happened performing I/O operations */
    ERR_PERMISSION                           =-6, /* Couldn't access ressource/perform action because permissions */
    ERR_NOT_AVAILABLE                        =-7, /* Ressource is not available (e.g. GPIO pins on some board revisions) */

    ERR_IPC_PROTOCOL_USAGE                   =-10, /* IPC command failed due to invalid format (see nox_ipc_global.h) */
    ERR_IPC_COMMAND_EXECUTION                =-11, /* IPC command failed to execute (see nox_ipc_global.h)  */

    ERR_INTERNAL                             =-100,/* Internal error, see syslog */
};

enum
{
    LOG_EMERG=0,
    LOG_ALERT,
    LOG_CRIT,
    LOG_ERR,
    LOG_WARNING,
    LOG_NOTICE,
    LOG_INFO,
    LOG_DEBUG
};

#define GPIO_TRIGGER(f_High)
#endif



typedef enum
{
    SBAUD_9600=9600,
    SBAUD_19200=19200,
    SBAUD_38400=38400,
    SBAUD_115200=115200,
}serial_baudrate;



typedef struct
{
    serial_baudrate baudrate;

}serialport_settings_t;

int32_t serialport_open(const char *path,
                        serialport_settings_t *settings,
                        void **fd_ptr)
{
    int tmp;
    int fd;
    struct termios options;

    switch(settings->baudrate)
    {
        case SBAUD_9600:
            tmp=B9600;
            break;
        case SBAUD_19200:
            tmp=B19200;
            break;
        case SBAUD_38400:
            tmp=B38400;
            break;
        case SBAUD_115200:
            tmp=B115200;
            break;
        default:
            LOG_TEXT(LOG_ERR,"Invalid/Unimplemented baudrate=%d",settings->baudrate);
            return(ERR_INTERNAL);
    }

    if((fd=open(path, O_RDWR | O_NDELAY | O_NOCTTY)) < 0)
    {
        LOG_TEXT(LOG_ERR,"open() failed (%d): %s",errno,strerror(errno));
        return(ERR_IO);
    }

    if(tcgetattr(fd, &options) == -1)
    {
        LOG_TEXT(LOG_ERR,"tcgetattr() failed (%d): %s",errno,strerror(errno));
        close(fd);
        return(ERR_IO);
    }

    options.c_cflag=tmp | CS8 | CLOCAL | CREAD;
    options.c_iflag=IGNPAR | IGNBRK;
    options.c_oflag=0;
    options.c_lflag=0;

    options.c_cc[VTIME]=0;
    options.c_cc[VMIN]=1;

    if(tcsetattr(fd, TCSANOW, &options) == -1)
    {
        LOG_TEXT(LOG_ERR,"tcsetattr() failed (%d): %s",errno,strerror(errno));
        close(fd);
        return(ERR_IO);
    }

    /* Apply the settings */
	if(tcflush(fd, TCIFLUSH) == -1)
    {
        LOG_TEXT(LOG_ERR,"tcflush() failed (%d): %s",errno,strerror(errno));
        close(fd);
        return(ERR_IO);
    }

    /* Set blocking mode */
    if(fcntl(fd, F_SETFL, 0) == -1)
    {
        LOG_TEXT(LOG_ERR,"fcntl() failed (%d): %s",errno,strerror(errno));
        close(fd);
        return(ERR_IO);
    }

    struct serial_struct serial={0};

    if(ioctl(fd, TIOCGSERIAL, &serial) == -1)
    {
        LOG_TEXT(LOG_ERR,"ioctl() failed (%d): %s",errno,strerror(errno));
        close(fd);
        return(ERR_IO);
    }

    LOG_TEXT(LOG_DEBUG,"type=%d Recvbuf=%d flags=%X",serial.type,serial.xmit_fifo_size,serial.flags);

    *((intptr_t**)fd_ptr)=(intptr_t*)fd;

    return(ERR_NONE);
}

void serialport_close(void *fd)
{
    close((intptr_t)fd);
}

int32_t serialport_read(void *fd,
                        uint8_t *buf,
                        uint32_t bufLen)
{
    ssize_t tmp=bufLen;

    if((!fd) || (!buf) || (!bufLen))
    {
        return(ERR_INVALID_PARAM);
    }

    if(bufLen > INT32_MAX)
    {
        LOG_TEXT(LOG_ERR,"bufLen too big, is=%u, max=%d",bufLen,INT32_MAX);
        return(ERR_INVALID_PARAM);
    }

    tmp=read((int)fd,(void*)buf,tmp);

    if(tmp == -1)
    {
        LOG_TEXT(LOG_ERR,"read() failed (%d): %s",errno,strerror(errno));
        return(ERR_IO);
    }

    return(tmp);
}

uint64_t difftimeUs(const struct timespec *start, const struct timespec *end)
{
    // Calculate the difference in seconds and nanoseconds
    long seconds = end->tv_sec - start->tv_sec;
    long nanoseconds = end->tv_nsec - start->tv_nsec;

    // Convert the total difference into milliseconds
    uint64_t microseconds = (uint64_t)(seconds * 1000000) + (nanoseconds / 1000);

    // Return the absolute value of the difference
    return(microseconds);
}


int main()
{
    int32_t ret;
    void *pHandle;
    serialport_settings_t serial_settings;
    const char *serial_path="/dev/ttyLP0"; /* Change port here and recompile for testing */
    uint8_t rdBuf[512];
    struct timespec t1, t2;
    uint32_t totalLen;

    serial_settings.baudrate=SBAUD_115200;

#ifdef USE_GPIOLIB_FOR_TRIGGER
    uint32_t options=OPT_LOG_LEVEL_STDOUT_INFO;


    if((ret=initInternal(options,NULL,"serialport-test")) != ERR_NONE)
    {
        fprintf(stderr,"initInternal() failed (%d)\n",ret);
        return(EXIT_FAILURE);
    }

    if((ret=noxGPIO_InitLib(options,"serialport-test")) != ERR_NONE)
    {
        fprintf(stderr,"noxGPIO_InitLib() failed (%d)\n",ret);
        return(EXIT_FAILURE);
    }
#endif
    GPIO_TRIGGER(0);

    if((ret=serialport_open(serial_path,&serial_settings,&pHandle)) != ERR_NONE)
    {
        LOG_TEXT(LOG_ERR,"serialport_open() failed: %d",ret);
        return(EXIT_FAILURE);
    }

    clock_getres(CLOCK_MONOTONIC_RAW,&t1);

    LOG_TEXT(LOG_INFO,"Clock resolution: %ld ns",t1.tv_nsec);

    totalLen=0;
    while(1)
    {
        clock_gettime(CLOCK_MONOTONIC_RAW,&t1);
        GPIO_TRIGGER(1);

        if((ret=serialport_read(pHandle,
                                &rdBuf[totalLen],
                                sizeof(rdBuf) - totalLen - 1)) < 0)
        {
            LOG_TEXT(LOG_ERR,"serialport_read() failed: %d",ret);
            return(EXIT_FAILURE);
        }

        clock_gettime(CLOCK_MONOTONIC_RAW,&t2);
        GPIO_TRIGGER(0);

        totalLen=0;

        LOG_TEXT(LOG_INFO,"recvd %d bytes",ret);

        LOG_TEXT(LOG_INFO,
                 "read() took t1->t2=%" PRIu64 "us",
                 difftimeUs(&t1,&t2));

    }

    serialport_close(pHandle);

    return(EXIT_SUCCESS);
}


Is this a known issue? How can we fix this?

System info:
Colibri IMX8X with BSP 7.0

If you need more information, please let me know.

Thanks and regards
Simon

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!

Did you like my answer?

:+1: :-1:


Based on the context provided, the high read latency on UART-A (ttyLP3) and UART-C (ttyLP2) compared to UART-B (ttyLP0) on the Colibri-IMX8X could be due to several factors. Here are some potential areas to investigate and steps to consider:

  1. UART Configuration: Double-check the configuration settings for UART-A and UART-C. Ensure that the termios settings are correctly applied and consistent with those of UART-B. Pay special attention to the c_cflag, c_iflag, c_oflag, and c_lflag settings.

  2. Driver and Kernel: There might be differences in how the kernel or the UART driver handles these specific UART interfaces. Check if there are any known issues or patches related to UART handling in the BSP version you are using (BSP 7.0).

  3. Hardware Differences: Since you are using a custom carrier board, verify if there are any hardware differences or additional components (e.g., RS-485 transceivers) that might affect the performance of UART-A and UART-C.

  4. System Load and Interrupt Handling: Although you mentioned there is no other load on the system, ensure that the interrupt handling for these UARTs is not being delayed or blocked by other processes. You might want to check the system’s interrupt configuration and priorities.

  5. Testing with Different Drivers: If possible, test with different UART drivers or configurations to see if the issue persists. This might help identify if the problem is driver-related.

  6. Consult Toradex Support: Since this issue is critical for your application, consider reaching out to Toradex support for further assistance. They might have more insights or specific patches for the Colibri-IMX8X platform.

If these steps do not resolve the issue, further investigation into the specific hardware and software environment might be necessary.

Please, also check if these links can help you solve your problem:

IMX7d RS485 configuration.

Hi, have you tried using 0 for options.c_cc[VMIN] and use poll() to wait for data and manage timeout manually?

For example:

int read_tty(void *dst, size_t *psize, int timeout)
{
	struct pollfd fds;
	ssize_t nread;
	size_t size = *psize, totalread = 0;
	int ret = 0;

	fds.fd = ctx->fd;
	fds.events = POLLIN;
	while (size) {
		/* block until data starts to arrive */
		ret = poll(&fds, 1, timeout);
		if (ret < 0)
			return -errno;

		if (ret == 0) {
			/* timeout or end of reception */
			break;
		}
		ret = 0;
		nread = read(ctx->fd, dst, size);
		if (nread > 0) {
			size -= nread;
			dst += nread;
			totalread += nread;
			continue;
		}
		if (nread < 0 && errno != EAGAIN)
			return -errno;
	}
	*psize = totalread;
	return 0;
}

Hello @kaam1

Thanks for your hints!

I’ve tested your changes, unfortunately, they don’t solve the problem (poll() takes about the same time as the blocking read did before).

But I think I found the solution: In the devicetree, for the 2 affected ports there’s a DMA option defined, for the working port there isnt. for comparison, this is lpuart3 (Colibri UART-A) with high delay:
https://git.toradex.com/cgit/linux-toradex.git/tree/arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi?h=toradex_6.6-2.1.x-imx#n171

lpuart3: serial@5a090000 {
		reg = <0x5a090000 0x1000>;
		interrupts = <GIC_SPI 348 IRQ_TYPE_LEVEL_HIGH>;
		clocks = <&uart3_lpcg 1>, <&uart3_lpcg 0>;
		clock-names = "ipg", "baud";
		assigned-clocks = <&clk IMX_SC_R_UART_3 IMX_SC_PM_CLK_PER>;
		assigned-clock-rates = <80000000>;
		power-domains = <&pd IMX_SC_R_UART_3>;
		power-domain-names = "uart";
		dma-names = "tx","rx";
		dmas = <&edma2 15 0 0>,
			<&edma2 14 0 1>;
		status = "disabled";
	};

This is for the working lpuart0 (Colibri UART-B)

lpuart0: serial@5a060000 {
		reg = <0x5a060000 0x1000>;
		interrupts = <GIC_SPI 345 IRQ_TYPE_LEVEL_HIGH>;
		clocks = <&uart0_lpcg 1>, <&uart0_lpcg 0>;
		clock-names = "ipg", "baud";
		assigned-clocks = <&clk IMX_SC_R_UART_0 IMX_SC_PM_CLK_PER>;
		assigned-clock-rates = <80000000>;
		power-domains = <&pd IMX_SC_R_UART_0>;
		status = "disabled";
	};

When I remove the dma properties, then the port seem to behave correctly.

&lpuart3 {
    /* Remove dma, this causes very high delays on read() */
    /delete-property/ dma-names;
    /delete-property/ dmas;
    pinctrl-names = "default";
    status= "okay";
};

I will verify this later this week with all ports and check back here again, but pretty sure that this is the root cause.

Regards,
Simon

Hello @SimonG,

From our investigation, we arrived at the same conclusion as well. The ports using dma (lpuart3 and lpuart2) seems to rely on a timeout. Removing dma will definitely improve the latency. Let us know how your tests go. Also, thanks for sharing the solution here.

Hello @rudhi.tx

I can confirm that my tests were successful, the behaviour on all 3 ports is now as expected (same as on Colibri UART-B above). Therefore, I see this issue as resolved, I hope there’s no negative side effect without the dma.

Regards,
Simon

1 Like