Torizon OS crash with RX Flooding on UART

Issue

We are developing a system on Torizon OS with a Verdin iMX8M Plus. A “server” backend running in a container uses the verdin-uart2 device to communicate with a microcontroller-based board. Everything worked smoothly until we introduced a larger protocol, so many more bytes are now being exchanged. Since then, we have been seeing kernel crashes with “RX flood detected: soft reset”:

May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Sys addr:  0x00000020 | Version:  0x00000002
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000000
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Argument:  0x01ee83e8 | Trn mode: 0x0000002b
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Present:   0x01f88008 | Host ctl: 0x00000030
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-sdma 30bd0000.dma-controller: restart cyclic channel 5
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Timeout:   0x0000008f | Int stat: 0x00000002
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Int enab:  0x117f100b | Sig enab: 0x117f100b
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Caps:      0x07eb0000 | Caps_1:   0x0000b407
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Cmd:       0x0000193a | Max curr: 0x00ffffff
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00000900
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Host ctl2: 0x00000008
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x50000220
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: cmd debug status:  0x2120
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: data debug status:  0x2200
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: trans debug status:  0x2300
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: dma debug status:  0x2400
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: adma debug status:  0x2510
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: fifo debug status:  0x2680
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: async fifo debug status:  0x2750
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ============================================

What is the recommended way to investigate and deal with this? Is increasing the baud rate the best option, or is the problem more likely in our SerialPort code?

Information

  • UART configuration, usage in code:
    • imx8mp.dtsi:
uart2: serial@30890000 {
	compatible = "fsl,imx8mp-uart", "fsl,imx6q-uart";
	reg = <0x30890000 0x10000>;
	interrupts = <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
	clocks = <&clk IMX8MP_CLK_UART2_ROOT>,
		 <&clk IMX8MP_CLK_UART2_ROOT>;
	clock-names = "ipg", "per";
	dmas = <&sdma1 24 4 0>, <&sdma1 25 4 0>;
	dma-names = "rx", "tx";
	status = "disabled";
};
  • imx8mp-verdin.dtsi:
/* Verdin UART_2 */
&uart2 {
	pinctrl-names = "default";
	pinctrl-0 = <&pinctrl_uart2>;
	uart-has-rtscts;
};
  • imx8mp-custom.dtsi:
/* Verdin UART_2 */
&uart2 {
	status = "okay";
	linux,rs485-enabled-at-boot-time;
	rs485-rts-active-low;
	rs485-rx-during-tx;
};
  • Serial port code:
ErrorCode SerialPort::open(const std::string& device)
{
    if (device.empty())
    {
        spdlog::error("Name cannot be empty");
        return ERR_DRIVERS_SERIAL_INVALID_DEVICE;
    }

    _fd = ::open(device.c_str(), O_RDWR | O_NOCTTY);
    if (_fd < 0)
    {
        spdlog::error("Error opening device {}, errno={}, {}.",
                      device,
                      errno,
                      Poco::Error::Error::getMessage(errno));
        return ERR_DRIVERS_SERIAL_IO_ERROR;
    }

    // Set serial port to raw mode - simpler and more reliable approach
    termios options {};
    if (tcgetattr(_fd, &options) < 0)
    {
        spdlog::error("Failed to get terminal attributes, errno={}", errno);
        ::close(_fd);
        _fd = -1;
        return ERR_DRIVERS_SERIAL_IO_ERROR;
    }

    // Configure for raw mode
    cfmakeraw(&options);

    // Set specific parameters. Most parameters are already set by cfmakeraw
    // (https://linux.die.net/man/3/cfmakeraw)
    cfsetspeed(&options, B921600);  // 921600 baud
    options.c_cflag |= CLOCAL | CREAD;  // Enable receiver, ignore modem control lines
    options.c_cflag &= ~CRTSCTS;  // Disable hardware flow control

    // Non-blocking-style reads: waiting is handled by select() in read(),
    // not by the termios timer.
    // VMIN=0: read() returns immediately with whatever is available
    // VTIME=0: no inter-byte timer
    options.c_cc[VMIN] = 0;
    options.c_cc[VTIME] = 0;

    // Flush existing data and apply settings
    if (tcsetattr(_fd, TCSAFLUSH, &options) < 0)
    {
        spdlog::error("Failed to set terminal attributes, errno={}", errno);
        ::close(_fd);
        _fd = -1;
        return ERR_DRIVERS_SERIAL_IO_ERROR;
    }

    // Clear input and output buffers
    tcflush(_fd, TCIOFLUSH);

    return ERR_SUCCESS;
}

void SerialPort::close()
{
    if (_fd != -1)
    {
        ::close(_fd);
        _fd = -1;
    }
}

ErrorCode SerialPort::write(size_t& bytes_written, const char* data, size_t size)
{
    bytes_written = 0u;

    // Ensure output buffer is clear
    tcdrain(_fd);

    const ssize_t ret = ::write(_fd, data, size);
    if (ret < 0)
    {
        spdlog::error("::write errno={} ({})", errno, Poco::Error::Error::getMessage(errno));
        return ERR_NDRIVERS_SERIAL_WRITE_ERROR;
    }

    // Ensure all data is transmitted before returning
    tcdrain(_fd);

    bytes_written = static_cast<size_t>(ret);
    return ERR_SUCCESS;
}

ErrorCode SerialPort::read(size_t& bytes_read,
                           char* data,
                           std::size_t size,
                           unsigned int timeout_ms)
{
    /// Use 'select' for an efficient way to manage I/O operations
    /// on multiple file descriptors without having to resort to polling,
    /// which can be resource-intensive.
    /// It allows a single thread to monitor and manage I/O operations
    /// on multiple file descriptors in a non-blocking manner.
    /// source: https://man7.org/linux/man-pages/man2/select.2.html
    bytes_read = 0u;
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(_fd, &readfds);

    struct timeval timeout
    {
    };
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    int const result = select(_fd + 1, &readfds, nullptr, nullptr, &timeout);
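    if (result < 0)
    {
        spdlog::error("select() failed, errno={} ({})", errno, Poco::Error::Error::getMessage(errno));
        return ERR_DRIVERS_SERIAL_READ_DESC_ERROR;
    }
    else if (result == 0)
    {
        // select() timed out: no data arrived within timeout_ms
        spdlog::error("Timeout: no data received within {} ms", timeout_ms);
        return ERR_DRIVERS_SERIAL_READ_NO_DESC;
    }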
    else
    {
        ssize_t const ret = ::read(_fd, data, size);
        if (ret < 0)
        {
            spdlog::error("::read errno={} ({}).", errno, Poco::Error::Error::getMessage(errno));
            return ERR_DRIVERS_SERIAL_READ_ERROR;
        }
        bytes_read = static_cast<size_t>(ret);
    }

    return ERR_SUCCESS;
}
  • Crash Log:
May 15 14:37:20 verdin-imx8mp-14772918 aktualizr-torizon[1078]: Current versions in storage and reported by OSTree do not match
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)

// ...

May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:30 verdin-imx8mp-14772918 kernel: watchdog: Watchdog detected hard LOCKUP on cpu 3
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
May 15 14:37:30 verdin-imx8mp-14772918 kernel: Modules linked in: xt_nat
May 15 14:37:30 verdin-imx8mp-14772918 kernel: terminate-all-tasks(e)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  xt_tcpudp
May 15 14:37:30 verdin-imx8mp-14772918 kernel: memory-full-oom-kill(f)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  xt_conntrack
May 15 14:37:30 verdin-imx8mp-14772918 kernel: kill-all-tasks(i)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  xt_MASQUERADE
May 15 14:37:30 verdin-imx8mp-14772918 kernel: thaw-filesystems(j)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  nf_conntrack_netlink
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sak(k)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  xfrm_user
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-backtrace-all-active-cpus(l)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  iptable_nat
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-memory-usage(m)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  xt_addrtype
May 15 14:37:30 verdin-imx8mp-14772918 kernel: nice-all-RT-tasks(n)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  iptable_filter
May 15 14:37:30 verdin-imx8mp-14772918 kernel: poweroff(o)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  ip_tables
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-registers(p)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  x_tables
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-all-timers(q)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  br_netfilter
May 15 14:37:30 verdin-imx8mp-14772918 kernel: unraw(r)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  bridge
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sync(s)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  stp
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-task-states(t)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  llc
May 15 14:37:30 verdin-imx8mp-14772918 kernel: unmount(u)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  usb_conn_gpio
May 15 14:37:30 verdin-imx8mp-14772918 kernel: force-fb(v)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  joydev
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-blocked-tasks(w)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  dwc3
May 15 14:37:30 verdin-imx8mp-14772918 kernel: dump-ftrace-buffer(z)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  aes_ce_cipher
May 15 14:37:30 verdin-imx8mp-14772918 kernel:
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  bnep
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP :
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  roles
May 15 14:37:30 verdin-imx8mp-14772918 kernel: loglevel(0-9)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  mwifiex_sdio
May 15 14:37:30 verdin-imx8mp-14772918 kernel: reboot(b)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  fsl_imx8_ddr_perf
May 15 14:37:30 verdin-imx8mp-14772918 kernel: crash(c)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  mwifiex
May 15 14:37:30 verdin-imx8mp-14772918 kernel: terminate-all-tasks(e)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  ghash_ce
May 15 14:37:30 verdin-imx8mp-14772918 kernel: memory-full-oom-kill(f)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  phy_fsl_imx8m_pcie
May 15 14:37:30 verdin-imx8mp-14772918 kernel: kill-all-tasks(i)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  sha2_ce
May 15 14:37:30 verdin-imx8mp-14772918 kernel: thaw-filesystems(j)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  sha256_arm64
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sak(k)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  sha1_ce
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-backtrace-all-active-cpus(l)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  cfg80211
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-memory-usage(m)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  snd_soc_fsl_sai
May 15 14:37:30 verdin-imx8mp-14772918 kernel: nice-all-RT-tasks(n)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  snd_soc_fsl_utils
May 15 14:37:30 verdin-imx8mp-14772918 kernel: poweroff(o)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  hci_uart
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-registers(p)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  btqca
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-all-timers(q)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  btbcm
May 15 14:37:30 verdin-imx8mp-14772918 kernel: unraw(r)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  phy_fsl_imx8mq_usb
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sync(s)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  ti_ads1015
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-task-states(t)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  imx_pcm_dma
May 15 14:37:30 verdin-imx8mp-14772918 kernel: unmount(u)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  btintel
May 15 14:37:30 verdin-imx8mp-14772918 kernel: force-fb(v)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  industrialio_triggered_buffer
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-blocked-tasks(w)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  kfifo_buf
May 15 14:37:30 verdin-imx8mp-14772918 kernel: dump-ftrace-buffer(z)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  industrialio
May 15 14:37:30 verdin-imx8mp-14772918 kernel:
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  imx_bus
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  dwc3_imx8mp imx_sdma
May 15 14:37:30 verdin-imx8mp-14772918 kernel: reboot(b) crash(c)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  ili210x lm75
May 15 14:37:30 verdin-imx8mp-14772918 kernel: terminate-all-tasks(e) memory-full-oom-kill(f)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  crc_ccitt snd_soc_nau8822
May 15 14:37:30 verdin-imx8mp-14772918 kernel: kill-all-tasks(i) thaw-filesystems(j)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  fsl_jr_uio spi_imx
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sak(k) show-backtrace-all-active-cpus(l)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  nvmem_snvs_lpgpr imx8mm_thermal
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-memory-usage(m) nice-all-RT-tasks(n)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  imx8mp_ldb fsl_imx_ldb
May 15 14:37:30 verdin-imx8mp-14772918 kernel: poweroff(o) show-registers(p)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  phy_fsl_imx8mp_lvds uio_pdrv_genirq
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-all-timers(q) unraw(r)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  uio openvswitch
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sync(s) show-task-states(t)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  nsh nf_conncount
May 15 14:37:30 verdin-imx8mp-14772918 kernel: unmount(u) force-fb(v)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  nf_nat libcomposite
May 15 14:37:30 verdin-imx8mp-14772918 kernel: show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  galcore(O)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  fuse
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  nfnetlink lontium_lt8912b
May 15 14:37:30 verdin-imx8mp-14772918 kernel: reboot(b) crash(c)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  ti_sn65dsi83 sec_mipi_dsim_imx
May 15 14:37:30 verdin-imx8mp-14772918 kernel: terminate-all-tasks(e) memory-full-oom-kill(f)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:  display_connector sec_dsim
May 15 14:37:30 verdin-imx8mp-14772918 kernel: kill-all-tasks(i) thaw-filesystems(j)
May 15 14:37:30 verdin-imx8mp-14772918 kernel:
May 15 14:37:30 verdin-imx8mp-14772918 kernel: sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]: [2025-05-15T14:37:32Z ERROR] could not get sessions, trying later
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]:        0: Could not get session data from server
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]:        1: error sending request for url (https://dgw.torizon.io/ras/sessions): operation timed out
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]:        2: operation timed out
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]:
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]:     Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
May 15 14:37:32 verdin-imx8mp-14772918 rac[1128]:     Run with RUST_BACKTRACE=full to include source snippets.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)

// ...

May 15 14:37:34 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: Timeout waiting for hardware interrupt.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Sys addr:  0x00000020 | Version:  0x00000002
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000000
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Argument:  0x01ee83e8 | Trn mode: 0x0000002b
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Present:   0x01f88008 | Host ctl: 0x00000030
May 15 14:37:34 verdin-imx8mp-14772918 kernel: imx-sdma 30bd0000.dma-controller: restart cyclic channel 5
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
May 15 14:37:34 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Timeout:   0x0000008f | Int stat: 0x00000002
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Int enab:  0x117f100b | Sig enab: 0x117f100b
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Caps:      0x07eb0000 | Caps_1:   0x0000b407
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Cmd:       0x0000193a | Max curr: 0x00ffffff
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00000900
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: Host ctl2: 0x00000008
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x50000220
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: cmd debug status:  0x2120
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: data debug status:  0x2200
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: trans debug status:  0x2300
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: dma debug status:  0x2400
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: adma debug status:  0x2510
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: fifo debug status:  0x2680
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci-esdhc-imx: async fifo debug status:  0x2750
May 15 14:37:35 verdin-imx8mp-14772918 kernel: mmc2: sdhci: ============================================
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
// ...
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:35 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:35 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 15 14:37:35 verdin-imx8mp-14772918 kernel: imx-sdma 30bd0000.dma-controller: restart cyclic channel 5
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:35 verdin-imx8mp-14772918 systemd[1]: tzn-mqtt.service: Main process exited, code=exited, status=101/n/a
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: new USB bus registered, assigned bus number 3
May 15 14:37:35 verdin-imx8mp-14772918 kernel: thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: hcc params 0x0220fe65 hci version 0x110 quirks 0x000000a001000010
May 15 14:37:35 verdin-imx8mp-14772918 kernel: unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: irq 223, io mem 0x38100000
May 15 14:37:35 verdin-imx8mp-14772918 kernel: crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: new USB bus registered, assigned bus number 4
May 15 14:37:35 verdin-imx8mp-14772918 kernel: unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: Host supports USB 3.0 SuperSpeed
May 15 14:37:35 verdin-imx8mp-14772918 kernel:
May 15 14:37:35 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.06
May 15 14:37:35 verdin-imx8mp-14772918 kernel: crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j)
May 15 14:37:35 verdin-imx8mp-14772918 kernel: xhci-hcd xhci-hcd.2.auto: USB bus 3 deregistered
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: The application panicked (crashed).
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Message:  could not run mqtt subscribe loop:
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]:    0: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]:    1: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]:    2: connection closed by peer
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Location:
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]:    /usr/src/debug/libstd-rs/1.75.0/rustc-1.75.0-src/library/core/src/convert/mod.rs:757
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Run with RUST_BACKTRACE=full to include source snippets.
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Location: src/main.rs:146
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
May 15 14:37:35 verdin-imx8mp-14772918 tzn-mqtt[18002]: Run with RUST_BACKTRACE=full to include source snippets.
May 15 14:37:35 verdin-imx8mp-14772918 systemd[1]: tzn-mqtt.service: Failed with result 'exit-code'.
May 15 14:37:36 verdin-imx8mp-14772918 aktualizr-torizon[1078]: Invalid role in root.json
May 15 14:37:36 verdin-imx8mp-14772918 aktualizr-torizon[1078]: Invalid role in root.json

Hi, I’m the Toradex AI Assistant.




Hello! Thank you for providing a detailed description of your issue and sharing relevant code and logs. Based on your setup (Torizon OS, Verdin iMX8M Plus, UART2 with DMA, high UART bandwidth use with a more complex protocol), let’s go through the main issues and recommendations for the “RX flood detected: soft reset” problem.


Understanding the Issue

You are seeing this error:

imx-uart 30890000.serial: RX flood detected: soft reset.

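This message is not a generic “buffer full” warning. In the mainline i.MX UART driver (drivers/tty/serial/imx.c), it comes from a work-around for a hardware issue. If the UART keeps reporting received characters while there has been no activity on the RxD line, the driver treats this as a spurious RX flood and soft-resets the UART. The driver's comments describe reproducing the flood with a baud-rate mismatch, and line noise is another plausible trigger. The message therefore does not by itself prove that your application reads too slowly, so check the electrical side of the link (matching baud rate and framing on both ends, signal quality) alongside the software.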


Key Points from Your Configuration

  • UART2 is using DMA transfer (as per device tree).
  • Baud rate is 921600 (high, but supported).
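  • Hardware flow control (uart-has-rtscts) is enabled in the device tree, but disabled in your code (options.c_cflag &= ~CRTSCTS).
  • Your custom device tree also enables RS485 mode at boot (linux,rs485-enabled-at-boot-time). In this mode, RTS drives the transceiver's direction (driver-enable) signal instead of acting as a flow-control line.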
  • Reads are done with select() and VMIN=0, VTIME=0, so your code reads whatever is available, immediately, which may not be fast enough if your read buffer is small or processing is delayed.
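  • The imx-sdma “restart cyclic channel 5” message means an SDMA cyclic channel (possibly the UART2 RX DMA) found no free buffer because the CPU serviced it too late. Together with the watchdog hard LOCKUP and the mmc2 timeout, this suggests that the whole system stalled, not only the serial reader.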

Recommendations

Part of this may be a classic overrun scenario, in which the incoming data stream outpaces the application's ability to consume it. The following steps address that case and also help narrow down the other causes:

1. Enable Hardware Flow Control (RTS/CTS)

  • The device tree declares, via uart-has-rtscts, that RTS/CTS lines are available for hardware flow control.
  • Your code, however, disables it. Enabling hardware flow control lets the UART hardware regulate the incoming data rate automatically and prevent buffer overflow.
  • In your C++ code, remove or comment out this line:
    options.c_cflag &= ~CRTSCTS;  // Remove or comment this
    
    Instead, set hardware flow control:
    options.c_cflag |= CRTSCTS;  // Enable hardware flow control
    
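  • With CRTSCTS set, the UART signals “not ready” on RTS once its receive FIFO fills past a threshold. A sender that monitors that line as its CTS input then pauses transmission. This matters most at high baud rates and with large bursts, and it only works if the microcontroller actually honors CTS.
  • Caveat for your setup: the custom device tree enables RS485 mode (linux,rs485-enabled-at-boot-time). In that mode, RTS drives the transceiver's driver-enable pin and the mainline imx driver keeps RTS under its own control. An RS485 bus also normally carries no separate CTS line back to the sender. As a result, CRTSCTS will most likely not throttle the sender, and flow control has to come from the protocol instead (see point 5).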
  • See Verdin iMX8M Plus UART documentation.
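The termios setup from point 1 can be written as a standalone helper so the flow-control choice is explicit. This is a minimal sketch, not part of your SerialPort class; the name configureRawTermios and its hw_flow parameter are hypothetical. It assumes a point-to-point link where the microcontroller honors RTS/CTS. On an RS485 link, setting CRTSCTS will most likely not throttle the sender.

```cpp
#include <termios.h>

// Hypothetical helper: raw 8N1 binary I/O at 921600 baud.
// hw_flow = true  -> ask the UART to throttle the sender via RTS/CTS
//                    (only meaningful on a point-to-point link with RTS/CTS wired)
// hw_flow = false -> no hardware flow control (what SerialPort::open() does today)
// Returns false if the baud rate could not be stored in the struct.
bool configureRawTermios(termios& options, bool hw_flow)
{
    cfmakeraw(&options);  // no echo, no line processing, 8 data bits, no parity

    if (cfsetispeed(&options, B921600) != 0 || cfsetospeed(&options, B921600) != 0)
        return false;

    options.c_cflag |= CLOCAL | CREAD;  // ignore modem control lines, enable receiver
    options.c_cflag &= ~CSTOPB;         // 1 stop bit

    if (hw_flow)
        options.c_cflag |= CRTSCTS;     // enable RTS/CTS hardware flow control
    else
        options.c_cflag &= ~CRTSCTS;    // disable it

    // Waiting is left to select()/poll(); read() returns whatever is available.
    options.c_cc[VMIN] = 0;
    options.c_cc[VTIME] = 0;
    return true;
}
```

In SerialPort::open(), a call to this helper would replace the cfmakeraw()/cfsetspeed()/c_cflag lines between tcgetattr() and tcsetattr().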

2. Check Your Read Buffer Size and Processing Rate

  • When you use VMIN=0 and VTIME=0, reads return immediately with what’s available, risking many small reads and possible processing delays.
  • Consider using larger read buffers: read as much as possible per syscall.
  • Ensure your application reads promptly and at an appropriate rate; consider dedicating a thread to serial reception with minimal delays.
  • If using select(), make sure your application’s main loop is not otherwise blocked or busy-waiting for long periods.
  • See the general documentation on Linux serial port programming.
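Reading as much as possible per wake-up could look like the sketch below. `drain_available` is a hypothetical helper; it keeps calling `read()` in large chunks until the descriptor reports no more data, which covers both a tty configured with VMIN=0/VTIME=0 (read() returns 0) and a descriptor in O_NONBLOCK mode (read() fails with EAGAIN).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <unistd.h>
#include <vector>

// Hypothetical helper: append everything currently available on `fd` to `out`,
// using large reads so one wake-up empties the kernel buffer in few syscalls.
// Returns the number of bytes appended, or -1 on a real error.
ssize_t drain_available(int fd, std::vector<uint8_t>& out)
{
    constexpr size_t kChunk = 4096;  // large enough for a typical burst
    ssize_t total = 0;
    for (;;)
    {
        const size_t old_size = out.size();
        out.resize(old_size + kChunk);
        const ssize_t n = ::read(fd, out.data() + old_size, kChunk);
        if (n > 0)
        {
            out.resize(old_size + static_cast<size_t>(n));
            total += n;
            if (static_cast<size_t>(n) < kChunk)
                break;  // short read: nothing more buffered right now
            continue;
        }
        out.resize(old_size);  // drop the unused tail
        if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK)
            break;  // buffer empty (or end-of-file)
        if (errno == EINTR)
            continue;  // interrupted by a signal, retry
        return -1;
    }
    return total;
}
```

A reader would call it once select() reports the port readable, then parse every complete frame out of the accumulated vector.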

3. Adjust Flow, Protocol, or Baud Rate if Needed

  • 921600 baud is supported, but bandwidth is always finite. If RX flooding persists even with flow control, you might need to either:
    • Further optimize your protocol to reduce bursts,
    • Increase application read buffer/processing performance,
    • Or, as a last resort, lower the baud rate.
  • Monitor for lost or corrupted data as further symptoms.
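One way to check for lost data, assuming the imx-uart driver maintains the standard serial core counters, is the Linux `TIOCGICOUNT` ioctl. The sketch below (`print_rx_counters` is a hypothetical name) reads `struct serial_icounter_struct`; `overrun` counts UART hardware FIFO overruns, while `buf_overrun` counts cases where received data could not be pushed into the tty buffers.

```cpp
#include <cassert>
#include <cstdio>
#include <linux/serial.h>  // struct serial_icounter_struct
#include <sys/ioctl.h>     // ioctl(), TIOCGICOUNT

// Hypothetical helper: print the receive-side counters of a serial port.
// Returns false if the ioctl fails (e.g. `fd` is not a serial port).
bool print_rx_counters(int fd)
{
    serial_icounter_struct ic{};
    if (ioctl(fd, TIOCGICOUNT, &ic) != 0)
        return false;
    std::printf("rx=%d overrun=%d buf_overrun=%d frame=%d parity=%d brk=%d\n",
                ic.rx, ic.overrun, ic.buf_overrun, ic.frame, ic.parity, ic.brk);
    return true;
}
```

Sampling the counters before and after a stress test should show whether bytes are lost in the hardware FIFO or further up in the tty layer.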

4. Monitor System Resource Utilization

  • Ensure your CPU isn’t overloaded (which could delay your code’s ability to read from the UART quickly enough).
  • Use tools like top or htop to ensure your process is consistently able to service the port.
  • Since the kernel is reporting watchdog and hard lockup, ensure you’re not causing long non-yielding sections in your code or starving systemd/services.

5. Optional: Protocol-level Flow Management

  • Even with hardware flow control, it’s wise to design your protocol so that large packets can tolerate interruptions or retries, should the hardware throttle transmission via RTS.
  • This is particularly helpful for robust error recovery.

6. Test after Each Change

  • Start with enabling hardware flow control and stress-test at your highest data rate.
  • If the problem persists, profile read timing and buffer usage in your application.



Summary

  • Enable hardware flow control (remove &= ~CRTSCTS, use |= CRTSCTS).
  • Read as much data as possible in each cycle, avoid delays.
  • Make sure your system and application are fast enough to keep up with the line speed.
  • As a last resort, consider lowering the baud rate or optimizing your protocol.

If you apply these changes, especially hardware flow control, you should see the “RX flood detected” issue resolved under most circumstances.

Best regards,
Toradex AI Assistant.

Dear @mnano,

The suggestions made by @ToradexAI seem valid. Note also that its suggestion to lower the baud rate is a valid one. Have you tried any of these suggestions yet?

Best regards,
Collin




Thank you for your feedback and for supporting the recommendations provided.

As mentioned, lowering the baud rate to something more standard, such as 115200, can often improve the stability of UART communication, especially if hardware flow control is not enabled or the system/application is occasionally busy and cannot read the UART buffer as fast as data arrives. You might first enable hardware flow control as described, and if issues persist, test with a lower baud rate to evaluate the impact on reliability.
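To compare rates without rebuilding, the baud rate could be taken from the command line. A minimal sketch, assuming only the standard Linux termios constants; `to_speed` is a hypothetical helper, and its result would replace the hard-coded baud constant passed to `cfsetspeed()`.

```cpp
#include <cassert>
#include <optional>
#include <termios.h>

// Hypothetical helper: map a numeric baud rate (e.g. parsed from argv)
// to the termios speed_t constant used by cfsetspeed().
std::optional<speed_t> to_speed(unsigned long baud)
{
    switch (baud)
    {
        case 115200: return B115200;
        case 230400: return B230400;
        case 460800: return B460800;
        case 921600: return B921600;
        default:     return std::nullopt;  // rate not handled by this sketch
    }
}
```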

For more information on configuring the serial port and typical baud rate recommendations, refer to the Verdin iMX8M Plus UART documentation.

Feel free to report your findings or if you need more detailed guidance on applying these settings!

Best regards,
Toradex AI Assistant.

I did some additional tests, using some of the suggestions:

  • CPU usage, memory usage, and temperature are never very high, so they do not seem to be the problem.
  • If I enable hardware flow control in my code (options.c_cflag |= CRTSCTS), no messages seem to be sent or received anymore. I have yet to check with a logic analyzer, but the other board uses an STM32 microcontroller, and no messages are received whether hardware flow control is activated there or not. Is there something else I need to do for hardware flow control to work? EDIT: After talking to the HW team, it turns out our custom board does not implement hardware flow control. Should I change the device tree accordingly?
  • I refactored the app so that there is a dedicated receive thread. It did not change anything.
  • A lower baud rate solves the issue (e.g. 230400 works without the issue), but we will need a higher baud rate later in the project.

Is there a way to increase the DMA buffer size?

I investigated things a bit more. Here is my understanding of what is happening:

Data path:

STM32 board → UART HW RX FIFO → DMA → kernel driver buffer → TTY layer (n_tty) read_buf (4096 bytes) → SerialPort::read()

UART HW RX FIFO

  • Small buffer directly in UART peripheral
  • 32-entry FIFO (half-word entries, i.e. 64 bytes), based on the Verdin iMX8M Plus datasheet
  • At 921600 baud (~92160 bytes/sec), a 64-byte FIFO fills up in approximately 0.69 milliseconds (64 bytes / 92160 bytes/sec). The STM32 board receives a command, acknowledges it (~18 bytes), and starts streaming data at 15 bytes every 5 ms.
  • If the DMA or the interrupt handler doesn’t service this FIFO before it overflows, data is lost immediately and the “RX flood detected” kernel message is triggered. This should be the primary point of failure in RX flood scenarios without hardware flow control, right?

DMA

  • The DMA controller is programmed to move data from the RX FIFO to a larger buffer in RAM managed by the driver (imx-uart).
  • The size of this DMA buffer or driver-level ring buffer influences resilience to short processing delays, but it is still finite. If the application and the TTY layer above it are too slow to consume data from this driver buffer, it too can fill up, leading to backpressure that ultimately still results in the hardware FIFO overflowing if the sender doesn’t stop.

TTY layer and read_buf

  • The n_tty layer copies data from the serial driver’s buffers into its own read_buf as it arrives, and read() then copies it from there to the application.
  • A 4096-byte buffer at the TTY layer means that even if the application is momentarily slow to issue a read() call, up to 4096 bytes can accumulate in this TTY buffer after they have been successfully read from the hardware FIFO and DMA buffers by the lower-level driver. Profiling shows that my application does not seem to be slow to issue read() calls.
  • At 921600 baud, this 4096-byte buffer provides a time window of roughly 44.4 milliseconds (4096 bytes / 92160 bytes/sec).
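The arithmetic above can be checked with a few constexpr helpers. This sketch assumes 8N1 framing (10 bits on the wire per byte). It also computes a 32-character case: the 64-byte figure counts two bytes per half-word entry, whereas the i.MX UART receive data register packs one character plus status bits into each half-word, so the usable RX depth may be closer to 32 characters. The last two lines put the stream in proportion: 15 bytes every 5 ms is about 3% of the line rate, so sustained throughput is far below what the line can carry.

```cpp
#include <cstddef>

// Back-of-the-envelope UART timing, assuming 8N1 framing:
// 1 start + 8 data + 1 stop = 10 bits on the wire per byte.
constexpr double bytes_per_second(double baud, double bits_per_frame = 10.0)
{
    return baud / bits_per_frame;
}

// Milliseconds until `capacity` bytes are full when data arrives back-to-back
// at `baud` and nothing drains the buffer.
constexpr double fill_time_ms(std::size_t capacity, double baud)
{
    return 1000.0 * static_cast<double>(capacity) / bytes_per_second(baud);
}

constexpr double line_rate   = bytes_per_second(921600);   // 92160 bytes/s
constexpr double fifo_64_ms  = fill_time_ms(64, 921600);   // ~0.69 ms (64-byte assumption)
constexpr double fifo_32_ms  = fill_time_ms(32, 921600);   // ~0.35 ms (32 characters)
constexpr double read_buf_ms = fill_time_ms(4096, 921600); // ~44.4 ms (n_tty read_buf)

// Average load of the sample stream: 15 bytes every 5 ms.
constexpr double stream_rate  = 15.0 * 1000.0 / 5.0;      // 3000 bytes/s
constexpr double stream_share = stream_rate / line_rate;  // ~3.3 % of the line rate
```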

Can someone confirm this?

Does this also mean that our STM32 board cannot send messages longer than the HW FIFO capacity?

So it seems the Linux kernel needs to schedule the interrupt/DMA handler more frequently if we have no hardware flow control. Based on that, I have been testing my project with the PREEMPT_RT patch (torizon-rt distro) for the past 30 minutes (edit: half a day now), and I no longer see the issue. Does that make sense?

Dear @mnano,

Thank you very much for your detailed descriptions and findings.
There is actually no hardware flow control for RS485, since the designated pins are used for other purposes. I had a look into the error you are seeing and found this commit, which states:

/*
 * This is to be called from Rx ISRs only when some bytes were actually
 * received.
 *
 * A way to reproduce the flood (checked on iMX6SX) is: open iMX UART at 9600
 * 8N1, and from external source send 0xf0 char at 115200 8N1. In about 90% of
 * cases this starts a flood of "receiving" of 0xff characters by the iMX6 UART
 * that is terminated by any activity on RxD line, or could be stopped by
 * issuing soft reset to the UART (just stop/start of RX does not help). Note
 * that what we do here is sending isolated start bit about 2.4 times shorter
 * than it is to be on UART configured baud rate.
 */
static void imx_uart_check_flood(struct imx_port *sport, u32 usr2)
{
	/* To detect hardware 0xff flood we monitor RxD line between RX
	 * interrupts to isolate "receiving" of char(s) with no activity
	 * on RxD line, that'd never happen on actual data transfers.
	 *
	 * We use USR2_WAKE bit to check for activity on RxD line, but we have a
	 * race here if we clear USR2_WAKE when receiving of a char is in
	 * progress, so we might get RX interrupt later with USR2_WAKE bit
	 * cleared. Note though that as we don't try to clear USR2_WAKE when we
	 * detected no activity, this race may hide actual activity only once.
	 *
	 * Yet another case where receive interrupt may occur without RxD
	 * activity is expiration of aging timer, so we consider this as well.
	 *
	 * We use 'idle_counter' to ensure that we got at least so many RX
	 * interrupts without any detected activity on RxD line. 2 cases
	 * described plus 1 to be on the safe side gives us a margin of 3,
	 * below. In practice I was not able to produce a false positive to
	 * induce soft reset at regular data transfers even using 1 as the
	 * margin, so 3 is actually very strong.
	 *
	 * We count interrupts, not chars in 'idle-counter' for simplicity.
	 */

	if (usr2 & USR2_WAKE) {
		imx_uart_writel(sport, USR2_WAKE, USR2);
		sport->idle_counter = 0;
	} else if (++sport->idle_counter > 3) {
		dev_warn(sport->port.dev, "RX flood detected: soft reset.");
		imx_uart_soft_reset(sport); /* also clears 'sport->idle_counter' */
	}
}

This raises the question of whether you are sending 0xff often. If that is the case, reverting this patch might solve it.
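To make the quoted logic concrete, here is a simplified user-space model of the idle counter in `imx_uart_check_flood()`. It is not driver code, and `FloodDetectorModel` is a hypothetical name. It shows that the soft reset is triggered by more than three consecutive RX interrupts with no detected RxD activity (the USR2_WAKE bit), not by how full a buffer is.

```cpp
#include <cassert>

// Simplified model of the driver's idle counter. Each call represents one
// RX interrupt; `rxd_activity_seen` stands in for the USR2_WAKE bit.
struct FloodDetectorModel
{
    int idle_counter = 0;
    int soft_resets = 0;

    void on_rx_interrupt(bool rxd_activity_seen)
    {
        if (rxd_activity_seen)
        {
            idle_counter = 0;  // real activity observed on RxD
        }
        else if (++idle_counter > 3)  // more than 3 interrupts without RxD activity
        {
            ++soft_resets;     // driver logs "RX flood detected: soft reset."
            idle_counter = 0;  // the soft reset also clears the counter
        }
    }
};
```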
On another note: are you using multi-threading for your UART application?
Would it be possible for you to provide a small code example showcasing the issue you are experiencing?

Best regards,
Collin

I did manage to reproduce it relatively reliably with a minimal example. It has a writer thread that sends a command to the external STM32 and a reader thread that should receive samples every 5 ms (sent by the external STM32 after it receives the command).

Here is the C++ code:

#include <atomic>
#include <cerrno>
#include <chrono>
#include <csignal>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <termios.h>
#include <unistd.h>

const char* SERIAL_PORT_DEVICE = "/dev/verdin-uart2";
const speed_t BAUD_RATE = B921600;

// START_PEAK_SAMPLING command (16 bytes)
const std::vector<uint8_t> START_PEAK_SAMPLING_CMD = {
    0x0F, 0x0A, 0x02, 0x01, 0x23, 0x12, 0x05, 0x01, 0x01, 0x17, 0x27, 0x3A, 0x18, 0x8A, 0xE1, 0x07};
const size_t COMMAND_SIZE = START_PEAK_SAMPLING_CMD.size();

const size_t SAMPLE_MESSAGE_SIZE = 15;
const std::chrono::milliseconds SAMPLE_INTERVAL(5);

std::atomic<bool> keep_running(true);
std::mutex console_mutex;

/// @brief Safe print from multiple threads
void print_safe(const std::string& msg_prefix, const std::string& msg)
{
    std::lock_guard<std::mutex> lock(console_mutex);
    std::cout << "[" << msg_prefix << "] " << msg << std::endl;
}

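/// @brief Signal handler for graceful shutdown (CTRL+C)
/// Only async-signal-safe operations are allowed here, so it uses write(2)
/// instead of print_safe(), which locks a mutex and uses iostreams.
void signal_handler(int /*signum*/)
{
    static const char msg[] = "[SIGNAL] Interrupt signal received. Shutting down...\n";
    const ssize_t ignored = ::write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    keep_running = false;
}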

/// @brief Configure serial port in raw mode
int configure_serial_port(int fd)
{
    struct termios tty;
    if (tcgetattr(fd, &tty) != 0)
    {
        print_safe("CONFIG", "Error from tcgetattr: " + std::string(strerror(errno)));
        return -1;
    }

    cfmakeraw(&tty);
    cfsetspeed(&tty, BAUD_RATE);

    // Control flags: 8-bit chars and no parity (already set by cfmakeraw),
    // 1 stop bit, receiver enabled, modem lines ignored, no HW flow control
    tty.c_cflag &= ~CSTOPB;
    tty.c_cflag |= CREAD | CLOCAL;
    tty.c_cflag &= ~CRTSCTS;

    // Read behavior: VMIN = 0, VTIME = 0
    // read() returns immediately with available bytes (or 0 if none).
    // select() will be used for polling/timeout.
    tty.c_cc[VMIN] = 0;
    tty.c_cc[VTIME] = 0;

    if (tcsetattr(fd, TCSANOW, &tty) != 0)
    {
        print_safe("CONFIG", "Error from tcsetattr: " + std::string(strerror(errno)));
        return -1;
    }
    // Flush both input and output buffers
    tcflush(fd, TCIOFLUSH);
    print_safe("CONFIG", "Serial port configured successfully.");
    return 0;
}

// Reading thread function
void reading_thread_func(int fd)
{
    print_safe("READER",
               "Thread started. Waiting for samples (" + std::to_string(SAMPLE_MESSAGE_SIZE)
                   + " bytes every " + std::to_string(SAMPLE_INTERVAL.count()) + "ms)...");
    size_t sample_count = 0;
    std::vector<uint8_t> sample_accumulation_buffer(SAMPLE_MESSAGE_SIZE * 2);
    size_t bytes_in_accumulation_buffer = 0;
    auto last_sample_time = std::chrono::steady_clock::now();

    while (keep_running)
    {
        auto loop_start_time = std::chrono::steady_clock::now();

        fd_set read_fds_sample;
        FD_ZERO(&read_fds_sample);
        FD_SET(fd, &read_fds_sample);

        struct timeval tv_sample_select;
        tv_sample_select.tv_sec = 0;
        // Wait for up to SAMPLE_INTERVAL, but be responsive.
        // Using SAMPLE_INTERVAL means we check for data at most once per interval if none arrives.
        tv_sample_select.tv_usec =
            std::chrono::duration_cast<std::chrono::microseconds>(SAMPLE_INTERVAL).count();

        int activity = select(fd + 1, &read_fds_sample, nullptr, nullptr, &tv_sample_select);

        if (activity < 0)
        {
            if (errno == EINTR && !keep_running)
            {
                // Interrupted by our signal
                break;
            }
            if (errno == EINTR)
            {
                print_safe("READER", "select interrupted, retrying.");
                continue;
            }
            print_safe("READER",
                       "Select error during sample reading: " + std::string(strerror(errno)));
            break;
        }

        if (activity > 0 && FD_ISSET(fd, &read_fds_sample))
        {
            ssize_t bytes_read_now =
                ::read(fd,
                       sample_accumulation_buffer.data() + bytes_in_accumulation_buffer,
                       sample_accumulation_buffer.size() - bytes_in_accumulation_buffer);

            if (bytes_read_now < 0)
            {
                print_safe("READER",
                           "Read error during sample reading: " + std::string(strerror(errno)));
                break;
            }
            if (bytes_read_now == 0)
            {
                print_safe("READER", "Port closed or EOF during sample reading.");
                break;
            }
            bytes_in_accumulation_buffer += static_cast<size_t>(bytes_read_now);

            while (bytes_in_accumulation_buffer >= SAMPLE_MESSAGE_SIZE)
            {
                sample_count++;

                // Print every sample
                std::string hex_output = "Sample #" + std::to_string(sample_count) + ": ";
                char hex_byte_str[4];
                for (size_t i = 0; i < SAMPLE_MESSAGE_SIZE; ++i)
                {
                    snprintf(
                        hex_byte_str, sizeof(hex_byte_str), "%02X ", sample_accumulation_buffer[i]);
                    hex_output += hex_byte_str;
                }
                print_safe("READER", hex_output);

                // To actually "use" the sample data
                std::memmove(sample_accumulation_buffer.data(),
                             sample_accumulation_buffer.data() + SAMPLE_MESSAGE_SIZE,
                             bytes_in_accumulation_buffer - SAMPLE_MESSAGE_SIZE);
                bytes_in_accumulation_buffer -= SAMPLE_MESSAGE_SIZE;
                last_sample_time = std::chrono::steady_clock::now();
            }
        }
        else
        {
            auto now = std::chrono::steady_clock::now();
            if (now - last_sample_time > std::chrono::seconds(2) && sample_count > 0)
            {
                print_safe(
                    "READER",
                    "No samples received for 2 seconds. Device might have stopped streaming.");
                keep_running = false;
            }
        }

        // Ensure the loop attempts to process at the SAMPLE_INTERVAL rate
        auto elapsed_this_loop = std::chrono::steady_clock::now() - loop_start_time;
        if (elapsed_this_loop < SAMPLE_INTERVAL && keep_running)
        {
            std::this_thread::sleep_for(SAMPLE_INTERVAL - elapsed_this_loop);
        }
    }
    print_safe("READER",
               "Thread finished. Total samples processed: " + std::to_string(sample_count));
}

/// @brief Writing thread function
void writing_thread_func(int fd)
{
    print_safe("WRITER", "Thread started.");

    // Give reader thread a moment to start up
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    print_safe(
        "WRITER",
        "Sending START_PEAK_SAMPLING command (" + std::to_string(COMMAND_SIZE) + " bytes)...");
    ssize_t bytes_written_cmd =
        ::write(fd, START_PEAK_SAMPLING_CMD.data(), START_PEAK_SAMPLING_CMD.size());

    if (bytes_written_cmd < 0)
    {
        print_safe("WRITER", "Write error for command: " + std::string(strerror(errno)));
        keep_running = false;
        return;
    }
    if (static_cast<size_t>(bytes_written_cmd) < START_PEAK_SAMPLING_CMD.size())
    {
        print_safe("WRITER",
                   "Partial write for command: " + std::to_string(bytes_written_cmd) + "/"
                       + std::to_string(START_PEAK_SAMPLING_CMD.size()) + " bytes.");
    }
    else
    {
        print_safe("WRITER", "Command sent successfully.");
    }

    if (tcdrain(fd) == -1)
    {
        print_safe("WRITER", "tcdrain failed after command send: " + std::string(strerror(errno)));
    }
    else
    {
        print_safe("WRITER", "Command flushed (tcdrain successful).");
    }

    print_safe("WRITER", "Thread finished.");
}

int main(int argc, char* argv[])
{
    std::string device_to_use = SERIAL_PORT_DEVICE;
    if (argc > 1)
    {
        device_to_use = argv[1];
        print_safe("MAIN", "Using device from command line: " + device_to_use);
    }
    else
    {
        print_safe("MAIN", "Using default device: " + std::string(SERIAL_PORT_DEVICE));
    }

    // Register signal handler for SIGINT (Ctrl+C)
    signal(SIGINT, signal_handler);

    print_safe("MAIN", "Opening serial port: " + device_to_use);
    int fd = ::open(device_to_use.c_str(), O_RDWR | O_NOCTTY);
    if (fd < 0)
    {
        print_safe(
            "MAIN",
            "Error opening serial port '" + device_to_use + "': " + std::string(strerror(errno)));
        return 1;
    }
    print_safe("MAIN", "Serial port opened. fd = " + std::to_string(fd));

    if (configure_serial_port(fd) < 0)
    {
        ::close(fd);
        return 1;
    }

    std::thread writer_thread(writing_thread_func, fd);
    std::thread reader_thread(reading_thread_func, fd);

    // The writer thread will finish relatively quickly.
    writer_thread.join();
    print_safe("MAIN", "Writer thread joined.");

    // Reader thread continues until keep_running is false
    // You can add a timeout to main if desired, e.g.:
    // std::this_thread::sleep_for(std::chrono::seconds(60));
    // keep_running = false;

    reader_thread.join();
    print_safe("MAIN", "Reader thread joined.");

    print_safe("MAIN", "Closing serial port.");
    if (::close(fd) < 0)
    {
        print_safe("MAIN", "Error closing serial port: " + std::string(strerror(errno)));
    }
    print_safe("MAIN", "Application finished.");
    return 0;
}

Here is a CMakeLists.txt for it:

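cmake_minimum_required(VERSION 3.16)
project(rx_flood_mre LANGUAGES CXX)

# std::thread needs the platform threading library on some toolchains
find_package(Threads REQUIRED)

add_executable(${PROJECT_NAME}
    main.cpp
)

target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_20)
target_link_libraries(${PROJECT_NAME} PRIVATE Threads::Threads)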

Here is a dockerfile to create a container from it:

# Base image
FROM --platform=linux/arm64 torizon/debian-imx8:4

# Install following runtime dependencies:
#  - libc6: Standard libraries.
RUN apt-get -y update && \
    apt-get install -y --no-install-recommends \
    libc6 && \
    apt-get clean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Install the application on the image
COPY rx_flood_mre /app/rx_flood_mre

ENTRYPOINT ["/app/rx_flood_mre"]

And this is the result on the target:

torizon@verdin-imx8mp-14772918:~$ docker run --rm -it --device=/dev/verdin-uart2 --group-add dialout gitlab.company.ch:5050/software/projects/embedded/server/rx-flood-mre:arm64
[MAIN] Using default device: /dev/verdin-uart2
[MAIN] Opening serial port: /dev/verdin-uart2
[MAIN] Serial port opened. fd = 3
[CONFIG] Serial port configured successfully.
[WRITER] Thread started.
[READER] Thread started. Waiting for samples (15 bytes every 5ms)...
[WRITER] Sending START_PEAK_SAMPLING command (16 bytes)...
[WRITER] Command sent successfully.
[WRITER] Command flushed (tcdrain successful).
[WRITER] Thread finished.
[MAIN] Writer thread joined.
<--- We get the "RX flood detected" message here. If I disconnect the sensor at that point, we **might** not fully crash and get the rest of the log.
[READER] Sample #1: 0C 0A 02 05 32 12 02 B0 01 18 CC DA 03 0C 0A
[READER] Sample #2: 02 05 32 12 02 B0 01 18 CC DA 03 0C 0A 02 05
[READER] Sample #3: 32 12 02 B0 01 18 CC DA 03 0C 0A 02 05 32 12
[READER] Sample #4: 02 B0 01 18 CC DA 03 0C 0A 02 05 32 12 02 B0
[READER] Sample #5: 01 18 CC DA 03 0C 0A 02 05 32 12 02 B0 01 18
[READER] Sample #6: CC DA 03 0C 0A 02 05 32 12 02 B0 01 18 CC DA
[READER] Sample #7: 03 0C 0A 02 05 32 12 02 B0 01 18 CC DA 03 0C
[READER] Sample #8: 0A 02 05 32 12 02 B0 01 18 CC DA 03 0C 0A 02
[READER] Sample #9: 05 32 12 02 B0 01 18 CC DA 03 0C 0A 02 05 32
[READER] Sample #10: 12 02 B0 01 18 CC DA 03 0C 0A 02 05 32 12 02
[READER] Sample #11: B0 02 18 FA 90 02 0C 0A 02 05 32 12 02 B0 02
[READER] Sample #12: 18 FA 90 02 0C 0A 02 05 32 12 02 B0 01 18 CC
[READER] Sample #13: DA 03 0C 0A 02 05 32 12 02 B0 01 18 CC DA 03
[READER] Sample #14: 0C 0A 02 05 32 12 02 B0 01 18 CC DA 03 0C 0A
[READER] Sample #15: 02 05 32 12 02 B0 01 18 CC DA 03 0C 0A 02 05
[READER] No samples received for 2 seconds. Device might have stopped streaming.
[READER] Thread finished. Total samples processed: 15
[MAIN] Reader thread joined.
[MAIN] Closing serial port.
[MAIN] Application finished.

Log from journalctl:

May 26 09:04:48 verdin-imx8mp-14772918 kernel: i2c i2c-0: <i2c_imx_write> write timedout
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: cpu cpu0: _set_opp_voltage: failed to set voltage (850000 850000 850000 mV): -110
May 26 09:04:48 verdin-imx8mp-14772918 kernel: unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: cpu cpu0: Failed to set regulator voltages: -110
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: cpufreq: __target_index: Failed to change cpu frequency: -110
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)
May 26 09:04:48 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 26 09:04:48 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 26 09:04:48 verdin-imx8mp-14772918 kernel: imx-uart 30890000.serial: RX flood detected: soft reset.
May 26 09:04:48 verdin-imx8mp-14772918 kernel: imx-sdma 30bd0000.dma-controller: restart cyclic channel 5
May 26 09:04:48 verdin-imx8mp-14772918 systemd[1]: serial-getty@ttymxc2.service: Deactivated successfully.

It should be easy to mock our external STM32: just send 15 random bytes every 5 ms on the UART after receiving the command.
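A mock along those lines might look like the following sketch. It assumes blocking file descriptors already configured for raw 921600 8N1 (for example a USB-UART adapter on a PC set up with the same termios calls as the example above); `mock_stm32` and its parameters are hypothetical. For a real serial port, pass the same descriptor as `rx_fd` and `tx_fd`; separate descriptors just make it easy to test with pipes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>
#include <random>
#include <thread>
#include <unistd.h>
#include <vector>

// Hypothetical STM32 mock: wait until `command` has been received on `rx_fd`,
// then write `num_samples` frames of 15 random bytes to `tx_fd`, one every 5 ms.
// Assumes blocking descriptors. Returns the number of frames sent, or -1 on error.
int mock_stm32(int rx_fd, int tx_fd, const std::vector<uint8_t>& command, int num_samples)
{
    // Sliding window over the most recent command.size() received bytes,
    // so any noise received before the command is skipped.
    std::deque<uint8_t> window;
    while (!std::equal(window.begin(), window.end(), command.begin(), command.end()))
    {
        uint8_t byte = 0;
        if (::read(rx_fd, &byte, 1) != 1)  // blocking, one byte at a time
            return -1;
        window.push_back(byte);
        if (window.size() > command.size())
            window.pop_front();
    }

    std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> dist(0, 255);  // uint8_t is not allowed as IntType
    constexpr auto kInterval = std::chrono::milliseconds(5);
    auto next = std::chrono::steady_clock::now();

    for (int i = 0; i < num_samples; ++i)
    {
        uint8_t frame[15];
        for (auto& b : frame)
            b = static_cast<uint8_t>(dist(rng));
        if (::write(tx_fd, frame, sizeof frame) != static_cast<ssize_t>(sizeof frame))
            return -1;
        next += kInterval;  // absolute deadlines avoid cumulative drift
        std::this_thread::sleep_until(next);
    }
    return num_samples;
}
```

On a PC, one could open and configure the adapter and call `mock_stm32(fd, fd, cmd, 10000)`, with `cmd` holding the START_PEAK_SAMPLING bytes from the example.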

I also observe that with the RT patch, those “RX flood detected” messages do not appear and the example reliably receives samples.

Hi @mnano,

Could you please share the full journalctl log, especially the dmesg output? It’s possible that the RX flood is only a follow-up issue, and the root cause occurred earlier. For example, this line looks suspicious:

May 26 09:04:48 verdin-imx8mp-14772918 kernel: i2c i2c-0: <i2c_imx_write> write timedout

Additionally, could you try removing the sleep call from your code?

        // Ensure the loop attempts to process at the SAMPLE_INTERVAL rate
        auto elapsed_this_loop = std::chrono::steady_clock::now() - loop_start_time;
        if (elapsed_this_loop < SAMPLE_INTERVAL && keep_running)
        {
            std::this_thread::sleep_for(SAMPLE_INTERVAL - elapsed_this_loop);
        }

The select call should wait anyway for new data to come in. If, for example, the scheduler doesn’t resume your task promptly after the sleep duration, it might lead to issues.
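Without the trailing sleep, the reader can simply block until data arrives. The sketch below uses `poll()` in place of `select()`, which is an equivalent choice here; `wait_and_read` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: block until `fd` is readable or `timeout_ms` expires,
// then read whatever is available in one call. There is no fixed sleep, so the
// thread runs again as soon as the kernel has data for it.
// Returns bytes read, 0 on timeout (or end-of-file), -1 on error with errno set
// (EINTR if a signal arrived, so the caller can check its stop flag).
ssize_t wait_and_read(int fd, uint8_t* buf, size_t len, int timeout_ms)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    const int rc = ::poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;  // 0: timeout, -1: poll() error

    // Readable, hung up or in error: read() reports data, EOF (0) or the error.
    return ::read(fd, buf, len);
}
```

In `reading_thread_func`, a call like `wait_and_read(fd, buf + used, capacity - used, 5)` would replace the select()/read() pair, and the `sleep_for` block would be dropped.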

When using the RT-Kernel, are you running your application with a higher priority, or have you made any other priority adjustments?
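For reference, raising the reader thread's priority could be done as in this sketch; `make_current_thread_realtime` is a hypothetical helper meant to be called at the top of the reader thread. It needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO (in Docker, for example via `--cap-add=SYS_NICE`); otherwise it returns EPERM and the thread keeps its normal policy. On PREEMPT_RT, interrupt handlers run as kernel threads with SCHED_FIFO priority 50 by default, so a reader priority below that keeps the UART IRQ thread ahead of the application.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sched.h>

// Hypothetical helper: switch the calling thread to SCHED_FIFO at `priority`
// (1 = lowest, 99 = highest on Linux). Returns 0 on success or an errno value.
int make_current_thread_realtime(int priority)
{
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```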

Also, have you tried reverting the patch mentioned by @collin.tx? This would help confirm whether it might have introduced an issue.
https://git.toradex.com/cgit/linux-toradex.git/commit/drivers/tty/serial/imx.c?h=toradex_6.6-2.2.x-imx&id=496a4471b7c3ae5c0be1a3fccd69e7debc127e08

Regards,
Stefan