Apalis T30 UART crash

I have configured my device tree and kernel to use the "non-DMA" version of the UART driver with the mainline 4.14.y kernel. This is done by removing compatible = "nvidia,tegra30-hsuart"; from the device tree and disabling CONFIG_SERIAL_TEGRA. Three of the four ports work correctly. /dev/ttyS1 does not work at all, even though it registers during kernel startup:

[    4.502148] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    4.502362] iommu: Adding device serial8250 to group 70
[    4.505351] console [ttyS0] disabled
[    4.505439] 70006000.serial: ttyS0 at MMIO 0x70006000 (irq = 75, base_baud = 25500000) is a Tegra
[    5.933174] console [ttyS0] enabled
[    5.937696] 70006040.serial: ttyS1 at MMIO 0x70006040 (irq = 76, base_baud = 25500000) is a Tegra
[    5.947652] 70006200.serial: ttyS2 at MMIO 0x70006200 (irq = 77, base_baud = 25500000) is a Tegra
[    5.957574] 70006300.serial: ttyS3 at MMIO 0x70006300 (irq = 78, base_baud = 25500000) is a Tegra

Any attempt to access /dev/ttyS1 crashes the system, for example "stty -F /dev/ttyS1". This tty is the one on MXM134 and MXM136. Is there something different about these pins? I am generally using the apalis-t30-evaluation device tree. Could this be configured incorrectly for one of the other alt functions? Or could there be some special-use setup for that port, such as debugging?

I disabled DMA on the 16550 driver and got a bit further. I can at least do:
cat /proc/tty/driver/serial

root@apalis-t30-mainline:/proc/tty/driver# cat serial
serinfo:1.0 driver revision:
0: uart:Tegra mmio:0x70006000 irq:75 tx:1690 rx:115 RTS|CTS|DTR|DSR|CD|RI
1: uart:unknown port:00000000 irq:0
2: uart:Tegra mmio:0x70006200 irq:77 tx:0 rx:0 RTS|CTS|DTR|RI
3: uart:Tegra mmio:0x70006300 irq:78 tx:0 rx:0 CTS|RI
root@apalis-t30-mainline:/proc/tty/driver#

Edit: I also no longer see the device in the kernel boot log (dmesg | grep tty).

I noticed that the UARTB clock and reset are handled differently from those of the other 3 UARTs. Perhaps the clock and/or reset needs a patch? UARTB is assigned ID 160 because it shares its enable bit with VFIR. Where would I look to see this being handled properly?

tegra30-car.h
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * This header provides constants for binding nvidia,tegra30-car.
 *
 * The first 130 clocks are numbered to match the bits in the CAR's CLK_OUT_ENB
 * registers. These IDs often match those in the CAR's RST_DEVICES registers,
 * but not in all cases. Some bits in CLK_OUT_ENB affect multiple clocks. In
 * this case, those clocks are assigned IDs above 160 in order to highlight
 * this issue. Implementations that interpret these clock IDs as bit values
 * within the CLK_OUT_ENB or RST_DEVICES registers should be careful to
 * explicitly handle these special cases.
 *
 * The balance of the clocks controlled by the CAR are assigned IDs of 160 and
 * above.
 */

#ifndef _DT_BINDINGS_CLOCK_TEGRA30_CAR_H
#define _DT_BINDINGS_CLOCK_TEGRA30_CAR_H

#define TEGRA30_CLK_CPU 0
/* 1 */
/* 2 */
/* 3 */
#define TEGRA30_CLK_RTC 4
#define TEGRA30_CLK_TIMER 5
#define TEGRA30_CLK_UARTA 6
/* 7 (register bit affects uartb and vfir) */
#define TEGRA30_CLK_GPIO 8
#define TEGRA30_CLK_SDMMC2 9
.
.
.
#define TEGRA30_CLK_UARTB 160
#define TEGRA30_CLK_VFIR 161
#define TEGRA30_CLK_SPDIF_IN 162
#define TEGRA30_CLK_SPDIF_OUT 163
#define TEGRA30_CLK_VI 164
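
Going by the header comment above, the low clock IDs are register bit numbers, while IDs of 160 and above are not. A rough sketch of that mapping follows. The function name clk_id_to_enb_bit is made up for illustration, and the 32-bit register width and the L, H, U, V, W bank order are assumptions taken from the Tegra register naming, not from this header.

```shell
#!/bin/sh
# Map a tegra30-car clock ID to the CLK_OUT_ENB bank and bit it names.
# Assumption: 32-bit registers ordered L, H, U, V, W. IDs of 160 and above
# are not register bits, so they are reported as such.
clk_id_to_enb_bit() {
    id=$1
    if [ "$id" -ge 160 ]; then
        echo "id $id: not a CLK_OUT_ENB bit"
        return 0
    fi
    case $(( id / 32 )) in
        0) bank=L ;;
        1) bank=H ;;
        2) bank=U ;;
        3) bank=V ;;
        4) bank=W ;;
    esac
    echo "id $id: CLK_OUT_ENB_$bank bit $(( id % 32 ))"
}

clk_id_to_enb_bit 6     # TEGRA30_CLK_UARTA
clk_id_to_enb_bit 160   # TEGRA30_CLK_UARTB
```

So UARTA can be located directly from its ID, while for UARTB the ID alone says nothing about the shared bit 7 and the clock driver has to handle it as a special case.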

We reproduced this on TK1 as well. I suspect it has something to do with that UART being shared with the SIR, FIR and VFIR blocks. Let us investigate some more on Monday when the whole team is back from ELCE.

Ok, thanks Marcel. I also believe it’s due to the shared bits. I’ll wait till next week for this one.

From page 972 of Tegra reference manual:
PINMUX_AUX_UART2_RXD_0:
PM:
0 = IRDA
1 = SPDIF
2 = UARTA
3 = SPI4
The reset default is 0b0x111000 (IRDA). Might we just have to turn off IRDA?

I don’t think it has anything to do with pin muxing as the regular muxing does work just fine with the high speed UART driver.

Yes, now that I’ve looked through the docs, IRDA mode is correct. Furthermore, the VFIR_CTL_0 mode select resets to UARTB by default, not IRDA.

I’ll probably look at the clock next. I can’t see the reset being the problem, as it shares the same reset as the other UARTs.

When I look at the tegra clk driver, most of the devices with shared clocks are registered with a connection ID (con_id) rather than a device ID (dev_id). It’s too late for me to try today, but I’ll try something like this tomorrow.

diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c
index 07f5203..00616db 100644
--- a/drivers/clk/tegra/clk-tegra30.c
+++ b/drivers/clk/tegra/clk-tegra30.c
@@ -678,7 +678,7 @@ static struct tegra_devclk devclks[] __initdata = {
        { .con_id = "div-clk", .dev_id = "tegra-i2c.3", .dt_id = TEGRA30_CLK_I2C4 },
        { .con_id = "div-clk", .dev_id = "tegra-i2c.4", .dt_id = TEGRA30_CLK_I2C5 },
        { .dev_id = "tegra_uart.0", .dt_id = TEGRA30_CLK_UARTA },
-       { .dev_id = "tegra_uart.1", .dt_id = TEGRA30_CLK_UARTB },
+       { .con_id = "tegra_uart.1", .dt_id = TEGRA30_CLK_UARTB },
        { .dev_id = "tegra_uart.2", .dt_id = TEGRA30_CLK_UARTC },
        { .dev_id = "tegra_uart.3", .dt_id = TEGRA30_CLK_UARTD },
        { .dev_id = "tegra_uart.4", .dt_id = TEGRA30_CLK_UARTE },

Nothing I have tried has solved this. Even “cat /proc/tty/driver/serial” crashes it. Any ideas? I’ve been working on this for a week, so I’m really stumped.

Maybe VFIR resets to a default state in which the whole IRDA module is globally disabled, which might include the UART hardware?
VFIR_CTL_0: bit31 GLOBAL_ENABLE: 0 = Entire IRDA module is disabled, 1 = ENABLE

From documentation (/Documentation/devicetree/bindings/pinctrl/nvidia,tegra30-pinmux.txt), there is no device vfir, so I am not sure how to do this in device tree. What I want to do is set bit 31 high in VFIR_CTL_0 at address 0x70006100.

I don’t know how to set the VFIR global enable in VFIR_CTL_0 from the device tree, so I installed devmem into my image. I noticed that I can read UARTA, UARTC and UARTD, but not UARTB at 0x70006040. I also couldn’t read 0x70006100 (VFIR). I was hoping to set the bit with devmem as a test, but that didn’t work either.

The problem seems to be that bit 7 of CLK_RST_CONTROLLER_CLK_OUT_ENB_L_0 is not set. If I read the register and write the value back with bit 7 set:
devmem 0x60006010 → 0x9962C171
devmem 0x60006010 32 0x9962C1F1
I can then do stty -F /dev/ttyS1 and it doesn’t crash, and returns expected values.
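
For reference, the written value is just the value read back with bit 7 set. A minimal sketch of that arithmetic follows. The helper name set_bit is made up here, and only the two register values come from the session above.

```shell
#!/bin/sh
# Print a 32-bit register value with one extra bit set, as hex.
# Arguments: current value, bit number (0-31).
set_bit() {
    printf '0x%08X\n' "$(( ($1 | (1 << $2)) & 0xFFFFFFFF ))"
}

# 0x9962C171 is the value read from 0x60006010 above. Bit 7 is the
# enable bit that UARTB shares with VFIR.
set_bit 0x9962C171 7
```

This prints 0x9962C1F1, the value written with devmem above. Since devmem pokes the register behind the clock framework's back, it is only useful as a diagnostic.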

Here’s a patch that I came up with, which works. This is not the correct way to do this, but demonstrates the problem can be fixed.

Basically, it removes the shared resource split of uartb and vfir, and just treats it as one device. I believe the vfir init may have been overwriting the uart and leaving the clk_out_enable set to disabled.

Sorry, I only now got around to spending some more time on this issue, and I now fully understand what is going on. The big difference between the Tegra high-speed serial driver and the 8250 one is that the former enables and disables its clocks all the time, while the latter just enables the clocks during probe and leaves them enabled.

Now, the problem is that the 2nd UART shares its clock with VFIR. While the kernel boots, it initially probes UARTB and enables its clock. Later the kernel decides to disable the VFIR clock as it is unused. Any further access to UARTB then freezes. You may easily instrument this, e.g. as follows:

  1. Stop in U-Boot.
  2. Make sure U-Boot console won’t overflow due to slow frame buffer console: setenv stdout serial.
  3. Enable tracing of clock related calls: setenv defargs 'trace_event=clk_enable,clk_disable,clk_set_parent tp_printk'.
  4. Now proceed with booting: boot.
  5. You will now notice the UART and VFIR clocks being enabled and disabled, respectively:


[    2.229213] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.231946] clk_enable: uarta
[    2.232458] printk: console [ttyS0] disabled
[    2.232548] 70006000.serial: ttyS0 at MMIO 0x70006000 (irq = 78, base_baud = 25500000) is a Tegra
[    3.969906] printk: console [ttyS0] enabled
[    3.974678] clk_enable: uartb
[    3.978110] 70006040.serial: ttyS1 at MMIO 0x70006040 (irq = 79, base_baud = 25500000) is a Tegra
[    3.987577] clk_enable: uartc
[    3.991024] 70006200.serial: ttyS2 at MMIO 0x70006200 (irq = 80, base_baud = 25500000) is a Tegra
[    4.000485] clk_enable: uartd
[    4.003906] 70006300.serial: ttyS3 at MMIO 0x70006300 (irq = 81, base_baud = 25500000) is a Tegra
...
[    7.246243] clk_disable: vfir
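
To pull just the relevant events out of a longer boot log, a small filter along these lines may help. It is only a sketch: the function name shared_clk_events is made up here, and it assumes the trace lines look like the ones printed above.

```shell
#!/bin/sh
# Keep only the clock trace lines that mention the UARTB or VFIR clock.
# Reads a kernel log on stdin.
shared_clk_events() {
    grep -E 'clk_(enable|disable): (uartb|vfir)'
}

# Sample input copied from the log above.
shared_clk_events <<'EOF'
[    3.974678] clk_enable: uartb
[    3.987577] clk_enable: uartc
[    7.246243] clk_disable: vfir
EOF
```

On a target, the same function can be fed from the kernel log, for example "dmesg | shared_clk_events". A uartb enable followed later by a vfir disable is the pattern to look for.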

This disabling of the VFIR clock may easily be prevented with the following simple patch:

diff --git a/drivers/clk/tegra/clk-tegra-periph.c b/drivers/clk/tegra/clk-tegra-periph.c
index 38c4eb28c8bf..e3da3147468a 100644
--- a/drivers/clk/tegra/clk-tegra-periph.c
+++ b/drivers/clk/tegra/clk-tegra-periph.c
@@ -671,7 +671,7 @@ static struct tegra_periph_init_data periph_clks[] = {
        MUX("hda", mux_pllp_pllc_clkm, CLK_SOURCE_HDA, 125, TEGRA_PERIPH_ON_APB, tegra_clk_hda_8),
        MUX("hda2codec_2x", mux_pllp_pllc_pllm_clkm, CLK_SOURCE_HDA2CODEC_2X, 111, TEGRA_PERIPH_ON_APB, tegra_clk_hda2codec_2x),
        MUX8("hda2codec_2x", mux_pllp_pllc_plla_clkm, CLK_SOURCE_HDA2CODEC_2X, 111, TEGRA_PERIPH_ON_APB, tegra_clk_hda2codec_2x_8),
-       MUX("vfir", mux_pllp_pllc_pllm_clkm, CLK_SOURCE_VFIR, 7, TEGRA_PERIPH_ON_APB, tegra_clk_vfir),
+       MUX_FLAGS("vfir", mux_pllp_pllc_pllm_clkm, CLK_SOURCE_VFIR, 7, TEGRA_PERIPH_ON_APB, tegra_clk_vfir, CLK_IGNORE_UNUSED),
        MUX("sdmmc1", mux_pllp_pllc_pllm_clkm, CLK_SOURCE_SDMMC1, 14, TEGRA_PERIPH_ON_APB, tegra_clk_sdmmc1),
        MUX("sdmmc2", mux_pllp_pllc_pllm_clkm, CLK_SOURCE_SDMMC2, 9, TEGRA_PERIPH_ON_APB, tegra_clk_sdmmc2),
        MUX("sdmmc3", mux_pllp_pllc_pllm_clkm, CLK_SOURCE_SDMMC3, 69, TEGRA_PERIPH_ON_APB, tegra_clk_sdmmc3),

Let me check with the Tegra maintainers whether that is indeed an acceptable solution.

Thanks again for reporting this issue and sorry it took so long for us to figure it out.

BTW: You may follow the mainline discussion here:

https://lore.kernel.org/lkml/20181101015230.27310-1-marcel@ziswiler.com

Thanks Marcel, your patch makes more sense for mainline, so I’m going to switch to yours.

Not only does the UART work without crashing, but all of my application firmware loaders (with tight timing requirements) work as expected. The DMA driver was causing all sorts of timing problems, so this is a good solution for us.

You are very welcome. Glad you are happy with this solution.

Looks like the mainline fix may rather involve doing proper clock reference counting. I will have a look at this and keep you posted.
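
As a purely conceptual illustration of what reference counting on a shared gate means, and not the actual mainline change, here is a toy model in shell. All names in it are hypothetical. It only shows that a gate with a user count stays on while UARTB still holds it, even when the sweep for unused clocks runs.

```shell
#!/bin/sh
# Toy model of one gate bit shared by two clocks (not kernel code).
# gate_users counts how many consumers currently hold the gate enabled.
gate_users=0
gate_on=0

shared_gate_enable() {
    gate_users=$((gate_users + 1))
    gate_on=1
}

shared_gate_disable() {
    if [ "$gate_users" -gt 0 ]; then
        gate_users=$((gate_users - 1))
    fi
    if [ "$gate_users" -eq 0 ]; then
        gate_on=0
    fi
}

# Mimics the late "disable unused clocks" sweep: the gate is only
# switched off when no consumer holds it.
disable_if_unused() {
    if [ "$gate_users" -eq 0 ]; then
        gate_on=0
    fi
}

shared_gate_enable      # 8250 probe enables uartb
disable_if_unused       # sweep runs for vfir, which has no users of its own
echo "gate_on=$gate_on users=$gate_users"
```

In this model the final line reports the gate as still on with one user, which is the behaviour the 8250 driver needs from the shared UARTB/VFIR bit.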

As for the DMA driver, I guess that one really trades a considerable latency increase for much higher throughput, which is obviously not acceptable in your use case.