Extremely slow boot with latest TorizonCore build

hello,
today I have tested the latest TorizonCore image 5.7.0-devel-202206+build.24 and my device was taking more than 5 minutes to boot completely. I suspect the kernel was just hanging some more instead of crashing & rebooting because I could see from my power supply that the current drained by my device was stable instead of going to zero and up again

Regards,
Rocco

Hi @RoccoBr,

I tested this same image here on my side (which I’ve downloaded from here: Toradex Download Links (Torizon, Linux BSP, WinCE and Partner Demos) → TorizonCore 5.7.0-devel-202206+build.24) and it worked fine for me. I enabled the kernel logs to check the boot time and that is what I got:

[   15.118390] audit: type=1325 audit(1654709751.190:8): table=filter family=2 entries=10
[   15.126551] audit: type=1325 audit(1654709751.200:9): table=filter family=2 entries=11
[   15.134635] audit: type=1325 audit(1654709751.200:10): table=filter family=2 entries=12
[   15.147387] Initializing XFRM netlink socket
[   15.179597] audit: type=1325 audit(1654709751.260:11): table=nat family=2 entries=7

TorizonCore 5.7.0-devel-202206+build.24 verdin-imx8mm-06902241 ttymxc0

verdin-imx8mm-06902241 login: torizon
Password: 
[   32.032485] kauditd_printk_skb: 31 callbacks suppressed
[   32.032491] audit: type=1006 audit(1654709768.110:43): pid=893 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=1 res=1
torizon@verdin-imx8mm-06902241:~$

As you can see, the boot process took 15 seconds for me. I tested this image on Verdin iMX8M Mini Q v1.1B and the evaluation carrier board.

Could you please share more information with us that might be causing this problem? Maybe a software or kernel modification that you did.
What carrier board are you using? Which version of the Verdin are you using?
Also, you can enable the kernel logs by running setenv tdxargs loglevel=7 on uBoot to check if anything is wrong.

Best regards,
Hiago.

1 Like

So far I have been monitoring the time I power on the unit and the current get stuck at 0.104 Ampere to the moment I can see the current going up to 0.176 Ampere and I can connect with ssh and see our docker container running. Using the same application but with our current TorizonCore build 5.2 I can connect via ssh and see the application container running within seconds from power on.
I’ll enable the kernel log and collect more info.
Regards,
Rocco

Hi @RoccoBr,

yes please, send us the kernel logs for us to help you better with this issue. Also, are you using a custom carrier board that might be causing this problem?

Best regards,
Hiago.

This is the relevant part of the kernel logs:

[    5.542756] systemd[1]: Started Journal Service.
[    5.564282] systemd-journald[493]: Received client request to flush runtime journal.
[   33.753430] REG_3P3V: disabling
[   33.756587] REG_MIPI_PHY: disabling
[  125.974308] imx-sdma 302c0000.dma-controller: loaded firmware 4.5

as you can see it takes some time to flush the journal and it gets stuck at REG_MIPI_PHY: disabling

do you want the complete log?

we are using a custom carrier board but we’ve never had any problems with TorizonCore v5.2 that we have been using so far. If I revert to TorizonCore v5.2, without changing anything else, the device boots within few seconds
Regards,
Rocco

I have compared with the kernel logs from TorizonCore 5.2, the same messages are there but they are executed much faster, there is no hanging:

[    5.621175] systemd-journald[495]: Received client request to flush runtime journal.
[    6.582284] imx-sdma 302c0000.dma-controller: loaded firmware 4.5
[    6.723648] spi_imx 30830000.spi: probed
[    6.731326] spi_imx 30840000.spi: probed
[    7.119304] usbcore: registered new interface driver ftdi_sio
[    7.156164] usbserial: USB Serial support registered for FTDI USB Serial Device
[    7.164329] ftdi_sio 1-1:1.0: FTDI USB Serial Device converter detected
[    7.185969] usb 1-1: Detected FT4232H
[    7.195220] usb 1-1: FTDI USB Serial Device converter now attached to ttyUSB0
[    7.203285] ftdi_sio 1-1:1.1: FTDI USB Serial Device converter detected
[    7.211284] usb 1-1: Detected FT4232H
[    7.221348] usb 1-1: FTDI USB Serial Device converter now attached to ttyUSB1
[    7.232465] ftdi_sio 1-1:1.2: FTDI USB Serial Device converter detected
[    7.241715] usb 1-1: Detected FT4232H
[    7.259345] usb 1-1: FTDI USB Serial Device converter now attached to ttyUSB2
[    7.270368] ftdi_sio 1-1:1.3: FTDI USB Serial Device converter detected
[    7.285391] usb 1-1: Detected FT4232H
[    7.293507] usb 1-1: FTDI USB Serial Device converter now attached to ttyUSB3
[    7.478847] fec 30be0000.ethernet ethernet0: renamed from eth0
[    7.992924] Microchip KSZ9131 Gigabit PHY 30be0000.ethernet-1:07: attached PHY driver [Microchip KSZ9131 Gigabit PHY] (mii_bus:phy_addr=30be0000.ethernet-1:07, irq=65)
[    8.115250] CAN device driver interface
[    8.181812] mcp25xxfd spi1.0: MCP2518 successfully initialized.
[    8.235591] mcp25xxfd spi2.0: MCP2518 successfully initialized.
[    8.301814] mcp25xxfd spi1.0 can2: renamed from can0
[    8.923171] usb 2-1: new full-speed USB device number 2 using ci_hdrc
[    9.125317] usb 2-1: not running at top speed; connect to a high speed hub
[    9.166593] usbcore: registered new interface driver option
[    9.172250] usbserial: USB Serial support registered for GSM modem (1-port)
[    9.179576] option 2-1:1.0: GSM modem (1-port) converter detected
[    9.186262] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB4
[    9.193285] option 2-1:1.1: GSM modem (1-port) converter detected
[    9.199989] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB5
[    9.207118] option 2-1:1.2: GSM modem (1-port) converter detected
[    9.213883] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB6
[   11.693338] IPv6: ADDRCONF(NETDEV_CHANGE): ethernet0: link becomes ready
[   11.700985] fec 30be0000.ethernet ethernet0: Link is Up - 1Gbps/Full - flow control off
[   14.337635] PPP generic driver version 2.4.2
[   32.199630] FAT-fs (mmcblk1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   33.295603] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   33.311366] Bridge firewalling registered
[   33.373171] audit: type=1325 audit(1654869742.113:2): table=nat family=2 entries=0
[   33.433321] audit: type=1325 audit(1654869742.173:3): table=filter family=2 entries=0
[   33.533549] audit: type=1325 audit(1654869742.293:4): table=nat family=2 entries=5
[   33.573275] audit: type=1325 audit(1654869742.323:5): table=filter family=2 entries=4
[   33.623542] audit: type=1325 audit(1654869742.363:6): table=filter family=2 entries=6
[   33.683566] audit: type=1325 audit(1654869742.413:7): table=filter family=2 entries=8
[   33.763218] REG_3P3V: disabling
[   33.763261] audit: type=1325 audit(1654869742.473:8): table=filter family=2 entries=10
[   33.766381] REG_MIPI_PHY: disabling
[   33.833603] audit: type=1325 audit(1654869742.563:9): table=filter family=2 entries=11
[   33.873193] audit: type=1325 audit(1654869742.623:10): table=filter family=2 entries=12
[   33.889664] Initializing XFRM netlink socket
[   33.983601] audit: type=1325 audit(1654869742.713:11): table=nat family=2 entries=7
[   39.788406] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation

I have tested the image with evaluation container and it get stuck indefinitely at the same point

interestingly in one og my my test with our own image based on TorizonCore 5.7 I had the sd card unplugged, so the unit got stuck at

[   33.752856] REG_3P3V: disabling
[   33.756011] REG_MIPI_PHY: disabling
[   33.759507] V3.3_SD: disabling

my feeling is that between 5.2 and 5.7 something has changed in the management of the 3.3V. maybe I need to change something in the device tree?

Hi @RoccoBr,

I couldn’t reproduce your problem here on my side.
Therefore, I will ask you to try some things that perhaps will help us find the root of this problem.

  • Do you have a Toradex carrier board with you? If so, please, try it on a different carrier board, so we could eliminate hardware problems.

  • Could you please try our BSP 5.6? With that, we can check whether is only Torizon that is causing this problem.

Best regards,
Hiago.

Hi Hiago,
no I don’t have a Toradex carrier board

by BSP 5.6, do you mean this image: https://artifacts.toradex.com/artifactory/torizoncore-oe-prod-frankfurt/dunfell-5.x.y/release/13/verdin-imx8mm/torizon/torizon-core-docker/oedeploy/torizon-core-docker-verdin-imx8mm-Tezi_5.6.0+build.13.tar?

Regards,
Rocco

Hi @RoccoBr,

I’m sorry I meant this one: https://artifacts.toradex.com/artifactory/tdxref-oe-prod-frankfurt/dunfell-5.x.y/release/18/verdin-imx8mm/tdx-xwayland/tdx-reference-minimal-image/oedeploy/Verdin-iMX8MM_Reference-Minimal-Image-Tezi_5.6.0+build.18.tar

It is not TorizonCore, instead is our BSP Linux Reference Minimal Image. So if this works, we’ll know that the problem is strictly related to TorizonCore, otherwise, it could be a hardware problem or something else that needs further investigation.

Best regards,
Hiago.

hi @hfranco.tx the image above hangs indefinitely at the same point . in attachment the full log
5.6_BSP_log.txt (32.5 KB)

Hi @RoccoBr,

Thanks for sharing it. From your log, I could see this:

[    5.517397] mcp25xxfd spi2.0: MCP2517 successfully initialized.

TDX Wayland with XWayland 5.6.0+build.18 (dunfell) verdin-imx8mm-06760613 ttymxc0
Verdin-iMX8MM_Reference-Minimal-Image

verdin-imx8mm-06760613 login: [    7.823183] fec 30be0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[    7.831173] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    7.845549] 8021q: 802.1Q VLAN Support v1.8
[   33.768369] REG_3P3V: disabling
[   33.771530] REG_MIPI_PHY: disabling

Can you see the login part?
Actually, it booted normally, in 5 seconds, then it asked you for a login. You can type root and then hit ENTER, it should work without problems. Could you try that, please?

What happened after the login

[    7.831173] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    7.845549] 8021q: 802.1Q VLAN Support v1.8
[   33.768369] REG_3P3V: disabling
[   33.771530] REG_MIPI_PHY: disabling

are just some logs that the kernel is printing for you on your terminal, it doesn’t mean that your module isn’t working or froze. These kernel messages are normal when you are using a serial cable to access your terminal.

Therefore from that, we can see that BSP 5.6 is booting in 5 seconds, but TorizonCore is hanging for some reason, am I correct?

Have you checked if the same login prompt appears in the TorizonCore terminal? Perhaps you couldn’t see it because of these kernel logs.

Best regards,
Hiago.

Hi @RoccoBr,

Any updates on this topic? Were you able to fix your issue?

Best Regards,
Hiago.

hi @hfranco.tx ,
I’m back to investigate this issue.
I’m having a look at the device tree at the moment. when you did your tests on your Verdin iMX8M Mini Q v1.1B and the evaluation carrier board, have you build your image with TorizonCore build tool passing the device tree for your carrier board or have you just download the archive and copied the files in the SD card and boot?

Hi @RoccoBr !

Before we talk about TorizonCore Builder and device tree modifications, could you please confirm that the apparent boot-freeze was sorted out?

As @hfranco.tx noted from your boot log, the module doesn’t seem to be freezing.

Best regards,

Hi,
No the boot-freeze as not being sorted out yet.
Besides the information in the log, I am also measuring the current consumption, it stays low for a couple of minutes before raising to the expected level. During those couple of minutes in low current the system is irresponsive.

Hi @RoccoBr !

Seems like I am not being able to understand the issue…

From your log, seems like it is working as expected.

After the log prompt on the serial console, aren’t you able to enter the username (and password, if any) and actually log into the module?

Best regards,

@henrique.tx have a look at the timestamps:

[ 33.756587] REG_MIPI_PHY: disabling

[ 125.974308] imx-sdma 302c0000.dma-controller: loaded firmware 4.5

there is a gap of 100 seconds where the system is hanging , the current consumption is low and the system is not responding

Hi @RoccoBr !

I see.

To which inputs the module doesn’t respond during this 100 seconds?

Best regards,

hi @henrique.tx ,
one example is ssh connection, whether via wired connection or modem.
but that is expected if the system is hanging for 100 seconds.

this delay is not present with our current implementation with TorizonCore v5.2 and we have started seeing this delay when evaluating TorizonCore v5.6 and v5.7. the only change we did was to fetch a different version of TorizonCore and nothing else during the build process