Linux occasionally hangs waiting for root device

bertin · February 22, 2023, 5:30pm

After we upgraded our kernel a few times (currently 6.1.8), the boot occasionally hangs waiting for the root device /dev/mmcblk1p2.

According to this post, the newer kernels “prefer” the use of UUIDs. I guess this is because device enumeration is not deterministic and we should pass root=UUID= instead of root=/dev/mmcblk1p2 to the kernel.

Can we use the UUID for the partition from ls -l /dev/disk/by-uuid and pass that in place of the existing path to the partition? Do we also need to change /etc/fstab so that the entry for that partition uses the same UUID?

Thanks,

Brad

alex.tx · February 23, 2023, 8:23pm

Yes, you can use the UUID of the partition in place of the existing path to the partition. You can find the UUID of the partition using the command ls -l /dev/disk/by-uuid.

To pass the UUID to the kernel, you can modify the U-Boot environment configuration to include root=UUID=<UUID> instead of root=/dev/mmcblk1p2.

After modifying the kernel command line, you should also update /etc/fstab to use the UUID instead of the device path. Replace the device path with the UUID in the corresponding line(s) in /etc/fstab.
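
As a rough illustration of the two substitutions described above, here is a minimal Python sketch. The helper names (set_root_arg, set_fstab_device) and the sample command line are hypothetical. It only shows the text transformation, not how U-Boot stores its environment on a particular module.

```python
import re


def set_root_arg(cmdline, root_spec):
    """Replace the value of the root= argument in a kernel command line."""
    # (?<!\S) makes sure "root=" starts a word; \S+ consumes the old value.
    return re.sub(r"(?<!\S)root=\S+", lambda m: "root=" + root_spec, cmdline)


def set_fstab_device(fstab, mount_point, new_spec):
    """Swap the first field of the fstab entry mounted at mount_point."""
    out = []
    for line in fstab.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if len(fields) >= 2 and not is_comment and fields[1] == mount_point:
            fields[0] = new_spec
            line = " ".join(fields)  # column padding collapses to single spaces
        out.append(line)
    return "\n".join(out) + ("\n" if fstab.endswith("\n") else "")


# Sample values only; substitute the identifier reported on your own device.
spec = "UUID=<UUID>"
print(set_root_arg("console=ttyS0,115200 root=/dev/mmcblk1p2 rw", spec))
print(set_fstab_device("/dev/mmcblk1p2 / ext4 defaults 0 1\n", "/", spec), end="")
```

The same functions work with any other root specifier, since the new value is passed in as a plain string.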

bertin · March 16, 2023, 12:19pm

Thanks for your response.

I can’t seem to make this work in Yocto. I made a bbappend for our image recipe (angstrom_lxde_image.bb) and tried to add it via IMAGE_POSTPROCESS_COMMAND_append and also ROOTFS_POSTPROCESS_COMMAND_append. Below is part of the bbappend I wrote, but it doesn’t work because get_rootfs_uuid returns None, and that appears to be because ROOTFS is also None.

inherit fs-uuid

DEPENDS_append = " e2fsprogs-native"

python create_rootfs_uuid_variable () {
       uuid = get_rootfs_uuid(d)
       d.setVar("ROOTFS_UUID",uuid)
       print(d.getVar("ROOTFS_UUID",expand=True))
}

change_uenv_to_use_uuid() {
       #export ROOTFS_UUID
       #ROOTFS_UUID=get_rootfs_uuid()
       bbnote "change_uenv_to_use_uuid to UUID: ${ROOTFS_UUID}"
       # -E enables the + quantifier; [^[:space:]] is the portable way to say "not whitespace"
       sed -i -E -e 's,root=[^[:space:]]+,root=UUID=${ROOTFS_UUID},g' ${IMAGE_ROOTFS}/boot/uEnv-${MACHINE}-${PV}-${PR}.txt
}

seasoned_geek · March 16, 2023, 2:09pm

Bertin,

You are getting into things I try not to touch. Mainly because I don’t consider Python a real language and you have just stumbled into one of the many many many reasons it is not. Besides Python not being type-safe, it requires a VM. Any systemy type Python functions will reach from their VM to the currently running OS. You are trying to build an OS.

I can show you what I did to give the user access to USB. This was done with base-files_%.bbappend

# Our torizon user needs access to the USB drives
#
do_install_append(){
    echo "/dev/sda1            /media/usbhd         auto       relatime,nofail,utf8,uid=torizon,gid=torizon,umask=002 0 0" >> ${D}${sysconfdir}/fstab

    echo "/dev/sdb1            /media/usbhd2        auto       relatime,nofail,utf8,uid=torizon,gid=torizon,umask=002 0 0" >> ${D}${sysconfdir}/fstab

    echo "/dev/mmcblk1p1       /media/sdcard        auto       defaults,sync,auto 0 0" >> ${D}${sysconfdir}/fstab

    install --directory ${D}/media/sdcard
}

pkg_postinst_dahlia(){
    chown -R 1000:1000 ${D}/media/sdcard
}

You may get lucky and be able to use get_rootfs_uuid(${D}) instead of get_rootfs_uuid(d), but I doubt it.

bertin · March 16, 2023, 2:44pm

Ok I will try to use get_rootfs_uuid(${D}) instead.

The only reason I tried to use Python functions here in my BitBake recipe was to get the UUID from the built root partition. Your example uses normal device paths like /dev/mmcblk1p1, but what you will find is that the newer Linux kernels don’t like that. They want a UUID; otherwise they could mount a different partition as mmcblk1p1. And if you use a path instead of a UUID, it is possible that the system won’t find the rootfs partition where it expects it to be in the file system.

I don’t have a problem with Python per se, but in this case, yes, it is difficult to debug this error, mostly because I don’t understand what is supposed to set ROOTFS and how you are supposed to use the fs-uuid class.

There are basically no example recipes or appends that use this class that I can find.

I joined the yocto mailing list and am asking the question here out of desperation.

Thanks for the idea though, I will post back if it works.

seasoned_geek · March 16, 2023, 3:52pm

You are correct, I am using path variables. I think you missed the point.

You are going to have to create the exact same append inserting UUID to fstab.

There is a chicken and egg problem with the answer you were given. Much like this Raspberry Pi message thread. The tools for obtaining UUID have to be used on a running instance of the OS. That means the drive/partition has already been formatted and had a UUID assigned.

This long drawn out 2021 conversation may also shed some light. I didn’t read it in depth, but they were trying to solve the UUID problem for some kind of Pi using Yocto. Also this 2016 conversation. Both mention a --use-uuid option for the root partition.

You may be able to solve your problem completely by figuring out where to put that.

rafael.tx · March 16, 2023, 5:20pm

@bertin,
could you provide the output of the kernel boot when it hangs waiting for /dev/mmcblk1p2?
Do you have any specific scenarios where this happens?
Also, what module, carrier board and what software version are you using?

I ask all of this because it’s not normal for the boot process to hang while waiting for the root device, and there’s nothing wrong with using the device name for that. You shouldn’t need to set up UUID-based mounts for your root device and, as already mentioned, it’s not trivial to determine at build time the UUID that the partition will have when booting on the real hardware.

Best regards,
Rafael

bertin · March 17, 2023, 3:03pm

Here is the kernel log when it failed looking for the rootfs on the partition with label “rootfs”:
kernellogfailedlookingforlabeledrootfs.txt (17.9 KB)

Here is the kernel log when it failed to find the root partition by PARTUUID:
kernellogfailedlookingforrootfswithUUID.txt (18.0 KB)

Here is the kernel log when it failed looking for the root fs by UUID:
kernellogfailedlookingforrootfswithUUID2.txt (18.0 KB)

Here is the output from blkid once the device has booted:

root@colibri-t30-mainline:~# cat /boot/config-6.1.8-2.8.8 | grep CONFIG_BLOCK
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
root@colibri-t30-mainline:~# blkid
/dev/mmcblk1p2: LABEL="rootfs" UUID="69d5497a-d096-471e-88af-eee96b5cec11" TYPE="ext3"
/dev/mmcblk1p1: LABEL="boot" UUID="0A8A-E8FD" TYPE="vfat"

The only way that root works is when we use root=/dev/mmcblk1p2.

We have the Colibri T30 with a custom carrier board.
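
For anyone who wants to pull the identifiers out of blkid output like the lines above in a script, here is a small Python sketch. The function name parse_blkid_line is made up for illustration, and it assumes the default "device: KEY="value" ..." output format shown in this post.

```python
import shlex


def parse_blkid_line(line):
    """Split one line of default blkid output into (device, {tag: value})."""
    device, _, rest = line.partition(":")
    tags = {}
    # shlex strips the double quotes around each value.
    for token in shlex.split(rest):
        key, _, value = token.partition("=")
        tags[key] = value
    return device.strip(), tags


line = '/dev/mmcblk1p2: LABEL="rootfs" UUID="69d5497a-d096-471e-88af-eee96b5cec11" TYPE="ext3"'
device, tags = parse_blkid_line(line)
print(device, tags.get("UUID"), tags.get("PARTUUID"))
```

Note that the output above carries a filesystem UUID and a label, but no PARTUUID tag, so tags.get("PARTUUID") comes back as None for these lines.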

rafael.tx · March 17, 2023, 4:32pm

Does the system hang when you have the root=/dev/mmcblk0p2? Do you have a log of that?

bertin · March 20, 2023, 10:37am

kernellogfailedlookingforrootfswithmmcblk0p2.txt (18.0 KB)

I tried booting with root=/dev/mmcblk0p2 and it works sometimes (as expected) and sometimes not (as expected). Above is the log attached.

The problem is that the kernel cannot find /dev/mmcblk0 or /dev/mmcblk1 every time. There is a race condition here. From what I have read setting root=UUID or root=PARTUUID and then setting the /etc/fstab to use the same UUID for root (otherwise I guess the kernel would try and remount the rootfs again after it read the /etc/fstab ?) should solve this problem… but for some reason doesn’t.

seasoned_geek · March 20, 2023, 11:21am

Captain Obvious wants to ask a stupid question here.

Are you absolutely certain you don’t have a power supply problem?

Nothing to do with Toradex, but this blog post is why I ask. Most of today’s power supplies start getting weaker the moment they are first plugged in. Yesteryear’s power supplies used to “just die.” Usually that death involved great stench and sometimes even “magic smoke.” Today’s power supplies just keep getting weaker. Systems require much more juice at boot than they do during flat run so a failing power supply manifests itself as “strange boot issues.”

I’m also going to point to the --use-uuid conversation again.

rafael.tx · March 20, 2023, 12:53pm

@bertin,
From your log output I can see that your eMMC device is being detected as mmc1 instead of mmc0. In the latest versions of our BSPs we have udev rules in place that create devices that won’t change based on the addresses of the controllers.

Here’s an example of what this looks like for a Verdin iMX8MP running BSP 6:

root@verdin-imx8mp-07330987:~# cat /etc/udev/rules.d/99-toradex.rules 
ACTION=="add|change", KERNEL=="i2c-[0-9]*", ATTRS{name}=="30a50000.i2c", SYMLINK+="verdin-i2c1"
ACTION=="add|change", KERNEL=="i2c-[0-9]*", ATTRS{name}=="30a30000.i2c", SYMLINK+="verdin-i2c2"
ACTION=="add|change", KERNEL=="i2c-[0-9]*", ATTRS{name}=="DesignWare HDMI", SYMLINK+="verdin-i2c3"
ACTION=="add|change", KERNEL=="i2c-[0-9]*", ATTRS{name}=="30a40000.i2c", SYMLINK+="verdin-i2c4"
ACTION=="add|change", KERNEL=="i2c-[0-9]*", ATTRS{name}=="30a20000.i2c", SYMLINK+="verdin-i2c-on-module"
ACTION=="add|change", ATTRS{iomem_base}=="0x30860000", SYMLINK+="verdin-uart1"
ACTION=="add|change", ATTRS{iomem_base}=="0x30890000", SYMLINK+="verdin-uart2"
ACTION=="add|change", ATTRS{iomem_base}=="0x30880000", SYMLINK+="verdin-uart3"
ACTION=="add|change", KERNELS=="watchdog", SYMLINK+="verdin-watchdog"
ACTION=="add|change", SUBSYSTEM=="spidev", KERNELS=="30820000.*spi", SYMLINK+="verdin-spi-cs%n"
KERNEL=="mmcblk[0-9]", ENV{DEVTYPE}=="disk", KERNELS=="30b60000.mmc", SYMLINK+="emmc"
KERNEL=="mmcblk[0-9]boot[0-9]", ENV{DEVTYPE}=="disk", KERNELS=="30b60000.mmc", SYMLINK+="emmc-boot%n"
KERNEL=="mmcblk[0-9]p[0-9]", ENV{DEVTYPE}=="partition", KERNELS=="30b60000.mmc", SYMLINK+="emmc-part%n"
SUBSYSTEM=="iio", KERNELS=="iio:device0", RUN+="/etc/udev/scripts/toradex-adc.sh"
ACTION=="add|change", KERNEL=="mmcblk[0-9]", KERNELS=="30b50000.mmc", SYMLINK+="verdin-sd"
ACTION=="add|change", KERNEL=="mmcblk[0-9]p[0-9]*", KERNELS=="30b50000.mmc", SYMLINK+="verdin-sd-part%n"

I think you could try something like this on your device as well: add a udev rules file that creates stable eMMC device names that you can then use for the mounting process. Here’s an edited example that I think should work for your case. Notice that I used the address of the mmc1 controller from your boot log file:

KERNEL=="mmcblk[0-9]", ENV{DEVTYPE}=="disk", KERNELS=="78000600.mmc", SYMLINK+="emmc"
KERNEL=="mmcblk[0-9]boot[0-9]", ENV{DEVTYPE}=="disk", KERNELS=="78000600.mmc", SYMLINK+="emmc-boot%n"
KERNEL=="mmcblk[0-9]p[0-9]", ENV{DEVTYPE}=="partition", KERNELS=="78000600.mmc", SYMLINK+="emmc-part%n"

With this file in /etc/udev/rules.d/99-toradex.rules you should be able to mount your root partition from /dev/emmc-part2, and this should eliminate the problem.
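
To see which kernel names the three KERNEL patterns above would pick up, here is a rough Python sketch. It is only an approximation: it uses fnmatch to mimic udev's glob-style matching, treats %n as the trailing number of the kernel name, and ignores the DEVTYPE and KERNELS (controller address) conditions. The function name symlink_for is made up.

```python
import re
from fnmatch import fnmatchcase

# (KERNEL pattern, SYMLINK template) pairs copied from the edited rules above.
RULES = [
    ("mmcblk[0-9]", "emmc"),
    ("mmcblk[0-9]boot[0-9]", "emmc-boot%n"),
    ("mmcblk[0-9]p[0-9]", "emmc-part%n"),
]


def symlink_for(kernel_name):
    """Return the symlink name the first matching pattern would produce."""
    for pattern, template in RULES:
        if fnmatchcase(kernel_name, pattern):
            trailing = re.search(r"(\d+)$", kernel_name)
            number = trailing.group(1) if trailing else ""
            return template.replace("%n", number)
    return None


for name in ("mmcblk1", "mmcblk1boot0", "mmcblk1p2", "sda1"):
    print(name, "->", symlink_for(name))
```

With these patterns, mmcblk1p2 maps to emmc-part2, which is the name used in the paragraph above. A partition number with two digits would not match mmcblk[0-9]p[0-9] as written.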

bertin · March 20, 2023, 12:55pm

That is an interesting question and I had thought so at first as well. And I tried to rule that out early on by switching multiple units and also using the Colibri carrier board.

I am playing with the wks file for our image right now and I will try that again. I guess I did not understand that wic could change the fstab file and pass args to the bootloader. I will report back soon

bertin · March 20, 2023, 3:18pm

Ok, I will add a bbappend and try this.

I changed the rootfs to ext4 and I tried again to see if I could pass UUID or PARTUUID and see if it would find the rootfs partition…no luck.

bertin · March 20, 2023, 5:30pm

I have figured out the answer. The problem was that we are using an msdos partition table and that the “UUID” returned by lsblk and ls /dev/disk/by-uuid is not a real one.

The answer is to use the /dev/disk/by-partuuid value instead on msdos-partitioned drives.

I don’t know how I am going to add that to Yocto. Do you have any ideas on how I would do that? I need to somehow figure out how to add that to the wks file or something. I am really not sure how I will get emmcboot to work with these PARTUUIDs. It would probably be a lot easier to use GPT tables and use the UUID from those?

The udev rule only works after root is mounted. I mean, it does actually create /dev/emmc-part0, /dev/emmc-part1, /dev/emmc-part2 …, but that happens after the root file system is loaded.

Below is the output from fdisk, which is how I figured out what was going on.

Thanks a lot, guys. Thanks @seasoned_geek!

Disk /dev/mmcblk1: 3.6 GiB, 3850371072 bytes, 7520256 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xa1346015

Device         Boot Start     End Sectors  Size Id Type
/dev/mmcblk1p1       8192   40959   32768   16M  c W95 FAT32 (LBA)
/dev/mmcblk1p2      40960 7413759 7372800  3.5G 83 Linux


Disk /dev/mmcblk1boot0: 16 MiB, 16777216 bytes, 32768 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mmcblk1boot1: 16 MiB, 16777216 bytes, 32768 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
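
For reference, on a DOS (MBR) partition table the kernel builds the PARTUUID from the 32-bit disk identifier and the partition number, written in hex as SSSSSSSS-PP. Here is a minimal Python sketch using the disk identifier from the fdisk output above. The function name is made up, and the result should be checked against /dev/disk/by-partuuid on the running device before putting it on the kernel command line.

```python
def mbr_partuuid(disk_identifier, partition_number):
    """Format an MBR PARTUUID: 8 hex digits of disk ID, dash, 2 hex digits of partition number."""
    return f"{disk_identifier:08x}-{partition_number:02x}"


# Disk identifier 0xa1346015, second partition (/dev/mmcblk1p2).
print("root=PARTUUID=" + mbr_partuuid(0xA1346015, 2))
```

Because the value depends only on the disk identifier and the partition index, it can be known at build time as long as the image tool writes a fixed disk identifier.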

seasoned_geek · March 20, 2023, 9:11pm

I do not know how to create GPT partition table in Yocto, but that is what you need if you want to use UUID. DOS never had the ability to store a UUID if memory serves.

rafael.tx · March 21, 2023, 1:15pm

@bertin,
I’m sorry, you’re right. The solution that I offered won’t work for the root device.
I’m assuming that you are not using Toradex Easy Installer for the installation of your images, right? I ask because at the moment tezi is only able to generate MSDOS partition tables.

I looked into the documentation, and it seems that the wic image type would be able to generate proper partitions and fstab entries. According to the documentation here, the part command has a --use-uuid option that supposedly creates a random UUID and even sets up the bootloader configuration:

--use-uuid: This option is specific to wic. It makes wic to generate
            random globally unique identifier (GUID) for the partition
            and use it in bootloader configuration to specify root partition.

I would expect that in this case the partition table that’s being created should support UUIDs, otherwise what’s the point?

I didn’t test any of this myself and beware that the u-boot specific setup could depend on stuff that’s done differently in our standard boot process, so additional changes could be needed.

bertin · March 21, 2023, 2:06pm

I disabled the TEZI image in the machine.conf like below

# IMAGE_CLASSES += "image_type_tezi"
# IMAGE_FSTYPES += "wic.gz teziimg"

I tried the --use-uuid option before without success. That was one of the things @seasoned_geek suggested. I am using the wic.gz image type in Yocto.

# short-description: Create SD card image with a boot partition
# long-description: Creates a partitioned SD card image. Boot files
# are located in the first vfat partition.

part /boot --source bootimg-partition --ondisk mmcblk --fstype=vfat --label boot --active --align 4 --size 16
part / --source rootfs --fstype=ext4 --label root --align 4 --ondisk mmcblk --use-uuid

bootloader  --ptable gpt

I saw the same documentation. I modified the wks file and then cleaned the image (with bitbake -c clean, but I suppose I could have called wic create). But I don’t know what else needs to be modified to make the image use GPT, or if that is even possible. Can you please ask around? I am stumped about what I need to modify in BSP 2.8.8 to make this work.

We are very close to being done with this image for our Colibri T30. We want to move to new Toradex hardware, but we need to have a working Yocto build for both old and new chips.

seasoned_geek · March 21, 2023, 2:31pm

Well, I’ve never tried this and the link isn’t a Toradex site, but this says to include UBOOT commands.

Ah, this may have some links of interest. It seems someone was trying to get Yocto to use GPT 2 years ago. The first comment seemed very informed with respect to the other things you need in that vfat to make it work.

bertin · March 21, 2023, 3:41pm

It just occurred to me that we could use “wic create” instead of modifying update.sh:

wic create "mywksfile" -e angstrom-lxde-image

Then use the following to write the output media (see “Creating Partitioned Images Using Wic” in the Yocto Project documentation):
sudo dd if="nameofthewksoutput.direct" of=/dev/sdb

That should bypass the problems I am having with update.sh altogether.

Edit: Oh, and I think we will have to run “cbootimage” as well to make the output Tegra-friendly.