Occasionally Linux hangs on wait for root device

bertin · March 21, 2023, 3:58pm

Thanks. Those are interesting, and I think I will have to change the “update.sh” script provided by Toradex. Below is the part of the script that I will probably need to edit to generate the GPT partitions:

else
        if [ "${MODTYPE}" = "apalis-t30" ] || [ "${MODTYPE}" = "apalis-tk1" ] || [ "${MODTYPE}" = "apalis-tk1-mainline" ] || [ "${MODTYPE}" = "colibri-t30" ] || [ "${MODTYPE}" = "co$
                # Boot partition [in sectors of 512]
                BOOT_START=$(expr 4096 \* 2)
                # Rootfs partition [in sectors of 512]
                ROOTFS_START=$(expr 20480 \* 2)
                # Boot partition volume id
                BOOTDD_VOLUME_ID="boot"

                echo ""
                echo "Creating MBR file and do the partitioning"
                # Initialize a sparse file
                dd if=/dev/zero of=${BINARIES}/mbr.bin bs=512 count=0 seek=${EMMC_SIZE}
                ${PARTED} -s ${BINARIES}/mbr.bin mklabel msdos
                ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary fat32 ${BOOT_START} $(expr ${ROOTFS_START} - 1 )
                # the partition spans to the end of the disk, even though the fs size will be smaller
                # on the target the fs is then grown to the full size
                if [ "${IMAGEFILE}" = "root.ext3" ] ; then
                        ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary ext3 ${ROOTFS_START} $(expr ${EMMC_SIZE} \- ${ROOTFS_START} \- 1)
                else
                        ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary ext4 ${ROOTFS_START} $(expr ${EMMC_SIZE} \- ${ROOTFS_START} \- 1)

Then I might need to change include/configs/colibri_t30.h in the U-Boot source to modify the commands the bootloader uses to perform the update:

#define BOARD_EXTRA_ENV_SETTINGS \
        "boot_file=zImage\0" \
        "console=ttyS0\0" \
        "defargs=core_edp_mv=1300 usb_high_speed=1 user_debug=30\0" \
        "dfu_alt_info=" DFU_ALT_EMMC_INFO "\0" \
        EMMC_BOOTCMD \
        "fdt_board=eval-v3\0" \
        "fdt_fixup=;\0" \
        NFS_BOOTCMD \
        SD_BOOTCMD \
        "setethupdate=if env exists ethaddr; then; else setenv ethaddr " \
                "00:14:2d:00:00:00; fi; usb start && tftpboot ${loadaddr} " \
                "flash_eth.img && source ${loadaddr}\0" \
        "setsdupdate=setenv interface mmc; setenv drive 1; mmc rescan; " \
                "load ${interface} ${drive}:1 ${loadaddr} flash_blk.img && " \
                "source ${loadaddr}\0" \
        "setup=setenv setupargs asix_mac=${ethaddr} " \
                "consoleblank=0 no_console_suspend=1 console=tty1 " \
                "console=${console},${baudrate}n8 debug_uartport=lsport,0 " \
                "vmalloc=128M mem=1012M@2048M fbmem=12M@3060M\0" \
        "setupdate=run setsdupdate || run setusbupdate || run setethupdate\0" \
        "setusbupdate=usb start && setenv interface usb; setenv drive 0; " \
                "load ${interface} ${drive}:1 ${loadaddr} flash_blk.img && " \
                "source ${loadaddr}\0" \
        USB_BOOTCMD \
        "vidargs=video=tegrafb0:640x480-16@60\0"

Oh man, this is painful. I should re-version our BSP just to keep track of how much of the Toradex BSP we have modified. lol

bertin · March 24, 2023, 1:39pm

OK, I changed the update.sh script in our BSP to use GPT. Below is where I changed the mklabel command to use gpt instead of msdos:

                dd if=/dev/zero of=${BINARIES}/mbr.bin bs=512 count=0 seek=${EMMC_SIZE}
                ${PARTED} -s ${BINARIES}/mbr.bin mklabel gpt
                ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary fat32 ${BOOT_START} $(expr ${ROOTFS_START} - 1 )

The problem is that I then get the following errors:

reading colibri_t30/zImage
8763336 bytes read in 425 ms (19.7 MiB/s)
GUID Partition Table Header signature is wrong: 0x0 != 0x5452415020494645
part_get_info_efi: *** ERROR: Invalid GPT ***
GUID Partition Table Header signature is wrong: 0x0 != 0x5452415020494645
part_get_info_efi: *** ERROR: Invalid Backup GPT ***
** Invalid partition 1 **
reading colibri_t30/tegra30-colibri-eval-v3.dtb
73293 bytes read in 28 ms (2.5 MiB/s)
GUID Partition Table Header signature is wrong: 0x0 != 0x5452415020494645
part_get_info_efi: *** ERROR: Invalid GPT ***
GUID Partition Table Header signature is wrong: 0x0 != 0x5452415020494645
part_get_info_efi: *** ERROR: Invalid Backup GPT ***
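
For reference, the value U-Boot expects, 0x5452415020494645, is the 8-byte GPT header signature "EFI PART" read as a little-endian 64-bit number, and 0x0 means it found only zeros at LBA 1 (the sector right after the MBR). A minimal shell sketch for checking an image or device before flashing, assuming 512-byte sectors; the function name has_gpt_signature is hypothetical and not part of update.sh:

```sh
# has_gpt_signature IMAGE [SECTOR_SIZE]
# Succeeds (returns 0) if the primary GPT header signature "EFI PART" is
# present at LBA 1 of IMAGE, which may be a disk image file or a block device.
has_gpt_signature() {
    img=$1
    sector=${2:-512}
    # Read the first 8 bytes of LBA 1. NUL bytes are dropped so that an
    # all-zero header simply compares as an empty string.
    sig=$(dd if="$img" bs=1 skip="$sector" count=8 2>/dev/null | tr -d '\000')
    [ "$sig" = "EFI PART" ]
}
```

For example, has_gpt_signature ${BINARIES}/mbr.bin fails on a file that has been cut down to 512 bytes, because there is no LBA 1 left to read. On the target, the same check could be pointed at the eMMC device node (for example /dev/mmcblk0, name assumed).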

Can I change the following section of update.sh to fix that problem, and if so, how?

                echo ""
                echo "Creating VFAT partition image with the kernel"
                rm -f ${BINARIES}/boot.vfat
                #${MKFSVFAT} -n "${BOOTDD_VOLUME_ID}" -S 512 -C ${BINARIES}/boot.vfat $BOOT_BLOCKS
                dd if=/dev/zero of=${BINARIES}/boot.vfat bs=1024 count=0 seek=${BOOT_BLOCKS_BYTES} 
                ${PARTED} -s ${BINARIES}/boot.vfat mklabel gpt
                ${PARTED} -a none -s ${BINARIES}/boot.vfat unit b mkpart primary ext4 
                export MTOOLS_SKIP_CHECK=1
                #mcopy -i ${BINARIES}/boot.vfat -s ${BINARIES}/${KERNEL_IMAGETYPE} ::/${KERNEL_IMAGETYPE}
                dd if=${BINARIES}/${KERNEL_IMAGETYPE} of=${BINARIES}/boot.vfat 

                # Copy device tree file
                COPIED=false
                if test -n "${KERNEL_DEVICETREE}"; then
                        for DTS_FILE in ${KERNEL_DEVICETREE}; do
                                DTS_BASE_NAME=`basename ${DTS_FILE} .dtb`
                                if [ -e "${BINARIES}/${KERNEL_IMAGETYPE}-${DTS_BASE_NAME}.dtb" ]; then
                                        kernel_bin="`readlink ${BINARIES}/${KERNEL_IMAGETYPE}`"
                                        kernel_bin_for_dtb="`readlink ${BINARIES}/${KERNEL_IMAGETYPE}-${DTS_BASE_NAME}.dtb | sed "s,$DTS_BASE_NAME,${MODTYPE},g;s,\.dtb$,.bin,g"`"
                                        if [ "$kernel_bin" = "$kernel_bin_for_dtb" ]; then
                                                mcopy -i ${BINARIES}/boot.vfat -s ${BINARIES}/${DEPLOY_DIR_IMAGE}/${KERNEL_IMAGETYPE}-${DTS_BASE_NAME}.dtb ::/${DTS_BASE_NAME}.dtb
                                                #copy also to out_dir
                                                sudo cp ${BINARIES}/${DEPLOY_DIR_IMAGE}/${KERNEL_IMAGETYPE}-${DTS_BASE_NAME}.dtb "$OUT_DIR/${DTS_BASE_NAME}.dtb"
                                                COPIED=true
                                        fi
                                fi
                        done

@marcel.tx Hi Marcel, do you know who wrote the script?

I would just like to change to GPT if possible. Or is that not possible?

Edit: I should say that I have already started changing update.sh in the excerpt above; you can see that in the commented-out lines. I call PARTED instead of MKFSVFAT.

seasoned_geek · March 24, 2023, 1:54pm

I’m of no use at this point, but I do hope Toradex implements this as a build option, since UUIDs are the reality going forward.

bertin · March 24, 2023, 2:07pm

Thanks for chiming in here. I appreciate your contributions to this thread.

I hope so too, and I hope they can help out here. It is clear that specifying partitions by UUID is the way forward. As long as the SoC supports it, I am not sure why it would not be possible.
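
On the kernel side, selecting the root filesystem by UUID usually means passing root=PARTUUID=... on the command line instead of a fixed /dev/mmcblkXpY name. A fixed name can point at the wrong device if MMC enumeration order changes, and with rootwait the kernel then waits forever, which is one possible way to end up hanging on the root device. Note that the kernel also accepts an MBR form of PARTUUID (SSSSSSSS-PP, the 32-bit disk signature plus the partition number), so this does not strictly require GPT. A small sketch under those assumptions; root_args is a hypothetical helper, not something from the Toradex BSP:

```sh
# root_args PARTUUID
# Prints kernel arguments that select the root filesystem by partition UUID.
# Accepts either a GPT partition GUID (8-4-4-4-12 hex digits) or the MBR form
# SSSSSSSS-PP (32-bit disk signature, dash, 1-based partition number in hex).
root_args() {
    uuid=$1
    if printf '%s\n' "$uuid" |
        grep -Eqi '^([0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}|[0-9a-f]{8}-[0-9a-f]{2})$'; then
        # rootwait makes the kernel keep waiting for an asynchronously
        # detected device such as MMC instead of failing the root mount.
        printf 'root=PARTUUID=%s rootwait\n' "$uuid"
    else
        echo "root_args: not a partition UUID: $uuid" >&2
        return 1
    fi
}
```

On a running target, the value could be read with blkid -s PARTUUID -o value /dev/mmcblk0p2 (partition name assumed) and the resulting string added to the bootargs that U-Boot passes to the kernel.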

marcel.tx · March 24, 2023, 5:32pm

If I am not mistaken, should you still be using NVIDIA’s ancient downstream Linux kernel, it may have some GPT-specific patches which only work with their proprietary partitioning.

Our later BSPs for any of our later SoMs do, of course, support UUID based handling. But NVIDIA’s T30 is rather obsolete at this point, at least when used with their ancient downstream Linux kernel.

bertin · March 27, 2023, 7:23am

Hi Marcel, OK, that sounds great. Since we are using kernel 6.1.8 (I modified the BSP to use 6.1.8), we can use GPT as well?

How do we create the image? Should I just use the WKS file or what? How can I package the device tree and pass the address to the kernel? Are there any examples you can point to?
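
Using wic is one option. As a rough sketch, assuming the Yocto wic tool and its bootimg-partition and rootfs source plugins are available in this BSP, the layout from update.sh (BOOT_START = 4096 * 2 = 8192 sectors = 4 MiB, ROOTFS_START = 20480 * 2 = 40960 sectors = 20 MiB, so a 16 MiB boot partition) could be described with a GPT like this. The file contents and the write_gpt_wks helper are illustrative, not taken from Toradex:

```sh
# write_gpt_wks OUTFILE
# Writes a hypothetical wic kickstart (.wks) with the same layout as update.sh:
# boot partition at 4 MiB, 16 MiB long, root filesystem directly after it.
write_gpt_wks() {
    cat > "$1" <<'EOF'
# FAT boot partition, filled from IMAGE_BOOT_FILES (e.g. zImage and the .dtb)
part /boot --source bootimg-partition --ondisk mmcblk0 --fstype=vfat --label boot --active --align 4096 --size 16
# ext4 root filesystem; --use-uuid makes wic assign a random partition UUID
part / --source rootfs --ondisk mmcblk0 --fstype=ext4 --label root --align 4096 --use-uuid
# Ask for a GPT instead of the default msdos (MBR) partition table
bootloader --ptable gpt
EOF
}
```

The device tree would then be handled by the boot partition source: bootimg-partition copies the files listed in IMAGE_BOOT_FILES, e.g. IMAGE_BOOT_FILES = "zImage tegra30-colibri-eval-v3.dtb", and the .wks file is selected with WKS_FILE together with IMAGE_FSTYPES += "wic". Keep in mind that wic produces a whole-disk image, so it would replace the mbr.bin/boot.vfat/rootfs pieces that the flash_blk.img update path writes rather than plug into it.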

bertin · March 28, 2023, 7:13am

@marcel.tx @rafael.tx

Can we just change the wks file and use wic to create this image and finish this up?

I don’t understand what is going on in certain parts of the update.sh script, specifically in the excerpt below. I can find no documentation about what this syntax actually does. Is it really necessary if we can use the wks file to create the boot partition for the 6.1.8 kernel?

rm -f ${BINARIES}/boot.vfat
${MKFSVFAT} -n "${BOOTDD_VOLUME_ID}" -S 512 -C ${BINARIES}/boot.vfat $BOOT_BLOCKS
mcopy -i ${BINARIES}/boot.vfat -s ${BINARIES}/${KERNEL_IMAGETYPE} ::/${KERNEL_IMAGETYPE}

seasoned_geek · March 29, 2023, 2:36pm

It’s only “obsolete” when customers stop paying for it.

seasoned_geek · March 29, 2023, 2:39pm

I don’t have anything to help right now, just wanted to check back and make sure you weren’t dangling from a ceiling fan in your office.

Something I wrote a while back seems to be most relevant now.

bertin · March 29, 2023, 3:24pm

Lol, thanks for the laugh.

Not yet, but another year with this old BSP and I might. To say that we have our backs against the wall to get our T30-based systems onto Linux doesn’t do the situation justice.

I don’t know about you, but we were using WinCE before…

seasoned_geek · March 29, 2023, 3:53pm

Never touched WinCE. CAT was the only company with management shortsighted enough to use that abomination as far as I knew. Last I heard they were deleting huge chunks of WinCE itself during the build just to get enough room for the app.

Agile is how you end up with disasters like that because Agile is not Software Engineering.

Can Captain Obvious ask a stupid question at this point?

Have you reached out to NVIDIA itself? @marcel.tx mentioned that this is an ancient downstream kernel. If you could actually reach the engineering support people at NVIDIA, they might have something “newer” that supports UUID and simply hasn’t bubbled out to the rest of the world. This is common with heritage hardware that is still selling. The “retail channels”, for lack of a better phrase, only care about “the new stuff.”

You might have to find a Usenet group or a mailing list for NVIDIA. There probably won’t be any direct communication options with the group you need on the website, though there might be a direct link to one of those. The real problem I have with every vendor on the Internet is that they never put dates on pages like these. Unless you are up on Linux kernel version numbers, there is no way to know whether a page is from 1986 or 2023.

OMG! This has a 2022 date on it!

I apologize in advance if I’m sending you down an irrelevant rabbit hole with that link.

rafael.tx · March 29, 2023, 5:17pm

I’m sorry @bertin but I don’t have an answer to you at the moment. I’ll try to look into the changes you’re trying to do and see if I can find some way to help you.

rafael.tx · March 29, 2023, 6:23pm

@bertin, I found one possible reason why your image creation with a GPT partition table is failing. As you can see in the update.sh script, right after the partitions are created, the file is truncated to 512 bytes:

                echo ""
                echo "Creating MBR file and do the partitioning"
                # Initialize a sparse file
                dd if=/dev/zero of=${BINARIES}/mbr.bin bs=512 count=0 seek=${EMMC_SIZE}
                ${PARTED} -s ${BINARIES}/mbr.bin mklabel msdos
                ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary fat32 ${BOOT_START} $(expr ${ROOTFS_START} - 1 )
                # the partition spans to the end of the disk, even though the fs size will be smaller
                # on the target the fs is then grown to the full size
                if [ "${IMAGEFILE}" = "root.ext3" ] ; then
                        ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary ext3 ${ROOTFS_START} $(expr ${EMMC_SIZE} \- ${ROOTFS_START} \- 1)
                else
                        ${PARTED} -a none -s ${BINARIES}/mbr.bin unit s mkpart primary ext4 ${ROOTFS_START} $(expr ${EMMC_SIZE} \- ${ROOTFS_START} \- 1)
                fi
                ${PARTED} -s ${BINARIES}/mbr.bin unit s print
                # get the size of the VFAT partition
                BOOT_BLOCKS=$(LC_ALL=C ${PARTED} -s ${BINARIES}/mbr.bin unit b print \
                        | awk '/ 1 / { print int(substr($4, 1, length($4 -1)) / 1024) }')
                # now crop the file to only the MBR size
                IMG_SIZE=512
                truncate -s $IMG_SIZE ${BINARIES}/mbr.bin

This is done because the MBR occupies only one block. However, with 512-byte sectors and the default 128 partition entries, a GPT occupies 34 blocks at the beginning of the disk (protective MBR, GPT header and 32 blocks of partition entries) and 33 more at the end for the backup GPT (partition entries plus the backup header).

So by letting the script truncate the file to 512 bytes, you keep only the protective MBR and discard the GPT header and partition entries that were just created, which matches U-Boot finding zeros where it expects the GPT signature. I would suggest that you at least remove the truncation from your script. Because this changes the layout that is written to the disk, you should also take a look at what the update script is doing in U-Boot, just to make sure it’s not truncating something there as well, or otherwise overlapping parts of the GPT with other pieces of information.
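
One way to keep the GPT instead of cropping it away is to cut both copies out of the full-size sparse mbr.bin. A sketch, assuming 512-byte sectors and parted’s default of 128 partition entries; split_gpt and the output file names are hypothetical:

```sh
SECTOR=512
# 128 partition entries of 128 bytes = 16 KiB = 32 sectors.
# Primary copy: protective MBR (LBA 0) + GPT header (LBA 1) + 32 entry sectors
GPT_PRIMARY_SECTORS=34
# Backup copy: 32 entry sectors + backup GPT header in the very last LBA
GPT_BACKUP_SECTORS=33

# split_gpt IMAGE TOTAL_SECTORS PRIMARY_OUT BACKUP_OUT
# Extracts both GPT copies from the full-size image that parted wrote,
# instead of truncating it to the 512-byte MBR.
split_gpt() {
    img=$1
    total=$2
    dd if="$img" of="$3" bs=$SECTOR count=$GPT_PRIMARY_SECTORS 2>/dev/null || return 1
    dd if="$img" of="$4" bs=$SECTOR skip=$((total - GPT_BACKUP_SECTORS)) \
        count=$GPT_BACKUP_SECTORS 2>/dev/null || return 1
}
```

In update.sh this would take the place of the IMG_SIZE=512 / truncate lines, e.g. split_gpt ${BINARIES}/mbr.bin ${EMMC_SIZE} ${BINARIES}/gpt-primary.bin ${BINARIES}/gpt-backup.bin. The U-Boot update script would then have to write gpt-primary.bin at sector 0 and gpt-backup.bin at sector EMMC_SIZE - 33, which only lines up if EMMC_SIZE matches the real eMMC user area size in sectors.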

As for the :: syntax you referred to before, this comes from mtools (mcopy), and the documentation calls it the : drive (hence the second :; imagine that the first one would normally be a drive letter like a or c in Windows speak). This : drive refers to the image file given with the -i option, so ::/zImage means the file zImage in the root directory of boot.vfat. The -s option only makes mcopy copy recursively.
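
The three lines from update.sh can be read as the commented sketch below, which assumes mkfs.vfat (dosfstools) and mtools are installed; make_boot_vfat is a hypothetical wrapper. As far as I can tell from the script, boot.vfat is a bare FAT filesystem image with no partition table of its own (the partition table lives only in mbr.bin), so running mklabel/mkpart on boot.vfat should not be needed. Also, dd without conv=notrunc replaces the whole output file, so dd’ing the kernel into boot.vfat leaves a raw zImage rather than a FAT image that U-Boot can load files from.

```sh
# make_boot_vfat IMAGE KIB LABEL FILE...
# Same steps as update.sh: create an empty FAT image file, then copy the
# given files into its root directory through the mtools ":" drive.
make_boot_vfat() {
    img=$1
    kib=$2
    label=$3
    shift 3
    # mkfs.vfat -C refuses to overwrite an existing file
    rm -f "$img"
    # -C: create IMAGE as a new file of KIB 1 KiB blocks
    # -n: volume label, -S 512: 512-byte logical sectors
    mkfs.vfat -n "$label" -S 512 -C "$img" "$kib" >/dev/null || return 1
    # Skip mtools' sanity checks on the image, as update.sh does
    export MTOOLS_SKIP_CHECK=1
    for f in "$@"; do
        # -i selects the image; "::/name" is drive ":" (that image), path /name
        mcopy -i "$img" -s "$f" "::/$(basename "$f")" || return 1
    done
}
```

With the variables from update.sh, this would correspond to make_boot_vfat ${BINARIES}/boot.vfat $BOOT_BLOCKS "${BOOTDD_VOLUME_ID}" ${BINARIES}/${KERNEL_IMAGETYPE}.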

henrique.tx · April 20, 2023, 11:40am

Hi @bertin!

Were you able to solve your issue? Did @rafael.tx’s answer help you?

Best regards,

bertin · May 4, 2023, 9:59am

Sorry, I haven’t had a chance to try it. This place is chaotic right now, and I have been working with WinCE again. I should get a chance in a few weeks, I think.

seasoned_geek · May 4, 2023, 10:12am

Nice to know you aren’t dangling from a ceiling fan in the office . . . though WinCE can make that seem a good option for anyone. What a tragic abomination!

bertin · May 4, 2023, 11:36am

It really is close to that, man. I had to drop our Linux development and go back to WinCE, and the last traces of information on WinCE are being deleted as we speak.

Not to mention that some of the APIs we are using for communications on WinCE… I am not sure if they ever were or will be stable. At this point it feels like a page out of a Kafka novel.