We have a device running a Yocto-based Linux system with the U-Boot bootloader.
The bootloader code predates 2019 and lacks some fixes for ECC errors, which can be caused by, e.g., bit-flips. When such errors occur, the UBI partition fails to mount and the Linux system is not loaded.
To fix this, I built a new U-Boot image and successfully flashed it, following the very good Toradex instructions from the article “updating nand based modules from userspace”.
Since the devices are in the field and in different countries, I wanted to make the update as low-risk as possible, so I was happy that there are redundant U-Boot partitions.
I tested what happens if the update of a partition fails, and the result wasn’t quite what I expected.
Flashing a broken bootloader (only the first 2000 bytes are valid data and the rest is 0x00) to u-boot1 (/dev/mtd1) led to the device not booting anymore. u-boot2 (/dev/mtd2) still contained the old bootloader, which I verified before making any changes using
So in the end I have some open questions:
- Are there limitations that can cause the redundant u-boot partition to be ignored?
- How is the selection of u-boot partition done during boot?
With best regards,
What fixes do you refer to? If you previously had no or only a few correctable bit-flips and the system was running well, but now it fails due to numerous uncorrectable bit-flips, then the cause could be a flash page being overwritten without an erase, flash cells self-discharging, flash cells being disturbed by writes, reads or erases of other cells, or similar issues.
U-Boot has nothing to do with it. It just reads the kernel and DTB from UBI and boots Linux, nothing more.
Do you have the ubihealthd service enabled on your Colibri? It should be, because it helps rewrite blocks with many correctable bit-flips before the number of bit-flips becomes uncorrectable.
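The idea behind this kind of scrubbing can be illustrated with a small model: a block that still reads back correctly but shows many correctable bit-flips gets rewritten early, while a block beyond the ECC strength is already lost. This is only a sketch, not ubihealthd's actual code; ECC_STRENGTH, SCRUB_THRESHOLD and classify() are hypothetical, and the real limits depend on the NAND chip and the ECC configuration.

```python
from dataclasses import dataclass

ECC_STRENGTH = 8     # hypothetical: max bit-flips the ECC can correct per chunk
SCRUB_THRESHOLD = 6  # hypothetical: rewrite the block once this many are seen


@dataclass
class ReadResult:
    bitflips: int  # bit-flips observed in the worst ECC chunk of the block


def classify(result: ReadResult) -> str:
    """Decide what a scrubbing policy would do with a block after reading it."""
    if result.bitflips > ECC_STRENGTH:
        return "uncorrectable"  # data is lost; scrubbing came too late
    if result.bitflips >= SCRUB_THRESHOLD:
        return "scrub"          # still readable: copy the data to a fresh block now
    return "ok"                 # few bit-flips: leave the block alone


for flips in (0, 5, 6, 8, 9):
    print(flips, classify(ReadResult(flips)))
```

The point of reading all blocks periodically is to catch blocks in the "scrub" range before they drift into the "uncorrectable" range.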
Flashing a broken bootloader doesn’t make sense as a test unless the image is HAB-signed (signed over the intact data before you overwrote it, not re-signed with your 0’s) and HAB is enabled. Instead, try erasing one of the bootloader copies, or overwrite part of it in a way that causes an uncorrectable ECC error. When HAB is disabled, the boot ROM only looks at the ECC status: if all bootloader data specified in the DCD is readable, the bootloader is considered OK. With HAB enabled, once the read succeeds, the boot ROM also verifies the signature. If either check (ECC read or HAB signature check) fails, the boot ROM tries to load the next copy specified in the FCB. On the Colibri there are perhaps 2 or 3 FCB copies in the mx7-bcb partition, each pointing to two bootloader copies.
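The selection logic described above can be modelled in a few lines, which also shows why the zero-filled image did not trigger a fallback. This is a simplified sketch, not the i.MX7 boot ROM's actual implementation. It assumes the image was written through the normal NAND driver, so the zeros carry valid ECC, and it makes two labelled assumptions that the description above does not confirm: unreadable FCB copies are skipped, and only the first readable FCB is used. The names BootloaderCopy, Fcb and select_bootloader are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class BootloaderCopy:
    name: str
    ecc_ok: bool                # every page of the image read without an uncorrectable ECC error
    signature_ok: bool = False  # only consulted when HAB is enabled


@dataclass
class Fcb:
    ecc_ok: bool                                   # this FCB copy itself is readable
    copies: Tuple[BootloaderCopy, BootloaderCopy]  # primary and secondary bootloader


def select_bootloader(fcbs: Sequence[Fcb], hab_enabled: bool) -> Optional[str]:
    """Return the name of the bootloader copy the ROM would jump to, or None."""
    for fcb in fcbs:
        if not fcb.ecc_ok:
            continue  # assumption: an unreadable FCB copy is skipped
        for copy in fcb.copies:
            if not copy.ecc_ok:
                continue  # uncorrectable ECC error: try the next copy
            if hab_enabled and not copy.signature_ok:
                continue  # HAB signature check failed: try the next copy
            # Accepted. Nothing here checks whether the content is actually sane.
            return copy.name
        return None  # assumption: only the first readable FCB is used
    return None


# The test from the question: zeros written normally still carry valid ECC.
zeroed = Fcb(True, (BootloaderCopy("u-boot1", ecc_ok=True),
                    BootloaderCopy("u-boot2", ecc_ok=True)))
# The suggested test: the first copy fails ECC (e.g. partially overwritten).
ecc_broken = Fcb(True, (BootloaderCopy("u-boot1", ecc_ok=False),
                        BootloaderCopy("u-boot2", ecc_ok=True)))

print(select_bootloader([zeroed], hab_enabled=False))      # u-boot1, so no fallback
print(select_bootloader([ecc_broken], hab_enabled=False))  # u-boot2
```

Without HAB, the only thing that makes the ROM move on to the redundant copy is a failed ECC read, which is why an erased or ECC-corrupted copy is the meaningful test.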
Thank you so much for all the great information. The second part answered my questions well.
We do not use HAB, so that method of detection is not available to us. My main concern is any kind of power issue during flashing that could interrupt the process. But such interruptions would most likely result in ECC errors and therefore be detected.
Regarding your question about fixes, these are the U-Boot commits I was referring to:
ubihealthd is currently not installed, but it will be added to the system. While investigating the issue I had already seen it mentioned, and it seems to be required to ensure stable operation.
Thanks for the patches, interesting.
I have seen patches for UBI that implement decent filesystem checks and recovery. I fear that, since eMMC is more popular, those won’t see the light of day.
Regarding HAB, there is no need to fear a power cut during flash writes or fuse burning, unless someone writes all fuse patterns, including SRK and HAB enable, in a single block write to /sys/…/nvmem. Since the HAB enable fuse sits at a lower address than the SRK fuses, a power cut may leave HAB enabled with a bad SRK, which is a problem for that particular i.MX. JTAG could perhaps help with recovery. So program the HAB enable fuse only after fusing the SRK.
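The ordering argument can be checked by simulating a power cut after every individual fuse write. This sketch assumes fuses are burned one word at a time in the order given, and it uses a hypothetical layout of one HAB enable word followed by 8 SRK hash words; only the relative order taken from the description above matters, not real fuse names or addresses.

```python
# Hypothetical fuse layout: HAB enable at a lower address than the SRK hash words.
HAB_ENABLE = "hab_enable"
SRK_WORDS = [f"srk{i}" for i in range(8)]  # assumption: SRK hash spans 8 fuse words


def unsafe_states(write_sequence):
    """Return every k such that a power cut after k writes leaves HAB enabled
    while the SRK hash is still incomplete."""
    bad = []
    for k in range(len(write_sequence) + 1):
        written = set(write_sequence[:k])
        hab_on = HAB_ENABLE in written
        srk_complete = all(word in written for word in SRK_WORDS)
        if hab_on and not srk_complete:
            bad.append(k)
    return bad


# One block write in ascending address order: HAB enable is burned first.
one_block = [HAB_ENABLE] + SRK_WORDS
# Recommended order: fuse the SRK hash, then enable HAB as a separate step.
srk_first = SRK_WORDS + [HAB_ENABLE]

print(unsafe_states(one_block))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(unsafe_states(srk_first))  # []
```

With the SRK fused first, a power cut at any point leaves either HAB disabled or a complete SRK, never the unrecoverable combination.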
uuu recovery and imx_usb recovery (imx_usb is used by the Toradex Tezi i.MX7 recovery scripts) are possible with HAB enabled. The annoying thing is that the bootloader has to be signed differently for each of the three boot variants: native, uuu and imx_usb bootloaders need slightly different signing. How to do this is known, but it is not clearly stated in the documentation. The only real fear is losing the signing keys, because then you can’t sign new images for HAB-enabled i.MXes. It is possible to (re-)sign any bootloader, whether already signed or not, such as the one that comes with Tezi.