UBI corruption after reboot

Hi,

I’m currently trying to store the u-boot environment in redundant UBI volumes.
I have made several modifications to enable UBI support and a redundant UBI environment in u-boot.

It seems u-boot is now able to store its data properly in UBI volumes, but there is a side effect.
After an initial update of the board, I can only boot once. I can reboot several times from u-boot, but after the first boot to kernel/rootfs, all the static volumes of the UBI partition seem to be corrupted.

Before a reboot, “ubi info layout” reports the following for updated static volumes:

Volume information dump:
	vol_id          0
	reserved_pebs   81
	alignment       1
	data_pad        0
	vol_type        4
	name_len        8
	usable_leb_size 126976
	used_ebs        34
	**used_bytes      4297112**
	last_eb_bytes   106904
	corrupted       0
	upd_marker      0
	name            kernel

After the first boot back in u-boot, “ubi info layout” reports:

Volume information dump:
	vol_id          0
	reserved_pebs   81
	alignment       1
	data_pad        0
	vol_type        4
	name_len        8
	usable_leb_size 126976
	used_ebs        0
	**used_bytes      0**
	last_eb_bytes   0
	corrupted       0
	upd_marker      0
	name            kernel
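
To compare the two dumps mechanically, here is a minimal Python sketch. It assumes the dump text is captured from the serial console as shown above; the helper names are made up for illustration. It checks that used_bytes equals the full LEBs plus the partial last LEB, and flags a volume whose data has disappeared.

```python
def parse_volume_dump(text):
    """Parse the 'key value' lines of a volume dump into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.replace("*", "").split()  # drop the ** emphasis markers
        if len(parts) == 2:
            key, value = parts
            fields[key] = int(value) if value.isdigit() else value
    return fields


def static_size_consistent(vol):
    """used_bytes should be (used_ebs - 1) full LEBs plus last_eb_bytes."""
    if vol["used_ebs"] == 0:
        return vol["used_bytes"] == 0 and vol["last_eb_bytes"] == 0
    expected = (vol["used_ebs"] - 1) * vol["usable_leb_size"] + vol["last_eb_bytes"]
    return vol["used_bytes"] == expected


def looks_emptied(before, after):
    """True when a volume that held data now reports no data at all."""
    return before["used_bytes"] > 0 and after["used_bytes"] == 0


before = parse_volume_dump("""
usable_leb_size 126976
used_ebs        34
**used_bytes      4297112**
last_eb_bytes   106904
name            kernel
""")
after = parse_volume_dump("""
usable_leb_size 126976
used_ebs        0
**used_bytes      0**
last_eb_bytes   0
name            kernel
""")

print(static_size_consistent(before), looks_emptied(before, after))
```

The first dump is internally consistent (33 full LEBs plus 106904 bytes), so the volume was written correctly before the reboot; the second one reports an empty volume without the corrupted flag being set.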

If I remove the “ubi.fm_autoconvert=1” kernel option, it seems that there is no more corruption.

Is there any known issue between u-boot’s UBI support and fastmap?

Here are the main modifications I have made:

  • I have created static UBI volumes for u-boot environment.

  • I have patched u-boot to enable UBI support and redundant environment (CONFIG_CMD_UBI,
    CONFIG_ENV_IS_IN_UBI, CONFIG_ENV_UBI_PART “ubi”, CONFIG_ENV_UBI_VOLUME “u-boot-env”,
    CONFIG_ENV_UBI_VOLUME_REDUND “u-boot-env-redund”, CONFIG_ENV_SIZE ( (64-2) * 2048))

  • I have also added MTD_UBI_GLUEBI support to kernel.
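
As a quick check of the CONFIG_ENV_SIZE expression, here is a small Python sketch. It assumes that 64 is the number of pages per erase block, 2048 is the page size in bytes, and the 2 subtracted pages hold the UBI erase-counter and volume-ID headers; these interpretations are not stated explicitly in the post.

```python
PAGE_SIZE = 2048        # bytes per NAND page (assumed meaning of 2048)
PAGES_PER_PEB = 64      # pages per physical erase block (assumed meaning of 64)
UBI_HEADER_PAGES = 2    # assumption: one page each for the EC and VID headers

peb_size = PAGES_PER_PEB * PAGE_SIZE
env_size = (PAGES_PER_PEB - UBI_HEADER_PAGES) * PAGE_SIZE

print("PEB size:", peb_size)
print("CONFIG_ENV_SIZE:", env_size)
```

Under these assumptions the environment size is 126976 bytes, which matches the usable_leb_size reported by “ubi info layout”, so the environment fills exactly one LEB.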

Regards

There is one issue we currently still see with UBI/UBIFS in VF50/61:

However, the information in this issue is a bit outdated: further investigation showed that the issue is not fastmap related, but related to extended file system attributes (I will update the ticket soon).

So, this particular issue is not known to Toradex at this point.

However, since 2.7b3 we have updated the Linux kernel used in VF50/VF61 to the latest stable version several times. Comparing the Linux kernel from back then with today’s shows several UBI and fastmap related fixes:

git log Colibri-VF_LXDE-Image_2.7b3-20170630...toradex_vf_4.4

I would recommend testing with a kernel built from the top of branch toradex_vf_4.4 (it should be safe to use the rest of the BSP from the 2.7b3 release).

It seems my problem comes from the modifications I have made to the mtd partitions.

I have removed the u-boot-env partition from the mtd partitions and added u-boot-env and u-boot-env-redund volumes to the UBI partition.

If I keep the u-boot-env partition in mtd (with my u-boot-env volumes in UBI) with the same u-boot/kernel/rootfs, my UBI volumes are no longer corrupted.

Here is what I have done to move the environment from its mtd partition into UBI volumes:

  1. I have updated the define MTDPARTS_DEFAULT in u-boot/include/configs/colibri_vf.h from mtdparts=vf610_nfc:128k(vf-bcb)ro,1408k(u-boot)ro,512k(u-boot-env),-(ubi) to mtdparts=vf610_nfc:128k(vf-bcb)ro,1408k(u-boot)ro,-(ubi)
  2. I have changed the u-boot update script to create the needed UBI volumes with “ubi create u-boot-env ${u-boot-env-size} static && ubi create u-boot-env-redund ${u-boot-env-size} static” with u-boot-env-size 0x1F000
  3. During migration from one mtd layout to the other, I run “nand erase.part ubi” to ensure the partition is clean
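
To see what the MTDPARTS_DEFAULT change in step 1 does to the flash layout, here is a minimal Python sketch that computes the partition offsets for both strings. The parser is a simplified illustration and only handles the “size(name)[ro]” and “-(name)” forms used here; it is not the U-Boot or kernel mtdparts parser.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 * 1024}


def parse_mtdparts(spec):
    """Return (device, [(name, offset, size_or_None)]) for a simple mtdparts string."""
    if spec.startswith("mtdparts="):
        spec = spec[len("mtdparts="):]
    device, _, parts = spec.partition(":")
    layout, offset = [], 0
    for part in parts.split(","):
        m = re.fullmatch(r"(-|\d+)([km]?)\((.+?)\)(ro)?", part)
        if m is None:
            raise ValueError("unsupported partition spec: " + part)
        # '-' means "rest of the device", so its size is unknown here
        size = None if m.group(1) == "-" else int(m.group(1)) * UNITS[m.group(2)]
        layout.append((m.group(3), offset, size))
        if size is not None:
            offset += size
    return device, layout


OLD = "mtdparts=vf610_nfc:128k(vf-bcb)ro,1408k(u-boot)ro,512k(u-boot-env),-(ubi)"
NEW = "mtdparts=vf610_nfc:128k(vf-bcb)ro,1408k(u-boot)ro,-(ubi)"
ENV_VOLUME_SIZE = 0x1F000  # value of u-boot-env-size from step 2

for spec in (OLD, NEW):
    device, layout = parse_mtdparts(spec)
    for name, offset, size in layout:
        print(device, name, hex(offset), "rest" if size is None else hex(size))
print("env volume size in bytes:", ENV_VOLUME_SIZE)
```

With the old string the ubi partition starts at 0x200000; with the new one it starts at 0x180000, so every consumer of the partition table has to use the same string. The volume size 0x1F000 is 126976 bytes, the same as the usable_leb_size in the dumps.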

Are there other things that need to be updated when the mtd partitions change?

Hi

This looks right.

If you use ‘erase.part ubi’ you lose any wear levelling information. If you do that only once then that probably is irrelevant. The better approach is probably to delete the existing volumes and then create your new environment volumes together with the kernel, dtb and rootfs partition.

The following in the existing update script checks and if needed migrates from our older UBI partitioning with only a rootfs to the newer one with kernel, dtb and rootfs partition.
You could check in a similar way if you already have your environment partitions or not and only touch the partitioning if not already migrated.

# Migrate to UBI volume based boot schema
setenv prepare_kernel_fdt 'ubi create kernel 0x800000 static && ubi create dtb 0x20000 static'
setenv prepare_rootfs 'ubi create rootfs 0 dynamic'
setenv prepare_ubi 'ubi part ubi && if ubi check rootfs; then if ubi check kernel; then else ubi remove rootfs && run prepare_kernel_fdt && run prepare_rootfs; fi; else run prepare_kernel_fdt && run prepare_rootfs; fi'
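
The nested if/else in prepare_ubi is easier to follow when written out. The Python sketch below only models the decision logic for a given set of existing volume names; it does not talk to U-Boot, and it ignores the “&&” short-circuiting on command failure. The helper missing_env_volumes is a hypothetical addition showing how a similar check for the environment volumes from this thread could look.

```python
CREATE_BOOT = [
    "ubi create kernel 0x800000 static",
    "ubi create dtb 0x20000 static",
]
CREATE_ROOTFS = ["ubi create rootfs 0 dynamic"]


def prepare_ubi_actions(volumes):
    """Mirror the branches of prepare_ubi for a set of existing volume names."""
    if "rootfs" in volumes:
        if "kernel" in volumes:
            return []  # already migrated, nothing to do
        # old layout with only rootfs: remove it and rebuild everything
        return ["ubi remove rootfs"] + CREATE_BOOT + CREATE_ROOTFS
    # empty UBI partition: create all volumes
    return CREATE_BOOT + CREATE_ROOTFS


def missing_env_volumes(volumes):
    """Hypothetical check: which environment volumes still need to be created."""
    wanted = ["u-boot-env", "u-boot-env-redund"]
    return [name for name in wanted if name not in volumes]


print(prepare_ubi_actions({"rootfs"}))
print(prepare_ubi_actions({"kernel", "dtb", "rootfs"}))
print(missing_env_volumes({"kernel", "dtb", "rootfs"}))
```

A partitioning step would then only need to run when missing_env_volumes returns a non-empty list, in the same spirit as the existing kernel check.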

Max

Hi,

I have set my custom image aside and applied the modifications to enable the UBI u-boot-env on the latest Toradex 2.8b4 image, and I still have a similar issue.
The first boot works fine, but after a reboot the UBI partition seems to be corrupted and u-boot fails to load the kernel.

Please find attached the patch (UBI patch) and the logs of the boot sequence (Boot serial logs).

Do you know what could be wrong in the way I have enabled environment storage in UBI?

I have kept CONFIG_ENV_SIZE bigger than a LEB size, but as u-boot-env volumes have the same size I didn’t expect any issue.

Regards

Hi @ykrons

What I can see from the log file is that it can’t load the devicetree anymore. Where did you put the two u-boot-envs? In front of the devicetree partition maybe? What was the order when creating the UBI volumes?

Regards,
Stefan

Hi,

I started from the default UBI layout, removed the rootfs volume, then created u-boot-env and u-boot-env-redund, and finally recreated the rootfs volume.
So the order of the UBI volumes is as follows:

  • kernel
  • dtb
  • u-boot-env
  • u-boot-env-redund
  • rootfs

Regards

The u-boot-env volumes are stored in the UBI partition, so I don’t expect any corruption between volumes. Even if they are not used properly, an overflow should be reported when a volume is written.
I suspect that the fastmap creation during kernel boot breaks something.

Dear @ykrons

It looks like something goes wrong with fastmap autoconvert. Can you please try to disable that in u-boot after setting up the partitions? You have to change ubiargs as follows:

setenv ubiargs ubi.mtd=ubi root=ubi0:rootfs rw rootfstype=ubifs
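
If the argument string is generated by a script rather than edited by hand, the option can be stripped programmatically. This is a minimal Python sketch; the input string is an assumption about what the default ubiargs looks like (the line above plus the ubi.fm_autoconvert=1 option mentioned in the first post), and drop_kernel_arg is a made-up helper name.

```python
def drop_kernel_arg(args, name):
    """Remove 'name' and 'name=value' tokens from a space-separated argument string."""
    kept = [tok for tok in args.split() if tok.split("=", 1)[0] != name]
    return " ".join(kept)


# Assumed default value, not taken from the BSP sources
default_ubiargs = "ubi.mtd=ubi root=ubi0:rootfs rw rootfstype=ubifs ubi.fm_autoconvert=1"

print(drop_kernel_arg(default_ubiargs, "ubi.fm_autoconvert"))
```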

Ubiattach will be slightly slower if it is disabled; however, I hope this isn’t too bad for your scenario.

Regards,
Stefan

Hello Stefan,

I have not tested with the latest Toradex 2.8b4, but I have seen with my customized image that if I remove ubi.fm_autoconvert=1, the issue disappears (see my initial question). At least I can boot 3-4 times without issue.
For the VF61, it is probably not very critical as the flash size is not so big, but we plan to use the dual UBI env also on an iMX7d 1G with a bigger flash, where the additional boot time may be significant.

However, I wonder if there is still a risk with autoconvert disabled. According to some tests done with the latest mainline u-boot (with some other error messages), the UBI volumes do not seem to be corrupted. Maybe the problem comes from the way u-boot changes the UBI volume (creating some corruption?), and fastmap in the kernel probably detects the error because it scans all PEBs. So isn’t there a risk that, after more accesses to the UBI partition, I reach the corrupted point and my system becomes unbootable again?

Will you continue investigating this problem to fix it and provide reliable UBI support in u-boot?

Regards

Hi @ykrons,

There are currently no plans to investigate this issue further because it is related to fm_autoconvert and we currently don’t have the resources to fix that in the ubifs stack.

On the iMX7d with 1GB we use eMMC instead of NAND flash, and therefore no UBIFS. So this problem won’t affect the modules with eMMC flash.

As far as we know at the moment, there is no risk of corrupting the filesystem if you disable fm_autoconvert. The problem seems to be that Linux enables fastmap and U-Boot can’t handle it.

I hope this helps,
Stefan

Hi,

Ok thanks for the feedback.

I’m currently doing some tests to ensure there is no corruption even when fm_autoconvert is disabled. After a full erase of the UBI partition, I’m writing random data into the flash until the UBI maximum erase counter is increased by 2.
That way, I’m expecting that all the nand blocks were updated at least once, so that the error is detected in case of corruption.
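
A test loop of this kind could look like the following Python sketch. It is an illustration, not the script actually used: the file path and chunk size are hypothetical, and the erase counter is assumed to be read from the UBI sysfs attribute max_ec of device ubi0. The reader and writer are passed in as functions so the loop can be tried without hardware.

```python
import os


def read_max_ec(path="/sys/class/ubi/ubi0/max_ec"):
    """Read the maximum erase counter of UBI device 0 (assumed device number)."""
    with open(path) as f:
        return int(f.read().strip())


def write_random_file(data, path="/tmp/ubi-stress.bin"):
    """Hypothetical writer: overwrite one file on the UBIFS mount and sync it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def stress_until_max_ec(read_ec, write_chunk, increase=2,
                        chunk_size=1024 * 1024, max_rounds=1000000):
    """Write random chunks until the max erase counter has grown by 'increase'."""
    target = read_ec() + increase
    rounds = 0
    while read_ec() < target:
        if rounds >= max_rounds:
            raise RuntimeError("erase counter did not reach the target")
        write_chunk(os.urandom(chunk_size))
        rounds += 1
    return rounds


# On the target this would be called as:
#   stress_until_max_ec(read_max_ec, write_random_file)
```

After the loop finishes, the static volumes can be read back and compared with the images that were written, which is where a corruption would show up.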

So far, I can’t find any corruption, so I think I will go that way without fastmap autoconvert.

Thanks for your support.

You are welcome. Great that it works.