UBIFS corruption after update and reboot

Hello, we have a custom hardware board using the Colibri iMX6ULL 512MB IT SoM.

We use the latest stable Toradex BSP v5.7 with the Toradex Linux kernel 5.4.193 and Toradex U-Boot v2020.07.

I have implemented OTA updates using swupdate and an A/B model.

During an OTA update we update the kernel, device tree and rootfs UBI volumes of copy A or B. But in about 1 case out of 50, on random boards, we get a corrupted UBIFS after the reboot that follows the update.

The kernel image and device tree binary are written by swupdate directly to UBI volumes, replacing the previous kernel image and device tree, and the rootfs is written as a UBIFS volume image.

I reboot immediately after swupdate finishes the OTA update. Could that be the issue?

During the update the board keeps doing its usual work, reading and writing the active rootfs and homefs UBI volumes.

What I have tried already to solve the ubifs corruption:

  • Add sync command before rebooting
  • Remove fastmap support from u-boot and kernel

But the problem persists.

The usual symptom is that I can’t read or access some directories on the updated volumes. In some cases there was even corruption on the homefs UBI volume, which is not touched by the update.

Maybe you have experienced the same behaviour and could help with our case?

Kernel messages on corrupted fs

[  265.291090] Hardware name: Freescale i.MX6 Ultralite (Device Tree)
[  265.297312] [<8010ea38>] (unwind_backtrace) from [<8010bb90>] (show_stack+0x10/0x14)
[  265.305088] [<8010bb90>] (show_stack) from [<8084e390>] (dump_stack+0x90/0xa4)
[  265.312338] [<8084e390>] (dump_stack) from [<8038ea44>] (ubifs_read_node+0x260/0x2a8)
[  265.320199] [<8038ea44>] (ubifs_read_node) from [<803ada50>] (ubifs_tnc_read_node+0x158/0x1ec)
[  265.328839] [<803ada50>] (ubifs_tnc_read_node) from [<803903c0>] (tnc_read_hashed_node+0x98/0x1b8)
[  265.337827] [<803903c0>] (tnc_read_hashed_node) from [<80393ea0>] (ubifs_tnc_next_ent+0x188/0x254)
[  265.346816] [<80393ea0>] (ubifs_tnc_next_ent) from [<803852e0>] (ubifs_readdir+0x164/0x4e8)
[  265.355201] [<803852e0>] (ubifs_readdir) from [<80251754>] (iterate_dir+0x74/0x15c)
[  265.362889] [<80251754>] (iterate_dir) from [<80251ef4>] (ksys_getdents64+0x78/0x138)
[  265.370748] [<80251ef4>] (ksys_getdents64) from [<80101000>] (ret_fast_syscall+0x0/0x54)
[  265.378858] Exception stack(0x95235fa8 to 0x95235ff0)
[  265.383929] 5fa0:                   00000020 016afb08 00000003 016afb28 00008000 00000000
[  265.392130] 5fc0: 00000020 016afb08 016afb0c 000000d9 00000000 016afb28 00000001 00000000
[  265.400326] 5fe0: 000000d9 7e9bb3f4 76c8c313 76c33c26
[  265.405490] UBIFS error (ubi0:5 pid 499): ubifs_readdir: cannot find next direntry, error -22
[  268.371662] UBIFS error (ubi0:5 pid 501): ubifs_read_node: bad node type (255 but expected 2)
[  268.380376] UBIFS error (ubi0:5 pid 501): ubifs_read_node: bad node at LEB 397:39928, LEB mapping status 1
[  268.390122] Not a node, first 24 bytes:
[  268.390138] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff             

Our NAND layout is the following: mtdinfo -a

Count of MTD devices:           5
Present MTD devices:            mtd0, mtd1, mtd2, mtd3, mtd4
Sysfs interface supported:      yes

mtd0
Name:                           mx6ull-bcb
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          4 (524288 bytes, 512.0 KiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       112 bytes
Character device major/minor:   90:0
Bad blocks are allowed:         true
Device is writable:             true

mtd1
Name:                           u-boot1
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          12 (1572864 bytes, 1.5 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       112 bytes
Character device major/minor:   90:2
Bad blocks are allowed:         true
Device is writable:             true

mtd2
Name:                           u-boot2
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          12 (1572864 bytes, 1.5 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       112 bytes
Character device major/minor:   90:4
Bad blocks are allowed:         true
Device is writable:             true

mtd3
Name:                           u-boot-env
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          4 (524288 bytes, 512.0 KiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       112 bytes
Character device major/minor:   90:6
Bad blocks are allowed:         true
Device is writable:             true

mtd4
Name:                           ubi
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          4064 (532676608 bytes, 508.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       112 bytes
Character device major/minor:   90:8
Bad blocks are allowed:         true
Device is writable:             true

root@06883894:~# mtdinfo -u
Count of MTD devices:           5
Present MTD devices:            mtd0, mtd1, mtd2, mtd3, mtd4
Sysfs interface supported:      yes

UBI device layout on mtd4 is the following

root@06883894:~# ubinfo -a
UBI version:                    1
Count of UBI devices:           1
UBI control device major/minor: 10:62
Present UBI devices:            ubi0

ubi0
Volumes count:                           8
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     4055 (514887680 bytes, 491.0 MiB)
Amount of available logical eraseblocks: 2 (253952 bytes, 248.0 KiB)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       9
Count of reserved physical eraseblocks:  71
Current maximum erase counter value:     140
Minimum input/output unit size:          2048 bytes
Character device major/minor:            244:0
Present volumes:                         0, 1, 2, 3, 4, 5, 6, 7

Volume ID:   0 (on ubi0)
Type:        static
Alignment:   1
Size:        100 LEBs (12697600 bytes, 12.1 MiB)
Data bytes:  6037168 bytes (5.7 MiB)
State:       OK
Name:        kernel_copy1
Character device major/minor: 244:1
-----------------------------------
Volume ID:   1 (on ubi0)
Type:        static
Alignment:   1
Size:        100 LEBs (12697600 bytes, 12.1 MiB)
Data bytes:  6037168 bytes (5.7 MiB)
State:       OK
Name:        kernel_copy2
Character device major/minor: 244:2
-----------------------------------
Volume ID:   2 (on ubi0)
Type:        static
Alignment:   1
Size:        2 LEBs (253952 bytes, 248.0 KiB)
Data bytes:  56163 bytes (54.8 KiB)
State:       OK
Name:        dtb_copy1
Character device major/minor: 244:3
-----------------------------------
Volume ID:   3 (on ubi0)
Type:        static
Alignment:   1
Size:        2 LEBs (253952 bytes, 248.0 KiB)
Data bytes:  56163 bytes (54.8 KiB)
State:       OK
Name:        dtb_copy2
Character device major/minor: 244:4
-----------------------------------
Volume ID:   4 (on ubi0)
Type:        static
Alignment:   1
Size:        8 LEBs (1015808 bytes, 992.0 KiB)
Data bytes:  0 bytes
State:       OK
Name:        m4firmware
Character device major/minor: 244:5
-----------------------------------
Volume ID:   5 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        1210 LEBs (153640960 bytes, 146.5 MiB)
State:       OK
Name:        rootfs_copy1
Character device major/minor: 244:6
-----------------------------------
Volume ID:   6 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        1210 LEBs (153640960 bytes, 146.5 MiB)
State:       OK
Name:        rootfs_copy2
Character device major/minor: 244:7
-----------------------------------
Volume ID:   7 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        1346 LEBs (170909696 bytes, 162.9 MiB)
State:       OK
Name:        homefs
Character device major/minor: 244:8

Hello

1 and 2 are fine using the ubiupdatevol command; perhaps we just haven’t met such a failure yet. Do you know whether SWUpdate uses ubiupdatevol or a specialized UBI library, or whether it really writes UBI volumes directly by its own means?

Unless you reboot by cycling power off and on, a normal reboot command should be fine, at least when using the ubiupdatevol command. Booting from USB and using the kernel LED class with the LED trigger set to ‘mtd’ or nand-disk, I see the LED stop blinking at the same moment ubiupdatevol completes and exits. There are no further blinks, even on reboot.

Edward

Quote from swupdate documentation:

When updating volumes, it is guaranteed that erase counters are preserved and not lost. The behavior of updating is identical to that of the ubiupdatevol(1) tool from mtd-utils. In fact, the same library from mtd-utils (libubi) is reused by SWUpdate.

The reboot is a complete Linux reboot process, not a power reset.

Today I managed to get corruption on my development board, and I have a complete serial console log covering the board booting, the update with swupdate (including the swupdate log), the reboot, three failed attempts to mount the new rootfs volume, and the recovery back to the old rootfs UBI volume. I can send you the log privately, because it could contain sensitive information.

The main problem is that I haven’t found a reliable way to reproduce the corruption, as it happens randomly.

Hi @tomasvilda,

Doesn’t your log file include any bad kernel messages regarding UBI? With “quiet” on the kernel command line you should see them easily on the serial console, without needing dmesg. I guess such kernel messages may appear during swupdate as well. No more ideas, sorry.

Edward

There are no ubi errors during swupdate or until reboot on serial console. I get ubi errors on serial console only after update and reboot.

Hello @tomasvilda, how frequently are you running these updates on your system? I ask because if your partitions are going a long time without reads, you could be getting hit by read disturb problems. You won’t notice these problems until you switch between the currently working partition and the new working partition.
You could try to run ubihealthd on your systems and see if this improves the situation:

Best regards,
Rafael Beims

Thanks for the response. When I got the first corruptions, I installed ubihealthd as a systemd service with default parameters. But the result is the same: the ratio of corrupted systems stays about the same even with ubihealthd running.

I have set up automatic updates that push update after update to the same system, and I get corruption 1 or 2 times in 100 updates.

Hi @tomasvilda,

I’m trying to incorporate SWUpdate in our image. A/B kernel, dtb and rootfs. Using “ubiswap” in sw-description to swap all 3 volume pairs simultaneously. Works nicely.

In the past I suspected fastmap in U-Boot for a few VF61 rootfs crashes. They never happened when only booting to Linux, only when upgrading from U-Boot to a new version. I disabled fastmap in U-Boot for VF61 long ago and it seems to have helped. Perhaps in reality it had nothing to do with fastmap, but I think it helped. And I haven’t seen anything like this on iMX7D or iMX6ULL with a much newer U-Boot, no problems at all for years.

And now, testing in a loop SWUpdate + reboot on iMX6ULL, I see from time to time this message from U-Boot 2020.07-5.7.1-

WARNING at drivers/mtd/ubi/fastmap.c:850/ubi_attach_fastmap()!
ubi0 error: ubi_scan_fastmap: Attach by fastmap failed, doing a full scan!

But no crashes so far. It doesn’t matter how quickly the reboot is issued: SWUpdate + reboot boots cleanly a couple of times, and then the message from U-Boot appears.

Are you sure fastmap is disabled in your U-Boot? It shouldn’t show strings like these on boot:

ubi0: attached by fastmap
ubi0: fastmap pool size: 200
ubi0: fastmap WL pool size: 100

An interesting thing: when upgrading the image from U-Boot, the UBI max and mean erase counters both seem to grow with the amount of writes, with max usually leading by about 1.5 times. When using SWUpdate and rewriting UBI volumes from Linux, it looks like only the lower mean number grows, trying to reach the max number, so wear leveling seems to work much better in Linux than in U-Boot. If the U-Boot UBI implementation is limited, then fastmap may also work less efficiently, or even with errors?

I think that either the UBI volume rename on Linux misses some fastmap work, or there’s a bug in U-Boot.

Edward

fastmap is definitely disabled in the U-Boot and kernel configs

uboot:

Booting from NAND...
ubi0: attaching mtd5
ubi0: scanning is finished
ubi0: attached mtd5 (name "ubi", size 508 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 4055, bad PEBs: 9, corrupted PEBs: 0
ubi0: user volume: 8, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 298/227, WL threshold: 4096, image sequence number: 2057030606
ubi0: available PEBs: 2, total reserved PEBs: 4053, PEBs reserved for bad PEB handling: 71

kernel

Booting from NAND...
ubi0: attaching mtd5
ubi0: scanning is finished
ubi0: attached mtd5 (name "ubi", size 508 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 4055, bad PEBs: 9, corrupted PEBs: 0
ubi0: user volume: 8, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 298/227, WL threshold: 4096, image sequence number: 2057030606
ubi0: available PEBs: 2, total reserved PEBs: 4053, PEBs reserved for bad PEB handling: 71
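
The geometry in the attach log can be cross-checked against the mtdinfo and ubinfo output earlier in the thread. The following is only a small arithmetic sketch in Python; every number is copied from the output above and the variable names are illustrative.

```python
# Values copied from the mtdinfo/ubinfo output and the UBI attach log above.
PEB_SIZE = 131072      # physical eraseblock size of the "ubi" MTD partition
DATA_OFFSET = 4096     # "data offset: 4096" from the attach log
GOOD_PEBS = 4055       # "good PEBs: 4055"
BAD_PEBS = 9           # "bad PEBs: 9"
TOTAL_PEBS = 4064      # "Amount of eraseblocks" reported by mtdinfo
AVAILABLE_PEBS = 2     # "available PEBs: 2"
RESERVED_PEBS = 4053   # "total reserved PEBs: 4053"

# Each logical eraseblock is a physical eraseblock minus the UBI headers.
leb_size = PEB_SIZE - DATA_OFFSET
# Capacity UBI reports for ubi0 ("Total amount of logical eraseblocks").
usable_bytes = GOOD_PEBS * leb_size

print("LEB size:", leb_size)
print("usable bytes:", usable_bytes)
print("good + bad == total:", GOOD_PEBS + BAD_PEBS == TOTAL_PEBS)
print("available + reserved == good:", AVAILABLE_PEBS + RESERVED_PEBS == GOOD_PEBS)
```

The results agree with the 126976-byte LEB size and the 514887680-byte total that ubinfo reports, so the logs describe one consistent UBI device.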

Our sw-description:

software =
{
    version = "0.1.0";

    imx6ull = {
        hardware-compatibility: [ "1.0", "2.0", "3.0", "4.0", "5.0"];
        stable = {
            copy1 : {
                images: (
                    {
                        filename = "rootfs.ubifs";
                        volume = "rootfs_copy1";
                        installed-directly = true;
                        type = "ubivol"
                        
                    },
                    {
                        filename = "zImage";
                        volume = "kernel_copy1";
                        installed-directly = true;
                        type = "ubivol"
                    },
                    {
                        filename = "devicetree.dtb";
                        volume = "dtb_copy1";
                        installed-directly = true;
                        type = "ubivol"
                    }
                );
                uboot: (
                    {
                        name = "copy";
                        value = "1";
                    },
                    {
                        name = "upgrade_available";
                        value = "1";
                    }
                );
            };
            copy2 : {
                images: (
                    {
                        filename = "rootfs.ubifs";
                        volume = "rootfs_copy2";
                        installed-directly = true;
                        type = "ubivol"
                    },
                    {
                        filename = "zImage";
                        volume = "kernel_copy2";
                        installed-directly = true;
                        type = "ubivol"
                    },
                    {
                        filename = "devicetree.dtb";
                        volume = "dtb_copy2";
                        installed-directly = true;
                        type = "ubivol"
                    }
                );
                uboot: (
                    {
                        name = "copy";
                        value = "2";
                    },
                    {
                        name = "upgrade_available";
                        value = "1";
                    }
                );
            };
        };
    }
}

Thanks.
Did you try with other modules? I’m trying with other Colibri instance and no U-Boot warnings so far.
Didn’t you do any clock changes in DT that could affect GPMI operation?

Could you try the same with your custom services, that are writing to UBI, stopped? This could narrow search, is error simultaneous swupdate and other OS writes related, or just swupdate.

According to your sw-description, it seems you are altering U-Boot environment to choose which volumes to boot from. This involves libubootenv. I also was thinking about copy-1 + copy-2, until I found ubiswap, which eliminated any work around U-Boot environment. Perhaps libubootenv has something to do with the issue? Here’s how you can eliminate U-Boot from equation:

software =
{
	version = "0.1.0";

	colibri-imx6ull = {
		hardware-compatibility: [ "8" ];

		images: (
			{
				filename = "imx6ull.ubifs";
				volume = "rootfs2";
				type = "ubivol";
			}
			,
			{
				filename = "imx6ull-default.dtb";
				volume = "dtb2";
				type = "ubivol";
			}
			,
			{
				filename = "zImage";
				volume = "kernel2";
				type = "ubivol";
			}
		);
		scripts: (
			{
				type = "ubiswap";
				properties: {
					swap-0 = [ "dtb2" , "dtb" ];
					swap-1 = [ "kernel2" , "kernel" ];
					swap-2 = [ "rootfs2" , "rootfs" ];
				},
			},
		);
	};
}

All swap-n volume pairs are swapped simultaneously by SWUpdate after all volumes have been written successfully, and this has nothing to do with the U-Boot environment. Some kind of block may be required to prevent updating again without a reboot. Though, if the ubi image is put first in the *.swu, and only then the kernel and dtb, SWUpdate would stop and not damage anything, since after the rename the new shadow rootfs is still mounted by the current boot.
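
To make the "all pairs at once" idea concrete, here is a toy Python model of the name swap. It is only an illustration of the effect described above, not how SWUpdate implements ubiswap; the volume IDs and the `swap_names` helper are hypothetical.

```python
def swap_names(volumes, pairs):
    """Return a new id -> name table with each (a, b) name pair exchanged."""
    rename = {}
    for a, b in pairs:
        rename[a] = b
        rename[b] = a
    # The whole new table is built first, so every pair changes in one step
    # and no intermediate state with duplicate names is ever visible.
    return {vol_id: rename.get(name, name) for vol_id, name in volumes.items()}


# Hypothetical volume table using the names from the sw-description above.
volumes = {0: "kernel", 1: "kernel2", 2: "dtb", 3: "dtb2",
           5: "rootfs", 6: "rootfs2", 7: "homefs"}
pairs = [("dtb2", "dtb"), ("kernel2", "kernel"), ("rootfs2", "rootfs")]

after = swap_names(volumes, pairs)
for vol_id in sorted(after):
    print(vol_id, volumes[vol_id], "->", after[vol_id])
```

In this model the freshly written volumes take over the names "kernel", "dtb" and "rootfs", so a boot script that always loads by those names picks up the new copies without any environment change.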

Edward

BTW, does your /etc/fw_env.config agree with your MTD parts? IIRC some SWUpdate recipe was trying to overwrite it. The Colibri default fw_env.config is this:

# MTD device name       Device offset   Env. size       Flash sector size      Number of sectors
# Colibri iMX6ULL
/dev/mtd3               0x00000000      0x00020000      0x20000                4
Colibri iMX6ULL # mtdparts

device nand0 <gpmi-nand>, # parts = 5
 #: name                size            offset          mask_flags
 0: mx6ull-bcb          0x00080000      0x00000000      0
 1: u-boot1             0x00180000      0x00080000      1
 2: u-boot2             0x00180000      0x00200000      1
 3: u-boot-env          0x00080000      0x00380000      0
 4: ubi                 0x1fc00000      0x00400000      0

We have a fleet of over 150 modules running, and I encounter the corruption while updating them. During tests all other services are running and writing logs to the active partitions.

Ubiswap is OK, but then how do you handle recovery and fallback to the previous update state?

The device tree is almost identical to the evaluation board’s, with some pins reconfigured for different functions.

I will try to do tests with minimal writes.

My fw_env.config is configured with redundancy

# MTD device name	Device offset	Env. size	Flash sector size	Number of sectors
# Colibri iMX6ULL
/dev/mtd3		0x00000000	0x00020000	0x20000			4
/dev/mtd3		0x00020000	0x00020000	0x20000			4
Colibri iMX6ULL # mtdparts 

device nand0 <gpmi-nand>, # parts = 5
 #: name                size            offset          mask_flags
 0: mx6ull-bcb          0x00080000      0x00000000      0
 1: u-boot1             0x00180000      0x00080000      1
 2: u-boot2             0x00180000      0x00200000      1
 3: u-boot-env          0x00080000      0x00380000      0
 4: ubi                 0x1fc00000      0x00400000      0

I’m just starting with SWUpdate. Fallback to the previous version is possible from Linux. U-Boot from Toradex doesn’t seem to have commands to rename UBI volumes. Even if it had, special commands would be required anyway for an atomic multi-volume swap. And on Linux it is clearly possible to write a small application to swap those volumes. Recovery? Do you mean recovery from an upgrade failure? SWUpdate won’t swap until all shadow copies are complete.

I see problems in your fw_env.config. In the MTD parts, u-boot-env is 0x80000 wide, as in the default BSP, i.e. 4 erase sectors of 0x20000. /dev/mtd3 starts at 0x380000 and ubi at 0x380000 + 0x80000 = 0x400000.
First line of your fw_env.config:
start at /dev/mtd3 (0x380000) + offset (0x000) = 0x380000
end at 0x380000 + 4 * 0x20000 = 0x400000, and you already reach the start of the ubi partition!
Yes, you specify an Env. size of only 0x20000, but you still specify that it spans 4 erase sectors! The number of sectors should be 1 to allow the next fw_env.config line to start at offset 0x20000. And the 2nd line should have a number of sectors no greater than 3, so that it does not cross the start of the ubi partition.
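
The arithmetic above can be repeated for both fw_env.config lines with a short Python sketch. It assumes, as the analysis above does, that each entry may touch "sector size × number of sectors" bytes from its offset; the helper names are illustrative, and the addresses come from the mtdparts output.

```python
PART_START = 0x380000   # u-boot-env offset from mtdparts
UBI_START = 0x400000    # ubi offset from mtdparts


def span(dev_offset, sector_size, sectors):
    """Absolute NAND range an fw_env.config entry may touch (assumed model)."""
    start = PART_START + dev_offset
    return start, start + sector_size * sectors


def check(entries):
    """Return (start, end, crosses_into_ubi) for every entry."""
    results = []
    for dev_offset, _env_size, sector_size, sectors in entries:
        start, end = span(dev_offset, sector_size, sectors)
        results.append((start, end, end > UBI_START))
    return results


# (device offset, env size, flash sector size, number of sectors)
original = [(0x00000, 0x20000, 0x20000, 4), (0x20000, 0x20000, 0x20000, 4)]
fixed = [(0x00000, 0x20000, 0x20000, 1), (0x20000, 0x20000, 0x20000, 1)]

for name, entries in (("original", original), ("fixed", fixed)):
    for start, end, crosses in check(entries):
        print(f"{name}: {start:#x}..{end:#x} crosses into ubi: {crosses}")
```

With the original file the second entry ends at 0x420000, one erase sector past the start of the ubi partition; with one sector per entry both copies stay inside u-boot-env.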

Did you make any U-Boot mods to support this custom fw_env.config? I think U-Boot should be modified accordingly to support that 2nd environment. I don’t know how. I spent some minutes figuring out whether U-Boot 2020.07 supports a redundant environment. My impression is that it doesn’t.

Edward

Oh, thanks! That could corrupt my UBI FS in rare cases, as I am seeing.

I will fix that and do batch update/reboot tests again.

This is how redundant env is enabled in uboot

diff --git a/configs/colibri-imx6ull_defconfig b/configs/colibri-imx6ull_defconfig
index c7ac275daf..43b43712ba 100644
--- a/configs/colibri-imx6ull_defconfig
+++ b/configs/colibri-imx6ull_defconfig
@@ -3,6 +3,10 @@ CONFIG_ARCH_MX6=y
 CONFIG_SYS_TEXT_BASE=0x87800000
 CONFIG_ENV_SIZE=0x20000
 CONFIG_ENV_OFFSET=0x380000
+CONFIG_ENV_OFFSET_REDUND=0x3A0000
+CONFIG_SYS_REDUNDAND_ENVIRONMENT=y
+CONFIG_BOOTCOUNT_BOOTLIMIT=3
+CONFIG_BOOTCOUNT_ENV=y
 CONFIG_TARGET_COLIBRI_IMX6ULL=y
 CONFIG_DM_GPIO=y
 CONFIG_TARGET_COLIBRI_IMX6ULL_NAND=y

Fix will look like this

diff --git a/recipes-bsp/u-boot/u-boot-toradex/colibri-imx6ull/fw_env.config b/recipes-bsp/u-boot/u-boot-toradex/colibri-imx6ull/fw_env.config
index a1401b6cf69e8828ecf2f9e77a08829ab5a6a069..8bde6055763eea60eff4244d060bd44ecd80be00 100644
--- a/recipes-bsp/u-boot/u-boot-toradex/colibri-imx6ull/fw_env.config
+++ b/recipes-bsp/u-boot/u-boot-toradex/colibri-imx6ull/fw_env.config
@@ -7,5 +7,5 @@
 
 # MTD device name	Device offset	Env. size	Flash sector size	Number of sectors
 # Colibri iMX6ULL
-/dev/mtd3		0x00000000	0x00020000	0x20000			4
-/dev/mtd3		0x00020000	0x00020000	0x20000			4
\ No newline at end of file
+/dev/mtd3		0x00000000	0x00020000	0x20000			1
+/dev/mtd3		0x00020000	0x00020000	0x20000			1
\ No newline at end of file

Hm, I saw config items with REDUND, but failed to find them in the U-Boot code. I see now, thanks.
Did you modify CONFIG_ENV_RANGE as well? Since there’s a redundant environment, I think the range should be halved. And if the OFFSETs are one erase sector apart, ENV_RANGE should be 0x20000. I’d move the redundant env. to 0x3C0000.

I wonder, did you check whether this redundancy really works? Erasing one part of the u-boot-env partition, reset, saveenv, then the other part, reset, should confirm whether redundancy is working or not. The U-Boot vars shouldn’t change.

Regards,
Edward

I never came across the RANGE setting, but that makes sense. And I never checked whether the redundant env actually works. I will run actual tests.

With the current buggy configuration I will also run an update and check the first sectors of the ubi partition before and after. They should be all empty (FFs) after I do an update in Linux using swupdate in the current configuration.

And if that UBI PEB is mapped to a LEB that is actually used by the file system, I get corruption.

Thanks for spotting my mistake! I will report the results.

Hi @tomasvilda,

A few comments after some testing.

CONFIG_ENV_RANGE in U-Boot is documented as optional; when not specified it equals CONFIG_ENV_SIZE, which is 0x20000 by default in Toradex U-Boot for colibri_imx6ull. In the default Toradex configuration, however, CONFIG_ENV_RANGE is set to the u-boot-env partition size. CONFIG_ENV_RANGE is how much space is erased, starting from the current main or redundant environment. So clearly you should reduce ENV_RANGE so that an erase started from the redundant Env doesn’t cross the top of the u-boot-env partition.

I don’t know how it is actually done in Linux; I hope the Linux MTD driver doesn’t allow crossing the MTD device size. I wonder how to check it. But since you are using the U-Boot bootcounter in the Env, I think U-Boot was destroying your UBI when it was updating the bootcounter. It does so on start. With the wrong CONFIG_ENV_RANGE, the bottom of the UBI partition isn’t overwritten when saving to the primary Env location, but when saving to the redundant location the UBI boundary is clearly crossed.
An additional issue that you might never have noticed: because ENV_OFFSET + ENV_RANGE overlaps ENV_OFFSET_REDUND, U-Boot was destroying the redundant Env whenever it wrote the primary Env. A write to the redundant Env did not affect the primary Env.
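
The two overlaps described above can be worked out with a small Python sketch. It assumes that each save erases CONFIG_ENV_RANGE bytes starting at the environment offset, and that the unmodified range equals the u-boot-env partition size (0x80000), as stated in this post; the function names are illustrative.

```python
ENV_OFFSET = 0x380000          # CONFIG_ENV_OFFSET from the defconfig diff
ENV_OFFSET_REDUND = 0x3A0000   # CONFIG_ENV_OFFSET_REDUND from the defconfig diff
ENV_SIZE = 0x20000             # CONFIG_ENV_SIZE
UBI_START = 0x400000           # start of the ubi partition (mtdparts)
UBI_END = UBI_START + 0x1FC00000


def overlap(a_start, a_end, b_start, b_end):
    """Number of bytes shared by two half-open address ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def report(env_range):
    """Bytes of other regions erased by each save for a given ENV_RANGE."""
    primary = (ENV_OFFSET, ENV_OFFSET + env_range)
    redundant = (ENV_OFFSET_REDUND, ENV_OFFSET_REDUND + env_range)
    return {
        "primary_hits_redundant": overlap(*primary, ENV_OFFSET_REDUND, ENV_OFFSET_REDUND + ENV_SIZE),
        "redundant_hits_primary": overlap(*redundant, ENV_OFFSET, ENV_OFFSET + ENV_SIZE),
        "primary_hits_ubi": overlap(*primary, UBI_START, UBI_END),
        "redundant_hits_ubi": overlap(*redundant, UBI_START, UBI_END),
    }


# 0x80000 is the assumed unmodified range, 0x20000 the suggested reduced one.
for env_range in (0x80000, 0x20000):
    print(f"ENV_RANGE={env_range:#x}")
    for key, value in report(env_range).items():
        print(f"  {key}: {value:#x}")
```

Under these assumptions, a save to the redundant copy with the 0x80000 range erases the first 0x20000 bytes of the ubi partition, which is exactly one 128 KiB physical eraseblock, while a range of 0x20000 removes every overlap.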

I was assuming that a redundant environment means saving two equal copies every time. In fact, every saveenv in U-Boot writes only one copy and alternates the destination each time. So, to be confident about having two equal copies, you need to saveenv in U-Boot twice. U-Boot reports which instance is overwritten each time.

What about the Linux side and the U-Boot fw_env tools? It is the same. But in contrast to U-Boot, they don’t allow saving two identical Env copies. You need to run fw_setenv with some new setting, perhaps toggling some extra unused variable, otherwise Linux won’t save anything to the alternate Env location. You can check it as below using hexdump and grep <your_variable_name>. The first and 2nd copies are easily distinguishable by the hexdump address.

~# hexdump -C /dev/mtd3 | grep boo
00000040  74 77 72 5f 6d 79 00 62  6f 6f 3d 31 37 00 62 6f  |twr_my.boo=17.bo|
000000c0  62 6f 6f 74 5f 65 78 74  6c 69 6e 75 78 3d 73 79  |boot_extlinux=sy|
...
00040040  74 77 72 5f 6d 79 00 62  6f 6f 3d 31 38 00 62 6f  |twr_my.boo=18.bo|
000400c0  62 6f 6f 74 5f 65 78 74  6c 69 6e 75 78 3d 73 79  |boot_extlinux=sy|
...
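
As a complement to hexdump, both copies could be decoded and compared in Python. The sketch below works on synthetic data and assumes the usual redundant-environment layout of a 4-byte little-endian CRC32, one flag byte, then NUL-separated `name=value` strings; verify that layout against your U-Boot build before relying on it. The helper names and the sample variables are illustrative.

```python
import struct
import zlib

ENV_SIZE = 0x20000   # CONFIG_ENV_SIZE from the defconfig in this thread
HEADER_LEN = 5       # assumed layout: 4-byte CRC32 + 1-byte flag, then data


def build_copy(variables, flag, size=ENV_SIZE):
    """Build a synthetic environment copy for the demonstration below."""
    text = b"".join(f"{k}={v}".encode() + b"\0" for k, v in variables.items())
    data = (text + b"\0").ljust(size - HEADER_LEN, b"\0")
    return struct.pack("<IB", zlib.crc32(data), flag) + data


def parse_copy(blob):
    """Return (crc_ok, flag, variables) for one environment copy."""
    crc, flag = struct.unpack_from("<IB", blob)
    data = blob[HEADER_LEN:]
    variables = {}
    for entry in data.split(b"\0"):
        if not entry:          # an empty string terminates the variable list
            break
        key, _, value = entry.partition(b"=")
        variables[key.decode()] = value.decode()
    return zlib.crc32(data) == crc, flag, variables


first = build_copy({"boo": "17", "upgrade_available": "1"}, flag=1)
second = build_copy({"boo": "18", "upgrade_available": "1"}, flag=2)
# Flip one byte in the data area to imitate a partly destroyed copy.
damaged = first[:100] + bytes([first[100] ^ 0xFF]) + first[101:]

for name, blob in (("first", first), ("second", second), ("damaged", damaged)):
    ok, flag, variables = parse_copy(blob)
    print(name, "crc ok:", ok, "flag:", flag, "boo:", variables.get("boo"))
```

On a real board the two blobs would be read from /dev/mtd3 at the two configured offsets instead of being built in memory. How U-Boot uses the flag byte to pick the active copy is not modelled here.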

As I understand it, by U-Boot Env atomicity SWUpdate means a simultaneous write of all used U-Boot variables. But does it handle the case where power is cut during that single write? I doubt it, unless they toggle something extra, otherwise a 2nd identical Env copy won’t be stored. No big deal though, just update again.

Lack of knowledge of the details is why I tried to avoid changing the U-Boot environment from Linux. Now that things are clear, it’s quite safe to use it, especially with the redundant Env enabled.

Back to your former question about recovery and fallback to the previous update state.

I think you meant bootcounter + upgrade_available here. Yet another useful U-Boot feature; I wonder why it is not enabled in Toradex U-Boot.
The question is, if for example an update file passed all checks and reached the end user with a kernel for a different machine, how to make U-Boot execute the right altbootcmd command after the bootcounter elapses. With ubiswap things change a bit, because power may be cut after the UBI volumes are swapped but before the upgrade_available variable is set. This could be solved by setting upgrade_available=1 just before the update process. All Linux versions should still clear it as a result of a successful boot, so there are no problems with this. And altbootcmd is straightforward: just load the kernel and dtb from the alternative volumes, and specify the alternative volume on the kernel command line. Well, both the ubiswap and copy-N methods are equally robust, provided U-Boot is compiled with the right settings.
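
The bootcounter fallback discussed here can be pictured with a simplified Python model. This is a sketch of the decision logic only, under the assumption that the counter is evaluated just while upgrade_available is set and that altbootcmd runs once the count exceeds the limit; the boot limit of 3 comes from CONFIG_BOOTCOUNT_BOOTLIMIT in the defconfig diff earlier in the thread, and `choose_boot_command` is a hypothetical name.

```python
def choose_boot_command(env, bootlimit=3):
    """Return (command_name, updated_env) for one boot in a simplified model."""
    env = dict(env)
    if env.get("upgrade_available") != "1":
        return "bootcmd", env              # counter is ignored outside an update
    count = int(env.get("bootcount", "0")) + 1
    env["bootcount"] = str(count)          # this write is what touches the Env on every boot
    if count > bootlimit:
        return "altbootcmd", env           # fall back to the previous copy
    return "bootcmd", env


# An update was installed, but the new system never boots far enough
# to clear upgrade_available again.
env = {"upgrade_available": "1", "bootcount": "0"}
history = []
for _ in range(4):
    command, env = choose_boot_command(env)
    history.append(command)

print(history)
```

In this model the new copy gets three attempts before the fallback runs, which matches the "try three times, then recover" behaviour reported earlier in the thread. A successfully booted Linux system is expected to reset upgrade_available so that the counter stops being written.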

Regards,
Edward

Thanks for such an in-depth analysis. I will do some tests on my current configuration to see if and when I cross into the UBI partition.

Hello @tomasvilda,
do you have any news on this topic? Are you still experiencing FS corruption?

I’m doing final tests and will report my experience tomorrow. But everything looks very good now!