UBIFS power cycling improvements

Does anyone know what changes were made to improve power cycling in the distro for UBIFS? We’re a bit stuck at 2.5 due to a lot of customization with source (non-Yocto builds) and have occasionally been fighting this error: [Error reading superblock on volume ‘ubi0:rootfs’ errno=-22!]. I know this was fixed in 2.7 / 2.8 and would like to incorporate only those improvements.
Thanks,
Paul

I guess you already looked at the following and the proposed read-only workaround, right?

Hello, yes. But would this keep a read/write partition safe? Our concern is that we have to configure devices at installation time or while they are running during their service life. Losing the configuration is bad as well. I remember being told there was a correction in 2.7/2.8 that solved this problem.
Thanks,
Paul

Unfortunately, I don’t think there is one single change that would improve the situation significantly.

Hello.
We have a large number of VF50 modules in the field using Linux BSP 2.5 and have occasionally come across issues identical to the one above [Error reading superblock on volume ‘ubi0:rootfs’ errno=-22!] after power cycling.

My question is: has this problem been fixed in the latest BSP, or is the fix to “Use a Read-Only Root File system” as mentioned in the link below?
https://developer.toradex.com/software/linux/linux-software/release-details?view=all&issue=22228

Are you having occasional power cuts?

We are using the VF50 in an embedded appliance. Most customers turn off the appliance with the power switch rather than shutting it down. For an embedded appliance, I don’t think this is unreasonable.

It depends on what your application is doing. If it is writing to the filesystem, then data can be corrupted.

Most customers turn off the appliance with the power switch rather than shutting it down.

If this is important, then you should add a solution on your carrier board that detects a power cut or power switch-off and shuts the module down safely.

Best regards,
Jaski

The only data being written to the file system is the system log, which is written periodically.
I thought the whole idea of a flash file system was that it could handle power-offs?

I believe we fixed it by merging code from a later version of U-Boot directly. I am no longer at the company, but I asked someone who should know.
PS, the file system is advertised as power cycle safe…

It was a combination of changes to ubifs/super.c in U-Boot, changing to a very small read/write partition, and making the remainder of the filesystem read-only.

We verified the super.c changes using a hardware method of toggling the relay on boot up, so we were able to verify that the issue would have happened, but the kernel recovered correctly. Adding the partition changes basically eliminated the occurrences altogether.

Toradex do not seem to know whether this problem has been fixed or not. If not, I think it is a pretty fundamental fault. Suggesting a hardware fix or a read-only file system is not a fix.

We purchase a large number of VF50 modules and we are currently left questioning the quality of the supplied board support packages.

Hi @ashinton: I have forwarded this issue to our UBIFS expert (@stefan.tx); he should come back to you within this week.
We have run tests on UBIFS with BSP 2.8b6 and did not see any issues regarding power cuts.

Best regards,
Jaski

Unfortunately, raw NAND and UBI have shown quite a few issues over time, with varying degrees of impact and severity. Often it is not easy to tell which side is at fault (UBI/UBIFS, U-Boot or Linux), so it is hard to pinpoint which particular change fixed a certain problem.

Which exact version of BSP 2.5 (particularly U-Boot) are you using?

From what I remember, U-Boot 2015.04 was never really completely stable. Going through the discussions in our support system shows that Error reading superblock on volume 'ubi0:rootfs' errno=-22 was always tied to U-Boot 2015.04. This commit in upstream U-Boot, made in October 2015, suggests that fastmap was not stable before that commit. We did make efforts to backport fixes, but it seems we did not catch all the issues in fastmap.

Since most issues seem to be fastmap related, it might help to just disable CONFIG_MTD_UBI_FASTMAP in U-Boot 2015.04 to work around the issue. But I’d rather suggest moving to the newer U-Boot 2016.11 (see #21227), which has proven to be very stable for Colibri VF50/VF61. Since the move to the newer U-Boot most likely fixed this issue, I will extend that roadmap entry with that information.
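As a rough aid for the fastmap workaround, here is a minimal sketch of a check that tells you whether a U-Boot board config header still enables the option. It assumes the option is set with a plain `#define` in a header file, which was common in U-Boot trees of that era; the function name `fastmap_enabled` is hypothetical, and the check does not follow `#undef` or `#ifdef` logic.

```python
import re

def fastmap_enabled(header_text):
    """Return True if CONFIG_MTD_UBI_FASTMAP is #define'd in the header text.

    Assumption: the option is enabled with a plain #define line.
    A later #undef or conditional compilation is not evaluated.
    """
    # Drop /* ... */ and // comments so a commented-out define is not counted
    text = re.sub(r"/\*.*?\*/", "", header_text, flags=re.S)
    text = re.sub(r"//[^\n]*", "", text)
    # \b keeps longer names such as CONFIG_MTD_UBI_FASTMAP_AUTOCONVERT from matching
    pattern = r"^\s*#\s*define\s+CONFIG_MTD_UBI_FASTMAP\b"
    return re.search(pattern, text, flags=re.M) is not None
```

You would read your board's config header into a string and pass it in; which header that is depends on your tree.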

I had spoken to a few consultants and a couple of design houses - this is a U-Boot source problem, where journaling gets hit during a second boot that is powered down. Ask for the source, which should have the changes - https://www.setra.com/hubfs/SS-FLEX-OSS%20Rev%20A.pdf

Test to run: cut the power during the stage where the journaling fix-up occurs. On our systems this was a window in the first few minutes as it boots and Linux starts up. Our testing sometimes took hundreds of cycles to see this, while at customer sites it would happen at a much higher rate, just by luck. One or two power cycles is not a test. If you want to be thorough, vary the timing so that the OS is up and running but the housekeeping has yet to finish. When it happens, it will not get to the hand-off to the OS; you’ll get the -22 error on the console port.
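To spot the failure automatically over hundreds of cycles, the console capture can be scanned for the error text. This is only a sketch: the pattern is built from the message quoted in this thread, and the function names are hypothetical.

```python
import re

# Pattern based on the message quoted in this thread. The volume name is
# matched loosely because the quotes around it may be straight or
# typographic depending on how the console output was captured.
SUPERBLOCK_ERROR = re.compile(
    r"Error reading superblock on volume\s+\S+\s+errno=(-?\d+)"
)

def find_superblock_errors(console_log):
    """Return the errno of every superblock read error found in the capture."""
    return [int(m.group(1)) for m in SUPERBLOCK_ERROR.finditer(console_log)]

def boot_failed(console_log):
    """True if the capture contains the errno=-22 superblock error."""
    return -22 in find_superblock_errors(console_log)
```

Feed it whatever text your serial logger collected for one boot attempt, and count the cycles where `boot_failed` returns True.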

Indeed, some of the issues we tracked down were very hard to trigger. The best approach is to set up an automated test which cuts power at random times during boot. We put the V2.6 stable release through long-term testing, with 300000 and more power cut tests…
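A minimal sketch of such a random power-cut loop is shown here. It is not the test setup used by anyone in this thread: `set_power` and `boot_ok` are hypothetical callbacks that your own rig would provide (for example a relay driver and a console check), and the window and off-time values are assumptions to tune for your hardware.

```python
import random
import time

def run_power_cut_cycles(cycles, window_s, set_power, boot_ok,
                         rng=None, sleep=time.sleep, off_time_s=2.0):
    """Cut power at a random moment within `window_s` seconds of power-on,
    then boot again and record whether the device recovered.

    set_power(on) and boot_ok() are hypothetical callbacks supplied by the
    test rig. boot_ok() should block until the boot succeeds or fails.
    Returns the indices of the cycles in which boot_ok() returned False.
    """
    rng = rng or random.Random()
    failures = []
    for i in range(cycles):
        set_power(True)
        sleep(rng.uniform(0.0, window_s))  # random cut point inside the window
        set_power(False)
        sleep(off_time_s)                  # assumed time for the supply to discharge
        set_power(True)                    # verification boot after the cut
        ok = boot_ok()
        set_power(False)
        sleep(off_time_s)
        if not ok:
            failures.append(i)
    return failures
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which helps when you want to replay the cut timings that triggered a failure.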

Indeed, raw NAND and UBI issues are sometimes very hard to catch. We usually do automated testing with power cuts at random times during boot. We tested the last V2.6 release, with xattr support disabled, with some 300k and more random power cuts without seeing any more issues.