Recovery after FS failure

Hi,

Here in Mectronic we are working on a complete Yocto system and we want to go deeper into a problem we encountered in the past.

On our old architecture, based on the iMX6DL, a kernel panic sometimes occurred because the FS was corrupted.
This happened because the machine was switched on and off with a physical switch, so it never performed a controlled shutdown.
The quick solution we found, since the FS was very small, was to detect the kernel panic and rewrite the FS from U-Boot.
All sensitive data was stored on an external SD card, so rewriting the entire FS was not a problem.

We would like to ask whether there is a better way to implement this “recovery option” on your system as well.
A more elegant solution would be preferable, since the FS we are developing is bigger than the old one.

Thanks,

Merlin

Dear @Merlin

Do you have more information about these crashes? Which BSP version did you use on these modules? A power-off shouldn’t destroy the kernel. Maybe this was a bug in an older BSP version?

What you can do to reduce the risks is:

  • Mount the root partition read only
  • Use an initramfs as root filesystem
  • Try to write as little as possible to flash

Logging as little as possible helps a lot in particular. For example, you can log to a ramdisk instead of eMMC and then back the logs up once an hour. Of course, this depends heavily on your use case.

Also try to find the root cause if you see strange behaviour like the one you described. It may be hiding a serious problem.

Regards,
Stefan

Hi @stefan_e.tx, thanks for answering.

Our old architecture was not a Toradex product. We are testing the Apalis iMX8QM 4GB V1.0B because we need high performance on our top-class machine and our old system isn’t powerful enough.
By the way, it was an iMX6DL running Yocto 1.7.3 and the 3.10 Linux kernel. Quite old :D.

I’m not very experienced in this field, so forgive me if I say something wrong. I will just try to describe what we noticed and what we tried to do.

I don’t think the problem was in the kernel. It simply ran into a kernel panic because part of the file system was corrupted, so the startup procedure failed.
Every time it happened I checked which “block” was corrupted: it was always part of the file system, and the kernel was OK.

So I implemented a procedure in the kernel that detected this kind of kernel panic and rebooted the machine. U-Boot then started a procedure that rewrote the file system.

Obviously I’m not looking here for a solution to another product’s problem. I’m just investigating whether this problem could also be present in our new system (based on the Toradex iMX8QM) and how to avoid it.
We wrote very little to the flash: something like a configuration file (less than 1 KB) updated once a minute.

I imagined that someone might have worked on a better recovery solution than mine (since mine is not really a recovery option), but maybe, as you suggested, an initramfs is the better solution.

I will try to dig deeper. Is there perhaps a guide in the Toradex documentation? :D

Thanks for your support,

Merlin

Hi @Merlin

Unfortunately, we don’t document exactly your use case. For initramfs we have some documentation available here:

Our focus at the moment is on Torizon where we support fail safe updates. However, it also does a lot more:

It is still under development and not yet ready for production, but we plan an initial release for April.

We also have partners that provide commercial solutions for e.g. failsafe updates.
Mender.io: Mender.io - Toradex Service Partner
Foundries.io: Foundries.io - Toradex Service Partner
Balena.io: Balena - Toradex Service Partner

Regards,
Stefan

Hi @stefan_e.tx,
I’ll put my focus on initramfs and I will come back to support if I find something difficult.

Thanks a lot for your help.

Merlin

You are welcome.

Best regards,
Jaski

Hi @stefan_e.tx and Hi @jaski.tx

I’m a colleague of Merlin and we have a problem with initramfs. Our image is based on your console-tdx-image, to which we have added recipes for cranksoftware and gstreamer. The result is a FS of about 200 MB, so startup performance is extremely poor. We lose about 2 to 3 seconds in U-Boot and another 5 to 6 seconds in the kernel, which makes the boot process extremely slow.

Do you have any suggestion for this use case?

Thanks

Enrico

Hi @Enrico

You can try to use another compression algorithm. I think by default it uses gzip? Maybe something like lz4 could help:
https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

However, the time in U-Boot won’t change because it’s the loading time of the ramdisk… Alternatively you could just use a normal rootfs instead of the initramfs and make that read only… Then you don’t lose time for loading and unzipping. So the boot up should be faster.

Regards,
Stefan

Hi @stefan_e.tx,

Using a read-only file system is the simplest solution, but are you sure that we will not get kernel panics due to FS failure in normal use? I’m asking because in the past we had to ship some machines back from abroad, at a cost to us and inconvenience to our customers, even though they had only been used normally (switched on and off).

Thanks

Enrico

Hi @Enrico

If you use eMMC in read-only mode you should be on the safe side. With raw NAND there are sometimes issues when it is not properly implemented. However, an initramfs and a read-only partition should behave exactly the same in this respect… If you had corruption in the past on a read-only filesystem, you might also have corruption when loading the initramfs.

So switching from an initramfs to a read only filesystem is safe.

Regards,
Stefan

Hi @stefan_e.tx,

Since this is the first time I have created a read-only rootfs, I would like to ask you whether everything I’ve done is correct:

  • I’ve added EXTRA_IMAGE_FEATURES += "read-only-rootfs" to the apalis-imx8.conf file
  • I’ve edited fstab and changed the mount options of /dev/root, adding ro

Everything seems to be correct, since if I try touch test or mkdir test in the root directory I get an error saying that it is a read-only file system. Is there anything else I have to do?

Thanks

Enrico

Hi @Enrico

This is perfect! You can also check if the partition is mounted read only by doing:

mount

You should then see something like:

/dev/mmcblk0p2 on / type ext4 (ro,...)
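If you want to run the same check from a script (for example in a startup test), something like the following could work. It is only a sketch: the function name root_mount_mode is made up for this example, and it assumes the standard /proc/mounts layout (device, mount point, type, options).

```shell
#!/bin/sh
# Print "ro" or "rw" for the filesystem mounted on "/".
# The optional first argument is a mounts table in /proc/mounts format
# (default: /proc/mounts); it is there mainly to make the function testable.
root_mount_mode() {
    mounts_file="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 4 the comma-separated options.
    # If "/" is listed more than once (e.g. "rootfs" first, then the real
    # device), the last entry wins.
    awk '$2 == "/" { opts = $4 }
         END {
             n = split(opts, o, ",")
             # Compare whole options so "errors=remount-ro" is not
             # mistaken for "ro".
             for (i = 1; i <= n; i++)
                 if (o[i] == "ro" || o[i] == "rw") { print o[i]; exit }
         }' "$mounts_file"
}
```

A script could then use `[ "$(root_mount_mode)" = "ro" ]` and report an error if the root partition is unexpectedly writable.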

Regards,
Stefan