File corruption after power loss on Docker volume

raccoman · July 21, 2025, 1:57pm

Hello,

I’m experiencing an issue with saving files inside a Docker container on a defined volume.

Scenario:

Inside a Docker container, we write files to a bind-mounted volume pointing to a host directory (/home/root/configs).

After writing a file using either vim or a C application (which explicitly calls fsync() on the file descriptor), if the device is power-cut abruptly, the file appears truncated or corrupted upon reboot.

The same operation performed outside the container on the host does not result in corruption.

Technical Details:

The volume is defined in docker-compose.yml as:

```yaml
driver_opts:
  type: none
  device: /home/root/configs
  o: bind,rw,sync,commit=1
```

The file is written with: open(), write(), fsync(), and close() in C.
However, even with this, the corruption still occurs if the device loses power before a clean shutdown.
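For reference, the open(), write(), fsync(), close() sequence described above might look like the minimal sketch below. The helper name write_and_fsync is hypothetical and not taken from the actual application. It checks every return value, because a short write() or an ignored fsync() error can look exactly like a truncated file after reboot.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and return only after
 * fsync() has reported success. Returns 0 on success, -1 on error. */
int write_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                     /* write() may transfer fewer bytes */
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {            /* flush file data and inode to the device */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A call such as write_and_fsync("/home/root/configs/app.conf", data, size) would use it; the file name app.conf is only an illustration.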

From inside the container, the mount shows:

/dev/disk/by-label/otaroot on /root/logs type ext4 (rw,relatime)

→ indicating that mount options such as sync or commit=1 are not in effect.
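The same check can be done programmatically instead of reading the mount output by eye. The sketch below uses the glibc mntent API; the helper name mount_has_option is hypothetical, and it assumes the mount table is read from /proc/self/mounts inside the container.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up mount point `dir` in the mount table file
 * `table` (normally "/proc/self/mounts") and report whether option `opt`
 * is active. Returns 1 if present, 0 if absent, -1 if `dir` is not found
 * or the table cannot be opened. */
int mount_has_option(const char *table, const char *dir, const char *opt)
{
    FILE *fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    int result = -1;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        /* Later entries shadow earlier ones, so the last match wins. */
        if (strcmp(m->mnt_dir, dir) == 0)
            result = hasmntopt(m, opt) != NULL;
    }
    endmntent(fp);
    return result;
}
```

For the mount line shown above, mount_has_option("/proc/self/mounts", "/root/logs", "sync") would return 0, matching the observation that sync is not in effect.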

Questions:

Are there any specific kernel or filesystem settings in TorizonCore that could interfere with data persistence even after calling fsync()?
Is there an official best practice from Toradex for ensuring data persistence inside containers under sudden power loss scenarios?

Thank you in advance for your support.

Best regards,
Riccardo

ToradexAI · July 21, 2025, 1:58pm

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.

Hello Riccardo,

Thank you for the detailed description of your issue. I’ll address your questions step by step.

1. Behavior of Data Persistence, fsync(), and Filesystem Settings

Inside a Docker container using a bind mount to the host directory, calling fsync() after writing asks the kernel to flush the file's data from its page cache to the storage device (1).
However, even after fsync() returns, data may still sit in the storage device's own write cache (such as on eMMC) if that cache is enabled and the device does not honor flush requests, so an unexpected power loss can still interrupt the write.

Your observation—that files written directly on the host can survive power loss more reliably—might be due to differences in how data paths behave with containerization. However, in principle, fsync() called from within a container should be as effective as when called from the host, since bind mounts provide direct access to the underlying filesystem.

2. Docker Bind Mount Options and Limitations

The sync and commit=1 options you specified in docker-compose.yml are passed to the mount helper inside Docker, but for bind mounts, only the bind and ro/rw options have effect. Options like sync and commit=1 are ignored for bind mounts—they are not propagated to the base filesystem.
The actual base filesystem’s options (e.g., /dev/disk/by-label/otaroot mounted as ext4) determine the ultimate persistence characteristics. You would have to mount your root or relevant mount point with options like sync or commit=1 outside of Docker, at the OS level, for them to take effect.

3. Toradex Best Practices and Additional Protection

Key recommendations:

- eMMC Write Reliability: Ensure that write reliability is enabled on your module’s eMMC. This minimizes the risk of data corruption during a sudden power loss. You can check this in U-Boot by running mmc info and confirming the presence of WRREL. (2)
- Host Mount Options: If your application relies on enhanced data consistency, remount your host filesystem (e.g., ext4) with additional safety options such as sync or a smaller commit interval (commit=1). For example:
  ```
  mount -o remount,sync /dev/disk/by-label/otaroot /
  ```
  But be aware: using sync can drastically decrease performance and increase flash wear.
- Application-level Protection: Continue calling fsync() after every critical write. Avoid using O_SYNC or mounting with sync unless absolutely necessary, as these have significant performance penalties.
- Consider Docker Volumes: While bind mounts are common, Docker volumes may provide a slightly improved abstraction, but ultimate behavior is still governed by the host filesystem. (1)
- Torizon/OSTree Safety: TorizonCore, when updating with OSTree, uses atomic operations for upgrades, but this doesn’t apply to generic application data written at runtime, so the above precautions are still needed for your files (3).
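As an illustration of application-level protection, a common general pattern (not something this thread or Toradex documents as official) is to write the new contents to a temporary file, fsync() it, rename() it over the target, and then fsync() the containing directory. The helper name replace_file_atomically and the ".name.tmp" temporary naming are assumptions made for this sketch; EINTR handling is left out for brevity.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace dir/name so that after a crash the file
 * holds either the complete old or the complete new contents.
 * Returns 0 on success, -1 on error. */
int replace_file_atomically(const char *dir, const char *name,
                            const void *buf, size_t len)
{
    char tmp[4096], dst[4096];
    int a = snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name);
    int b = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (a < 0 || a >= (int)sizeof tmp || b < 0 || b >= (int)sizeof dst)
        return -1;                  /* path too long */

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) { close(fd); unlink(tmp); return -1; }
        p += n;
        len -= (size_t)n;
    }

    /* 1. Flush the new data before it becomes visible under the final name. */
    if (fsync(fd) < 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) < 0) { unlink(tmp); return -1; }

    /* 2. rename() swaps the directory entry in a single step. */
    if (rename(tmp, dst) < 0) { unlink(tmp); return -1; }

    /* 3. Flush the directory so the rename itself reaches the disk. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

With this approach a power cut during the write leaves at worst a stale temporary file next to an intact previous version, provided the storage device honors flush requests.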

Summary Table

| Setting | Where to Change | Effect |
|---|---|---|
| `sync`, `commit=1` | OS-level mount options | Yes, affects disk writes |
| Application `fsync()` | Your C application | Flushes to kernel and storage |
| Write-reliability | eMMC config | Enables atomic sector writes |
| Docker mount options | Only `bind,rw` for bind | Others ignored for bind mounts |

Final Notes

The filesystem’s commit interval and cache handling may cause you to see data loss even after fsync() if the storage hardware does not honor flushes or loses power in the middle of transferring blocks.
True power-loss safety is very difficult to achieve in software alone, but all of the above can reduce the risk.

If you need details on checking or setting eMMC write reliability, or adjusting host mount options at runtime, let me know and I can provide specific commands.

Best regards,
Toradex AI Assistant.

jeremias.tx · July 21, 2025, 7:33pm

Greetings @raccoman,

Are there any specific kernel or filesystem settings in TorizonCore that could interfere with data persistence even after calling fsync()?

Not that we are aware of.

Is there an official best practice from Toradex for ensuring data persistence inside containers under sudden power loss scenarios?

We don’t have any specific recommendations. We would just point to the usual data-persistence practices, such as doing a sync after important file operations.

When you say “file corruption” is observed what do you observe exactly?

Also I noticed you’re on a somewhat old version of Torizon (5.7.2). As a sanity check could you perhaps try on a newer version of Torizon OS. Perhaps the newer versions of the kernel or Docker would improve or fix your use-case. I haven’t heard of such reports of file corruption recently. So either no one else is doing sudden power losses with no proper shutdown on their devices (unlikely), or maybe the situation really is improved on the newer versions.

If the newer versions do not address your issue, then perhaps you could share your exact steps for running your container, writing the file, and shutting down the device for further analysis and reproduction attempts.

Best Regards,
Jeremias

raccoman · July 22, 2025, 7:07am

Hi,

Here is my tdx-info:

Bootloader: U-Boot
Kernel version: 5.4.193-5.7.2+git.b60d3160fd04 #1-TorizonCore SMP PREEMPT Fri Dec 23 15:47:24 UTC 2022
Kernel command line: pci=nomsi root=LABEL=otaroot rootfstype=ext4 quiet logo.nologo vt.global_cursor_default=0 plymouth.ignore-serial-consoles splash fbcon=map:3 ostree=/ostree/boot.1/torizon/462f211b372fbfea3c16e9154648bd2052363f5fc9f9e464b6258d18a435d5eb/0
Distro name: NAME=“TorizonCore”
Distro version: VERSION_ID=5.7.2-build.20
Distro variant: VARIANT=“Docker”
Hostname: apalis-imx8-15310204

HW model: Toradex Apalis iMX8QM V1.1 on Apalis Evaluation Board
Toradex version: 0067 V1.1A
Serial number: 15310204
Processor arch: aarch64
Available block devices:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 30M 0 disk
sda1 8:1 1 30M 0 part /var/rootdirs/media/PlutoSDR
mmcblk0 179:0 0 29.1G 0 disk
mmcblk0p1 179:1 0 29.1G 0 part /sysroot
mmcblk0boot0 179:32 0 4M 1 disk
mmcblk0boot1 179:64 0 4M 1 disk

Steps to reproduce:

1. Create an example container: docker run --privileged -v /home/torizon:/home -v /dev:/dev -v /var/run/dbus:/var/run/dbus -v /etc/os-release:/etc/host-os-release -d --restart always --name debian --network host --env-file .env -w /home/ torizon/debian:2.6.0-bullseye sh -c "tail -f /dev/null"
2. Install the vim package: docker exec -w /home/ debian sh -c 'apt-get update && apt-get install -y vim'
3. Create a file inside the volume: docker exec -it debian echo "test" > test.txt
4. Wait 1 to 5 seconds, then cut the power to the Toradex module.
5. Wait for the reboot, then check on the host with cat test.txt: the file is empty, partial (if you use fsync inside a C program), or missing.

I noticed that if we wait 15 to 20 seconds, the file persists after the power cycle. This makes me think that there is some kind of periodic interval inside Docker that persists the file (?)

If you perform the same steps directly on the host and cut the power less than 1 second after the file has been created, there aren’t any issues.

jeremias.tx · July 22, 2025, 5:42pm

Okay, I was able to reproduce similar results using steps similar to the ones you described. However, I was also able to get an empty file quite easily when creating a file on the host and immediately cutting power.

This isn’t really a unique aspect of containers. Maybe file writes are a touch slower in containers, but file writes being left incomplete due to power cuts is a generic problem overall.

In this case the recommendation would be similar to how the issue is dealt with generally. If you have a critical file write/operation that must be written to disk even if a power cut occurred right after the operation, then using sync is the only way to guarantee the operation is flushed to the disk.

This works even in a container with a volume/bind-mount as I just tried for myself to be sure.

Best Regards,
Jeremias

raccoman · July 23, 2025, 7:39am

Hi,

I do use fsync inside my C program, but I still get the issue… How did you perform the sync test?

jeremias.tx · July 23, 2025, 5:12pm

How did you perform the sync test?

Well, I didn’t write a C program. Going off the steps you provided, I just wrote the file like so: echo "test" > test.txt && sync, so I ran the actual sync CLI command. Then I powered the device down within 1 second of executing this command. As long as I executed sync, I never noticed any issues with the file write being completed properly.

There may be some underlying difference between fsync and sync since you’re trying to sync a file process that goes between two filesystems (the container and host filesystems).

Best Regards,
Jeremias
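One documented difference between the two calls may be relevant to the discussion above. According to the Linux fsync(2) man page, fsync() on a file does not necessarily ensure that the file's directory entry has reached the disk; that needs a separate fsync() on the directory. The sync command flushes everything that is pending. For a newly created file such as test.txt, a sketch that covers both the data and the directory entry could look like this. The helper name create_file_durably is hypothetical, and whether this explains the behaviour seen in this thread is an assumption that was not verified here.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create `name` inside `dir` and flush both the file
 * contents and the new directory entry. Returns 0 on success, -1 on error. */
int create_file_durably(const char *dir, const char *name, const char *text)
{
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(dfd);
        return -1;
    }

    size_t len = strlen(text);
    /* For brevity, a short write is treated as an error here. */
    int ok = write(fd, text, len) == (ssize_t)len;
    ok = ok && fsync(fd) == 0;      /* file data and inode */
    ok = (close(fd) == 0) && ok;
    ok = ok && fsync(dfd) == 0;     /* directory entry of the new file */
    close(dfd);
    return ok ? 0 : -1;
}
```

Calling sync() from unistd.h after the write would be the blunter C counterpart of the sync command used in the test above, at the cost of flushing every mounted filesystem.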