UBI warnings on ImageV2.8

Greetings,

We had an application based on LinuxImageV2.7. After a increasing the rate we write to the flash, it started to kernel panic (ubifs errors) upon booting. The module itself could be recovered after reflash.

Then after recommendation from Toradex we upgraded to ImageV2.8 (kernel 4.9.87 3bb6e32). Then, upon git clone our application, we got some warnings like this:

[  385.144017] Workqueue: writeback wb_workfn (flush-ubifs_0_3)
[  385.149734] [<8010e904>] (unwind_backtrace) from [<8010b5f4>] (show_stack+0x10/0x14)
[  385.157493] [<8010b5f4>] (show_stack) from [<804100b0>] (dump_stack+0x78/0x8c)
[  385.164734] [<804100b0>] (dump_stack) from [<80377d40>] (ubifs_log_start_commit+0x324/0x38c)
[  385.173187] [<80377d40>] (ubifs_log_start_commit) from [<803789e4>] (do_commit+0x1c4/0x4e0)
[  385.181552] [<803789e4>] (do_commit) from [<80362454>] (make_reservation+0x260/0x3a8)
[  385.189395] [<80362454>] (make_reservation) from [<80362e94>] (ubifs_jnl_write_data+0xe8/0x26c)
[  385.198105] [<80362e94>] (ubifs_jnl_write_data) from [<803652c8>] (do_writepage+0x94/0x200)
[  385.206471] [<803652c8>] (do_writepage) from [<801c4184>] (__writepage+0x14/0x40)
[  385.213969] [<801c4184>] (__writepage) from [<801c4858>] (write_cache_pages+0x180/0x3a8)
[  385.222073] [<801c4858>] (write_cache_pages) from [<801c4ac0>] (generic_writepages+0x40/0x60)
[  385.230613] [<801c4ac0>] (generic_writepages) from [<80229064>] (__writeback_single_inode+0x28/0x190)
[  385.239847] [<80229064>] (__writeback_single_inode) from [<80229860>] (writeback_sb_inodes+0x1f8/0x3c8)
[  385.249252] [<80229860>] (writeback_sb_inodes) from [<80229aa4>] (__writeback_inodes_wb+0x74/0xac)
[  385.258222] [<80229aa4>] (__writeback_inodes_wb) from [<80229c84>] (wb_writeback+0x1a8/0x1c0)
[  385.266758] [<80229c84>] (wb_writeback) from [<8022a1a0>] (wb_workfn+0x190/0x2b8)
[  385.274255] [<8022a1a0>] (wb_workfn) from [<8013a774>] (process_one_work+0x120/0x3e8)
[  385.282099] [<8013a774>] (process_one_work) from [<8013aa74>] (worker_thread+0x38/0x580)
[  385.290202] [<8013aa74>] (worker_thread) from [<8013ff10>] (kthread+0xf8/0x110)
[  385.297525] [<8013ff10>] (kthread) from [<801078d0>] (ret_from_fork+0x14/0x24)

Other error:

[  166.644109] UBIFS assert failed in ubifs_log_start_commit at 412 (pid 204)
    [  166.667562] CPU: 0 PID: 204 Comm: ubifs_bgt0_4 Tainted: G           O    4.9.87-dirty #1
    [  166.675666] Hardware name: Freescale i.MX7 Dual (Device Tree)
    [  166.681443] [<8010e904>] (unwind_backtrace) from [<8010b5f4>] (show_stack+0x10/0x14)
    [  166.689204] [<8010b5f4>] (show_stack) from [<804100b0>] (dump_stack+0x78/0x8c)
    [  166.696445] [<804100b0>] (dump_stack) from [<80377d40>] (ubifs_log_start_commit+0x324/0x38c)
    [  166.704899] [<80377d40>] (ubifs_log_start_commit) from [<803789e4>] (do_commit+0x1c4/0x4e0)
    [  166.713266] [<803789e4>] (do_commit) from [<80378e88>] (ubifs_bg_thread+0x188/0x18c)
    [  166.721022] [<80378e88>] (ubifs_bg_thread) from [<8013ff10>] (kthread+0xf8/0x110)
    [  166.728519] [<8013ff10>] (kthread) from [<801078d0>] (ret_from_fork+0x14/0x24)

Another one:

 [  193.872544] UBIFS assert failed in ubifs_log_start_commit at 412 (pid 279)
    [  193.901673] CPU: 0 PID: 279 Comm: kworker/u2:2 Tainted: G           O    4.9.87-dirty #1
    [  193.909778] Hardware name: Freescale i.MX7 Dual (Device Tree)
    [  193.915543] Workqueue: writeback wb_workfn (flush-ubifs_0_4)
    [  193.921261] [<8010e904>] (unwind_backtrace) from [<8010b5f4>] (show_stack+0x10/0x14)
    [  193.929021] [<8010b5f4>] (show_stack) from [<804100b0>] (dump_stack+0x78/0x8c)
    [  193.936262] [<804100b0>] (dump_stack) from [<80377d40>] (ubifs_log_start_commit+0x324/0x38c)
    [  193.944715] [<80377d40>] (ubifs_log_start_commit) from [<803789e4>] (do_commit+0x1c4/0x4e0)
    [  193.953080] [<803789e4>] (do_commit) from [<80362454>] (make_reservation+0x260/0x3a8)
    [  193.960924] [<80362454>] (make_reservation) from [<80362e94>] (ubifs_jnl_write_data+0xe8/0x26c)
    [  193.969634] [<80362e94>] (ubifs_jnl_write_data) from [<803652c8>] (do_writepage+0x94/0x200)
    [  193.977998] [<803652c8>] (do_writepage) from [<801c4184>] (__writepage+0x14/0x40)
    [  193.985496] [<801c4184>] (__writepage) from [<801c4858>] (write_cache_pages+0x180/0x3a8)
    [  193.993600] [<801c4858>] (write_cache_pages) from [<801c4ac0>] (generic_writepages+0x40/0x60)
    [  194.002141] [<801c4ac0>] (generic_writepages) from [<80229064>] (__writeback_single_inode+0x28/0x190)
    [  194.011376] [<80229064>] (__writeback_single_inode) from [<80229860>] (writeback_sb_inodes+0x1f8/0x3c8)
    [  194.020781] [<80229860>] (writeback_sb_inodes) from [<80229aa4>] (__writeback_inodes_wb+0x74/0xac)
    [  194.029751] [<80229aa4>] (__writeback_inodes_wb) from [<80229c84>] (wb_writeback+0x1a8/0x1c0)
    [  194.038287] [<80229c84>] (wb_writeback) from [<8022a1a0>] (wb_workfn+0x190/0x2b8)
    [  194.045784] [<8022a1a0>] (wb_workfn) from [<8013a774>] (process_one_work+0x120/0x3e8)
    [  194.053628] [<8013a774>] (process_one_work) from [<8013aa74>] (worker_thread+0x38/0x580)
    [  194.061731] [<8013aa74>] (worker_thread) from [<8013ff10>] (kthread+0xf8/0x110)
    [  194.069054] [<8013ff10>] (kthread) from [<801078d0>] (ret_from_fork+0x14/0x24)

And another after a sync:

root@iMX7:~# sync
[   40.931610] INFO: task sync:529 blocked for more than 10 seconds.
[   40.938084]       Not tainted 4.9.87-dirty #1
[   40.945608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[   40.954540] sync            D    0   529    520 0x00000000
[   40.960790] [<807a3d38>] (__schedule) from [<807a4144>] (schedule+0x4c/0xac)
[   40.969121] [<807a4144>] (schedule) from [<80229214>] (wb_wait_for_completion+0x48/0x78)
[   40.978104] [<80229214>] (wb_wait_for_completion) from [<80229440>] (sync_inodes_sb+0x94/0x204)
[   40.987711] [<80229440>] (sync_inodes_sb) from [<80204878>] (iterate_supers+0xc8/0x128)
[   40.996590] [<80204878>] (iterate_supers) from [<8022d894>] (sys_sync+0x34/0xa4)
[   41.004775] [<8022d894>] (sys_sync) from [<80107800>] (ret_fast_syscall+0x0/0x48)

Should I be worried about the warnings? This is a experimental procedure intended for testing, but should be more steps to follow or another kernel / image?

We are also using the kernel features:

Preemptible Kernel (Low-latency Desktop)
Timer Frequency (1000 Hz)

EDIT:
We are trying to achieve a more stable file system. We had several kernel panic from field where the boards have been reseted a few times. For that we changed from LinuxImageV2.7.4 to V2.8, and it seems to respond better, but these warnings are shown if we try to write to the flash the first time the module boots.
We are kind of worried from the experience from the kernel panic. A following step would be to have a read-only file system separated from the application partition, but in this step we’re going in the all-in-one.

It look likes the Ubi error only appears in the first time the board boots after reflash (run update && run setupdate). If after the first time I reboot the board and try to git clone on the second time linux boots it works.
Any reasons?

I’m not sure whether I really understand what exaktly it is that you are trying to achieve. Could you please elaborate some more and post complete log files?

HI,

I’ve just edited the problem.

Hi

Could you describe in a little more detail how you do stress the filesystem?

Also do you have the U-Boot/Linux output from the debug console from top to bottom?

Max

P.S.
I found very few of these warnings from our stress testing in the temperature cabinet. (i.e. from about 12 tests of around 400000+ test runs.

P.P.S.
This mail from one of the developers of UBIFS shows that at least he is not to concerned about the assert triggering.

Greetings

Usually, this happens the first time i write to the Flash after a nand erase. A way it happens quite frequently is when we execute a sequence like:

On U-Boot:

nand erase.ubi
... configs ...
setenv ubiargs ubi.mtd=ubi root=ubi0:rootfs rw rootfstype=ubifs ubi.fm_autoconvert=1 threadirqs
saveenv
run setupdate
run update

On Linux:

git clone [ 20 MB repo ]
sync

On Linux, it may happen in any command. Note that there is no reset after it boots for the first time. There is no warnings anymore after a reboot.

Hi @weinsen

Can you once try to remove the ubi.fm_autoconvert=1 from the ubiargs? Just to check if the problem still appears or if your problem is somehow related to fastmap.

Regards,
Stefan

Will try it soon. I’ll keep report of its results.

ok, we will wait for your feedback.