Kernel panic: jbd2/mmcblk0p1 blocked for more than 122 seconds

Dear Developer Community,
has anybody ever experienced a kernel panic like the one below, related to the internal flash (eMMC) of the Colibri iMX8X and its filesystem?

[  243.663016] INFO: task jbd2/mmcblk0p1-:494 blocked for more than 122 seconds.
[  243.670197]       Tainted: G           O      5.4.129-5.4.0+git.cb88cc157bfb #1-TorizonCore
[  243.678617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.686491] jbd2/mmcblk0p1- D    0   494      2 0x00000028
[  243.692019] Call trace:
[  243.694584]  __switch_to+0x144/0x1a0
[  243.698457]  __schedule+0x2f8/0x740
[  243.701950]  schedule+0x40/0xe0
[  243.705119]  io_schedule+0x18/0xf0
[  243.708529]  bit_wait_io+0x14/0x58
[  243.711937]  __wait_on_bit+0x70/0xe0
[  243.715590]  out_of_line_wait_on_bit+0x80/0xa0
[  243.720057]  __wait_on_buffer+0x2c/0x38
[  243.723947]  jbd2_journal_commit_transaction+0x15c0/0x1c20
[  243.729442]  kjournald2+0xb8/0x258
[  243.732874]  kthread+0x138/0x158
[  243.736125]  ret_from_fork+0x10/0x1c
[  243.739840] Kernel panic - not syncing: hung_task: blocked tasks
[  243.745869] SMP: stopping secondary CPUs
[  243.749817] Kernel Offset: disabled
[  243.753312] CPU features: 0x0002,20002008
[  243.757323] Memory Limit: none
[  243.760391] Rebooting in 5 seconds..
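The last lines of the log show that the kernel did not just warn about the blocked task: it panicked and then rebooted. On Linux this behavior is controlled by the hung-task detector sysctls (kernel.hung_task_timeout_secs and kernel.hung_task_panic) and by kernel.panic, the number of seconds to wait before rebooting after a panic. As a minimal sketch, assuming the kernel exposes these files under /proc/sys/kernel (the read_kernel_sysctls helper is hypothetical), the current settings can be read like this:

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def read_kernel_sysctls(names: Iterable[str], root: str = "/proc/sys/kernel") -> Dict[str, Optional[str]]:
    """Read each sysctl file under root; None means the file is missing or unreadable."""
    values: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            # e.g. a kernel built without the hung-task detector (CONFIG_DETECT_HUNG_TASK)
            values[name] = None
    return values


if __name__ == "__main__":
    # hung_task_timeout_secs: how long a task may stay in D state before it is reported (0 disables the check)
    # hung_task_panic: 1 = panic on a hung task instead of only logging it
    # panic: seconds to wait before rebooting after a panic (0 = no automatic reboot)
    for name, value in read_kernel_sysctls(("hung_task_timeout_secs", "hung_task_panic", "panic")).items():
        print(f"kernel.{name} = {value}")
```

If hung_task_panic is 0, the same stall would only be logged instead of rebooting the device, which is relevant when judging what could happen in a production environment.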

The context is the following:
Hardware: Toradex Colibri iMX8 QuadXPlus 2GB Wi-Fi / BT IT V1.0D
OS: TorizonCore 5.4.0+build.10
Frequency and steps to reproduce: always, when deploying an image to the board with VS Code Torizon Extension, device connected via network (LAN/Ethernet).

I’m not worried about this specific board: I could easily recover it and reinstall Torizon. However, I’d like to understand a little better whether it can be recovered and/or whether this could happen “in a production environment”. What do you think?

Thanks for your suggestions and best regards,
ldvp

Greetings @ldvp,

You say this always happens to you? That’s very strange; I’ve never seen such an error before with the VS Code extensions. It’s hard to say what could be going on here, though in my experience similar errors are very rare.

I do have a couple of questions, however, that may help clarify things.

  • Do you have any other Colibri i.MX8X modules? Does this issue always happen to them as well, or just this specific module?
  • If you recover and reflash the module, does the issue still happen every time?

At the moment I can’t reproduce this issue myself, so my ability to investigate it on my side is limited.

Best Regards,
Jeremias

Hi @jeremias.tx,
sorry if I was not clear enough: I’m having the issue on only one board, which had been in use for several months without problems; all the other boards are working correctly.

I could easily recover and reflash the board with the issue, but before doing that I’d like to better understand the problem and its root cause, and whether it could be recovered in a less invasive way.

I understand you cannot reproduce it, but I can, and I’m available to run any tests or investigations that you think make sense or can suggest. To me, it smells like filesystem corruption or something similar.
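One low-cost way to test the filesystem-corruption hypothesis before reflashing is to look for storage-related errors in the kernel log. The sketch below is only illustrative: the message patterns are assumptions (a non-exhaustive list, and exact wording differs between kernel versions and drivers), and find_suspect_lines is a hypothetical helper.

```python
import re
import subprocess
from typing import List

# Illustrative, non-exhaustive patterns (an assumption, not an official list).
SUSPECT_PATTERNS = [
    re.compile(r"EXT4-fs (error|warning)"),                    # ext4 reported a problem
    re.compile(r"I/O error"),                                  # block layer reported a failed request
    re.compile(r"\bmmc\d+:.*(timeout|error)", re.IGNORECASE),  # MMC host controller messages
    re.compile(r"blocked for more than \d+ seconds"),          # hung-task warnings like the one above
]


def find_suspect_lines(log_text: str) -> List[str]:
    """Return the kernel log lines that match any of the patterns."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in SUSPECT_PATTERNS)]


if __name__ == "__main__":
    try:
        # dmesg only covers the current boot and may require root (kernel.dmesg_restrict=1).
        log = subprocess.run(["dmesg"], capture_output=True, text=True, check=False).stdout
    except FileNotFoundError:
        log = ""
    for line in find_suspect_lines(log):
        print(line)
```

No matches does not prove the filesystem is healthy; it only means nothing obvious was logged during the current boot.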

BR,
ldvp

Ah, okay, I understand now. Filesystem corruption, while rare, is possible. Let’s try to rule out more common issues before we jump to filesystem corruption.

Usually, a process blocked for such a long period of time indicates an overwhelmed system: either there is not enough memory, or some process is producing a lot of I/O in a short time frame.
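Both causes can be checked roughly by sampling the available memory and the amount of data written to the eMMC over a short interval. This is a minimal sketch that assumes the standard /proc/meminfo and /proc/diskstats formats and the device name mmcblk0 from the kernel log; the helper names are hypothetical, and what counts as “a lot” of I/O is left to judgment.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the device


def mem_available_kib(meminfo_text: str) -> int:
    """Return MemAvailable from /proc/meminfo content, in KiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def sectors_written(diskstats_text: str, device: str) -> int:
    """Return the cumulative 'sectors written' counter for device from /proc/diskstats content."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then the I/O stats; 'sectors written' is fields[9]
        if len(fields) > 9 and fields[2] == device:
            return int(fields[9])
    raise ValueError(f"device {device!r} not found")


def write_rate_kib_s(before: int, after: int, seconds: float) -> float:
    """Convert two sector counters sampled `seconds` apart into KiB/s."""
    return (after - before) * SECTOR_BYTES / 1024 / seconds


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    device = "mmcblk0"  # assumption: the eMMC device name seen in the kernel log
    interval_s = 5.0
    try:
        print(f"MemAvailable: {mem_available_kib(read_text('/proc/meminfo'))} KiB")
        first = sectors_written(read_text("/proc/diskstats"), device)
        time.sleep(interval_s)
        second = sectors_written(read_text("/proc/diskstats"), device)
        print(f"{device}: {write_rate_kib_s(first, second, interval_s):.1f} KiB/s written")
    except (OSError, ValueError) as err:
        print(f"could not sample statistics: {err}")
```

Running it while deploying from VS Code would show whether the write rate spikes right before the stall.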

According to your kernel panic, the task that got blocked is jbd2/mmcblk0p1. jbd2 is the “Journaling Block Device” layer that sits between the filesystem and the block device driver, and the call trace shows it waiting on a buffer (__wait_on_buffer) inside jbd2_journal_commit_transaction, i.e. waiting for a journal write to the eMMC to complete. As a start, I’d suggest checking whether a lot of logs are being produced anywhere, to see if the blockage is due to high I/O from logging.
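To see which processes are generating the writes (for example a logger or a container writing to the eMMC), the per-process counters in /proc/&lt;pid&gt;/io can be sampled twice and compared. This is a sketch that assumes the kernel is built with task I/O accounting (CONFIG_TASK_IO_ACCOUNTING) and that it runs as root so every process’s io file is readable; the helper names are hypothetical, and write_bytes is the kernel’s approximation of the data a process caused to be sent to storage.

```python
import os
import time


def parse_write_bytes(io_text: str) -> int:
    """Extract write_bytes from the content of /proc/<pid>/io."""
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        if key == "write_bytes":
            return int(value)
    raise ValueError("write_bytes not found")


def snapshot(proc: str = "/proc") -> dict:
    """Map pid -> (command name, cumulative write_bytes) for every readable process."""
    result = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "io")) as f:
                written = parse_write_bytes(f.read())
            with open(os.path.join(proc, entry, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or permission denied (not root)
        result[int(entry)] = (comm, written)
    return result


def top_writers(before: dict, after: dict, n: int = 5) -> list:
    """Return up to n (bytes written in the interval, pid, comm) tuples, largest first."""
    deltas = [(written - before[pid][1], pid, comm)
              for pid, (comm, written) in after.items() if pid in before]
    deltas.sort(reverse=True)
    return deltas[:n]


if __name__ == "__main__":
    interval_s = 5.0
    first = snapshot()
    time.sleep(interval_s)
    for delta, pid, comm in top_writers(first, snapshot()):
        print(f"{pid:>7} {comm:<20} {delta / 1024:10.1f} KiB")
```

Processes that start during the interval are skipped, because there is no earlier sample to compare against.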

Best Regards,
Jeremias