Apalis T30 Mainline Kernel MMC issue (At least on Kernel 4.14.0 and 4.16.3)

Dear Toradex Support,
I’m using the mainline kernel 4.16.3 on an Apalis T30 SoC and I see problems with the rootfs running on the MMC partition /dev/mmcblk2p2.
After a while the eMMC stops responding: commands get stuck and there is no way out except rebooting the device, after which it happens again and again. When it happens, “dmesg” shows this error:

[ 7616.592761] INFO: task mongod:517 blocked for more than 120 seconds.
[ 7616.603602]       Not tainted 4.16.3-ELAR-Systems #4
[ 7616.613012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7616.632796] mongod          D    0   517      1 0x00000000
[ 7616.632839] [<c0a9ea80>] (__schedule) from [<c0a9f0e8>] (schedule+0x50/0xb4)
[ 7616.632860] [<c0a9f0e8>] (schedule) from [<c0aa3cc0>] (schedule_timeout+0x200/0x45c)
[ 7616.632875] [<c0aa3cc0>] (schedule_timeout) from [<c0a9f678>] (io_schedule_timeout+0x1c/0x3c)
[ 7616.632893] [<c0a9f678>] (io_schedule_timeout) from [<c0aa0014>] (wait_for_common_io.constprop.2+0xa4/0x124)
[ 7616.632920] [<c0aa0014>] (wait_for_common_io.constprop.2) from [<c03d1070>] (submit_bio_wait+0x60/0x84)
[ 7616.632948] [<c03d1070>] (submit_bio_wait) from [<c03deb2c>] (blkdev_issue_flush+0x84/0xac)
[ 7616.632969] [<c03deb2c>] (blkdev_issue_flush) from [<c02d9f6c>] (ext4_sync_file+0x3a8/0x4a8)
[ 7616.632995] [<c02d9f6c>] (ext4_sync_file) from [<c023e22c>] (SyS_msync+0x1a0/0x218)
[ 7616.633013] [<c023e22c>] (SyS_msync) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
[ 7616.633020] Exception stack(0xece23fa8 to 0xece23ff0)
[ 7616.633031] 3fa0:                   00000003 00000000 abe5d000 04000000 00000004 0000000c
[ 7616.633044] 3fc0: 00000003 00000000 b6f52ce8 00000090 02d7c420 0094ffc0 00000003 b5e5baa0
[ 7616.633052] 3fe0: 00000000 b5e5b7c0 00000000 b67dbd30
[ 7663.518799] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
[ 7753.768664] systemd[1]: systemd-journald.service: State 'stop-final-sigterm' timed out. Killing.
[ 7844.018538] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 7844.019572] systemd[1]: systemd-journald.service: Unit entered failed state.
[ 7844.019719] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 7844.022455] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[ 7844.028897] systemd[1]: Stopped Flush Journal to Persistent Storage.
[ 7844.028980] systemd[1]: Stopping Flush Journal to Persistent Storage...
[ 7844.029742] systemd[1]: Stopped Journal Service.
[ 7844.033939] systemd[1]: Starting Journal Service...
[ 7872.677949] mmc2: Card stuck being busy! mmc_poll_for_busy
[ 7872.687907] mmc2: Error -110 starting bkops
[ 7934.268388] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.
[ 8024.518321] systemd[1]: systemd-journald.service: State 'stop-final-sigterm' timed out. Killing.

Is this something related to the mainline kernel, or is there anything you can advise to fix this issue?

Many thanks

@marcel.tx

Another set of errors continuously shows up in “dmesg”:

[10192.432541] systemd[1]: Failed to start Journal Service.
[10192.441213] systemd[1]: systemd-journald.service: Unit entered failed state.
[10192.441296] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[10192.444010] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[10192.446722] systemd[1]: Stopped Journal Service.
[10192.451154] systemd[1]: Starting Journal Service...
[10282.681282] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.
[10372.931201] systemd[1]: systemd-journald.service: State 'stop-final-sigterm' timed out. Killing.
[10463.181073] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[10463.182200] systemd[1]: Failed to start Journal Service.
[10463.190738] systemd[1]: systemd-journald.service: Unit entered failed state.
[10463.190828] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[10463.193576] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[10463.196272] systemd[1]: Stopped Journal Service.
[10463.200848] systemd[1]: Starting Journal Service...
[10553.430995] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.
[10643.680532] systemd[1]: systemd-journald.service: State 'stop-final-sigterm' timed out. Killing.
[10733.930800] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[10733.932463] systemd[1]: Failed to start Journal Service.
[10733.941428] systemd[1]: systemd-journald.service: Unit entered failed state.
[10733.941508] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[10733.947568] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[10733.952598] systemd[1]: Stopped Journal Service.
[10733.957845] systemd[1]: Starting Journal Service...
[10824.180731] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.
[10914.430600] systemd[1]: systemd-journald.service: State 'stop-final-sigterm' timed out. Killing.
[11004.680524] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[11004.681644] systemd[1]: Failed to start Journal Service.
[11004.690872] systemd[1]: systemd-journald.service: Unit entered failed state.
[11004.690951] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[11004.696912] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[11004.701944] systemd[1]: Stopped Journal Service.
[11004.706931] systemd[1]: Starting Journal Service...
[11094.930454] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.

What exact hardware and software versions of things are you talking about?

I’m using kernel 4.16.3 on an Apalis T30 (V1.0E) module with an Ixora Carrier Board (V1.0A).
The rootfs is Ubuntu 16.04.4 LTS, debootstrapped from Base Ubuntu.

I had no issue with the same setup on boards from other manufacturers, only on the Apalis T30 SoC.
I’m just wondering whether it is a hardware issue or something related to the kernel!

Any help or advice will be appreciated.

Most likely, as you are using one of the older modules with Hynix or SK Hynix eMMC parts, you are suffering from the same issue as described in the following question.
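
To check whether a given module actually carries a Hynix eMMC, the card's identification registers can be read from sysfs. The following is only a minimal sketch: the `manfid` and `name` attributes are what the Linux MMC core normally exposes under `/sys/block/<device>/device`, the path for `mmcblk2` is taken from the device name mentioned earlier in this thread, and the manufacturer ID `0x90` for Hynix is an assumption that should be verified against the kernel sources or the part's datasheet. The helper names are made up for illustration.

```python
from pathlib import Path

# Assumption: 0x90 is the manufacturer ID used for Hynix / SK Hynix eMMC parts.
HYNIX_MANFID = 0x90


def read_mmc_identity(device_dir):
    """Return the 'manfid' and 'name' sysfs attributes of an MMC device.

    Attributes that do not exist are reported as None.
    """
    base = Path(device_dir)
    info = {}
    for attr in ("manfid", "name"):
        path = base / attr
        info[attr] = path.read_text().strip() if path.is_file() else None
    return info


def looks_like_hynix(info):
    """True if the manufacturer ID matches the assumed Hynix value."""
    manfid = info.get("manfid")
    return manfid is not None and int(manfid, 16) == HYNIX_MANFID


if __name__ == "__main__":
    # Hypothetical path, derived from /dev/mmcblk2p2 as used for the rootfs.
    identity = read_mmc_identity("/sys/block/mmcblk2/device")
    print(identity, "Hynix?", looks_like_hynix(identity))
```

Run on the module itself; if the manufacturer ID does not match, the hang may have a different cause than the one discussed in the referenced question.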

Oh, no!!!
So is it possible to get it replaced?

I don’t think so, as there is nothing per se wrong with it. It’s just that certain drivers/settings may not be compatible with such eMMC parts, but as outlined in the question linked above, this may easily be worked around.

So the question is: how?
Do you have any solution or workaround for that?

Yes, well, did you actually look at the question I referred to above? What exactly is it that is unclear?