eMMC corruption?

jaski.tx · November 10, 2020, 7:40pm

Could you give us more information so that we can reproduce this issue? You could also send the affected module back to Toradex.

Best regards,
Jaski

qojote · November 17, 2020, 8:44am

Hi. So far the problem only arises on V1.1B modules. Additionally, it seems that only our kernel with the additional initramfs triggers the issue. When we simply remove the initramfs from that kernel, the issue is (so far) not reproducible. I have appended an additional log. It's quite clear that the eMMC has some kind of “problem”.

[   16.093822] random: crng init done
[  615.550096] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[  615.561341] mmc0: cache flush error -110
[  618.630209] mmc0: tried to reset card, got error -110
[  618.645262] blk_update_request: I/O error, dev mmcblk0, sector 8192
[  618.655489] Buffer I/O error on dev mmcblk0p1, logical block 0, lost sync page write
KEZU LOG fsck on /dev/mmcblk0p2 with ext4
[  628.960128] mmc0: Timeout waiting for hardware interrupt. retries left=0 opcode=12
[  628.977639] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[  628.993938] mmc0: sdhci: Sys addr:  0x134b6000 | Version:  0x00000002
[  629.010220] mmc0: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000020
[  629.026381] mmc0: sdhci: Argument:  0x00022000 | Trn mode: 0x0000003b
[  629.042426] mmc0: sdhci: Present:   0x01fd8009 | Host ctl: 0x00000011
[  629.058479] mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
[  629.074499] mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x000010ff
[  629.090487] mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
[  629.106417] mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
[  629.122431] mmc0: sdhci: AC12 err:  0x00000082 | Slot int: 0x00000003
[  629.138401] mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x0000a007
[  629.154372] mmc0: sdhci: Cmd:       0x0000123a | Max curr: 0x00ffffff
[  629.170372] mmc0: sdhci: Resp[0]:   0x00ff8080 | Resp[1]:  0xffffffff
[  629.186339] mmc0: sdhci: Resp[2]:   0x320f5913 | Resp[3]:  0x00000900
[  629.202274] mmc0: sdhci: Host ctl2: 0x00000000
[  629.216163] mmc0: sdhci: ADMA Err:  0x00000003 | ADMA Ptr: 0x18078204
[  629.232065] mmc0: sdhci: ============================================
[  629.249229] mmcblk0: error -110 sending status command, retrying
[  629.259449] mmcblk0: error -110 sending status command, retrying
[  629.269657] mmcblk0: error -110 sending status command, aborting
[  629.281056] mmc0: cache flush error -110
[  632.350212] mmc0: tried to reset card, got error -110
[  632.364793] blk_update_request: I/O error, dev mmcblk0, sector 139264
[  632.375045] blk_update_request: I/O error, dev mmcblk0, sector 139272
[  632.385203] blk_update_request: I/O error, dev mmcblk0, sector 139280
[  632.395319] blk_update_request: I/O error, dev mmcblk0, sector 139288
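
For anyone comparing logs from several modules, here is a minimal Python sketch that pulls the failed sectors and the reported error codes out of a log like the one above. The function name `summarize` and the regular expressions are illustrative assumptions based only on the line formats shown in this log; other kernel versions may print these messages differently. On Linux, errno 110 is ETIMEDOUT, so every `error -110` here is a timeout.

```python
import re

# Matches lines such as:
# [  618.645262] blk_update_request: I/O error, dev mmcblk0, sector 8192
IO_ERROR = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+blk_update_request: I/O error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)
# Matches any "error -N" reported by the mmc layer (e.g. "cache flush error -110").
ERRNO = re.compile(r"error (?P<errno>-\d+)")


def summarize(log_text):
    """Return (failed_sectors, errno_counts) for a kernel log excerpt.

    failed_sectors is a list of (timestamp, device, sector) tuples;
    errno_counts maps each negative error code to how often it appears.
    """
    sectors = []
    errnos = {}
    for line in log_text.splitlines():
        m = IO_ERROR.search(line)
        if m:
            sectors.append((float(m["time"]), m["dev"], int(m["sector"])))
            continue
        m = ERRNO.search(line)
        if m:
            code = int(m["errno"])
            errnos[code] = errnos.get(code, 0) + 1
    return sectors, errnos


# A few lines copied from the log above.
sample = """\
[  615.561341] mmc0: cache flush error -110
[  618.630209] mmc0: tried to reset card, got error -110
[  618.645262] blk_update_request: I/O error, dev mmcblk0, sector 8192
[  632.364793] blk_update_request: I/O error, dev mmcblk0, sector 139264
"""
sectors, errnos = summarize(sample)
```

Feeding it the output of `dmesg` from a failing boot gives a compact list that is easier to compare across modules than the raw register dumps.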

jaski.tx · November 18, 2020, 3:15pm

Hi @qojote

Thanks for the log.

So do you just include the initramfs in your kernel, or also something else?

Best regards,
Jaski

qojote · November 19, 2020, 9:35am

Hi @jaski.tx
Besides including an initramfs, we added some kernel configuration options to support additional hardware. The same kernel with the extended configuration runs fine without the initramfs part (>10k power cycles). Because the resulting kernel is quite big (~20M), we needed to change the RAM layout in U-Boot as well. We are trying to narrow the problem down to a minimal example and would really appreciate your support (we need to deliver some units to customers soon). One more thing: mounting the bootfs always works. The error occurs only when mounting the rootfs. And what’s really mysterious is that a broken module is “cured” by mounting the bootfs and writing to this partition (no special content). In that case, mounting the rootfs succeeds.

jaski.tx · November 24, 2020, 8:01pm

Hi @qojote

Thanks for this information. It seems to be an interesting issue.
Could you please share the kernel config, the initramfs configuration, and the memory layout? What is the RAM and eMMC layout in U-Boot?

Best regards,
Jaski

patdex · February 18, 2021, 9:07am

We were able to break the problem down to a small minimal example.
It can be created as follows:

1. Check out BSP 2.8b7.
2. Apply emmc_crash.patch to the meta-openembedded layer (attachment: emmc_crash.patch).
3. Build console-tdx-image and initramfs-debug-image for apalis-imx6.
4. Add the following two lines to local.conf:

   INITRAMFS_IMAGE = "initramfs-debug-image"
   INITRAMFS_IMAGE_BUNDLE = "1"

5. Build console-tdx-image again.
6. Replace zImage in Apalis-iMX6_Console-Image-Tezi* > *bootfs.tar.xz with zImage-initramfs-apalis-imx6.bin from the deploy directory.
7. Install and run Apalis-iMX6_Console-Image-Tezi*.

To produce the error, the system is started and disconnected from the power supply as soon as it is accessible on the network. After a few hundred boots, the system freezes with the following kernel output: boot.log
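
The power-cycle procedure described above could be automated roughly as follows. This is only a sketch under assumptions: `set_power` is a hypothetical callback that switches the board's supply (for example through a relay), and "accessible on the network" is approximated by a TCP connection to port 22; neither detail comes from the original test setup.

```python
import socket
import time


def is_reachable(host, port=22, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (port 22 is an assumption)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def power_cycle_test(set_power, reachable, max_cycles, boot_timeout=120.0,
                     off_time=2.0, poll=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Power-cycle the board until it fails to come up.

    set_power(on) is a hypothetical callback that switches the supply.
    reachable() reports whether the board answers on the network.
    Returns the number of the first cycle in which the board did not
    become reachable within boot_timeout seconds, or None if all
    max_cycles boots succeeded.
    """
    for cycle in range(1, max_cycles + 1):
        set_power(True)
        deadline = clock() + boot_timeout
        while not reachable():
            if clock() > deadline:
                # Leave the power on so the console output can be inspected.
                return cycle
            sleep(poll)
        # Cut power as soon as the board is reachable, as described above.
        set_power(False)
        sleep(off_time)
    return None
```

A real run would pass something like `lambda: is_reachable("192.168.10.2")` as `reachable` (the address is a placeholder) and a relay driver as `set_power`. Injecting `sleep` and `clock` keeps the loop testable without hardware.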

jaski.tx · February 23, 2021, 7:47am

Hi @patdex

Thanks for the information. How many SoMs are showing this issue?
Could you reproduce the issue on BSP 5.1?

Thanks and best regards,
Jaski

mik · January 4, 2022, 6:31am

Hi @qojote / @jaski.tx, we are also seeing this issue.
Was a root cause ever identified?
We have hundreds of devices in the field, so we are quite concerned.
(We’re using Model: Toradex Apalis iMX6 Dual 1GB IT V1.1B.)

jaski.tx · January 4, 2022, 2:06pm

Hi @mik

Have you seen this issue? Could you reproduce this with BSP 5.4?

Best regards,
Jaski

qojote · January 4, 2022, 2:15pm

Hi @mik,
In our initramfs all partitions were mounted to do some update stuff. We had to mount the rootfs statically and before every other partition to resolve our specific issue.
BR
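
One possible reading of the workaround above, sketched in Python for illustration only: build the list of mount commands so that the rootfs always comes first. The function name `mount_plan`, the mount points, and the assumption that /dev/mmcblk0p1 is the bootfs and /dev/mmcblk0p2 the rootfs (suggested by the log earlier in the thread) are not taken from the actual initramfs scripts.

```python
def mount_plan(partitions, rootfs_device):
    """Return mount commands with the rootfs first, the rest in the given order.

    partitions maps a device node to its mount point.
    """
    if rootfs_device not in partitions:
        raise ValueError(f"{rootfs_device} is not in the partition table")
    ordered = [rootfs_device] + [d for d in partitions if d != rootfs_device]
    return [["mount", dev, partitions[dev]] for dev in ordered]


# Hypothetical layout; the mount points are placeholders.
plan = mount_plan(
    {"/dev/mmcblk0p1": "/mnt/boot", "/dev/mmcblk0p2": "/mnt/root"},
    rootfs_device="/dev/mmcblk0p2",
)
```

Each entry in `plan` is an argument list that could be handed to `subprocess.run`; in a real initramfs the same ordering would more likely live in the init shell script.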