eMMC Timeout Error on Colibri iMX6S

A customer is running a continuous reboot test (rebooting after 0 s to 20 s) on a Colibri iMX6S. After around 15k test cycles, the iMX6 fails to boot with an eMMC timeout error.

[    2.149644] EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
[    2.157015] EXT4-fs (mmcblk0p2): write access will be enabled during recovery
[    2.318513] random: nonblocking pool is initialized
[    7.952390] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0x400e00
[   20.063585] INFO: task swapper/0:1 blocked for more than 10 seconds.
[   20.069949]       Not tainted 4.1.44-2.7.5+g18717e2 #1
[   20.075120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[   20.082956] swapper/0       D 8064f75c     0     1      0 0x00000000
[   20.089362] Backtrace: 
[   20.091844] [<8064f590>] (__schedule) from [<8064fb30>] (schedule+0x44/0x9c)
[   20.098922]  r10:862ed400 r9:00000000 r8:80650470 r7:00000002 r6:7fffffff r5:00000000
[   20.106857]  r4:86040000

...


[ 3902.453395] mmcblk0: timed out sending r/w cmd command, card status 0x400e00
[ 3912.483589] mmc0: Timeout waiting for hardware interrupt. retries left=0 opcode=0
[ 3912.493387] mmcblk0: timed out sending r/w cmd command, card status 0x400e00
[ 3922.523591] mmc0: Timeout waiting for hardware interrupt. retries left=0 opcode=0
[ 3922.533386] mmcblk0: timed out sending r/w cmd command, card status 0x400e00

The full log is attached here.
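
For readers trying to interpret the `card status 0x400e00` and `original cmd response 0x900` values in the log, here is a minimal Python sketch that decodes them. It assumes the values are standard R1 card status words and uses a subset of the bit definitions found in the Linux kernel header `include/linux/mmc/mmc.h`; the function name `decode_card_status` is made up for illustration. Note that error -110 is `ETIMEDOUT` on Linux.

```python
# Subset of the R1 card status bits as defined in the Linux kernel header
# include/linux/mmc/mmc.h. Only a few flags are listed here.
R1_FLAGS = {
    23: "COM_CRC_ERROR",
    22: "ILLEGAL_COMMAND",
    21: "CARD_ECC_FAILED",
    20: "CC_ERROR",
    19: "ERROR",
    8: "READY_FOR_DATA",
}

# CURRENT_STATE lives in bits 12:9 of the status word.
R1_STATES = ["idle", "ready", "ident", "stby", "tran",
             "data", "rcv", "prg", "dis", "btst", "slp"]


def decode_card_status(status):
    """Return the card state and the set flags for an R1 status word."""
    state_index = (status >> 9) & 0xF
    if state_index < len(R1_STATES):
        state = R1_STATES[state_index]
    else:
        state = "reserved"
    flags = [name for bit, name in sorted(R1_FLAGS.items(), reverse=True)
             if status & (1 << bit)]
    return {"state": state, "flags": flags}


if __name__ == "__main__":
    for value in (0x900, 0x400E00):
        print(hex(value), decode_card_status(value))
```

Under these assumptions, 0x900 decodes to the transfer state with READY_FOR_DATA set, while 0x400e00 decodes to the programming state with ILLEGAL_COMMAND set and READY_FOR_DATA clear. That would be consistent with a card that is stuck busy, but it is only a reading of the status bits, not a diagnosis.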

Unfortunately, that full log does not seem overly full (;-p). What exact module hardware version and serial number are you talking about? What exact carrier board is used? Is the provided power source stable? How exactly are the reboots done (e.g. soft reboot, power cuts or how/what exactly)? How many modules are tested?

The customer uses a Colibri iMX6S 256MB IT V1.1A, SN: 05194655, on the customer's own carrier board. The power source is a 12 V DC adapter. An MCU controls a relay that cuts power at a random time between 0 s and 20 s after the carrier board is powered on. Normally the application is launched within 16 seconds. Only one module has been tested so far, and it is the one that shows this issue.

What is the customer's application doing? Is it possible to reinstall an image and check if the errors are still there? Could the customer try a different module?

It is a car dashboard, so users frequently turn off the key while Linux is still running on the iMX6. For now this module is being kept as a sample and has not been reflashed. So far only one module has gone through this test; more modules will be tested next month. Is it possible that an accidental power cut could ruin the ext3 filesystem itself, especially while an I/O write is in progress?
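
When more modules go through the power-cut test, it may help to scan each captured boot log for the failure signatures seen above instead of reading the logs by hand. The following is a rough Python sketch under that assumption; the patterns are copied from the kernel messages quoted in this thread, and the names `SIGNATURES` and `classify_boot_log` are hypothetical.

```python
import re

# Patterns taken from the kernel messages quoted in this thread.
SIGNATURES = {
    "stop_cmd_timeout": re.compile(r"mmcblk\d+: error -110 sending stop command"),
    "rw_cmd_timeout": re.compile(r"mmcblk\d+: timed out sending r/w cmd command"),
    "hw_irq_timeout": re.compile(r"mmc\d+: Timeout waiting for hardware interrupt"),
    "hung_task": re.compile(r"INFO: task .+ blocked for more than \d+ seconds"),
    "ext4_recovery": re.compile(r"EXT4-fs \(\w+\): INFO: recovery required"),
}


def classify_boot_log(text):
    """Count how often each known signature appears in one boot log."""
    counts = {name: 0 for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[    2.149644] EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem\n"
        "[    7.952390] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0x400e00\n"
        "[   20.063585] INFO: task swapper/0:1 blocked for more than 10 seconds.\n"
    )
    print(classify_boot_log(sample))
```

An "ext4_recovery" hit alone is expected after a power cut (journal replay); the eMMC timeout signatures are the ones that would mark a failed cycle.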

Thanks for the update. Let us know your results.

Is it possible that an accidental power cut could ruin the ext3 filesystem itself, especially while an I/O write is in progress?

As far as I know, the corruption can affect some files but not the entire partition. Usually fsck should handle any errors on the partition. Please also have a look here.
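
As a rough illustration of the fsck suggestion, the Python sketch below runs `e2fsck` in read-only mode (`-n`, answer "no" to all questions) with a forced check (`-f`) and translates the exit code, which e2fsck reports as a sum of condition bits. The device path `/dev/mmcblk0p2` is an assumption taken from the log above, and the helper names are made up. Results on a mounted filesystem are not reliable, so this is best run from a separate boot medium or with the partition unmounted.

```python
import subprocess

# Exit code bits as documented in the e2fsck man page.
E2FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared library error",
}


def describe_fsck_exit(code):
    """Translate an e2fsck exit code into human-readable conditions."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_EXIT_BITS.items() if code & bit]


def check_partition(device="/dev/mmcblk0p2"):
    """Run a read-only forced check; the device path is an assumption."""
    result = subprocess.run(
        ["e2fsck", "-n", "-f", device],
        capture_output=True,
        text=True,
    )
    return describe_fsck_exit(result.returncode), result.stdout


if __name__ == "__main__":
    conditions, output = check_partition()
    print(conditions)
    print(output)
```

Because `-n` never modifies the disk, a damaged filesystem would show up here as "filesystem errors left uncorrected" rather than being repaired.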