Suspend/resume eMMC issue on T30

Dear Community.

I am testing a Colibri T30 in a Iris board and I am using the latest release from Toradex (actually discovered it on custom BSP based on this release but the problem occurs on a clean Toradex release as well).

I came across a problem with eMMC and suspend/resume to/from LP1. Upon resume it always fails to restore eMMC functionality and eventually re-mounts rootfs read-only but even then just produces a lot of errors. A log is attached below.

Thoughts?

.---O---.
|       |                  .-.           o o
|   |   |-----.-----.-----.| |   .----..-----.-----.
|       |     | __  |  ---'| '--.|  .-'|     |     |
|   |   |  |  |     |---  ||  --'|  |  |  '  | | | |
'---'---'--'--'--.  |-----''----''--'  '-----'-'-'-'
                -'  |
                '---'

The Angstrom Distribution colibri-t30 ttyS0

Angstrom v2015.12 - Kernel 3.1.10-V2.6b2+g34c5824

Colibri_T30_LinuxImageV2.6_20160331

colibri-t30 login: root
Last login: Thu Mar 31 16:11:15 UTC 2016 on ttyS0
root@colibri-t30:~# echo "+15" > /sys/class/rtc/rtc1/wakealarm && echo mem > /sys/power/state
[   74.462312] PM: Syncing filesystems ... done.
[   74.731498] tegradc tegradc.0: switching framebuffer to 640x480
[   74.738695] tegradc tegradc.0: Calculated sync href=0 vref=1
[   74.738721] tegradc tegradc.0: using mode 640x480 pclk=25175000 href=0 vref=1
[   74.800078] Tegra emc suspend: enabled bridge.emc
[   74.804982] Tegra cpufreq suspend: setting frequency to 620000 kHz
[   74.813518] Freezing user space processes ... (elapsed 0.01 seconds) done.
[   74.833690] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[   74.854511] [mmc]:mmc_queue_suspend:354 mmc0: blk_stop_queue start
[   74.860837] [mmc]:mmc_queue_suspend:354 mmc0: blk_stop_queue start
[   74.867191] [mmc]:mmc_queue_suspend:354 mmc0: blk_stop_queue start
[   74.873580] tegra_pwm tegra_pwm.3: pwm_disable called on disabled PWM
[   74.880183] tegra_pwm tegra_pwm.2: pwm_disable called on disabled PWM
[   74.886808] tegra_pwm tegra_pwm.1: pwm_disable called on disabled PWM
[   74.893593] alarm_suspend(e6b47000, 2)
[   74.900734] tegradc tegradc.1: suspend
[   74.904647] tegradc tegradc.0: suspend
[   74.908783] mpe mpe: suspend status: 0
[   74.912792] isp isp: suspend status: 0
[   74.916747] gr2d gr2d: suspend status: 0
[   74.920876] gr3d gr3d: suspend status: 0
[   74.925565] host1x host1x: suspend status: 0
[   74.929959] [mmc]:sdhci_pltfm_suspend:203 mmc1: ++
[   74.934903] [mmc]:sdhci_pltfm_suspend:218 mmc1: --
[   74.939822] [mmc]:sdhci_pltfm_suspend:203 mmc0: ++
[   74.958356] [mmc]:tegra_sdhci_suspend:885 mmc0: pull up data pin
[   74.965776] [mmc]:sdhci_pltfm_suspend:218 mmc0: --
[   74.973615] PM: suspend of devices complete after 119.348 msecs
[   74.980613] PM: late suspend of devices complete after 0.938 msecs
[   74.986958] Disabling non-boot CPUs ...
[   74.993652] Wake[31-0] level=0x10012
[   74.999149] Tegra3 wake[63-32] level=0x780
[   75.005127] Wake[31-0] enable=0x1050102
[   75.010805] Tegra3 wake[63-32] enable=0x700
[   75.016799] Entering suspend state LP1
[   75.022356] partition ve is left on before suspend
[   75.022356] partition vde is left on before suspend
[   75.022356] partition heg is left on before suspend
[   75.022434] Exited suspend state LP1
[   75.027702]  legacy wake status=0x0
[   75.032855]  tegra3 wake status=0x0
[   75.038050] Suspended for 13.828 seconds
[   75.044554] PM: early resume of devices complete after 0.629 msecs
[   75.060786] [mmc]:sdhci_pltfm_resume:229 mmc0: ++
[   75.067686] [mmc]:tegra_sdhci_resume:946 mmc0: disable data pin
[   75.145983] [mmc]:sdhci_pltfm_resume:242 mmc0: --
[   75.152500] [mmc]:sdhci_pltfm_resume:229 mmc1: ++
[   75.167025] [mmc]:sdhci_pltfm_resume:242 mmc1: --
[   75.173483] host1x host1x: resuming
[   75.195972] gr3d gr3d: resuming
[   75.200880] gr2d gr2d: resuming
[   75.205736] isp isp: resuming
[   75.216005] mpe mpe: resuming
[   75.220729] tegradc tegradc.0: resume
[   75.240931] tegradc tegradc.1: resume
[   75.316549] tegradc tegradc.1: using mode 0x0 pclk=0 href=0 vref=0
[   75.336808] alarm_resume(e6b47000)
[   75.341956] tegra_pwm tegra_pwm.1: pwm_disable called on disabled PWM
[   75.350199] tegra_pwm tegra_pwm.2: pwm_disable called on disabled PWM
[   75.358348] tegra_pwm tegra_pwm.3: pwm_disable called on disabled PWM
[   75.366511] [mmc]:mmc_queue_resume:375 mmc0: blk_start_queue start
[   75.374379] [mmc]:mmc_queue_resume:375 mmc0: blk_start_queue start
[   75.382217] [mmc]:mmc_queue_resume:375 mmc0: blk_start_queue start
[   75.390168] PM: resume of devices complete after 337.103 msecs
[   75.397921] Restarting tasks ...
[   75.401466] usb 1-1: USB disconnect, device number 2
[   75.417079] done.
[   75.421177] Tegra cpufreq resume: restoring frequency to 475000 kHz
[   75.434867] eth0: unregister 'asix' usb-tegra-ehci.1-1, ASIX AX88772B USB 2.0 Ethernet
[   75.451254] Tegra emc resume: disabled bridge.emc
[   75.479664] mmcblk0: error -110 transferring data, sector 287240, nr 2, cmd response 0x900, card status 0xc00
root@colibri-t30:~# [   75.580923] end_request: I/O error, dev mmcblk0, sector 287240
[   75.586782] Buffer I/O error on device mmcblk0p2, logical block 123140
[   75.593313] lost page write due to I/O error on mmcblk0p2
[   75.677222] mmcblk0: error -110 transferring data, sector 287242, nr 2, cmd response 0x900, card status 0xc00
[   75.697507] end_request: I/O error, dev mmcblk0, sector 287242
[   75.703364] Buffer I/O error on device mmcblk0p2, logical block 123141
[   75.709894] lost page write due to I/O error on mmcblk0p2
[   75.766768] ADDRCONF(NETDEV_UP): dummy0: link is not ready
[   75.772497] ADDRCONF(NETDEV_UP): sit0: link is not ready
[   75.777913] ADDRCONF(NETDEV_UP): ip6tnl0: link is not ready
[   75.783580] ADDRCONF(NETDEV_UP): rndis0: link is not ready
[   75.836205] usb 1-1: new high speed USB device number 3 using tegra-ehci
[   75.997687] usb 1-1: New USB device found, idVendor=0b95, idProduct=772b
[   76.004492] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   76.011664] usb 1-1: Product: AX88772B
[   76.015505] usb 1-1: Manufacturer: ASIX Elec. Corp.
[   76.020487] usb 1-1: SerialNumber: 000001
[   76.029322] ASIX USB Ethernet Adapter:v4.17.0 20:55:27 Mar 30 2016
[   76.029336]     http://www.asix.com.tw
[   76.798057] eth%d: status ep1in, 8 bytes period 11
[   76.821015] eth0: register 'asix' at usb-tegra-ehci.1-1, ASIX AX88772B USB 2.0 Ethernet, 00:14:2d:2a:77:c3
[   76.831979] eth0: rxqlen 0 --> 5
[   76.902884] eth0: ax88772b - Link status is: 0
[   76.947206] mmcblk0: error -110 transferring data, sector 287244, nr 2, cmd response 0x900, card status 0xc00
[   76.960013] end_request: I/O error, dev mmcblk0, sector 287244
[   76.965872] Buffer I/O error on device mmcblk0p2, logical block 123142
[   76.972402] lost page write due to I/O error on mmcblk0p2
[   77.011924] mmcblk0: error -110 transferring data, sector 287246, nr 2, cmd response 0x900, card status 0xc00
[   77.048913] end_request: I/O error, dev mmcblk0, sector 287246
[   77.054776] Buffer I/O error on device mmcblk0p2, logical block 123143
[   77.061307] lost page write due to I/O error on mmcblk0p2
[   77.087600] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   78.571983] eth0: ax88772b - Link status is: 1
[   78.577824] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   79.693254] mmcblk0: error -110 transferring data, sector 423484, nr 64, cmd response 0x900, card status 0xc00
[   79.703330] end_request: I/O error, dev mmcblk0, sector 423484
[   79.709184] end_request: I/O error, dev mmcblk0, sector 423486
[   79.715033] end_request: I/O error, dev mmcblk0, sector 423488
[   79.720877] end_request: I/O error, dev mmcblk0, sector 423490
[   79.726721] end_request: I/O error, dev mmcblk0, sector 423492
[   79.732564] end_request: I/O error, dev mmcblk0, sector 423494
[   79.738407] end_request: I/O error, dev mmcblk0, sector 423496
[   79.744249] end_request: I/O error, dev mmcblk0, sector 423498
[   79.750092] end_request: I/O error, dev mmcblk0, sector 423500
[   79.755935] end_request: I/O error, dev mmcblk0, sector 423502
[   79.761777] end_request: I/O error, dev mmcblk0, sector 423504
[   79.767619] end_request: I/O error, dev mmcblk0, sector 423506
[   79.773462] end_request: I/O error, dev mmcblk0, sector 423508
[   79.779304] end_request: I/O error, dev mmcblk0, sector 423510
[   79.785146] end_request: I/O error, dev mmcblk0, sector 423512
[   79.790988] end_request: I/O error, dev mmcblk0, sector 423514
[   79.796830] end_request: I/O error, dev mmcblk0, sector 423516
[   79.802671] end_request: I/O error, dev mmcblk0, sector 423518
[   79.808511] end_request: I/O error, dev mmcblk0, sector 423520
[   79.814353] end_request: I/O error, dev mmcblk0, sector 423522
[   79.820194] end_request: I/O error, dev mmcblk0, sector 423524
[   79.826035] end_request: I/O error, dev mmcblk0, sector 423526
[   79.831876] end_request: I/O error, dev mmcblk0, sector 423528
[   79.837717] end_request: I/O error, dev mmcblk0, sector 423530
[   79.843559] end_request: I/O error, dev mmcblk0, sector 423532
[   79.849432] end_request: I/O error, dev mmcblk0, sector 423534
[   79.855274] end_request: I/O error, dev mmcblk0, sector 423536
[   79.861115] end_request: I/O error, dev mmcblk0, sector 423538
[   79.866956] end_request: I/O error, dev mmcblk0, sector 423540
[   79.872796] end_request: I/O error, dev mmcblk0, sector 423542
[   79.878635] end_request: I/O error, dev mmcblk0, sector 423544
[   79.884476] end_request: I/O error, dev mmcblk0, sector 423546
[   79.891134] Aborting journal on device mmcblk0p2.
[   79.912061] mmcblk0: error -110 transferring data, sector 418308, nr 2, cmd response 0x900, card status 0xc00
[   79.922020] end_request: I/O error, dev mmcblk0, sector 418308
[   79.927871] Buffer I/O error on device mmcblk0p2, logical block 188674
[   79.934401] lost page write due to I/O error on mmcblk0p2
[   79.939879] JBD: I/O error detected when updating journal superblock for mmcblk0p2.
[   80.380477] EXT3-fs (mmcblk0p2): error: ext3_journal_start_sb: Detected aborted journal
[   80.417705] EXT3-fs (mmcblk0p2): error: remounting filesystem read-only

root@colibri-t30:~#

I will have to give that a try as well. Meanwhile what exact module version did you see this with and have you by any chance tried whether the same happens running one of our previous releases?

Module version is Col T30 1GB IT V1.1A.

I will give a test run on previous releases, have not done that yet. Will repport back.

Ok tested some old releases and got some interesting results.

Colibri_T30_LinuxImageV2.3Beta5_20141219 - WORKS!

Colibri_T30_LinuxImageV2.4Beta1_20150518 - DOES NOT WORK!

Might be easier to bisect now, will do a bisect as well since I am excited now :slight_smile:

Edit: The second time I ran the suspend/resume cycle on Colibri_T30_LinuxImageV2.3Beta5_20141219 I got following error:

[  178.704375] mmcblk0: timed out sending r/w cmd command, card status 0x900
[  178.711179] mmcblk0: not retrying timeout

But above seems like another issue possibly fixed by http://git.toradex.com/cgit/linux-toradex.git/commit/drivers/mmc?h=tegra&id=da189c7c55460940e980da6b474e31d664b20118 ?

And the one to blame is http://git.toradex.com/cgit/linux-toradex.git/commit/?h=tegra&id=7f7b042a978edfc6c7d1e14ec9c6ec01ffcc284b

This seems to fix it

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 1aed278..65990fd 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2439,6 +2439,15 @@ int sdhci_suspend_host(struct sdhci_host *host, pm_message_t state)
        }
 
        if (mmc->card) {
+               /*
+                * If eMMC cards are put in sleep state, Vccq can be disabled
+                * but Vcc would still be powered on. In resume, we only restore
+                * the controller context. So, set MMC_PM_KEEP_POWER flag.
+                */
+               if (mmc_card_can_sleep(mmc) &&
+                       !(mmc->caps & MMC_CAP2_NO_SLEEP_CMD))
+                       mmc->pm_flags = MMC_PM_KEEP_POWER;
+
                ret = mmc_suspend_host(host->mmc);
                if (ret)
                        goto err_suspend_host;

It does seem a bit odd that it was removed in http://git.toradex.com/cgit/linux-toradex.git/commit/?h=tegra&id=7f7b042a978edfc6c7d1e14ec9c6ec01ffcc284b

Hi Mirza, thanks for the update! The change originally came from a Android Tegra 3 tree. It really seems that this is a merge error, the change in the Android kernel does not has this removal:
https://bitbucket.org/morfic/seven/commits/8eadc6d514b7838c398ff3499ab5f2e012e2fc06?at=master

Thanks, Mirza. This is probably indeed my bad and I committed your fix.