IMX6ULL kernel warnings after suspend

Arnout · February 4, 2019, 11:57am

Hi,

To evaluate low-power modes of the IMX6ULL, I downloaded the latest console image through the Toradex EZ installer.
But when go into suspend and resume, I run into a lot of kernel errors.
Most of these are due to bluetooth (which is not supported on my module anyway), but also ethernet complains (see logs below in attachment here).
Is this a known issue?

Attachment snip:

root@colibri-imx6ull:~# echo mem > /sys/power/state
[  107.135964] PM: Syncing filesystems ... done.
[  107.183692] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  107.209819] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  107.253643] Suspending console(s) (use no_console_suspend to debug)
[  107.379384] ------------[ cut here ]------------
[  107.379421] WARNING: CPU: 0 PID: 291 at /workdir/oe/tmp-glibc/work-shared/colibri-imx6ull/kernel-source/drivers/clk/clk.c:594 clk_core_disable+0x8c/0x90
[  107.379451] Modules linked in: bluetooth(O) compat(O) usb_f_rndis u_ether mcp251x can_dev libcomposite configfs
[  107.379463] CPU: 0 PID: 291 Comm: sh Tainted: G        W  O    4.9.144-2.8.5+g5758a8e648a3 #1
[  107.379468] Hardware name: Freescale i.MX6 UltraLite (Device Tree)
[  107.379473] Backtrace:
[  107.379489] [<8010ba4c>] (dump_backtrace) from [<8010bd24>] (show_stack+0x18/0x1c)
[  107.379499]  r7:00000009 r6:60000093 r5:00000000 r4:80c1cab0
[  107.379511] [<8010bd0c>] (show_stack) from [<8042fca4>] (dump_stack+0x90/0xa4)
[  107.379524] [<8042fc14>] (dump_stack) from [<80125270>] (__warn+0xec/0x104)
[  107.379533]  r7:00000009 r6:809a6bec r5:00000000 r4:00000000
[  107.379548] [<80125184>] (__warn) from [<80125340>] (warn_slowpath_null+0x28/0x30)
..
..

[  107.383362] PM: suspend of devices complete after 105.413 msecs
[  107.385011] PM: late suspend of devices complete after 1.625 msecs
[  107.386547] PM: noirq suspend of devices complete after 1.514 msecs
[  107.386554] Disabling non-boot CPUs ...
[  107.387783] PM: noirq resume of devices complete after 1.098 msecs
[  107.389336] PM: early resume of devices complete after 1.248 msecs
[  107.391141] gpmi-nand 1806000.gpmi-nand: mode:5 ,failed in set feature.
[  107.403090] Suspended for 10.988 seconds
[  107.471227] PM: resume of devices complete after 81.861 msecs
[  108.597548] fec 20b4000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[  119.264035] Restarting tasks ... done.
root@colibri-imx6ull:~#
root@colibri-imx6ull:~#
root@colibri-imx6ull:~#
root@colibri-imx6ull:~# [  123.902177] fec 20b4000.ethernet eth0: rcv is not +last
[  123.916536] fec 20b4000.ethernet eth0: rcv is not +last
[  123.927360] fec 20b4000.ethernet eth0: rcv is not +last
[  123.937457] fec 20b4000.ethernet eth0: rcv is not +last
[  123.947285] fec 20b4000.ethernet eth0: rcv is not +last
[  123.956999] fec 20b4000.ethernet eth0: rcv is not +last
[  123.966883] fec 20b4000.ethernet eth0: rcv is not +last
[  123.976553] fec 20b4000.ethernet eth0: rcv is not +last
[  123.986215] fec 20b4000.ethernet eth0: rcv is not +last
[  124.452479] fec 20b4000.ethernet eth0: rcv is not +last

Kind regards,
Arnout

jaski.tx · February 4, 2019, 1:36pm

hi @Arnout

Could you provide the Software version of your module? Which carrier Board are you using?
Please share the dmesg log in a file?

Thanks and best regards,
Jaski

wang701 · March 3, 2019, 7:48am

Hello,

I am having a similar issue on my board as well. The kernel spits out the following after the board tries to go to suspend:

WARNING: CPU: 3 PID: 2054 at
/home/yang/isoblue-core/build/tmp-glibc/work-shared/apalis-imx6/kernel-source/drivers/clk/clk.c:594
clk_core_disable+0x8c/0x90
[  107.236739] Modules linked in: bluetooth usb_f_rndis u_ether libcomposite
configfs galcore(O)
[  107.245395] CPU: 3 PID: 2054 Comm: sh Tainted: G           O
4.9.144-2.8.5+g5758a8e #1
[  107.253585] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  107.260120] Backtrace:
[  107.262605] [<8010b760>] (dump_backtrace) from [<8010ba38>]
(show_stack+0x18/0x1c)
[  107.270189]  r7:00000009 r6:60010093 r5:00000000 r4:80e1aab0
[  107.275867] [<8010ba20>] (show_stack) from [<803fac24>]
(dump_stack+0x90/0xa4)
[  107.283106] [<803fab94>] (dump_stack) from [<80125928>] (__warn+0xec/0x104)
[  107.290081]  r7:00000009 r6:80b6bb00 r5:00000000 r4:00000000
[  107.295753] [<8012583c>] (__warn) from [<801259f8>]
(warn_slowpath_null+0x28/0x30)
[  107.303339]  r9:e21f4044 r8:e21f4010 r7:e210c0c0 r6:00000000 r5:e201fd80
r4:e201fd80
[  107.311097] [<801259d0>] (warn_slowpath_null) from [<8049043c>]
(clk_core_disable+0x8c/0x90)
[  107.319551] [<804903b0>] (clk_core_disable) from [<80490538>]
(clk_core_disable_lock+0x20/0x2c)
[  107.328260]  r5:e201fd80 r4:80010013
[  107.331849] [<80490518>] (clk_core_disable_lock) from [<80490568>]
(clk_disable+0x24/0x28)
[  107.340124]  r5:e21f4010 r4:e2651b40
[  107.343719] [<80490544>] (clk_disable) from [<807494dc>]
(sdhci_esdhc_suspend+0x108/0x128)
[  107.352008] [<807493d4>] (sdhci_esdhc_suspend) from [<805159f0>]
(platform_pm_suspend+0x34/0x5c)
[  107.360805]  r7:00000000 r6:00000000 r5:805159bc r4:e21f4010
[  107.366483] [<805159bc>] (platform_pm_suspend) from [<805205a0>]
(dpm_run_callback+0x3c/0xd0)
[  107.375028] [<80520564>] (dpm_run_callback) from [<80521564>]
(__device_suspend+0x124/0x3ec)
[  107.383480]  r8:00000002 r7:e21f4094 r6:00000000 r5:80ec6fc0 r4:e21f4010
[  107.390198] [<80521440>] (__device_suspend) from [<8052285c>]
(dpm_suspend+0x124/0x220)
[  107.398217]  r9:00000000 r8:80e25a50 r7:80ec6fc0 r6:80e25a50 r5:e21f4010
r4:e21f408c
[  107.405977] [<80522738>] (dpm_suspend) from [<80522c6c>]
(dpm_suspend_start+0x5c/0x60)
[  107.413911]  r10:e2de340c r9:e28bab00 r8:00000004 r7:80e70490 r6:00000003
r5:00000000
[  107.421749]  r4:00000002
[  107.424301] [<80522c10>] (dpm_suspend_start) from [<80169958>]
(suspend_devices_and_enter+0xb4/0x564)
[  107.433533]  r5:00000000 r4:80e70488
[  107.437122] [<801698a4>] (suspend_devices_and_enter) from [<8016a120>]
(pm_suspend+0x318/0x324)
[  107.445836]  r10:e2de340c r9:e28bab00 r8:00000004 r7:80e70490 r6:80e70494
r5:00000000
[  107.453673]  r4:00000003
[  107.456220] [<80169e08>] (pm_suspend) from [<80168a70>]
(state_store+0x78/0xc4)
[  107.463542]  r6:00000003 r5:80b30734 r4:00000003
[  107.468173] [<801689f8>] (state_store) from [<803fcba4>]
(kobj_attr_store+0x1c/0x28)
[  107.475930]  r9:e28bab00 r8:e317ff78 r7:00000000 r6:00000000 r5:e2de3400
r4:801689f8
[  107.483691] [<803fcb88>] (kobj_attr_store) from [<80278cf0>]
(sysfs_kf_write+0x40/0x4c)
[  107.491704]  r5:e2de3400 r4:803fcb88
[  107.495297] [<80278cb0>] (sysfs_kf_write) from [<8027849c>]
(kernfs_fop_write+0xf0/0x1d4)
[  107.503486]  r5:e2de3400 r4:00000004
[  107.507081] [<802783ac>] (kernfs_fop_write) from [<80209238>]
(__vfs_write+0x30/0x118)
[  107.515012]  r10:00000004 r9:00000004 r8:00000000 r7:e317ff78 r6:e317ff78
r5:e363acc0
[  107.522851]  r4:802783ac
[  107.525397] [<80209208>] (__vfs_write) from [<8020a07c>]
(vfs_write+0xac/0x170)
[  107.532721]  r9:00000004 r8:00000000 r7:e317ff78 r6:0077d500 r5:e363acc0
r4:00000004
[  107.540478] [<80209fd0>] (vfs_write) from [<8020aee0>] (SyS_write+0x44/0x98)
[  107.547540]  r9:00000004 r8:0077d500 r7:00000000 r6:00000000 r5:e363acc0
r4:e363acc0
[  107.555300] [<8020ae9c>] (SyS_write) from [<80107be0>]
(ret_fast_syscall+0x0/0x48)
[  107.562887]  r9:e317e000 r8:80107dc4 r7:00000004 r6:76f41db0 r5:0077d500
r4:00000004
[  107.570639] ---[ end trace 5e4d14e58bb78ad4 ]---
[  107.575340] ------------[ cut here ]------------

After waking up the board, the system doesn’t work anymore (e.g., systemctl status hangs). And the following shows up every once in a while:

[  166.014238] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[  166.033700] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00000002
[  166.053092] mmc0: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000010
[  166.072308] mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000023
[  166.091384] mmc0: sdhci: Present:   0x01fd8009 | Host ctl: 0x00000010
[  166.110348] mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
[  166.*129222] mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x000010ff
[  166.147944] mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00010000
[  166.166541] mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
[  166.185056] mmc0: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000003
[  166.203483] mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x0000a007
[  166.221798] mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00ffffff
[  166.240008] mmc0: sdhci: Resp[0]:   0x00ff8080 | Resp[1]:  0xffffffff
[  166.258106] mmc0: sdhci: Resp[2]:   0x320f5913 | Resp[3]:  0x00000900
[  166.276110] mmc0: sdhci: Host ctl2: 0x00000000
[  166.291998] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
[  166.309897] mmc0: sdhci: ============================================
[  166.330971] mmcblk0: error -110 sending status command, retrying

My custom image’s kernel is based on the Apalis-iMX6_LXDE-Image_2.8b5.156-20181228 tag of meta-toradex-nxp.

My hardware:
Apalis imx6Q 2GB IT V1.1C
Ixora motherboard V1.1A

Arnout · May 3, 2019, 8:50am

I just typed an entire post explaining how I fixed the issue, but when submitting it Toradex told me “attack detected - could not issue request”. K guys, fine

jaski.tx · May 3, 2019, 10:50am

Hi @Arnout

Thanks for your Input.
What exactly are you trying to do?

Which Issue did you fix? How do you wanted to submit your code?

Arnout · May 3, 2019, 10:59am

@@ -1589,77 +1592,21 @@ static int sdhci_esdhc_imx_remove(struct platform_device *pdev)
 static int sdhci_esdhc_suspend(struct device *dev)
 {
        struct sdhci_host *host = dev_get_drvdata(dev);
-       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
-       struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host);
-       int ret;
-
-       pm_runtime_get_sync(dev);
-
-       if (host->mmc->caps2 & MMC_CAP2_CQE) {
-               ret = cqhci_suspend(host->mmc);
-               if (ret)
-                       return ret;
-       }
-
-       if ((imx_data->socdata->flags & ESDHC_FLAG_STATE_LOST_IN_LPMODE) &&
-           (host->tuning_mode != SDHCI_TUNING_MODE_1)) {
-               mmc_retune_timer_stop(host->mmc);
-               mmc_retune_needed(host->mmc);
-       }
 
        if (host->tuning_mode != SDHCI_TUNING_MODE_3)
                mmc_retune_needed(host->mmc);
 
-       ret = sdhci_suspend_host(host);
-       pm_runtime_force_suspend(dev);
-
-       pinctrl_pm_select_sleep_state(dev);
-
-       if (!sdhci_sdio_irq_enabled(host)) {
-               clk_disable_unprepare(imx_data->clk_per);
-               clk_disable_unprepare(imx_data->clk_ipg);
-       }
-       clk_disable_unprepare(imx_data->clk_ahb);
-
-       pm_runtime_disable(dev);
-       pm_runtime_set_suspended(dev);
-
-       return ret;
+       return sdhci_suspend_host(host);
 }
 
 static int sdhci_esdhc_resume(struct device *dev)
 {
        struct sdhci_host *host = dev_get_drvdata(dev);
-       struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
-       struct pltfm_imx_data *imx_data = sdhci_pltfm_priv(pltfm_host);
-       int ret;
-
-       if (!sdhci_sdio_irq_enabled(host)) {
-               clk_prepare_enable(imx_data->clk_per);
-               clk_prepare_enable(imx_data->clk_ipg);
-       }
-       clk_prepare_enable(imx_data->clk_ahb);
-
-       pm_runtime_set_active(dev);
-       pm_runtime_enable(dev);
-
-       pinctrl_pm_select_default_state(dev);
 
        /* re-initialize hw state in case it's lost in low power mode */
        sdhci_esdhc_imx_hwinit(host);
 
-       ret = sdhci_resume_host(host);
-       if (ret)
-               return ret;
-
-       if (host->mmc->caps2 & MMC_CAP2_CQE)
-               ret = cqhci_resume(host->mmc);
-
-       pm_runtime_force_resume(dev);
-       pm_runtime_mark_last_busy(dev);
-       pm_runtime_put_autosuspend(dev);
-
-       return ret;
+       return sdhci_resume_host(host);
 }
 #endif

Arnout · May 3, 2019, 11:00am

The issue was that drivers/mmc/host/sdhci-esdhc-imx.c tried to disable clocks twice. In the upstream v5.0 kernel, this behavior is different, and I ported it. This fixes the issue for me. (This is the short explanation)

jaski.tx · May 3, 2019, 1:05pm

Ok, thanks for the patch. We will check it and come back to you.
Best regards,
Jaski