CPU4 kernel panic, reduced A72 speed

Hello,

I have been getting a kernel panic on CPU4 (an asynchronous SError interrupt) during the shutdown process (see log below). The A72 speed has been reduced in an overlay for better runtime stability in hot environments. Is there something I’m missing?

Background: I throttled the A72 speed to 1.296 GHz (down from 1.596 GHz) to prevent lock-ups while running in a hot environment. This solved the problem of the system locking up (and rebooting) during start-up; however, an interrupt error occurs when attempting a clean shutdown.

Modifications: The build includes the following overlay to reduce the CPU speed:

a72_opp_table: a72-opp-table {
        /delete-node/ opp-1596000000;
};
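
Depending on how the overlay is compiled and applied, a /delete-node/ directive inside an overlay may not actually remove the node from the base device tree, so it may be worth confirming at runtime that the 1.596 GHz operating point is gone. The Python sketch below reads the cpufreq sysfs file scaling_available_frequencies (values in kHz) for the A72 cores. It assumes the A72 cluster is CPUs 4 and 5 and that the cpufreq driver exposes a frequency table; the constant and function names are illustrative, not part of the original build.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: CPUs 4-5 are the Cortex-A72 cluster on the i.MX8QM (CPUs 0-3 are
# the Cortex-A53 cluster). Adjust if your CPU numbering differs.
A72_CPUS = (4, 5)
SYSFS_CPU = Path("/sys/devices/system/cpu")
REMOVED_OPP_KHZ = 1_596_000   # opp-1596000000 (Hz) as cpufreq reports it (kHz)
EXPECTED_MAX_KHZ = 1_296_000  # the intended 1.296 GHz ceiling


def parse_frequencies(text: str) -> List[int]:
    """Parse a space-separated list of frequencies in kHz, sorted ascending."""
    return sorted(int(tok) for tok in text.split())


def read_frequencies(cpu: int, sysfs: Path = SYSFS_CPU) -> Optional[List[int]]:
    """Return the OPPs cpufreq exposes for one CPU, or None if unavailable."""
    path = sysfs / f"cpu{cpu}" / "cpufreq" / "scaling_available_frequencies"
    if not path.exists():
        return None  # no cpufreq policy for this CPU, or not running on the target
    return parse_frequencies(path.read_text())


def overlay_applied(freqs_khz: List[int]) -> bool:
    """True if the 1.596 GHz OPP is gone and nothing exceeds 1.296 GHz."""
    return (
        bool(freqs_khz)
        and REMOVED_OPP_KHZ not in freqs_khz
        and max(freqs_khz) <= EXPECTED_MAX_KHZ
    )


if __name__ == "__main__":
    for cpu in A72_CPUS:
        freqs = read_frequencies(cpu)
        if freqs is None:
            print(f"cpu{cpu}: no cpufreq information found")
        elif overlay_applied(freqs):
            print(f"cpu{cpu}: OK, available OPPs (kHz): {freqs}")
        else:
            print(f"cpu{cpu}: 1.596 GHz OPP still present? OPPs (kHz): {freqs}")
```

The same information can be checked by hand by reading that sysfs file on cpu4; the script only automates the comparison against the intended ceiling.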

Error log:

2024-12-21T00:54:34.072Z	Dec 21 00:54:33 pppd[2127]: Sent 159736 bytes, received 35649 bytes.
2024-12-21T00:54:34.072Z	D[  166.623178] SError Interrupt on CPU4, code 0xbf000002 -- SError
[  166.623190] CPU: 4 PID: 35 Comm: kworker/4:0H Not tainted 5.15.77 #1
[  166.623202] Hardware name: EMU 3 Apalis iMX8QM V1.1 (DT)
[  166.623209] Workqueue: kblockd blk_mq_run_work_fn
[  166.623239] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  166.623251] pc : esdhc_readl_le+0xc/0x188
[  166.623266] lr : sdhci_send_command+0x5dc/0xd10
[  166.623279] sp : ffff8000097bb930
[  166.623283] x29: ffff8000097bb930 x28: ffff00080198f7c8 x27: ffff000801947000
[  166.623300] x26: ffff00080003b000 x25: 0000000000000000 x24: 0000000000000000
[  166.623313] x23: ffff00080198f948 x22: ffff800009022000 x21: 0000000000000003
[  166.623325] x20: ffff00080198f948 x19: ffff00080003b580 x18: ffff800009022b08
[  166.623337] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000002000
[  166.623349] x14: 0000000000000000 x13: ffff0008019bb200 x12: 0000000000000000
[  166.623362] x11: 000000088fd3f000 x10: ffff8000097bb958 x9 : 0000000000000e43
[  166.623374] x8 : 0000000000002000 x7 : 0000000000001000 x6 : ffff0008019bb200
[  166.623386] x5 : ffff00080003b810 x4 : 0000000000000000 x3 : ffff00080003b810
[  166.623398] x2 : 0000000000000008 x1 : 0000000000000024 x0 : ffff00080003b580
[  166.623415] Kernel panic - not syncing: Asynchronous SError Interrupt
[  166.623421] CPU: 4 PID: 35 Comm: kworker/4:0H Not tainted 5.15.77 #1
[  166.623431] Hardware name: EMU 3 Apalis iMX8QM V1.1 (DT)
[  166.623435] Workqueue: kblockd blk_mq_run_work_fn
[  166.623449] Call trace:
[  166.623452]  dump_backtrace+0x0/0x1b0
[  166.623468]  show_stack+0x14/0x58
[  166.623480]  dump_stack_lvl+0x64/0x7c
[  166.623492]  dump_stack+0x14/0x2c
[  166.623500]  panic+0x15c/0x364
[  166.623508]  add_taint+0x0/0xb0
[  166.623520]  arm64_serror_panic+0x68/0x78
[  166.623528]  do_serror+0x24/0x58
[  166.623534]  el1h_64_error_handler+0x30/0x48
[  166.623549]  el1h_64_error+0x74/0x78
[  166.623557]  esdhc_readl_le+0xc/0x188
[  166.623567]  sdhci_send_command_retry+0x44/0x1c0
[  166.623579]  sdhci_request+0xf4/0x140
[  166.623590]  __mmc_start_request+0x5c/0x140
[  166.623602]  mmc_start_request+0x80/0xa8
[  166.623612]  mmc_blk_mq_issue_rq+0x648/0xaf8
[  166.623620]  mmc_mq_queue_rq+0x1d4/0x330
[  166.623628]  blk_mq_dispatch_rq_list+0x124/0x940
[  166.623642]  blk_mq_do_dispatch_sched+0x2b4/0x318
[  166.623652]  __blk_mq_sched_dispatch_requests+0xf4/0x1d8
[  166.623661]  blk_mq_sched_dispatch_requests+0x34/0x70
[  166.623670]  __blk_mq_run_hw_queue+0x7c/0xa0
[  166.623682]  blk_mq_run_work_fn+0x20/0x28
[  166.623692]  process_one_work+0x15c/0x3c8
[  166.623706]  worker_thread+0x13c/0x548
[  166.623717]  kthread+0xf0/0x128
[  166.623729]  ret_from_fork+0x10/0x20
[  166.623740] SMP: stopping secondary CPUs
[  167.770801] SMP: failed to stop secondary CPUs 0,4
[  167.770814] Kernel Offset: disabled
[  167.770816] CPU features: 0x800820b1,20000842
[  167.770823] Memory Limit: none
[  168.033852] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
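
The "code 0xbf000002" in the first line of the log is the exception syndrome register (ESR) value that the arm64 kernel prints for an SError. A small Python sketch of how that value splits into its architectural fields, assuming the ARMv8-A ESR_ELx layout for SError exceptions; the function name is illustrative.

```python
# Field layout of ESR_ELx for an SError interrupt (Arm Architecture Reference
# Manual, ARMv8-A):
#   bits [31:26] EC  - exception class; 0x2F means "SError interrupt"
#   bit  [25]    IL  - instruction length bit
#   bit  [24]    IDS - 1 means the remaining syndrome is IMPLEMENTATION DEFINED
#   bits [23:0]      - syndrome (architected RAS fields only when IDS == 0)
EC_SERROR = 0x2F


def decode_serror_esr(esr: int) -> dict:
    """Split a 32-bit ESR value, as printed by the kernel, into its fields."""
    ec = (esr >> 26) & 0x3F
    return {
        "ec": ec,
        "is_serror": ec == EC_SERROR,
        "il": (esr >> 25) & 0x1,
        "ids": (esr >> 24) & 0x1,
        "syndrome": esr & 0xFFFFFF,
    }


if __name__ == "__main__":
    # Value taken from "SError Interrupt on CPU4, code 0xbf000002" in the log.
    print(decode_serror_esr(0xBF000002))
```

For 0xbf000002 this gives EC = 0x2F (SError), IL = 1, IDS = 1 and an implementation-defined syndrome of 0x2. With IDS set, the kernel cannot classify the error as a recoverable RAS error, so do_serror() calls arm64_serror_panic(), which matches the call trace. Because SErrors are asynchronous, the reported PC (esdhc_readl_le) is where the error was taken, usually at or shortly after the offending access; a uSDHC register read rejected by the interconnect (for example, because the controller's clock or power domain was already off during shutdown) is one plausible cause to investigate.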

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!



Hello,

Thank you for reaching out about the kernel panic on CPU4 during shutdown. The change to the A72 CPU speed may be contributing to this problem. Here are some steps you can take to narrow it down:

  1. Verify Overlay Changes: Double-check that the overlay removing the opp-1596000000 node is actually applied and does not inadvertently affect other parts of the system.

  2. Review Known Issues: Look into any errata or known issues related to the Apalis iMX8QM module that might be relevant to your situation.

  3. Kernel Compatibility: Ensure that the overlay was built against the device tree matching your kernel version (5.15.77). Kernel panics can result from mismatches between the kernel and the hardware description.

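  4. Analyze Error Logs: The error log shows an asynchronous SError interrupt on CPU4 raised while the kblockd workqueue was issuing an MMC block request; the faulting PC is esdhc_readl_le, called from sdhci_send_command, i.e. a register read of the i.MX uSDHC (SDHCI) controller. Investigate potential issues with the SDHCI driver, the attached MMC/SD device, or related components.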

  5. Thermal Management: Since you are operating in a hot environment, confirm that your thermal management settings are adequate. While reducing CPU speed helps, additional cooling or thermal throttling settings might be necessary.

  6. Test Default Settings: Temporarily revert the CPU speed to its default setting to determine if the speed reduction is directly causing the issue.
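
For steps 5 and 6, it can help to record the SoC temperatures while reproducing the shutdown with and without the frequency limit. A minimal Python sketch, assuming the standard Linux thermal sysfs interface (/sys/class/thermal/thermal_zone*/temp, reported in millidegrees Celsius); the zone names it prints come from the device tree and are not assumed here, and the function names are illustrative.

```python
from pathlib import Path
from typing import Dict

THERMAL_ROOT = Path("/sys/class/thermal")


def millideg_to_celsius(raw: str) -> float:
    """Thermal zone 'temp' files report millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(root: Path = THERMAL_ROOT) -> Dict[str, float]:
    """Map 'thermal_zoneN (type)' to the current temperature in degrees C."""
    temps: Dict[str, float] = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temps[f"{zone.name} ({kind})"] = millideg_to_celsius(
                (zone / "temp").read_text()
            )
        except (OSError, ValueError):
            continue  # zone without a readable sensor; skip it
    return temps


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Running it just before issuing the shutdown, in both configurations, gives a rough indication of whether temperature differs noticeably when the panic occurs.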

By following these steps, you should be able to narrow down the cause of the kernel panic and find a suitable solution. If the problem persists, further investigation into your specific hardware and software configuration may be necessary.

Best regards,
Toradex AI Assistant

Please also check whether these links help solve your problem:

No HDMI output on IMX8QM and Ixora Carrier Board.

Hello @aaronlal, the kernel panic log points to an MMC communication error. Is there an external SD card or SDIO device connected? Can this issue be reproduced on one of our carrier boards, e.g. Ixora?
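
One way to answer the question about attached SD/SDIO devices from the running system is to list the cards registered on the Linux MMC bus. A minimal Python sketch, assuming the standard MMC core sysfs layout (/sys/bus/mmc/devices/<host>:<rca>/type); the function names are illustrative. The on-module eMMC should appear with type MMC, so any SD or SDIO entry points to an additional device sharing the MMC subsystem.

```python
from pathlib import Path
from typing import Dict

MMC_BUS = Path("/sys/bus/mmc/devices")


def list_mmc_cards(bus: Path = MMC_BUS) -> Dict[str, str]:
    """Map each card on the MMC bus (named like 'mmc0:0001') to its type."""
    cards: Dict[str, str] = {}
    for dev in sorted(bus.glob("mmc*:*")):
        try:
            # The MMC core reports MMC, SD, SDIO or SDcombo here.
            cards[dev.name] = (dev / "type").read_text().strip()
        except OSError:
            cards[dev.name] = "unknown"
    return cards


def non_emmc_cards(cards: Dict[str, str]) -> Dict[str, str]:
    """Drop eMMC ('MMC') entries, leaving SD, SDIO and anything unrecognized."""
    return {name: kind for name, kind in cards.items() if kind != "MMC"}


if __name__ == "__main__":
    cards = list_mmc_cards()
    print("all MMC-bus cards:", cards)
    print("SD/SDIO cards:", non_emmc_cards(cards))
```

Listing /sys/bus/mmc/devices by hand gives the same information; the sketch only adds the filtering.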