TK1 freezes when using Ethernet

Hi,
I am developing a custom image based on BSP 2.8 using the Toradex mainline kernel. I have noticed that the TK1 V1.2A freezes when the Ethernet connection is used, e.g. when testing the connection with iperf. The same image works without problems on a V1.1A module. It's hard to say what exactly goes wrong, because even the debug console hangs. The orange LED of the Ethernet connector keeps blinking, but the device does not respond…

I am not aware of any such issue. Which exact BSP 2.8 and Linux kernel versions are you using?

I assume you are aware of the differences between V1.1A and V1.2A modules and have made sure that the respective device trees are deployed and in effect when booting?

https://developer.toradex.com/knowledge-base/apalis-tk1-v1-2a-specific-software-modifications
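One way to check which device tree is actually in effect is to read the root `compatible` property the running kernel booted with, which is exposed as NUL-separated strings under `/proc/device-tree/compatible`. The sketch below assumes that the V1.2A device tree advertises the string `toradex,apalis-tk1-v1.2`; compare against the compatible strings in the .dts you actually deploy. The function names and the `DT_COMPATIBLE` override are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect the device tree the running kernel booted with.
# The root "compatible" property is a list of NUL-separated strings.
# DT_COMPATIBLE can be overridden (e.g. for testing against a copy).
DT_COMPATIBLE=${DT_COMPATIBLE:-/proc/device-tree/compatible}

dt_compatibles() {
    # Print one compatible string per line.
    tr '\0' '\n' < "$DT_COMPATIBLE"
}

dt_is_compatible() {
    # Succeeds if $1 matches one of the compatible strings exactly.
    dt_compatibles | grep -qxF "$1"
}

# Usage on the module (assumed V1.2 compatible string, verify in your .dts):
# dt_is_compatible toradex,apalis-tk1-v1.2 && echo "V1.2 device tree" || echo "not the V1.2 device tree"
```

Matching whole lines with `grep -x` avoids a false positive where one compatible string is a prefix of another (for example an `-eval` board variant).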

BTW: The Toradex Easy Installer is also based on the mainline Linux kernel; version 2.0b4 even uses the latest 5.4 LTS.

https://developer.toradex.com/software/toradex-easy-installer

http://git.toradex.com/cgit/linux-toradex.git/log/?h=toradex_5.4.y

The problem still persists. I am aware of the different device trees, and the correct file is loaded. How do I determine the exact BSP 2.8 version? The kernel version is:

Linux TK1 4.14.109-2.8.6 #2 SMP PREEMPT Fri Mar 6 10:57:42 UTC 2020 armv7l armv7l armv7l GNU/Linux
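The BSP release appears to be encoded in the kernel release string itself: in `4.14.109-2.8.6` the part after the first hyphen looks like the Toradex BSP version (2.8.6), while `4.14.109` is the upstream kernel version. A minimal sketch that splits `uname -r` under that assumption; the helper names are hypothetical, and a trailing build tag such as `+g<hash>` is dropped on the assumption that it is not part of the BSP number.

```shell
#!/bin/sh
# Hypothetical helpers: split a kernel release string such as
# "4.14.109-2.8.6" into the upstream kernel version and the suffix that
# (assumption) carries the Toradex BSP version.

kernel_version_from_release() {
    # Everything before the first hyphen, e.g. "4.14.109".
    printf '%s\n' "${1%%-*}"
}

bsp_version_from_release() {
    rel=$1
    case $rel in
        *-*) ;;                       # has a local-version suffix
        *) return 1 ;;                # no suffix: nothing to report
    esac
    suffix=${rel#*-}                  # e.g. "2.8.6" or "2.8.6+gabc123"
    printf '%s\n' "${suffix%%+*}"     # drop a "+g<hash>" build tag if present
}

# Usage on the module:
# rel=$(uname -r)
# echo "kernel $(kernel_version_from_release "$rel"), BSP $(bsp_version_from_release "$rel")"
```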

Some additional information about the used layers:

user@user:~/oe-core$ repo info
repo: warning: Python 2 is no longer supported; Please upgrade to Python 3.6+.

... A new version of repo (2.4) is available.
... You should upgrade soon:

    cp /home/user/oe-core/.repo/repo/repo /home/user/bin/repo

Manifest branch: 
Manifest merge branch: refs/heads/LinuxImageV2.8
Manifest groups: all,-notdefault
----------------------------
Project: meta-angstrom.git
Mount path: /home/user/oe-core/layers/meta-angstrom
Current revision: 4318892e08ea9102d29bdd92af83539bea985e4b
Manifest revision: 4318892e08ea9102d29bdd92af83539bea985e4b
Local Branches: 0
----------------------------
Project: meta-browser.git
Mount path: /home/user/oe-core/layers/meta-browser
Current revision: 75640e14e325479c076b6272b646be7a239c18aa
Manifest revision: 75640e14e325479c076b6272b646be7a239c18aa
Local Branches: 0
----------------------------
Project: meta-freescale.git
Mount path: /home/user/oe-core/layers/meta-freescale
Current revision: 61ab34ac6d664a229847b796ec20fd9f7c8ecbf4
Manifest revision: 61ab34ac6d664a229847b796ec20fd9f7c8ecbf4
Local Branches: 0
----------------------------
Project: meta-freescale-3rdparty.git
Mount path: /home/user/oe-core/layers/meta-freescale-3rdparty
Current revision: e71ace9ede9b58f2ed3381a53fdc814f8e963c60
Manifest revision: e71ace9ede9b58f2ed3381a53fdc814f8e963c60
Local Branches: 0
----------------------------
Project: meta-freescale-distro.git
Mount path: /home/user/oe-core/layers/meta-freescale-distro
Current revision: 51756d1c2058139c8a21f89b86cfd8007b71b7f0
Manifest revision: 51756d1c2058139c8a21f89b86cfd8007b71b7f0
Local Branches: 0
----------------------------
Project: meta-jetson-tk1.git
Mount path: /home/user/oe-core/layers/meta-jetson-tk1
Current revision: e8b87fe8da7c6fcffa37ab245f50082953cc1ee1
Manifest revision: e8b87fe8da7c6fcffa37ab245f50082953cc1ee1
Local Branches: 0
----------------------------
Project: meta-lxde.git
Mount path: /home/user/oe-core/layers/meta-lxde
Current revision: f436137fcc4ac700dc5c1b5e31e5b3c27568fc3e
Manifest revision: f436137fcc4ac700dc5c1b5e31e5b3c27568fc3e
Local Branches: 0
----------------------------
Project: meta-openembedded.git
Mount path: /home/user/oe-core/layers/meta-openembedded
Current revision: eae996301d9c097bcbeb8046f08041dc82bb62f8
Manifest revision: eae996301d9c097bcbeb8046f08041dc82bb62f8
Local Branches: 0
----------------------------
Project: meta-qt4
Mount path: /home/user/oe-core/layers/meta-qt4
Current revision: e290738759ef3f39c9e079eaa9b606a62107e5ba
Manifest revision: e290738759ef3f39c9e079eaa9b606a62107e5ba
Local Branches: 0
----------------------------
Project: meta-qt5.git
Mount path: /home/user/oe-core/layers/meta-qt5
Current revision: d8b531530fa42b59aa0a5b123d87a30d749cbcc4
Manifest revision: d8b531530fa42b59aa0a5b123d87a30d749cbcc4
Local Branches: 0
----------------------------
Project: meta-qt5-extra.git
Mount path: /home/user/oe-core/layers/meta-qt5-extra
Current revision: 79e26686520f2ce5f743975e90116b263a33697f
Manifest revision: 79e26686520f2ce5f743975e90116b263a33697f
Local Branches: 0
----------------------------
Project: meta-toradex-bsp-common.git
Mount path: /home/user/oe-core/layers/meta-toradex-bsp-common
Current revision: 78e1cdabe0baeb794945bd4926c28058f9f12aa9
Manifest revision: 78e1cdabe0baeb794945bd4926c28058f9f12aa9
Local Branches: 0
----------------------------
Project: meta-toradex-demos.git
Mount path: /home/user/oe-core/layers/meta-toradex-demos
Current revision: 014344d116f7c17c10be35d0d2a66a0038df6c75
Manifest revision: 014344d116f7c17c10be35d0d2a66a0038df6c75
Local Branches: 0
----------------------------
Project: meta-toradex-nxp.git
Mount path: /home/user/oe-core/layers/meta-toradex-nxp
Current revision: a5f5d85e52717e136e692478fb5d822c1ce70046
Manifest revision: a5f5d85e52717e136e692478fb5d822c1ce70046
Local Branches: 0
----------------------------
Project: meta-toradex-tegra.git
Mount path: /home/user/oe-core/layers/meta-toradex-tegra
Current revision: c10c7168712fa47e50440a7b7b012118939a3371
Manifest revision: c10c7168712fa47e50440a7b7b012118939a3371
Local Branches: 0
----------------------------
Project: openembedded-core.git
Mount path: /home/user/oe-core/layers/openembedded-core
Current revision: 3638cb32ba9ba32b4d498fc31ab7fdf82f0d2495
Manifest revision: 3638cb32ba9ba32b4d498fc31ab7fdf82f0d2495
Local Branches: 0
----------------------------
Project: bitbake.git
Mount path: /home/user/oe-core/layers/openembedded-core/bitbake
Current revision: 2e4845c526117d5524cff910afbb6f8212a3e199
Manifest revision: 2e4845c526117d5524cff910afbb6f8212a3e199
Local Branches: 0
----------------------------

Maybe this crash report is useful:

[  217.240157] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  217.246502] 	(detected by 0, t=4214 jiffies, g=1170, c=1169, q=4)
[  217.253150] All QSes seen, last rcu_preempt kthread activity 4212 (-8277--12489), jiffies_till_next_fqs=1, root ->qsmask 0x0
[  217.264634] kworker/0:3     R  running task        0   403      2 0x00000000
[  217.272184] Workqueue: events igb_watchdog_task
[  217.277286] [<c0111abc>] (unwind_backtrace) from [<c010c5d8>] (show_stack+0x10/0x14)
[  217.285298] [<c010c5d8>] (show_stack) from [<c0189220>] (rcu_check_callbacks+0xb8c/0xba4)
[  217.293625] [<c0189220>] (rcu_check_callbacks) from [<c018eedc>] (update_process_times+0x34/0x5c)
[  217.303329] [<c018eedc>] (update_process_times) from [<c01a08b8>] (tick_sched_timer+0x40/0x9c)
[  217.312152] [<c01a08b8>] (tick_sched_timer) from [<c01901c4>] (__hrtimer_run_queues+0x154/0x3c4)
[  217.321829] [<c01901c4>] (__hrtimer_run_queues) from [<c019064c>] (hrtimer_interrupt+0x98/0x1ec)
[  217.331626] [<c019064c>] (hrtimer_interrupt) from [<c07083d4>] (arch_timer_handler_virt+0x28/0x30)
[  217.341612] [<c07083d4>] (arch_timer_handler_virt) from [<c017cb90>] (handle_percpu_devid_irq+0x8c/0x2b8)
[  217.352170] [<c017cb90>] (handle_percpu_devid_irq) from [<c0177bc8>] (generic_handle_irq+0x24/0x34)
[  217.362190] [<c0177bc8>] (generic_handle_irq) from [<c01780d4>] (__handle_domain_irq+0x5c/0xb4)
[  217.371890] [<c01780d4>] (__handle_domain_irq) from [<c010141c>] (gic_handle_irq+0x4c/0x90)
[  217.381267] [<c010141c>] (gic_handle_irq) from [<c010d18c>] (__irq_svc+0x6c/0xa8)
[  217.389024] Exception stack(0xc29a5e30 to 0xc29a5e78)
[  217.394676] 5e20:                                     ee104918 0000c030 f0b2c030 f0b20000
[  217.403499] 5e40: 00000000 0000c030 00000000 ee10477c ee104918 c38be594 ee1044c0 ee104860
[  217.412324] 5e60: 00000000 c29a5e80 c05d5300 c05cee60 08070013 ffffffff
[  217.419671] [<c010d18c>] (__irq_svc) from [<c05cee60>] (igb_rd32+0x24/0x7c)
[  217.427554] [<c05cee60>] (igb_rd32) from [<c05d5300>] (igb_update_stats+0x74/0xccc)
[  217.436105] [<c05d5300>] (igb_update_stats) from [<c05d604c>] (igb_watchdog_task+0xf4/0x72c)
[  217.445495] [<c05d604c>] (igb_watchdog_task) from [<c013ba48>] (process_one_work+0x1ec/0x580)
[  217.455091] [<c013ba48>] (process_one_work) from [<c013c8a8>] (worker_thread+0x50/0x598)
[  217.464099] [<c013c8a8>] (worker_thread) from [<c0141748>] (kthread+0x150/0x158)
[  217.472485] [<c0141748>] (kthread) from [<c01087c8>] (ret_from_fork+0x14/0x2c)
[  217.480039] rcu_preempt kthread starved for 4232 jiffies! g1170 c1169 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x200 ->cpu=0
[  217.491036] rcu_preempt     R    0     8      2 0x00000000
[  217.497018] [<c09e5b1c>] (__schedule) from [<c09e625c>] (schedule+0x54/0xc0)
[  217.504876] [<c09e625c>] (schedule) from [<c09ea680>] (schedule_timeout+0x80/0x420)
[  217.513526] [<c09ea680>] (schedule_timeout) from [<c0187e30>] (rcu_gp_kthread+0x560/0x940)
[  217.522808] [<c0187e30>] (rcu_gp_kthread) from [<c0141748>] (kthread+0x150/0x158)
[  217.531340] [<c0141748>] (kthread) from [<c01087c8>] (ret_from_fork+0x14/0x2c)
[  217.543844] Unhandled fault: imprecise external abort (0x1406) at 0x00000000
[  217.551079] pgd = c0004000
[  217.554060] [00000000] *pgd=00000000
[  217.558696] Internal error: : 1406 [#1] PREEMPT SMP ARM
[  217.564500] Modules linked in: apalis_tk1_k20_can gpio_apalis_tk1_k20 apalis_tk1_k20_ts apalis_tk1_k20_adc btusb btrtl btbcm btintel apalis_tk1_k20 ath10k_pci ath10k_core ath nouveau ttm xhci_tegra
[  217.583391] CPU: 0 PID: 403 Comm: kworker/0:3 Not tainted 4.14.109-2.8.6 #2
[  217.590504] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[  217.597174] Workqueue: events igb_watchdog_task
[  217.601902] task: c3168180 task.stack: c29a4000
[  217.606565] PC is at igb_rd32+0x24/0x7c
[  217.610618] LR is at igb_update_stats+0x74/0xccc
[  217.615302] pc : [<c05cee60>]    lr : [<c05d5300>]    psr: 08070013
[  217.621628] sp : c29a5e80  ip : 00000000  fp : ee104860
[  217.627018] r10: ee1044c0  r9 : c38be594  r8 : ee104918
[  217.632365] r7 : ee10477c  r6 : 00000000  r5 : 0000c030  r4 : 00000000
[  217.639071] r3 : f0b20000  r2 : f0b2c030  r1 : 0000c030  r0 : ee104918
[  217.645796] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  217.653177] Control: 10c5387d  Table: 82ba806a  DAC: 00000051
[  217.659563] Process kworker/0:3 (pid: 403, stack limit = 0xc29a4210)
[  217.666849] Stack: (0xc29a5e80 to 0xc29a6000)
[  217.671379] 5e80: ed925b80 0000c030 00000000 c05d5300 00000000 00000000 00000000 00000000
[  217.679699] 5ea0: 000f9060 c06dfe28 c3781350 ed91b068 ed91b0e0 00000000 68070013 00000001
[  217.688045] 5ec0: c38be594 ee10481c ee104000 ee1044c0 ee104918 ee104850 c38be594 00000000
[  217.696309] 5ee0: eef89b40 c05d604c c28e4f40 00000000 c28e4f04 ed850100 c388cae8 ee10481c
[  217.704549] 5f00: c3007d00 eef89b40 eef8cd00 00000000 c38be594 00000000 eef89b40 c013ba48
[  217.712873] 5f20: eef89b40 eef89b40 c3802d00 c3007d00 eef89b40 c3007d18 c3802d00 eef89b58
[  217.721164] 5f40: ffffe000 00000008 eef89b40 c013c8a8 ffffe000 c38be05e c0b9602c 00000000
[  217.729334] 5f60: ffffe000 c2f5f780 c2ce0640 00000000 c29a4000 c3007d00 c013c858 ee8abec8
[  217.737508] 5f80: c2f5f79c c0141748 00000000 c2ce0640 c01415f8 00000000 00000000 00000000
[  217.745682] 5fa0: 00000000 00000000 00000000 c01087c8 00000000 00000000 00000000 00000000
[  217.753952] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  217.762934] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  217.771629] [<c05cee60>] (igb_rd32) from [<c05d5300>] (igb_update_stats+0x74/0xccc)
[  217.779302] [<c05d5300>] (igb_update_stats) from [<c05d604c>] (igb_watchdog_task+0xf4/0x72c)
[  217.787817] [<c05d604c>] (igb_watchdog_task) from [<c013ba48>] (process_one_work+0x1ec/0x580)
[  217.796394] [<c013ba48>] (process_one_work) from [<c013c8a8>] (worker_thread+0x50/0x598)
[  217.804572] [<c013c8a8>] (worker_thread) from [<c0141748>] (kthread+0x150/0x158)
[  217.812110] [<c0141748>] (kthread) from [<c01087c8>] (ret_from_fork+0x14/0x2c)
[  217.819366] Code: e5924000 f57ff04f e3740001 0a000001 (e1a00004)
[  217.825548] ---[ end trace 4407eb87746bbc34 ]---
[  217.832263] note: kworker/0:3[403] exited with preempt_count 1

OK. It took me hours, but I was finally able to locate the problem. My application sets the EMC frequency to the minimum when no user process is running:

echo 12750000 > /sys/kernel/debug/emc/rate

After this command, any traffic over Ethernet (e.g. iperf) freezes the module. This behaviour can also be seen on a default Toradex image.
Is this a bug? At least with the downstream kernel it was possible to lower the EMC frequency to reduce power consumption.
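For reference, 12750000 Hz is 12.75 MHz. If an application lowers the EMC rate while idle, it helps to remember the previous rate so it can be put back later; given the behaviour described above, that would presumably have to happen before any Ethernet traffic starts. A minimal sketch around the same debugfs file, assuming that reading it returns the current EMC rate in Hz (as the mainline Tegra124 EMC debugfs interface appears to), that debugfs is mounted, and that it runs as root. The helper names and the `EMC_RATE_FILE` override are hypothetical.

```shell
#!/bin/sh
# Sketch: change the Tegra EMC (external memory controller) clock through
# the debugfs file used in this thread and remember the previous rate.
# EMC_RATE_FILE can be overridden, e.g. to test against a plain file.
EMC_RATE_FILE=${EMC_RATE_FILE:-/sys/kernel/debug/emc/rate}

emc_get_rate() {
    # Assumption: reading the file returns the current EMC rate in Hz.
    cat "$EMC_RATE_FILE"
}

emc_set_rate() {
    # $1: requested rate in Hz, e.g. 12750000 (= 12.75 MHz).
    case $1 in
        ''|*[!0-9]*) echo "emc_set_rate: not a number: $1" >&2; return 1 ;;
    esac
    echo "$1" > "$EMC_RATE_FILE"
}

emc_save_rate() {
    # Store the current rate in EMC_SAVED_RATE for emc_restore_rate.
    EMC_SAVED_RATE=$(emc_get_rate) || return 1
}

emc_restore_rate() {
    [ -n "${EMC_SAVED_RATE:-}" ] || { echo "emc_restore_rate: nothing saved" >&2; return 1; }
    emc_set_rate "$EMC_SAVED_RATE"
}

# Usage on the module:
# emc_save_rate
# emc_set_rate 12750000   # idle: the minimum rate from this thread
# emc_restore_rate        # back to the previous rate
```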

This is the first time I have heard of such an issue, but I have to admit that we never really experimented much with such settings either. Most likely there are limits to how low one may go while still exercising PCIe, which the Gigabit Ethernet controller requires underneath.

Could you please confirm that behaviour on one of your devices? Is the EMC frequency fixed like the GPU frequency or can it be dynamically set?

Hi @qojote

Which kernel version did you use?

I tried to reproduce the error with BSP 3.0b3 (Apalis-TK1-Mainline_Console-Image 3.0b3.118 20200101), and the TK1 did not freeze while I was using Ethernet. Could you try to reproduce the issue on 3.0b3?

Best regards,
Jaski

@jaski.tx

I can reproduce this issue with BSP 3.0b3 (Apalis-TK1-Mainline_Console-Image 3.0b3.118 20200101) installed via TEZI cloud.
The system freezes when testing with iperf after setting the EMC frequency to the minimum (echo 12750000 > /sys/kernel/debug/emc/rate).
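The reproduction steps above can be captured in a small script. This is a sketch under stated assumptions: the thread does not say whether iperf2 or iperf3 was used, so iperf3 is assumed (`-c` selects client mode, `-t` the test duration in seconds), an iperf3 server (`iperf3 -s`) is assumed to run on the host passed as the first argument, and debugfs is mounted. `DRY_RUN=1` only prints the steps; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the reproduction: lower the EMC rate to the minimum used in this
# thread, then generate Ethernet traffic with iperf3 (assumed client).
EMC_RATE_FILE=${EMC_RATE_FILE:-/sys/kernel/debug/emc/rate}
EMC_MIN_RATE=${EMC_MIN_RATE:-12750000}   # 12.75 MHz, as in the thread

run() {
    # Print the command; execute it unless DRY_RUN=1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

reproduce_emc_freeze() {
    if [ -z "${1:-}" ]; then
        echo "usage: reproduce_emc_freeze IPERF_SERVER" >&2
        return 1
    fi
    run sh -c "echo $EMC_MIN_RATE > $EMC_RATE_FILE"
    run iperf3 -c "$1" -t 30
}

# Usage on the module (expect it to hang if the issue is present):
# reproduce_emc_freeze 192.0.2.1
```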

Hi @qojote

We were finally able to reproduce the issue. We will investigate the root cause internally and get back to you.

Best regards,
Jaski