RPMsg task stuck in rpmsg_rtos_init on M4

Hi @vix

Indeed, that could bring trouble !
Unfortunately, this can’t be our case for we don’t use any printf/debug_printf in our code.

We’ll make sure to check this if we start using them though !

hi @jaski.tx,

Did you find anything you could use in our logs/DTS ?

We’ll start using a debugger next week but we’d gladly investigate any leads you could provide us in the meantime !

Best regards,
Nicolas

hi @jaski.tx,

Apparently our team is experimenting troubles using the debugger, have you made any progress in your investigations so far ?

Best Regards,

Nicolas

hi @nicolas.b

Sorry for the delayed answer.

We compiled and debugged a first example with your memory mapping which was working fine. We will look tomorrow further to compile a rpmsg demo and check if this is working also fine or not?

Do you have any example (code for M4) which we can use to reproduce the error you are having? This will be very helpful.

What troubles are you having using the debugger?

hi @jaski.tx,

Thank you for your answer, I think that knowing the FreeRTOS RPmsg example works with our DTS and Linker could at least prove our code/init is the faulty part. Unless the problem comes from the RPmsg module (which still have this strange “virtio0.rpmsg-openamp-demo-channel.-1.0” name we cannot explain).

Unfortunately I don’t think I have the rights to post our whole M4 code here. Also I don’t think I can quickly extract a standalone RPmsg task from our application.

I’ll have to ask more info to my colleagues but it seems they lose the debugger handle after startup and when they pause the debbuger they always end up in a SDCard handler (which is not enabled nor called in our code)…

Best Regards,

Nicolas

Hi @jaski.tx

This RPMsg crash problem is now blocking us for any further developpement on M4 side (because we need the communication between cores to test our application), did the Toradex team find any leads/ideas we could investigate ?

Our last observations showed that if an app on linux side tries to send a buffer over RPMsg, it looks like the tty_rpmsg driver will crash the core resulting in the following core dump, periodically printed :

[   43.013845] INFO: rcu_sched self-detected stall on CPU
[   43.023911] 	1-...: (2099 ticks this GP) idle=26d/140000000000001/0 softirq=986/986 fqs=1050 
[   43.037350] 	 (t=2100 jiffies g=157 c=156 q=184)
[   43.046828] Task dump for CPU 1:
[   43.054763] RPMsg           R  running task        0   320    315 0x00000002
[   43.066607] [<8010e6a8>] (unwind_backtrace) from [<8010b38c>] (show_stack+0x10/0x14)
[   43.079095] [<8010b38c>] (show_stack) from [<801b61ec>] (rcu_dump_cpu_stacks+0xa8/0xc0)
[   43.091896] [<801b61ec>] (rcu_dump_cpu_stacks) from [<801768d4>] (rcu_check_callbacks+0x598/0x818)
[   43.110629] [<801768d4>] (rcu_check_callbacks) from [<801799d4>] (update_process_times+0x34/0x5c)
[   43.124636] [<801799d4>] (update_process_times) from [<8018ad44>] (tick_sched_timer+0x48/0x8c)
[   43.138336] [<8018ad44>] (tick_sched_timer) from [<8017aa9c>] (__hrtimer_run_queues+0x11c/0x1a8)
[   43.152169] [<8017aa9c>] (__hrtimer_run_queues) from [<8017acec>] (hrtimer_interrupt+0xb4/0x228)
[   43.165964] [<8017acec>] (hrtimer_interrupt) from [<805ef7b0>] (arch_timer_handler_phys+0x28/0x30)
[   43.184757] [<805ef7b0>] (arch_timer_handler_phys) from [<8016d880>] (handle_percpu_devid_irq+0x78/0x138)
[   43.204296] [<8016d880>] (handle_percpu_devid_irq) from [<80168d78>] (generic_handle_irq+0x24/0x34)
[   43.223408] [<80168d78>] (generic_handle_irq) from [<8016927c>] (__handle_domain_irq+0x5c/0xb4)
[   43.237351] [<8016927c>] (__handle_domain_irq) from [<8010147c>] (gic_handle_irq+0x48/0x8c)
[   43.250918] [<8010147c>] (gic_handle_irq) from [<8010be8c>] (__irq_svc+0x6c/0x90)
[   43.263560] Exception stack(0x8c8c3d80 to 0x8c8c3dc8)
[   43.273697] 3d80: f5aa0000 00000001 00010000 00300200 8c23e9c0 00010000 90060400 8d8c2600
[   43.287053] 3da0: 8c23f500 80832208 8d8c2600 00000013 00400000 8c8c3dd4 f5aa0020 80478ec8
[   43.300409] 3dc0: 400d0013 ffffffff
[   43.308925] [<8010be8c>] (__irq_svc) from [<80478ec8>] (MU_SendMessage+0x24/0x58)
[   43.321524] [<80478ec8>] (MU_SendMessage) from [<8060757c>] (imx_rpmsg_notify+0x34/0x48)
[   43.334832] [<8060757c>] (imx_rpmsg_notify) from [<8047a3a4>] (virtqueue_kick+0x2c/0x50)
[   43.348172] [<8047a3a4>] (virtqueue_kick) from [<80606c5c>] (rpmsg_send_offchannel_raw+0x264/0x318)
[   43.367576] [<80606c5c>] (rpmsg_send_offchannel_raw) from [<80606e28>] (virtio_rpmsg_send+0x2c/0x34)
[   43.387239] [<80606e28>] (virtio_rpmsg_send) from [<7f000090>] (rpmsgtty_write+0x48/0x8c [imx_rpmsg_tty])
[   43.407413] [<7f000090>] (rpmsgtty_write [imx_rpmsg_tty]) from [<804898b4>] (n_tty_write+0x354/0x444)
[   43.427446] [<804898b4>] (n_tty_write) from [<80484d10>] (tty_write+0x180/0x358)
[   43.440389] [<80484d10>] (tty_write) from [<801fd49c>] (__vfs_write+0x1c/0x110)
[   43.453139] [<801fd49c>] (__vfs_write) from [<801fe334>] (vfs_write+0xa4/0x1b4)
[   43.465774] [<801fe334>] (vfs_write) from [<801ff438>] (SyS_write+0x3c/0x90)
[   43.478054] [<801ff438>] (SyS_write) from [<80107600>] (ret_fast_syscall+0x0/0x48)
[   48.003847] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [RPMsg:320]
[   48.016053] Modules linked in: imx_rpmsg_tty
[   48.025266] CPU: 1 PID: 320 Comm: RPMsg Not tainted 4.9.87 #1
[   48.035881] Hardware name: Freescale i.MX7 Dual (Device Tree)
[   48.046447] task: 8d56e780 task.stack: 8c8c2000
[   48.055715] PC is at MU_SendMessage+0x24/0x58
[   48.064702] LR is at 0xf5aa0020
[   48.072363] pc : [<80478ec8>]    lr : [<f5aa0020>]    psr: 400d0013
[   48.072363] sp : 8c8c3dd4  ip : 00400000  fp : 00000013
[   48.092851] r10: 8d8c2600  r9 : 80832208  r8 : 8c23f500
[   48.102451] r7 : 8d8c2600  r6 : 90060400  r5 : 00010000  r4 : 8c23e9c0
[   48.113316] r3 : 00300200  r2 : 00010000  r1 : 00000001  r0 : f5aa0000
[   48.124061] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   48.135361] Control: 10c5387d  Table: 8c8b006a  DAC: 00000051
[   48.145156] CPU: 1 PID: 320 Comm: RPMsg Not tainted 4.9.87 #1
[   48.154832] Hardware name: Freescale i.MX7 Dual (Device Tree)
[   48.164421] [<8010e6a8>] (unwind_backtrace) from [<8010b38c>] (show_stack+0x10/0x14)
[   48.176010] [<8010b38c>] (show_stack) from [<8040cd6c>] (dump_stack+0x88/0x9c)
[   48.186989] [<8040cd6c>] (dump_stack) from [<8019fe0c>] (watchdog_timer_fn+0x24c/0x2c4)
[   48.198654] [<8019fe0c>] (watchdog_timer_fn) from [<8017aa9c>] (__hrtimer_run_queues+0x11c/0x1a8)
[   48.211213] [<8017aa9c>] (__hrtimer_run_queues) from [<8017acec>] (hrtimer_interrupt+0xb4/0x228)
[   48.223745] [<8017acec>] (hrtimer_interrupt) from [<805ef7b0>] (arch_timer_handler_phys+0x28/0x30)
[   48.240130] [<805ef7b0>] (arch_timer_handler_phys) from [<8016d880>] (handle_percpu_devid_irq+0x78/0x138)
[   48.257531] [<8016d880>] (handle_percpu_devid_irq) from [<80168d78>] (generic_handle_irq+0x24/0x34)
[   48.274705] [<80168d78>] (generic_handle_irq) from [<8016927c>] (__handle_domain_irq+0x5c/0xb4)
[   48.287755] [<8016927c>] (__handle_domain_irq) from [<8010147c>] (gic_handle_irq+0x48/0x8c)
[   48.300509] [<8010147c>] (gic_handle_irq) from [<8010be8c>] (__irq_svc+0x6c/0x90)
[   48.312446] Exception stack(0x8c8c3d80 to 0x8c8c3dc8)
[   48.321991] 3d80: f5aa0000 00000001 00010000 00300200 8c23e9c0 00010000 90060400 8d8c2600
[   48.334819] 3da0: 8c23f500 80832208 8d8c2600 00000013 00400000 8c8c3dd4 f5aa0020 80478ec8
[   48.347671] 3dc0: 400d0013 ffffffff
[   48.355803] [<8010be8c>] (__irq_svc) from [<80478ec8>] (MU_SendMessage+0x24/0x58)
[   48.368143] [<80478ec8>] (MU_SendMessage) from [<8060757c>] (imx_rpmsg_notify+0x34/0x48)
[   48.381206] [<8060757c>] (imx_rpmsg_notify) from [<8047a3a4>] (virtqueue_kick+0x2c/0x50)
[   48.394266] [<8047a3a4>] (virtqueue_kick) from [<80606c5c>] (rpmsg_send_offchannel_raw+0x264/0x318)
[   48.413204] [<80606c5c>] (rpmsg_send_offchannel_raw) from [<80606e28>] (virtio_rpmsg_send+0x2c/0x34)
[   48.432609] [<80606e28>] (virtio_rpmsg_send) from [<7f000090>] (rpmsgtty_write+0x48/0x8c [imx_rpmsg_tty])
[   48.452772] [<7f000090>] (rpmsgtty_write [imx_rpmsg_tty]) from [<804898b4>] (n_tty_write+0x354/0x444)
[   48.472818] [<804898b4>] (n_tty_write) from [<80484d10>] (tty_write+0x180/0x358)
[   48.485761] [<80484d10>] (tty_write) from [<801fd49c>] (__vfs_write+0x1c/0x110)
[   48.498510] [<801fd49c>] (__vfs_write) from [<801fe334>] (vfs_write+0xa4/0x1b4)
[   48.511145] [<801fe334>] (vfs_write) from [<801ff438>] (SyS_write+0x3c/0x90)
[   48.523427] [<801ff438>] (SyS_write) from [<80107600>] (ret_fast_syscall+0x0/0x48)

Hope this helps !

Best Regards,

Nicolas

It seems that I can only make my initial question “Private”, any further comment/answer doesn’t have this possibility in the small gear menu…

Thanks for your Input. I will check on this and come back to you.

Hi
For the moment you can create a new private question for the sources and put the link here.
Thanks.

hi @nicolas.b

You should now have the possibility to put a private comment by setting the visibility to “Viewable by Moderators” only.

hi @jaski.tx,

We now have more insight, after we finally got the debugger to work !
We reverted our linker changes and got back to the Toradex “DDR” one.

We reproduced and then solved our two different problems :

  1. The example “rpmsg/str_echo_freertos” was crashing in the same instruction as our code, in rpmsg_handler()

    void env_isr(int vector) {
    struct isr_info *info;

     assert(vector < ISR_COUNT);
     if (vector < ISR_COUNT)
     {
         info = &isr_table[vector];
         assert(NULL != info->isr);  <---- This assert is false
         if (NULL != info->isr)
         {
             env_disable_interrupt(vector);
             info->isr(vector, info->data);
             env_enable_interrupt(vector, info->priority, info->type);
         }
     }
    

    }

Thus we reverted the DTS modifications we did since the beginning of the project (our configuration memory was moved back in OCRAM) and after getting rid of the “reserved-memory” space, the example started to work properly again.
Maybe the “no-map” attribute resulted in linux not beeing able to reach this memory ? Anyway.

Now we’d reach the env_acquire_sync_lock(callback_sync); (rpmsg_rtos_init) instruction in our code on M4 side. Meaning linux could just come in and send a first byte.

  1. We then debbuged our code and it turns out it was now crashing in env_delete_sync_lock(callback_sync); .
    Further investigations (step in this instruction with debugger) showed that the vPortFree( pxQueue ); was crashing in this lovely file : heap_1.c

    void vPortFree( void pv )
    {
    /
    Memory cannot be freed using this scheme. See heap_2.c, heap_3.c and
    heap_4.c for alternative implementations, and the memory management pages of
    http://www.FreeRTOS.org for more information. */
    ( void ) pv;

     /* Force an assert as it is invalid to call this function. */
     configASSERT( pv == NULL );
    

    }

So basically that’s why the example code works and not ours, we had the file heap_1.c in our project which does not enable freeing allocated memory…

Replacing this with heap_2.c solved everything !

So yeah, problem solved.

Maybe I’ll create another topic for thisnext matter but we started talking about this here so…
Although I am still hesitating about switching back to our old linker. Performances have dropped dramatically since we moved back the code in DDR, yet I can’t help but thinking that using the OCRAM for code an both TCM memories for Data may end up causing alignement problems at their TCML-TCMU junction (plus we’re mixing up data and code on system and code buses, this is a huge mess haha).

Any advice on the matter ? I found this note on how to move kernel to get 2MB of cacheable DDR for the M4. link text
But can this lead to that kind of problem : https://community.nxp.com/thread/459977 ?

Thank you for your time and assistance !
Best Regards,

Nicolas

HI @nicolas.b

Perfect, that the Issue is solved. Thanks for your feedback.

Please ask a new Question for the performance Issue. Thanks.

Best regards,
Jaski