Problem with serial ports on a Colibri T30

We are experiencing a problem with the serial port driver on a Colibri T30 module. After a few hours with all three ports working correctly, we get this kernel log:

[85525.947333] ------------[ cut here ]------------
[85525.947396] WARNING: at drivers/tty/serial/tegra_hsuart.c:346 tegra_rx_dma_complete_callback+0x80/0xd0()
[85525.947412] Modules linked in: gpio_mcp23s08
[85525.947473] [<c0014860>] (unwind_backtrace+0x0/0xe8) from [<c05b8b98>] (dump_stack+0x20/0x24)
[85525.947511] [<c05b8b98>] (dump_stack+0x20/0x24) from [<c00588c0>] (warn_slowpath_common+0x5c/0x74)
[85525.947543] [<c00588c0>] (warn_slowpath_common+0x5c/0x74) from [<c0058994>] (warn_slowpath_null+0x2c/0x34)
[85525.947577] [<c0058994>] (warn_slowpath_null+0x2c/0x34) from [<c0305d38>] (tegra_rx_dma_complete_callback+0x80/0xd0)
[85525.947633] [<c0305d38>] (tegra_rx_dma_complete_callback+0x80/0xd0) from [<c003c728>] (tegra_dma_dequeue_req+0x120/0x140)
[85525.947672] [<c003c728>] (tegra_dma_dequeue_req+0x120/0x140) from [<c0305e2c>] (tegra_stop_rx+0xa4/0xc4)
[85525.947708] [<c0305e2c>] (tegra_stop_rx+0xa4/0xc4) from [<c0300e94>] (uart_close+0x20c/0x328)
[85525.947744] [<c0300e94>] (uart_close+0x20c/0x328) from [<c02e97b8>] (tty_release+0x1c8/0x48c)
[85525.947792] [<c02e97b8>] (tty_release+0x1c8/0x48c) from [<c00f16f4>] (fput+0x120/0x1dc)
[85525.947828] [<c00f16f4>] (fput+0x120/0x1dc) from [<c00edbe8>] (filp_close+0x78/0x80)
[85525.947868] [<c00edbe8>] (filp_close+0x78/0x80) from [<c005c514>] (put_files_struct+0xa8/0xf8)
[85525.947901] [<c005c514>] (put_files_struct+0xa8/0xf8) from [<c005c5fc>] (exit_files+0x48/0x4c)
[85525.947930] [<c005c5fc>] (exit_files+0x48/0x4c) from [<c005ca6c>] (do_exit+0x27c/0x6c4)
[85525.947960] [<c005ca6c>] (do_exit+0x27c/0x6c4) from [<c005cfc0>] (sys_exit_group+0x0/0x20)
[85525.948008] [<c005cfc0>] (sys_exit_group+0x0/0x20) from [<c006ad48>] (get_signal_to_deliver+0x4d4/0x50c)
[85525.948061] [<c006ad48>] (get_signal_to_deliver+0x4d4/0x50c) from [<c0010b94>] (do_signal+0x26c/0x4ec)
[85525.948100] [<c0010b94>] (do_signal+0x26c/0x4ec) from [<c001112c>] (do_notify_resume+0x28/0x60)
[85525.948134] [<c001112c>] (do_notify_resume+0x28/0x60) from [<c000deb8>] (work_pending+0x24/0x28)
[85525.948155] ---[ end trace 0911cf36931b7105 ]---
[85525.948179] tegra_uart tegra_uart.1: Not able to copy uart data to tty layer Req 100 and coped 0

We are running Debian 8.5, kernel 3.1.10

Closing and reopening the serial ports clears the failure and everything works again, but our software cannot detect this condition, so we have to stop the processes and start them again.
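
For reference, a minimal sketch of how an application could detect the stall itself and reopen the port, since closing and reopening is what clears the failure. It assumes the remote device sends something at least every few seconds, so that prolonged silence can be treated as a stalled driver; that is an assumption and may not hold for every protocol. The class name `ReopeningPort`, the `open_fd` callable and the device path in the comment are hypothetical.

```python
import os
import select
import time


class ReopeningPort:
    """Reads from a serial device and reopens it after a period of silence."""

    def __init__(self, open_fd, stall_timeout):
        # open_fd: callable returning a fresh file descriptor for the port,
        # for example: lambda: os.open("/dev/ttyHS1", os.O_RDWR | os.O_NOCTTY)
        # (hypothetical path). It should also reapply any termios settings.
        self._open_fd = open_fd
        self._stall_timeout = stall_timeout
        self._fd = open_fd()
        self._last_rx = time.monotonic()
        self.reopen_count = 0

    def read(self, max_bytes=256, poll_interval=0.5):
        """Return received bytes, or b"" if nothing arrived in poll_interval."""
        ready, _, _ = select.select([self._fd], [], [], poll_interval)
        if ready:
            data = os.read(self._fd, max_bytes)
            if data:
                self._last_rx = time.monotonic()
                return data
        # No data: if the silence has lasted too long, assume a stall.
        if time.monotonic() - self._last_rx > self._stall_timeout:
            self._reopen()
        return b""

    def _reopen(self):
        # Close and reopen, mirroring the manual workaround described above.
        os.close(self._fd)
        self._fd = self._open_fd()
        self._last_rx = time.monotonic()
        self.reopen_count += 1

    def close(self):
        os.close(self._fd)
```

The application would call `read()` in its main loop; `reopen_count` can be logged to see how often the workaround is triggered.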

Thank you very much.

We have another T30 that works perfectly. Could this be a hardware issue?

It happened again with a different T30 module and a different Viola board. The system ran for 10 days and then showed the same kernel warning:

Oct 21 21:10:42 RTT67 kernel: [213670.358855] ------------[ cut here ]------------
Oct 21 21:10:42 RTT67 kernel: [213670.368339] WARNING: at drivers/tty/serial/tegra_hsuart.c:346 tegra_rx_dma_complete_callback+0x80/0xd0()
Oct 21 21:10:42 RTT67 kernel: [213670.387220] Modules linked in: gpio_mcp23s08
Oct 21 21:10:42 RTT67 kernel: [213670.396246] [<c0014860>] (unwind_backtrace+0x0/0xe8) from [<c05b8b98>] (dump_stack+0x20/0x24)
Oct 21 21:10:42 RTT67 kernel: [213670.413988] [<c05b8b98>] (dump_stack+0x20/0x24) from [<c00588c0>] (warn_slowpath_common+0x5c/0x74)
Oct 21 21:10:42 RTT67 kernel: [213670.432180] [<c00588c0>] (warn_slowpath_common+0x5c/0x74) from [<c0058994>] (warn_slowpath_null+0x2c/0x34)
Oct 21 21:10:42 RTT67 kernel: [213670.451088] [<c0058994>] (warn_slowpath_null+0x2c/0x34) from [<c0305d38>] (tegra_rx_dma_complete_callback+0x80/0xd0)
Oct 21 21:10:42 RTT67 kernel: [213670.471016] [<c0305d38>] (tegra_rx_dma_complete_callback+0x80/0xd0) from [<c003c728>] (tegra_dma_dequeue_req+0x120/0x140)
Oct 21 21:10:42 RTT67 kernel: [213670.491476] [<c003c728>] (tegra_dma_dequeue_req+0x120/0x140) from [<c0305560>] (do_handle_rx_dma+0x38/0x68)
Oct 21 21:10:42 RTT67 kernel: [213670.510708] [<c0305560>] (do_handle_rx_dma+0x38/0x68) from [<c03060d8>] (tegra_uart_isr+0x68/0x2f4)
Oct 21 21:10:42 RTT67 kernel: [213670.529350] [<c03060d8>] (tegra_uart_isr+0x68/0x2f4) from [<c009a8e8>] (handle_irq_event_percpu+0xb4/0x25c)
Oct 21 21:10:42 RTT67 kernel: [213670.548705] [<c009a8e8>] (handle_irq_event_percpu+0xb4/0x25c) from [<c009aadc>] (handle_irq_event+0x4c/0x6c)
Oct 21 21:10:42 RTT67 kernel: [213670.568149] [<c009aadc>] (handle_irq_event+0x4c/0x6c) from [<c009d224>] (handle_fasteoi_irq+0xe8/0x10c)
Oct 21 21:10:42 RTT67 kernel: [213670.587140] [<c009d224>] (handle_fasteoi_irq+0xe8/0x10c) from [<c009a2b0>] (generic_handle_irq+0x30/0x40)
Oct 21 21:10:42 RTT67 kernel: [213670.606334] [<c009a2b0>] (generic_handle_irq+0x30/0x40) from [<c000e824>] (handle_IRQ+0xa0/0xbc)
Oct 21 21:10:42 RTT67 kernel: [213670.624832] [<c000e824>] (handle_IRQ+0xa0/0xbc) from [<c0008440>] (asm_do_IRQ+0x18/0x1c)
Oct 21 21:10:42 RTT67 kernel: [213670.642720] [<c0008440>] (asm_do_IRQ+0x18/0x1c) from [<c000d9b8>] (__irq_svc+0x38/0xd0)
Oct 21 21:10:42 RTT67 kernel: [213670.660576] Exception stack(0xc0883ee0 to 0xc0883f28)
Oct 21 21:10:42 RTT67 kernel: [213670.670648] 3ee0: 00000004 00000000 00000000 000f4240 60d41c95 0000c26d 00000000 e6afe410
Oct 21 21:10:42 RTT67 kernel: [213670.688722] 3f00: c085c0ac e6afe400 c09111e4 c0883f54 00000c48 c0883f28 c005e020 c003ee38
Oct 21 21:10:42 RTT67 kernel: [213670.706767] 3f20: 200f0013 ffffffff
Oct 21 21:10:42 RTT67 kernel: [213670.715172] [<c000d9b8>] (__irq_svc+0x38/0xd0) from [<c003ee38>] (tegra_idle_enter_lp3+0xe4/0xf4)
Oct 21 21:10:42 RTT67 kernel: [213670.733776] [<c003ee38>] (tegra_idle_enter_lp3+0xe4/0xf4) from [<c03d11ec>] (cpuidle_idle_call+0x1c4/0x314)
Oct 21 21:10:42 RTT67 kernel: [213670.753261] [<c03d11ec>] (cpuidle_idle_call+0x1c4/0x314) from [<c000ee98>] (cpu_idle+0xc0/0x104)
Oct 21 21:10:42 RTT67 kernel: [213670.771752] [<c000ee98>] (cpu_idle+0xc0/0x104) from [<c05ab140>] (rest_init+0x94/0xac)
Oct 21 21:10:42 RTT67 kernel: [213670.789369] [<c05ab140>] (rest_init+0x94/0xac) from [<c082f854>] (start_kernel+0x2cc/0x324)
Oct 21 21:10:42 RTT67 kernel: [213670.807400] ---[ end trace 6246f745822b16aa ]---
Oct 21 21:10:43 RTT67 kernel: [213670.958842] ------------[ cut here ]------------

And this one happens occasionally:

[ 6855.636998] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0x400900
[ 6855.646983] mmcblk0: error -110 transferring data, sector 72112, nr 56, cmd response 0x900, card status 0x0
[ 6855.657300] end_request: I/O error, dev mmcblk0, sector 72112
[ 6855.663267] end_request: I/O error, dev mmcblk0, sector 72120
[ 6855.669207] end_request: I/O error, dev mmcblk0, sector 72128
[ 6855.675128] end_request: I/O error, dev mmcblk0, sector 72136
[ 6855.681051] end_request: I/O error, dev mmcblk0, sector 72144
[ 6855.686992] end_request: I/O error, dev mmcblk0, sector 72152
[ 6855.692928] end_request: I/O error, dev mmcblk0, sector 72160
[ 6855.699009] Aborting journal on device mmcblk0p2.
[ 6858.230753] EXT3-fs (mmcblk0p2): error: ext3_journal_start_sb: Detected aborted journal
[ 6858.239823] EXT3-fs (mmcblk0p2): error: remounting filesystem read-only
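
Because the log above ends with the filesystem being remounted read-only, a monitoring process could check for that state explicitly. The following is only a sketch: the function name `read_only_mounts` and the default watched mount point `/` are assumptions, and it parses text in the `/proc/mounts` format (device, mount point, type, options).

```python
def read_only_mounts(mounts_text, watched=("/",)):
    """Return the watched mount points that are currently mounted read-only."""
    state = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        # Split the option list so "errors=remount-ro" is not mistaken for "ro".
        options = fields[3].split(",")
        if mount_point in watched:
            # Later entries shadow earlier ones for the same mount point
            # (for example "rootfs" followed by the real root device).
            state[mount_point] = "ro" in options
    return sorted(mp for mp, is_ro in state.items() if is_ro)


def check_system(watched=("/",)):
    """Run the check against the live /proc/mounts of this machine."""
    with open("/proc/mounts") as f:
        return read_only_mounts(f.read(), watched)
```

A non-empty result from `check_system()` would indicate that a watched filesystem has gone read-only and the unit needs attention.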

I doubt this is a hardware issue. Which exact hardware and software versions are you using?

And what exactly does your test setup look like?

This looks like the following issue, which is meanwhile supposed to be fixed.

Are you doing any unexpected power cuts in your testing? Or what does your test setup look like?

We have four sets, each with a Viola 1.2A and a Colibri T30 1GB IT V1.1A, plus a board with a MAX236, an MCP23S17, an accelerometer and a barometer. They run Debian 8.5 with Linux kernel 3.1.10 (your stock kernel with MCP23S17 support enabled).
The test suite is an access control system that has been running for years on several hardware architectures, from i386 to ARM systems.
The mmcblk0p2 error appeared when the system had been online for two days (this is the first time this error has appeared).

@marcel.tx We are facing a similar issue on an Apalis T30: the application that reads from and writes to the serial port hangs, and dmesg is flooded with messages like these:

[21614.645244] tegra_uart tegra_uart.2: Not able to copy uart data to tty layer Req 128 and coped 0
[21614.649784] ------------[ cut here ]------------
[21614.649846] WARNING: at /build/t30_old/build/tmp-glibc/work-shared/apalis-t30/kernel-source/drivers/tty/serial/tegra_hsuart.c:346 tegra_rx_dma_complete_callback+0xa8/0xc4()
[21614.649869] Modules linked in:
[21614.649919] [<c0014120>] (unwind_backtrace+0x0/0xe8) from [<c004fe40>] (warn_slowpath_common+0x54/0x64)
[21614.649950] [<c004fe40>] (warn_slowpath_common+0x54/0x64) from [<c004feec>] (warn_slowpath_null+0x1c/0x24)
[21614.649981] [<c004feec>] (warn_slowpath_null+0x1c/0x24) from [<c01ef64c>] (tegra_rx_dma_complete_callback+0xa8/0xc4)
[21614.650021] [<c01ef64c>] (tegra_rx_dma_complete_callback+0xa8/0xc4) from [<c0037380>] (tegra_dma_dequeue_req+0xa8/0x164)
[21614.650054] [<c0037380>] (tegra_dma_dequeue_req+0xa8/0x164) from [<c01efb90>] (do_handle_rx_dma+0x64/0xcc)
[21614.650085] [<c01efb90>] (do_handle_rx_dma+0x64/0xcc) from [<c01efec0>] (tegra_uart_isr+0x29c/0x318)
[21614.650129] [<c01efec0>] (tegra_uart_isr+0x29c/0x318) from [<c008ac08>] (handle_irq_event_percpu+0x64/0x174)
[21614.650164] [<c008ac08>] (handle_irq_event_percpu+0x64/0x174) from [<c008ad54>] (handle_irq_event+0x3c/0x5c)
[21614.650198] [<c008ad54>] (handle_irq_event+0x3c/0x5c) from [<c008d16c>] (handle_fasteoi_irq+0x9c/0x150)
[21614.650230] [<c008d16c>] (handle_fasteoi_irq+0x9c/0x150) from [<c008a514>] (generic_handle_irq+0x28/0x38)
[21614.650268] [<c008a514>] (generic_handle_irq+0x28/0x38) from [<c000e88c>] (handle_IRQ+0x58/0xac)
[21614.650297] [<c000e88c>] (handle_IRQ+0x58/0xac) from [<c000db38>] (__irq_svc+0x38/0xd0)
[21614.650336] [<c000db38>] (__irq_svc+0x38/0xd0) from [<c0068ccc>] (posix_ktime_get_ts+0x0/0x14)
[21614.650363] [<c0068ccc>] (posix_ktime_get_ts+0x0/0x14) from [<00000004>] (0x4)
[21614.650380] ---[ end trace ea721b0c11b1009f ]---

Do you have any idea what might be wrong here? What information do you need from us for further debugging?
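
To quantify how often the driver reports the failure and on which port, a saved dmesg capture could be summarised with a short script. This is a sketch only: the function name `count_copy_failures` is hypothetical, and the pattern is taken from the driver message quoted in this thread (including its "coped" spelling), so it may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Matches the tegra_uart message quoted in this thread and captures the port name.
PATTERN = re.compile(
    r"tegra_uart (tegra_uart\.\d+): Not able to copy uart data to tty layer "
    r"Req (\d+) and coped (\d+)"
)


def count_copy_failures(log_text):
    """Count 'Not able to copy uart data' messages per UART device."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, after saving the log with `dmesg > dmesg.txt`, passing the file contents to `count_copy_failures` gives a per-port count that can be included in a report.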

Please avoid hijacking old questions. Ask a new question, optionally referring to this existing one, and state exactly which versions you are talking about. Also include all relevant log files, such as serial console output or journals.