VF50 module periodically freezing

We are using VF50 modules with Colibri_VF_LinuxConsoleImageV2.5 and Kernel 4.1.15 on a custom board in a product.

The product essentially has a GPS receiver generating a constant stream of serial data on ttyLP2, an Ethernet port, and a serial console port.

We have a few hundred products in the field. However, we have a small number of customers complaining that the product freezes periodically.

When this happens, the product cannot be contacted via Ethernet. The LED, which is flashed by the Linux kernel LED driver, also stops flashing. Does this suggest the kernel has crashed? The product starts up again OK after a power cycle.

At one site, the product appears particularly unstable and freezes every few hours. However, when the network cable is disconnected, the product seems OK; when it is reconnected, the product freezes again within a short period of time. They have the device plugged into a Gigabit Ethernet switch, with the port configured for 100 Mbit full duplex.

Are there any known issues with VF50 networking and Colibri_VF_LinuxConsoleImageV2.5?

Hi Ashinton,

Do all VF50 modules have the same hardware version? When the device is frozen, can it still be reached via the serial console port? Is there any continuous data flow on the Ethernet port?

Thanks and best regards, Jaski

All VF50 modules are V1.2A.
We cannot reproduce the issue here, so I am not sure whether the serial console port still responds. However, the units also have a 40x2 character LCD on GPIO, which stops updating when the unit freezes. This suggests the applications have stopped.
When operating correctly there should be a reasonably continuous flow of Ethernet data.

I did notice in the customer's logs (unit connected to Gigabit switch):
kernel: fec 400d1000.ethernet eth0: Link is Up - 100Mbps/Half - flow control off

whereas in the units we have here (10/100 switch):
kernel: fec 400d1000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx

Could the lack of flow control be an issue?
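
To compare link settings across many units, the two log lines quoted above can be parsed automatically. The following is only a rough sketch: the pattern is modelled on those two lines alone, the function names are made up for illustration, and other kernel versions or PHY drivers may word the message differently.

```python
import re

# Pattern modelled only on the two "Link is Up" lines quoted in this thread;
# it is an assumption that other units log the message in the same format.
LINK_UP = re.compile(
    r"(?P<iface>\S+): Link is Up - (?P<speed>\S+)/(?P<duplex>Half|Full)"
    r" - flow control (?P<flow>.+?)\s*$"
)


def parse_link_up(line):
    """Return the link settings as a dict, or None if this is not a link-up line."""
    match = LINK_UP.search(line)
    return match.groupdict() if match else None


def differences(link_a, link_b):
    """List the setting names whose values differ between two parsed lines."""
    return sorted(key for key in link_a if link_a[key] != link_b.get(key))


customer = parse_link_up(
    "kernel: fec 400d1000.ethernet eth0: Link is Up - 100Mbps/Half - flow control off"
)
local = parse_link_up(
    "kernel: fec 400d1000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx"
)
print(differences(customer, local))
```

Run against the two quoted lines, this reports a difference in duplex as well as in flow control: the customer's unit logs Half duplex although the switch port is said to be configured for full duplex, which may be worth checking alongside the flow control question.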

Assuming the boards are fine, could it be a problem with the implementation/program? Maybe some data invasion or an out-of-memory (OOM) problem under very specific circumstances?

I know that updating already deployed equipment is a pain in many ways, but could you test it with a more recent image? (The most recent stable version is 2.7.) Maybe Angstrom patched some serial-related issues.
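
One way to check the out-of-memory idea above is to record memory figures periodically, so that the last entries before a freeze survive the power cycle. This is a minimal sketch, assuming Python is available on the image and that /proc/meminfo exposes MemAvailable; the log path and function names are hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB or counts)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_snapshot(path="/proc/meminfo"):
    """Read the few fields that matter most when looking for a slow leak."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {key: info.get(key) for key in ("MemTotal", "MemFree", "MemAvailable")}


def log_forever(log_path="/var/log/memwatch.log", interval_s=60):
    """Append a snapshot every interval_s seconds (hypothetical log path)."""
    with open(log_path, "a") as log:
        while True:
            log.write("%d %s\n" % (time.time(), memory_snapshot()))
            log.flush()
            os.fsync(log.fileno())  # push the entry to storage before a possible freeze
            time.sleep(interval_s)
```

Calling log_forever() from a background process on an affected unit would show whether available memory falls steadily before each freeze. The log needs to be on persistent storage to be of any use after the power cycle.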

Hi Ashinton

It would be helpful to have an strace log from the frozen device. Is the LCD update done by a separate application, or by the same application?

The lack of flow control can be an issue. Please read this article.
If flow control is disabled, it makes the most sense to disable it on both endpoints.
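
As a sketch of what disabling flow control on the module side could look like, the snippet below builds the standard ethtool pause-parameter commands. It assumes ethtool is installed on the image and that the interface is eth0 (as in the logs quoted earlier). Whether the fec driver in this kernel accepts the request is not confirmed here and would need to be verified on the device.

```python
import subprocess


def pause_off_command(iface):
    """ethtool invocation that turns off pause autonegotiation and rx/tx pause."""
    return ["ethtool", "-A", iface, "autoneg", "off", "rx", "off", "tx", "off"]


def show_pause_command(iface):
    """ethtool invocation that prints the current pause parameters."""
    return ["ethtool", "-a", iface]


def disable_flow_control(iface="eth0", dry_run=True):
    """Return the command line; only execute it when dry_run is False."""
    cmd = pause_off_command(iface)
    if not dry_run:
        # Raises CalledProcessError if the driver rejects the setting.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


print(disable_flow_control("eth0"))
print(" ".join(show_pause_command("eth0")))
```

With dry_run left at its default, the function only prints the command so it can be reviewed first. The second command can be used afterwards to confirm what the interface actually reports.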

The LCD update application is separate from other applications. The module appears totally unresponsive - including kernel LED flashing.
Flow control is disabled on the switch. It is negotiated on the VF50, but disabled.
I should be getting the unit back from the customer next week, and so should be able to get more information.