I have a Colibri VF50 module running Linux V2.5 on a custom board with two LAN ports. To test the Ethernet ports, I have connected the VF50 directly to a PC (no switch or hub).
I ping the VF50 from the PC using:
sudo ping -f -i 0 -s 10000 192.168.0.80
With the Toradex standard image, no packets are lost over a 1 hour period.
However, when I use my own image, which has PPS enabled in the kernel and device tree, dual LAN, and additional packages such as nginx and ntp, the module starts to lose packets over a period of 10 to 15 minutes:
$ sudo ping -f -i 0 -s 10000 192.168.0.80
PING 192.168.0.80 (192.168.0.80) 10000(10028) bytes of data.
--- 192.168.0.80 ping statistics ---
354929 packets transmitted, 354926 received, 0% packet loss, time 853831ms
rtt min/avg/max/mdev = 2.241/2.366/23.681/0.176 ms, pipe 2, ipg/ewma 2.405/2.406 ms
I suspect that this is not a board fault, since it seems fine with the Toradex image. Any ideas what may be causing this?
You mentioned that you have two NICs configured. Is this issue experienced with both ports? Please share the device tree, any configs you added or changed in the kernel, and the output of dmesg. Do you know which V2.5 image this image is based on (which rev/tag your meta-toradex was checked out at when you built the image)?
I get the same issue on both NICs.
The kernel is modified to enable “PPS support”, enable “Periodic Timer Tick (constant rate, no dynticks)” and enable “NXP PCF8523” RTC (see attached).
I see the same issues using the Toradex device tree (vf500-colibri-eval-v3.dtb → /boot/devicetree-zImage-vf500-colibri-eval-v3.dtb), which disables the PPS support and the external NIC.
dmesg output attached.
I’m a bit of a newbie and not sure how to obtain the rev/tag info; building does display “console-trdx-v2.5-r0”.
First of all, Ethernet is and always was best-effort. That said, in a typical full-duplex environment nowadays it might be quite common that you don’t lose any packets “ever”… But there is no guarantee.
The Ethernet controller buffers frames, but only so many. After that, the controller relies on Linux to respond and take care of the frames. If more frames arrive before Linux responds, frames will get lost. The VF50, with no L2 cache and a CPU frequency of just 400MHz, is not exactly a high performance module, so it does not surprise me that from time to time the kernel can’t keep up with the amount of pending work, which ultimately leads to a dropped frame. That said, the loss seems to be minimal: in your example we are talking about a packet loss of about 0.0008% (3 of 354929 packets)!
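Since ping rounds its own loss figure to 0%, here is a minimal Python sketch of how the 0.0008% figure can be derived from the summary line posted above. The function name packet_loss_percent is just an illustration, not part of any tool.

```python
import re


def packet_loss_percent(summary: str) -> float:
    """Return the unrounded loss percentage from a ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", summary)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    return 100.0 * (sent - received) / sent


# Summary line copied from the ping output in the question.
summary = "354929 packets transmitted, 354926 received, 0% packet loss, time 853831ms"
print(f"{packet_loss_percent(summary):.4f}%")  # prints 0.0008%
```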
I guess you see the difference because the additional driver(s) and/or Ethernet controller add some interrupt load, which ultimately leads to the situations described above. You might be able to take countermeasures. Things you can try are using a different preemption model (e.g. CONFIG_PREEMPT) or assigning a higher priority to the kernel threads involved in network traffic (it’s a bit tricky to figure out which ones, but the realtime wiki has some hints; chrt helps to configure priorities).
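As a rough starting point for the chrt suggestion, the following Python sketch lists candidate kernel threads and prints the chrt command that would put each one on SCHED_FIFO. It does not change anything itself. The name prefixes ("ksoftirqd/", "irq/") and the priority value 50 are assumptions for illustration only; which threads actually handle the Ethernet traffic depends on the kernel configuration and has to be checked on the target.

```python
import os

# Assumed name prefixes, for illustration only. "irq/" threads exist only
# when interrupts are threaded; verify the real names on the target.
CANDIDATE_PREFIXES = ("ksoftirqd/", "irq/")


def find_threads(proc_root="/proc", prefixes=CANDIDATE_PREFIXES):
    """Return sorted (pid, name) pairs whose name starts with a prefix."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith(prefixes):
            found.append((int(entry), name))
    return sorted(found)


def chrt_command(pid, priority=50):
    """Build a chrt call that sets SCHED_FIFO with the given priority."""
    return ["chrt", "-f", "-p", str(priority), str(pid)]


if __name__ == "__main__":
    # Only print the commands; review them before running any as root.
    for pid, name in find_threads():
        print(name, " ".join(chrt_command(pid)))
```

Printing the commands instead of executing them keeps the sketch safe to run: a wrong real-time priority on the wrong thread can starve the rest of the system, so it is better to apply them one at a time and repeat the ping test after each change.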
Could you post the Linux logs (syslog and dmesg) captured when the problem occurs?