U-Boot Driver for Ethernet Controller

No, we are not aware of any such issue in Linux.

And is it also not reproducible in u-boot and linux with v2.7b3?

As mentioned before, I only saw this once using U-Boot with my misconfigured switch. Since then I have never seen it again. With Linux we did extensive testing in the temperature chamber without ever seeing any such issue.

After we recognized the issue we had to reproduce the behaviour by repetitive testing. Because it occurs unpredictably, one has to either (a) reset the SOM from the U-Boot prompt over and over again or (b) power the SOM off and on over and over again.

As we have no control over what kind of network our customers have, it is not an option for us to point to possibly “incompatible” network components or “incompatible” network component configurations. (Especially if the behaviour occurs with widespread components.)

BTW: What log files are most valuable for you?

The Intel i210 Specification Update added an erratum (April 28, 2017) on page 30 about an “Internal Clock Malfunction” which could relate to the observed behaviour…

The bug is fixed in Revision 3. I was not able to figure out the revision of the WGI210-AT but I would not expect this revision assembled on my SOM as the errata information is quite new. (The bug does not affect “Copper SKUs”. I am not sure if the SKU on the TK1 SOM is a copper SKU at all…).

Other possibly relevant issues:

Errata “25. Slow System Clock” on page 25 manifests as “No link can be established until the next power cycle.”

Errata “18. Failure to Establish PCIe Link After Power Up” on page 23 which manifests as “Failure to establish PCIe link.”

For a start the serial boot log is definitely helpful. Then the relevant journal (see journalctl) may contain further information. If it is some graphics/X11 issue the X server log /var/log/Xorg.0.log may contain further details.

The Intel i210 Specification Update added an erratum (April 28, 2017) on page 30 about an “Internal Clock Malfunction” which could relate to the observed behaviour…

We are of course aware of all errata but as mentioned so far we have not seen this behaviour in any validation & verification tests we have conducted in our temperature chamber across the full rated temperature range.

The bug is fixed in Revision 3. I was not able to figure out the revision of the WGI210-AT

Look at the chip markings as indicated in the errata.

but I would not expect this revision assembled on my SOM as the errata information is quite new.

Correct, even the latest Apalis TK1 production lot still has LotCode 1634.

(The bug does not affect “Copper SKUs”. I am not sure if the SKU on the TK1 SOM is a copper SKU at all…).

Yes, we assemble the WGI210IT which is the industrial temperature copper part.

Other possibly relevant issues:

Errata “25. Slow System Clock” on page 25 manifests as “No link can be established until the next power cycle.”

Errata “18. Failure to Establish PCIe Link After Power Up” on page 23 which manifests as “Failure to establish PCIe link.”

As mentioned above, we are of course aware of all errata and have taken the respective measures.

I assumed that you have taken every possible measure w.r.t. the errata. However, I am at a loss with this issue because I have no idea what else could be the root cause of the behaviour.

Again, my question: U-Boot only or Linux as well? Any log files you could share?

I actually just talked to an Intel representative making sure we really do have all the latest information/tools from them for the i210.

Unfortunately, to reproduce the issue we only had an older FS108 lying around.

Is it reproducible with both managed GS108Ev3 and unmanaged GS108v4? I just want to make sure to order the proper one.

Does it happen regardless of cabling (e.g. length, type)?

Does it depend on the temperature at all?

Does it happen on all modules?

Again, my question: U-Boot only or Linux as well?

The physical link is missing in both U-Boot and Linux. (As I said before, I have not yet been able to check whether the link is missing in Linux only in cases where it was already missing in U-Boot beforehand.)

Any log files you could share?

I will provide log files to you as soon as possible.

Is it reproducible with both managed GS108Ev3 and unmanaged GS108v4?

Yes, it is reproducible with:

  • Netgear GS108Ev3 (managed, firmware: v2.00.08)
  • Netgear GS108v4 (unmanaged)

Does it happen regardless of cabling (e.g. length, type)?

It happened with different types of cables and with different lengths, from approx. 50 cm up to 10 m.

Does it depend on the temperature at all?

It happened in the office during development (no special test setup, approx. 20 °C to max. 30 °C).

Does it happen on all modules?

It does happen on several TK1 modules we have currently in use. (All of them should be “Apalis TK1 2GB v1.1A”.)

I reproduced the link establishment issue in U-Boot (the link LED did not indicate an established link) and afterwards booted into Angström. In this run the link was established in Angström (see the attached serial interface boot log file). After booting into Angström I piped journalctl’s output for the last boot (journalctl -b > journalctl_boot_log.txt) into the attached journalctl boot log file. In some other cases, however, the link was not established in Linux either. This means that the issue occurs randomly and independently in U-Boot as well as in Linux.

I am still trying to get some logs from a boot where the link is not established in Angström…

I do not understand how the issue could relate to graphics/X11?

So far your log files don’t seem to show any useful information related to the issue at all.

One thing which would be crucial for any further investigation is to know whether we are talking about PCIe link issues (e.g. the i210 chip was not even detected) or gigabit Ethernet link issues (e.g. while the i210 is properly detected just its Ethernet link can not be established).
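The distinction asked for here can be checked on the running Linux system from sysfs. The following is only a rough sketch under assumptions: the interface name `eth0` is a guess, the helper names are made up for illustration, and the check matches any Intel Ethernet controller (PCI vendor ID 0x8086, class code 0x0200xx) rather than the i210 specifically.

```python
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"
ETHERNET_CLASS_PREFIX = "0x0200"  # PCI class code prefix for Ethernet controllers


def find_intel_ethernet(pci_root="/sys/bus/pci/devices"):
    """Return the PCI addresses of Intel Ethernet controllers listed in sysfs."""
    found = []
    root = Path(pci_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
        except OSError:
            continue  # entry without readable attributes, skip it
        if vendor == INTEL_VENDOR_ID and pci_class.startswith(ETHERNET_CLASS_PREFIX):
            found.append(dev.name)
    return found


def read_carrier(iface, net_root="/sys/class/net"):
    """Return True/False for carrier up/down, or None if it cannot be read."""
    try:
        return (Path(net_root) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        # Interface missing, or administratively down (read then fails).
        return None


def classify(pci_devices, carrier):
    """Map the two observations onto the PCIe vs. Ethernet link question."""
    if not pci_devices:
        return "pcie: controller not enumerated"
    if carrier is None:
        return "unknown: controller enumerated, carrier not readable"
    if carrier:
        return "ok: ethernet link up"
    return "ethernet: controller enumerated but no carrier"


if __name__ == "__main__":
    # "eth0" is an assumed interface name; adjust to the actual one.
    print(classify(find_intel_ethernet(), read_carrier("eth0")))
```

Saving this one-line result next to each boot log would show at a glance which of the two failure classes a given boot belongs to.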

Ok. Do you have a suggestion how to proceed in detail? (We have no hardware to investigate on the “physical signal” level.)

So far I have not received any useful information from you at all. Even just a complete boot log file would be helpful, and one which shows the actual issue would be even better.

I tried to reproduce the behaviour in Angström after a cold start but have not been able to so far. (The procedure is really time consuming.)

I plan to write a script which reboots the SOM whenever it is ping-able (physical link present) and waits when it is not ping-able. However, I am not sure whether the link problem was present after warm starts at all (or only after cold starts)…
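Such a script could look roughly like the sketch below. It is only an outline under assumptions: the `reboot` callable is hypothetical and has to be supplied by the user (for example a reboot issued over ssh or the serial console, or a relay toggle for cold starts), the timeouts are guesses, and the `ping` flags are those of the Linux iputils version.

```python
import subprocess
import time


def is_pingable(host, timeout_s=2):
    """Send one ICMP echo request with the system ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_link(host, boot_timeout_s, poll_s=1.0, ping=is_pingable,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until the host answers a ping or the boot timeout expires."""
    deadline = clock() + boot_timeout_s
    while clock() < deadline:
        if ping(host):
            return True
        sleep(poll_s)
    return False


def run_cycles(host, reboot, max_cycles, boot_timeout_s=120.0, settle_s=10.0,
               poll_s=1.0, ping=is_pingable, clock=time.monotonic,
               sleep=time.sleep):
    """Reboot repeatedly; return the first cycle without a link, else None."""
    for cycle in range(1, max_cycles + 1):
        reboot()  # hypothetical, user-supplied warm or cold restart
        # Give the SOM time to actually go down, so that a ping answered by
        # the still-running old system is not mistaken for a successful boot.
        sleep(settle_s)
        if not wait_for_link(host, boot_timeout_s, poll_s, ping, clock, sleep):
            return cycle  # stop here and leave the SOM in the failed state
    return None
```

Stopping at the first failed cycle leaves the SOM in the faulty state, so logs can still be collected over the serial console. Running the same loop once with a warm-reset `reboot` and once with a power-cycling one would also answer the warm-start versus cold-start question.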

You may use e.g. Phidgets to simulate cold-starts. However be aware that especially DC switching relays will wear out quickly and you might be better off using SSRs like the ACCES I/O ones.