When the TK1 SOM is powered on with the Linux LXDE image running on it, the physical Ethernet link is not established reliably. The link LED of the switch the SOM is connected to does not light up (the LED indicates that a physical link has been established). The behaviour occurs randomly.
I wrote a test script which pings the SOM every minute and, whenever the board is pingable, switches its power supply off and on via a Phidget DigitalOutput. If the board is not pingable, the script halts in an endless loop and keeps the SOM powered on so that it can be investigated over the serial interface.
The journalctl log file indicates that the link is down due to PCIe:
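For readers who want to reproduce the setup, the test loop described above could be sketched roughly as follows. This is not the original script: `set_power` is a hypothetical callback standing in for the Phidget DigitalOutput, and the host address, `boot_wait_s` and `off_time_s` values are assumptions to be adjusted.

```python
import subprocess
import time


def is_pingable(host, timeout_s=2):
    """Return True if one ICMP echo request is answered (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_power_cycle_test(host, set_power, ping=is_pingable, boot_wait_s=60,
                         off_time_s=5, max_cycles=None, sleep=time.sleep):
    """Power-cycle the board for as long as it answers ping after booting.

    set_power(True/False) is a hypothetical callback that switches the supply,
    e.g. a thin wrapper around a Phidget DigitalOutput.
    Returns the number of completed power cycles. The board is left powered
    on when the function returns so it can be inspected over the serial line.
    """
    cycles = 0
    set_power(True)
    while max_cycles is None or cycles < max_cycles:
        sleep(boot_wait_s)      # give the SOM time to boot and bring up the link
        if not ping(host):
            return cycles       # link missing: stop cycling, keep power on
        cycles += 1
        set_power(False)
        sleep(off_time_s)       # off time between cycles (assumed value)
        set_power(True)
    return cycles
```

Passing the power switch and the ping function in as parameters keeps the loop independent of the Phidget library and makes it easy to try out longer delays between power cycles.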
Jun 18 22:24:49 apalis-tk1 kernel: PCIE.C: tegra_pcie_enable_regulators : regulator hvdd_pex
Jun 18 22:24:49 apalis-tk1 kernel: PCIE.C: tegra_pcie_enable_regulators : regulator pexio
Jun 18 22:24:49 apalis-tk1 kernel: PCIE.C: tegra_pcie_enable_regulators : regulator avdd_plle
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, ignoring
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, ignoring
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: no ports detected
root@apalis-tk1:~# ethtool --show-eee enp1s0
EEE Settings for enp1s0:
        EEE status: disabled
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  Not reported
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full
I ran the test script again. Unfortunately, it halted after approx. 10 minutes with a missing link (see the journalctl log file with EEE disabled).
Unfortunately not. I reproduced the issue again and attached another full journalctl log file (EEE disabled). (Note that the timestamps are not correct until time synchronization has taken place.) The log file contains the same PCIe log messages as the other log file:
cat journalctl_EEE_disabled_again_full.log | grep PCI
Jun 18 22:24:49 apalis-tk1 kernel: PCIE.C: tegra_pcie_enable_regulators : regulator hvdd_pex
Jun 18 22:24:49 apalis-tk1 kernel: PCIE.C: tegra_pcie_enable_regulators : regulator pexio
Jun 18 22:24:49 apalis-tk1 kernel: PCIE.C: tegra_pcie_enable_regulators : regulator avdd_plle
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 0: link down, ignoring
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, retrying
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: port 1: link down, ignoring
Jun 18 22:24:49 apalis-tk1 kernel: PCIE: no ports detected
Jun 18 22:24:49 apalis-tk1 kernel: ehci-pci: EHCI PCI platform driver
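To check many saved journal files for this failure signature without reading each one by hand, a small parser along these lines may help. It is only a sketch that matches the exact message texts shown in the log excerpts above; the function name is made up for illustration.

```python
import re

# Message texts taken from the kernel log excerpts in this thread.
PORT_GIVEN_UP = re.compile(r"PCIE: port (\d+): link down, ignoring")
NO_PORTS = "PCIE: no ports detected"


def pcie_link_failure(journal_lines):
    """Return (sorted list of ports the kernel gave up on, no-ports flag)."""
    ports = set()
    no_ports = False
    for line in journal_lines:
        match = PORT_GIVEN_UP.search(line)
        if match:
            ports.add(int(match.group(1)))
        elif NO_PORTS in line:
            no_ports = True
    return sorted(ports), no_ports
```

For example, `pcie_link_failure(open("journalctl_EEE_disabled_again_full.log"))` would report which ports were abandoned in that boot and whether the kernel ended up with no PCIe ports at all.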
I ran the test script over the long weekend (starting last Friday). Unfortunately, the test script halted again.
However, because the device is kept powered on after the script halts, I was able to check the physical link LED on the switch, which is lit as it should be. lspci lists the Ethernet controller (Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)) as it should, and the device responds to ping. That means the likely root cause of the script halting is either (a) some network issue over the weekend, (b) too short a delay between power cycles in the test script, or (c) that we failed to apply your refinement of the patches. I think (a) is the most probable.
I will check whether we applied your patch refinement, increase the power cycle delay in the test script, and let the test script run again until tomorrow.
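The manual checks above (link LED, lspci, ping) could be complemented by reading the link state on the board itself. The following is a minimal sketch using the standard Linux sysfs `carrier` attribute; the interface name `enp1s0` is taken from the ethtool output earlier in the thread, and the function name is hypothetical.

```python
import os


def link_is_up(iface="enp1s0", sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier (physical link) on iface.

    Returns False if the interface does not exist, which is what one would
    expect when the I210 was never enumerated on PCIe, or if the attribute
    cannot be read (e.g. the interface is administratively down).
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return False
```

Logging this value over the serial console at each boot would make it possible to tell a missing network device apart from a device that is present but has no link.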
OK, thanks for letting us know. We actually also run a collection of modules in our temperature chamber over the full temperature range (e.g. -25 to 85 deg C) over the Assumption of Mary holiday. For now, however, just with stock 2.7b3 to have a baseline. I am about to update them all with my fixes and we will see…
As mentioned in your other thread, we successfully tested the U-Boot fix and thereby validated that the hardware is fine. Further validation and verification running the full BSP will be conducted over time, also with the upcoming V1.2A hardware revision.
Which of the available power supply alternatives (Apalis Eval Board data sheet, p. 11) should I use to rule out any influence of the power supply on the test? The alternative must support continuous power supply, see this related question.
I believe @diego.tx answered that one already, right? Please note that we are just wrapping up the Q3 2.7b4 release and should have it available shortly.