U-Boot Driver for Ethernet Controller

Thanks a lot for that hint.

I wrote the script and let it run over the weekend for the warm-start scenario: within 70 hours the issue could not be reproduced.

I am blocked right now because the DC power switch, which I need for the cold-start scenario, has not arrived yet. I will let you know about any status changes…

FYI: The script to warm-start/reboot the SOM until it is no longer pingable
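
The script itself was an attachment and is not shown in the thread. A minimal sketch of such a loop, assuming a Linux test host with iputils ping and key-based SSH access to the module, could look like the following. The address, the timing values and all function names are assumptions for illustration and are not taken from the original script.

```shell
#!/bin/sh
# Warm-reboot a target over SSH until it stops answering ping.
# TARGET, the timing values and the use of ssh are assumptions.

TARGET="${TARGET:-192.168.10.2}"   # hypothetical address of the SOM
BOOT_WAIT="${BOOT_WAIT:-60}"       # seconds to wait for a reboot to finish
PING_TRIES="${PING_TRIES:-5}"      # ping attempts before giving up

# Return 0 if the target answers at least one of PING_TRIES pings.
target_reachable() {
    n=0
    while [ "$n" -lt "$PING_TRIES" ]; do
        # -W 2: wait at most 2 seconds for a reply (iputils ping)
        if ping -c 1 -W 2 "$TARGET" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep 1
    done
    return 1
}

# Trigger a warm start on the target.
reboot_target() {
    ssh -o ConnectTimeout=5 "root@$TARGET" reboot >/dev/null 2>&1
}

# Reboot in a loop; stop and report the cycle count once ping fails.
run_until_unreachable() {
    cycle=0
    while target_reachable; do
        cycle=$((cycle + 1))
        echo "cycle $cycle: target reachable, rebooting"
        reboot_target
        sleep "$BOOT_WAIT"
    done
    echo "target unreachable after $cycle reboot(s)"
}
```

Calling run_until_unreachable starts the loop. Because the loop ends at the first failed check, the module stays in the failed state for inspection.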

I reproduced the behaviour in Linux and attached the log files /var/log/kern.log and /var/log/dmesg as well as the output of sudo ethtool eth0. The output for the good case (physical link is present) is attached here.

If one tries to establish a link in U-Boot with

setenv autoload false; if env exists ethaddr; then; else setenv ethaddr 00:14:2d:00:00:00; fi; pci enum; dhcp; run setethupdate;

and if it is not established

e1000: no NVM
e1000: e1000#0: ERROR: Valid Link not detected: -8

then executing reset and trying to establish the link again fails every time, however often it is repeated. A power off/on cycle is required to get a link again.

Even when the link is not established in U-Boot, the PCI devices are listed as usual:

Apalis TK1 # pci
Scanning PCI devices on bus 0
BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
_____________________________________________________________
00.02.00   0x10de     0x0e13     Bridge device           0x04

Apalis TK1 # pci header 00.02.00
  vendor ID =                   0x10de
  device ID =                   0x0e13
  command register ID =         0x0007
  status register =             0x0010
  revision ID =                 0xa1
  class code =                  0x06 (Bridge device)
  sub class code =              0x04
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0x00000000
  base address 1 =              0x00000000
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x11
  IO limit =                    0x11
  secondary status =            0x0000
  memory base =                 0x1300
  memory limit =                0x1300
  prefetch memory base =        0x2001
  prefetch memory limit =       0x1ff1
  prefetch memory base upper =  0x00000000
  prefetch memory limit upper = 0x00000000
  IO base upper 16 bits =       0x0000
  IO limit upper 16 bits =      0x0000
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x01
  bridge control =              0x0000
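
The window registers in the dump above are stored in compressed form. As a reading aid, here is a minimal shell sketch that expands them according to the standard PCI-to-PCI bridge (type 1 header) layout, in which the upper 12 bits of the 16-bit memory base and limit registers hold address bits 31:20. The function name mem_window is made up for this illustration.

```shell
# mem_window BASE LIMIT
# Expand the 16-bit memory base/limit registers of a PCI-to-PCI bridge
# into the 32-bit address range they describe. The low 4 bits of each
# register are flag bits and are masked off.
mem_window() {
    base=$(( ($1 & 0xfff0) << 16 ))
    # The limit is inclusive up to the end of its 1 MiB granule.
    limit=$(( (($2 & 0xfff0) << 16) | 0xfffff ))
    if [ "$base" -le "$limit" ]; then
        printf '0x%08x-0x%08x\n' "$base" "$limit"
    else
        # A base above the limit means the bridge forwards nothing here.
        echo "disabled"
    fi
}

mem_window 0x1300 0x1300   # memory base / memory limit from the dump
mem_window 0x2001 0x1ff1   # prefetch memory base / limit from the dump
```

With the values from the dump, the memory window expands to 0x13000000-0x130fffff (1 MiB), while the prefetchable window has its base above its limit and is therefore closed.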

In Linux, however, a colleague found in rarer cases that the PCI device itself was missing…

“Energy Efficient Ethernet” (EEE) is mentioned as another possible root cause for the observed behaviour and is enabled in Angström by default:

root@apalis-tk1:~# ethtool --show-eee enp1s0
EEE Settings for enp1s0:
        EEE status: enabled - active
        Tx LPI: 0 (us)
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

I disabled EEE temporarily with ethtool --set-eee enp1s0 eee off. I then disconnected and reconnected the Ethernet cable at the switch over and over again and was not able to reproduce the missing link (the behaviour observed by my colleague).

Unfortunately I am not able to test the same in the warm-start and cold-start scenarios because the EEE configuration is enabled again after every reset or power cycle.
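
Since the setting is lost on every restart, a test cycle can at least check which state the interface came up in. The following is a minimal sketch that reads the "EEE status:" line from the ethtool --show-eee output shown above. The function names eee_status and eee_enabled are made up for this illustration, and the exact status wording may differ between ethtool versions.

```shell
# Print the value of the "EEE status:" line from `ethtool --show-eee`
# output read on stdin, with surrounding whitespace removed.
eee_status() {
    sed -n 's/^[[:space:]]*EEE status:[[:space:]]*//p' |
        sed 's/[[:space:]]*$//'
}

# Return 0 if ethtool reports EEE as enabled for interface $1.
eee_enabled() {
    case "$(ethtool --show-eee "$1" 2>/dev/null | eee_status)" in
        enabled*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A test script could call eee_enabled enp1s0 after each boot and, if it succeeds, run ethtool --set-eee enp1s0 eee off before continuing.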

Do you know how to disable EEE persistently over reboots?

I will open another question for that because it is a different topic…

I received the Phidgets “Digital Output” to control the DC power supply yesterday. I wrote a test script and was able to reproduce the issue with the Apalis TK1 Linux BSP v2.7b3 for the cold-start scenario as well (after 2.5 hours of power cycling roughly every 45 seconds). I will be able to reproduce it again. Please let me know which log files would be valuable for you, and I can provide them.

I will run the test script over the weekend with our image (which has EEE disabled). Hopefully the issue is not reproducible with that change anymore…

In a run of approx. 60 hours over the weekend with our image, which disables EEE during kernel boot, the issue did not occur again. I can also run the same script overnight until tomorrow morning (approx. 12 hours) with your image v2.7b3 with EEE disabled in U-Boot.

I am creating a new question specific to the user space link establishment issue.

It turns out that the current PCIe reset implementation in the PCIe board init function does not work reliably because it violates the PCIe reset timing. This is fixed by overriding the tegra_pcie_board_port_reset() function.

Please find the respective patches on our U-Boot -next branch.

Great. Thanks a lot for the patch. Does this patch fix the issue at the Linux (kernel/user space) level as well? (related forum question)

I guess that depends. Most probably yes, provided the link is already brought up in U-Boot. A regular boot won’t do that, however. I’m actually working on an improved solution for Linux as well and will update the respective thread shortly.

BTW: Please note that my -next stuff already went through multiple iterations with the latest one dating back to yesterday evening.

Ok. We will figure it out either way when we run the tests again.

I’m in the final stages of testing and will commit the Linux kernel part soon as well.

Great. Thanks.

What exactly do you mean by “Also allow optionally bringing up the PCIe switch as found on the Apalis Evaluation board. Note however that the Apalis PCIe port is also left disabled in the device tree by default.” in the commit message of the U-Boot bugfix?

It means exactly that. One may optionally bring up the PCIe switch in U-Boot as well, if desired or required. But regular booting does not require any of that, and in fact regular booting does not touch PCIe at all. Basically, unless one explicitly runs pci enum, PCIe won’t be touched.