I can see that the ethernet connection does not recover by itself but taking the link down and up again does fix it.
verdin-imx8mm-06760593:~$ sudo rtcwake -m mem -d rtc1 -s10
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using rtc1 at Thu Jan 1 04:56:14 1970
verdin-imx8mm-06760593:~$
verdin-imx8mm-06760593:~$ ping google.co.uk
ping: bad address 'google.co.uk'
verdin-imx8mm-06760593:~$ sudo ip link set ethernet0 down
verdin-imx8mm-06760593:~$ sudo ip link set ethernet0 up
verdin-imx8mm-06760593:~$ ping google.co.uk
PING google.co.uk (216.58.206.131): 56 data bytes
64 bytes from 216.58.206.131: seq=0 ttl=117 time=12.257 ms
Maybe there is some script that runs when it comes back up which is not correct? I am not sure where to look for that. @andrecurvello.tx how does it work for you?
I did some digging and it seems this is a known issue in our backlog. I don’t see much more than that however. Let me mention your issue and see if we can get things moving again.
Understood, I’ll see what I can do. This did bring some more attention to the issue and it’s being actively looked at currently. Also, just a correction: this is actually an issue with the underlying BSP that has been inherited by Torizon. So future information about this will appear here when we publish it: Toradex System/Computer on Modules - Linux BSP Release
I included a workaround in my code that seems OK. It is not well tested, but bringing ethernet0 down and then up again seems to sort it out.
# Module-level imports needed by this method
import subprocess
import time

def do_sleep_now(self):
    ''' Set the system to sleep and bring ethernet back up afterwards '''
    self.do_sleep = False
    print('Going to sleep for 10 seconds')
    time.sleep(1)
    response = subprocess.run(['rtcwake', '-m', 'mem', '-d', 'rtc1', '-s10'], stdout=subprocess.PIPE, text=True)
    if response.returncode != 0:
        print('\nreturncode: {0}, {1}'.format(response.returncode, response.stdout))
    # Another process triggers the sleep so we will do our own pause here
    time.sleep(2)
    print('Wake Up')
    # Put the ethernet link down (stderr is captured too, so the error branch has something to print)
    response = subprocess.run(['ip', 'link', 'set', 'ethernet0', 'down'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if response.returncode == 0:
        print('\nreturncode: {0}, status: {1}'.format(response.returncode, response.stdout))
    else:
        print('\nreturncode: {0}, error: {1}'.format(response.returncode, response.stderr))
    # Bring ethernet back up, CAN also affected but not enabled during idle
    response = subprocess.run(['ip', 'link', 'set', 'ethernet0', 'up'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if response.returncode == 0:
        print('\nreturncode: {0}, status: {1}'.format(response.returncode, response.stdout))
    else:
        print('\nreturncode: {0}, error: {1}'.format(response.returncode, response.stderr))
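For anyone reusing the workaround above, here is a slightly more defensive sketch. It is not the original poster's code: the helper names bounce_interface and wait_for_carrier, the settle and timeout values, and the injectable run parameter are all assumptions made for illustration. It bounces the link and then polls the standard sysfs file /sys/class/net/<iface>/operstate until the interface reports "up", instead of relying on a fixed delay.

```python
import subprocess
import time
from pathlib import Path


def bounce_interface(iface="ethernet0", settle=0.5, run=subprocess.run):
    """Take iface down and up again; return True if both commands succeed.

    `run` is injectable (hypothetical parameter) so the function can be
    exercised without root privileges or real hardware.
    """
    for state in ("down", "up"):
        result = run(["ip", "link", "set", iface, state],
                     capture_output=True, text=True)
        if result.returncode != 0:
            print("ip link set {0} {1} failed: {2}".format(
                iface, state, result.stderr.strip()))
            return False
        time.sleep(settle)  # assumed settle time, not a measured value
    return True


def wait_for_carrier(iface="ethernet0", timeout=10.0, poll=0.25,
                     sysfs_root="/sys/class/net"):
    """Poll operstate until it reads 'up'; return False on timeout."""
    operstate = Path(sysfs_root) / iface / "operstate"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if operstate.read_text().strip() == "up":
                return True
        except OSError:
            pass  # interface missing or not readable yet; keep polling
        time.sleep(poll)
    return False
```

After rtcwake returns, calling bounce_interface() followed by wait_for_carrier() would tell you whether the link really came back before the application starts using the network. Running ip link set normally requires root.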
Reposting to this thread as I have found an odd sleep problem with my setup today. After a number of sleeps the device never comes back up. I will investigate further, but we have updated to the latest BSP since I last tested sleep. Also, the Ethernet problem is not fixed in the new BSP; I get this message:
As far as we can tell the latest monthly should have the fix incorporated and it was tested to work well. As for these new error messages, there might be some other issue here affecting this.
I believe the last time I saw this was on the Apalis i.MX8. I think power-related issues caused the Ethernet PHY to not start up correctly, which sounds somewhat related to what you’re seeing here.
Can you reliably reproduce these messages? Or is it somewhat random?
Hi @jeremias.tx,
I will put it on my list for today to try recovery from sleep without bringing the interface up and down. I don’t think I should have a power problem, the SOM is always supplied during the sleep cycle.
Ed
Please keep us updated. If you can come up with a method/process that reliably reproduces this issue, it will be a lot easier for us to start debugging/investigating.
Thanks @jeremias.tx, we are still seeing the error message I reported above when sleeping the device, my code looks like:
# Imports needed by this fragment
import datetime
import subprocess
import time

if do_sleep:
    print('{0}, Going to sleep for 20 seconds'.format(datetime.datetime.now()))
    response = subprocess.run(['rtcwake', '-m', 'mem', '-d', 'rtc1', '-s20'], stdout=subprocess.PIPE, text=True)
    if response.returncode != 0:
        print('\nreturncode: {0}, {1}'.format(response.returncode, response.stdout))
    time.sleep(0.5)
    print('{0}, Wake Up for 5 seconds'.format(datetime.datetime.now()))
    time.sleep(5)
Are you able to replicate this? The error messages come through STDERR so you need to be connected to the A53 serial port or other place where you can see that.
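To make the question of reproducibility easier to answer, here is a rough sketch of a repeatable test loop. It assumes the messages in question are kernel log lines containing "MDIO write timeout" and that dmesg is available and readable on the image; the function names and the injectable suspend and read_log callables are hypothetical. It counts how many new matching lines appear after each suspend cycle.

```python
import subprocess

MARKER = "MDIO write timeout"  # assumed text of the message being chased


def count_marker(log_text, marker=MARKER):
    """Count kernel log lines that contain the marker string."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def read_kernel_log():
    """Return the current kernel ring buffer as text (may need root)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout


def rtcwake_suspend(seconds=20, rtc="rtc1"):
    """Suspend to RAM and wake via the given RTC, as in the snippet above."""
    subprocess.run(["rtcwake", "-m", "mem", "-d", rtc, "-s", str(seconds)],
                   check=False)


def run_cycles(cycles, suspend=rtcwake_suspend, read_log=read_kernel_log):
    """Run suspend cycles; return the number of new marker lines per cycle."""
    results = []
    seen = count_marker(read_log())
    for _ in range(cycles):
        suspend()
        total = count_marker(read_log())
        results.append(total - seen)
        seen = total
    return results
```

A result such as [2, 2, 2, ...] from run_cycles(50) would suggest the message appears on every resume, while mostly zeros would point to something intermittent. One limitation: the kernel ring buffer can wrap during a long run, which would make the per-cycle counts unreliable.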
So far we’ve been unable to reproduce these error messages on our side. Just to confirm, you are on 5.2, correct? Can you provide the output of uname -a and cat /etc/issue? I just want to make sure the versions here are all correct.
Linux verdin-imx8mm-06760593 5.4.91-5.2.0-devel+git.3ae7ec26415b #1-TorizonCore SMP PREEMPT Fri Apr 9 10:59:52 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
TorizonCore 5.2.0-devel-202104+build.11 \n \l
I don’t think there is anything unusual about our setup, I think it matches the Verdin development board on the hardware side. Are you definitely monitoring STDERR and not just STDOUT?
This guy sees the same message as part of his problem, not sure if it is related:
I tested this on BSP 5.2 (Linux verdin-imx8mm 5.4.91-5.2.0+git.6afb048a71e3) using the following instructions and it is working fine. The timeout messages do appear, but the eth0 interface then comes back up.
[ 172.659435] fec 30be0000.ethernet eth0: MDIO write timeout
[ 172.695446] fec 30be0000.ethernet eth0: MDIO write timeout
[ 172.773112] PM: resume devices took 0.716 seconds
[ 172.854780] OOM killer enabled.
[ 172.857927] Restarting tasks ... done.
[ 172.865117] PM: suspend exit
root@verdin-imx8mm:~# [ 175.562621] fec 30be0000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
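When comparing logs like the one above between setups, it can help to reduce each resume to two numbers: how many MDIO timeouts were logged, and how long the link took to come up after "PM: suspend exit". The following is only a sketch; summarize_resume is a hypothetical helper, and it assumes the kernel log keeps the bracketed timestamp format shown above. It only looks at the first resume found in the text.

```python
import re

# Matches "[ 172.659435] message", even after a shell prompt on the same line
TS_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def summarize_resume(log_text):
    """Summarize the first resume found in a chunk of kernel log text."""
    timeouts = 0
    exit_ts = None
    link_up_ts = None
    for line in log_text.splitlines():
        match = TS_LINE.search(line)
        if not match:
            continue
        ts, msg = float(match.group(1)), match.group(2)
        if "MDIO write timeout" in msg:
            timeouts += 1
        elif "PM: suspend exit" in msg and exit_ts is None:
            exit_ts = ts
        elif "Link is Up" in msg and exit_ts is not None and link_up_ts is None:
            link_up_ts = ts
    delay = None
    if exit_ts is not None and link_up_ts is not None:
        delay = round(link_up_ts - exit_ts, 6)
    return {"mdio_timeouts": timeouts, "link_up_delay_s": delay}
```

Fed the seven log lines above, this reports 2 timeouts and a link-up delay of about 2.7 seconds (175.562621 minus 172.865117). A delay of None would mean the link never reported "Link is Up" in the captured text, which matches the failure described earlier in the thread.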