Issue resolving dgw.torizon.io through a cellular connection with a startup script

Hey all! I have a strange technical issue I would love some assistance with.

The background is this: the boards I'm developing will have up to three different connections: Wi-Fi, Ethernet, and cellular. The Ethernet connection will most likely be used only for Modbus over Ethernet, the Wi-Fi connection may or may not have internet access, and the cellular connection will always have internet access. For this reason, we need to carefully manage outgoing traffic so that it uses the Wi-Fi connection when that connection has internet access, and falls back to the cellular network otherwise.

For our purposes, the main outgoing traffic is traffic to the Torizon Cloud. Preferably, if the Wi-Fi connection has internet connectivity, that interface would be used to send data to the cloud; otherwise, the cellular network would be used as a fallback. To accomplish this, the team wrote a startup script that manipulates fluent-bit.

One of the main parts of the script is a curl command that checks whether a specific interface has connectivity. When I use mlan0, my Wi-Fi interface, the script works and reports data to the cloud. The same is not true when wwan0, my cellular interface, is the only interface connected. This confuses me, because we had a cellular connection transferring data to the Torizon Cloud before we added this file output measure. I isolated the curl command to check whether it was the culprit, and it definitely is.

When I run the following command, which is part of the script:

curl --interface wwan0 --cacert /usr/lib/sota/root.crt --key /var/sota/import/pkey.pem --cert /var/sota/import/client.pem --head https://dgw.torizon.io

I get a resolve error ONLY when mlan0, my Wi-Fi connection, is not connected. When mlan0 is connected and I run the curl command through wwan0, it works just fine. Pinging 8.8.8.8 with the -I flag for wwan0 always works, regardless of whether mlan0 is connected.

Does anyone know what next steps I should take to get this working? Thanks!

EDIT: I felt I was sharing too much background and not enough about the problem itself, so I edited the post to make it clearer. Part of a startup script we use to route outgoing data through different interfaces runs a curl command over several interfaces (mlan0 for Wi-Fi and wwan0 for the cellular network). We check whether mlan0 has internet connectivity first, and use wwan0 as a fallback if it does not. The problem is that the curl command over wwan0 only succeeds when mlan0 is also connected, even though we pass the --interface flag for wwan0. The two should work independently of each other, and I'm unsure what steps to take next to diagnose this.
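For reference, here is a minimal sketch of a Wi-Fi-first fallback check like the one described in this post. It is not the team's actual script: it assumes the interface names mlan0/wwan0 and the certificate paths from the curl command above, and the function names `probe` and `pick_uplink` are hypothetical. One detail worth keeping in mind: curl's `--interface` binds only the connection's source interface and does not affect DNS, so the hostname is still resolved by the system resolver.

```shell
#!/bin/sh
# Sketch: print the first interface that can reach dgw.torizon.io, Wi-Fi first.
# Assumed interface names: mlan0 (Wi-Fi), wwan0 (cellular).

# probe IFACE: exit status 0 if an HTTPS HEAD request to dgw.torizon.io
# completes with the connection bound to IFACE. --interface does not
# affect DNS: the name is still resolved by the system resolver.
probe() {
  curl --silent --output /dev/null --max-time 15 \
    --interface "$1" \
    --cacert /usr/lib/sota/root.crt \
    --key /var/sota/import/pkey.pem \
    --cert /var/sota/import/client.pem \
    --head https://dgw.torizon.io
}

# pick_uplink IFACE...: print the first interface (in the order given)
# whose probe succeeds; return 1 if none does.
pick_uplink() {
  for candidate in "$@"; do
    if probe "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example: uplink=$(pick_uplink mlan0 wwan0) || uplink=none
```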

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!



Based on the context provided, it seems that the issue is related to DNS resolution when using the cellular interface (wwan0) without the Wi-Fi interface (mlan0) being active. Here are some steps you can take to troubleshoot and potentially resolve the issue:

  1. Check DNS Configuration for wwan0: Ensure that the cellular interface has proper DNS settings. You can check the DNS configuration by looking at the /etc/resolv.conf file or using the nmcli command if NetworkManager is managing your connections. Make sure that the DNS servers are correctly set for the cellular connection.

  2. Test DNS Resolution Independently: Try using the dig or nslookup command to resolve dgw.torizon.io using the cellular interface. This can help determine if the issue is specifically with DNS resolution on that interface.

  3. Review Routing Table: Use the ip route command to check the routing table when only the cellular interface is active. Ensure that there is a default route set for the cellular interface that allows it to handle outgoing traffic properly.

  4. Check for DNS Caching Issues: If there is a DNS caching service running on the system, it might be caching incorrect DNS information. Try flushing the DNS cache or temporarily disabling the caching service to see if it resolves the issue.

  5. Test with a Different DNS Server: Temporarily configure the cellular interface to use a public DNS server like Google’s (8.8.8.8) to see if that resolves the issue.

  6. Community Support: If the above steps do not resolve the issue, consider reaching out to community forums or technical support for Torizon. Other users or support staff may have encountered similar issues and can provide additional insights.

By addressing the DNS configuration and ensuring proper routing, you should be able to resolve the connectivity issue with the cellular interface.
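For step 3, a small helper can show which default route the kernel will prefer. This is a rough sketch that assumes the usual iproute2 output format (`default via <gw> dev <iface> ... metric <n>`): among default routes, the one with the lowest metric wins, and a route printed without a metric has metric 0. The function name is hypothetical.

```shell
#!/bin/sh
# preferred_default_dev: read `ip -4 route show default` output on stdin and
# print the interface of the default route with the lowest metric.
# A default route printed without "metric" is treated as metric 0.
preferred_default_dev() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev") dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1) + 0
      }
      # Keep the first route seen at the lowest metric.
      if (dev != "" && (best == "" || metric < best_metric)) {
        best = dev; best_metric = metric
      }
    }
    END { if (best != "") print best }
  '
}

# Example: ip -4 route show default | preferred_default_dev
```

Running it once with only the cellular link up and once with Wi-Fi up as well shows whether the preferred default route changes between the two states.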

Please also check whether these links help solve your problem:

Best Approach to Handle Priority of 3 Network Interfaces.

Greetings @jeatontenney,

That’s some strange behavior you are observing. Let’s see if we can narrow down what might be causing this.

First of all, can you use curl with your wwan0 interface on some URL other than https://dgw.torizon.io? I'm curious whether the issue also occurs with other URLs or only with ours.

this gives me a resolve error ONLY when mlan0, my wi-fi connection, is not connected.

Could you share the exact log of the resolve error you got from curl? Also, could you try running curl with verbose logging (-v) to see whether there is more information to examine?

This confuses me a bit as we have had a cellular connection transferring data to the torizon cloud before we did this file output measure.

Just to clarify. Your cellular connection was sending data to our cloud just fine via fluent-bit even without the wi-fi connection present, did I understand that correctly?

That would mean you only see this behavior with curl, which would be strange. But let's see what information we can uncover.

Best Regards,
Jeremias

Hey @jeremias.tx !

Here are some answers to your questions. If I try to curl Google via:

 curl -L  --interface "wwan0" http://google.com

a very similar effect happens. If mlan0 (my wi-fi connection) is connected, this works just fine. But if I disconnect my mlan0 connection, this command will stall for a while, then eventually return:

/var/rootdirs/home/torizon# curl -L  --interface "wwan0" http://google.com
curl: (6) Could not resolve host: google.com

Here are the verbose logs for this command, followed by the output of the command from my script:

root@machinepulse-15207084:/var/rootdirs/home/torizon# curl -L -v  --interface "wwan0" http://google.com
* Could not resolve host: google.com
* Closing connection
curl: (6) Could not resolve host: google.com
root@machinepulse-15207084:/var/rootdirs/home/torizon# curl --interface "wwan0" --cacert "/usr/lib/sota/root.crt" --key "/var/sota/import/pkey.pem" --cert "/var/sota/import/client.pem" --head "https://dgw.torizon.io"
curl: (6) Could not resolve host: dgw.torizon.io
root@machinepulse-15207084:/var/rootdirs/home/torizon#

This “Could not resolve host” error makes me think it is a DNS issue, but I'm unsure of my next steps. Worth noting: while the mlan0 connection is down, if I run the command:

ping -I wwan0 8.8.8.8

packets are sent and returned successfully. So, clearly my cell connection can reach the greater internet.
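The distinction drawn here (ping to a literal IP works, curl by hostname fails) can be made explicit from curl's exit status. Exit code 6, as in the logs above, means curl failed at name resolution and never got as far as connecting over wwan0, while `ping -I wwan0 8.8.8.8` needs no DNS at all. A small sketch; the function name is hypothetical, and the numbers are curl's documented exit codes.

```shell
#!/bin/sh
# describe_curl_exit CODE: map a curl exit status to the stage that failed.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host (DNS stage)" ;;
    7)  echo "failed to connect (no route, refused or blocked)" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS handshake failed" ;;
    58) echo "problem with the local client certificate" ;;
    60) echo "server certificate could not be verified" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Example:
#   curl --silent --output /dev/null --max-time 15 --interface wwan0 --head https://dgw.torizon.io
#   describe_curl_exit "$?"
```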

As for your question, “Your cellular connection was sending data to our cloud just fine via fluent-bit even without the wi-fi connection present, did I understand that correctly?”: yes, you understood correctly. Before we went down this route, if the cellular connection was the only connection available, data would still be transmitted successfully. But as I said at the beginning of my first post, we couldn't control whether it would use cellular data when a Wi-Fi connection was established, which may or may not have internet access.

Any ideas as to what the next steps might be? Thanks!

Justin.

Hey @jeatontenney,

There are quite a few things to look into here, and I'll disclaim up front that having multiple network connections with varying levels of connectivity, and hoping to steer specific traffic over specific interfaces under specific conditions, will quickly push you into the world of Linux networking, somewhere between beginner and advanced topics :slight_smile:

I am not a network pro, but I'll give you my thoughts.

The specific issue you raise in this question strikes me as a DNS issue. It seems DNS is not working correctly over the cell modem, and I would probably start there.

Apologies if you've already verified this or run some of these commands, but can you try running some nslookup queries when only the cell modem is connected?
The simplest:
nslookup app.torizon.io

Specifying a DNS server explicitly:
nslookup app.torizon.io 8.8.8.8

Listing the DNS configuration can also prove helpful:
systemd-resolve --status
^^^ This is worth running both with just the modem active and with the modem AND Wi-Fi connections active.
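To make the "modem only" versus "modem + Wi-Fi" comparison easy to diff, the checks above can be collected into one function. This is a sketch under assumptions: it expects systemd-resolved with the `resolvectl` tool (older systems may only ship `systemd-resolve`), plus iproute2 and nslookup; the function name and output paths are hypothetical. Note that the query to 8.8.8.8 follows the routing table rather than being bound to wwan0.

```shell
#!/bin/sh
# dns_snapshot LABEL [NAME]: print the DNS and routing state relevant to this
# thread so that two runs (modem only, modem + Wi-Fi) can be compared.
dns_snapshot() {
  snap_label="$1"
  snap_name="${2:-app.torizon.io}"
  echo "=== $snap_label ==="
  echo "--- /etc/resolv.conf -> $(readlink -f /etc/resolv.conf)"
  grep '^nameserver' /etc/resolv.conf
  echo "--- IPv4 default routes"
  ip -4 route show default
  echo "--- per-link DNS servers known to systemd-resolved"
  resolvectl dns
  echo "--- lookup of $snap_name via systemd-resolved, restricted to wwan0"
  resolvectl query --interface=wwan0 "$snap_name"
  echo "--- lookup of $snap_name sent directly to 8.8.8.8"
  nslookup "$snap_name" 8.8.8.8
}

# Example:
#   dns_snapshot "modem only"   > /tmp/dns-modem.txt 2>&1
#   (bring mlan0 up)
#   dns_snapshot "modem + wifi" > /tmp/dns-both.txt  2>&1
#   diff /tmp/dns-modem.txt /tmp/dns-both.txt
```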

Let’s start there, maybe something jumps out?

Cheers,
ben

Hey all, so here’s an update on how we fixed these problems!

For transparency, I met virtually with several members of the team here at Toradex to try to solve some of these issues.

First, after a fresh install on a new Toradex board, I could not reproduce the issue consistently. Between what I thought were two identical boards, I was unable to replicate the cellular network working only when the Wi-Fi connection was up. There were a myriad of other issues, one of which was being unable to ping outward on both interfaces after some time.

Something that did work for us was a dispatcher script. This script sets our cellular IPv4 route metric to 100, Wi-Fi to 200, and Ethernet to 300. The issue we were discussing above seemed like it should be a DNS issue, and it definitely looked that way while troubleshooting with the Toradex team. I truly don't know why this dispatcher script solved our issues, but it's probably worth investigating further later. Does anyone have any insight into why this worked?
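For readers who want to try the same approach, here is a minimal sketch of a NetworkManager dispatcher script applying the metrics described above (cellular 100, Wi-Fi 200, Ethernet 300). It is not the team's actual script. The file name, the interface names (`wwan0`, `mlan0`, and `eth0` for Ethernet) and the use of `nmcli device modify` (a runtime change that is not saved to the connection profile) are assumptions; setting `ipv4.route-metric` on each profile with `nmcli connection modify` is the persistent alternative. With modems, the NetworkManager device is often named after the control port rather than `wwan0`, which is why the script passes `DEVICE_IFACE` to nmcli when NetworkManager provides it.

```shell
#!/bin/sh
# Hypothetical /etc/NetworkManager/dispatcher.d/50-route-metrics
# (must be owned by root and executable).
# NetworkManager runs dispatcher scripts with $1 = interface, $2 = action.
# Lower metric = preferred default route. Interface names are assumptions.

# metric_for IFACE: print the desired IPv4 route metric, or nothing if unmanaged.
metric_for() {
  case "$1" in
    wwan0) echo 100 ;;  # cellular
    mlan0) echo 200 ;;  # Wi-Fi
    eth0)  echo 300 ;;  # Ethernet (assumed name)
  esac
}

# Only act on "up". The reapply triggered by nmcli is reported to
# dispatcher scripts as a different action ("reapply"), so this should
# not retrigger itself.
if [ "${2:-}" = "up" ]; then
  metric="$(metric_for "$1")"
  if [ -n "$metric" ]; then
    # DEVICE_IFACE is the NetworkManager device name (may differ from $1 for modems).
    nmcli device modify "${DEVICE_IFACE:-$1}" ipv4.route-metric "$metric"
  fi
fi
```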

Glad to hear you were able to come up with something that works for your setup.

As for why this works exactly, honestly, I couldn't tell you either. It would probably take a networking guru to explain the nuances.

Out of curiosity, how did you end up with the idea of the dispatcher script? Did something in your investigation lead you to it, or was it more of a guess and check?

Best Regards,
Jeremias

Largely guess and check: we noticed that the route metric we set was not persistent across reboots, specifically for the Ethernet connection.

Glad you got it working!

A theory:

The priority metric helps because the DNS service (systemd-resolved) no longer has to keep track of which network interface to use as those interfaces change. The configuration is set early enough (since it's a dispatcher script, it might even run before systemd-resolved starts, though I'm not sure about that) that systemd-resolved can make DNS requests over the cell modem 100% of the time, instead of starting with some default interface (likely Ethernet starts with the highest priority) and then having to learn the new priorities of the three interfaces as they are brought down, or learn of a priority change via a change in the metric.

I looked a little further into how DNS is configured in TorizonOS, and it actually surprised me. /etc/resolv.conf is how the DNS server is communicated to “most” applications running in Linux, but it's a legacy interface, and systemd-resolved describes four ways to manage this file. I was expecting option 1 or 2.

/ETC/RESOLV.CONF
Four modes of handling /etc/resolv.conf (see resolv.conf(5)) are supported:

   •   systemd-resolved maintains the /run/systemd/resolve/stub-resolv.conf file for compatibility with traditional Linux programs. This file lists the 127.0.0.53 DNS stub (see above) as the only DNS server. It also contains a list of
       search domains that are in use by systemd-resolved. The list of search domains is always kept up-to-date. Note that /run/systemd/resolve/stub-resolv.conf should not be used directly by applications, but only through a symlink
       from /etc/resolv.conf. This file may be symlinked from /etc/resolv.conf in order to connect all local clients that bypass local DNS APIs to systemd-resolved with correct search domains settings. This mode of operation is
       recommended.

   •   A static file /usr/lib/systemd/resolv.conf is provided that lists the 127.0.0.53 DNS stub (see above) as only DNS server. This file may be symlinked from /etc/resolv.conf in order to connect all local clients that bypass local
       DNS APIs to systemd-resolved. This file does not contain any search domains.

   •   systemd-resolved maintains the /run/systemd/resolve/resolv.conf file for compatibility with traditional Linux programs. This file may be symlinked from /etc/resolv.conf and is always kept up-to-date, containing information
       about all known DNS servers. Note the file format's limitations: it does not know a concept of per-interface DNS servers and hence only contains system-wide DNS server definitions. Note that /run/systemd/resolve/resolv.conf
       should not be used directly by applications, but only through a symlink from /etc/resolv.conf. If this mode of operation is used local clients that bypass any local DNS API will also bypass systemd-resolved and will talk
       directly to the known DNS servers.

   •   Alternatively, /etc/resolv.conf may be managed by other packages, in which case systemd-resolved will read it for DNS configuration data. In this mode of operation systemd-resolved is consumer rather than provider of this
       configuration file.

As it turns out, TorizonOS uses option 3, and the limitation described in the man page sounds exactly like what you are hitting:

Note the file format’s limitations: it does not know a concept of per-interface DNS servers and hence only contains system-wide DNS server definitions
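To see the limitation concretely: in this mode /etc/resolv.conf is just a flat list of `nameserver` lines. A program that resolves names by reading it sends queries to those servers over whatever route the kernel picks, and nothing in the file ties a server to an interface. The hypothetical helper below lists the servers in file order; comparing its output with `resolvectl dns` (which does report servers per link) shows what information is lost.

```shell
#!/bin/sh
# list_nameservers: read a resolv.conf-format file on stdin and print the
# nameserver addresses in file order. There is no interface column:
# the format cannot express per-link DNS servers.
list_nameservers() {
  awk '$1 == "nameserver" && NF >= 2 { print $2 }'
}

# Example: list_nameservers < /etc/resolv.conf
```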

Cheers,
Ben
