Best Approach to Handle Priority of 3 Network Interfaces

Our application involves the use of up to 3 network interfaces: ethernet, wifi, and cellular. We use the ethernet primarily for Modbus TCP communication via a Python script, the wifi for accessing a web server on the board, and the cellular connection for transmitting data to Torizon Cloud and another cloud provider. Recently we ran into some issues caused by the 3 different use cases not selecting the appropriate interface.

The first issue we saw was that fluent-bit was trying to send data out via the local ethernet. However, the ethernet connection has no outbound internet access, as it's only a local network.

Is there a good method to have fluent-bit try different interfaces if it is not able to connect to the Torizon Cloud server through one interface or another?

How can I ensure that my Modbus TCP communication only attempts to use ethernet and then if no connection is made over ethernet tries the wifi, but never cell?

Based on what I know right now, there is no easy way to handle this within Linux. Basic metric priority does not take into account whether a given interface actually has an internet connection, which is the main cause of my issues right now.

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!


To address the issues with network interface prioritization in your application, you can implement the following solutions:

  1. Dynamic Interface Selection for Fluent-bit:

    • Implement a failover mechanism for fluent-bit by creating a custom script that tests connectivity to the Torizon Cloud server through each network interface in a prioritized order (ethernet, then wifi, then cellular). The script should automatically switch to the next available interface if the primary one fails, ensuring data is sent through the most appropriate connection.
  2. Modbus TCP Communication Logic:

    • In your Python script for Modbus TCP communication, implement a connection check that first attempts to establish a connection over the ethernet interface. If the connection fails, the script should then attempt to connect over the wifi interface. Ensure that the cellular interface is excluded from this logic to prevent any attempts to connect via cellular. Use exception handling to manage connection errors and retry with the next interface as needed.
  3. Addressing Linux Limitations:

    • Since basic metric priority in Linux does not account for internet connectivity status on specific interfaces, consider implementing a custom routing table that prioritizes the ethernet interface for Modbus TCP communication, followed by wifi, and excludes cellular. Additionally, use tools or scripts that can dynamically adjust the routing based on connectivity status to improve the reliability of data transmission across the different interfaces.
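
The ethernet-then-wifi ordering from point 2 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested Modbus client: it assumes a Linux target, the interface names `eth0` and `wlan0` are placeholders, and `open_bound_socket` and `connect_with_fallback` are hypothetical helper names. Each attempt is pinned to one interface with the `SO_BINDTODEVICE` socket option, which may need elevated privileges depending on the kernel.

```python
import socket

# Hypothetical interface names; check `ip link` on the target for the real ones.
ALLOWED_INTERFACES = ("eth0", "wlan0")  # cellular is deliberately left out


def open_bound_socket(host, port, interface, timeout=3.0):
    """Open a TCP connection whose traffic is pinned to one interface (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # SO_BINDTODEVICE may need root or CAP_NET_RAW, depending on the kernel.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        interface.encode() + b"\0")
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_fallback(host, port, interfaces=ALLOWED_INTERFACES,
                          opener=open_bound_socket):
    """Try each interface in order; return (interface, socket) for the first success."""
    errors = {}
    for interface in interfaces:
        try:
            return interface, opener(host, port, interface)
        except OSError as exc:
            errors[interface] = exc
    raise ConnectionError(f"no allowed interface reached {host}:{port}: {errors}")
```

A call such as `connect_with_fallback("192.168.1.50", 502)` would try the standard Modbus TCP port over ethernet first and wifi second; the address is illustrative. Because the cellular interface is not in the list, it is never attempted. How the resulting socket is handed to a Modbus library depends on the library in use.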

By implementing these strategies, you can effectively manage the priorities of your network interfaces and ensure that your application uses the most suitable interface for each specific task, thus avoiding connectivity issues.

Please, also check if these links can help you solve your problem:

Networking with Torizon OS | Toradex Developer Center.

Greetings @dleberfinger,

Regarding your first point.

Is there a good method to have fluent-bit try different interfaces if it is not able to connect to the Torizon Cloud server through one interface or another?

I see there was a similar question on the Fluent-bit Github repo: [question] control output availability · Issue #2340 · fluent/fluent-bit · GitHub

They referred to the option net.source_address documented here: https://docs.fluentbit.io/manual/dev-2.2/administration/networking#configuration-options

With this option you should be able to specify via address which network interface Fluent-bit should use for its data traffic. Now from my understanding this will tell Fluent-bit to use the specified network address. But as discussed on Github, there doesn’t seem to be a way to tell Fluent-bit to then try another address/interface if the configured address/interface has no connection.

In your case will the cellular connection always be the network interface that provides the connection to Torizon Cloud? Or can this change depending on the situation?

Moving onto your other point.

How can I ensure that my Modbus TCP communication only attempts to use ethernet and then if no connection is made over ethernet tries the wifi, but never cell?

Now I will admit I’m not very knowledgeable when it comes to Modbus as a topic. Let me try to understand what’s going on in your setup.

Right now which network interface is your Modbus communication script using by default? Is this something you specify in the script or is Modbus just automatically choosing a network interface?

Best Regards,
Jeremias

I’m not very familiar with this topic, but perhaps your problem could be resolved using Network Namespaces.

This is kind of a basic outline of the system.

We won’t always have the cell modem installed, and we might not always have an outbound internet connection.

For now I think the Modbus is kind of a non-issue. We haven’t had any problems with that yet.

Our current solution is to have the network interface metrics set such that they are favored in the order below:
Cell: 1
Ethernet: 2
Wifi: 3

Since our ethernet is typically a local network with static IPs, we set our gateway to 0.0.0.0 and Linux automatically ignores this unless a specific IP on the network is used.
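
To check which interfaces the kernel actually treats as default-route candidates under a setup like this, the IPv4 routing table in `/proc/net/route` can be inspected. The following is a minimal sketch: `default_routes` is a hypothetical helper, and the sample table (interface names, addresses) is invented for illustration. It lists default routes by metric, lowest first. An interface with no gateway, such as the local ethernet described above, only has a subnet route and so never shows up as a default route.

```python
def default_routes(route_table_text):
    """Return [(metric, interface)] for IPv4 default routes, lowest metric first."""
    routes = []
    for line in route_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination, metric, mask = fields[0], fields[1], fields[6], fields[7]
        # A default route has an all-zero destination and an all-zero mask.
        if destination == "00000000" and mask == "00000000":
            routes.append((int(metric), iface))
    return sorted(routes)


# Invented sample in the /proc/net/route layout (hex addresses, decimal metric).
SAMPLE = """Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
wwan0 00000000 0100000A 0003 0 0 1 00000000 0 0 0
eth0 0001A8C0 00000000 0001 0 0 2 00FFFFFF 0 0 0
wlan0 00000000 0100A8C0 0003 0 0 3 00000000 0 0 0
"""

print(default_routes(SAMPLE))
```

On a device, `default_routes(open("/proc/net/route").read())` would report the live table. Note that this only reflects metrics; it says nothing about whether the lowest-metric interface can actually reach the internet.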

If the cell modem is not installed, all fluent-bit information goes over the wifi. The webserver is hosted on all available interfaces, so this does not seem to cause any issues whether the cell modem is installed or not.

I think at this point everything is working as intended. However, the issues we have been seeing have reduced my confidence that our network setup is as robust as I would like. I would have expected Toradex to have some kind of guidance on using multiple network interfaces, given the ability to have up to 4 on a board such as the Ivy, which has 2 ethernet networks, and in our case 3 on the Mallow board.

I am going to leave the setup as is for the time being, but I intend to investigate network namespaces in more detail to see what we can do to feel more confident.

Howdy,

If you can define your network settings via systemd, you could add an alias and use that to refer to each device, or configure it to use deterministic naming.

You could alternatively read the hardware path from the device tree under /sys/…/net/ which will list the actual device names.

Just a few options I have used on our imx8mp SoMs.

If that is at all helpful.

In addition to the suggestions by everyone else, I did more digging into Fluent-bit. I saw there was a request for “backup” output options: Support Backup Outputs · Issue #1632 · fluent/fluent-bit · GitHub

It sounds like the idea would be to have configurable secondary outputs that are only used when the primary output is unable to send your data. This sounds like it would be beneficial for the use-case you just described.

Now this isn’t implemented yet in Fluent-bit. Though, I see this is marked as a potential goal for Fluent-bit: Fluent Bit v4.0.0 Milestone · GitHub

I guess this is dependent on your own timeline and whether the Fluent-bit developers actually follow-through with this milestone plan.

Another possible idea is to create a proxy/middle-man that takes in the data from Fluent-bit instead of sending it to our Cloud right away. The idea is that the data flows roughly like so:

Fluent-bit → middle-man → Torizon Cloud

The “middle-man” in this case could be some script or application that can determine which of your network interfaces to send the data through to our Cloud. Now this is more of a thought I just had, so it probably needs further refining. But I thought I would share it in case it helps.
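
As a rough illustration of the receiving half of such a middle-man, here is a minimal sketch built on Python's standard `http.server`. Everything in it is an assumption rather than a recommended design: it supposes Fluent-bit is pointed at a local HTTP endpoint that accepts JSON, the host and port are placeholders, and `forward` is a hypothetical callable that would pick an interface, send the records on to the Cloud, and return `True` on success.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_relay_handler(forward):
    """Build a handler class that hands each POSTed JSON body to forward(records)."""

    class RelayHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                delivered = forward(json.loads(body))
            except (ValueError, OSError):
                # Malformed JSON or a network failure inside forward().
                delivered = False
            # An error status lets a sender that retries on failure try again.
            self.send_response(204 if delivered else 502)
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; real code would log

    return RelayHandler


def serve(forward, host="127.0.0.1", port=8080):
    """Run the relay until interrupted. Host and port are placeholder values."""
    HTTPServer((host, port), make_relay_handler(forward)).serve_forever()
```

The interface selection itself stays inside `forward`, so the relay does not need to know how many interfaces exist or which one is currently working.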

Best Regards,
Jeremias

I’m going down the path of using a script to redirect the fluent-bit data over a specific network interface. My script is successfully identifying which interface has a connection to the Torizon Cloud and when I send my data I am getting a 204 status code with no additional response. So it seems the server is accepting the data, but it is not appearing in the device metrics on the cloud.

Here is an example of what I am currently sending.

Sending data: {"date": 1730732723.436212928, "data": {"temperature":{"name":"thermal_zone0","type":"main0-thermal","temp":63.742}}}
HTTP Status Code: 204

What format should the data be in for Torizon Cloud to properly process the data?

What format should the data be in for Torizon Cloud to properly process the data?

This is a snippet of the output fluent-bit produces and sends to our Cloud:

[
  {
    "date": 1730762340.018996,
    "memory": {
      "Mem.total": 3793532,
      "Mem.used": 1226308,
      "Mem.free": 2567224,
      "Swap.total": 0,
      "Swap.used": 0,
      "Swap.free": 0
    }
  }
]
[
  {
    "date": 1730762340.031558,
    "docker": {
      "alive": true,
      "proc_name": "dockerd",
      "pid": 886
    }
  }
]
[
  {
    "date": 1730762340.019724,
    "temperature": {
      "name": "thermal_zone0",
      "type": "cpu-thermal0",
      "temp": 67.90000000000001
    }
  }
]
[
  {
    "date": 1730762340.073653,
    "custom": {
      "emmc_life_time_est_typ_a": "10",
      "emmc_pre_eol_info": "1"
    }
  }
]

Your JSON data looks close, but one thing I notice is that you have your data nested under a field named data:

{
  "date": 1730732723.436212928,
  "data": {
    "temperature": {
      "name": "thermal_zone0",
      "type": "main0-thermal",
      "temp": 63.742
    }
  }
}

Our Cloud only processes data that is nested under some predefined names (cpu, memory, temperature, and docker) for our default metrics, or it also accepts anything nested under the name of “custom”. This is so users can add whatever arbitrary metrics they want. If the data is nested under any other name I think the API will still accept it, but then it just gets ignored in the backend. That might explain why you still get a 204 response.

In any case, try nesting your data under “custom” and see if that helps.
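
If the forwarding script builds its payload with a wrapper like this, the change could be sketched as below. This is only an illustration based on the names mentioned in this thread: `reshape_record` and `KNOWN_SECTIONS` are hypothetical, and the full set of fields the Cloud expects is not confirmed here. Recognised metric names are lifted out of the wrapper to sit next to "date", and anything else is collected under "custom".

```python
# Section names taken from this thread; treat the list as an assumption.
KNOWN_SECTIONS = {"cpu", "memory", "temperature", "docker", "custom"}


def reshape_record(record, wrapper_key="data"):
    """Lift metrics out of a wrapper object so recognised names sit next to "date"."""
    reshaped = {key: value for key, value in record.items() if key != wrapper_key}
    extras = {}
    for name, value in record.get(wrapper_key, {}).items():
        if name in KNOWN_SECTIONS:
            reshaped[name] = value
        else:
            extras[name] = value  # unknown names go under "custom"
    if extras:
        reshaped["custom"] = {**reshaped.get("custom", {}), **extras}
    return reshaped


record = {
    "date": 1730732723.436212928,
    "data": {
        "temperature": {"name": "thermal_zone0", "type": "main0-thermal", "temp": 63.742}
    },
}
print(reshape_record(record))
```

With the record from the earlier post, the "temperature" object ends up at the same level as "date", matching the shape of the Fluent-bit output shown above.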

Best Regards,
Jeremias

Thank you for the help everyone. The situation should be resolved at this point.

Perfect, glad to hear that this topic has been resolved for you.

Best Regards,
Jeremias