Need help writing a fault-tolerant analog I2C mux device tree node

So here’s the setup:

We have one 3-channel analog I2C mux at a fixed location. Each of the three mux channels is controlled by its own GPIO pin, and the three pins are located on different I/O expanders. Declaring the setup in the device tree according to the i2c-mux-gpio bindings allows everything to work correctly. The device tree node looks like this:

i2cmux-57226000.i2c {
    compatible = "i2c-mux-gpio";
    #address-cells = <1>;
    #size-cells = <0>;
    mux-gpios = <&pca9674_1 7 GPIO_ACTIVE_LOW &pca9674_2 7 GPIO_ACTIVE_LOW &pca9674_3 7 GPIO_ACTIVE_LOW>;
    i2c-parent = <&i2c0_mipi1>;

    i2c@1 {
        reg = <1>; /* 0 0 1 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
    i2c@2 {
        reg = <2>; /* 0 1 0 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
    i2c@4 {
        reg = <4>; /* 1 0 0 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
};

The problem:

During runtime, the third I/O expander, pca9674_3, which is necessary for the enumeration of the mux may not exist. This situation is not specific to a single setup, and can change from boot to boot depending on circumstances which can only be determined during runtime. We would like to account for this volatility by allowing the mux to work with or without pca9674_3. In the case where pca9674_3 does not exist, we want a 2-channel mux to be enumerated instead of a 3-channel mux.

We attempted to solve this issue by declaring a 2-channel mux node after the 3-channel mux node in the device tree. This 2-channel mux node only occupies the required GPIO lines on pca9674_1 and pca9674_2. The resulting device tree looks like this:

i2cmux-threechannel-57226000.i2c {
    compatible = "i2c-mux-gpio";
    #address-cells = <1>;
    #size-cells = <0>;
    mux-gpios = <&pca9674_1 7 GPIO_ACTIVE_LOW &pca9674_2 7 GPIO_ACTIVE_LOW &pca9674_3 7 GPIO_ACTIVE_LOW>;
    i2c-parent = <&i2c0_mipi1>;

    i2c@1 {
        reg = <1>; /* 0 0 1 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
    i2c@2 {
        reg = <2>; /* 0 1 0 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
    i2c@4 {
        reg = <4>; /* 1 0 0 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
};

i2cmux-twochannel-57226000.i2c {
    compatible = "i2c-mux-gpio";
    #address-cells = <1>;
    #size-cells = <0>;
    mux-gpios = <&pca9674_1 7 GPIO_ACTIVE_LOW &pca9674_2 7 GPIO_ACTIVE_LOW>;
    i2c-parent = <&i2c0_mipi1>;

    i2c@1 {
        reg = <1>; /* 0 1 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
    i2c@2 {
        reg = <2>; /* 1 0 */
        #address-cells = <1>;
        #size-cells = <0>;
    };
};

So far this has worked for us. During each boot, if the enumeration of the 3-channel mux fails due to the absent pca9674_3 at runtime, enumeration of the 2-channel mux takes place and the mux is able to continue operating.

However, we have concerns that this hack is highly dependent on the order of declaration of the nodes in the device tree. In our testing, moving the declaration of the 2-channel mux node above the 3-channel mux node causes the enumeration of the 2-channel mux to take place prior to the 3-channel mux, which is not what we want. We have not found any documentation which defines how the order of declaration of nodes in the device tree affects the order of initialization of these devices during runtime.

Posting this thread in hopes of getting some input on whether what we are currently doing is a good idea or should be avoided.

Thanks,
Hatsy Rei

P.S. We have looked into device tree overlays, but our understanding is that we’ll still need to somehow know if the third I/O expander exists beforehand, then set the overlay accordingly. This doesn’t blend well with the unpredictable situation we are forced to work with. Modifying the i2c-mux-gpio driver may work, but unless absolutely necessary, we’d like to look at other options before going there.

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!

The approach you are using to handle the fault tolerance of the I2C mux by declaring both a 3-channel and a 2-channel mux in the device tree is indeed a bit unconventional and can lead to unpredictable behavior due to the order of initialization. The device tree does not guarantee the order of initialization based on the order of declaration, which can lead to the issues you are experiencing.

A more robust approach would be to handle the dynamic nature of the I/O expander in the driver code rather than relying on the device tree. Here are a few suggestions:

  1. Custom Driver Code: Modify the I2C mux driver to dynamically check the presence of the third I/O expander (pca9674_3) at runtime. If the expander is not present, the driver can configure the mux as a 2-channel mux instead of a 3-channel mux. This way, you can handle the fault tolerance within the driver itself.

  2. Device Tree Overlays: Use device tree overlays to dynamically modify the device tree at runtime based on the presence of the I/O expander. You can create separate overlays for the 2-channel and 3-channel configurations and apply the appropriate overlay during the boot process based on the detected hardware.

  3. Deferred Probing: Utilize deferred probing in the driver to wait for the I/O expanders to be initialized before configuring the mux. This can help ensure that the driver only attempts to configure the mux once all necessary components are available.

Here is an example of how you might modify the driver to check for the presence of the third I/O expander:

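static int i2c_mux_gpio_probe(struct platform_device *pdev)
{
    // i2c-mux-gpio is a platform driver, so probe receives a platform_device
    struct device *dev = &pdev->dev;
    struct gpio_desc *gpio3;
    int ret;

    // Request the third entry (index 2) of the existing "mux-gpios" property,
    // i.e. the line on pca9674_3. The _optional variant returns NULL only when
    // that entry is absent from the device tree.
    gpio3 = devm_gpiod_get_index_optional(dev, "mux", 2, GPIOD_OUT_LOW);
    if (IS_ERR(gpio3)) {
        // An unavailable provider yields an error pointer, not NULL.
        // Assumption: treat that as "expander not connected". Note that
        // -EPROBE_DEFER may also just mean "expander not probed yet".
        dev_warn(dev, "mux GPIO 2 unavailable (%ld), falling back to 2 channels\n",
                 PTR_ERR(gpio3));
        gpio3 = NULL;
    }

    if (gpio3) {
        // Configure as 3-channel mux
        ret = configure_3_channel_mux(pdev);
    } else {
        // Configure as 2-channel mux
        ret = configure_2_channel_mux(pdev);
    }

    return ret;
}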

In this example, configure_3_channel_mux and configure_2_channel_mux are functions that configure the mux as a 3-channel or 2-channel mux, respectively, based on the presence of the third I/O expander.

By handling the fault tolerance in the driver code, you can ensure a more predictable and reliable behavior regardless of the order of declaration in the device tree.

Please, also check if these links can help you solve your problem:

https://community.nxp.com/t5/i-MX-Processors/i-MX7D-USB-HSIC-Hub-USB3503-enumeration-variability/m-p/675043

Hey @HatsyRei

How do you know that the 2-channel mux is being enumerated and not the 3-channel one? Is the kernel complaining?

Checking [the kernel docs](https://www.kernel.org/doc/Documentation/devicetree/bindings/i2c/i2c-mux-gpio.txt), there is no mention of missing children, only idle states.

Does enumeration fail completely if you have 2 devices and just the 3-channel mux? Or, if you invert the order, does it fail to detect the 3rd channel?

Kind regards,
Alvaro.

Hi @alvaro.tx, we’ve been looking at dmesg output to determine which mux gets enumerated and when it happens.

The enumeration of the 3-channel mux does fail completely if one of the I/O expanders is missing. Here’s what happens if only the 3-channel mux is declared in the device tree and one of the I/O expanders is missing.

root@my-apalis-imx8:~# dmesg | grep gpio
...
[    2.861033] i2c-mux-gpio: probe of i2cmux-threechannel-57226000.i2c failed with error -16

If we declare a 2-channel mux after the 3-channel mux in the device tree, the 2-channel mux gets enumerated after the 3-channel mux fails to enumerate:

root@my-apalis-imx8:~# dmesg | grep gpio
...
[    2.860023] i2c-mux-gpio: probe of i2cmux-threechannel-57226000.i2c failed with error -16
[    2.865227] i2c-mux-gpio i2cmux-twochannel-57226000.i2c: mux-locked i2c mux
[    2.871164] i2c-mux-gpio i2cmux-twochannel-57226000.i2c: 2 port mux on 57226000.i2c adapter

Thanks. And if you invert the order in the DTB (first the 2-channel node, then the 3-channel one), only 2 channels get picked up, right?

Unfortunately, there is no absolute guarantee that the node higher in the device tree will always load first.

There is no easy solution, only cleaner ones like the 3 suggestions made by @ToradexAI or adding some additional HW switch to select either HW.

Yes, that is the case with the inverted DT node order. Thanks for the replies! I’ll be sure to look into the provided suggestions.

Hi

I wonder how this may happen, unless the SCL/SDA pull-up resistors are missing along with the missing I/O expander on some channel, which could lead to an endless bus-busy condition on that channel and perhaps to the expander probe failure?

My apologies if that was confusing. I think “not connected” would be a more accurate description compared to “missing”. Basically, our setup involves the third I/O expander living on a modular component which may be connected to / disconnected from the core component which holds the Apalis.

@alvaro.tx When it comes to device tree overlays, I suppose the order of declaration in TEZI_EXTERNAL_KERNEL_DEVICETREE_BOOT also does not matter? E.g.,

TEZI_EXTERNAL_KERNEL_DEVICETREE_BOOT = " overlay1.dtbo overlay2.dtbo "

does not guarantee that overlay1 is applied prior to overlay2?

@HatsyRei in this case the order is guaranteed as per overlays.txt, since we define it in the U-Boot arguments. However, the loading process differs slightly depending on whether you are using FIT or non-FIT images.

From the U-Boot documentation, overlays are applied in order when using FIT images.

For non-FIT images, we load the DTBOs one by one.

Alvaro.

You didn’t answer regarding the pull-up resistors. When you disconnect one I2C branch, the pull-up resistors should still be present on that branch on your Apalis carrier board, so that when the Apalis selects the broken branch, both SDA and SCL on the Apalis side are high. I think you are missing those pull-ups, which perhaps leads to a constant busy state when that I2C branch is disconnected, which in turn perhaps leads to the switch probe failure. Try fixing it in hardware.

Hm, I am fairly certain that the pull up resistors exist (as they are on the Apalis carrier instead of the modular component) and the I2C bus is not stuck (as probe of other I2C devices on the same bus works just fine). Nonetheless, I’ll be sure to forward this note regarding pull ups to the HW team. Thanks.

With a mux you actually have several buses, not a single one. The mux just connects one of those buses to your Apalis bus. Each bus should be pulled up properly. At the very least, you should verify what happens to your Apalis bus signals when an empty/broken bus is selected.