Devicetree multi-sensor thermal zone not working

I’m attempting to implement a thermal management scheme for an imx8mp device which takes readings from multiple sensors to control a cooling device.

I’m using the reference document here as my example:
https://git.toradex.com/cgit/linux-toradex.git/tree/Documentation/devicetree/bindings/thermal/thermal.txt?h=toradex_5.4.y

However, I cannot get it to work as expected. Only the first thermal-sensor affects the zone temperature.
My dts specification looks like this, though I’ve tried various different values for coefficients with no change.

	thermal-zones {
		system-thermal {
			polling-delay-passive = <250>;
			polling-delay = <2000>;
			thermal-sensors = <&at30tse_temp>, <&tboard_thermistor>;
			coefficients = <500 500>;
			trips {
				sys_fan0: trip0 {
					temperature = <40000>;
					hysteresis = <2000>;
					type = "active";
				};
			};

			cooling-maps {
				map0 {
					trip = <&sys_fan0>;
					cooling-device = <&fan0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};

I can observe the problem by looking at /sys/class/thermal/thermal_zone2/temp, which mirrors the at30 value exactly (as reported in /sys/class/hwmon/hwmon1/temp1_input), and the fan does nothing even if the board temperature reading exceeds 40C. If I swap the order of the thermal sensors then only the board temperature is taken into account and the at30 value is ignored.

Any idea why this isn’t working as expected/documented?

Greetings @bw908,

Well your device tree snippet here looks more or less in line with what the in-kernel documentation recommends. That said this comment:

If I swap the order of the thermal sensors then only the board temperature is taken into account and the at30 value is ignored.

Is really throwing me off, since the order shouldn’t matter, especially since you’re using the same coefficients for both sensors. The calculated relationship between these sensors should be the same no matter the order in that case. This observation is strange and doesn’t seem consistent with how this is supposed to work.

Another thought I have is that according to your observations no matter the order of thermal-sensors it sounds like the at30 value is never taken into account. Is that correct? Perhaps this could indicate an issue with this specific sensor or how it’s being configured/declared.

All that said, it’s not very obvious to me what is wrong or where to look first. Though I do have some thoughts/suggestions.

If you only have one or the other temperature sensor listed in thermal-sensors is the behavior fine in that case? Also how are you calculating your coefficients?

Best Regards,
Jeremias

Thanks for the reply!

If you only have one or the other temperature sensor listed in thermal-sensors is the behavior fine in that case?

Yes, if I only have a single sensor the fan control works as expected.

Also how are you calculating your coefficients?

Right now I’m just experimenting and trying to get a simple 1:1 relationship to work. I’ve tried things like <1 1>, <10 10>, <1 1 0>, etc. without any change.

Another thought I have is that according to your observations no matter the order of thermal-sensors it sounds like the at30 value is never taken into account. Is that correct?

Not that I can tell. It very much seems that only the first sensor in the list has any effect, as though the coefficients were <1 0>. I actually have a third “sensor” I can play with, a dummy sensor to which I can manually write temperature values. If I add this sensor to the mix (and change its values), it becomes readily apparent that the thermal zone’s temp reading purely tracks the first sensor in the list as well. For example, if my fake_thermal sensor is first, I see the exact value that was last written to it. If the at30tse (lm75 equivalent) is first, then the thermal zone’s temp reports values identical to that sensor only.

I started to get suspicious regarding the behavior you’re describing here, so I checked the driver that controls the thermal management in Linux. My findings are a bit surprising. First I checked the part of the driver where it parses thermal-sensors from the device tree, which is in drivers/thermal/of-thermal.c in linux-toradex.git.

As you can see from the comment:

/* For now, thermal framework supports only 1 sensor per zone */

So the claim that a thermal zone can have more than one sensor assigned does not seem to hold. Furthermore, the same file (drivers/thermal/of-thermal.c in linux-toradex.git) shows how the driver handles coefficients.

There’s another comment that further backs this:

/*
 * REVIST: for now, the thermal framework supports only
 * one sensor per thermal zone. Thus, we are considering
 * only the first two values as slope and offset.
*/

A quick look at the code surrounding the comments seems to show that these comments are accurate. Now I’m not sure why the documentation states that multiple sensors are possible when the driver source code itself contradicts this.

Finally, if you check this driver on the master branch of Linux: https://github.com/torvalds/linux/blob/master/drivers/thermal/thermal_of.c#L257

It still seems like it’s not capable of handling more than 1 sensor per thermal zone.

This would explain your observations and why the order here matters and why it seems like only the first sensor listed has any effect.
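
To make the mismatch concrete, here is a minimal Python sketch of the two behaviors. The function names and the sample readings are made up for illustration. documented_zone_temp follows the linear polynomial as I read it in the thermal.txt binding (Z = c0*x0 + ... + c(n-1)*x(n-1) + cn, with any extra coefficient treated as a constant offset), while observed_zone_temp only models what was reported in this thread; it is not the kernel code.

```python
def documented_zone_temp(readings, coefficients):
    """Linear combination described by the binding text (assumed reading of it)."""
    n = len(readings)
    # One weight per sensor; missing weights default to 1.
    weights = list(coefficients[:n])
    weights += [1] * (n - len(weights))
    # An additional trailing coefficient is a constant offset.
    offset = coefficients[n] if len(coefficients) > n else 0
    return sum(c * x for c, x in zip(weights, readings)) + offset


def observed_zone_temp(readings, coefficients):
    """Behavior seen on the board: only the first listed sensor counts."""
    return readings[0]


# Hypothetical readings in millidegrees Celsius: at30tse first, board thermistor second.
readings = [35000, 45000]
print(documented_zone_temp(readings, [1, 1, 0]))  # 80000
print(observed_zone_temp(readings, [1, 1, 0]))    # 35000
```

Swapping the order of the readings changes the second result but not the first, which matches the order dependence described above.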

Best Regards,
Jeremias

Well that’s unfortunate and also surprising that the devicetree documentation was approved/merged to the mainline kernel repo while being completely wrong.

Fortunately it’s easy to make a simple systemd service/bash script to read various thermal sensors from SysFS and manage cooling.
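
For anyone landing here later, a rough sketch of that kind of userspace workaround, written in Python rather than bash. Everything specific in it is an assumption: the second sensor path, the cooling device path, the equal weights, and the on/off fan states would all need to be adapted to the actual board. The 40000/2000 values mirror the trip point in my dts above.

```python
import time
from pathlib import Path


def read_millicelsius(path):
    """Read one temperature value (millidegrees Celsius) from a sysfs file."""
    return int(Path(path).read_text().strip())


def combined_temp(readings, weights):
    """Weighted average of the sensor readings (integer millidegrees)."""
    return sum(w * t for w, t in zip(weights, readings)) // sum(weights)


def next_fan_state(temp, fan_on, trip=40000, hysteresis=2000):
    """Turn the fan on at the trip point, off once below trip - hysteresis."""
    if temp >= trip:
        return True
    if temp <= trip - hysteresis:
        return False
    return fan_on  # inside the hysteresis band: keep the current state


def main(sensor_paths, weights, fan_state_path, interval_s=2.0):
    """Poll the sensors forever and write 1/0 to the cooling device state."""
    fan_on = False
    while True:
        readings = [read_millicelsius(p) for p in sensor_paths]
        fan_on = next_fan_state(combined_temp(readings, weights), fan_on)
        Path(fan_state_path).write_text("1" if fan_on else "0")
        time.sleep(interval_s)
```

It could be started from a systemd service with something like main(["/sys/class/hwmon/hwmon1/temp1_input", "/sys/class/hwmon/hwmon2/temp1_input"], [1, 1], "/sys/class/thermal/cooling_device0/cur_state"), where the hwmon2 and cooling_device0 paths are placeholders. If the fan is still bound to a kernel thermal zone, the kernel may overwrite the state, so the cooling map would probably need to be removed from the dts first.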

I found it quite surprising as well to be honest. Though I suppose no documentation is immune to being wrong.

That said, unfortunately you will have to work around this then as you said.

Best Regards,
Jeremias