Thermal zone read failure on SMARC iMX95

We have been using the Toradex iMX95 SoM for the last three months. The BSP source we build from is the reference multimedia image.

We recently updated the kernel to version 6.6.119, and since then the system has been continuously logging errors like these:

Error reading temperature for gpu thermal zone: -16
[ 32.510193] thermal thermal_zone0: failed to read out thermal zone (-16)

We haven’t changed or added any entries in the DTS files.

Please help us resolve this issue, and let us know if any further details are needed. Our setup:

Kernel Version - 6.6.119
Distro - 7.5.0 devel
meta-toradex-nxp - scarthgap

Hi, I’m the Toradex AI Assistant.

I might make mistakes or occasionally get you wrong. Our support team will step in if needed.


Based on the error you’re seeing, this appears to be a readout failure on the GPU thermal zone of your iMX95 SoM. In Linux, a return code of -16 corresponds to EBUSY (device or resource busy).
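As a quick reference when reading such log lines, here is a minimal sketch of a helper that maps a few negative kernel return codes to their errno names. The function name errno_name is made up for this example, and the table is deliberately partial (values taken from the Linux errno-base.h header).

```shell
# Map a few common kernel return codes to errno names.
# Partial table; values are from Linux asm-generic/errno-base.h.
# errno_name is a hypothetical helper name, not an existing tool.
errno_name() {
    code=${1#-}   # accept both "-16" and "16"
    case "$code" in
        2)  echo "ENOENT (no such file or directory)" ;;
        5)  echo "EIO (I/O error)" ;;
        11) echo "EAGAIN (try again)" ;;
        16) echo "EBUSY (device or resource busy)" ;;
        19) echo "ENODEV (no such device)" ;;
        22) echo "EINVAL (invalid argument)" ;;
        *)  echo "unknown" ;;
    esac
}
```

Calling errno_name -16 prints the EBUSY entry, which matches the code in the reported log lines.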

Understanding the Error

The error message indicates that the thermal subsystem is failing to read temperature data from thermal_zone0, which appears to be associated with the GPU thermal zone (1). This is a common issue that can occur when thermal sensors are not properly initialized or when there are conflicts in accessing the thermal hardware.

Thermal Management Context

According to Toradex documentation, thermal management involves both hardware and software components, where the operating system monitors temperature through internal SoC sensors and implements thermal throttling mechanisms to prevent overheating (1).

Troubleshooting Steps

1. Check Thermal Zone Status

First, examine the thermal zones available on your system:

ls /sys/class/thermal/thermal_zone*
cat /sys/class/thermal/thermal_zone*/type
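To see which zone actually fails, the two commands above can be combined into a small loop. This is only a sketch: show_thermal_zones is a name invented here, and the optional path argument exists so the function can be tried against a copy of the directory tree. The sysfs temp file reports millidegrees Celsius.

```shell
# Print "<zone> <type> <temp or READ_FAILED>" for each thermal zone.
# The root defaults to /sys/class/thermal; show_thermal_zones is a
# hypothetical helper name used only for this example.
show_thermal_zones() {
    root=${1:-/sys/class/thermal}
    for zone in "$root"/thermal_zone*; do
        [ -d "$zone" ] || continue
        ztype=$(cat "$zone/type" 2>/dev/null || echo "unknown")
        # A failing sensor read (for example -EBUSY) makes cat exit non-zero.
        if temp=$(cat "$zone/temp" 2>/dev/null); then
            echo "$(basename "$zone") $ztype $temp mC"
        else
            echo "$(basename "$zone") $ztype READ_FAILED"
        fi
    done
}
```

Running show_thermal_zones with no argument on the module should show whether only the gpu zone reports READ_FAILED or whether other zones are affected as well.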

2. Verify Thermal Driver Loading

Check if the thermal drivers are properly loaded:

dmesg | grep -i thermal
lsmod | grep thermal
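If the log is noisy, it can help to narrow the output to just the two messages from the original report and count them. The following is a sketch under that assumption; thermal_read_errors is a hypothetical name, and the patterns are taken directly from the error text quoted in this thread.

```shell
# Read kernel log text on stdin, print only thermal read failures,
# then print a count. Patterns come from the messages quoted in this thread.
# thermal_read_errors is a hypothetical helper name.
thermal_read_errors() {
    grep -i -E 'failed to read out thermal zone|error reading temperature' | {
        count=0
        while IFS= read -r line; do
            echo "$line"
            count=$((count + 1))
        done
        echo "total: $count"
    }
}
```

Typical use would be dmesg | thermal_read_errors, run a few minutes apart to see whether the count keeps growing.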

3. Check Device Tree Configuration

Since you mentioned no changes to DTS files, verify that the thermal configuration in your device tree matches the expected setup for iMX95. The thermal zones should be properly defined with correct sensor references (1).
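One way to check this on the running system, rather than in the sources, is to look at the live device tree. The sketch below assumes the standard thermal binding, where zones are children of a thermal-zones node and each zone carries a thermal-sensors property. The function name list_dt_thermal_zones is invented for this example.

```shell
# List thermal zone nodes from the live device tree and report whether
# each one has a thermal-sensors property (standard DT thermal binding).
# list_dt_thermal_zones is a hypothetical helper name.
list_dt_thermal_zones() {
    root=${1:-/proc/device-tree/thermal-zones}
    if [ ! -d "$root" ]; then
        echo "no thermal-zones node at $root"
        return 1
    fi
    for node in "$root"/*; do
        [ -d "$node" ] || continue
        name=$(basename "$node")
        if [ -e "$node/thermal-sensors" ]; then
            echo "$name: thermal-sensors present"
        else
            echo "$name: thermal-sensors MISSING"
        fi
    done
}
```

Comparing this output between the previous kernel and 6.6.119 would show whether the update changed the zone definitions even though the DTS files were not edited by hand.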

4. Kernel Configuration Review

For BSP 7.x.y, ensure your kernel configuration includes the necessary thermal drivers. Note that CONFIG_IMX8M_THERMAL is the option for the iMX 8M family; the iMX95 is a different SoC and may rely on a different driver, so confirm which option applies to your BSP (1).
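To see which thermal options are actually enabled, you can filter the kernel configuration. This sketch assumes either that /proc/config.gz is available on the target (it only exists when the kernel was built with CONFIG_IKCONFIG_PROC) or that you pass the path to the .config from your build directory. enabled_thermal_options is a hypothetical name.

```shell
# Print every enabled (=y or =m) kernel option whose name contains THERMAL.
# Accepts a plain .config or a gzip-compressed one such as /proc/config.gz.
# enabled_thermal_options is a hypothetical helper name.
enabled_thermal_options() {
    cfg=${1:-/proc/config.gz}
    case "$cfg" in
        *.gz) zcat "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -E '^CONFIG_[A-Z0-9_]*THERMAL[A-Z0-9_]*=(y|m)'
}
```

Running it against both the old and the new kernel configuration and diffing the results is a quick way to spot an option that was dropped during the update.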

Potential Solutions

Option 1: Kernel Rebuild

If you’re building from source, you may need to rebuild the kernel with proper thermal driver support. The build process involves configuring the kernel with the appropriate thermal management options (2).

Option 2: Check BSP Version Compatibility

Since you’re using distro 7.5.0 devel with kernel 6.6.119, ensure that this combination is fully supported and that all necessary patches are applied for iMX95 thermal management.

Option 3: Verify Hardware Initialization

The error might indicate that the GPU thermal sensor hardware is not being properly initialized during boot. This could be related to power management or clock initialization sequences.

Additional Information Needed

To provide more specific guidance, could you share:

  1. Complete dmesg output focusing on thermal and GPU initialization
  2. Contents of /sys/class/thermal/thermal_zone*/temp (if accessible)
  3. Your specific iMX95 module part number and carrier board configuration
  4. Whether this error appeared immediately after the kernel update or developed over time
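For the last point, it may help to measure how often the read fails rather than relying on the log alone. The sketch below reads a zone's temp file repeatedly and counts successes and failures. The function name sample_zone_reads and its defaults (10 samples, 1 second apart) are choices made for this example only.

```shell
# Read a thermal zone temp file repeatedly and count failed reads.
# Usage: sample_zone_reads <temp-file> [samples] [delay-seconds]
# sample_zone_reads and its defaults are assumptions for illustration.
sample_zone_reads() {
    file=$1
    samples=${2:-10}
    delay=${3:-1}
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$samples" ]; do
        if cat "$file" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "ok=$ok failed=$failed"
}
```

For example, sample_zone_reads /sys/class/thermal/thermal_zone0/temp 60 1 samples for one minute. A mix of successes and failures would point to an intermittent condition, while failures on every read would point to a sensor that never becomes available.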

The thermal management system is critical for preventing hardware damage, so resolving this issue should be prioritized (1).


If I answered your question, please tick the “Solution” checkbox.

For follow-up questions to me, please use @ToradexAI in your post.

Hi,

Gentle reminder on this issue.

Thermal Zone Error Analysis

The error you’re experiencing with thermal zone reading (-16) on your iMX95 SoM is related to the thermal management subsystem failing to access GPU temperature sensors(1).

Understanding the Thermal Management System

According to Toradex documentation, thermal management involves both hardware and software components where the operating system monitors temperature through internal SoC sensors and implements mechanisms like Dynamic Voltage and Frequency Scaling (DVFS) and thermal throttling(1).

The Toradex documentation states that on iMX 8/8X modules, power management and temperature monitoring are handled by the System Controller Firmware (SCFW)(1). The iMX95 is a different SoC family, so that description may not carry over directly. In either case, the thermal driver is exposed through the /sys/class/thermal/thermal_zoneX interfaces(1).

Troubleshooting Steps

1. Check Thermal Zone Configuration

The thermal zones have trip points that can be read and modified. For iMX 8M series modules, the thermal zone directories contain files named trip_point* from which you can read each trip point's type, trip temperature, and hysteresis(1). Whether the same files are present on the iMX95 should be checked on your module.
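A short sketch for dumping those files in one go is shown below. It relies on the generic sysfs naming (trip_point_N_type, trip_point_N_temp, trip_point_N_hyst). show_trip_points is a hypothetical name, and the default zone path is just the zone mentioned in the error message.

```shell
# Print type, temperature and hysteresis for each trip point of one zone.
# File names follow the generic sysfs thermal layout.
# show_trip_points is a hypothetical helper name.
show_trip_points() {
    zone=${1:-/sys/class/thermal/thermal_zone0}
    for tfile in "$zone"/trip_point_*_type; do
        [ -e "$tfile" ] || continue
        base=${tfile%_type}
        ttype=$(cat "$tfile" 2>/dev/null)
        ttemp=$(cat "${base}_temp" 2>/dev/null)
        thyst=$(cat "${base}_hyst" 2>/dev/null)
        echo "$(basename "$base"): type=$ttype temp=$ttemp hyst=$thyst"
    done
}
```

Temperatures are reported in millidegrees Celsius. If the function prints nothing, the zone has no trip point files.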

2. Verify Kernel Configuration

Since you’re building from source with kernel 6.6.119, ensure your kernel configuration includes the necessary thermal drivers. For iMX 8M series, the configuration should specify CONFIG_IMX8M_THERMAL in defconfig(1).

3. Check Temperature Trip Points

The temperature trip points for thermal zones may be edited by u-boot considering the SoC detected at runtime, or they may be set in the kernel device tree for the NXP “wait for cooling” feature(1).

Potential Solutions

Option 1: Kernel Module Dependencies

When building a custom kernel, you must build and deploy the kernel modules together with the kernel, as the kernel refuses to load modules with a version that doesn’t match its own(2). This could be causing the thermal driver initialization issues.
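A simple way to rule this out is to check that a module directory exists for the exact release string of the running kernel. The sketch below assumes the usual /lib/modules/<release> layout; modules_match_kernel is a name invented for this example.

```shell
# Check that a modules directory exists for the given kernel release.
# Defaults: release from uname -r, modules under /lib/modules.
# modules_match_kernel is a hypothetical helper name.
modules_match_kernel() {
    release=${1:-$(uname -r)}
    moddir=${2:-/lib/modules}
    if [ -d "$moddir/$release" ]; then
        echo "match: $moddir/$release"
    else
        found=$(ls "$moddir" 2>/dev/null | tr '\n' ' ')
        echo "mismatch: no $moddir/$release (found: $found)"
        return 1
    fi
}
```

On the target, running modules_match_kernel without arguments is enough. A mismatch line listing only an older release would suggest the modules were not redeployed together with the 6.6.119 kernel.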

Option 2: Device Tree Verification

Even though you haven’t changed DTS files, verify that your device tree compilation includes the thermal zone definitions. The compilation process should use the DTC_FLAGS="-@" flag for proper device tree overlay support(2).

Option 3: BSP Component Alignment

Your system includes multiple components: kernel, bootloader, BSP, device trees, and modules(3). Ensure all components are properly aligned for your BSP 7.5.0 devel version.

Immediate Actions

  1. Check if thermal modules are properly loaded: lsmod | grep thermal
  2. Verify thermal zone accessibility: ls /sys/class/thermal/thermal_zone*
  3. Review dmesg for thermal initialization errors during boot
  4. Ensure your kernel modules match your kernel version exactly

The thermal management system is critical for preventing hardware damage through thermal throttling and system shutdown mechanisms(1), so this issue should be resolved promptly.

Best regards, Toradex AI Assistant.

