Neutron NPU Internal Fault (Code 442) on Verdin i.MX95 with TensorFlow Lite INT8 Model

I would like to report a major issue affecting the use of the Neutron NPU on a Verdin i.MX95 module when running TensorFlow Lite inference.
The model being used is a fully quantized INT8 MobileNetV2 (per-channel quantization), fully compliant with the Neutron delegate requirements. The model loads correctly through libneutron_delegate.so, and the delegate reports that it successfully offloads 66 operators out of 69, confirming that quantization, graph structure, and operator support are correct.

However, despite the delegate initialization appearing normal, the very first call to invoke() systematically triggers an internal NPU driver fault. The logs show:

fail to create neutron inference job – internal fault code=442
Node number 69 (NeutronDelegate) failed to invoke.

A few milliseconds later, the Python process terminates with a segmentation fault. This prevents any graceful fallback to CPU execution. The crash clearly occurs inside the native Neutron delegate or driver layer, not in the Python code, which makes the error impossible to handle at the application level.

It is important to highlight that the exact same model runs flawlessly on CPU with XNNPACK—both in float32 and INT8—without any memory or execution issues. The crash only happens when the Neutron delegate is enabled. The problem is 100% reproducible, even with a minimal script performing a single invoke() call.
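Until the driver fault is fixed, one application-level workaround (a sketch, under the assumption that the crash stays confined to a child's address space) is to run the NPU attempt in a separate process, so the native segfault cannot take down the main application. The `_npu_worker` stand-in below simulates the crash with `os.abort()`; in a real application it would build the interpreter with libneutron_delegate.so and call `invoke()`:

```python
import multiprocessing as mp
import os


def _npu_worker(queue):
    # Stand-in for the real NPU path. In the actual application this
    # would build the tflite interpreter with libneutron_delegate.so
    # and call invoke(). Here we simulate the native crash reported
    # in this thread (SIGSEGV/SIGABRT inside the delegate) with os.abort().
    os.abort()
    queue.put("npu-result")  # never reached in this simulation


def infer_with_isolation(timeout=10.0):
    """Try NPU inference in a child process; fall back if the child dies."""
    queue = mp.Queue()
    child = mp.Process(target=_npu_worker, args=(queue,))
    child.start()
    child.join(timeout)
    if child.exitcode == 0 and not queue.empty():
        return queue.get()  # NPU inference succeeded
    if child.is_alive():
        child.terminate()   # hung child: kill it before falling back
    return "cpu-result"     # crashed or hung: run CPU inference instead


if __name__ == "__main__":
    print(infer_with_isolation())  # the parent survives the native crash
```

This costs one process spawn per session (or per inference, if the child is not kept alive), so it is a stability workaround rather than a performance solution.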

All possible model-side checks (quantization parameters, scale/zero-point ranges, operator support, input size 224×224×3, graph integrity, absence of unsupported ops) have been verified and confirmed correct. The behavior strongly suggests an internal fault in the Neutron driver or delegate implementation rather than a TFLite or model conversion issue. At this point, it is not possible to use the NPU reliably, and the application must be forced into CPU-only mode to ensure stability.

I would like to know whether Neutron driver error code 442 is a known issue, whether a fix is planned for the Neutron delegate or NPU kernel driver, and whether an updated BSP, firmware, or version of libneutron_delegate.so is available. I can provide the .tflite model, full logs, a minimal reproducible script, or kernel traces to assist with debugging.

Thank you in advance for your help.

Hi, I’m the Toradex AI Assistant.

I might make mistakes or occasionally misunderstand you. Our support team will step in if needed.


Neutron NPU Driver Error Code 442 Analysis

Thank you for the detailed report regarding the Neutron NPU driver fault on your Verdin i.MX95 module. Based on the information available, this appears to be related to known issues with the Neutron delegate implementation.

Known Issues with Neutron NPU

The community has reported similar issues with the Neutron NPU delegate (1). Specifically, there have been reports of internal errors during model conversion and execution, particularly affecting fully connected layers and complex classifier models like InceptionV3, ResNet50, and MobileNetV3Small (1).

Neutron Delegate Implementation Details

The Neutron delegate is designed to capture operators and aggregate them as neutron graph nodes for offloading to the Neutron-S NPU. For offline compilation, models should be converted through the eIQ toolkit first, where the neutronGraph node is already generated (2). The neutron delegate then captures the neutronGraph node and offloads work to Neutron-S.

Troubleshooting Steps

Based on similar community reports (3), here are some steps to investigate:

  1. Verify Library Installation: Ensure that libovxlib and related NPU libraries are properly installed in your container environment (3).

  2. Check Delegate Loading: Confirm that the delegate library path /usr/lib/libneutron_delegate.so is accessible and properly linked (3).

  3. Model Conversion Verification: Since you mentioned the model is fully quantized INT8, verify that it was properly converted using the eIQ Toolkit’s neutron converter, as offline compilation is required for optimal NPU execution.

Environment Configuration

For NPU execution, ensure your environment includes the proper configuration (3):

USE_HW_ACCELERATED_INFERENCE=1
USE_GPU_INFERENCE=0
ACCEPT_FSL_EULA=1
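A minimal sketch of how an application might act on these variables (the variable names come from the snippet above; the mapping to backends is an illustrative assumption, not an official contract):

```python
import os


def select_backend(env=None):
    # Choose an inference backend from the environment variables shown
    # above. Hardware acceleration must be enabled for any accelerator;
    # USE_GPU_INFERENCE then selects GPU over NPU.
    if env is None:
        env = os.environ
    if env.get("USE_HW_ACCELERATED_INFERENCE", "0") != "1":
        return "cpu"
    if env.get("USE_GPU_INFERENCE", "0") == "1":
        return "gpu"
    return "npu"


print(select_backend({"USE_HW_ACCELERATED_INFERENCE": "1",
                      "USE_GPU_INFERENCE": "0"}))  # npu
```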

Current Status and Recommendations

The error code 442 you’re encountering appears to be an internal driver fault that has been observed by other users in the community (1). While the exact root cause and timeline for fixes aren’t specified in the available documentation, the issue seems to be related to the Neutron delegate’s interaction with specific model architectures.

Next Steps

  1. Model Conversion: Try re-converting your MobileNetV2 model using the latest eIQ Toolkit to ensure proper neutronGraph generation
  2. BSP Version: Verify you’re using the latest BSP version that includes Neutron software stack updates (2)
  3. Fallback Strategy: Continue using CPU execution with XNNPACK as your stable fallback until the NPU driver issues are resolved
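The fallback strategy in step 3 can be sketched as a loader chain that tries the NPU interpreter first and falls back to CPU. The loader names below are illustrative stand-ins; note also that a plain `try/except` only helps when the delegate raises a Python exception rather than segfaulting:

```python
def first_working(loaders):
    # Each loader is a zero-argument callable that builds an interpreter.
    # Returns (name, interpreter) from the first loader that does not raise.
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")


# Dummy loaders standing in for NPU/CPU interpreter factories
# (hypothetical names; in practice the NPU loader would call
# tflite_runtime's load_delegate("/usr/lib/libneutron_delegate.so")):
def npu_loader():
    raise RuntimeError("fail to create neutron inference job")


backend, interp = first_working([("npu", npu_loader),
                                 ("cpu", lambda: "cpu-interpreter")])
print(backend)  # cpu
```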

The community has noted that while single fully connected layers may work, complex classifiers often encounter conversion and execution issues (1). Your experience with the systematic invoke() failure aligns with these reported patterns.

Best regards, Toradex AI Assistant.



If I answered your question, please tick the “Solution” checkbox.

For follow-up questions to me, please use @ToradexAI in your post.

Hi @LinuxMan,

Have you already converted the model with EIQ Neutron SDK for NPU delegate?

Hello rudhi,

Not yet. However:

Following my discussion with NXP, I can confirm that the Neutron Converter is indeed available in the NXP eIQ Toolkit environment as a standalone binary.

It is located in the following directory:

C:\nxp\eIQ_Toolkit_v1.xx.xx\bin\neutron-converter\MCU_SDK_xx.xx.xx+Linux_x.x.xx_x.x

It can be used directly from the command line, for example:

neutron-converter.exe --input <model_quant_int8.tflite> --output <model_neutron.tflite> --target imx95

This confirms that the full Neutron conversion toolchain is available in eIQ and can be used to generate NPU-optimized models for i.MX95.
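For scripted conversions, the command line above can be assembled programmatically. The flag names are copied from the example in this post (verify them against your eIQ Toolkit release's `--help`, since options may differ between versions):

```python
def build_converter_cmd(model_in, model_out, target="imx95"):
    # Flag names copied from the neutron-converter example in this
    # thread; check your eIQ Toolkit version, as options may differ.
    return ["neutron-converter",
            "--input", model_in,
            "--output", model_out,
            "--target", target]


cmd = build_converter_cmd("model_quant_int8.tflite", "model_neutron.tflite")
print(" ".join(cmd))
# The command can then be executed with subprocess.run(cmd, check=True).
```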

Hello @LinuxMan,

I think the conversion with the neutron converter is what you are missing. You need to do that, then copy the converted model onto the EVK and run it with (an example):

./label_image -m mobilenet_v1_1.0_224_quant_neutron.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libneutron_delegate.so

Hi rudhi,
I have made progress on this issue with the help of NXP.
To confirm: in eIQ I am using the MobileNetV2 classification model (input size 224×224×3).

My environment

  • Hardware: Verdin iMX95
  • BSP: Toradex Linux
  • Kernel:
Linux 6.6.94-7.4.0-devel
  • Neutron driver: built into kernel
  • Neutron delegate: libneutron_delegate.so

Confirmed with NXP

NXP confirmed to me:

  • neutron-converter is not required for Linux 6.6
  • It is only mandatory starting from Linux 6.12
  • Running a standard .tflite model with libneutron delegate should work on i.MX95
  • They successfully run inference on their i.MX95 EVK
  • But if I use the neutron converter (C:\nxp\eIQ_Toolkit_v1.17.0\bin\neutron-converter\Linux_6.6.36_2.1.0), I have the same problem (error 442)

I successfully exported my model in INT8 PTQ from the eIQ Portal and ran the Neutron converter using:

neutron-converter.exe --input <model>.tflite --target imx95

The conversion completes, but the report shows:

Number of operators imported      = 65
Number of operators optimized     = 90
Number of operators converted     = 81
Number of operators NOT converted = 9
Neutron graphs                    = 4
Operator conversion ratio         = 0.9
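For what it's worth, the reported ratio is consistent with converted/optimized (this interpretation is inferred from the numbers themselves, not from converter documentation):

```python
# Numbers from the conversion report above; the assumption that the
# ratio is computed as converted / optimized is inferred from the
# values themselves, not from documented converter behavior.
optimized, converted, not_converted = 90, 81, 9
assert converted + not_converted == optimized
print(converted / optimized)  # 0.9
```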

So 9 operators are not converted by Neutron.


What is working on my side

  • Standard TFLite model runs correctly on CPU
  • Neutron delegate loads correctly
  • TFLite tensors are correct (INT8)
  • Graph is delegated successfully:
NeutronDelegate delegate: 62 nodes delegated out of 65

Issue on Verdin iMX95

When inference is executed on NPU:

fail to create neutron inference job
internal fault 442
Node number XX (NeutronDelegate) failed to invoke

This happens:

  • after delegate loading,
  • after graph partition,
  • at first inference execution.

Fallback to CPU works immediately.


Since NXP confirms that Neutron works on i.MX95 EVK without neutron-converter on Linux 6.6, this seems specific to the Toradex BSP integration.

Could you please advise:

  1. Is Neutron officially validated on Verdin iMX95 with Linux 6.6?
  2. Which Neutron version / firmware / BSP combination is supported?
  3. Is there any known issue with Neutron runtime on Toradex BSP?
  4. Is a microcode / firmware update required?

Whether I use the Neutron-converted .tflite or the unconverted model, I get the same error 442.
Error log:

[BACKEND] AUTO mode: NPU if available, fallback to CPU on error
[NPU] Attempting to load Neutron delegate

[NPU] ✓ Neutron delegate loaded successfully.
INFO: NeutronDelegate delegate: 62 nodes delegated out of 65 nodes with 2 partitions.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[TFLITE] Active backend: NPU
[TFLITE][Tensor] input0: idx=0 shape=[ 1 224 224 3] dtype=<class 'numpy.int8'> quant_scale=0.007843137718737125 quant_zp=-1
[TFLITE][Tensor] output0: idx=172 shape=[ 1 16] dtype=<class 'numpy.int8'> quant_scale=0.00390625 quant_zp=-128
[TFLITE] Delegated ops: 0 / 68

Track 0
[TFLITE][Invoke:NPU] shape=(1, 224, 224, 3) dtype=int8
fail to create neutron inference job
Error: component='Neutron Driver', category='internal fault', code=442

[TFLITE ERROR] backend=NPU, exception=/usr/src/debug/tensorflow-lite-neutron-delegate/2.16.2/neutron_delegate.cc:261 neutronRC != ENONE (113203 != 0) Node number 65 (NeutronDelegate) failed to invoke.
→ Fallback to CPU: reinitializing the interpreter without delegate.
[TFLITE] Active backend: CPU
[TFLITE][Tensor] input0: idx=0 shape=[ 1 224 224 3] dtype=<class 'numpy.int8'> quant_scale=0.007843137718737125 quant_zp=-1
[TFLITE][Tensor] output0: idx=172 shape=[ 1 16] dtype=<class 'numpy.int8'> quant_scale=0.00390625 quant_zp=-128
[TFLITE] Delegated ops: 0 / 66
[TFLITE][Invoke:CPU] shape=(1, 224, 224, 3) dtype=int8
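As a sanity check on the tensor lines above: TFLite's affine quantization maps a stored INT8 value q to a real value via real = scale × (q − zero_point). With the logged input parameters this gives the usual MobileNet input range of roughly [−1, 1], so the quantization parameters themselves look sane:

```python
# Parameters taken from the input tensor line in the log above.
scale, zero_point = 0.007843137718737125, -1


def dequantize(q):
    # Standard TFLite affine dequantization: real = scale * (q - zero_point)
    return scale * (q - zero_point)


# The INT8 range [-128, 127] maps to approximately [-1.0, +1.0],
# the usual MobileNet input normalization:
print(dequantize(-128), dequantize(127))
```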

Also, even the official Toradex TensorFlow Lite example using mobilenet_v1_1.0_224_quant.tflite fails on our i.MX95 board with:

  • Neutron internal fault 442
  • fail to create neutron inference job

The delegate loads correctly and nodes are delegated, but inference fails at runtime.

root@verdin-imx95-12594079:/usr/bin/tensorflow-lite-2.16.2/examples# python3 label_image.py -i grace_hopper.bmp -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -e /usr/lib/libneutron_delegate.so

Loading external delegate from /usr/lib/libneutron_delegate.so with args: {}
INFO: NeutronDelegate delegate: 29 nodes delegated out of 31 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
fail to create neutron inference job
Error: component='Neutron Driver', category='internal fault', code=442
Traceback (most recent call last):
  File "/usr/bin/tensorflow-lite-2.16.2/examples/label_image.py", line 120
    interpreter.invoke()
  File "/usr/lib/python3.12/site-packages/tflite_runtime/interpreter.py", line 941, in invoke
    self._interpreter.Invoke()
RuntimeError: /usr/src/debug/tensorflow-lite-neutron-delegate/2.16.2/neutron_delegate.cc:261 neutronRC != ENONE (113203 != 0) Node number 31 (NeutronDelegate) failed to invoke.

Hello @LinuxMan,

Thanks for all this information.

NXP confirmed to me:

  • neutron-converter is not required for Linux 6.6
  • It is only mandatory starting from Linux 6.12

I was not aware of this.

Anyway, from your output logs I see that you have a Verdin iMX95 SoM and NOT a Verdin iMX95 EVK. Could you please tell me how you built the BSP with the machine learning libraries for this SoM? I need to know your steps to understand what is going wrong.

The reason I am asking this is also that I could not reproduce the error that you are seeing. I built a BSP reference multimedia image for Verdin iMX95 with the tensorflow libraries and tried to run the example mobilenet model without converting it with Neutron converter. Here is my output:


root@verdin-imx95-12593795:~# cat /etc/os-release 
ID=tdx-xwayland
NAME="TDX Wayland with XWayland"
VERSION="7.4.0-devel-20251203102051+build.0 (scarthgap)"
VERSION_ID=7.4.0-devel-20251203102051-build.0
VERSION_CODENAME="scarthgap"
PRETTY_NAME="TDX Wayland with XWayland 7.4.0-devel-20251203102051+build.0 (scarthgap)"
CPE_NAME="cpe:/o:openembedded:tdx-xwayland:7.4.0-devel-20251203102051+build.0"

root@verdin-imx95-12593795:~# uname -a
Linux verdin-imx95-12593795 6.6.101-7.4.0-devel #1 SMP PREEMPT Thu Sep 25 07:49:28 UTC 2025 aarch64 GNU/Linux

root@verdin-imx95-12593795:~# 
root@verdin-imx95-12593795:~# cd /usr/bin/tensorflow-lite-2.16.2/examples/
root@verdin-imx95-12593795:/usr/bin/tensorflow-lite-2.16.2/examples# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libneutron_delegate.so 
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: EXTERNAL d[  380.166724] remoteproc remoteproc0: powering up neutron-rproc
[  380.174777] remoteproc remoteproc0: Booting fw image NeutronFirmware.elf, size 42612

[  380.182920] remoteproc remoteproc0: remote processor neutron-rproc is now up
INFO: NeutronDelegate delegate: 29 nodes delegated out of 31 nodes with 1 partitions.

INFO: Applied EXTERNAL delegate.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: invoked
INFO: average time: 1.675 ms
INFO: 0.729412: 653 military uniform
INFO: 0.164706: 907 Windsor tie
INFO: 0.0196078: 458 bow tie
INFO: 0.00784314: 835 suit
INFO: 0.00784314: 466 bulletproof vest
root@verdin-imx95-12593795:/usr/bin/tensorflow-lite-2.16.2/examples# [  457.153052] remoteproc remoteproc0: stopped remote processor neutron-rproc

As you can see, it runs correctly here on my side. So I need to see your steps for how you built the image to help you further.

Hi rudhi,
Could you please tell us which repo / branch / tag contains the kernel
6.6.101-7.4.0-devel that you are using on your side?

We tried to align our build with your setup, but we could not find this kernel version in the public Toradex Yocto layers.

We attempted to force the version with:

PREFERRED_VERSION_linux-imx = "6.6.101%"

or

PREFERRED_VERSION_linux-imx = "6.6.101"

but Yocto reports:

WARNING: preferred version 6.6.101% of linux-toradex not available (for item kernel-module-x-tables)
WARNING: versions of linux-toradex available: 6.6.94+git

So it looks like kernel 6.6.101-7.4.0-devel is not available in the public layers yet.
We also checked the public kernel repository directly:

git ls-remote --tags https://git.toradex.com/linux-toradex.git

and we could not find any tag matching:

6.6.101

So it seems that kernel 6.6.101-7.4.0-devel is not available in the public linux-toradex repository.
Could you please confirm:

  • where this kernel is published (repo / branch / tag)?
  • if it is not public yet, when it will be available in the BSP?

For information, here is my OS:

root@verdin-imx95-12594079:~# cat /etc/os-release
ID=tdx-xwayland
NAME="TDX Wayland with XWayland"
VERSION="7.4.0-devel-20251001122306+build.0 (scarthgap)"
VERSION_ID=7.4.0-devel-20251001122306-build.0
VERSION_CODENAME="scarthgap"
PRETTY_NAME="TDX Wayland with XWayland 7.4.0-devel-20251001122306+build.0 (scarthgap)"
CPE_NAME="cpe:/o:openembedded:tdx-xwayland:7.4.0-devel-20251001122306+build.0"
root@verdin-imx95-12594079:~#

About EVK vs SoM

I am using a Verdin i.MX95 SoM mounted on a Toradex carrier board (Dahlia or Mallow).
I just realized something important:

The i.MX95 on the NXP EVK is not the same hardware revision as the i.MX95 used on the Toradex Verdin SoM:

  • kernel 6.6.101-7.4.0-devel is for the i.MX95 EVK
  • kernel 6.6.94 is for the i.MX95 SoM

At this point, based on the kernel and firmware differences, it seems that Neutron is not usable yet on the Verdin i.MX95 SoM.
Thanks in advance for your help.

Best regards,

Hi @LinuxMan,

Did you follow my instructions for building the image with NPU libraries that I answered you on this thread?

With that, you should be able to build a BSP image of version 7.4.0 using Linux kernel 6.6.101.

At this point, based on the kernel and firmware differences, it seems that Neutron is not usable yet on Verdin i.MX95 SoM .

This is not true. I am using the same hardware type and revision as yours:

Toradex 0089 Verdin iMX95 Hexa 8GB WB IT V1.0A

From my results above of testing the mobilenet model from the examples on the BSP, you can see that it is working.

Hello Rudhi,

Yes, I tried the solution you suggested in thread #2, but I still get the same 442 error with the neutron delegate.

You also mentioned:

“From your output logs I see that you have a Verdin iMX95 SoM and NOT a Verdin iMX95 EVK. Could you please tell me how you built the BSP with the machine learning libraries for this SoM? I need to know your steps to understand what is going wrong.”

I think there is indeed a difference between the i.MX95 SoM and the i.MX95 EVK, especially regarding the memory configuration:

  • i.MX95 EVK: eMMC 64 GB, RAM 16 GB LPDDR5 @ 6400 MT/s
  • i.MX95 SoM: eMMC 32 GB, RAM 8 GB LPDDR4 @ 4266 MT/s

To build another BSP to test LINUX_VERSION = "6.6.101", I created a .bbappend to change LINUX_VERSION and SRCREV so they point to the git revision containing the 6.6.101 kernel:

# meta-rick/recipes-kernel/linux/linux-toradex_%.bbappend
LINUX_VERSION = "6.6.101"
SRCREV_machine = "5de717b694170588dd2f792be367ae56386e8cdb"

The image builds correctly and the flashing process works,
but the boot fails completely with this kernel version.

Here is the full consolidated information extracted from my boot logs, as requested.
The system boots, but then freezes with RCU stall / SMP deadlock / CPU soft & hard lockups when using kernel 6.6.101 on a Verdin iMX95 SoM (8GB LPDDR4).


1. U-Boot version and hardware information

U-Boot 2024.04-7.4.0-devel+git.cf50ed467107 (Aug 05 2025 - 15:22:26 +0000)
CPU: i.MX95 rev1.1 at 1800MHz
Model: Toradex 0089 Verdin iMX95 Hexa 8GB WB IT V1.0A
DRAM: 7.8 GiB
Serial#: 12594079

2. DTB / overlays / bootargs used

Loading Device Tree: imx95-verdin-wifi-dev.dtb
Applying Overlay: verdin-imx95_dsi-to-hdmi_overlay.dtbo
Applying Overlay: verdin-imx95_spidev_overlay.dtbo

Bootargs:
root=PARTUUID=8f008152-02 ro rootwait console=tty1 console=ttyLP2,115200

3. Reserved-memory configuration detected (critical section)

Reserved memory: created CMA memory pool at 0x00000000c0e00000, size 960 MiB
OF: reserved mem: 0x00000000c0e00000..0x00000000fcdfffff (983040 KiB) reusable linux,cma

OF: reserved mem: 0x000000008c000000..0x000000008dffffff (32768 KiB) optee_core
OF: reserved mem: 0x000000008e000000..0x000000008e1fffff (2048 KiB) optee_shm

Reserved memory: created DMA memory pool at 0x000000009c300000, size 1 MiB
OF: reserved mem: 0x000000009c300000..0x000000009c3fffff (1024 KiB)

OF: reserved mem: 0x00000000a0000000..0x00000000a00fffff (1024 KiB) vpu_boot

4. Kernel memory zones

Zone ranges:
  DMA    [mem 0x0000000090000000-0x00000000ffffffff]
  Normal [mem 0x0000000100000000-0x000000027fffffff]

Memory: 6898740K/8126464K available (983040K cma-reserved)

5. PCIe / MSI / IOMMU warnings during boot

imx6q-pcie 4c300000.pcie: Phy link never came up
pci-host-generic 4ca00000.pcie: no msi-map translation
pci-host-generic 4ca00000.pcie: no iommu-map translation

6. Kernel tainted by WLAN mlan module

mlan: loading out-of-tree module taints kernel

7. RCU stall / SMP lockup (system freeze)

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu_preempt kthread timer wakeup didn't happen
rcu_preempt kthread starved for 5240 jiffies
Possible timer handling issue on cpu=3
watchdog: Watchdog detected hard LOCKUP on cpu 3

8. CPU backtrace – deadlock in SMP / memory invalidation

CPU: 2 PID: 325 Comm: lsblk Tainted: G O
Call trace:
smp_call_function_many_cond
invalidate_bh_lrus
blkdev_flush_mapping
blkdev_put
blkdev_release
__fput

This is where the system permanently freezes.

  • The SoM boots correctly.
  • With kernel 6.6.101, the system freezes after boot due to a RCU stall / SMP hard lockup.
  • The backtrace shows a deadlock inside smp_call_function_many_cond during block layer cache invalidation.

If you need additional logs (full dmesg, .config, reserved-memory DTS, or reproducer), I can provide them.

Best regards,

Hello @LinuxMan,

You did a nice analysis and comparison between the iMX95 EVK and Verdin iMX95 module.

I still don’t understand why the image that you built did not boot on your Verdin iMX95. I have the binary that I built and tested. I can send it to you here. Could you please try to boot this one: Download - Toradex File Sharing Platform

Hi rudhi,
Thanks,

I loaded your image using the Toradex Easy Installer. The flashing process completed successfully; however, the image does not boot, and I do not get any debug output on the UART, unlike with my own images.

Regards,
Rick

Hello @LinuxMan,

I am taking this over from Rudhi as she is on vacation.

I think the problem with the boot is related to the silicon revision on your board.
If using a Verdin iMX95 V1.0A, it will have an i.MX95 A1 SoC.
If using a Verdin iMX95 V1.0B, it will have an i.MX95 B0 SoC.

The image sent by Rudhi is for B0 SoCs and will therefore not boot on A1 silicon.
To build images targeting A1 silicon after BSP 7.4.0, you need to set the following on your local.conf file:

IMX_SOC_REV:${MACHINE} = "A0"

When it comes to the actual issue in this thread, we spoke about it before her vacation.
My main suspicion is the use of mismatched NXP and Toradex BSP versions.
On the Toradex BSP 7, we base on NXP’s Linux BSP 6.6.*_2.x.y.

Therefore, the recipes for machine learning libraries should be taken from the respective NXP BSP release.
In particular, I would recommend taking them from scarthgap-6.6.52-2.2.2: meta-imx/meta-imx-ml at scarthgap-6.6.52-2.2.2 · nxp-imx/meta-imx · GitHub

If you are already doing that or continue to face issues after getting the recipes from there, please let me know.

Best Regards,
Bruno

Hello Bruno,

Thanks for the clarification.

I can confirm that my board is a Verdin iMX95 V1.0A, so it is indeed an i.MX95 A1 SoC.

This explains why the image provided (built for B0 silicon) does not boot on my board and why there is no UART output.

I will rebuild the image with the following setting in local.conf to target A1 silicon:

IMX_SOC_REV:${MACHINE} = "A0"

Regarding the machine learning stack, I will also make sure that all ML recipes are taken from the matching NXP BSP:
scarthgap-6.6.52-2.2.2 (meta-imx / meta-imx-ml), to avoid any mismatch with Toradex BSP 7 (based on NXP Linux BSP 6.6.*_2.x.y).

I’ll get back to you once I’ve rebuilt and tested the image, or if I still encounter issues.

Best regards,
