Building Machine Learning Software with Verdin IMX95

Hi,

Is it possible to use “Building Machine Learning Software with Reference Images for Yocto Project” with the i.MX95?
Can I just change the machine in local.conf, like:

MACHINE ?= "verdin-imx95"

when following this guide?

Thanks!

Joe.

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.


Machine Learning Support for Verdin iMX95

Yes, you can use the “Building Machine Learning Software with Reference Images for Yocto Project” guide with the Verdin iMX95 by setting the machine in your local.conf file (1).

Machine Configuration

The Verdin iMX95 is supported as a machine target in the Toradex BSP (2). You can set it in your local.conf file:

MACHINE ?= "verdin-imx95"

(2)

OpenCL Dependency Adjustment

The machine learning guide includes a specific adjustment for OpenCL that’s particularly relevant for iMX95. The documentation states that “OpenCL is only used for GPU accelerated inference in the i.MX95” (1). The guide provides the following adjustment:

$ sed -i '/^RDEPENDS_OPENCL/s|= "opencl-icd-loader-dev"|= ""\nRDEPENDS_OPENCL:mx95-nxp-bsp = "opencl-icd-loader-dev"|' ../layers/meta-imx-ml/recipes-libraries/tensorflow-lite/tensorflow-lite_2.16.2.bb

(1)

This modification ensures that OpenCL dependencies are properly configured for the iMX95, while removing them for other iMX8 modules where they’re not needed (1).
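To see concretely what that sed edit does, here is the same substitution applied to a stand-in file (the real recipe is tensorflow-lite_2.16.2.bb in meta-imx-ml). This assumes GNU sed, and assumes the recipe contains a line like the one below:

```shell
#!/bin/sh
# Stand-in for the RDEPENDS_OPENCL line in the tensorflow-lite recipe
# (assumed content; check the actual recipe in your checkout).
printf 'RDEPENDS_OPENCL = "opencl-icd-loader-dev"\n' > sample.bb

# The guide's substitution: clear the default and re-add the dependency
# only for the mx95-nxp-bsp machine override.
sed -i '/^RDEPENDS_OPENCL/s|= "opencl-icd-loader-dev"|= ""\nRDEPENDS_OPENCL:mx95-nxp-bsp = "opencl-icd-loader-dev"|' sample.bb
cat sample.bb
```

The result is one emptied default assignment plus an mx95-only assignment, which is how the OpenCL dependency ends up scoped to the i.MX95.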

Machine Learning Packages

Following the guide, you’ll add the machine learning recipes to your image:

$ echo 'IMAGE_INSTALL:append = " tensorflow-lite tensorflow-lite-vx-delegate opencv python3-pillow adwaita-icon-theme "' >> conf/local.conf

(1)
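Taken together, after both edits the machine-learning-related part of conf/local.conf should look roughly like this (a sketch only; your file will contain many other settings):

```
MACHINE ?= "verdin-imx95"
IMAGE_INSTALL:append = " tensorflow-lite tensorflow-lite-vx-delegate opencv python3-pillow adwaita-icon-theme "
```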

The Verdin iMX95 features NXP’s i.MX 95 applications processor with multi-core application domain including up to six Arm Cortex-A55 cores, plus an independent safety domain with Arm Cortex-M7 and Cortex-M33 CPUs (3). This architecture is well-suited for machine learning applications that require both high-performance processing and real-time capabilities.

Best regards,
Toradex AI Assistant.

Hi @LinuxMan,

Just to clarify, are you working with the Verdin iMX95 EVK from Toradex or the Verdin iMX95 SoM?

If it is the EVK, the NXP full image already has the ML libraries.
For the Verdin iMX95 SoM you need to build the Toradex BSP with some additional layers and libraries. Since these instructions are not public yet, please write an email to support@toradex.com referring to this thread, and I can send you a document with the instructions.

Hello,

Following my discussion with NXP, I can confirm that the Neutron Converter is indeed available in the NXP eIQ Toolkit environment as a standalone binary.

It is located in the following directory:

C:\nxp\eIQ_Toolkit_v1.xx.xx\bin\neutron-converter\MCU_SDK_xx.xx.xx+Linux_x.x.xx_x.x

It can be used directly from the command line, for example:

neutron-converter.exe --input <model_quant_int8.tflite> --output <model_neutron.tflite> --target imx95
Regards,
Rick

This confirms that the full Neutron conversion toolchain is available in eIQ and can be used to generate NPU-optimized models for i.MX95.

Hello @LinuxMan,

Indeed, the full image from NXP provides the ML libraries you need to run different ML models on the NPU and GPU of the iMX95 EVK. The Machine Learning user guide from NXP will give you an overview of how to run them on different platforms, including the iMX95 EVK. You also need to use the eIQ Toolkit from NXP (as you have found) to convert the models for the GPU or NPU delegate. After that you can run it on the EVK directly, once you have copied the converted model onto the target.

Also, you need to use an SDK version corresponding to the BSP version running on the EVK. For example, if you run BSP LF_v6.12.20-2.0.0 on the module, you need to use SDK MCU_SDK_25.06.00+Linux_6.12.20_2.0.0 for the conversion. As of today, the BSP and SDK version known to work with the NPU delegate is Linux_6.6.36_2.1.0. So our recommendation is to try it out on this BSP, with a model converted using the corresponding SDK.

The steps are:

  • Copy /usr/bin/tensorflow-lite-2.18.0/examples/mobilenet_v1_1.0_224_quant.tflite from the device to the host PC.
  • Convert the model using the neutron converter at /opt/nxp/eIQ_Toolkit_v1.16.0/bin/neutron-converter/Linux_6.6.36_2.1.0
./neutron-converter --input mobilenet_v1_1.0_224_quant.tflite --output mobilenet_v1_1.0_224_quant_neutron.tflite --target imx95
  • Run the Model on the device with the label_img example
root@imx95-19x19-verdin:/usr/bin/tensorflow-lite-2.18.0/examples# ./label_image -m mobilenet_v1_1.0_224_quant_neutron.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libneutron_delegate.so
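The steps above can be sketched as one host-side script. The board address and model name are placeholders, and the converter directory is the one quoted in this thread; DRY_RUN=1 only prints the commands instead of executing them, since the real run needs the device and the eIQ Toolkit installed:

```shell
#!/bin/sh
# Sketch of the copy / convert / run workflow (placeholder paths).
DRY_RUN=1
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

BOARD="root@verdin-imx95.local"   # placeholder device address
CONVERTER_DIR="/opt/nxp/eIQ_Toolkit_v1.16.0/bin/neutron-converter/Linux_6.6.36_2.1.0"
EXAMPLES="/usr/bin/tensorflow-lite-2.18.0/examples"
MODEL="mobilenet_v1_1.0_224_quant"

# 1. Copy the quantized model from the device to the host.
run scp "$BOARD:$EXAMPLES/$MODEL.tflite" .

# 2. Convert it for the i.MX95 Neutron NPU with the matching SDK converter.
run "$CONVERTER_DIR/neutron-converter" --input "$MODEL.tflite" \
    --output "${MODEL}_neutron.tflite" --target imx95

# 3. Copy the converted model back and run label_image with the delegate.
run scp "${MODEL}_neutron.tflite" "$BOARD:$EXAMPLES/"
run ssh "$BOARD" "cd $EXAMPLES && ./label_image -m ${MODEL}_neutron.tflite \
    -i grace_hopper.bmp -l labels.txt \
    --external_delegate_path=/usr/lib/libneutron_delegate.so"
```

Setting DRY_RUN=0 would execute the commands for real, provided SSH access to the board and the converter are in place.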

Do you have your ML demo working now on the device or not?

Hello Toradex support team,

I have made progress on this issue with the help of NXP.

My environment

  • Hardware: Verdin iMX95
  • BSP: Toradex Linux
  • Kernel:
Linux 6.6.94-7.4.0-devel
  • Neutron driver: built into kernel
  • Neutron delegate: libneutron_delegate.so

Confirmed with NXP

NXP confirmed to me:

  • neutron-converter is not required for Linux 6.6
  • It is only mandatory starting from Linux 6.12
  • Running a standard .tflite model with libneutron delegate should work on i.MX95
  • They successfully run inference on their i.MX95 EVK
  • But if I use the Neutron converter (C:\nxp\eIQ_Toolkit_v1.17.0\bin\neutron-converter\Linux_6.6.36_2.1.0), I get the same problem (error 442).
    I successfully exported my model in INT8 PTQ from eIQ Portal and ran the Neutron converter using:
neutron-converter.exe --input <model>.tflite --target imx95

The conversion completes, but the report shows:

Number of operators imported      = 65
Number of operators optimized     = 90
Number of operators converted     = 81
Number of operators NOT converted = 9
Neutron graphs                    = 4
Operator conversion ratio         = 0.9

So 9 operators are not converted by Neutron.
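For what it's worth, the report's numbers appear internally consistent if the conversion ratio is computed as converted / optimized operators, with the remainder staying on the CPU (my reading of the report, not an official definition):

```shell
#!/bin/sh
# Cross-check the converter report: 81 of 90 optimized operators
# converted gives the reported 0.9 ratio and 9 unconverted operators.
converted=81
optimized=90
awk -v c="$converted" -v o="$optimized" \
    'BEGIN { printf "ratio=%.1f not_converted=%d\n", c / o, o - c }'
```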


What is working on my side

  • Standard TFLite model runs correctly on CPU
  • Neutron delegate loads correctly
  • TFLite tensors are correct (INT8)
  • Graph is delegated successfully:
NeutronDelegate delegate: 62 nodes delegated out of 65

Issue on Verdin iMX95

When inference is executed on NPU:

fail to create neutron inference job
internal fault 442
Node number XX (NeutronDelegate) failed to invoke

This happens:

  • after delegate loading,
  • after graph partition,
  • at first inference execution.

Fallback to CPU works immediately.


Since NXP confirms that Neutron works on i.MX95 EVK without neutron-converter on Linux 6.6, this seems specific to the Toradex BSP integration.

Could you please advise:

  1. Is Neutron officially validated on Verdin iMX95 with Linux 6.6?
  2. Which Neutron version / firmware / BSP combination is supported?
  3. Is there any known issue with Neutron runtime on Toradex BSP?
  4. Is a microcode / firmware update required?

Error log:
[BACKEND] AUTO mode: NPU if available, fallback to CPU on error
[NPU] Attempting to load Neutron delegate…
[NPU] :check_mark: Neutron delegate loaded successfully.
INFO: NeutronDelegate delegate: 62 nodes delegated out of 65 nodes with 2 partitions.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[TFLITE] Backend actif : NPU
[TFLITE][Tensor] input0: idx=0 shape=[ 1 224 224 3] dtype=<class 'numpy.int8'> quant_scale=0.007843137718737125 quant_zp=-1
[TFLITE][Tensor] output0: idx=172 shape=[ 1 16] dtype=<class 'numpy.int8'> quant_scale=0.00390625 quant_zp=-128
[TFLITE] Ops délégués : 0 / 68

Track 0
[TFLITE][Invoke:NPU] shape=(1, 224, 224, 3) dtype=int8
fail to create neutron inference job
Error: component='Neutron Driver', category='internal fault', code=442

** [TFLITE ERROR] backend=NPU, exception=/usr/src/debug/tensorflow-lite-neutron-delegate/2.16.2/neutron_delegate.cc:261 neutronRC != ENONE (113203 != 0)Node number 65 (NeutronDelegate) failed to invoke.**
→ Fallback CPU : réinitialisation de l'interpréteur sans delegate.
[TFLITE] Backend actif : CPU
[TFLITE][Tensor] input0: idx=0 shape=[ 1 224 224 3] dtype=<class 'numpy.int8'> quant_scale=0.007843137718737125 quant_zp=-1
[TFLITE][Tensor] output0: idx=172 shape=[ 1 16] dtype=<class 'numpy.int8'> quant_scale=0.00390625 quant_zp=-128
[TFLITE] Ops délégués : 0 / 66
[TFLITE][Invoke:CPU] shape=(1, 224, 224, 3) dtype=int8

Thank you for your support,

Even the official Toradex TensorFlow Lite example using
mobilenet_v1_1.0_224_quant.tflite fails on our i.MX95 board with:

  • Neutron internal fault 442
  • fail to create neutron inference job

The delegate loads correctly and nodes are delegated, but inference fails at runtime.

root@verdin-imx95-12594079:/usr/bin/tensorflow-lite-2.16.2/examples# python3 label_image.py -i grace_hopper.bmp -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -e /usr/lib/libneutron_delegate.so
Loading external delegate from /usr/lib/libneutron_delegate.so with args: {}
INFO: NeutronDelegate delegate: 29 nodes delegated out of 31 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
fail to create neutron inference job
Error: component='Neutron Driver', category='internal fault', code=442
Traceback (most recent call last):
  File "/usr/bin/tensorflow-lite-2.16.2/examples/label_image.py", line 120, in
    interpreter.invoke()
  File "/usr/lib/python3.12/site-packages/tflite_runtime/interpreter.py", line 941, in invoke
    self._interpreter.Invoke()
RuntimeError: /usr/src/debug/tensorflow-lite-neutron-delegate/2.16.2/neutron_delegate.cc:261 neutronRC != ENONE (113203 != 0)Node number 31 (NeutronDelegate) failed to invoke.

Hi @LinuxMan,

I have replied to you here: Neutron NPU Internal Fault (Code 442) on Verdin i.MX95 with TensorFlow Lite INT8 Model - #7 by rudhi.tx

Let’s use one thread to discuss one topic. Keeping duplicate tickets is not efficient for either of us :slight_smile: