Need help running the NPU on Verdin iMX8M Plus

Verdin iMX8M Plus 4GB WB IT

Greetings. We are using Torizon OS (the minimal reference image) on our Verdin iMX8M Plus. It does have an NPU, but when we run the sample applications, the NPU is not used and the Python program instead falls back to the CPU.

Our SoM does not have libvx_delegate.so (the delegate library responsible for interacting with the NPU). We thought running a Docker container that has the delegate file installed (if such a container exists) would be possible, but nothing worked.

We even tried using bitbake to create our own Yocto-based image with the VX delegate installed, but the build takes hours and fails every time.

We need urgent help with this. Is there a container that has the delegate file installed, so we can run AI/ML programs on the NPU?

Hi, I’m the Toradex AI Assistant.

I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.



Hello! Thank you for your detailed question.

Yes, Toradex provides a pre-built container specifically for TensorFlow Lite with NPU (VX delegate) support for the Verdin iMX8M Plus. You do not need to create a custom Yocto image—this works with the standard Torizon OS minimal reference image.

Steps to Run TensorFlow Lite with NPU on Torizon OS

  1. Pull and Run the Pre-built Container:

    Use the official pre-built TFLite container. Here is an example command:

    docker run --rm -it \
      -e USE_HW_ACCELERATED_INFERENCE=1 \
      -e USE_GPU_INFERENCE=0 \
      torizonextras/arm64v8-sample-tflite-rtsp:3
    

    This container includes the VX delegate and all dependencies for NPU inference.

  2. Explanation of Environment Variables:

    • USE_HW_ACCELERATED_INFERENCE=1: Enables hardware-accelerated inference (NPU).
    • USE_GPU_INFERENCE=0: Ensures the GPU delegate is not used; only the NPU. (See the sketch after this list for how an application might act on these variables.)
  3. Prerequisites:

    • Make sure you are running a Verdin iMX8M Plus variant with an NPU (you have the 4GB WB IT, which includes it).
    • Your Torizon OS minimal reference image should be up-to-date.
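As a side note, here is a minimal sketch (not the sample's actual source code) of how an application could act on these variables when creating a TFLite interpreter. The `tflite_runtime` package and the `/usr/lib/libvx_delegate.so` path are assumptions based on typical NXP-based images:

```python
# Hypothetical sketch: pick the VX delegate based on the environment.
# The delegate path is an assumption for typical NXP-based images.
import os

import tflite_runtime.interpreter as tflite

VX_DELEGATE_PATH = "/usr/lib/libvx_delegate.so"  # assumed location

def make_interpreter(model_path):
    delegates = []
    if os.environ.get("USE_HW_ACCELERATED_INFERENCE", "0") == "1":
        # Load the VX delegate so supported ops run on the NPU/GPU.
        # Whether the NPU or the GPU executes them is steered by the
        # driver stack (see USE_GPU_INFERENCE above), not by this script.
        delegates.append(tflite.load_delegate(VX_DELEGATE_PATH))
    return tflite.Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
```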

Common Issues:

  • If you get errors regarding missing libvx_delegate.so even in this container, double-check that your device tree is correct and you are using a compatible Torizon OS version.
  • Not all models are supported by the NPU delegate. The sample uses a MobileNet v1 model, which works.

No Delegate Outside Container

The VX delegate (libvx_delegate.so) is not part of the base Torizon OS, but is included within this pre-built container. You don’t need to install it separately. Using this container is the recommended way.
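If you want to verify that the delegate is actually present and loadable inside the container, here is a quick hypothetical check (same assumed path as above):

```python
# Hypothetical check: try to load the VX delegate from its assumed path.
import tflite_runtime.interpreter as tflite

try:
    tflite.load_delegate("/usr/lib/libvx_delegate.so")
    print("VX delegate loaded; NPU inference should be possible.")
except (OSError, ValueError) as err:
    print(f"VX delegate not available: {err}")
```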

Summary

You do not need to build your own Yocto image. Use the provided container referenced above, and you should see the NPU utilized.

Best regards, Toradex AI Assistant.


Upon running the command you mentioned, we get the following output:

**Note:** The image tag you provided was incorrect. We used `torizonextras/arm64v8-sample-tflite-rtsp:3.0.0-bookworm` instead.


Status: Downloaded newer image for torizonextras/arm64v8-sample-tflite-rtsp:3.0.0-bookworm

(python3:1): GStreamer-WARNING **: 06:14:17.281: External plugin loader failed. This most likely means that the plugin loader helper binary was not found or could not be run. You might need to set the GST_PLUGIN_SCANNER environment variable if your setup is unusual. This should normally not be required though.
[ WARN:0@2.199] global ./modules/videoio/src/cap_gstreamer.cpp (2401) handleMessage OpenCV | GStreamer warning: Embedded video playback halted; module v4l2src0 reported: Cannot identify device '/dev/video0'.
[ WARN:0@2.199] global ./modules/videoio/src/cap_gstreamer.cpp (1356) open OpenCV | GStreamer warning: unable to start pipeline
[ WARN:0@2.199] global ./modules/videoio/src/cap_gstreamer.cpp (862) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
Vx delegate: allowed_cache_mode set to 0.
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
WARNING: Fallback unsupported op 32 to TfLite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Hi @GufranThakur

The instructions from ToradexAI should work.

And what do you see when you connect to the RTSP stream?
Are you able to see the app running?

For now we are not connecting the output to anything (like GStreamer). Judging from the logs, I suppose the program is working fine; the issue is just that it's running on the CPU, not on the NPU.

@GufranThakur

Can you test it?

Try toggling the USE_HW_ACCELERATED_INFERENCE variable to see the difference between running on the NPU and on the CPU.


For reference, using a MobileNet SSD v2 model with a 300 x 300 input, you can expect around 10 ms per inference.
For comparison, running the same model on a Raspberry Pi 4 takes around 120 ms.


In summary, the best way to know whether your pipeline is hardware-accelerated is to benchmark it.
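To make that concrete, here is a rough timing sketch, assuming `tflite_runtime`, a local quantized `.tflite` model (the file name is a placeholder), and the usual assumed VX delegate path:

```python
# Rough benchmark sketch: average inference latency with and without
# the VX delegate. Model path and delegate path are placeholders.
import time

import numpy as np
import tflite_runtime.interpreter as tflite

MODEL = "mobilenet_ssd_v2_300.tflite"  # placeholder model file

def benchmark(delegates, runs=50):
    interpreter = tflite.Interpreter(model_path=MODEL,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"],
                           np.zeros(inp["shape"], dtype=inp["dtype"]))
    # Warm up: the first inference through the VX delegate includes
    # graph compilation and is much slower than steady state.
    for _ in range(5):
        interpreter.invoke()
    start = time.monotonic()
    for _ in range(runs):
        interpreter.invoke()
    return (time.monotonic() - start) / runs * 1000.0  # ms per run

cpu_ms = benchmark([])
npu_ms = benchmark([tflite.load_delegate("/usr/lib/libvx_delegate.so")])
print(f"CPU: {cpu_ms:.1f} ms, NPU: {npu_ms:.1f} ms")
```

If the NPU path is really active, you should see a large gap after warm-up; always discard the first inference, since the delegate compiles the graph on the first invoke.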