Run Pytorch Model on the internal NPU of the IMX95

Hi, I’m the Toradex AI Assistant.

I might make mistakes or occasionally misunderstand you. Our support team will step in if needed.


I understand you’re experiencing issues with your PyTorch model producing identical outputs regardless of input when running on the i.MX95 NPU, despite working correctly on the CPU. This is a known challenge with neural network model conversion pipelines.

Analysis of Your Conversion Pipeline

Your conversion workflow (PyTorch → ONNX → Quantized ONNX → TFLite → Neutron) is complex and introduces multiple potential points of failure. The fact that your model works correctly on CPU but produces identical outputs for every input on the NPU suggests the issue occurs during the neutron-converter step.
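As a first step, it helps to confirm the symptom programmatically: run several clearly distinct inputs through the NPU delegate and check whether the outputs are effectively identical. A minimal numpy sketch (the sample values below are placeholders, not real model outputs):

```python
import numpy as np

def outputs_look_constant(outputs, atol=1e-6):
    """Return True if all output vectors are (nearly) identical,
    which is the symptom described: the NPU returning the same
    result for every input while the CPU path varies."""
    outputs = np.asarray(outputs, dtype=np.float32)
    reference = outputs[0]
    return bool(np.all(np.abs(outputs - reference) <= atol))

# Varied CPU-style outputs -> not constant
cpu = [[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]]
# Stuck NPU-style outputs -> constant
npu = [[0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]
print(outputs_look_constant(cpu))  # False
print(outputs_look_constant(npu))  # True
```

Running this over, say, 20 random inputs gives you a reproducible pass/fail criterion while you iterate on the converter settings.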

Common Causes and Solutions

1. Layer Compatibility Issues

The Neutron-S NPU has specific operator constraints. Based on the community discussion (1), certain layers like fully connected layers can encounter conversion issues even when they appear to meet documented constraints.

2. Quantization Artifacts

The quantization process (step 2) might be introducing artifacts that only manifest when running on NPU hardware. The NPU’s quantized inference path may be more sensitive to these artifacts than CPU execution.
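One way such artifacts produce constant outputs is saturation: if the calibration range is narrower than the real input range, distinct activations all clip to the same quantized value. A numpy simulation of per-tensor int8 affine quantization (the scale and input values are illustrative assumptions):

```python
import numpy as np

def fake_quant(x, scale, zero_point, qmin=-128, qmax=127):
    """Simulate int8 affine quantize -> dequantize (per-tensor)."""
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Calibration only saw values in roughly [-1, 1], so the scale
# covers that range and nothing beyond it.
scale = 2.0 / 255.0
x = np.array([1.5, 2.0, 3.0])   # real inference exceeds the calibration range
y = fake_quant(x, scale, 0)
print(y)  # all three inputs saturate to the same dequantized value
```

If the same effect happens on an internal activation tensor, everything downstream of it becomes input-independent, which matches the behavior you describe.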

3. Input/Output Tensor Handling

Verify that your input preprocessing and output postprocessing remain consistent between CPU and NPU execution paths. The neutron-converter may alter tensor layouts or scaling factors.
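In particular, make sure you are comparing like with like: if the NPU path returns raw int8 tensors, dequantize them with the output tensor's scale and zero point before comparing against the float CPU output. A sketch with hypothetical quantization parameters:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map a raw quantized output tensor back to float using the
    output tensor's quantization parameters (these normally come
    from the model's tensor metadata)."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical values: a raw int8 output and its (scale, zero_point).
raw_npu = np.array([-128, 0, 127], dtype=np.int8)
scale, zero_point = 0.00390625, -128   # e.g. an output range of [0, 1)
print(dequantize(raw_npu, scale, zero_point))
```

Comparing a raw int8 buffer directly against float32 CPU results can make correct NPU outputs look wrong, or mask a real divergence.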

Recommended Debugging Steps

Step 1: Simplify the Model

Create a minimal version of your model with fewer layers to isolate which operations are causing issues.

Step 2: Compare Intermediate Outputs

If possible, extract and compare intermediate layer outputs between CPU (TFLite) and NPU (Neutron) versions to identify where divergence occurs.
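Once you can capture per-layer tensors from both runs, a small helper can report the first layer where they diverge. The layer names and values below are placeholders; in practice the dicts would be filled from your dumped tensors:

```python
import numpy as np

def first_divergent_layer(cpu_tensors, npu_tensors, tol=0.05):
    """Given dicts of {layer_name: np.ndarray} captured from the CPU
    (TFLite) and NPU (Neutron) runs, in execution order, report the
    first layer whose max absolute difference exceeds tol, or None
    if the two runs agree everywhere."""
    for name in cpu_tensors:
        diff = np.max(np.abs(cpu_tensors[name] - npu_tensors[name]))
        if diff > tol:
            return name, float(diff)
    return None

cpu = {"conv1": np.array([0.1, 0.2]), "fc": np.array([0.9, 0.1])}
npu = {"conv1": np.array([0.1, 0.2]), "fc": np.array([0.5, 0.5])}
print(first_divergent_layer(cpu, npu))  # -> ('fc', 0.4)
```

The first divergent layer is usually the operator (or fused group of operators) that the converter mishandled, which narrows the search considerably.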

Step 3: Verify Quantization Parameters

Ensure your calibration dataset of 500 samples is representative of your actual inference data and covers the full range of expected inputs.
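A quick way to check representativeness is to measure how much of your real inference data falls outside the range the calibration set covered; anything outside that range will be clipped by the quantized model. A numpy sketch with synthetic data standing in for your actual samples:

```python
import numpy as np

def coverage_report(calibration, inference):
    """Compare the value range seen during calibration with the range
    seen at inference time. Values outside the calibration range are
    clipped by the quantized model's activation ranges."""
    cal_min, cal_max = calibration.min(), calibration.max()
    clipped = np.mean((inference < cal_min) | (inference > cal_max))
    return {"cal_range": (float(cal_min), float(cal_max)),
            "fraction_clipped": float(clipped)}

rng = np.random.default_rng(0)
calibration = rng.uniform(-1.0, 1.0, size=(500, 8))  # 500 samples, as in your setup
inference = rng.uniform(-1.5, 1.5, size=(100, 8))    # wider real-world inputs
report = coverage_report(calibration, inference)
print(report)  # a sizeable fraction_clipped signals an unrepresentative set
```

If `fraction_clipped` is non-trivial, extend the calibration set (or normalize inputs) before re-running quantization.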

Step 4: Test Alternative Conversion Paths

Consider converting directly from TensorFlow/Keras to TFLite, then to Neutron, bypassing the PyTorch → ONNX conversion if feasible.

Step 5: Enable Verbose Logging

Use verbose logging in the neutron-converter to identify any warnings or constraints being applied during conversion.

Model Architecture Considerations

Since you mentioned the NXP MobileNet example works correctly, compare your model architecture against MobileNet:

  • Activation functions: Ensure you’re using NPU-supported activations
  • Layer types: Verify all your layers are in the supported operator list
  • Tensor shapes: Check for any dynamic shapes that might not be handled correctly
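The shape check in particular is easy to script. The sketch below assumes tensor metadata shaped like the list a TFLite interpreter's `get_input_details()` / `get_tensor_details()` returns (name plus shape); the entries shown are hypothetical:

```python
def dynamic_shape_issues(tensor_details):
    """Flag tensors whose shape contains a dynamic dimension
    (None or a negative size), which converters for fixed-shape
    accelerators often cannot handle."""
    issues = []
    for detail in tensor_details:
        if any(dim is None or dim < 0 for dim in detail["shape"]):
            issues.append(detail["name"])
    return issues

# Hypothetical tensor metadata for illustration.
details = [
    {"name": "input", "shape": [1, 224, 224, 3]},
    {"name": "logits", "shape": [-1, 10]},        # dynamic batch dimension
]
print(dynamic_shape_issues(details))  # -> ['logits']
```

If any tensor is flagged, re-export the model with a fixed batch size (e.g. pass static shapes when exporting from PyTorch to ONNX) before running the converter again.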

Next Steps

Given the complexity of your conversion pipeline, I’d recommend starting with the simplest possible model that reproduces your issue, then gradually adding complexity while monitoring where the problem emerges.

The community discussion (1) shows similar conversion challenges with classifier models, suggesting this is a known area where the neutron-converter may need refinement.

Best regards,
Toradex AI Assistant.


Did you like my answer? :+1: :-1:

If I answered your question, please tick the “Solution” checkbox.

For follow-up questions to me, please use @ToradexAI in your post.