Hi, I’m the Toradex AI Assistant.
I might make mistakes or occasionally misunderstand you. Our support team will step in if needed.
I understand you’re experiencing issues with your PyTorch model producing identical outputs regardless of input when running on the i.MX95 NPU, despite working correctly on CPU. This is a known challenge with neural-network model-conversion pipelines.
Analysis of Your Conversion Pipeline
Your conversion workflow (PyTorch → ONNX → Quantized ONNX → TFLite → Neutron) is complex and introduces multiple potential points of failure. The fact that your model works correctly on CPU but produces identical outputs on NPU suggests the issue is introduced during the neutron-converter step rather than earlier in the pipeline.
Common Causes and Solutions
1. Layer Compatibility Issues
The Neutron-S NPU has specific operator constraints. Based on the community discussion (1), certain layers like fully connected layers can encounter conversion issues even when they appear to meet documented constraints.
2. Quantization Artifacts
The quantization process (step 2) might be introducing artifacts that only manifest when running on NPU hardware. The NPU’s quantized inference path may be more sensitive to these artifacts than CPU execution.
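To make this concrete, here is a minimal sketch (in plain Python, with hypothetical scale and zero-point values rather than ones read from your model) of how affine int8 quantization can collapse two float outputs that differ by less than one quantization step into the same integer value:

```python
# Sketch: how int8 quantization can collapse nearby float values.
# The scale/zero_point values below are hypothetical, not taken from
# any specific model.

def quantize(x, scale, zero_point):
    """Map a float to int8 using the affine quantization formula."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Recover the approximate float value from an int8 code."""
    return (q - zero_point) * scale

scale, zero_point = 0.1, 0
# Two logits that differ by less than one quantization step (0.1)
# map to the same int8 code, so the outputs look identical.
a = quantize(1.21, scale, zero_point)
b = quantize(1.24, scale, zero_point)
print(a, b)  # both map to the same int8 code
```

If your output tensor’s scale is too coarse for the range your model actually produces, every input can end up in the same quantization bucket, which looks exactly like “constant outputs on NPU”.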
3. Input/Output Tensor Handling
Verify that your input preprocessing and output postprocessing remain consistent between CPU and NPU execution paths. The neutron-converter may alter tensor layouts or scaling factors.
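If the NPU path returns raw int8 tensors, remember to dequantize them before comparing against the CPU float outputs. A small sketch (the scale and zero-point values are illustrative; in TFLite you would read them from the output tensor’s quantization parameters, e.g. via `interpreter.get_output_details()`):

```python
# Sketch: dequantizing raw int8 NPU outputs before comparing them with
# CPU float outputs. scale/zero_point here are illustrative; read the
# real values from the TFLite output tensor's quantization parameters.

def dequantize_output(q_values, scale, zero_point):
    """Convert a list of int8 codes back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]

# Hypothetical raw int8 output from the NPU path
raw_npu = [-3, 17, 42]
scale, zero_point = 0.05, 2
print(dequantize_output(raw_npu, scale, zero_point))
```

Comparing raw int8 codes directly against float32 CPU outputs (or using the wrong scale/zero-point) is a common source of apparent mismatches that have nothing to do with the converter itself.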
Recommended Debugging Steps
Step 1: Simplify the Model
Create a minimal version of your model with fewer layers to isolate which operations are causing issues.
Step 2: Compare Intermediate Outputs
If possible, extract and compare intermediate layer outputs between CPU (TFLite) and NPU (Neutron) versions to identify where divergence occurs.
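A simple way to quantify divergence between the two paths, once you have dumped the tensors from each, is to track the maximum absolute difference and cosine similarity layer by layer (plain-Python sketch; the two lists stand in for flattened tensors from your own dumps):

```python
import math

# Sketch: comparing per-layer (or final) outputs from the CPU TFLite
# interpreter and the Neutron/NPU delegate. Replace the two lists with
# flattened tensors dumped from each execution path.

def compare(cpu, npu):
    """Return (max absolute difference, cosine similarity)."""
    max_abs = max(abs(c - n) for c, n in zip(cpu, npu))
    dot = sum(c * n for c, n in zip(cpu, npu))
    norm = math.sqrt(sum(c * c for c in cpu)) * math.sqrt(sum(n * n for n in npu))
    cosine = dot / norm if norm else 0.0
    return max_abs, cosine

cpu_out = [0.12, 0.55, 0.33]   # hypothetical CPU outputs
npu_out = [0.10, 0.60, 0.30]   # hypothetical NPU outputs
max_abs, cosine = compare(cpu_out, npu_out)
print(f"max abs diff: {max_abs:.3f}, cosine similarity: {cosine:.4f}")
```

The first layer where cosine similarity drops sharply (or where the NPU tensor becomes constant) points at the operator that the converter mishandled.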
Step 3: Verify Quantization Parameters
Ensure your calibration dataset of 500 samples is representative of your actual inference data and covers the full range of expected inputs.
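One cheap check is whether the calibration set actually spans the dynamic range of your real inputs: values outside the calibrated min/max get clipped during quantized inference, which can flatten the outputs. A sketch with hypothetical flattened input values:

```python
# Sketch: checking that the calibration set covers the dynamic range of
# real inference inputs. Values outside the calibrated min/max are
# clipped during quantized inference, which can flatten outputs.

def coverage(calibration, inference):
    """Return the calibrated range and the fraction of inference
    values that fall outside it (and would be clipped)."""
    lo, hi = min(calibration), max(calibration)
    clipped = sum(1 for x in inference if x < lo or x > hi)
    return lo, hi, clipped / len(inference)

# Hypothetical flattened input values
calib = [0.0, 0.2, 0.5, 0.9, 1.0]
infer = [0.1, 0.4, 1.3, 0.8, 1.6]
lo, hi, frac = coverage(calib, infer)
print(f"calibrated range [{lo}, {hi}], {frac:.0%} of inference values clipped")
```

If a meaningful fraction of real inputs lands outside the calibrated range, extend the calibration dataset before suspecting the converter.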
Step 4: Test Alternative Conversion Paths
Consider converting directly from TensorFlow/Keras to TFLite, then to Neutron, bypassing the PyTorch → ONNX conversion if feasible.
Step 5: Enable Verbose Logging
Use verbose logging in the neutron-converter to identify any warnings or constraints being applied during conversion.
Model Architecture Considerations
Since you mentioned the NXP MobileNet example works correctly, compare your model architecture against MobileNet:
- Activation functions: Ensure you’re using NPU-supported activations
- Layer types: Verify all your layers are in the supported operator list
- Tensor shapes: Check for any dynamic shapes that might not be handled correctly
Next Steps
Given the complexity of your conversion pipeline, I’d recommend starting with the simplest possible model that reproduces your issue, then gradually adding complexity while monitoring where the problem emerges.
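As a first sanity check for the “identical outputs” symptom itself, run the converted model on several clearly different inputs and verify programmatically that the outputs differ. A minimal sketch (collecting the outputs from your TFLite/Neutron invocation is left as a placeholder):

```python
# Sketch: a sanity check for the "identical outputs" symptom. Run the
# model on clearly different inputs, collect the outputs, and verify
# they actually differ. How you invoke the model (TFLite interpreter
# with the Neutron delegate) is your own code; only the check is shown.

def outputs_are_constant(outputs, tol=1e-6):
    """True if every output vector matches the first within tol."""
    first = outputs[0]
    return all(
        all(abs(a - b) <= tol for a, b in zip(first, out))
        for out in outputs[1:]
    )

# Hypothetical outputs collected from three different input images
collected = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]
print(outputs_are_constant(collected))  # True -> the NPU path is suspect
```

Running this check after each conversion stage (float ONNX, quantized ONNX, TFLite, Neutron) tells you exactly which step first produces constant outputs.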
The community discussion (1) shows similar conversion challenges with classifier models, suggesting this is a known area where the neutron-converter may need refinement.
Best regards,
Toradex AI Assistant.
Did you like my answer?

If I answered your question, please tick the “Solution” checkbox.
For follow-up questions to me, please use @ToradexAI in your post.