Improving FPS with Custom TensorFlow Lite Model on Verdin Board

vishal · October 25, 2024, 8:01am

Hello Toradex Community,

I’m running a custom TensorFlow Lite model on my Verdin development board (IMX8MP). Currently, my model achieves around 3 FPS when running on the NPU, but it only runs at 0.5 FPS on the Verdin IMX8MP CPU. In contrast, the sample model provided in the example code achieves more than 30 FPS. The size of my model is 13 MB.

I have attached a link to the model I'm using for reference: tflite model [LINK REMOVED]. Are there specific optimizations or configurations that might help increase the FPS for my custom model on this hardware? Any recommendations for tuning TensorFlow Lite models for better performance on the Verdin board's NPU would be greatly appreciated!

Thanks in advance for your insights!

ToradexAI · October 25, 2024, 8:08am

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!

To improve the FPS of your custom TensorFlow Lite model on the Verdin IMX8MP, you can consider the following optimizations and configurations:

Model Optimization Techniques: Use the TensorFlow Model Optimization Toolkit to apply quantization and pruning. These techniques can help reduce the model size and improve inference speed, which is crucial for achieving higher FPS on the NPU.
Utilizing the NPU: Ensure that your model is fully optimized for the NPU by using the correct TensorFlow Lite delegate for NPU execution. This can significantly enhance performance as the NPU is designed to handle AI workloads more efficiently than the CPU.
Model Size Reduction: Since your model is 13 MB, reducing its size through quantization or pruning could potentially improve performance. Smaller models generally perform better on embedded systems like the Verdin board.
Profile the Model: Use profiling tools to identify bottlenecks in your model. This can help you understand which parts of the model are slowing down performance and may need optimization.
Community Insights: Engage with the Toradex community for shared experiences and specific optimization techniques used by others on the Verdin board. This may yield valuable insights and practical advice for improving FPS on custom models.

By implementing these recommendations, you may be able to significantly enhance the performance of your custom TensorFlow Lite model on the Verdin board’s NPU.
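
The delegate point above can be sketched as follows. This is a minimal illustration, not the method used in the Toradex samples: it assumes the `tflite_runtime` Python package is installed on the board and that the NPU delegate library lives at `/usr/lib/libvx_delegate.so`, which is an assumed path that may differ on your image. The helper names `find_delegate` and `make_interpreter` are hypothetical.

```python
import os

# Assumed location of the NPU (VX) delegate library; adjust for your image.
DEFAULT_DELEGATE_PATHS = ("/usr/lib/libvx_delegate.so",)


def find_delegate(paths=DEFAULT_DELEGATE_PATHS):
    """Return the first delegate library that exists on disk, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def make_interpreter(model_path, delegate_path=None):
    """Create an interpreter that uses the delegate if one is given, else the CPU."""
    # Imported lazily so find_delegate() can be used without tflite_runtime.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegates = [load_delegate(delegate_path)] if delegate_path else []
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter


# Example usage on the board ("model.tflite" is a placeholder file name):
# interpreter = make_interpreter("model.tflite", find_delegate())
```

Keep in mind that loading a delegate does not guarantee the whole graph runs on the NPU. Operators the delegate does not support fall back to the CPU, and that fallback can dominate the inference time.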

Please also check whether these links help you solve your problem:

NXP i.MX 8M Plus - Verdin System on Module.

allan.tx · October 25, 2024, 12:15pm

Hi @vishal

Can you give more details about your model?

- What type of task is this model used for? Is it object detection?
- What is the name of the model you are using? Does it support quantization?
  - And did you quantize the model?
- What are the shapes of the inputs and outputs?
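
If it helps, the shape and quantization questions above can be answered by inspecting the model file directly. The sketch below assumes the `tflite_runtime` Python package is available; `summarize_tensors` and `load_details` are hypothetical helper names, and `"model.tflite"` is a placeholder.

```python
def summarize_tensors(details):
    """Format the entries returned by get_input_details()/get_output_details()."""
    lines = []
    for d in details:
        # Tensors without quantization parameters report (0.0, 0) here.
        scale, zero_point = d.get("quantization", (0.0, 0))
        dtype = getattr(d["dtype"], "__name__", str(d["dtype"]))
        shape = [int(x) for x in d["shape"]]
        lines.append(f'{d["name"]}: shape={shape} dtype={dtype} '
                     f'scale={scale} zero_point={zero_point}')
    return lines


def load_details(model_path):
    """Return (input_details, output_details) for a .tflite file."""
    # Imported lazily so summarize_tensors() works without tflite_runtime.
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    return interpreter.get_input_details(), interpreter.get_output_details()


# Example usage ("model.tflite" is a placeholder file name):
# inputs, outputs = load_details("model.tflite")
# print("\n".join(summarize_tensors(inputs) + summarize_tensors(outputs)))
```

A `float32` input or output, or a scale of 0.0, suggests the model is not integer-quantized end to end. The reverse does not hold: integer inputs and outputs alone do not prove that every internal operator is quantized.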

vishal · October 25, 2024, 12:31pm

Hi @allan.tx

We need to detect objects in images.
We are using a quantized model.
Input shape: [1, 640, 640, 3]
Output shape: [1, 25, 4]

Best Regards,
Vishal

allan.tx · October 25, 2024, 1:45pm

Can you explain how you are running your benchmark?
Are you using the same workflow as the samples?

Can you try using the same model from this comparison?

Also, can you try running the htop / gputop commands while you are using the NPU?
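
For reference, one way to make the benchmark numbers comparable is to time only the inference call and to discard a few warm-up runs first. On the NPU the first invocation is typically much slower than later ones, so including it can skew the FPS figure. The sketch below uses only the Python standard library; `measure_fps` is a hypothetical helper, and `run_once` stands for whatever performs a single inference in your code (for example `interpreter.invoke`).

```python
import time


def measure_fps(run_once, warmup=5, iterations=50):
    """Call run_once() repeatedly and return the average frames per second."""
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start

    return iterations / elapsed


# Example usage, assuming `interpreter` is an already prepared TFLite
# interpreter with its input tensor set:
# print(f"{measure_fps(interpreter.invoke):.1f} FPS (inference only)")
```

Running this once with the NPU delegate and once on the CPU, while watching htop / gputop in another terminal, should show whether the time goes to the accelerator or to CPU fallback.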