Hello,
I am using the Verdin Development Board with a Verdin iMX8MP module.
My project is object detection using ML; I am using a YOLO model for object detection.
I tested the YOLO model for object detection. However, it takes approximately 19 seconds to process a single image. Could you please suggest how I can reduce the processing time?
Hi, I’m the Toradex AI Assistant.
I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.
Hello,
Thank you for your detailed question! To significantly reduce processing time for object detection with your YOLO model on the Verdin iMX8M Plus, you can leverage the module’s Neural Processing Unit (NPU), which is specifically designed to accelerate AI inference.
Here are some steps and recommendations:
**Use TensorFlow Lite with NPU acceleration**

If your YOLO model can be converted to TensorFlow Lite format (e.g., YOLOv5 or YOLOv8 with TFLite export support), you can enable NPU acceleration by setting the following environment variables:

`USE_HW_ACCELERATED_INFERENCE=1`
`USE_GPU_INFERENCE=0`

This ensures inference uses the NPU hardware rather than the CPU or GPU for maximum acceleration. More information is available here:
Torizon Machine Learning Sample - TensorFlow Lite Prerequisites
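As an illustration, the variables can be exported in the shell before launching the inference application, or passed to the container at startup; the container image name below is a placeholder, not the actual sample's name:

```shell
# Set before starting the inference process:
export USE_HW_ACCELERATED_INFERENCE=1   # prefer hardware-accelerated inference
export USE_GPU_INFERENCE=0              # 0 = use the NPU, not the GPU

# Or pass them to a container (image name is a placeholder):
docker run --rm -it \
  -e USE_HW_ACCELERATED_INFERENCE=1 \
  -e USE_GPU_INFERENCE=0 \
  my-tflite-app
```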
**Expected performance gain**

When using the NPU, inference time can improve by approximately 12x compared to CPU-only processing. For example, test results have shown a reduction from several seconds per image (CPU) down to about 35 ms per frame (NPU):
Object Detection with AI on Toradex Modules
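To verify the improvement on your side, you can time inference directly. A minimal latency-measurement helper (plain Python, names are illustrative) might look like this; note that the first inferences through the NPU delegate include graph compilation and are much slower than steady state, so they should be excluded:

```python
import time

def benchmark(run_inference, frame, warmup=3, iters=20):
    """Return the average per-frame latency in milliseconds.

    run_inference: a callable that performs one inference on `frame`.
    The warmup calls are discarded because the first NPU inferences
    include one-time graph compilation overhead.
    """
    for _ in range(warmup):
        run_inference(frame)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(frame)
    return (time.perf_counter() - start) / iters * 1000.0
```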
**Check your Verdin iMX8M Plus variant**

Not all variants of the Verdin iMX8M Plus include the NPU. Please verify that your specific module variant features the NPU, as it is required for this acceleration.
**Optimize the model**

- Convert the YOLO model to TensorFlow Lite and ensure it is quantized (INT8 or UINT8), as this is required for NPU compatibility.
- If starting from PyTorch or ONNX, convert to the TFLite format following best practices for edge devices.
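The quantization step above can be sketched with TensorFlow's post-training quantization. This is a minimal illustration on a tiny stand-in model, not your actual YOLO network; the shapes, the random calibration data, and the output filename are all placeholders:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for the real network: a single convolution. In practice you
# would build the converter from your exported YOLO SavedModel or Keras model.
@tf.function(input_signature=[tf.TensorSpec([1, 32, 32, 3], tf.float32)])
def tiny_model(x):
    kernel = tf.constant(np.random.rand(3, 3, 3, 8).astype(np.float32))
    return tf.nn.conv2d(x, kernel, strides=1, padding="SAME")

def representative_dataset():
    # A few calibration batches shaped like the model input; use real images
    # from your dataset here so the quantization ranges are accurate.
    for _ in range(10):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [tiny_model.get_concrete_function()]
)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer ops so the whole graph is eligible to run on the NPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```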
**Thermal management**

Ensure your system is not thermally throttling by monitoring the CPU temperature. Consider attaching a heatsink if the module is running hot:
Thermal management tips
By switching your inference to utilize the NPU as described, you should see a drastic reduction in processing time, making object detection feasible for real-time applications.
If you would like, I can provide further guidance on how to convert your YOLO model to TensorFlow Lite format and enable NPU inference.
Best regards,
Toradex AI Assistant
Hi @sandy1464
You are probably not using the TFLite delegates, or something is wrong with your model conversion, so all the processing falls back to CPU inference.
Testing with an Ultralytics YOLO11n at 640 px input size, you can expect around ~120 ms per image.
Check the reference sample in this repo.
Note that you need to first build the delegate and make it available to tflite before using it.
Check the Dockerfile; it contains all the build and compile steps.
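Once the delegate is built and installed, attaching it to a TFLite interpreter can be sketched as below. The delegate library path is the usual location for the VX delegate on NXP i.MX8M Plus images, but that is an assumption for your build, and the model filename is a placeholder; with `delegate_path=None` the code falls back to plain CPU inference:

```python
import tensorflow as tf  # on the module, tflite_runtime offers the same API

def make_interpreter(model_path, delegate_path=None):
    """Build a TFLite interpreter, attaching an external delegate when given.

    delegate_path is typically "/usr/lib/libvx_delegate.so" for the NPU on
    NXP i.MX8M Plus images (an assumption -- check your image). When it is
    None, the interpreter runs on the CPU.
    """
    delegates = []
    if delegate_path:
        delegates.append(tf.lite.experimental.load_delegate(delegate_path))
    interpreter = tf.lite.Interpreter(
        model_path=model_path, experimental_delegates=delegates
    )
    interpreter.allocate_tensors()
    return interpreter

def run_once(interpreter, frame):
    """Run a single inference and return the first output tensor."""
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], frame.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

# Example (paths are placeholders):
# interpreter = make_interpreter("yolo_int8.tflite",
#                                delegate_path="/usr/lib/libvx_delegate.so")
```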