Hello,
I am using the Verdin Development Board with a Verdin iMX8MP module.
My project is object detection using ML; I am using a YOLO model for object detection.
I tested the YOLO model for object detection. However, it takes approximately 19 seconds to process a single image. Could you please suggest how I can reduce the processing time?
Hi, I’m the Toradex AI Assistant.
I might make mistakes. Our support team will step in if needed. If I answered your question, please tick the “Solution” checkbox. For follow-up questions to me, please use @ToradexAI in your post.
Hello,
Thank you for your detailed question! To significantly reduce processing time for object detection with your YOLO model on the Verdin iMX8M Plus, you can leverage the module’s Neural Processing Unit (NPU), which is specifically designed to accelerate AI inference.
Here are some steps and recommendations:
**Use TensorFlow Lite with NPU acceleration**

If your YOLO model can be converted to TensorFlow Lite format (e.g., YOLOv5 or YOLOv8 with TFLite export support), you can enable NPU acceleration by setting the following environment variables:

`USE_HW_ACCELERATED_INFERENCE=1`
`USE_GPU_INFERENCE=0`

This ensures inference uses the NPU hardware rather than the CPU or GPU for maximum acceleration. More information is available here:
Torizon Machine Learning Sample - TensorFlow Lite Prerequisites
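As an illustration, the variables can be exported in the shell before launching the inference application, or passed to the container at startup; the container image name below is a placeholder, not the actual sample's name:

```shell
# Set before starting the inference process:
export USE_HW_ACCELERATED_INFERENCE=1   # prefer hardware-accelerated inference
export USE_GPU_INFERENCE=0              # 0 = use the NPU, not the GPU

# Or pass them to a container (image name is a placeholder):
docker run --rm -it \
  -e USE_HW_ACCELERATED_INFERENCE=1 \
  -e USE_GPU_INFERENCE=0 \
  my-tflite-app
```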
**Expected performance gain**

When using the NPU, inference time can improve by approximately 12x compared to CPU-only processing. For example, test results have shown a reduction from several seconds per image (CPU) down to about 35 ms per frame (NPU):
Object Detection with AI on Toradex Modules
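To verify the improvement on your side, you can time inference directly. A minimal latency-measurement helper (plain Python, names are illustrative) might look like this; note that the first inferences through the NPU delegate include graph compilation and are much slower than steady state, so they should be excluded:

```python
import time

def benchmark(run_inference, frame, warmup=3, iters=20):
    """Return the average per-frame latency in milliseconds.

    run_inference: a callable that performs one inference on `frame`.
    The warmup calls are discarded because the first NPU inferences
    include one-time graph compilation overhead.
    """
    for _ in range(warmup):
        run_inference(frame)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(frame)
    return (time.perf_counter() - start) / iters * 1000.0
```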
**Check your Verdin iMX8M Plus variant**

Not all variants of the Verdin iMX8M Plus include the NPU. Please verify that your specific module variant features the NPU, as it is required for this acceleration.
**Optimize the model**

- Convert the YOLO model to TensorFlow Lite and ensure it is quantized (INT8 or UINT8), as this is required for NPU compatibility.
- If starting from PyTorch or ONNX, convert to the TFLite format following best practices for edge devices.
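The quantization step above can be sketched with TensorFlow's post-training quantization. This is a minimal illustration on a tiny stand-in model, not your actual YOLO network; the shapes, the random calibration data, and the output filename are all placeholders:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for the real network: a single convolution. In practice you
# would build the converter from your exported YOLO SavedModel or Keras model.
@tf.function(input_signature=[tf.TensorSpec([1, 32, 32, 3], tf.float32)])
def tiny_model(x):
    kernel = tf.constant(np.random.rand(3, 3, 3, 8).astype(np.float32))
    return tf.nn.conv2d(x, kernel, strides=1, padding="SAME")

def representative_dataset():
    # A few calibration batches shaped like the model input; use real images
    # from your dataset here so the quantization ranges are accurate.
    for _ in range(10):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [tiny_model.get_concrete_function()]
)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer ops so the whole graph is eligible to run on the NPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```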
**Thermal management**

Ensure your system is not thermally throttling by monitoring the CPU temperature. Consider attaching a heatsink if the module is running hot:
Thermal management tips
By switching your inference to utilize the NPU as described, you should see a drastic reduction in processing time, making object detection feasible for real-time applications.
If you would like, I can provide further guidance on how to convert your YOLO model to TensorFlow Lite format and enable NPU inference.
Best regards,
Toradex AI Assistant
Hi @sandy1464
You are probably not using the TFLite delegates, or something is wrong with your model conversion, so all the processing falls back to CPU inference.
Testing with an Ultralytics YOLO11n at 640 px input size, you can expect around ~120 ms per image.
Check the reference sample in this repo.
Note that you need to first build the delegate and make it available to tflite before using it.
Check the Dockerfile; it contains all the build and compile steps.
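Once the delegate is built and installed, attaching it to a TFLite interpreter can be sketched as below. The delegate library path is the usual location for the VX delegate on NXP i.MX8M Plus images, but that is an assumption for your build, and the model filename is a placeholder; with `delegate_path=None` the code falls back to plain CPU inference:

```python
import tensorflow as tf  # on the module, tflite_runtime offers the same API

def make_interpreter(model_path, delegate_path=None):
    """Build a TFLite interpreter, attaching an external delegate when given.

    delegate_path is typically "/usr/lib/libvx_delegate.so" for the NPU on
    NXP i.MX8M Plus images (an assumption -- check your image). When it is
    None, the interpreter runs on the CPU.
    """
    delegates = []
    if delegate_path:
        delegates.append(tf.lite.experimental.load_delegate(delegate_path))
    interpreter = tf.lite.Interpreter(
        model_path=model_path, experimental_delegates=delegates
    )
    interpreter.allocate_tensors()
    return interpreter

def run_once(interpreter, frame):
    """Run a single inference and return the first output tensor."""
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], frame.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

# Example (paths are placeholders):
# interpreter = make_interpreter("yolo_int8.tflite",
#                                delegate_path="/usr/lib/libvx_delegate.so")
```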