Running TensorFlow Lite on NPU: Need Guidance

We are currently running a TensorFlow Lite program, but it is executing on the CPU. What steps should I take to run it on the NPU instead?

#### Install GPU Drivers ####
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    imx-gpu-viv-wayland-dev \
    && apt-get clean && apt-get autoremove && rm -rf /var/lib/apt/lists/*

This is the relevant command from our Dockerfile.
We are using the i.MX8M Plus (imx8mp).

Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!



To run TensorFlow Lite on the NPU instead of the CPU, you need to ensure that your setup supports NPU execution and that the necessary drivers and configurations are in place. Here are some general steps you might consider:

  1. Verify NPU Support: Ensure that your hardware supports NPU execution (the i.MX8M Plus you mention does include a dedicated NPU) and that the TensorFlow Lite version you are using is compatible with NPU acceleration.

  2. Install NPU Drivers: The Dockerfile snippet you provided installs GPU drivers, but you need to ensure that NPU drivers are also installed. Check if there are specific NPU drivers or libraries required for your SoC and install them.

  3. Use the Correct TensorFlow Lite Delegate: TensorFlow Lite uses delegates to offload computation to hardware accelerators like NPUs. On the i.MX8M Plus this is the VX delegate (libvx_delegate.so). Make sure the TensorFlow Lite interpreter is set up to use it; a minimal sketch is shown after this list.

  4. Modify Your Dockerfile: If there are specific packages or libraries required for NPU support, make sure to include them in your Dockerfile. This might involve adding additional packages or configuration steps.

  5. Check Documentation: Refer to NXP’s i.MX Machine Learning User’s Guide or any other documentation on NPU support for your hardware. It may provide the specific instructions or configurations needed to enable NPU execution.

  6. Test and Debug: After making the necessary changes, test your setup to ensure that TensorFlow Lite is executing on the NPU. You might need to debug and adjust configurations based on the results.

If you have access to specific documentation or support resources from NXP or Toradex, they might provide more detailed guidance tailored to your hardware and software setup.
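
For step 3, the usual pattern on the i.MX8M Plus is to load the VX delegate from Python and hand it to the TensorFlow Lite interpreter. Below is a minimal sketch, assuming the tflite_runtime package is installed in the container and the delegate library lives at /usr/lib/libvx_delegate.so (the path used later in this thread); model.tflite is a placeholder for your own model:

```python
# Minimal sketch: attach the i.MX VX (NPU) delegate to a TFLite interpreter.
# Assumes tflite_runtime is installed and libvx_delegate.so is present in the
# container; "model.tflite" is a placeholder for your own model file.
import tflite_runtime.interpreter as tflite

vx_delegate = tflite.load_delegate('/usr/lib/libvx_delegate.so')
interpreter = tflite.Interpreter(
    model_path='model.tflite',
    experimental_delegates=[vx_delegate],
)
interpreter.allocate_tensors()
```

Keep in mind that the NPU generally expects a fully quantized (int8/uint8) model; operations it cannot handle fall back to the CPU even when the delegate loads successfully.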

Please also check if these links can help you solve your problem:

NPU versus CPU Results and Training for Tensorflow lite - NXP Community.

Hi @vishal,

Please see the table of runtime environment variables that can be passed to the docker run command,
e.g. USE_HW_ACCELERATED_INFERENCE=1.
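
To illustrate, here is one way an application can honour that variable at start-up. This is a sketch only: the variable name comes from this thread, but how it maps to delegate selection inside your own Python code is an assumption, not a fixed API.

```python
# Sketch: gate NPU delegate loading on USE_HW_ACCELERATED_INFERENCE.
# The variable name comes from this thread; the selection logic below is
# just one possible way for an application to consume it.
import os
import tflite_runtime.interpreter as tflite

def make_interpreter(model_path):
    delegates = []
    if os.environ.get('USE_HW_ACCELERATED_INFERENCE', '0') == '1':
        delegates.append(tflite.load_delegate('/usr/lib/libvx_delegate.so'))
    return tflite.Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
```

Passing the variable with docker run -e (or under environment: in a docker-compose file) only sets it inside the container; the program still has to act on it.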

Hello @sahil.tx,
We have set USE_HW_ACCELERATED_INFERENCE=1 in our docker-compose file:

version: "3.9"
services:
  project1-debug:
    build:
      context: .
      dockerfile: Dockerfile.debug
    image: ${LOCAL_REGISTRY}:5002/project1-debug:${TAG}
    ports:
      - ${DEBUG_SSH_PORT}:${DEBUG_SSH_PORT}
      - ${DEBUG_PORT1}:${DEBUG_PORT1}
    devices:
      - /dev/video2:/dev/video2  # Use the identified video device
      - /dev/gpiochip0
      
    environment:
      - DISPLAY=${DISPLAY}
      - USE_HW_ACCELERATED_INFERENCE=1
      - USE_GPU_INFERENCE=0
      - ACCEPT_FSL_EULA=1
      
    device_cgroup_rules:
      - 'c 4:* rmw'
      - 'c 13:* rmw'
      - 'c 199:* rmw'
      - 'c 226:* rmw'
      - 'c 81:* rmw'
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix

    

  project1:
    build:
      context: .
      dockerfile: Dockerfile
    image: ${DOCKER_LOGIN}/project1:${TAG}
    devices:
      - /dev/video2:/dev/video2  # Use the identified video device
      - /dev/gpiochip0
      
     
    environment:
      - CAPTURE_DEVICE=/dev/video2
      - USE_HW_ACCELERATED_INFERENCE=1
      - USE_GPU_INFERENCE=0
      - ACCEPT_FSL_EULA=1
    device_cgroup_rules:
      - 'c 4:* rmw'
      - 'c 13:* rmw'
      - 'c 199:* rmw'
      - 'c 226:* rmw'
      - 'c 81:* rmw'
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - /dev:/dev
      - /tmp:/tmp
      - /run/udev/:/run/udev/
    command: >
      gst-launch-1.0 rtspsrc location=rtsp://192.168.1.230:6502/inference ! decodebin ! xvimagesink sync=false
    privileged: true  

My program breaks at

 delegate = load_delegate('/usr/lib/libvx_delegate.so')

What should I do about this?
I have installed this package:

RUN apt-get -y update && apt-get install -y \
    libovxlib \
    && apt-get clean && apt-get autoremove && rm -rf /var/lib/apt/lists/*

Do I need to copy the libvx_delegate.so file manually?
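
First check inside the running container whether the file actually exists (for example with ls /usr/lib/libvx_delegate.so): libovxlib provides the OpenVX-based NN runtime, but the delegate shared object itself may come from a separate TFLite VX delegate package, depending on your image and package feed. Rather than copying the file by hand, the loading can also be made defensive so the program falls back to the CPU instead of crashing. A sketch, under the same tflite_runtime assumption as above:

```python
# Defensive loading sketch: fall back to CPU inference if the VX delegate
# library is missing or fails to load, instead of crashing at start-up.
import os
from tflite_runtime.interpreter import Interpreter, load_delegate

VX_DELEGATE = '/usr/lib/libvx_delegate.so'  # path used in this thread

def load_interpreter(model_path):
    delegates = []
    if os.path.exists(VX_DELEGATE):
        try:
            delegates.append(load_delegate(VX_DELEGATE))
            print('VX delegate loaded; inference should run on the NPU')
        except ValueError as err:
            print(f'Could not load VX delegate ({err}); falling back to CPU')
    else:
        print(f'{VX_DELEGATE} not found in the container; falling back to CPU')
    return Interpreter(model_path=model_path, experimental_delegates=delegates)
```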

Hi @vishal,
Any progress on this?
How can I utilize the NPU for object detection on an RTSP video stream? Which framework did you use?

Regards

Hi Sudharshan,
My Python program is now running on the NPU.
Previously my recipe's shell (.sh) files were not running to completion, which is why my TFLite model was not running on the NPU. I also had to pass the delegate when defining the model.
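
For reference, a rough way to confirm the delegate is really driving the NPU is to time a few inferences: with the VX delegate the first call is typically much slower (graph compilation), and the steady-state latency should drop well below the CPU-only numbers. A sketch, assuming the same tflite_runtime setup and a placeholder model.tflite as above:

```python
# Rough sanity check (not a benchmark): warm-up plus a few timed inferences.
# Input shape and dtype are taken from the model's own input details.
import time
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path='model.tflite',  # placeholder for your model
    experimental_delegates=[tflite.load_delegate('/usr/lib/libvx_delegate.so')],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])

for i in range(5):
    interpreter.set_tensor(inp['index'], dummy)
    start = time.monotonic()
    interpreter.invoke()
    print(f'inference {i}: {(time.monotonic() - start) * 1000:.1f} ms')
```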