Unable to run inference on npu ( npu not found? )

Hi all,

I’m building a custom Yocto image and trying to run inference on the NPU as follows:

root@verdin-imx8mp:/usr/bin/tensorflow-lite-2.9.1/examples# USE_GPU_INFERENCE=0 python3 label_image.py -e /usr/lib/libvx_delegate.so
Loading external delegate from /usr/lib/libvx_delegate.so with args: {}
Vx delegate: allowed_cache_mode set to 0.
Vx delegate: device num set to 0.
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
Loaded delegate
Running inference?
[     1] Failed to open device: No such file or directory, Try again...
[     2] Failed to open device: No such file or directory, Try again...
[     3] Failed to open device: No such file or directory, Try again...
[     4] Failed to open device: No such file or directory, Try again...
[     5] _OpenDevice(1249): FATAL: Failed to open device, errno=No such file or directory.

The Yocto recipe seems to have built fine, the delegate etc. is all installed, and TensorFlow Lite appears to be working, but it’s failing with the above.

My assumption is this is because something isn’t loaded in /dev/ but I don’t know what that something is or what I am potentially missing.

So a few questions:

Should I see something npu related in dmesg?
Should I see something in /dev if everything is installed correctly?

Thank you!

p.s I basically just followed the guide here to get this far:

Repos used

header:
  version: 11

repos:
  meta-toradex-nxp:
    url: "https://git.toradex.com/meta-toradex-nxp.git"
    path: sources/meta-toradex-nxp
    branch: kirkstone-6.x.y
    commit: 92a03e8efa00234026139919789989a11bc7ed58
  
  meta-freescale-3rdparty:
    url: "https://github.com/Freescale/meta-freescale-3rdparty.git"
    path: sources/meta-freescale-3rdparty
    branch: kirkstone
    commit: "9e94b64bdfebcf7bfdf2af6447cec866a4efa814"

  meta-freescale-distro:
    url: "https://github.com/Freescale/meta-freescale-distro.git"
    path: sources/meta-freescale-distro
    branch: kirkstone
    commit: "d5bbb487b2816dfc74984a78b67f7361ce404253"

  meta-freescale:
    url: "https://github.com/Freescale/meta-freescale.git"
    path: sources/meta-freescale
    branch: kirkstone
    commit: "fb17bfb8edcc9560bc1beb966a68f1f4c08ecfb3"

  meta-toradex-bsp-common:
    url: "https://git.toradex.com/meta-toradex-bsp-common.git"
    path: sources/meta-toradex-bsp-common
    branch: kirkstone-6.x.y
    commit: "cc1fdcaf6fe8a657c3706b66faa827994ede416f"

  meta-toradex-econ:
    url: "https://github.com/toradex/meta-toradex-econ.git"
    path: sources/meta-toradex-econ
    branch: kirkstone-6.x.y
    commit: "2d6b647072daadeaa321d377d5dc1d56c3fc4c8e"
    layers:
      meta-ar0521:

  meta-imx:
    url: "https://github.com/nxp-imx/meta-imx.git"
    path: sources/meta-imx
    branch: kirkstone-5.15.71-2.2.2
    commit: "ca68ab5d25322b51f54564275c84ab8de3c74ea6"
    layers:
      meta-ml:


  meta-mender-community:
    url: "https://github.com/mendersoftware/meta-mender-community.git" 
    path: sources/meta-mender-community
    branch: kirkstone
    commit: 4dce73360532fe9b65db7759fc984b839efb37ca
    layers:
      meta-mender-toradex-nxp:

Hi, I’m the Toradex AI Assistant. I’m in early beta, and I would appreciate your feedback.
The Toradex Support team will follow up with you shortly!


Hello,

Based on the error messages you’ve encountered, it seems like the NPU device is not being recognized or accessed correctly by the TensorFlow Lite delegate. This issue typically arises due to missing or improperly configured drivers or device tree settings required for NPU operation.

  1. Checking dmesg for NPU-related logs: Yes, you should check the dmesg output for any NPU-related messages. After booting your system, look for logs that indicate the NPU hardware is initialized and recognized by the kernel. Messages related to the loading of NPU drivers or any errors/failures during their initialization can provide valuable clues.

  2. Checking /dev for NPU device nodes: If the NPU drivers are correctly loaded, you should see device nodes related to the NPU in the /dev directory. The exact name of these device nodes can vary based on the specific NPU and driver version, but they often include identifiers like npu, galcore, or similar. If you do not see any relevant device nodes, this indicates that the NPU driver is either not loaded or not functioning correctly.

To address the issue, consider the following steps:

  • Ensure NPU Drivers are Included: Verify that your Yocto build includes the necessary drivers for the NPU. This typically involves checking your Yocto configuration and ensuring that the recipes related to the NPU drivers are included and built correctly.

  • Device Tree Configuration: Ensure that your device tree properly configures the NPU. The device tree must include entries for the NPU that match your hardware configuration. Incorrect or missing device tree configurations can prevent the NPU from being recognized by the system.

  • Kernel Configuration: Check your kernel configuration to ensure that support for the NPU and any required interfaces (e.g., IOMMU) is enabled.

  • Permissions: Verify that the device nodes in /dev (once they exist) have the correct permissions to allow access by the user or process attempting to perform inference.

  • NPU Warm-up Time: Remember that the i.MX 8M Plus SoC requires a warm-up time for the NPU. Although this is unlikely to be the cause of the error messages you’re seeing, it’s important to account for this in your application logic.
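
As a rough illustration of the dmesg check, here is a minimal Python sketch that filters the kernel log for driver-related lines. The keywords `galcore` and `npu` are simply the identifiers mentioned above; the sketch makes no assumption about the exact messages a working driver prints, and the function names are hypothetical helpers, not part of any Toradex or NXP tool.

```python
import re
import subprocess

# Whole-word match, so that "npu" does not also match words such as "input".
DRIVER_PATTERN = re.compile(r"\b(galcore|npu)\b", re.IGNORECASE)


def filter_kernel_log(log_text, pattern=DRIVER_PATTERN):
    """Return the kernel log lines that match the driver pattern."""
    return [line for line in log_text.splitlines() if pattern.search(line)]


def read_kernel_log():
    """Capture the kernel ring buffer by running dmesg (usually needs root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the target, `filter_kernel_log(read_kernel_log())` returning an empty list would suggest the driver never logged anything during boot, which is a hint rather than proof that it did not probe.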

If, after checking these areas, you still face issues, it may be helpful to revisit the documentation and resources provided by NXP and Toradex for configuring and using the NPU on the i.MX 8M Plus.

Please, also check if these links can help you solve your problem:

Building Machine Learning Software with Reference Images for Yocto Project | Toradex Developer Center.

I’ve tried rebuilding everything using td-wayland as the distro (I was using poky before), but I am still getting this error.

Has anyone got a device where this works who could please show me what dmesg or the contents of /dev should look like if the NPU is found and working?

Thank you!

edit: running strace I get

openat(AT_FDCWD, "/dev/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/graphics/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, NULL) = 0
write(2, "[     1] Failed to open device: "..., 72[     1] Failed to open device: No such file or directory, Try again...
) = 72
openat(AT_FDCWD, "/dev/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/graphics/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, NULL) = 0
write(2, "[     2] Failed to open device: "..., 72[     2] Failed to open device: No such file or directory, Try again...
) = 72
openat(AT_FDCWD, "/dev/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/graphics/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, NULL) = 0
write(2, "[     3] Failed to open device: "..., 72[     3] Failed to open device: No such file or directory, Try again...
) = 72
openat(AT_FDCWD, "/dev/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/graphics/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, NULL) = 0
write(2, "[     4] Failed to open device: "..., 72[     4] Failed to open device: No such file or directory, Try again...
) = 72
openat(AT_FDCWD, "/dev/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/graphics/galcore", O_RDWR) = -1 ENOENT (No such file or directory)
write(2, "[     5] _OpenDevice(1249): FATA"..., 91[     5] _OpenDevice(1249): FATAL: Failed to open device, errno=No such file or directory.
) = 91
getpid()                                = 1056
exit_group(1)                           = ?
+++ exited with 1 +++
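
The strace output shows the userspace driver trying exactly two paths, so the same check can be done up front before attempting inference. This is a minimal sketch, assuming those two paths are the only ones of interest; `find_device_node` is a hypothetical helper name.

```python
import os
import stat

# The two paths the driver attempted to open in the strace output above.
CANDIDATE_NODES = ("/dev/galcore", "/dev/graphics/galcore")


def find_device_node(candidates=CANDIDATE_NODES):
    """Return the first candidate that exists as a character device, else None."""
    for path in candidates:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Missing path or no permission to stat it: try the next one.
            continue
        if stat.S_ISCHR(mode):
            return path
    return None
```

If `find_device_node()` returns `None`, the delegate will hit the same ENOENT as above, so the problem lies in the kernel or boot configuration rather than in TensorFlow Lite.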

Galcore module also seems to be loaded

root@verdin-imx8mp:/dev# modinfo galcore
filename:       /lib/modules/5.15.148-6.0.0-devel+git.f437ddb7191d/extra/galcore.ko
import_ns:      VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver
license:        Dual MIT/GPL
description:    Vivante Graphics Driver
alias:          of:N*T*Cfsl,imx6q-gpuC*
alias:          of:N*T*Cfsl,imx6q-gpu
alias:          of:N*T*Cfsl,imx8-gpu-ssC*
alias:          of:N*T*Cfsl,imx8-gpu-ss
depends:
name:           galcore
vermagic:       5.15.148-6.0.0-devel+git.f437ddb7191d SMP preempt mod_unload modversions aarch64
parm:           initgpu3DMinClock:int
parm:           major:major device number for GC device (uint)
parm:           irqLine:IRQ number of GC core (int)
parm:           registerMemBase:Base of bus address of GC core AHB register (ulong)
parm:           registerMemSize:Size of bus address range of GC core AHB register (ulong)
parm:           irqLine2D:IRQ number of G2D core if irqLine is used for a G3D core (int)
parm:           registerMemBase2D:Base of bus address of G2D core if registerMemBase2D is used for a G3D core (ulong)
parm:           registerMemSize2D:Size of bus address range of G2D core if registerMemSize is used for a G3D core (ulong)
parm:           irqLineVG:IRQ number of VG core (int)
parm:           registerMemBaseVG:Base of bus address of VG core (ulong)
parm:           registerMemSizeVG:Size of bus address range of VG core (ulong)
parm:           contiguousSize:Size of memory reserved for GC (ulong)
parm:           contiguousBase:Base address of memory reserved for GC, if it is 0, GC driver will try to allocate a buffer whose size defined by contiguousSize (ullong)
parm:           externalSize:Size of external memory, if it is 0, means there is no external pool (ulong)
parm:           externalBase:Base address of external memory (ullong)
parm:           fastClear:Disable fast clear if set it to 0, enabled by default (int)
parm:           compression:Disable compression if set it to 0, enabled by default (int)
parm:           powerManagement:Disable auto power saving if set it to 0, enabled by default (int)
parm:           baseAddress:Only used for old MMU, set it to 0 if memory which can be accessed by GPU falls into 0 - 2G, otherwise set it to 0x80000000 (ulong)
parm:           physSize:Obsolete (ulong)
parm:           recovery:Recover GPU from stuck (1: Enable, 0: Disable) (uint)
parm:           stuckDump:Level of stuck dump content. (uint)
parm:           showArgs:Display parameters value when driver loaded (int)
parm:           mmu:Disable MMU if set it to 0, enabled by default [Obsolete] (int)
parm:           mmuException:use MMU Exception (1: Enable, 0: Disable) (uint)
parm:           irqs:Array of IRQ numbers of multi-GPU (array of int)
parm:           registerBases:Array of bases of bus address of register of multi-GPU (array of uint)
parm:           registerSizes:Array of sizes of bus address range of register of multi-GPU (array of uint)
parm:           chipIDs:Array of chipIDs of multi-GPU (array of uint)
parm:           type:0 - Char Driver (Default), 1 - Misc Driver (uint)
parm:           userClusterMask:User defined cluster enable mask (int)
parm:           smallBatch:Enable/disable GPU small batch feature, enable by default (int)
parm:           allMapInOne:Mapping kernel video memory to user, 0 means mapping every time, otherwise only mapping one time (int)
parm:           gpuTimeout:Timeout of operation that needs to wait for the GPU (uint)
parm:           sRAMBases:Array of base of bus address of SRAM,INTERNAL, EXTERNAL0, EXTERNAL1..., gcvINVALID_PHYSICAL_ADDRESS means no bus address (array of ullong)
parm:           sRAMSizes:Array of size of per-core SRAMs, 0 means no SRAM (array of uint)
parm:           extSRAMBases:Shared SRAM physical address bases. (array of ullong)
parm:           extSRAMSizes:Shared SRAM sizes. (array of uint)
parm:           sRAMRequested:Default 1 means AXI-SRAM is already reserved for GPU, 0 means GPU driver need request the memory region. (uint)
parm:           mmuPageTablePool:Default 1 means alloc mmu page table in virsual memory, 0 means auto select memory pool. (uint)
parm:           sRAMLoopMode:Default 0 means SRAM pool must be specified when allocating SRAM memory, 1 means SRAM memory will be looped as default pool. (uint)
parm:           mmuDynamicMap:Default 1 means enable mmu dynamic mapping in virsual memory, 0 means disable dynnamic mapping. (uint)
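
One caveat: `modinfo` reads metadata from the `.ko` file on disk, so the output above shows that the module is installed, not that it is currently loaded. A minimal sketch for checking the loaded state by parsing `/proc/modules` (the function names are hypothetical):

```python
def loaded_modules(proc_modules_text):
    """Parse /proc/modules content; the first field on each line is the module name."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, proc_modules_path="/proc/modules"):
    """Return True if the named module appears in /proc/modules."""
    with open(proc_modules_path) as f:
        return name in loaded_modules(f.read())
```

On the target, `is_module_loaded("galcore")` returning `True` while the device node is still missing would mean the module loaded but did not bind to a device.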

Hello @alan01252,

Welcome to the Toradex Community!

From your logs, it seems that you are building version 6.0.0 of the Toradex BSP.
This version is outdated and should not be used.

The steps on Building Machine Learning Software with Reference Images for Yocto Project | Toradex Developer Center should be used with BSP 6.4.0 or later.
I would recommend that you use the latest quarterly release, 6.6.0.

To do this, you just need to initialize the build environment with the tag 6.6.0:

repo init -u git://git.toradex.com/toradex-manifest.git -b refs/tags/6.6.0 -m tdxref/default.xml
repo sync

It is also important to note that the steps were validated for the tdx-xwayland distro, therefore I recommend that you use it at least for initial testing.

Best Regards,
Bruno

Thanks @bruno.tx

I have upgraded to the latest versions as per the toradex-manifest, and I can see I am building the same kernel modules etc., but for some reason modprobe galcore just isn’t creating the /dev/galcore node.

I’ve tried comparing the rootfs between our build and the reference image (where it does seem to work) and I can’t see anything obvious.

Are there any other pointers you can give in the right direction to determine what might be missing/not configured?

Hello @alan01252,

Just to confirm, are you building the tdx-reference-multimedia-image image?

If not, I would recommend that you first try this image before proceeding to a custom image.
The reason for this is that we test and validate the Reference Multimedia Image, therefore it has a working configuration.

Best Regards,
Bruno

The reference image from the Easy Installer does indeed provide /dev/galcore when the module is loaded.

I am now trying to build a custom image with TensorFlow Lite installed to test the NPU, but no matter what I do I can’t seem to get my image to behave the same way :slight_smile:

As stated above, I’ve tried using kdiff to compare the rootfs generated by the reference image and the custom image. They are, as you’d expect, mostly the same, but I must be missing something.

I just don’t know what :frowning:

I am assuming there’s nothing obvious you can point to like

make sure you have dep x/y/z or library a/b/c

Else I’ll have to keep trying to hunt for differences :slight_smile:

Hello @alan01252,

The reference image on Toradex Easy Installer is built using the BSP.
Therefore you should get the same result when building the tdx-reference-multimedia-image with Yocto.
Adding the Machine Learning Libraries should not interfere with the driver.

Can you confirm which command you are using to build your image?
I can then try to reproduce the problem here.

Best Regards,
Bruno

So it’s a bit complicated: I am trying to adapt our existing kas-based pipeline to support this new hardware :slight_smile: Everything is working and being built properly apart from this dang module!

My next step is to try

And see if the /dev/galcore gets produced properly, as the above more closely resembles our existing pipeline.

Hello @alan01252,

Noted, so I think the process will be a bit different.

I would recommend that you first try to successfully build the default Reference Multimedia Image before trying to add layers to it.

Fortunately, I was able to find an example of someone converting the repo setup we use to kas.
It may be worth a read: Setting Up Yocto Projects with kas – Burkhard Stubert

When you get to the point of adding the machine learning libraries, please keep in mind that in our guide we do not add the whole of meta-imx, but only meta-ml/recipes-libraries.
This is because we use the meta-freescale* layers instead.

Best Regards,
Bruno

Thanks @bruno.tx the mender repo linked is essentially the same thing as your linked article and very similar to what I’ve built, but obviously something is missing in the built image…I just have no idea what…

Just an update: I now have this working. I am still not entirely sure which moving part was needed to fix this (hopefully it will be more obvious once I compare this working image to my previous incarnations), but our custom image is working.

Thanks for your help…

Hello @alan01252,

That is great to know.
Thanks for the update.

Best Regards,
Bruno

For anyone with a similar error, I am pretty sure my problem was that the boot overlays directory wasn’t being created.

That coupled with

    MENDER_UBOOT_POST_SETUP_COMMANDS:append = " ; setenv tdxargs \${tdxargs} \${bootargs}; "
    MENDER_UBOOT_POST_SETUP_COMMANDS:append = " ; setenv overlays_file /boot/overlays.txt ; setenv overlays_prefix boot/overlays/ "

appears to have resolved the issue.
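
For anyone wanting to verify this on a running board, here is a minimal Python sketch that checks that the overlays file exists and that every overlay it names is present in the overlays directory. It rests on two assumptions that are not confirmed in this thread: that the boot files are visible at `/boot` on the running system, and that `overlays.txt` lists overlays on a single `fdt_overlays=` line separated by spaces. The function names are hypothetical.

```python
import os


def parse_overlays(overlays_txt):
    """Return overlay names from a line like 'fdt_overlays=a.dtbo b.dtbo' (assumed format)."""
    for line in overlays_txt.splitlines():
        line = line.strip()
        if line.startswith("fdt_overlays="):
            return line.split("=", 1)[1].split()
    return []


def missing_overlays(overlays_file="/boot/overlays.txt", overlays_dir="/boot/overlays"):
    """List overlays named in overlays_file that are absent from overlays_dir."""
    if not os.path.isfile(overlays_file):
        raise FileNotFoundError(overlays_file)
    with open(overlays_file) as f:
        names = parse_overlays(f.read())
    return [n for n in names if not os.path.isfile(os.path.join(overlays_dir, n))]
```

A `FileNotFoundError` or a non-empty list from `missing_overlays()` would point at the same kind of problem described above.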