NEON-optimised code in Torizon

I’ve started experimenting with FFT code running directly on the A-cores of Verdin boards. I’d like to deploy the code in a Docker image and test the performance improvement from the SIMD instructions. What is the simplest path to a Docker image with NEON-aware C libraries?

Greetings @alt.mattr,

I’m not sure I quite understand your use case. But if you are simply trying to include a library inside a Docker container image, this should be simple. If you’re familiar with Dockerfiles and building container images, then you can add any arbitrary file (like a library) to the image. From there you should be able to run containers from the resulting image with the library built in.

Let me know if this helps in any way.

Best Regards,
Jeremias

Jeremias,

For libraries to use architecture-specific features, I would expect they need to be compiled specifically for that architecture. In this case I am thinking specifically about NEON instructions. If I apt-get the library in the Torizon Docker images, I should not expect a version optimised for the architecture I am ultimately running on, because that same library runs (via Docker) on a multitude of architectures. I am not a C expert, so getting the right set of compiler flags and environment variables to match the hardware I will run on is outside my skill set. I am also not sure how much such compilation relies on compile-time hardware checks, so compiling the library on another device, even with the right flags, might not get the full performance available. The Docker images are not based on the very latest Debian versions, so I have extra hurdles to compile some libraries. I guess lots of people use Torizon to run NEON-aware code though, so I thought there might be some experience out there I could use as a starting point.

In summary: I guess apt-get install <dsplibname> won’t get me NEON-optimised routines. I guess compiling <dsplibname> myself could, if I got all the settings right. I am wondering if there is a place I can get pre-built libraries I could rely on, or advice on getting the compiler flags/macros/environment variables just right for Verdin boards.
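On the worry about compile-time hardware checks, one way to take the build machine out of the picture is to ask the kernel, inside the running container, whether it reports Advanced SIMD (NEON). The following is only a minimal sketch, assuming a Linux/glibc userland; the function name `kernel_reports_asimd` is made up for illustration, while `getauxval`, `AT_HWCAP` and `HWCAP_ASIMD` are the standard Linux names.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>      /* getauxval, AT_HWCAP */
#ifndef HWCAP_ASIMD
#include <asm/hwcap.h>     /* fallback source for HWCAP_ASIMD on arm64 */
#endif
#endif

/*
 * Hypothetical helper (not part of any library):
 *   1  -> kernel reports Advanced SIMD (NEON) on this AArch64 system
 *   0  -> AArch64 Linux, but the ASIMD capability bit is not set
 *  -1  -> not built for AArch64 Linux, so nothing can be said
 */
int kernel_reports_asimd(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) ? 1 : 0;
#else
    return -1;
#endif
}
```

If this is built for AArch64 and run in the container on the module, a return value of 1 would suggest the NEON instructions are usable from there; whether a given packaged library actually contains vectorised routines is a separate question that this check does not answer.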

I looked a bit more into this topic. In general the story regarding DSP support is kind of messy. While the Verdin SoC from NXP does have a Tensilica HiFi 4 DSP, we currently don’t provide any support for it. In fact, I believe the DSP hardware is disabled by default in our software BSP.

This is because we have received little support on this topic from NXP, which owns the SoC IP. The last statement we got from NXP was that, in terms of DSP support, they only support a niche automotive audio use case. You can try reaching out to NXP or searching through their documentation regarding DSP usage, but the material seems sparse from what I saw.

Best Regards,
Jeremias

Although the domain of my code is DSP, I don’t want to use any special hardware, I only want to make full use of the NEON instructions on the ARM-A cores. I am hopeful that good use of NEON instructions on a fast general purpose core will have the performance we need.

I’m not sure if I fully understand your question. As far as I know AArch64 should utilize NEON/SIMD by default. This is stated on the ARM website here: Documentation – Arm Developer

As well as in the GCC options for AArch64 which shows that SIMD is included by default: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

I would take this to mean that any C/C++ library or application that targets AArch64 is built with NEON/SIMD support enabled by default.
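As a rough way to confirm this for a given toolchain, the sketch below checks the `__ARM_NEON` macro that compilers define when NEON is available for the target, and uses NEON intrinsics from `arm_neon.h` with a plain C fallback. The function names `simd_backend` and `mul_acc` are made up for illustration. How much a compiler auto-vectorises ordinary C loops still depends on the optimisation level, so something like `gcc -O3 -mcpu=cortex-a53` may be a reasonable starting point, assuming the module’s A-cores are Cortex-A53.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics, only when the target has NEON */
#endif

/* Hypothetical helper: names the code path selected at compile time. */
const char *simd_backend(void)
{
#if defined(__ARM_NEON)
    return "neon";
#else
    return "scalar";
#endif
}

/* Hypothetical kernel: acc[i] += a[i] * b[i] for i in [0, n). */
void mul_acc(float *acc, const float *a, const float *b, size_t n)
{
    assert(acc != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration in 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va   = vld1q_f32(a + i);
        float32x4_t vb   = vld1q_f32(b + i);
        float32x4_t vacc = vld1q_f32(acc + i);
        vacc = vmlaq_f32(vacc, va, vb);   /* vacc + va * vb, per lane */
        vst1q_f32(acc + i, vacc);
    }
#endif
    /* Scalar loop: handles the tail, or everything when NEON is absent. */
    for (; i < n; ++i)
        acc[i] += a[i] * b[i];
}
```

Calling `simd_backend()` from a small test program built inside the container should show which path the toolchain picked, without any special flags, if NEON really is on by default for the target.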

Best Regards,
Jeremias