Hello everyone,

signals.c (5.7 KB)

I’ve been working on the Verdin iMX8M Plus and recently began testing a simple convolution operation as a benchmark on the Cortex-M7. I successfully ran the application and integrated the CMSIS DSP library for optimized operations. However, I’ve encountered an interesting observation and would appreciate some insight.

When I compare my own convolution function with the CMSIS function `arm_conv_f32()`, there isn’t a significant difference in the cycles the processor takes to execute them. Upon examining the assembly code, it appears that my convolution function generates approximately 143,000 instructions, and the cycles taken by the Cortex-M7 are as follows:

- For my simple convolution: 107,000 cycles, roughly equivalent to 0.13 ms.
- For the optimized CMSIS-DSP convolution function: 76,000 cycles, approximately 0.095 ms.
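To make the arithmetic behind these figures explicit, here is a minimal sketch of the conversion. The 800 MHz core clock is an assumption inferred from 107,000 cycles being about 0.13 ms, and the helper names `cycles_to_ms` and `cycles_per_mac` are hypothetical, not part of any library.

```c
/* Assumption: the Cortex-M7 core clock is 800 MHz. This is inferred from
 * the cycle counts and times quoted above, not read from the hardware. */
#define ASSUMED_CORE_CLOCK_HZ 800000000.0

/* Hypothetical helper: convert a cycle count to milliseconds. */
static double cycles_to_ms(double cycles, double clock_hz)
{
    return cycles / clock_hz * 1000.0;
}

/* Hypothetical helper: average cycles per multiply-accumulate (MAC).
 * A full convolution of sig_len samples with imp_len taps performs
 * sig_len * imp_len useful MACs. */
static double cycles_per_mac(double cycles, int sig_len, int imp_len)
{
    return cycles / ((double)sig_len * (double)imp_len);
}
```

With a 320-sample signal and a 29-tap impulse response there are 320 × 29 = 9,280 useful MACs, so under this assumption my version costs roughly 11.5 cycles per MAC and the CMSIS-DSP version roughly 8.2.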

I’m seeking advice from those with more experience to confirm if these numbers are credible. Could it be possible that I need to enable something beyond the FPU for better performance? Perhaps there are optimizations or configurations I may have overlooked that could further improve the execution time for this specific convolution operation.

For clarity, here’s my simple convolution function:

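```c
/* Full convolution of the input signal with the impulse response.
 * result must hold KHZ1_15_SIG_LEN + IMP_RSP_LENGTH - 1 elements. */
void convolution(float result[]) {
    int i, j;
    for (i = 0; i < KHZ1_15_SIG_LEN + IMP_RSP_LENGTH - 1; i++) {
        result[i] = 0;
        for (j = 0; j < IMP_RSP_LENGTH; j++) {
            /* Skip taps that would read outside the input signal. */
            if (i - j >= 0 && i - j < KHZ1_15_SIG_LEN) {
                result[i] += input_signal_f32_1kHz_15kHz[i - j] * impulse_response[j];
            }
        }
    }
}
```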

Where:

```
#define KHZ1_15_SIG_LEN 320
#define IMP_RSP_LENGTH 29
```
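One variation I could try on my side is moving the bounds check out of the inner loop. The sketch below is only an illustration under my own assumptions: `convolution_branchy` is a parameterized copy of the function above, `convolution_clamped` computes the valid range of `j` once per output sample, and both names are hypothetical. I have not measured whether this changes the cycle count on the Cortex-M7.

```c
/* Parameterized copy of the original loop, kept for comparison. */
void convolution_branchy(const float *x, int x_len,
                         const float *h, int h_len, float *y)
{
    for (int i = 0; i < x_len + h_len - 1; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < h_len; j++) {
            if (i - j >= 0 && i - j < x_len) {
                y[i] += x[i - j] * h[j];
            }
        }
    }
}

/* Same sums in the same order, but the valid range of j is computed
 * once per output sample, so the inner loop has no bounds check. */
void convolution_clamped(const float *x, int x_len,
                         const float *h, int h_len, float *y)
{
    for (int i = 0; i < x_len + h_len - 1; i++) {
        /* i - j < x_len  means  j >= i - x_len + 1 */
        int j_min = (i >= x_len) ? i - x_len + 1 : 0;
        /* i - j >= 0  means  j <= i, and j never exceeds h_len - 1 */
        int j_max = (i < h_len) ? i : h_len - 1;
        float acc = 0.0f; /* accumulate locally, store once */
        for (int j = j_min; j <= j_max; j++) {
            acc += x[i - j] * h[j];
        }
        y[i] = acc;
    }
}
```

With the lengths above, this would be called as `convolution_clamped(input_signal_f32_1kHz_15kHz, KHZ1_15_SIG_LEN, impulse_response, IMP_RSP_LENGTH, result)`, with `result` sized for 348 samples.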

I’ve also attached the signal as a text file to this message for completeness.

Your expertise and guidance on this matter would be greatly appreciated. Thank you in advance for your help!