The investigation took longer than expected, and I could spend less time on this issue than I wanted, so I only have partial results. They should still be enough to point you in the right direction.
CPU Performance Figures
To achieve 1 MSps you will need to optimize your code, so it is all about execution time. For my descriptions below, let’s use an M4 CPU cycle (cyc) as the unit of time. The M4 is clocked at 240 MHz, so one cycle lasts about 4.17 ns, and at 1 MSps there are 240 cyc available per sample.
Let’s focus on what the M4 can do in this time:
- A typical assembly instruction executes in 1 cyc when it runs from TCM memory.
  - The same instruction executed from OCRAM takes 10 cyc.
    - Enabling the cache makes it worse: the instruction then takes 12-16 cyc to execute.
- Load and store operations (LDR, STR) to TCM take 2 cyc.
  - A load (LDR) from a peripheral register takes 60 cyc.
  - A store (STR) to a peripheral register also takes 60 cyc. However, the write is buffered. This means the CPU continues executing instructions after 2 cyc, unless there is a read from the same peripheral, which forces the system to wait until the previous write has finished.
    (I didn’t analyze the exact conditions under which the system waits for a write to finish, so this statement might be somewhat imprecise.)
As you can see, the really expensive operations are reading (and writing) peripheral registers. At 1 MSps, each register read consumes 60 of the 240 cyc available per sample, that is 25% of the CPU time. The major goal must therefore be to minimize these operations!
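The arithmetic behind these figures can be written down as a small helper. This is only a sketch: the macro and function names are my own, and the 60 cyc value is the measured figure from the list above, not a datasheet number.

```c
#include <stdint.h>

#define CPU_CLOCK_HZ       240000000u  /* M4 core clock, as stated above */
#define SAMPLE_RATE_HZ     1000000u    /* 1 MSps target */
#define PERIPH_ACCESS_CYC  60u         /* measured cost of one peripheral register read */

/* Number of CPU cycles available between two consecutive samples. */
static inline uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_hz)
{
    return cpu_hz / sample_hz;
}

/* Share of the per-sample cycle budget used by `reads` peripheral reads, in percent. */
static inline uint32_t read_cost_percent(uint32_t reads, uint32_t budget_cyc)
{
    return (reads * PERIPH_ACCESS_CYC * 100u) / budget_cyc;
}
```

With the values above, cycles_per_sample() gives 240 and a single read costs 25% of the budget, so a second blocking read per sample would already use up half of it.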
Achieving 1 MSps
My approach would be the following:
- Set up the ADC to do cyclic measurements, generating an interrupt after each sample.
- The interrupt service routine (ISR) should
  - Read the ADC sample value.
  - Clear the interrupt flag by writing 0x00000000 to the ADC status register.
    Don’t do a bit-clear operation, as this would require reading the register before modifying and writing it!
  - Process the sample for at least 60 cyc.
    This allows the flag-clear operation to take effect before you leave the ISR.
With this approach you should get away with only one expensive read operation. Even the time for the write operation can be hidden by using it to process the sample, so there are approximately 160 cyc available to process each sample.
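The ISR described above could look roughly like the following sketch. Everything named here is an assumption made for illustration: adc_regs_t, fake_adc and the register layout are placeholders so that the code compiles on a PC, not the real i.MX7 ADC register map, and the summing of samples only stands in for your actual processing.

```c
#include <stdint.h>

/*
 * Hypothetical register block. On the target this would be a pointer to the
 * ADC's memory-mapped registers; the names and layout are placeholders.
 */
typedef struct {
    volatile uint32_t result;  /* latest conversion value */
    volatile uint32_t status;  /* interrupt flags */
} adc_regs_t;

static adc_regs_t fake_adc;    /* host-side stand-in for the peripheral */
#define ADC (&fake_adc)

static uint32_t sample_sum;    /* example processing: running sum */
static uint32_t sample_count;

void adc_isr(void)
{
    /* The single expensive operation: one peripheral read (about 60 cyc). */
    uint32_t sample = ADC->result;

    /* Plain store of 0, no read-modify-write. The write is buffered, so the
     * CPU continues after about 2 cyc while the write completes. */
    ADC->status = 0x00000000u;

    /* Processing overlaps with the buffered write. On the target this part
     * should take at least 60 cyc so the flag is cleared before the ISR exits. */
    sample_sum += sample;
    sample_count++;
}
```

Note that the trivial processing shown here would be far shorter than 60 cyc on the M4, so a real ISR needs enough work (or an explicit delay) after the status write.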
FreeRTOS Performance Impact
As far as I can estimate, FreeRTOS affects performance in a few major ways:
- Interrupts used by FreeRTOS. In the basic configuration, there’s only the timer interrupt, which occurs once every tick (typically every 1 ms).
- Any pending register read must be finished and can delay the execution of the ADC interrupt by 60 cyc.
- Task switches are rather expensive operations and reduce the CPU time which is available outside of the ISR.
- There are times when interrupts are disabled during task switches. I didn’t analyze how long such interrupt-disabled periods can last.
You may consider not using FreeRTOS task switching at all. If you use FreeRTOS only to initialize the system and don’t start the scheduler, there should be no performance impact at all.
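A minimal sketch of such a setup without a running scheduler is shown below. The functions system_init(), adc_start_cyclic() and background_work() are hypothetical placeholders that you would replace with your own code, and the max_passes parameter exists only so that the loop can be tried on a PC. As far as I know, the standard Cortex-M ports of FreeRTOS configure the tick timer only when vTaskStartScheduler() is called, so leaving that call out should also avoid the tick interrupt.

```c
#include <stdint.h>

/* Hypothetical placeholders: replace with the real board and ADC setup. */
static void system_init(void)      { /* clocks, pins, code placement in TCM, ... */ }
static void adc_start_cyclic(void) { /* enable cyclic conversion and the ADC interrupt */ }

static uint32_t background_passes; /* counts loop iterations, for illustration only */

/* Work that is not time critical; per-sample processing stays in the ISR. */
static void background_work(void)
{
    background_passes++;
}

/*
 * Initializes the system and runs the background loop.
 * max_passes == 0 means "run forever", which is what a target build would use.
 */
uint32_t run_bare_loop(uint32_t max_passes)
{
    system_init();
    adc_start_cyclic();

    /* vTaskStartScheduler() is deliberately NOT called here, so there are
     * no task switches competing with the ADC interrupt. */
    for (uint32_t pass = 0; max_passes == 0u || pass < max_passes; pass++) {
        background_work();
    }
    return background_passes;
}
```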
If you need additional performance improvements, you could use DMA to move ADC samples to RAM, and even do some preprocessing.
The i.MX7 features a smart DMA engine which is quite powerful. However, it would add complexity to your system, and NXP suggests that you contact them so that they can create the DMA scripts for you.
I hope this information helps you to reach the required performance goals.