Fast GPIO Read / write in Toradex Verdin iMX8MP

I am Vivek, and I am trying to interface an IDT70V28L dual-port RAM (64 KB) with a Toradex Verdin iMX8MP over GPIO. We are able to read the DPRAM data in Yocto Linux using libgpiod, but reading is very slow. We also tried the GPIO driver example from the M7 SDK, but its speed does not meet our requirement either. We want to read 64 KB of data from the DPRAM in 2 ms. Can you provide any bare-metal GPIO code for the Cortex-M7 core of the Toradex Verdin iMX8MP?

Thanks and Regards,
VIVEK TOL

Hi, I’m the Toradex AI Assistant.

Hello Vivek,

Thank you for your detailed explanation. Interfacing IDT70V28L Dual Port RAM with the Verdin iMX8MP via GPIO is a challenging task, especially with your performance requirement of reading 64 KB within 2 ms (~32 MB/s throughput).

Key optimizations for maximum GPIO performance on the Cortex-M7:

  1. Direct Register Access:
    Avoid SDK/library calls. Use direct hardware register access for GPIO (e.g., GPIOx->DR). This reduces software overhead and increases speed.
    Example (pseudo-code, adjust GPIO base addresses as per your setup):

    // Set GPIO pin (output); 1U keeps the shift unsigned, which matters for pin 31
    GPIOx->DR |= (1U << pin_number);    // Set pin high
    GPIOx->DR &= ~(1U << pin_number);   // Set pin low

    // Read GPIO pin (input)
    value = (GPIOx->DR & (1U << pin_number)) ? 1 : 0;
    
  2. Place Code in TCM:
    Place your critical GPIO read/write routines in the Cortex-M7's Tightly Coupled Memory (TCM). This ensures the fastest instruction and data access.

  3. Data Alignment:
    Use 32-bit aligned data buffers for efficient memory access.

  4. Minimize Loop Overhead:
    Reduce branches and function calls. Unroll loops if possible to maximize throughput.

  5. Consider DMA:
    If your hardware allows it, consider using DMA to move the GPIO data. Note that DMA-driven GPIO access is often not natively available on typical SoC GPIO blocks.
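
As a rough illustration of points 1 and 4 together, here is a minimal sketch of an unrolled block read that uses whole-register accesses. The wiring is hypothetical: it assumes the DPRAM address lines A0..A15 sit on bits 0..15 of one dedicated GPIO bank (outputs), the data lines D0..D15 sit on bits 0..15 of another bank (inputs), and the chip-enable and output-enable lines are already held active. The function name `dpram_read_block` and its parameters are made up for this example. The registers are passed in as pointers so that the routine can also be exercised against ordinary variables on a host PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: reads n_words 16-bit words from a DPRAM wired to GPIO.
 * addr_reg: data register of the bank driving A0..A15 (assumed dedicated,
 *           so a plain write replaces the slower read-modify-write).
 * data_reg: register from which D0..D15 can be read on bits 0..15. */
void dpram_read_block(volatile uint32_t *addr_reg,
                      const volatile uint32_t *data_reg,
                      uint16_t *dst, size_t n_words)
{
    size_t i = 0;

    /* Main loop unrolled by four to amortize the loop branch. */
    for (; i + 4 <= n_words; i += 4) {
        *addr_reg = (uint32_t)(i + 0); dst[i + 0] = (uint16_t)*data_reg;
        *addr_reg = (uint32_t)(i + 1); dst[i + 1] = (uint16_t)*data_reg;
        *addr_reg = (uint32_t)(i + 2); dst[i + 2] = (uint16_t)*data_reg;
        *addr_reg = (uint32_t)(i + 3); dst[i + 3] = (uint16_t)*data_reg;
    }

    /* Remaining words when n_words is not a multiple of four. */
    for (; i < n_words; i++) {
        *addr_reg = (uint32_t)i;
        dst[i] = (uint16_t)*data_reg;
    }
}
```

On real hardware the address write and the data read may need a barrier or a short delay between them so that the DPRAM access time and the GPIO input synchronization are respected. This sketch does not insert one, so treat its timing as a best case.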

Estimated Performance:

  • Direct register access on the M7 realistically achieves about 250 ns per read-modify-write sequence.
  • Reading 64 KB in 2 ms leaves about 30 ns per byte (2 ms / 65,536 bytes ≈ 30.5 ns), which is extremely aggressive for GPIO access.
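
The arithmetic behind these two figures can be checked with a small sketch. It assumes 64 KB means 65,536 bytes and takes the roughly 250 ns per access quoted above at face value. The 16-bit case is an assumption about how the DPRAM would be wired, and the function names are made up for this example.

```c
#include <stdint.h>

/* Time available per bus transfer, in picoseconds, when total_bytes must be
 * moved within window_ns and each transfer carries bytes_per_transfer bytes. */
uint64_t budget_ps_per_transfer(uint64_t total_bytes,
                                uint64_t bytes_per_transfer,
                                uint64_t window_ns)
{
    uint64_t transfers = total_bytes / bytes_per_transfer;
    return (window_ns * 1000u) / transfers;
}

/* Total time, in microseconds, to move total_bytes when every transfer
 * costs ns_per_transfer nanoseconds. */
uint64_t total_read_time_us(uint64_t total_bytes,
                            uint64_t bytes_per_transfer,
                            uint64_t ns_per_transfer)
{
    uint64_t transfers = total_bytes / bytes_per_transfer;
    return (transfers * ns_per_transfer) / 1000u;
}
```

With an 8-bit bus the budget is about 30.5 ns per transfer, and with a 16-bit bus about 61 ns. At 250 ns per access, the full read would take about 16.4 ms (8-bit) or about 8.2 ms (16-bit), well outside the 2 ms window.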

Example Bare-Metal GPIO Snippet for Cortex-M7:
Here is an example that sets, clears, and reads a single GPIO pin through the data register:

#include <stdint.h>

#define GPIOx_BASE_ADDRESS 0xXXXXXX    // Replace with actual base address
#define GPIO_DR_OFFSET     0x00        // Data Register offset

volatile uint32_t *gpio_dr = (volatile uint32_t *)(GPIOx_BASE_ADDRESS + GPIO_DR_OFFSET);

// pinNumber is the bit index of the pin within the GPIO bank (0 to 31)
// Set pin
*gpio_dr |= (1U << pinNumber);

// Clear pin
*gpio_dr &= ~(1U << pinNumber);

// Read pin
uint32_t value = (*gpio_dr & (1U << pinNumber)) ? 1 : 0;

Note: Replace GPIOx_BASE_ADDRESS and the register offsets with the values from the NXP i.MX 8M Plus Applications Processor Reference Manual.
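
A register overlay is a common alternative to raw offsets. The sketch below assumes the usual i.MX GPIO layout, with the data register at 0x00 as above, the direction register (GDIR) at 0x04, and the pad status register (PSR) at 0x08. These offsets, the type name `gpio_bank_t`, and the helper names are assumptions for illustration and should be checked against the reference manual. The bank pointer would be the GPIO base address cast to `gpio_bank_t *`.

```c
#include <stdint.h>

/* Assumed layout of the first three registers of one GPIO bank. */
typedef struct {
    volatile uint32_t DR;    /* 0x00: data register */
    volatile uint32_t GDIR;  /* 0x04: direction, 1 = output, 0 = input */
    volatile uint32_t PSR;   /* 0x08: pad status, reflects the pin levels */
} gpio_bank_t;

/* Configure one pin of the bank as an output. */
static inline void gpio_make_output(gpio_bank_t *g, unsigned pin)
{
    g->GDIR |= (1u << pin);
}

/* Configure one pin of the bank as an input. */
static inline void gpio_make_input(gpio_bank_t *g, unsigned pin)
{
    g->GDIR &= ~(1u << pin);
}

/* Drive one output pin high (level != 0) or low (level == 0). */
static inline void gpio_write(gpio_bank_t *g, unsigned pin, int level)
{
    if (level) {
        g->DR |= (1u << pin);
    } else {
        g->DR &= ~(1u << pin);
    }
}

/* Read all 32 pins of the bank in a single access. */
static inline uint32_t gpio_read_bank(const gpio_bank_t *g)
{
    return g->PSR;
}
```

Reading the whole bank at once, as `gpio_read_bank` does, is what makes a parallel bus practical: one access returns every data line instead of one access per pin.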

Practical Considerations:

  • Reading 64 KB via GPIO this quickly is at or beyond the practical limits of the i.MX 8M Plus’s GPIO performance.
  • Further speed gains may not be achievable with GPIO due to hardware constraints.
  • If possible, consider using an interface designed for high throughput (such as SPI, QSPI, or parallel memory interfaces) instead of bit-banged GPIO.

If you need further clarification on implementing the direct access or optimizing your use case, feel free to elaborate!

Best regards, Toradex AI Assistant.

Thanks for the reply.

I have installed the MCUXpresso IDE. Can I get a document or user manual on configuring this IDE for bare-metal GPIO programming on the Cortex-M7 core of the Toradex Verdin iMX8MP?

Hi @tolvivek

You can refer to this thread’s discussion.
There seems to be some great discussion on this topic.


As previously mentioned, interfacing DPRAM via GPIO bit‑banging is inherently slow. To better assist you, could you please clarify your original goal? Why do you need to connect Dual‑Port RAM to the Verdin iMX8MP?

Understanding your end objective will help us propose a more efficient alternative—potentially involving a hardware interface like QSPI.