Changing sizes in the linker for the TCM

We managed to run out of space in the TCM for program code. I remember reading that OCRAM is about 10 times slower than the TCM, so I am planning to change MCIMX7D_M4_tcm.ld to decrease m_data and increase m_text.

Will that cause any issues I am not foreseeing? Or should it work as long as I don’t also run out of data space?

[build] c:/program files (x86)/gnu arm embedded toolchain/9 2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/bin/ld.exe: demo_m4.elf section `.ARM' will not fit in region `m_text'
[build] c:/program files (x86)/gnu arm embedded toolchain/9 2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/bin/ld.exe: section .data LMA [20000000,200006d3] overlaps section .ARM LMA [1ffffffc,20000003]
[build] c:/program files (x86)/gnu arm embedded toolchain/9 2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/bin/ld.exe: section .init_array LMA [20000004,20000007] overlaps section .data LMA [20000000,200006d3]
[build] c:/program files (x86)/gnu arm embedded toolchain/9 2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/bin/ld.exe: region `m_text' overflowed by 12 bytes

The Cortex-M4 processor has a modified 32-bit Harvard bus architecture. Using a 32-bit address space, low-order addresses (0x0000_0000 through 0x1FFF_FFFF) use the Processor Code (PC) bus, and high-order addresses (0x2000_0000 through 0xFFFF_FFFF) use the Processor System (PS) bus. Processor instructions (code) are accessible on the PC bus and data is accessible on the PS bus.

Tightly coupled memory is split into two arrays of 32 KB each: SRAM_L and SRAM_U.

• SRAM_L — Accessible by the code bus of the Cortex-M4 core

• SRAM_U — Accessible by the system bus of the Cortex-M4 core

Processor Code accesses are routed to the SRAM_L if they are mapped to that space. All other PC accesses are routed to the Code Cache Memory Controller.

This means that SRAM_U cannot be used for fast code access, so the only way to stay within the TCM is to optimize the code so that it fits.

On the other hand, the M4 core has 16 KB of I-cache and 16 KB of D-cache. So if the critical, frequently repeated parts of your app fit in the I-cache, performance shouldn’t be affected significantly when your code is fetched from OCRAM.

I modified the linker script as below and it seems to work. Is there a hidden bug in it, or is some of the program being placed in the OCRAM somehow?

Furthermore, if I allocate part of it in the OCRAM, should I change something in Linux or the device tree?

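MEMORY
{
  m_interrupts  (RX)  : ORIGIN = OCRAM_S_code,  LENGTH = 0x00000240
  /* m_text grown by 0x1000 (4 KB) beyond the original 0x8000 */
  m_text        (RX)  : ORIGIN = TCML_code,     LENGTH = 0x00008000 + 0x1000
  /* m_data moved up and shrunk by the same 0x1000 */
  m_data        (RW)  : ORIGIN = TCMU_system + 0x1000,   LENGTH = 0x00008000 - 0x1000
}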
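
One possible way to catch mistakes when resizing these regions by hand is to let the linker check the boundaries itself. The following is a minimal sketch and not part of the stock MCIMX7D_M4_tcm.ld. It uses the standard GNU ld ASSERT, ORIGIN and LENGTH built-ins, and it assumes that TCMU_system marks the start of a 32 KB SRAM_U (the 0x8000 in the second check is that assumption). The concrete addresses in the comments are inferred from the linker errors earlier in the thread, where the original m_text ended at 0x20000000 and .data started there.

```ld
/* Sketch only: place after the MEMORY block.
 * Addresses in these comments are inferred from the linker errors, not
 * taken from the reference manual:
 *   TCML_code = 0x1FFF8000, TCMU_system = 0x20000000
 *
 *   m_text : 0x1FFF8000 .. 0x20000FFF  (0x9000 bytes)
 *   m_data : 0x20001000 .. 0x20007FFF  (0x7000 bytes)
 */

/* m_text must stop at or before the first byte of m_data. */
ASSERT(ORIGIN(m_text) + LENGTH(m_text) <= ORIGIN(m_data),
       "m_text overlaps m_data")

/* m_data must stay inside SRAM_U (assumed: 32 KB starting at TCMU_system). */
ASSERT(ORIGIN(m_data) + LENGTH(m_data) <= TCMU_system + 0x8000,
       "m_data runs past the end of SRAM_U")
```

With the values above, both checks pass and the two regions are adjacent with no gap. The checks only validate the layout. If the inferred addresses are right, the last 0x1000 bytes of m_text sit at or above 0x2000_0000, which the bus description quoted above places on the system bus rather than the code bus.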

U-Boot and Linux use part of the OCRAM. Please check this topic. So it is better to use OCRAM_EPDC, as is done in MCIMX7D_M4_ocram.ld.