i.MX7D: GPTA and FreeRTOS on M4 also freeze Linux boot

Dear All,
It’s been two days now that I’ve been working on this, and I don’t know how to solve the problems I’m facing.
I’m developing on the Colibri iMX7D SoM with the Viola Plus carrier board.

The aim is to leave the GPIO and timer modules to the M4, along with PWM, UART, SPI and I2C.

For this reason I’ve modified the device tree, and Linux, at least, seems to be working as expected.
I’ve also played with the GPIO, which worked as expected too.

Next came the GPT. I’ve followed the blinking demo app, adapting it to my needs, since I need a timer resolution of at least 1us.

I’ve tried using both the ‘gptClockSourceOsc’ and the ‘gptClockSourcePeriph’.
In either case, if I use a small prescaler or one equal to 0, Linux is unable to boot: it gets stuck at ‘Starting kernel…’.

If I use the oscillator with a prescaler of 1 (that is, the clock is divided by 2) and a fixed (for testing purposes) output compare value of 12, I obtain a delay of 10us instead of the expected 1us.

24000000 / 2 = 12000000

12000000 / 12 = 1000000

That means that the period should be 1us, not 10us (measured with a xscope, since after the interval a pin is toggled).

Then I tried the peripheral clock, using ‘ccmRootmuxGptEnetPll40m’.
In this case, all I obtained was a freeze of the Linux side at ‘Starting kernel…’.

Here is the part of hardware_init.c related to the timer and clock init when I tried to use the enet clock:

void hardware_init(void)
{
    /* Board specific RDC settings */
    BOARD_RdcInit();

    /* Board specific clock settings */
    BOARD_ClockInit();

    /* initialize debug uart */
    dbg_uart_init();

    RDC_SetPdapAccess(RDC, BOARD_GPTA_RDC_PDAP, 3 << (BOARD_DOMAIN_ID * 2), false, false);

    CCM_ControlGate(CCM, ccmPllGateEnet, ccmClockNeededRunWait);
    CCM_ControlGate(CCM, ccmPllGateEnet40m, ccmClockNeededRunWait);

    //CCM_UpdateRoot(CCM, BOARD_GPTA_CCM_ROOT, ccmRootmuxGptOsc24m, 0, 0);
    CCM_UpdateRoot(CCM, BOARD_GPTA_CCM_ROOT, ccmRootmuxGptEnetPll40m, 0, 0);
    CCM_EnableRoot(CCM, BOARD_GPTA_CCM_ROOT);
    CCM_ControlGate(CCM, BOARD_GPTA_CCM_CCGR, ccmClockNeededRunWait);

    //CCM_ControlGate(CCM, ccmCcgrGateSema1, ccmClockNeededRun);

}

And here is gpt_timer.c, again with the enet clock:

static SemaphoreHandle_t xSemaphore;

void Hw_Timer_Init(void)
{
    gpt_init_config_t config = {
        .freeRun    = false,
        .waitEnable = true,
        .stopEnable = true,
        .dozeEnable = true,
        .dbgEnable  = false,
        .enableMode = true
    };

    /* Initialize GPT module */
    GPT_Init(BOARD_GPTA_BASEADDR, &config);

    /* Set GPT clock source: peripheral clock (the 24M OSC variant is commented out) */
    //GPT_SetClockSource(BOARD_GPTA_BASEADDR, gptClockSourceOsc);
    GPT_SetClockSource(BOARD_GPTA_BASEADDR, gptClockSourcePeriph);


    /* Set GPT interrupt priority 3 */
    NVIC_SetPriority(BOARD_GPTA_IRQ_NUM, 3);

    /* Enable NVIC interrupt */
    NVIC_EnableIRQ(BOARD_GPTA_IRQ_NUM);

    xSemaphore = xSemaphoreCreateBinary();
}

void Hw_Timer_Delay(uint32_t us)
{
	uint32_t ms_oc;
    //GPT_SetOscPrescaler(BOARD_GPTA_BASEADDR, 1);
    GPT_SetPrescaler(BOARD_GPTA_BASEADDR, 1);
    GPT_SetOutputCompareValue(BOARD_GPTA_BASEADDR, gptOutputCompareChannel1, 19);
    GPT_SetIntCmd(BOARD_GPTA_BASEADDR, gptStatusFlagOutputCompare1, true);
    GPT_Enable(BOARD_GPTA_BASEADDR);
    xSemaphoreTake(xSemaphore, portMAX_DELAY);
}

void BOARD_GPTA_HANDLER(void)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    /* When GPT time-out, we disable GPT to make sure this is a one-shot event. */
    GPT_Disable(BOARD_GPTA_BASEADDR);
    GPT_SetIntCmd(BOARD_GPTA_BASEADDR, gptStatusFlagOutputCompare1, false);
    GPT_ClearStatusFlag(BOARD_GPTA_BASEADDR, gptStatusFlagOutputCompare1);


    /* Unlock the task to process the event. */
    xSemaphoreGiveFromISR(xSemaphore, &xHigherPriorityTaskWoken);

    /* Perform a context switch to wake the higher priority task. */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

Inside main() I have this to test the timer:

while(1)
    {
    	Hw_Timer_Delay(1);
    	if (currentPinsValue == gpioPinClear) currentPinsValue = gpioPinSet;
    	else currentPinsValue = gpioPinClear;
    	GPIO_WritePinOutput(config_array[0]->base, config_array[0]->pin, currentPinsValue);

    }

The device tree file instead is:

&lcdif {
        status = "disabled";
};

&pwm1 {
        status = "disabled";
};
&pwm2 {
        status = "disabled";
};
&pwm3 {
        status = "disabled";
};
&pwm4 {
        status = "disabled";
};

&bl {
        status = "disabled";
};

&ecspi3 {
        status = "disabled";
};

&i2c4 {
        status = "disabled";
};

&sai1 {
        status = "disabled";
};

&uart2 {
        status = "disabled";
};
&uart3 {
        status = "disabled";
};
&uart4 {
        status = "disabled";
};
&uart5 {
        status = "disabled";
};
&uart6 {
        status = "disabled";
};
&uart7 {
        status = "disabled";
};

&uart1 {
        pinctrl-names = "default";
        pinctrl-0 = <&pinctrl_uart1>;
        assigned-clocks = <&clks IMX7D_UART1_ROOT_SRC>;
        assigned-clock-parents = <&clks IMX7D_OSC_24M_CLK>;
        //uart-has-rtscts;
        fsl,dte-mode;
};

&pinctrl_uart1 {
        fsl,pins = <
                MX7D_PAD_UART1_TX_DATA__UART1_DTE_RX    0x79
                MX7D_PAD_UART1_RX_DATA__UART1_DTE_TX    0x79
        >;
};
/ {
        gpio-keys {
                status = "disabled";
        };

};

I don’t really know what I’m not understanding.

Any help will be appreciated, since I’m in an early stage of development of a new product and I would like to start on the right foot.

Thanks and regards,
Alessandro

To debug this kind of low-level Linux issue it is often helpful to enable earlyprintk. Do so either via make menuconfig, by enabling Kernel low-level debugging functions and Early printk under Kernel Hacking, or by setting the following configuration options:

CONFIG_DEBUG_LL=y
CONFIG_DEBUG_IMX7D_UART=y
CONFIG_DEBUG_IMX_UART_PORT=1
CONFIG_DEBUG_LL_INCLUDE="debug/imx.S"
CONFIG_EARLY_PRINTK=y

Then add the earlyprintk kernel boot argument (e.g. by adding it to U-Boot’s defargs environment variable).

The first GPT instance, GPTA, is also used by the kernel: in the base device tree arch/arm/boot/dts/imx7s.dtsi the gpt1 node is enabled (not disabled). In the Colibri iMX7 case the kernel has other timers enabled (such as the ARM architected timer), hence you can safely disable the GPT timer, e.g. by adding the following to your device tree:

EDIT: Just for future reference, disabling GPT1 freezes the Linux boot. Check the comments below for more information.

&gpt1 {
    status = "disabled";
};

Thank you for your prompt reply.

Actually, looking at the defines of the FreeRTOS I’m using (the one cloned from toradex git repository), GPTA refers to GPT3:

/* GPT instance A information for this board */
#define BOARD_GPTA_RDC_PDAP                   rdcPdapGpt3
#define BOARD_GPTA_CCM_ROOT                   ccmRootGpt3
#define BOARD_GPTA_CCM_CCGR                   ccmCcgrGateGpt3
#define BOARD_GPTA_BASEADDR                   GPT3
#define BOARD_GPTA_IRQ_NUM                    GPT3_IRQn
#define BOARD_GPTA_HANDLER                    GPT3_Handler
/* GPT instance B information for this board */
#define BOARD_GPTB_RDC_PDAP                   rdcPdapGpt4
#define BOARD_GPTB_CCM_ROOT                   ccmRootGpt4
#define BOARD_GPTB_CCM_CCGR                   ccmCcgrGateGpt4
#define BOARD_GPTB_BASEADDR                   GPT4
#define BOARD_GPTB_IRQ_NUM                    GPT4_IRQn
#define BOARD_GPTB_HANDLER                    GPT4_Handler

In addition, I’ve already tried to disable GPT1 in device tree in order to use it on the M4, because I’ve read a post saying what you stated, that the mcx1 timer is initialized but not used.

When I tried, the Linux kernel stopped before initializing the SDHC driver. I had to re-enable GPT1 to make Linux boot again, but I think we are going off-topic, so I will start another thread for this.

What about the resolution of the timer instead? Why do I obtain a 10us period when I set it up for 1us using the internal oscillator? Am I missing something?

Thanks for the early printk suggestion, I’m going to implement it.

In addition to the comment above, I’ve added early printk, and this is where the kernel stops:

Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.1.44eltra-00001-ga4da795-dirty (ale@YoctoHost) (gcc version 7.1.1 20170707 (Linaro GCC 7.1-2017.08) ) #1 SMP Thu Nov 9 13:52:20 CET 2017
[    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] Machine model: Toradex Colibri iMX7D on Eltra Prototype Carrier Board V0 - 20171107-1817
[    0.000000] bootconsole [earlycon0] enabled
[    0.000000] Reserved memory: created CMA memory pool at 0x98000000, size 128 MiB
[    0.000000] Reserved memory: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] PERCPU: Embedded 12 pages/cpu @97b7f000 s17356 r8192 d23604 u49152
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129536
[    0.000000] Kernel command line: clk_ignore_unused earlyprintk=serial,ttymxc0,115200 initcall_debug ubi.mtd=ubi root=ubi0:rootfs rootfstype=ubifs ubi.fm_autoconvert=1 console=tty1 console=ttymxc0,115200n8 consoleblank=0
[    0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Memory: 376440K/522240K available (6439K kernel code, 295K rwdata, 2156K rodata, 312K init, 449K bss, 14728K reserved, 131072K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0xa0800000 - 0xff000000   (1512 MB)
[    0.000000]     lowmem  : 0x80000000 - 0xa0000000   ( 512 MB)
[    0.000000]     modules : 0x7f000000 - 0x80000000   (  16 MB)
[    0.000000]       .text : 0x80008000 - 0x8086d028   (8597 kB)
[    0.000000]       .init : 0x8086e000 - 0x808bc000   ( 312 kB)
[    0.000000]       .data : 0x808bc000 - 0x80905f00   ( 296 kB)
[    0.000000]        .bss : 0x80908000 - 0x80978434   ( 450 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS:16 nr_irqs:16 16

Thanks for any hint on this; I’m lost, since according to the docs it should work.

Just to sum things up, I would like clarification on the discrepancies I’ve found between the documentation and the behaviour I observe:

  • even if I set up the GPTA timer as described by the i.MX7 reference manual, the FreeRTOS manual and the Toradex community, I’m not able to obtain a 1us timeout period. Is it possible to obtain such a granularity?
  • you stated that GPTA refers to GPT1, whereas it seems that your FreeRTOS links GPTA to GPT3. Do you have any comment on this? Am I missing something?
  • you stated that one can simply disable GPT1, but that does not seem to be the case: with the standard Toradex kernel and the default Toradex configuration, disabling GPT1 leads to a Linux freeze during boot
  • is there any clue within the Linux boot log that could help in understanding why it stops when using the FreeRTOS code above?

A reply to these questions would be very much appreciated, since we need to decide how to proceed with the development of our next-gen solutions.

Regards,

ale

  • How do you measure the timer granularity? In the end it really depends on the clock you configure. I would recommend configuring the 24MHz clock (freertos-toradex.git - FreeRTOS for the Cortex M4 core of Heterogeneous Multicore modules). With that, a resolution of 1us should be no problem.
  • GPTA: That was a wrong assumption on my side, sorry about that. Yes, GPTA is GPT3, so there should be no need to disable the GPT1 timer in Linux.
  • Just tested it myself: GPT1 is not used as a clocksource, but somehow as an event source. I am not sure why the NXP 4.1 kernel still uses it, but it freezes for me too. I tried to figure out why that is the case, but it seems non-trivial. Anyway, since FreeRTOS uses another timer, please leave GPT1 on.
  • There is not really a clue. Usually it helps to compare with a working boot to check what should come next, that might be where the code hangs. Often when clocks are missing the kernel just hangs.
  • maybe I misused the word ‘granularity’. What I mean is that I would like to obtain a timer with a period of 1us at most, in order to implement a sleep or delay routine. When I use the 24MHz oscillator, with the proper settings, I can only obtain an unreliable clock of 10us. If you have the time, could you please try to develop a 1us period timer?
    In addition, I will also use the timer for driving up to 6 stepper motors, so precision is mandatory. Could you please help me get a 1us timer?

  • noted

  • actually I think I will also need GPT1 on the M4, since I need as many input capture/output compare channels as possible. I will try to understand the cause. It seems that the kernel stops here:

    [ 1.662580] Registering SWP/SWPB emulation handler
    [ 1.668158] registered taskstats version 1

and on a regular boot the next messages are:

[    2.011368] usb 1-1: new full-speed USB device number 2 using ci_hdrc
[    2.023221] ci_hdrc ci_hdrc.1: EHCI Host Controller
[    2.029564] ci_hdrc ci_hdrc.1: new USB bus registered, assigned bus number 2
[    2.161357] ci_hdrc ci_hdrc.1: USB 2.0 started, EHCI 1.00


  • when the kernel stops with a high-frequency timer (as in the example in my initial question), it stops at:

    [ 0.000000] RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
    [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
    [ 0.000000] NR_IRQS:16 nr_irqs:16 16

while a regular boot continues with:

[    0.000000]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] Architected cp15 timer(s) running at 8.00MHz (phys).
[    0.000000] clocksource arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 440795202120 ns
[    0.000006] sched_clock: 56 bits at 8MHz, resolution 125ns, wraps every 2199023255500ns
[    0.008077] Switching to timer-based delay loop, resolution 125ns

Anyway, leaving aside the last two points, I would really appreciate it if you could help me with the first one. Having a 1us timer is very important, and it also seems to me a very basic feature to have.

Thanks

As far as I can tell, you should configure the Prescale Register GPTx_PR with 0x2007 (PRESCALER24M=0x2, 24MHz/3=8MHz, and PRESCALER=0x7, 8MHz/8=1MHz). With that, the counter should count in 1us steps. With 1 in the compare register it should then give you an interrupt every 1us. Assuming the M4 is clocked at 240MHz, it should be able to handle that rather quickly.

I don’t really understand.
This is the hardware_init():

void hardware_init(void)
{
    /* Board specific RDC settings */
    BOARD_RdcInit();
    /* Board specific clock settings */
    BOARD_ClockInit();
    /* initialize debug uart */
    dbg_uart_init();
    RDC_SetPdapAccess(RDC, BOARD_GPTA_RDC_PDAP, 3 << (BOARD_DOMAIN_ID * 2), false, false);
    RDC_SetPdapAccess(RDC, rdcPdapSemaphore1, 0xff, false, false);
    CCM_UpdateRoot(CCM, BOARD_GPTA_CCM_ROOT, ccmRootmuxGptOsc24m, 0, 0);
    CCM_EnableRoot(CCM, BOARD_GPTA_CCM_ROOT);
    CCM_ControlGate(CCM, BOARD_GPTA_CCM_CCGR, ccmClockNeededRunWait);
    CCM_ControlGate(CCM, ccmCcgrGateSema1, ccmClockNeededRun);
}

this is the Hw_Timer_Init():

void Hw_Timer_Init(void)
{
    gpt_init_config_t config = {
        .freeRun    = false,
        .waitEnable = true,
        .stopEnable = true,
        .dozeEnable = true,
        .dbgEnable  = false,
        .enableMode = true
    };
    /* Initialize GPT module */
    GPT_Init(BOARD_GPTA_BASEADDR, &config);
    /* Set GPT clock source to 24M OSC */
    GPT_SetClockSource(BOARD_GPTA_BASEADDR, gptClockSourceOsc);
    /* Set GPT interrupt priority 3 */
    NVIC_SetPriority(BOARD_GPTA_IRQ_NUM, 3);
    /* Enable NVIC interrupt */
    NVIC_EnableIRQ(BOARD_GPTA_IRQ_NUM);
    xSemaphore = xSemaphoreCreateBinary();
}

this is the Hw_Start_Timer() used to start the interrupts:

void Hw_Start_Timer(uint32_t us)
{
	uint32_t ms_oc;
    GPT_SetOscPrescaler(BOARD_GPTA_BASEADDR, 2);
    GPT_SetPrescaler(BOARD_GPTA_BASEADDR, 7);
    GPT_SetOutputCompareValue(BOARD_GPTA_BASEADDR, gptOutputCompareChannel1, 1);
    GPT_SetIntCmd(BOARD_GPTA_BASEADDR, gptStatusFlagOutputCompare1, true);
    GPT_Enable(BOARD_GPTA_BASEADDR);

    cpin = gpioPinClear;

}

This is the interrupt routine:

void BOARD_GPTA_HANDLER(void)
{
	GPT_ClearStatusFlag(BOARD_GPTA_BASEADDR, gptStatusFlagOutputCompare1);
    GPT_SetOutputCompareValue(BOARD_GPTA_BASEADDR, gptOutputCompareChannel1, 1);

    GPIO_WritePinOutput(SODIMM_135->base, SODIMM_135->pin, cpin);
    cpin = 1 - cpin;
}

The main() is:

int main(void)
{
    // Initialize demo application pins setting and clock setting.

    hardware_init();
    Hw_Timer_Init();

    // Create a demo task which will print Hello world and echo user's input.
    xTaskCreate(HelloTask, "Print Task", configMINIMAL_STACK_SIZE,
                NULL, tskIDLE_PRIORITY+1, NULL);

    // Start FreeRTOS scheduler.
    vTaskStartScheduler();

    // Should never reach this point.
    while (true);
}

and the task is:

void HelloTask(void *pvParameters)
{
   Hw_Start_Timer(1);
    while(1);
}

With your configuration (0x2007) the isr is called every 3us.

With a configuration of 0x2001 the isr is called every 1.3us.

Actually the questions are:

  • if I need a 1us time delay, how can I obtain it?
  • is the timer the right way?
  • and what if I needed a 1ns time delay?

Unfortunately I am unable to do testing on my own at the moment, I would need more time for that. However, some hints which might help:

I still assume that 0x2007 should trigger 1us interrupts. I would expect a CPU clocked at 240MHz to be able to handle that kind of interrupt load. That said, it gets into interesting territory: you have just 240 clock cycles to handle the interrupt… It should be enough, but it certainly does not allow for an excessive interrupt handler. As for a 1ns time delay: it is simply not possible with a CPU clocked at 240MHz… A simple nop instruction will delay your execution by ~4ns, assuming it takes 1 clock cycle (with pipelines etc. that calculation might not be true in practice).

Whether a timer is the right way depends on your exact requirements… If somehow possible, I would also consider using hardware capabilities as much as possible. The FlexTimer block is also very capable and might be useful too. Our demo robot TAQ uses the PWM block to generate square wave pulses.

As far as I understand you toggle the GPIO at SODIMM 135 which should lead to a 1us pulse width. I assume that your measurements (3us/1.3us) represent the pulse width measured with an oscilloscope. There are various things you can check:

  1. Setting the output compare value in the interrupt handler might not be a good idea. The interrupt handler is already delayed by some time, and resetting the value might interfere with the next compare operation. With GPTx_CR FRR cleared the timer should restart on its own.
  2. The GPIO block might delay your signal. I would assume that the GPIO block is clocked at 66MHz, so it really should not. However, to eliminate that possibility I would try using the GPT output pins. Unfortunately GPT3’s output pins are used on the module itself. However, GPT4’s compare pins are available (e.g. gpt4.COMPARE1 on SODIMM 31, gpt4.COMPARE2 on SODIMM 100 or gpt4.COMPARE3 on SODIMM 102). Setting GPTx_CR OM1/2/3 to b001 should toggle the pin on every compare event. With that you can verify that the timer is running at the expected clock and using the expected dividers, and rule out any software/interrupt latency issue.
  3. The CPU might be clocked too slowly. If I understand the CCM block right, the M4’s default clock source (ARM_M4_CLK_ROOT) is b000, which is 24MHz… I would assume that even that should be fast enough, but you might want to try clocking it higher. demo_apps/low_power_imx7d/common/lpm_mcore.c seems to provide a means to clock the CPU higher (by calling CCM_SetRootMux(CCM, ccmRootM4, ccmRootmuxM4SysPllDiv2);)

Actually, after some tests, I’ve seen that with the code above I have about 2us of overhead.

In fact if I set the counter to have an interrupt at 1kHz, I obtain a 1.002ms pulse width.

If I set to have 10kHz, I obtain a 102us pulse width, and so on.

Your assumption about how I measure the pulse is right. And, of course, you’re also right about the 1ns delay: I was not thinking about the clock of the M4 being 240MHz.

I’m looking at the demo robot TAQ, and I find it very interesting.

Now that I know that my code is formally correct, I’m going to see if I can obtain a better performance by following your suggestions and maybe playing with FreeRTOS configurations.

I think that I can consider this thread closed and answered, thanks for your time and patience.