iMX8MPlus PCIE clock dropout

Hello everyone,

We need support with a problem on the iMX8MP.

System setup:
Custom Carrier Board with PI7C9X2G404EL PCIE 2.0 Packet Switch, to which a LAN7430 Gigabit MAC and LAN7431 Gigabit MAC are connected.

Error description:
With the iMX8MMini (Q 2GB WB IT V1.1D), both PCIe devices are correctly recognized on the PCIe bus.
With the iMX8MPlus (Q 8GB WB IT V1.1A), however, only the LAN7430 is visible.

Error analysis:
We found a difference in the signals: with the iMX8MPlus, the PCIe reference clock to the PI7C9X2G404EL drops out shortly after start-up. The iMX8MMini does not show this behaviour.

Channel 1 shows the PCIE 1 ref clock, which connects to the pin “REFCLKI_P” of the PI7C9X2G404EL PCIE switch. As you can see, the clock starts correctly, but is then interrupted after about 410 ms for about 500 ns:
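
To put the gap in perspective, here is a quick calculation of how many clock periods go missing. It is only a sketch: it assumes the nominal 100 MHz PCIe reference clock, the scope readings are approximate, and the helper name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: number of whole reference-clock periods that fit
 * into a gap of gap_ns nanoseconds at a clock frequency of freq_hz hertz.
 * cycles = gap_ns * 1e-9 * freq_hz, done in integer math (rounded down). */
static uint64_t missed_refclk_cycles(uint64_t freq_hz, uint64_t gap_ns)
{
    return (gap_ns * freq_hz) / 1000000000ULL;
}
```

With the values from the scope, `missed_refclk_cycles(100000000, 500)` gives 50, so roughly 50 clock periods are missing.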

The software versions of the two modules are as follows:
-iMX8MMini: Linux version 5.4.193-5.7.2-devel+git.b60d3160fd04 (dunfell)
-iMX8MPlus: Linux version 6.6.54-7.1.0-devel-gbacd7657b9c2 (scarthgap)

What could cause this failure: the driver, or a signal integrity problem on the board?

Best regards
Stefan

Here again the same measurement with a higher sampling rate:

Hi, I’m the Toradex AI Assistant.


Hello Stefan,

Thank you for your detailed description and oscilloscope snapshot.

Based on your analysis and the setup, here are some points to consider regarding the PCIe reference clock dropout observed only with the Verdin iMX8M Plus:

1. Signal Integrity and Board Layout

  • PCIe interfaces are highly sensitive to signal integrity issues. Factors such as trace routing, reference clock line length, return path discontinuities, layer transitions (vias), and power stability are critical (1).
  • A 500 ns dropout about 410 ms after start-up may indicate momentary instability in the reference clock, potentially due to signal integrity issues or power fluctuations on the PCIe supply lines.
  • Compare your PCB layout against the PCIe high-speed layout guidelines: check for return path discontinuities, plane splits, and ensure any stitching capacitors are properly placed if the reference clock trace crosses reference planes (2).

2. SoC PCIe Reference Clock Generation

  • On the Verdin iMX8M Plus, the 100 MHz PCIe reference clock is generated by the SoC and provided to the module edge connector (1).
  • The i.MX 8M Mini and Plus SoCs may have subtle differences in how the clock is managed or gated during initialization. The occurrence of a short clock dropout could be a result of a SoC-side reset, power management event, or software driver behavior.

3. Software Differences

  • Since you are using different BSP releases (dunfell for Mini, scarthgap for Plus), it’s possible that newer kernel or device tree settings in scarthgap affect PCIe behavior.
  • The driver for the PCIe controller or Device Tree configuration (such as PCIe node properties) could influence the timing and stability of the reference clock.

Suggested Next Steps:

  • Signal Integrity:
    Review the PCIe trace routing, especially the reference clock path, for the Verdin iMX8M Plus in your carrier board design, per the Toradex Layout Design Guide. Key points:

    • Avoid plane splits and ensure continuous ground return under the clock line
    • Minimize via transitions, or ensure adequate stitching vias
    • Verify the power supply to the PCIe components is free from noise and dropouts
  • Reference Clock Observation:
    Use the oscilloscope to capture any supply dips or noise coinciding with the reference clock dropout.

  • Software/Device Tree:

    • Compare the PCIe node in the Device Tree for both modules.
    • Ensure no relevant clock gating or power management features are enabled that might affect clock stability.
    • If possible, run the same (older or newer) BSP version on both Mini and Plus for comparison.
  • PCIe Switch Initialization:

    • The PI7C9X2G404EL switch might have an initialization or reset pin or require the reference clock to be stable for enumeration. Double-check the reset sequencing timing between the SoC, switch, and LAN chips.
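
The reset-sequencing check can be turned into a small calculation once the reset and clock edges have been captured. The sketch below assumes the commonly cited PCIe CEM minimum of 100 µs of stable reference clock before PERST# is released (T_PERST-CLK). The function name and the timestamps are hypothetical, and the datasheets of the switch and the MACs should be checked for any stricter requirement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum from the PCIe CEM specification: REFCLK must be stable
 * for at least 100 us before PERST# is deasserted (T_PERST-CLK). */
#define T_PERST_CLK_MIN_US 100u

/* Hypothetical helper: both timestamps are in microseconds on the same
 * capture timeline. Returns true if the clock was stable long enough
 * before reset was released. */
static bool refclk_stable_before_perst(uint64_t clk_stable_us,
                                       uint64_t perst_release_us)
{
    if (perst_release_us < clk_stable_us)
        return false; /* reset released before the clock was running */
    return (perst_release_us - clk_stable_us) >= T_PERST_CLK_MIN_US;
}
```

The same check would be repeated for each reset line, since the switch and the MACs may be released at different times.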

Summary:
While a driver or software difference can contribute, the pattern and timing of the clock dropout are also consistent with signal integrity or power integrity issues on the custom carrier board. Note that the Verdin iMX8M Plus reference clock is generated by the SoC. Comparing with a Toradex reference carrier board (if available) can help narrow down the source.

Please let me know if you would like further details on board layout recommendations or Device Tree configuration examples.

Best regards,
Toradex AI Assistant.

(1)
(2)

Another measurement with logic analyzer.
CH0: Reset to PI7C9X2G404EL (switch)
CH1: Reset to LAN7431 (MAC)
CH2: Ref clock to PI7C9X2G404EL (switch)
CH3: Ref clock to LAN7431 (MAC)

The dip in the clock signal is also visible. With this measurement setup, the PCIE switch is shown on the bus (“lspci”), but the LAN7431 is not (!). I suspect that this is due to the asymmetrical and capacitive load on the clock lines, to which the LAN7431 seems to be sensitive.

We found the piece of code in the kernel that causes the dropout:

drivers/pci/controller/dwc/pci-imx6.c

static int imx6_pcie_start_link(struct dw_pcie *pci)
{
  ...
		/*
		 * Start Directed Speed Change so the best possible
		 * speed both link partners support can be negotiated.
		 */
		tmp = dw_pcie_readl_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL);
		tmp |= PORT_LOGIC_SPEED_CHANGE;
		dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, tmp);
  ...  
}

When the 3 lines are removed, the drop disappears.
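
For readers unfamiliar with the excerpt: the three lines are a read-modify-write that sets one bit in a DesignWare port-logic register, which asks the controller to start a directed speed change. The stand-alone sketch below mimics that on a plain variable instead of real hardware. The bit position (bit 17) is an assumption based on the mainline pcie-designware.h and should be verified against the tree in use. `fake_reg` and the helper function are stand-ins invented for illustration.

```c
#include <stdint.h>

/* Assumed value, as in drivers/pci/controller/dwc/pcie-designware.h;
 * verify against your kernel tree. */
#define PORT_LOGIC_SPEED_CHANGE (1u << 17)

/* Stand-in for the memory-mapped register. Real code uses
 * dw_pcie_readl_dbi()/dw_pcie_writel_dbi() as in the excerpt above. */
static uint32_t fake_reg;

/* Read-modify-write: set the speed-change bit, leave all other bits alone. */
static void request_directed_speed_change(void)
{
    uint32_t tmp = fake_reg;          /* read   */
    tmp |= PORT_LOGIC_SPEED_CHANGE;   /* modify */
    fake_reg = tmp;                   /* write  */
}
```

Removing the three lines therefore only skips the speed-change request. The link should then stay at whatever speed it first trained at.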

However, that does not fix the problem: the devices are still not reliably detected.

Dear @uELKO and @klausci,

Thank you for your detailed description and analysis. We have not encountered this behaviour so far. Could you also try to reproduce the issue you are seeing on the iMX8MP on the iMX8MM? For that you would need to use the same kernel version (6.6.54). That way we can isolate whether the issue lies in the specific module or in the kernel driver.

Best regards,
Collin

Hi @collin.tx

We have tested on the IMX8MM with the same kernel version (6.6.54) and it works without any problems.

I want to add that the behavior on the IMX8MP is completely random. We have seen all combinations:

  • Both PCIe (7430 and 7431) Devices were detected
  • Only 7430 detected
  • Only 7431 detected
  • None detected

When the devices do appear on PCIe, they operate normally (tested with iperf for several minutes).
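
To log which of the four cases occurs on each boot, something like the following could be used. It is only a sketch: it assumes the usual Microchip/SMSC PCI vendor ID 0x1055 with device IDs 0x7430 and 0x7431 (please confirm with `lspci -nn`), and the struct, enum, and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define VENDOR_MICROCHIP 0x1055u /* assumed; confirm with lspci -nn */
#define DEV_LAN7430      0x7430u /* assumed */
#define DEV_LAN7431      0x7431u /* assumed */

/* Hypothetical container for one enumerated PCI function. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* The four outcomes, encoded so that each MAC owns one bit. */
enum detect_result {
    DETECT_NONE      = 0,
    DETECT_7430_ONLY = 1,
    DETECT_7431_ONLY = 2,
    DETECT_BOTH      = 3
};

/* Scan the list of enumerated IDs and report which MACs are present. */
static enum detect_result classify_boot(const struct pci_id *ids, size_t n)
{
    unsigned found = 0;

    for (size_t i = 0; i < n; i++) {
        if (ids[i].vendor != VENDOR_MICROCHIP)
            continue;
        if (ids[i].device == DEV_LAN7430)
            found |= DETECT_7430_ONLY;
        else if (ids[i].device == DEV_LAN7431)
            found |= DETECT_7431_ONLY;
    }
    return (enum detect_result)found;
}
```

Counting the results over many power cycles would show whether one of the two MACs fails more often than the other.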

Dear @klausci,

Thank you for testing. I have raised this issue with our team and they will have a look into it.
I will get back to you as soon as I have further information.

Best regards,
Collin

Hello @klausci,
One difference between the iMX8MP and the iMX8MM is the fact that the iMX8MP supports PCIe gen3, while the iMX8MM only supports gen2.
Given that the clock drop you found doesn’t seem to be the root cause of this issue, could you try to limit the PCIe speed to gen2 on the iMX8MP and check if the problem is still reproducible?

You can do that by adding the following to your custom carrier board device tree:

&pcie {
         fsl,max-link-speed = <2>;
};

Best regards,
Rafael

Hello @rafael.tx,

Thanks for your reply. In the meantime, we had the chance to measure the PCIe signals on our board with high-speed equipment. The signal quality is not very good: the PCIe clock shows cross-talk from the data signals, and the data signals have poor eye diagrams. The result looks almost identical for IMX8MM and IMX8MP.

Questions:

  • Do you have PCIe signal measurements from your development boards - and can you share it with us? We want to be 100% sure that the problem is on the carrier board side and not on the SoM.
  • Can I limit the link speed also to 1 like this?
&pcie {
         fsl,max-link-speed = <1>;
};

Best Regards, Klaus

Hello @klausci

“The result looks almost identical for IMX8MM and IMX8MP.”

That’s interesting. Considering that the iMX8MM can only do PCIe gen2, it may have more headroom to work even with poorer signal quality.

“Do you have PCIe signal measurements from your development boards - and can you share it with us? We want to be 100% sure that the problem is on the carrier board side and not on the SoM.”

I’m checking if we executed some measurements. If we have anything we can share, I’ll post it here. I would like to point out that our SoM’s PCIe interfaces (both iMX8MM and iMX8MP) are being successfully used by several of our customers. Because of this, I think it’s unlikely that it’s the SoM that’s the main culprit here.

“Can I limit the link speed also to 1 like this?”

Yes, this should be possible.
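
Whether the limit took effect can be checked after boot by reading the negotiated speed, for example from the `current_link_speed` sysfs attribute of the device or from the LnkSta line of `lspci -vv`. A minimal sketch of mapping the reported transfer rate to a PCIe generation follows. The function name is hypothetical, and it assumes the text starts with the rate in GT/s (for example "2.5 GT/s PCIe").

```c
#include <stdio.h>

/* Hypothetical helper: map a speed string such as "5.0 GT/s PCIe" to the
 * PCIe generation (1, 2 or 3). Returns 0 if the string is not recognised. */
static int pcie_gen_from_speed(const char *speed)
{
    int whole = 0, frac = 0;

    if (speed == NULL || sscanf(speed, "%d.%d GT/s", &whole, &frac) != 2)
        return 0;
    if (whole == 2 && frac == 5)
        return 1; /* 2.5 GT/s */
    if (whole == 5 && frac == 0)
        return 2; /* 5.0 GT/s */
    if (whole == 8 && frac == 0)
        return 3; /* 8.0 GT/s */
    return 0;
}
```

With the limit set to 1, the upstream link of the switch would be expected to report 2.5 GT/s.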

Best regards,
Rafael

I just found that we conducted PCIe tests on the iMX8MP with the Mallow carrier board. All tests have passed without problems. I cannot share the report publicly, but I will send a copy to your e-mail.

Best regards,
Rafael