TorizonCore images unstable

Hi,

We have a Colibri IMX8 QuadXPlus 2GB Wi-Fi / BT computer on module with our own custom carrier board.
I can successfully install both TorizonCore and Toradex Embedded Linux images with the Toradex Easy Installer.
The Toradex Embedded Linux images run stable, but the TorizonCore images do not.
When a TorizonCore image is installed on the module, the module is unstable.
Whether I use Ethernet or Wi-Fi, after some time (within a minute) the SSH connection is closed and it is not possible to reconnect (Connection refused).
It’s not even possible to download a Docker container for testing.
Only a reboot of the system makes it possible to connect again, but the system always stays unstable.

Why is TorizonCore so unstable, and what can be done to fix it?

Thank you in advance for your help.

Best regards

Gerard

Greetings @skipper,

I wasn’t able to recreate these network issues on my end so I’m gonna need to ask a couple of questions to get more information.

  1. Do you see any other instabilities or is it mainly just network connectivity?
  2. You mentioned you’re using a custom carrier board. Do you have any Toradex carrier boards lying around? If possible can you do a quick check to see if these instabilities exist when using our carrier board? This is just to eliminate possible variables.
  3. Do you have a way to connect to the module via serial? If yes, can you see if there are any errors/logs thrown by the system when the network fails?

Meanwhile I’ll see if there’s any possible chance such an issue may occur. Though I do find it odd since Torizon is built on top of our normal Embedded Linux Image, so the network stack shouldn’t differ much if at all.

Best Regards,
Jeremias

Hello Jeremias,

Thank you for your response. I was able to run the module for some hours without problems.
As soon as I tried to pull a Docker image, it went wrong.
I was able to use the serial connection to capture errors and logs; I have added screenshots from another session.
We don’t have any Toradex carrier boards in house.

Best regards,

Gerard


Those are troubling looking messages and not something I’ve seen before on Torizon. We didn’t see any such instabilities or issues before releasing the November monthly build. Let me bring this up to the team for investigation since I’m unable to reproduce any of this on my own setup.

Quick question though: are you running the Torizon image with containers pre-provisioned? If yes, then there should be containers running by default. Can you try stopping all containers and see if that improves stability at all?

Best Regards,
Jeremias

Hi Jeremias,
I have tried both versions of the Torizon image (2020-11-06), with and without pre-provisioned Docker containers, and it happens with both of them. I also tried another module, but the same problem occurs on that module too.

I have attached a serial log from a module running the TorizonCore image for V1.0B HW, 5.1.0-devel-202011+build4 (2020-11-06).

Best regards,

Gerard


Hi Jeremias,

Thanks for the update. I have checked the fuses of the module and they were all 0s.
After updating the fuses with your command and power-cycling the module, it is stable with the Torizon image.
So that was the issue.
I have some questions about it:

  • I saw when applying your command for changing the fuses the operation is irreversible, does that mean you can only change them once? And what happens if the value is wrong?
  • Is the fuse value 00983e91 for all imx8x module versions the same, for example for the v1.0B and v1.0C modules?
  • Are there known instabilities for the Embedded Linux versions when the fuses are not set?
  • Are the fuses for new modules we buy already set? Or is it wise always to check them?

Thank you for your help.

Best regards

Gerard


I’m glad setting the fuse values was able to resolve the problem for you!

I saw when applying your command for
changing the fuses the operation is
irreversible, does that mean you can
only change them once? And what
happens if the value is wrong?

Yes, generally the fuses can only be set once. These are actual hardware fuses, so they are meant to be programmed a single time. If you set a wrong value, the outcome depends on which fuse you set and what value you accidentally set it to. In the worst case you end up bricking the entire system.

Is the fuse value 00983e91 for all
imx8x module versions the same, for
example for the v1.0B and v1.0C
modules?

In general the fuse values are SoC specific, and they can even change between different silicon revisions of the same SoC. It’s highly recommended that you check with someone before you blow such fuses on your own, unless you’re very confident. As a side note, these values don’t need to be set on the 1.0C. They’re only needed on older 1.0Bs, since these modules use older silicon and their fuse values therefore need to be set so that they’re compatible with our more recent software.

Are there known instabilities for the
Embedded Linux versions when the fuses
are not set?

So there can be, but as you’ve seen it’s a lot more noticeable with Torizon. We’re honestly not too sure ourselves why it’s so unstable on Torizon without proper fusing. But in either case the fuses should be set on the 1.0B for the smoothest experience.

Are the fuses for new modules we buy
already set? Or is it wise always to
check them?

As I said previously, for newer Colibri i.MX8Xs like the 1.0C and 1.0D you don’t need to concern yourself with this. This is only an issue with early 1.0Bs. In almost every other case, any fuses needed for normal operation will be set by us before the modules get shipped to customers such as you.

Best Regards,
Jeremias

I spoke with the team internally and I think I know what’s going on.

To be compatible with our more recent software releases, the i.MX8X 1.0B needs to have certain fuses set that affect RAM timings. However, on earlier versions of this hardware these fuses might not have been set at all. In such cases there can be quite a number of instabilities, and they are more prevalent on Torizon. That sounds like what you are experiencing with your modules.

First of all, just to confirm: interrupt the boot process with any key press so that you have access to the U-Boot console prompt. Once that is done, run fuse read 0 765. If this command returns all 0s, then the proper fusing is definitely not present on this module.

Luckily you can set the fusing yourself from U-Boot. If your fusing is absent then run the following command: fuse prog 0 765 00983e91

This will set the proper fuse values. After setting the values check to see if the instabilities continue or improve.
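
If several modules need to be checked, the read-back can also be evaluated on a host PC from a captured serial log. The following is only a minimal Python sketch, assuming the captured output contains lines of the form "Word 0x000002fd: 00000000" as printed by the U-Boot fuse command. The names fuse_values and fuse_is_blank and the sample string are made up for illustration and are not part of any Toradex tool.

```python
import re

# Matches one line of captured `fuse read` output, e.g.
# "Word 0x000002fd: 00000000" (assumed format, one or more values per line).
WORD_LINE = re.compile(r"Word[ \t]+0x[0-9a-fA-F]+:((?:[ \t]+[0-9a-fA-F]{8})+)")


def fuse_values(output: str) -> list[int]:
    """Return every fuse word value found in the captured output."""
    values = []
    for match in WORD_LINE.finditer(output):
        values.extend(int(token, 16) for token in match.group(1).split())
    return values


def fuse_is_blank(output: str) -> bool:
    """True if at least one word was read and every word read is zero."""
    values = fuse_values(output)
    if not values:
        raise ValueError("no fuse words found in output")
    return all(value == 0 for value in values)


if __name__ == "__main__":
    # Hypothetical capture from a module whose fuse word 765 is not programmed.
    sample = "Reading bank 0:\n\nWord 0x000002fd: 00000000\n"
    print(fuse_is_blank(sample))  # True
```

A result of True only says that the word read back as zero. Whether programming it is appropriate for a given module revision is still something to confirm before running fuse prog.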

Best Regards,
Jeremias

Thanks Jeremias,
the fuse fix you suggested helped in our case as well. We used v1.0B modules with Yocto and BSP3 without any issues until we decided to try BSP5. It was very confusing to see Linux crash with kernel panics and corrupt the file system while copying from an SD card, compiling a large project, or simply running a service with debug traces enabled.
Although the question remains:
in the reference manual (preliminary revision E) this fuse map location (765) is marked as part of reserved memory. The newest reference manual (Rev. 0, 05/2020) states that this memory is customer fuses (which should not affect any Linux-critical settings?)

765 - 766 Customer fuses (Customer_OTP_word0 and Customer_OTP_word1)
Programmed by the customer for their own purposes via the SCU/SECO firmware. There are two words provided (0x26E0 and 0x26F0), each of which must be written as a single operation.

I assume that the reference manual has a mistake, since the v1.0C chip has the value 00983eb3 (slightly different from the one you suggested) preprogrammed at this address and BSP5 works fine.

 Word 0x000002f8: 00000000 00000000 00000000 00000000
 Word 0x000002fc: 00000000 00983eb3 00000000 00000000
 Word 0x00000300: 00000000 0000f600 00000000 00000000
 Word 0x00000304: 00000000 00000000 00000000 00000000
 Word 0x00000308: 00000000 00000000 00000000 00000000
 Word 0x0000030c: 00000000 80000000 dad6b59e 00000000
 Word 0x00000310: badabada 00000000 badabada badabada
 Word 0x00000314: b9a154e3 badabada 00000000 00000000
 Word 0x00000318: 00000000 00000000 00000000 00000000
 Word 0x0000031c: 00000000 00000000 00000000 00000000
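
For reading a dump like this: word 765 is 0x2fd, so it is the second value on the row that starts at 0x000002fc. Below is a small Python sketch of that lookup and of a bitwise comparison with the value 00983e91 suggested earlier in this thread. The helper name parse_fuse_dump is hypothetical, and only two rows of the dump are repeated to keep it short.

```python
def parse_fuse_dump(dump: str) -> dict[int, int]:
    """Map fuse word index -> value for lines like 'Word 0x000002fc: a b c d'."""
    words = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith("Word "):
            continue
        address, _, rest = line[len("Word "):].partition(":")
        base = int(address, 16)  # index of the first value on this row
        for offset, token in enumerate(rest.split()):
            words[base + offset] = int(token, 16)
    return words


# Two rows copied from the v1.0C dump in this post.
DUMP = """
 Word 0x000002f8: 00000000 00000000 00000000 00000000
 Word 0x000002fc: 00000000 00983eb3 00000000 00000000
"""

words = parse_fuse_dump(DUMP)
v10c_value = words[765]   # 765 == 0x2fd, second value on the 0x2fc row
suggested = 0x00983E91    # value from the fuse prog command in this thread

print(f"{v10c_value:08x}")              # 00983eb3
print(f"{v10c_value ^ suggested:08x}")  # 00000022
```

So the two values differ only in bits 1 and 5 of the lowest byte. What those bits encode is not stated anywhere in this thread.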

Cheers,
Michael

I personally don’t know too much about the subject or how these particular values were derived, but I can say the following. We set the fuse here to have a particular RAM ID that can be recognized by the SCFW used by the 8X during the early stages of booting. The issue is that without this we couldn’t easily differentiate between the various 8X silicon revisions, which use different SCFW.

As you saw, without this the system can still boot, but it’s highly unpredictable and unstable. Apologies that I can’t provide more exact information than this, though I can ask our R&D group about it if you like.

Best Regards,
Jeremias