Board reverts to old devicetree after overlay

I am trying to apply a device tree overlay to my board (to make GPIO7 unused). I am following the process described in the article Device Tree Overlays on Torizon.

I am able to push the change directly to the board via its IP address, and when I reboot I can see that the new device tree overlay is applied and the GPIO is unused.
But when I do a subsequent cold boot (unplug the power connector) and power it up again, the board has switched back to the original state. This is the output of the ostree admin status command; commit 32f32f583afb0b1497a84fbc64b413ef6700d9d98aab0554731e074be84e63b1 is the one I pushed:

apalis-imx8-06548593:~$ sudo ostree admin status
  torizon 32f32f583afb0b1497a84fbc64b413ef6700d9d98aab0554731e074be84e63b1.1 (pending)
    Version: 5.1.0+build.1-tcbuilder.20210212151201
    origin refspec: tcbuilder:32f32f583afb0b1497a84fbc64b413ef6700d9d98aab0554731e074be84e63b1
* torizon 27200a7a7bd3e501bf180e0f68955bf9db7bc205fb88641f0636d3ffedd7a53b.0
    Version: 5.1.0+build.1
    origin refspec: torizon:5/apalis-imx8/torizon/torizon-core-docker/release

Greetings @nkj,

I’m unable to reproduce this on my end. When I deploy an overlay, it stays active no matter how I reboot or power cycle the device. Looking at your OSTree deployments, the commit you pushed is marked as “pending”, which usually means it will be applied on the next boot. Does this not happen?

Are you able to consistently reproduce this on your side?

Best Regards,

On the next reboot, whether using sudo reboot -h now or directly unplugging the power cable, it doesn’t get applied.
If I run the following command:
torizoncore-builder deploy 32f32f583afb0b1497a84fbc64b413ef6700d9d98aab0554731e074be84e63b1 --remote-host --remote-user torizon --remote-password xxxx
and then reboot the board, it gets applied. One thing to note: the number at the end of the commit shown under sudo ostree admin status keeps increasing, up to 4 so far.
I don’t have a reliable way to reproduce it. I have noticed it twice now when the device had been shut down for a couple of hours (overnight); afterwards it was back to the original state. I have seen this on two Apalis iMX8 boards, one on an Ixora carrier and one on a custom carrier. One module is 1.0B and the other 1.1B.
Is there a command that needs to be run after the torizoncore-builder deploy command? That is the last command I am sending, so maybe I am missing something.

OK, I think I found a way to reproduce it.
In the serial console I noticed this:
Normal Boot
Warning: Bootlimit (3) exceeded. Using altbootcmd
If I reboot the board 3 times, it reverts to the original state. Can you check whether this is reproducible on your side?
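The warning above comes from U-Boot's boot-count limit feature: U-Boot increments a `bootcount` value on each boot, and once it exceeds `bootlimit` it runs `altbootcmd` instead of the normal boot command. Below is a simplified Python model of that decision, not Torizon's actual implementation. It assumes, as with U-Boot's environment-backed bootcount, that the counter only advances while an `upgrade_available` flag is set, and that a successful boot is confirmed by clearing the flag and resetting the counter. `BootEnv`, `boot_once` and `confirm_update` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class BootEnv:
    # Simplified stand-ins for U-Boot environment variables (assumed values).
    bootlimit: int = 3
    bootcount: int = 0
    upgrade_available: bool = False


def boot_once(env: BootEnv) -> str:
    """Simulate one boot and return which command U-Boot would run."""
    if env.upgrade_available:
        # Assumption: the counter only advances while an update awaits confirmation.
        env.bootcount += 1
    if env.bootlimit and env.bootcount > env.bootlimit:
        # Matches "Warning: Bootlimit (3) exceeded. Using altbootcmd"
        return "altbootcmd"  # fallback path: boot the previous deployment
    return "bootcmd"  # normal path: boot the current deployment


def confirm_update(env: BootEnv) -> None:
    """Assumed effect of a successful-boot check: clear the flag, reset the counter."""
    env.upgrade_available = False
    env.bootcount = 0


if __name__ == "__main__":
    env = BootEnv(upgrade_available=True)  # new deployment, never confirmed
    print([boot_once(env) for _ in range(4)])
    # ['bootcmd', 'bootcmd', 'bootcmd', 'altbootcmd']
```

In this model, a deployment that is never confirmed falls back on the fourth boot after the counter starts, which lines up with the three reboots described above; once `confirm_update` has run, later boots stay on the normal path.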

I’m following the same torizoncore-builder commands from the article as you, so that shouldn’t be the issue.

Regarding the Bootlimit message you’re seeing: this is part of our rollback mechanism, a fail-safe for OTA updates in case a bad update is pushed. However, it should only come into effect for OTA updates, not for updates deployed with torizoncore-builder. More details on this mechanism here:

Otherwise, I’m unsure why your system is rolling back at all. On my setup, no matter how many times or how I reboot the device, my overlay stays active and no rollback occurs.

Just to be sure, can you share the overlay you’re applying and list all the commands you’re running with TorizonCore Builder from start to finish? I want to make sure we’re not missing anything.

Best Regards,

Attached is the overlay. It disables pciea and the reg_pcie_switch node in the device tree. I need to use GPIO7, and this is the simplest way I could think of; I don’t need pciea.

As for commands, I am just following that article: I created a new .dts file in device-trees/overlays/ and then applied it:

njha@G9YP09Y2E:~/custom-torizoncore$ torizoncore-builder dto apply device-trees/overlays/apalis-imx8_pcie_disable.dts
'apalis-imx8_pcie_disable.dts' compiles successfully.
/tmp/tmp_tnh3o5t: Device Tree Blob version 17, size=162829, boot CPU=0, string block size=9929, DT structure block size=152844
'apalis-imx8_pcie_disable.dtbo' can successfully modify the device tree 'imx8qm-apalis-v1.1-eval.dtb'.
Overlay apalis-imx8_pcie_disable.dtbo successfully applied.

After some investigation, I believe we have a bug here.

For some reason, OSTree changes made with TorizonCore Builder are triggering our rollback mechanism. This isn’t intended behavior, and it only seems to happen on the Apalis i.MX8; on the Apalis i.MX6 it didn’t happen, which is why I didn’t see it earlier. Once I switched to an i.MX8, I could reproduce it easily, as you described. This also seems to happen with any overlay, not just yours, so don’t worry about that.

We’ll need some time to investigate the root cause and devise a fix. Thank you for reporting this.

For the time being, you’ll probably need to use torizoncore-builder deploy to create an Easy Installer image instead of deploying the overlay over the network as you have been doing. This method is slower, but it doesn’t seem to trigger the rollback mechanism.

Best Regards,

Hi Jeremias,

Good to hear that you were able to reproduce it on your end.
Thanks for looking into this. For now I will use the USB drive method for flashing.
The IP deployment is quite useful for testing and trials, though.