When can we get stable, reproducible Yocto builds?

My project has to go down the FDA regulated path. This means we can’t build on sand. Every step has to not only be documented, it has to be 100% reproducible.

So, months ago I created the instructions for successfully building a custom Yocto followed by using TorizonCore Builder to load docker images. Ran it a few times in a VM and released the resulting OS to the team to use.

Now, we are getting ready to start down the FDA approval path. I spun up a copy of the VM, gave it some additional CPU and RAM resources, kicked off a build. Ran smack dab into this.

Read through, found this:

So,

  1. I apply all updates to Ubuntu 20.04 since Git obviously has some changes, not cool.

  2. I bounce the VM and log in.

  3. Get the latest “fixes”

cd ~/yocto_work/
source yocto-setup
repo sync
nuke-yocto-build
bitbake board-verification-image

I realize many here are working on IoT and other things that really don’t matter, so it’s a big guffaw when builds break, but this has to remain stable for months. Right now it is at Epic Failure state. With a failure like this the FDA definitely doesn’t approve you, and a submission is not cheap. They can also choose to never allow you to submit the device again when the failure is this fundamental.

A work around for “this time” is needed. What I’m also trying to stress here is there can never be a next time.

I’m not trying to sound like a jerk. It can take 6+ months between submission and the person(s) tasked with validating environment creation actually testing the process. The process cannot fail. It cannot have an update. It cannot have an Errata amendment. It must work as documented. Many years from now, during another pandemic, it must still work so the federal government can task General Motors with building yet another medical device in unprecedented quantities.

I need a Yocto-based TorizonCore build path that isn’t going to bust on a whim (like a Git change) or because someone checked something in.

Two months ago this was clean. We could (and did) run it 50 times in a week. We parked it to look at again just in front of initial filing. Now it doesn’t work.

I don’t see how this is related to Toradex. Open source stuff updates all the time. If you need a stable environment, fix versions and update never or only in a controlled fashion.

@tuomas86 as Ford found out with Firestone tires, once the Blue Oval went on the grill, they were responsible.

https://www.townfairtire.com/blog/what-you-dont-know-about-the-infamous-ford-firestone-controversy.html

We are building TorizonCore using the repo tool per the documentation. We can’t get a lasting stable build.

This is a medical device. It isn’t some IoT thing where failure is regularly laughed at. There is no AGILE here, just real Software Engineering.

CI/CD != Software Engineering
CI/CD == Continuous Catastrophe

The primary issue with the process as documented both in writing and video from Toradex is there simply is no method of “locking it down.” There is no bitbake option --local-only or --no-git where we could have a localized captive version.

There is also no good way to clean.

Everybody ends up having to create a script much like the one in the image because bitbake can get jacked up quite easily. This appears to force some new pull activity though.

The medical device world needs a stable well documented TorizonCore build process that works today, tomorrow, and 8+ years from now as-is. Not something we have to constantly redo from scratch.

It doesn’t matter what Toradex based their stuff on or what tools the company chose to employ. The product is TorizonCore. The Blue Oval is on the grill. The SOMs are being sold into the FDA regulated world. In that world one must document “How I Built This” in a perfectly reproducible manner. In general they don’t let you have a box in the middle of the diagram that says

Then a miracle happens

Right now the base image we use for our layered-on software is no longer reproducible. We have a box in the middle of the diagram that reads

Then a miracle happened

Base image appeared.

That’s not going to play well with the regulators.

Two months ago we had a perfectly documented process, tested ~50 times in different VMs, that consistently reproduced the base image. Our process diagram was complete.

Well the tools are all there, you’re just using them wrong.

In repo tool, you fix all the layers to a specific commit hash in your manifest. That way it always pulls the exact same meta layers from the exact same commit in history, from the repositories you want.

Then the recipes in the meta layers need to have their SRCREV fixed to a specific commit hash. That way you will always get exactly the same result. All the packages can be literally your own source tarballs in your local drive if you want.
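
One way to get the "local tarballs" arrangement is BitBake's own mirror support. A minimal sketch, assuming the sources were already downloaded once into a local directory. The function name enable_offline_fetch and the path /srv/yocto-mirror are made up for illustration; SOURCE_MIRROR_URL, own-mirrors and BB_NO_NETWORK are standard BitBake/OpenEmbedded settings.

```shell
#!/bin/sh
# Hypothetical helper: append offline-fetch settings to a BitBake conf file.
#   $1 = path to conf/local.conf
#   $2 = absolute path of a directory that already holds the downloaded sources
enable_offline_fetch() {
    conf_file="$1"
    mirror_dir="$2"
    cat >> "$conf_file" <<EOF
# Fetch sources only from a local mirror (sketch; the path is an assumption)
SOURCE_MIRROR_URL = "file://$mirror_dir"
INHERIT += "own-mirrors"
BB_NO_NETWORK = "1"
EOF
}
```

It would be called as `enable_offline_fetch conf/local.conf /srv/yocto-mirror` from the build directory. With BB_NO_NETWORK set, a recipe that still wants the network should fail the build instead of quietly fetching something new.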

If you use some wildcards or AUTOREV in recipes or use revision=“master” in repo manifest instead of commit hash, they will not stay fixed. Maintaining the host system VM is obviously your own responsibility.
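
A quick way to audit for this, as a rough sketch only. The two helper functions are hypothetical names, and the grep patterns assume revisions are written as revision="..." attributes in the manifest XML and that floating recipes mention AUTOREV literally.

```shell
#!/bin/sh
# Hypothetical helper: print every revision= attribute in a repo manifest that
# is not a full 40-character commit hash (branch names, tags, and so on).
# Exit status is non-zero when nothing unpinned is found.
unpinned_revisions() {
    grep -o 'revision="[^"]*"' "$1" | grep -Ev 'revision="[0-9a-f]{40}"'
}

# Hypothetical helper: list recipe files under a layer directory that mention AUTOREV.
autorev_recipes() {
    grep -rl --include='*.bb' --include='*.bbappend' 'AUTOREV' "$1"
}
```

For example, `unpinned_revisions my-manifest.xml` and `autorev_recipes layers/` (or wherever your meta layers live). Any output from either one is something that can still move underneath you.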

BSP is only a starting point. You’re responsible for fixing versions you want to use.


Greetings @seasoned_geek,

The issue is how you’re using the repo utility here. By calling repo sync you’re essentially updating your entire Yocto setup with the latest changes from the default Toradex Yocto setup.

If you want something reproducible you need to use repo init to lock onto a specific revision. This way you stay on a static non-changing version of the build. For example something like:
repo init -u https://git.toradex.com/toradex-manifest.git -b refs/tags/5.6.0 -m torizoncore/default.xml

You can then even copy this specific manifest and store it in your own version control to be extra safe.
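
For the "copy this specific manifest" step, repo can write out a manifest with every project fixed to the commit that is currently checked out. A minimal sketch, assuming it is run inside the directory where repo init and repo sync were done; freeze_manifest and the output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a manifest in which every project is pinned to
# the exact commit currently checked out.
#   $1 = output file, e.g. pinned-5.6.0.xml
freeze_manifest() {
    out_file="$1"
    # -r records each project's current HEAD commit instead of a branch or tag name.
    # The second check makes sure a non-empty file was actually written.
    repo manifest -r -o "$out_file" && [ -s "$out_file" ]
}
```

The resulting file can be committed to your own manifest repository and used as the manifest for later builds, so a repo sync months from now checks out the same commits.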

For a list of versions and tags you can check the repo for our manifests here: toradex-manifest.git - Repo manifest for Toradex Embedded Linux TorizonCore and BSP layer setup for Yocto Project/Openembedded

Best Regards,
Jeremias

Technically, you’re always going to be building on sand if you are pulling source code from far-away repositories. What happens if sourceforge goes away or github is down for a week? Or a pissed-off developer spikes a library with malware on purpose? (Go read the story of node-ipc)

If you want a stable, reproducible build that’s guaranteed, fork every single project into a private repository that’s under your control. Then there’s no question, right?


Actually, there is. I used to believe as you do. That belief was short-lived once I got to the medical device world. It’s not so simple in a RISK situation. (I’m not shouting. We use RISK when referring to the formal FDA mandated RISK analysis process and documentation and risk when we are just talking about a single risk.)

The software update process in Ubuntu has become a virus unto itself. Just like Jason in the Friday the 13th movie franchise, it cannot be killed for long. You can use every method found online and still it comes back to life. Windows 10 is even worse.

Back in the days of DR DOS 6 and OS/2 Warp, what you say actually worked. Really was no Internet to speak of. If you had a connection it was dial-up and the OS had a setting to block dial-on-demand or allow it. You could be absolutely certain nothing would change unless you changed it.

That hasn’t been the case in years.

As my previously posted link to the Git patch points out, even the “stable” 5.6.0 won’t build today without that patch retroactively applied. This is due to automatic updates of underlying platforms.

Has anybody tried to run Windows 10 without an Internet connection? For a prolonged period without Internet connection? How did you get past activation and registration? Doesn’t it eventually hang when it cannot touch the Internet?

I don’t know, but I’ve heard rumors.

Same can be said of Ubuntu 20.04 (and probably earlier versions). Given they rolled NIC support into the kernel any network attempt has the potential to hang the entire system until timeout.

The traditional solution was to place all machines on a closed network that had zero connection to the outside world. It was really great if you could use an obscure network like Token Ring. Keep the machines off of anything that would connect to the Internet and completely remove all TCP/IP from them. No question your software would remain as-is and still be usable.

Today just putting it on your own Git instance doesn’t ensure that. Vast majority of machines are on the same network and it connects to the Internet. Operating systems wanting to gather all of the personal information they can about you to sell to anyone who will buy it demand an Internet connection and force updates on you.

No, this isn’t idle chatter or speculation. This is actual FDA conversation going on right now.

DRAFT -Cybersecurity Guidance (April 8, 2022) (fda.gov)

“As part of configuration management, device manufacturers should have custodial control of source code through source code escrow and source code backups. While source code is not provided in premarket submissions, if this control is not available based on the terms in supplier agreements, the manufacturer should include in premarket submissions a plan of how the third-party software component could be updated or replaced should support for the software end.”

Hardware manufacturers have long since mandated 10, 20, or 30-year supply terms in contracts with suppliers. They pay for this.

The same thing is happening with source providers now. OpenSource doesn’t cut it in the medical device world. You have to have fully controlled OpenSource.

Currently many are satisfying this requirement by creating a CMS on a machine that always gets backed up and is never updated. It sits behind both a physical security appliance and a VPN. Even if someone manages to somehow penetrate the security appliance from the regular Internet, without the extra device drivers and keys provided by the VPN front end, they see an alternate set of disks. In short, they see a virtual server, not the real thing. Virtual server goes away with the connection.

Again, people are signing contracts and paying money for this.

I don’t know what the arrangement is with my current client and this device.

The Git whammy should be a massive wake-up call. What we currently have in place is not good. If you are on Ubuntu 20.04 and haven’t slain the Software Updater virus, the locked down 5.6.0 will no longer build without retroactively applying a forward patch.

I know the IoT world will dismiss this, but the FDA regulated world cannot. It’s going to go from “best practice” to hard regulation very soon.

@jeremias.tx

Thank you for your polite and informative response. Also thank you for understanding the gravity of the situations and not responding with a “nah-ma-yob-man” response. This document must be executable by people with near zero computer skills.

There is a minor defugalty with your answer though. 5.6.0 leaves one with a Git torpedoed base. One must also manually apply the changes documented here:

Or go through the pain of trying to make repo cherrypick a forward commit.

Thank you again for your assistance.

Well herein lies one of the issues now. If you stay on a static version of the build for reproducibility, then new changes (including fixes) won’t make their way to you. Any future change will need to be judged on whether it’s needed in your static build.

As for that particular issue with Git I see 2 options:

  • You create a fork of meta-toradex-torizon that is based on 5.6.0 (or whatever version you want to freeze on), and cherry pick this change. Then this fork is what you use in your builds.
  • Alternatively this fix was only needed in the first place because Git was updated on multiple Linux distros changing behavior, resulting in this commit. Since your build environment is a VM it could be possible to create a VM with an older version of Git that doesn’t require this fix. This is to avoid the forking and cherry picking from the first option.
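
The first option could look roughly like this with plain git. This is only a sketch: backport_fix is a hypothetical helper, and the tag, commit and branch names are placeholders you would replace with the real 5.6.0 tag in your fork and the hash of the Git fix.

```shell
#!/bin/sh
# Hypothetical helper: create a maintenance branch at a frozen tag and
# cherry-pick a single fix commit onto it.
#   $1 = path to the forked layer checkout
#   $2 = tag (or commit) the frozen build is based on
#   $3 = hash of the fix commit to bring over
#   $4 = name of the new maintenance branch
backport_fix() {
    repo_dir="$1"
    base_tag="$2"
    fix_commit="$3"
    branch="$4"
    git -C "$repo_dir" checkout -q -b "$branch" "$base_tag" &&
    # -x appends a "cherry picked from commit ..." line to the new commit
    # message, which is handy when the change has to be traced in documentation.
    git -C "$repo_dir" cherry-pick -x "$fix_commit"
}
```

The commit hash at the tip of the new branch is then what you would pin in your manifest in place of the original 5.6.0 revision for that layer.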

Best Regards,
Jeremias

Yes Jeremias,

That’s not as easy as it sounds. If you set up a new VM using an old ISO and then try to install Git from the repo, it will install the latest one. There is a lot of hand searching through repo history to try to get an older version, then many hoops to jump through to exclude Git from updates.

We’ve made our peace for now. I’ve manually cherry picked (made the source code change myself sans the cherrypick functions) and the documented method is currently 100% reproducible. The previous version was also 100% reproducible though.

This is a much bigger issue than just Toradex though. See my earlier response to Ikoziarz.

Toradex should be looking at this issue though, because you’re selling into the medical device world (among others). You probably want a never-upgraded system, protected by a VPN and a hardware security device, that paying customers can access to help prevent this.

We all need to find a non-Ubuntu desktop that won’t force updates on us and really will allow us to turn updates off, so a single It-works-with-Toradex ISO can be parked somewhere and everyone can “just use it.” That’s really where we have to be headed because this CI/CD stuff is just continuous carnage.