I use torizoncore-builder both on my local PC and in an Azure pipeline to build a customized version of Torizon OS.
This starts by downloading a Torizon OS image as a base, from which it builds a customized package.
This is almost always the same (I mean, it changes rarely).
Then I bundle together all my containers, to produce a TEZI image.
Maybe one of the containers changes, but most of them are always the same.
When I work in a pipeline, every time the whole process is done from the beginning, and it takes time.
And every minute of the pipeline costs money.
I wonder how I can cache this process. I imagine:
downloading the Torizon OS base only if it has changed
pulling the deltas of the containers (instead of the whole images) if they’re enough
Is this possible?
Is there any documentation on how to do this?
Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!
Hello!
To optimize your CI/CD pipeline with TorizonCore Builder and reduce both time and cost, you can implement caching strategies for the Torizon OS base image and container layers. Here are some steps you can take:
Cache the Torizon OS Base Image: You can download and unpack the Torizon OS base image once and reuse it for subsequent builds unless a new version is available. Use the images unpack command to unpack the image into a Docker volume, which can be reused across builds. This prevents the need to download the base image every time.
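A minimal sketch of this caching step, assuming a pinned version string (the version, cache directory, and the commented unpack command are examples, not official values):

```shell
#!/bin/sh
# Sketch: re-run the download/unpack only when the pinned Torizon OS version
# changes. BASE_VERSION, CACHE_DIR and the commented command are examples.
BASE_VERSION="6.8.1"          # hypothetical pinned base-image version
CACHE_DIR="tcb-cache"
mkdir -p "$CACHE_DIR"

if [ "$(cat "$CACHE_DIR/base-version" 2>/dev/null)" != "$BASE_VERSION" ]; then
    # Placeholder for the real download and unpack, e.g.:
    #   torizoncore-builder images unpack "torizon-core-docker-${BASE_VERSION}.tar"
    echo "$BASE_VERSION" > "$CACHE_DIR/base-version"
    echo "base image refreshed"
else
    echo "base image cache hit"
fi
```

On the first run the cache is refreshed; subsequent runs with the same pinned version skip the download entirely.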
Leverage Docker Layer Caching: Configure your CI/CD pipeline to use Docker’s layer caching. This way, only the layers that have changed will be pulled, significantly reducing the time spent downloading container images. Ensure your pipeline is set up to cache these layers between runs.
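As a sketch, the usual pattern for this looks like the following (the registry and image name are placeholders; the docker commands are left commented as illustration, since they need a real registry to run):

```shell
#!/bin/sh
# Sketch: warm Docker’s layer cache from the image pushed by the previous
# pipeline run, so `docker build` only rebuilds layers that changed.
IMAGE="registry.example.com/myapp:latest"    # placeholder image name

# docker pull "$IMAGE" || true                     # ignore failure on first run
# docker build --cache-from "$IMAGE" -t "$IMAGE" .
# docker push "$IMAGE"                             # publish for the next run

echo "cache source: $IMAGE"
```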
Use Standalone Commands for Incremental Changes: Instead of running the entire build process from scratch, use standalone commands to apply specific customizations. This allows you to reuse previously unpacked images and only apply new changes, saving time.
Consider Docker Compose: If your containers rarely change, manage them with a Docker Compose file. The bundle command can pre-bundle these images, and you can use the combine command to merge them with your custom Torizon OS image.
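A sketch of that pre-bundle workflow, assuming example directory names (the torizoncore-builder commands are shown commented, since they need the tool and a real unpacked image to run):

```shell
#!/bin/sh
# Sketch of the pre-bundle workflow. Directory names are examples.
COMPOSE_FILE="docker-compose.yml"
BUNDLE_DIR="bundle-dir"
IMAGE_DIR="torizon-core-input"      # directory with the custom Torizon OS image
OUTPUT_DIR="tezi-output"            # final TEZI image with containers combined

# Run once, or whenever the compose file changes:
#   torizoncore-builder bundle --bundle-directory "$BUNDLE_DIR" "$COMPOSE_FILE"
# Then, on every build, reuse the bundle instead of re-pulling the images:
#   torizoncore-builder combine --bundle-directory "$BUNDLE_DIR" "$IMAGE_DIR" "$OUTPUT_DIR"

echo "combine $IMAGE_DIR + $BUNDLE_DIR -> $OUTPUT_DIR"
```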
By implementing these strategies, you can effectively cache the necessary components and optimize your pipeline’s efficiency.
Best regards,
Toradex AI Assistant.
Instead of downloading during the pipeline, have you tried just pre-downloading the base image, and then providing that to your pipeline? Then whenever the base image changes you just pre-download the new base image and provide that as the new input. As you said, you rarely change the base image so I imagine this is something that would not need to be done often.
pulling the deltas of the containers (instead of the whole images) if they’re enough
TorizonCore Builder doesn’t cache any information related to your container images, which is probably why you observe it pulling the container images in their entirety every time. It would require a rework of the tool’s functionality in some parts to allow for something like this.
What is possible now is using the torizoncore-builder bundle command with the --bundle-directory option. This produces a directory on your system containing the information about your container images that will be bundled into your custom image. You can then provide this directory as input to TorizonCore Builder, and it will use it without having to re-pull your container images.
The idea then would be similar to the first point. You “pre-download” your container images and create this bundle directory and provide this directory as input to your pipeline. This would require some work pre-pipeline as a tradeoff for less time in the pipeline. But, if your container images really don’t change too often like you said, it could be worth considering.
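One way to automate the invalidation of that pre-built directory is to key it on a checksum of the compose file, so the bundle is rebuilt exactly when the recipe changes. A sketch, with example file names and the real bundle command left commented:

```shell
#!/bin/sh
# Sketch: rebuild bundle-dir only when docker-compose.yml changes, keyed by a
# sha256 checksum. Names are examples; the bundle command is commented.
COMPOSE_FILE="docker-compose.yml"
BUNDLE_DIR="bundle-dir"
HASH_FILE="bundle-dir.sha256"

# For a runnable sketch, create a minimal compose file if none exists:
[ -f "$COMPOSE_FILE" ] || printf 'services:\n  app:\n    image: alpine\n' > "$COMPOSE_FILE"

NEW_HASH=$(sha256sum "$COMPOSE_FILE" | cut -d' ' -f1)
if [ ! -d "$BUNDLE_DIR" ] || [ "$(cat "$HASH_FILE" 2>/dev/null)" != "$NEW_HASH" ]; then
    rm -rf "$BUNDLE_DIR"
    # Real step would be:
    #   torizoncore-builder bundle --bundle-directory "$BUNDLE_DIR" "$COMPOSE_FILE"
    mkdir -p "$BUNDLE_DIR"          # stand-in for the real bundle step
    echo "$NEW_HASH" > "$HASH_FILE"
    echo "bundle rebuilt"
else
    echo "bundle reused"
fi
```

The stored hash doubles as the pipeline cache key, so no manual invalidation is needed.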
This can be an option, but I need a way to handle this automatically from the CI/CD pipeline.
I mean, I cannot rely on remembering to invalidate the cache when I change something in the recipe to build the image.
The torizoncore-builder bundle command with --bundle-directory could be an option if the tool verified whether the bundle is up to date or not.
At the moment, if the bundle directory exists, it’s taken as it is, and this is bad. In the past I asked for a way to delete the folder, because downloading it from scratch is time-consuming but the result is correct.
Re-using a folder without any control is fast, but you have no guarantee that the output is correct. And in an automated pipeline, this is bad.
Downloading Torizon OS is quite fast (no big benefit from caching).
This is the slow part (you can see the timestamps of the last two lines):
2025-03-12T11:06:31.9328294Z Fetching container image xxxxxA in service xxxxxA
2025-03-12T11:06:36.6555531Z Fetching container image xxxxxB in service xxxxxB
2025-03-12T11:06:58.2991791Z Fetching container image xxxxxC in service xxxxxC
2025-03-12T11:07:00.4367547Z Fetching container image xxxxxD in service xxxxxD
2025-03-12T11:07:06.3050275Z Fetching container image xxxxxE in service xxxxxE
2025-03-12T11:07:16.6059173Z Fetching container image xxxxxF in service xxxxxF
2025-03-12T11:07:17.2187794Z Fetching container image xxxxxG in service xxxxxG
2025-03-12T11:07:39.0421978Z Fetching container image xxxxxH in service xxxxxH
2025-03-12T11:07:56.3429016Z Fetching container image xxxxxL in service xxxxxL
2025-03-12T11:08:32.6629232Z Saving Docker Compose file
2025-03-12T11:08:32.6747387Z Exporting storage
2025-03-12T11:08:32.6749823Z Storing container bundle into "bundle-dir"
2025-03-12T11:25:30.3438214Z Stopping DIND container
If the fetching is a docker pull over existing images, this should be incremental.
But it seems it’s not, because if the bundle directory exists, it’s taken as it is without any pulling.
And so the pipeline has to start from an empty bundle directory every time, so the pulling is from scratch every time.
I don’t know if this step, Storing container bundle into “bundle-dir”, can benefit from a docker pull over an existing folder of docker images.
This can be an option, but I need a way to handle this automatically from the CI/CD pipeline.
I mean, I cannot rely on remembering to invalidate the cache when I change something in the recipe to build the image.
Re-using a folder without any control is fast, but you have no guarantee that the output is correct. And in an automated pipeline, this is bad.
Keep in mind I’m not telling you this is ideal. I realize there are faults with this method with regard to your use-case. The reality is that this is what is possible today without waiting. If you want something more, then you will need to wait. I would need to bring this up with our product team, as it is essentially a new feature. It would have to be discussed before we can commit any resources to it. Considering this, there is also no guarantee on when, or even if, we will commit to doing this.
I don’t know if this step, Storing container bundle into “bundle-dir”, can benefit from a docker pull over an existing folder of docker images.
Looking at your logs, it’s the creation of the bundle that is taking the most time, not the pulling of the container images. This step involves storing the container image files in a compressed tarball. If you have many container images comprised of many files, the compression could be what is taking so long.
Again, though, this would need time and resources on our side, which I can inquire about, but I can’t guarantee anything.