Torizon offline updates failing when trying to download lockbox with compose file

Hello,

I am trying to perform an update using the offline updates feature provided by the Torizon platform.

I am using Windows with a WSL 2 backend to run TorizonCore Builder. I am running an early-access version of torizoncore-builder with the network mode set to host. This was based on advice found here: TCB cannot do "platform push" if private docker registry and Lockbox creation failure with an 404 client error
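
For reference, here is roughly how I start TorizonCore Builder inside WSL 2 (sketched from memory, so the exact tag name may not be verbatim):

# tcb-env-setup.sh is downloaded from Toradex beforehand;
# everything after "--" is passed straight to docker run
source tcb-env-setup.sh -t early-access -- --network=host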

I am able to successfully use the bundle command to create a canonicalized docker-compose file and upload it to the OTA server. I am then able to successfully build a lockbox using that docker-compose file in the web-based Torizon platform. However, when I run ‘torizoncore-builder platform lockbox’ to pull the lockbox file from the Torizon platform, I receive this error: Could not fetch file ‘docker-compose.lock.yml-2023-07-03’ from ‘https://api.torizon.io/repo/api/v1/user_repo/targets/docker-compose.lock.yml-2023-07-03’

I'm unsure what is causing this, as I have successfully pulled a lockbox down before with no issues using these same steps. Any help is appreciated. Thanks!

Here is the verbose version of the error:

2023-07-03 16:44:34,982 - torizon.tcbuilder.backend.platform - INFO - Fetching target 'docker-compose.lock.yml-2023-07-03' from 'https://api.torizon.io/repo/api/v1/user_repo/targets/docker-compose.lock.yml-2023-07-03'…
2023-07-03 16:44:34,982 - torizon.tcbuilder.backend.platform - INFO - Uptane info: target 'docker-compose.lock.yml', version: '2023-07-03'
2023-07-03 16:44:34,984 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): api.torizon.io:443
2023-07-03 16:44:35,315 - urllib3.connectionpool - DEBUG - https://api.torizon.io:443 "GET /repo/api/v1/user_repo/targets/docker-compose.lock.yml-2023-07-03 HTTP/1.1" 401 123
2023-07-03 16:44:35,320 - torizon.tcbuilder.cli.platform - INFO - Removing output directory 'update/' due to errors
2023-07-03 16:44:35,465 - root - ERROR - Could not fetch file 'docker-compose.lock.yml-2023-07-03' from 'https://api.torizon.io/repo/api/v1/user_repo/targets/docker-compose.lock.yml-2023-07-03'
2023-07-03 16:44:35,467 - root - DEBUG - Traceback (most recent call last):
  File "/builder/torizoncore-builder", line 221, in <module>
    mainargs.func(mainargs)
  File "/builder/tcbuilder/cli/platform.py", line 321, in do_platform_lockbox
    platform_lockbox(
  File "/builder/tcbuilder/cli/platform.py", line 303, in platform_lockbox
    raise exc
  File "/builder/tcbuilder/cli/platform.py", line 282, in platform_lockbox
    fetch_offupdt_targets(
  File "/builder/tcbuilder/cli/platform.py", line 199, in fetch_offupdt_targets
    platform.fetch_compose_target(**params)
  File "/builder/tcbuilder/backend/platform.py", line 689, in fetch_compose_target
    compose = fetch_file_target(
  File "/builder/tcbuilder/backend/platform.py", line 347, in fetch_file_target
    return fetch_validate(
  File "/builder/tcbuilder/backend/platform.py", line 276, in fetch_validate
    raise FetchError(
tcbuilder.errors.FetchError: Could not fetch file 'docker-compose.lock.yml-2023-07-03' from 'https://api.torizon.io/repo/api/v1/user_repo/targets/docker-compose.lock.yml-2023-07-03'

Hi @MikeHA ,

Are you using the early access version of TorizonCore Builder because of the private registry support?

The latest stable version as of now (3.7.0) should have this feature. Can you try testing the stable release and see if the issue occurs there? You’ll probably need to set the network mode to host as well, given that you’re on Windows/WSL.

Best regards,
Lucas Akira

Hi Lucas,

I was able to get it working with the 3.7.0 version of torizoncore-builder. I am now able to create a canonicalized docker-compose file with the “bundle” command. This works perfectly and pulls down all the images.

I am then able to push the canonicalized docker-compose file to the Torizon platform with the .lock extension.

However, when I go to create the lockbox, it is unable to fetch the manifests of some of my containers. This is odd, because it fetches all of the containers perfectly fine during the bundle command. Here is the traceback of the “torizoncore-builder platform lockbox” command:

Traceback (most recent call last):
  File "/builder/torizoncore-builder", line 221, in <module>
    mainargs.func(mainargs)
  File "/builder/tcbuilder/cli/platform.py", line 316, in do_platform_lockbox
    platform_lockbox(
  File "/builder/tcbuilder/cli/platform.py", line 298, in platform_lockbox
    raise exc
  File "/builder/tcbuilder/cli/platform.py", line 277, in platform_lockbox
    fetch_offupdt_targets(
  File "/builder/tcbuilder/cli/platform.py", line 194, in fetch_offupdt_targets
    platform.fetch_compose_target(**params)
  File "/builder/tcbuilder/backend/platform.py", line 688, in fetch_compose_target
    manifests_per_image = fetch_manifests(images, manifests_dir)
  File "/builder/tcbuilder/backend/platform.py", line 399, in fetch_manifests
    digests_saved, manifests_info = ops.save_all_manifests(
  File "/builder/tcbuilder/backend/registryops.py", line 553, in save_all_manifests
    for info, resp in self.get_all_manifests(image_name, **kwargs):
  File "/builder/tcbuilder/backend/registryops.py", line 495, in get_all_manifests
    assert top_res.status_code == requests.codes["ok"],
AssertionError: Could not fetch manifest of

Wait, so are you getting a different error depending on whether you use 3.7.0 or the early-access version?

Could you please provide some details on the container images that need to be fetched? I gather some or all of the containers are stored in a private container registry, correct? Is the container registry Docker Hub, or something else? Are there any other important details related to these container images of yours?

Also, could you list out exactly every TorizonCore Builder command you're executing? I just want to make sure nothing is missing, as we may need to try to reproduce your error on our side.

Best Regards,
Jeremias

Hey Jeremias,

Yeah, I am getting a different error with the 3.7.0 tag than I was with the early-access tag. It seems the previous issue is resolved, as it now knows that the docker-compose file exists in the lockbox.

There is nothing really special about the images. They come from a private Docker Hub registry and are built for linux/arm64. It seems that torizoncore-builder bundle can pull all of them fine, but the lockbox command has issues when trying to pull the images. The images it is failing on are multi-platform images built using buildx, so I'm not sure if that is causing an issue? It does seem to work fine for one of the images on our private registry and for a Torizon image.

Here are the commands I am running:

  • source tcb-env-setup.sh -t 3.7.0 -- --network=host
  • torizoncore-builder bundle --force --platform linux/arm64 --login [docker hub info] docker-compose.yml
  • this outputs the canonicalized docker-compose file, from my understanding. I then manually change the extension to include .lock so the Torizon platform is happy, which gives me docker-compose.lock.yml
  • I then use "torizoncore-builder platform push --credentials credentials.zip --login [dockerhub info] docker-compose.lock.yml". This is successful.
  • I then use the torizon web-app to build the lockbox with just the docker-compose file in it.
  • Finally, I run "torizoncore-builder platform lockbox --force --platform linux/arm64 --credentials credentials.zip --login [dockerhub info] lockbox_name".

That is when the error occurs and I get the trace in the previous message. Thanks!

The images it is failing on are multi-platform images built using buildx, so I'm not sure if that is causing an issue?

This right here is the issue. The lockbox command handles container image manifests in a certain way, and when you build and push an image with buildx it produces a different type of manifest by default (OCI media types rather than the classic Docker manifest format). That manifest type isn't handled properly by the lockbox command at the moment. In fact, our team is currently working on updating all of our tools to be able to process this newer type of image manifest, which is why you're seeing this issue.
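
If you want to confirm which manifest type a given image was pushed with, something like the following should show it (the image name here is just a placeholder). An image pushed with OCI media types typically reports application/vnd.oci.image.index.v1+json, while a classic Docker multi-arch image reports application/vnd.docker.distribution.manifest.list.v2+json:

# Either of these should print the top-level manifest and its mediaType
docker manifest inspect --verbose myorg/myimage:latest
docker buildx imagetools inspect myorg/myimage:latest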

So until our team completes this work you can try the following:

  • Rebuild your container images using docker build instead of buildx. This way the containers are built with the type of manifest that TorizonCore Builder knows how to process.
  • Or you can keep using buildx, but you need to add the argument --output="type=registry,oci-mediatypes=false". This should make buildx produce an image using the “good” type of manifest (see the sketch below).
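
As a rough example of the second option (the image name, tag, and build context are placeholders; adjust them to your project):

# builds for arm64 and pushes to the registry using Docker (non-OCI) media types
docker buildx build \
  --platform linux/arm64 \
  -t myorg/myimage:latest \
  --output="type=registry,oci-mediatypes=false" \
  .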

Then, once our team finishes that work, you can go back to using buildx normally.

Best Regards,
Jeremias

Thanks Jeremias! Good to know it's on your team's radar.

Glad I could help clear things up.