Torizoncore-builder deploy timeout

Hello Toradex team,

I hope you are all doing great!

Hardware:

uname:

  • Linux verdin-imx8mp-XXXXXXX 5.15.148-6.8.0+git.8c5c2dcbf6ba #1-TorizonCore SMP PREEMPT Tue Aug 6 10:01:59 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Images tested:

  • torizon-core-docker-verdin-imx8mp-Tezi_6.8.4+build.40.tar (STABLE Release)

Guest OS:

  • macOS (M1 Pro ARM64)

  • Linux ubuntu (VM x86_64)

Issue:

I wanted to push a new update using the deploy command of torizoncore-builder, but it always times out (see trace below).

I tried two different ways:

  • Directly connected to my Mac with an ethernet cable.
  • Connecting the verdin via WiFi and doing the same command.

In both cases it times out after 90 seconds (?).

After further investigation, I can see that the code has changed recently:

Also interesting, the variable REMOTE_CMD_TIMEOUT = 30 but the error says: invoke.exceptions.CommandTimedOut: Command did not complete within 90 seconds!

I hope you will be able to help me find a solution quickly.

Best regards,

M

Trace:

$ torizoncore-builder deploy --remote-host 192.168.2.3 --remote-username torizon --remote-password 'password' --reboot
WARNING: Beware that artifacts not managed by OSTree (e.g. bootloader, container images, platform provisioning data, fuse values, U-Boot environment) will not be deployed by this operation.
Pulling OSTree with ref base (checksum d6eead88f5fd9965c75efe463d5d11b9ac7acd04818d6176efd069e51067edb1) from local archive repository...
Starting http server to serve OSTree.
OSTree server listening on "localhost:38707".
7da2d07a16cb9f4d1d1db46e21af494342baaf12d5932defbf4224b59f8d67c7
'verdin-imx8mp'
Starting OSTree pull on the device...
An unexpected Exception occurred. Please provide the following stack trace to
the Toradex TorizonCore support team:


Exception in thread Thread-548:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/builder/tcbuilder/backend/rforward.py", line 53, in handler
    chan.send(data)
  File "/usr/lib/python3/dist-packages/paramiko/channel.py", line 801, in send
    return self._send(s, m)
  File "/usr/lib/python3/dist-packages/paramiko/channel.py", line 1198, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 52084)
Traceback (most recent call last):
  File "/builder/torizoncore-builder", line 232, in <module>
    mainargs.func(mainargs)
  File "/builder/tcbuilder/cli/deploy.py", line 234, in do_deploy
    do_deploy_ostree_remote(args)
  File "/builder/tcbuilder/cli/deploy.py", line 197, in do_deploy_ostree_remote
    deploy_ostree_remote(storage_dir=args.storage_directory,
  File "/builder/tcbuilder/cli/deploy.py", line 190, in deploy_ostree_remote
    dbe.deploy_ostree_remote(remote_host, remote_username, remote_password,
  File "/builder/tcbuilder/backend/deploy.py", line 560, in deploy_ostree_remote
    run_command_with_sudo(
  File "/builder/tcbuilder/backend/deploy.py", line 435, in run_command_with_sudo
    result = ssh_connection.sudo(command, pty=True, hide=True, timeout=REMOTE_CMD_TIMEOUT)
  File "/usr/local/lib/python3.9/dist-packages/decorator.py", line 235, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.9/dist-packages/fabric/connection.py", line 23, in opens
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/fabric/connection.py", line 777, in sudo
    return self._sudo(self._remote_runner(), command, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/invoke/context.py", line 232, in _sudo
    return runner.run(cmd_str, watchers=watchers, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/fabric/runners.py", line 83, in run
    return super().run(command, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/invoke/runners.py", line 395, in run
    return self._run_body(command, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/invoke/runners.py", line 451, in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
  File "/usr/local/lib/python3.9/dist-packages/invoke/runners.py", line 516, in _finish
    raise CommandTimedOut(result, timeout=timeout)
invoke.exceptions.CommandTimedOut: Command did not complete within 90 seconds!

Command: "sudo -S -p '[sudo] password: ' ostree pull tcbuilder:d6eead88f5fd9965c75efe463d5d11b9ac7acd04818d6176efd069e51067edb1"

Stdout:

[sudo] password:
Receiving objects: 26% (527/1960) 708.2 kB/s 63.0 MB

Stderr: n/a (PTYs have no stderr)

On the device:

Dec 16 09:57:56 verdin-imx8mp-XXXXXX systemd[1]: Started OpenSSH Per-Connection Daemon (192.168.129.138:58338).
Dec 16 09:57:56 verdin-imx8mp-XXXXXX sshd[2179]: Accepted password for torizon from 192.168.129.138 port 58338 ssh2
Dec 16 09:57:56 verdin-imx8mp-XXXXXX sshd[2179]: pam_unix(sshd:session): session opened for user torizon(uid=1000) by (uid=0)
Dec 16 09:57:56 verdin-imx8mp-XXXXXX systemd-logind[698]: New session c5 of user torizon.
Dec 16 09:57:56 verdin-imx8mp-XXXXXX systemd[1]: Started Session c5 of User torizon.
Dec 16 09:57:56 verdin-imx8mp-XXXXXX audit[2179]: SYSCALL arch=c00000b7 syscall=64 success=yes exit=4 a0=7 a1=ffffe7413220 a2=4 a3=ffffa2a58920 items=0 ppid=1 pid=2179 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=6 comm="sshd" exe="/usr/sbin/sshd" key=(null)
Dec 16 09:57:56 verdin-imx8mp-XXXXXX audit: PROCTITLE proctitle=737368643A20746F72697A6F6E205B707269765D
Dec 16 09:57:56 verdin-imx8mp-XXXXXX kernel: audit: type=1006 audit(1765875476.924:88): pid=2179 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=6 res=1
Dec 16 09:57:56 verdin-imx8mp-XXXXXX kernel: audit: type=1300 audit(1765875476.924:88): arch=c00000b7 syscall=64 success=yes exit=4 a0=7 a1=ffffe7413220 a2=4 a3=ffffa2a58920 items=0 ppid=1 pid=2179 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=6 comm="sshd" exe="/usr/sbin/sshd" key=(null)
Dec 16 09:57:56 verdin-imx8mp-XXXXXX kernel: audit: type=1327 audit(1765875476.924:88): proctitle=737368643A20746F72697A6F6E205B707269765D
Dec 16 09:57:57 verdin-imx8mp-XXXXXX sudo[2186]:  torizon : TTY=pts/0 ; PWD=/var/rootdirs/home/torizon ; USER=root ; COMMAND=/usr/bin/ostree remote add --no-gpg-verify --force tcbuilder http://localhost:34667/
Dec 16 09:57:57 verdin-imx8mp-XXXXXX sudo[2189]:  torizon : TTY=pts/0 ; PWD=/var/rootdirs/home/torizon ; USER=root ; COMMAND=/usr/bin/ostree pull tcbuilder:d6eead88f5fd9965c75efe463d5d11b9ac7acd04818d6176efd069e51067edb1
Dec 16 09:59:27 verdin-imx8mp-XXXXXX sshd[2179]: pam_unix(sshd:session): session closed for user torizon
Dec 16 09:59:27 verdin-imx8mp-XXXXXX systemd[1]: sshd@4-192.168.129.224:22-192.168.129.138:58338.service: Deactivated successfully.
Dec 16 09:59:27 verdin-imx8mp-XXXXXX systemd[1]: session-c5.scope: Deactivated successfully.
Dec 16 09:59:27 verdin-imx8mp-XXXXXX systemd-logind[698]: Removed session c5.

Hi, I’m the Toradex AI Assistant.

I might make mistakes or occasionally get you wrong. Our support team will step in if needed.


TorizonCore Builder Deploy Timeout Issue

Based on your error trace and the GitHub commit you referenced, you’re encountering a known timeout issue with the deploy command in TorizonCore Builder. The error occurs during the OSTree pull operation, where the command times out after 90 seconds while downloading objects.

Understanding the Issue

The timeout is happening during the OSTree pull operation where the system is downloading repository objects(1). As shown in your trace, the download was progressing (Receiving objects: 26% (527/1960) 708.2 kB/s 63.0 MB) but didn’t complete within the timeout window.
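To put a rough number on that, the progress line from the trace can be extrapolated. This is only a sketch under loud assumptions: the regular expression is inferred from the single sample line in the trace (not from the OSTree documentation), the remaining objects are assumed to have a similar average size, and the transfer rate is assumed to stay constant. The function name estimate_total_seconds is hypothetical.

```python
import re

# Progress line copied from the Stdout section of the trace.
PROGRESS_LINE = "Receiving objects: 26% (527/1960) 708.2 kB/s 63.0 MB"

# Assumed line shape, inferred from this one sample only.
PATTERN = re.compile(
    r"Receiving objects:\s+(?P<pct>\d+)%\s+\((?P<done>\d+)/(?P<total>\d+)\)\s+"
    r"(?P<rate>[\d.]+)\s+kB/s\s+(?P<mb>[\d.]+)\s+MB"
)


def estimate_total_seconds(line: str) -> float:
    """Extrapolate the total pull time from one progress line.

    Assumes similar average object size and a constant transfer rate.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized progress line: {line!r}")
    done = int(match["done"])
    total = int(match["total"])
    rate_kb_s = float(match["rate"])
    received_mb = float(match["mb"])
    # Scale the data received so far by the fraction of objects fetched.
    estimated_total_mb = received_mb * total / done
    # Treat 1 MB as 1000 kB; the exact unit convention is an assumption.
    return estimated_total_mb * 1000.0 / rate_kb_s


if __name__ == "__main__":
    seconds = estimate_total_seconds(PROGRESS_LINE)
    print(f"estimated total pull time: {seconds:.0f} s")
```

Under these assumptions the estimate comes out at several minutes, well above the 90-second limit, which would be consistent with a pull that was still healthy when it was cut off.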

Alternative Deployment Methods

Since the deploy command is experiencing issues, you can use alternative deployment methods that are documented:

1. Toradex Easy Installer

You can deploy your custom image using Toradex Easy Installer(2):

  • Copy the generated folder to a USB stick
  • Load Toradex Easy Installer on your device
  • Flash your image through the Easy Installer interface

2. Torizon Cloud

For remote deployment, Torizon Cloud provides a more robust solution for deploying images to one or more devices(2).

Workaround for Direct SSH Deployment

If you need to continue using SSH deployment, consider these approaches:

  1. Check Network Stability: Ensure your network connection between the host and device is stable, as the download was progressing but interrupted.

  2. Verify Available Space: Ensure your device has sufficient storage space for the OSTree repository objects(1).

  3. Manual OSTree Operations: You could potentially perform the OSTree operations manually on the device if needed, though this requires careful handling of the OSTree repository structure(1).

Important Note on OSTree Operations

The documentation emphasizes that invoking OSTree commands directly on a Torizon device can break the system(1). Any direct OSTree manipulation should only be done on debug devices and with specific guidance from Toradex support.

I recommend using Toradex Easy Installer as the most reliable deployment method for your custom image, especially given the timeout issues you’re experiencing with the deploy command.

Best regards,
Toradex AI Assistant.


Did you like my answer? :+1: :-1:

If I answered your question, please tick the “Solution” checkbox.

For follow-up questions to me, please use @ToradexAI in your post.

Since it’s a known issue, here is my solution:

$ source tcb-env-setup.sh -t 3.13.0

Best,

M

Thank You for Sharing the Solution!

Thank you for sharing your solution! Using a specific version of TorizonCore Builder is indeed a valid approach to resolve version-specific issues.

Your Solution Explained

Your solution uses the tcb-env-setup script with the -t flag to specify version 3.13.0 (1):

$ source tcb-env-setup.sh -t 3.13.0

This approach uses the tag mode of the setup script, which automatically pulls the specified version from the online registry (1). The -t flag lets you select any available version tag from the TorizonCore Builder Docker registry (1).

Alternative Version Selection

For reference, the setup script also supports other modes (1):

  • Auto remote mode: source tcb-env-setup.sh -a remote (uses latest online version)
  • Auto local mode: source tcb-env-setup.sh -a local (uses latest local version)

Important Reminder

Remember that you’ll need to source the setup script every time you start a new terminal session, as the torizoncore-builder command is not retained between sessions (1).

This is a practical workaround that addresses the timeout issues you encountered with the newer versions while the underlying issue is being resolved.

Best regards,
Toradex AI Assistant.



Greetings @unablesalt,

Just to confirm, are you saying that this commit broke things for you on 6.8.4? That seems strange, since it was supposed to fix an issue with the deploy command that started happening with Torizon OS 6.8.3.

Also interesting, the variable REMOTE_CMD_TIMEOUT = 30 but the error says: invoke.exceptions.CommandTimedOut: Command did not complete within 90 seconds!

This timeout was later bumped to 90s in a follow-up commit: backend/common: Increase timeout for remote commands sent via SSH · torizon/torizoncore-builder@9738485 · GitHub
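The traceback shows the remote command being run with timeout=REMOTE_CMD_TIMEOUT, so the deadline applies to the whole command regardless of whether it is still making progress. The sketch below illustrates that behaviour with a standard-library analogy (subprocess.run with a timeout); it is not the Fabric/Invoke code path that TorizonCore Builder uses, and the constants and the function name run_with_timeout are hypothetical, scaled down so the demo finishes quickly.

```python
import subprocess
import sys

# Hypothetical, scaled-down stand-ins for illustration only.
DEMO_TIMEOUT_S = 0.5  # plays the role of the fixed remote command timeout
DEMO_WORK_S = 5       # plays the role of a pull that needs more time


def run_with_timeout(work_seconds: float, timeout_seconds: float) -> bool:
    """Return True if the child finished in time, False if it was cut off."""
    # The child process stands in for a long-running remote command.
    child = [sys.executable, "-c", f"import time; time.sleep({work_seconds})"]
    try:
        subprocess.run(child, timeout=timeout_seconds, check=True)
    except subprocess.TimeoutExpired:
        # The child was still working normally; only the deadline expired.
        return False
    return True


if __name__ == "__main__":
    finished = run_with_timeout(DEMO_WORK_S, DEMO_TIMEOUT_S)
    print("finished in time" if finished else "cut off by the timeout")
```

The point of the analogy is that a fixed deadline cannot tell a stalled command from a slow but healthy one, which is why the actual duration of the pull matters here.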

Is it possible that your deploy command is genuinely taking longer than 90s? When you “fix” the problem by going to a past version of TorizonCore Builder, does the deploy command take longer than 90s to complete?

Do you have a lot of changes to deploy and that’s why it takes a while? Or maybe the network connection between your host PC and the device is a little slow?

Best Regards,
Jeremias

Hi @jeremias.tx,

Thank you for your reply!

It’s entirely possible that it takes more than 90s. torizoncore-builder only runs on Linux/Windows x86 and I’m on an ARM Mac, so I had to create a VM to run all the commands, which could explain why everything is always slow.

For the network part, the device is plugged directly into my Mac through a 2.5G adapter and a CAT6 cable, so I can assume that the problem doesn’t lie in the wiring.

I don’t have any changes to deploy apart from the bump from 6.8.2 → 6.8.4 (from what I saw in the log, 182MB compressed).

I finally decided to push the update using the Torizon platform.

Best regards,
M

The process taking longer than 90s would be the most reasonable explanation for why you’re seeing this issue.

If it’s not too much trouble to ask, we would appreciate it if you could confirm whether it really does take longer than 90s to perform the deploy in your setup. It would help us at least know if that is the issue or if the issue is some other underlying bug/issues that we would need to address.

Best Regards,
Jeremias
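
One possible way to check the duration from the device side is to compare journal timestamps: the sudo entry for "ostree pull" marks the start, and the "session closed" entry marks the end of the SSH session. The sketch below does this for the two lines already posted in this thread. It assumes both lines fall on the same day, that the syslog-style timestamp occupies the first 15 characters, and that month names are in English; the function name elapsed_seconds is hypothetical.

```python
from datetime import datetime

# Two lines copied from the device log posted above (hostname redacted as in the post).
PULL_STARTED = (
    "Dec 16 09:57:57 verdin-imx8mp-XXXXXX sudo[2189]:  torizon : TTY=pts/0 ; "
    "PWD=/var/rootdirs/home/torizon ; USER=root ; COMMAND=/usr/bin/ostree pull "
    "tcbuilder:d6eead88f5fd9965c75efe463d5d11b9ac7acd04818d6176efd069e51067edb1"
)
SESSION_CLOSED = (
    "Dec 16 09:59:27 verdin-imx8mp-XXXXXX sshd[2179]: "
    "pam_unix(sshd:session): session closed for user torizon"
)

# Assumed timestamp layout: "Mon DD HH:MM:SS" in the first 15 characters.
TIMESTAMP_FORMAT = "%b %d %H:%M:%S"
TIMESTAMP_LENGTH = 15


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between two journal lines (same day assumed, no year in the log)."""
    start = datetime.strptime(start_line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    end = datetime.strptime(end_line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(f"pull ran for {elapsed_seconds(PULL_STARTED, SESSION_CLOSED):.0f} s")
```

For the two lines from the original post, the difference is exactly 90 seconds, which matches the timeout rather than a natural completion. Running the same comparison on the journal of a deploy that succeeds (for example with the older TorizonCore Builder version) would show how long the pull really takes in this setup, assuming the journal logs the same two entries.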