TorizonCore Builder in Azure Pipeline

Dear Toradex,
In our Azure DevOps pipeline we want to pre-provision our Docker containers into our image. To do that, we want to integrate the TorizonCore Builder tool into the pipeline. It seems to work most of the time, but sometimes I run into the following error:

2024-08-26T10:36:13.2074689Z Starting DIND container
2024-08-26T10:36:13.2075125Z Using Docker host "tcp://127.0.0.1:22376"
2024-08-26T10:36:16.4930349Z Connecting to Docker Daemon at "tcp://127.0.0.1:22376"
2024-08-26T10:36:16.4942490Z Stopping DIND container
2024-08-26T10:36:17.6076582Z An unexpected Exception occurred. Please provide the following stack trace to
2024-08-26T10:36:17.6076932Z the Toradex TorizonCore support team:
2024-08-26T10:36:17.6076995Z 
2024-08-26T10:36:17.6077034Z 
2024-08-26T10:36:17.6193184Z Traceback (most recent call last):
2024-08-26T10:36:17.6193922Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connection.py", line 198, in _new_conn
2024-08-26T10:36:17.6194501Z     sock = connection.create_connection(
2024-08-26T10:36:17.6194850Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/util/connection.py", line 85, in create_connection
2024-08-26T10:36:17.6195050Z     raise err
2024-08-26T10:36:17.6195297Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/util/connection.py", line 73, in create_connection
2024-08-26T10:36:17.6195461Z     sock.connect(sa)
2024-08-26T10:36:17.6195604Z ConnectionRefusedError: [Errno 111] Connection refused
2024-08-26T10:36:17.6195681Z 
2024-08-26T10:36:17.6195862Z The above exception was the direct cause of the following exception:
2024-08-26T10:36:17.6195946Z 
2024-08-26T10:36:17.6202527Z Traceback (most recent call last):
2024-08-26T10:36:17.6203246Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 793, in urlopen
2024-08-26T10:36:17.6203570Z     response = self._make_request(
2024-08-26T10:36:17.6203999Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 491, in _make_request
2024-08-26T10:36:17.6204267Z     raise new_e
2024-08-26T10:36:17.6204663Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 467, in _make_request
2024-08-26T10:36:17.6205002Z     self._validate_conn(conn)
2024-08-26T10:36:17.6205462Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
2024-08-26T10:36:17.6205718Z     conn.connect()
2024-08-26T10:36:17.6206148Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connection.py", line 616, in connect
2024-08-26T10:36:17.6206429Z     self.sock = sock = self._new_conn()
2024-08-26T10:36:17.6206794Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connection.py", line 213, in _new_conn
2024-08-26T10:36:17.6206944Z     raise NewConnectionError(
2024-08-26T10:36:17.6207159Z urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fe9cc910f10>: Failed to establish a new connection: [Errno 111] Connection refused
2024-08-26T10:36:17.6207299Z 
2024-08-26T10:36:17.6207423Z The above exception was the direct cause of the following exception:
2024-08-26T10:36:17.6207489Z 
2024-08-26T10:36:17.6207579Z Traceback (most recent call last):
2024-08-26T10:36:17.6207800Z   File "/usr/local/lib/python3.9/dist-packages/requests/adapters.py", line 486, in send
2024-08-26T10:36:17.6207921Z     resp = conn.urlopen(
2024-08-26T10:36:17.6208108Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 847, in urlopen
2024-08-26T10:36:17.6208231Z     retries = retries.increment(
2024-08-26T10:36:17.6208425Z   File "/usr/local/lib/python3.9/dist-packages/urllib3/util/retry.py", line 515, in increment
2024-08-26T10:36:17.6208627Z     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
2024-08-26T10:36:17.6209034Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='127.0.0.1', port=22376): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe9cc910f10>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-08-26T10:36:17.6209211Z 
2024-08-26T10:36:17.6209356Z During handling of the above exception, another exception occurred:
2024-08-26T10:36:17.6209466Z 
2024-08-26T10:36:17.6209605Z Traceback (most recent call last):
2024-08-26T10:36:17.6209979Z   File "/usr/local/lib/python3.9/dist-packages/docker/api/client.py", line 214, in _retrieve_server_version
2024-08-26T10:36:17.6210229Z     return self.version(api_version=False)["ApiVersion"]
2024-08-26T10:36:17.6210533Z   File "/usr/local/lib/python3.9/dist-packages/docker/api/daemon.py", line 181, in version
2024-08-26T10:36:17.6210672Z     return self._result(self._get(url), json=True)
2024-08-26T10:36:17.6210866Z   File "/usr/local/lib/python3.9/dist-packages/docker/utils/decorators.py", line 46, in inner
2024-08-26T10:36:17.6210994Z     return f(self, *args, **kwargs)
2024-08-26T10:36:17.6211175Z   File "/usr/local/lib/python3.9/dist-packages/docker/api/client.py", line 237, in _get
2024-08-26T10:36:17.6211304Z     return self.get(url, **self._set_request_timeout(kwargs))
2024-08-26T10:36:17.6211509Z   File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 602, in get
2024-08-26T10:36:17.6211630Z     return self.request("GET", url, **kwargs)
2024-08-26T10:36:17.6211827Z   File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 589, in request
2024-08-26T10:36:17.6211949Z     resp = self.send(prep, **send_kwargs)
2024-08-26T10:36:17.6212146Z   File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 703, in send
2024-08-26T10:36:17.6212263Z     r = adapter.send(request, **kwargs)
2024-08-26T10:36:17.6212446Z   File "/usr/local/lib/python3.9/dist-packages/requests/adapters.py", line 519, in send
2024-08-26T10:36:17.6212576Z     raise ConnectionError(e, request=request)
2024-08-26T10:36:17.6212952Z requests.exceptions.ConnectionError: HTTPSConnectionPool(host='127.0.0.1', port=22376): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe9cc910f10>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-08-26T10:36:17.6213161Z 
2024-08-26T10:36:17.6213258Z During handling of the above exception, another exception occurred:
2024-08-26T10:36:17.6213316Z 
2024-08-26T10:36:17.6213459Z Traceback (most recent call last):
2024-08-26T10:36:17.6213864Z   File "/builder/torizoncore-builder", line 222, in <module>
2024-08-26T10:36:17.6214059Z     mainargs.func(mainargs)
2024-08-26T10:36:17.6214251Z   File "/builder/tcbuilder/cli/bundle.py", line 97, in do_bundle
2024-08-26T10:36:17.6214461Z     bundle(bundle_dir=args.bundle_directory,
2024-08-26T10:36:17.6214617Z   File "/builder/tcbuilder/cli/bundle.py", line 49, in bundle
2024-08-26T10:36:17.6214729Z     bundle_be.download_containers_by_compose_file(
2024-08-26T10:36:17.6214864Z   File "/builder/tcbuilder/backend/bundle.py", line 556, in download_containers_by_compose_file
2024-08-26T10:36:17.6214984Z     dind_client = manager.get_client()
2024-08-26T10:36:17.6215102Z   File "/builder/tcbuilder/backend/bundle.py", line 321, in get_client
2024-08-26T10:36:17.6215256Z     dind_client = docker.DockerClient(base_url=self.docker_host, tls=tls_config)
2024-08-26T10:36:17.6215469Z   File "/usr/local/lib/python3.9/dist-packages/docker/client.py", line 45, in __init__
2024-08-26T10:36:17.6215592Z     self.api = APIClient(*args, **kwargs)
2024-08-26T10:36:17.6215779Z   File "/usr/local/lib/python3.9/dist-packages/docker/api/client.py", line 197, in __init__
2024-08-26T10:36:17.6215916Z     self._version = self._retrieve_server_version()
2024-08-26T10:36:17.6216123Z   File "/usr/local/lib/python3.9/dist-packages/docker/api/client.py", line 221, in _retrieve_server_version
2024-08-26T10:36:17.6216251Z     raise DockerException(
2024-08-26T10:36:17.6216661Z docker.errors.DockerException: Error while fetching server API version: HTTPSConnectionPool(host='127.0.0.1', port=22376): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe9cc910f10>: Failed to establish a new connection: [Errno 111] Connection refused'))

The pipeline script looks like this:

 shopt -s expand_aliases # enable alias expansion in the non-interactive shell
 source tcb-env-setup.sh -a remote
 eval $(alias torizoncore-builder | sed "s/-it /-i --privileged /") # replace -it with -i in the alias (no interactive TTY) and add privileged mode
 torizoncore-builder bundle --platform=linux/arm/v7 docker-compose.yml --bundle-directory bundle --dind-param="--privileged" --login-to docker.io * *

Hi, I’m the Toradex AI Assistant. I’m in early beta, and I would appreciate your feedback.
The Toradex Support team will follow up with you shortly!

The error you’re encountering seems to be related to the Docker-in-Docker (DIND) setup within your Azure DevOps pipeline. The Connection refused error typically indicates that the DIND container is not starting correctly or that the Docker daemon inside the DIND container is not accessible.

Here are a few steps to troubleshoot and potentially resolve this issue:

  1. Ensure DIND Container Starts Properly:
    Make sure that the DIND container is starting correctly and that the Docker daemon inside it is running. You can add some debug statements to check the status of the DIND container and the Docker daemon.

  2. Increase Timeout:
    Sometimes, the Docker daemon inside the DIND container might take a bit longer to start. You can try increasing the timeout for the connection attempts.

  3. Check Docker Daemon Logs:
    Inspect the logs of the DIND container to see if there are any errors or issues with the Docker daemon startup. This can provide more insight into why the connection is being refused.

  4. Modify Pipeline Script:
    Ensure that the DIND container is given enough privileges and resources. Your current script already includes --privileged, but you might want to ensure that other necessary parameters are also included.

Here is an updated version of your pipeline script with additional debug steps:

shopt -s expand_aliases # enable aliases
source tcb-env-setup.sh -a remote

# Start DIND container with additional debug and timeout
docker run --privileged --name dind-container -d docker:19.03.12-dind
sleep 10 # Wait for the Docker daemon to start

# Check if the Docker daemon is running
if ! docker exec dind-container docker info; then
    echo "Docker daemon inside DIND container is not running."
    exit 1
fi

# Modify alias to remove interactive mode and add privileged mode
eval $(alias torizoncore-builder | sed "s/-it /-i --privileged /")

# Run the TorizonCore Builder command
torizoncore-builder bundle --platform=linux/arm/v7 docker-compose.yml --bundle-directory bundle --dind-param="--privileged" --login-to docker.io

# Stop and remove the DIND container
docker stop dind-container
docker rm dind-container

This script includes a check to ensure that the Docker daemon inside the DIND container is running before proceeding with the `torizon
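A fixed `sleep 10` can be too short on a slow build agent and wasted time on a fast one. As a minimal sketch, assuming bash, the wait could be replaced by polling until the daemon answers. `wait_for` is a hypothetical helper name introduced here for illustration (it is not part of Docker or TorizonCore Builder), and `dind-container` is the container name from the script above.

```shell
#!/usr/bin/env bash

# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds (returns 0) or TIMEOUT_SECONDS have elapsed (returns 1).
wait_for() {
    local timeout=$1
    shift
    # SECONDS is a bash built-in counter of seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    until "$@" >/dev/null 2>&1; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Possible use in the script above, in place of "sleep 10":
#   wait_for 30 docker exec dind-container docker info || {
#       echo "Docker daemon inside DIND container did not come up in time." >&2
#       exit 1
#   }
```

The 30-second limit is an arbitrary choice for illustration. This only covers a container started by the pipeline script itself.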

I got it working again by removing the --privileged flags; no idea why.

Hello @Casper,

Thanks for the update.

Is there any reason the container needs to be run as privileged?

Best Regards,
Bruno

Hello Bruno,

I added those after I got the same error before. Adding them seemed to fix the issue, but it suddenly stopped working again after an unrelated change to the pipeline. Is there any way I can see more debug info from TorizonCore Builder?

I don't see anything obvious from --log-level DEBUG. It seems to work fine now that I removed --dind-param="--privileged", so for now it is solved.

Hi @Casper
Your usage of TorizonCore Builder in an Azure pipeline is interesting to me too.
Can I ask how you set up the Azure DevOps .yaml file for the pipeline?

Thanks

this is basically it:

    steps:
    - script: |
        cd $TORIZON_CORE_BUILDER_DIR
        wget https://raw.githubusercontent.com/toradex/tcb-env-setup/master/tcb-env-setup.sh
        shopt -s expand_aliases # enable alias expansion in the non-interactive shell
        source tcb-env-setup.sh -a remote -n
        eval $(alias torizoncore-builder | sed "s/-it /-i /") # replace -it with -i in the alias (no interactive TTY)
        torizoncore-builder -v

Download the script, use shopt to enable alias expansion (the script defines torizoncore-builder as an alias), source the script, then edit the alias so it works without an interactive shell.
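To illustrate the alias steps in isolation, here is a minimal bash sketch that needs neither Docker nor TorizonCore Builder. The alias `demo-tool` and the image name `demo-image` are hypothetical stand-ins; the real alias is whatever tcb-env-setup.sh defines, and its exact contents are not shown in this thread.

```shell
#!/usr/bin/env bash

# Non-interactive bash (such as a pipeline step) does not expand aliases
# unless this option is set.
shopt -s expand_aliases

# Hypothetical stand-in for the alias created by tcb-env-setup.sh.
# "echo" is used so the resulting command line is printed, not executed.
alias demo-tool='echo docker run --rm -it demo-image'

# "alias demo-tool" prints the definition in reusable form; sed swaps
# "-it " for "-i " and eval redefines the alias with the edited text.
eval "$(alias demo-tool | sed 's/-it /-i /')"

# The alias now expands without the TTY flag.
demo-tool --version
# prints: docker run --rm -i demo-image --version
```

The alias has to be redefined on an earlier line than the one that uses it, because bash expands aliases when it reads a line, not when it runs it.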

But did you download the TCB container in Ubuntu in the pipeline in advance?

sourcing tcb-env-setup.sh sets everything up

Hello @Casper,

As you did, using the --log-level flag will give more information on what TorizonCore Builder is doing.

It is good to know that just removing the --privileged flag resolved the problem.
If you run into other issues, please feel free to create another topic on the Community.

Best Regards,
Bruno