Docker-compose down with bridge networks

We are using a bridge network in our docker-compose file, like this:

services:
  mqtt_broker-mosquitto:
    image: eclipse-mosquitto
    networks:
      - mosquitto
    hostname: broker0
    ports:
      - 1883:1883
    volumes:
      - ./conf:/mosquitto/config
      - ./data:/mosquitto/data
      - ./log:/mosquitto/log
    restart: always

  app1:
    depends_on:
      - mqtt_broker-mosquitto
    networks:
      - mosquitto
    image: app1
    restart: always

networks:
  mosquitto:
    name: mosquitto
    driver: bridge

When an update to the docker-compose file is pushed, aktualizr tries to bring down the services using docker-compose down. Removing the network then fails because Docker reports that it still has active endpoints, even though our containers are stopped and removed before docker-compose tries to remove the network.

root@verdin-imx8mm-06944039:~# journalctl -u aktualizr-torizon -f
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[1598]: Updating containers via docker-compose
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[1598]: Running docker-compose down
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[1598]: Running command: /usr/bin/docker-compose --file /var/sota/storage/docker-compose/docker-compose.yml -p torizon down
...
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[4153]:  Network mosquitto  Removing
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[4153]:  Network mosquitto  Error
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[4153]: failed to remove network mosquitto: Error response from daemon: error while removing network: network mosquitto id 2eb853cad10eacd9e4c91eb3ec75dec044dd6d5ccbbb9685c7e4f90aa8a5b088 has active endpoints
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[1598]: docker-compose down of old image failed
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[1598]: Event: InstallTargetComplete, Result - Error
Nov 08 16:30:28 verdin-imx8mm-06944039 aktualizr-torizon[1598]: Event: AllInstallsComplete, Result - docker-compose:INSTALL_FAILED

The only way to use docker-compose down (even manually) with this bridge network is to use a workaround such as this:

docker kill $(docker ps -q)

We have a couple of ideas, but I was hoping you could suggest something better.

Is there a different way we should be configuring the bridge network so that docker-compose down will work as normal? Or is there some kind of setting in aktualizr that can be changed to accommodate this?

Greetings @sarah,

This seems to be a somewhat known issue on the Docker side.

From what I can tell, there seems to be a possible race condition of sorts: docker-compose down tries to remove networks before the connected containers have been fully stopped and removed. From other reports online this appears to be inconsistent, since it depends on how long it takes to stop and remove one's containers.

I tried to reproduce this myself but was not able to. I guess the container stack you have defined in your compose file has the right conditions for this race condition to occur.

Could you try the following? I see that in compose files you can set the stop grace period (stop_grace_period) for individual containers: Services top-level element | Docker Docs

This is the amount of time Docker will wait when stopping a container before killing it with SIGKILL. If you decrease this time, your containers will be killed more quickly, which may prevent the race condition from occurring.
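
As a rough sketch, it could look something like this in your compose file (the 2s value is just an illustration, pick whatever fits your app1 service):

services:
  app1:
    image: app1
    networks:
      - mosquitto
    restart: always
    # Wait at most 2 seconds for a graceful stop before Docker
    # sends SIGKILL (the default grace period is 10 seconds)
    stop_grace_period: 2s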

Alternatively, you can set which signal gets sent to your containers when they are stopped (stop_signal): Services top-level element | Docker Docs

You could set this to SIGKILL, which should be equivalent to the docker kill you're doing. Either way, the idea is to get your containers to stop faster and work around the race condition, assuming that is the issue.
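
That variant could look something like this (again just a sketch showing the relevant part of the app1 service):

services:
  app1:
    image: app1
    networks:
      - mosquitto
    restart: always
    # Skip the default SIGTERM and kill the container immediately,
    # similar to the manual docker kill workaround
    stop_signal: SIGKILL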

That said, since I can’t reproduce it I’m a bit limited in how much I can investigate this. It would be of great help if I could use your container stack to try and reproduce it myself, assuming you can share this.

Best Regards,
Jeremias

Thanks, I’ll try this! We are using containers that are not publicly available, but they are based on python:3.12-slim.

Please let us know how this works out for you. It would be good to know more about this issue for our future reference.

Unfortunately, to reproduce this I would probably need your container stack exactly as you have it. If we're going off my theory of a possible race condition, then the timing of the containers would need to be just right, which would be hard to simulate manually.

Best Regards,
Jeremias