Torizon aktualizr docker prune -a after the OTA

Hi,

I have encountered a peculiar issue. We have the following Docker Compose service setup (rough example):

services:
  service1:
    depends_on:
      setup_service:
        condition: service_completed_successfully


  service2:
    depends_on:
      setup_service:
        condition: service_completed_successfully


  service3:
    depends_on:
      setup_service:
        condition: service_completed_successfully


  setup_service:
    ...

The setup service performs some crucial setup steps, and the other services must not start before it completes.

The issue is that after an OTA update through Torizon Cloud, the aktualizr service brings up all the services with docker-compose up and, once all services are up, removes all stopped containers and dangling images with docker system prune -a. However, this also inadvertently removes the setup_service container, because by the time all the other containers are running it has already exited, and so Docker Compose has to pull that image again after a reboot.

So is the only way around this to change the dependency condition to service_healthy and implement a healthcheck that confirms setup is complete, while keeping the setup container running idle?

Is using the service_completed_successfully dependency so rare that no one else has encountered this issue on Torizon OS? Or am I missing something?

I’ve seen some older posts reporting issues with aktualizr using hard-coded docker-compose commands. Would it be possible to make them configurable? For some unrelated functionality I was contemplating using Docker Compose profiles, but again, because of aktualizr’s hard-coded docker compose up commands, I had to abandon that idea.

Hi, I’m the Toradex AI Assistant.

I might make mistakes or occasionally misunderstand you. Our support team will step in if needed.


Docker Compose Dependency Issue with Torizon OS OTA Updates

Understanding the Issue

You’ve encountered a known behavior with how Aktualizr handles Docker Compose deployments during OTA updates (1). The issue stems from the automatic cleanup process that removes stopped containers, which inadvertently affects containers using the service_completed_successfully dependency condition (1).

Current Aktualizr Behavior

The Docker integrity checker service in Torizon OS performs cleanup operations after container deployments (1). This service checks containers defined in /var/sota/storage/docker-compose/docker-compose.yml and can remove what it considers unused or stopped containers (1).

Recommended Solutions

Option 1: Use service_healthy with Health Checks

Your suggested approach is actually the recommended solution. Implement a health check that keeps the setup container running while indicating completion status:

setup_service:
  # ... your service configuration
  healthcheck:
    test: ["CMD", "test", "-f", "/tmp/setup_complete"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 60s

(1)
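
For this to work, the setup container has to keep running after the setup logic finishes, and it has to create the file the health check probes. A minimal sketch of such an entrypoint, assuming your setup logic lives in a hypothetical /opt/setup.sh:

setup_service:
  # Run the setup steps, create the marker file the healthcheck
  # looks for, then idle so the container never reaches the
  # "stopped" state that docker system prune -a acts on
  command: sh -c "/opt/setup.sh && touch /tmp/setup_complete && sleep infinity"

Note that sleep infinity requires GNU coreutils; on BusyBox-based images, an idle command such as tail -f /dev/null works instead.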

Then modify your dependencies:

services:
  service1:
    depends_on:
      setup_service:
        condition: service_healthy

Option 2: Container Resource Management

You can also set container resource limits to keep resource allocation predictable while the stack is brought up (1). This does not by itself prevent the prune from removing a stopped container, but it helps maintain container stability during the update process.
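
As an illustration, per-service limits in a Compose file might look like the sketch below; the 256m and 0.5 values are placeholder assumptions, not recommendations:

service1:
  # Cap memory and CPU so a single container cannot starve the
  # others while everything is brought up after an update
  mem_limit: 256m
  cpus: 0.5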

Docker Container Health Monitoring

Torizon OS includes a Docker container health monitor that can help manage the container lifecycle (1). To enable monitoring for your containers:

  1. Label containers for monitoring:
labels:
  - autoheal=true

(1)

  2. Enable the docker-watchdog service:
# sudo systemctl enable docker-watchdog.service

(1)
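
In a Compose file, the label goes on each long-running service you want supervised. Since the watchdog acts on the container's health status, the service also needs its own healthcheck; the sketch below uses a hypothetical HTTP endpoint as the probe and assumes curl is available in the image:

service1:
  labels:
    - autoheal=true
  healthcheck:
    # Hypothetical endpoint - replace with a real liveness probe
    test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    interval: 30s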

Alternative Approaches

Controlling Update Behavior

You can control when updates are applied using Aktualizr’s locking mechanism (2). This allows your application to prevent updates during critical operations:

# sudo flock --verbose -x /run/lock/aktualizr-lock -c "sleep 30"

(2)
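
If the critical operation runs inside a container, one way to take the same lock from within the service is to share the host's /run/lock directory with it. A sketch, where /opt/critical-task.sh is a hypothetical script and flock is assumed to be available in the image:

critical_service:
  volumes:
    # Share the host lock directory so flock operates on the same file
    - /run/lock:/run/lock
  command: sh -c "flock -x /run/lock/aktualizr-lock -c /opt/critical-task.sh"

While the lock is held, update installation is deferred until it is released.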

D-Bus Control (Torizon OS 7.4.0+)

For more granular control, you can use Aktualizr’s D-Bus interface to manage update behavior (2):

# busctl call org.uptane.Aktualizr /org/uptane/aktualizr org.uptane.Aktualizr CheckForUpdates

(2)

Best Practices

  1. Use health checks for setup containers that need to remain available
  2. Implement proper container labeling for monitoring
  3. Consider container resource limits to prevent resource conflicts
  4. Test your dependency chains thoroughly before deployment

Your observation about service_completed_successfully is valid - this dependency condition can be problematic with Torizon’s cleanup processes. The health check approach you suggested is indeed the most reliable solution for this use case.

Best regards, Toradex AI Assistant.


Did you like my answer? :+1: :-1:

If I answered your question, please tick the “Solution” checkbox.

For follow-up questions to me, please use @ToradexAI in your post.

Greetings @stasgil,

This is a known limitation at the moment. We have a ticket in our backlog to review and possibly improve the behavior, but it’s not prioritized for the near future: as far as we know, it simply doesn’t cause much of an issue for the majority of our customers.

To work around this, you can do as you said: change the dependency conditions and add an artificial delay to your setup container. This is also what we suggest on the issue here: Torizon OS Issue Tracker

In theory, you would just need to keep the setup container running for maybe 30 seconds to 1 minute at most. This should ensure the container is still running when the docker prune command is executed, as in the sketch below.
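
Combined with the service_healthy dependency condition, the workaround might look roughly like this sketch (/opt/setup.sh and the 60-second delay are placeholder assumptions):

setup_service:
  # Run setup, mark completion for the healthcheck, then stay
  # alive long enough to outlive the post-update docker prune
  command: sh -c "/opt/setup.sh && touch /tmp/setup_complete && sleep 60"
  healthcheck:
    test: ["CMD", "test", "-f", "/tmp/setup_complete"]
    interval: 5s

With service_healthy dependencies, the other services start as soon as the marker file appears, while the trailing sleep keeps the container alive through the cleanup.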

Best Regards,
Jeremias