[TorizonCoreBuilder] Dropped capabilities while building image

Hello,

I am facing an issue with TorizonCore Builder. I have a container that needs to modify the host’s Ethernet interface. The container works, but only if I manually upload it to the module via ssh. If I bundle it into the TorizonCore image with TorizonCore Builder, it does not work (I get permission errors).

Tested workflow:

  • Build the image with TorizonCore Builder
  • Install the image with Toradex Easy Installer
  • The init container logs permission errors
  • Stop the docker-compose service on the Apalis
  • Remove the init container image from the Apalis
  • Copy exactly the same container image to the Apalis via ssh
  • Start the docker-compose service
  • Now everything works

My Dockerfile:

# Make sure we don't get notifications we can't answer during building.
ENV DEBIAN_FRONTEND="noninteractive"

# Install required packages
RUN apt-get -q -y update \
    && apt-get -q -y install net-tools \
                        libcap2 \
                        libcap2-bin \
                        iproute2 \
                        iptables \
    # Clear the apt package lists
    && rm -rf /var/lib/apt/lists/*

# Copy executable/config to container
COPY src/init.sh /init/init.sh

RUN chmod 555 /init/init.sh

# Allow these tools to change the network configuration
RUN setcap cap_net_admin=eip /sbin/ifconfig
RUN setcap cap_net_admin=eip /bin/ip
RUN setcap cap_net_admin=eip /usr/sbin/xtables-nft-multi

USER torizon

WORKDIR /init

CMD ./init.sh

init.sh:

#!/bin/bash

# Set MAC address
MAC_ADDR="01:15:16:17:18:19"
ifconfig ethernet0 down
ifconfig ethernet0 hw ether $MAC_ADDR
ifconfig ethernet0 up

docker-compose.yml:

version: "3.8"

services: 
    init:
        image: localhost:5000/module/init_devel:0.0.3
        container_name: init
        restart: "no"
        stdin_open: true
        tty: true
        network_mode: "host"
        cap_add:
            - NET_ADMIN

tcbuild.yaml:

input:
    easy-installer:
        local: torizon-core-docker-apalis-imx6-Tezi_5.3.0+build.7.tar
customization:
    device-tree:
        include-dirs:
            - device-trees/include/
        custom: device-trees/dts-arm32/imx6q-apalis-eval.dts
    filesystem:
        - changes1/
output:
    easy-installer:
        local: torizon-core-docker-apalis-imx6-test
        bundle:
            compose-file: docker-compose.yml

I am developing on:

  • Apalis iMX6
  • TorizonCore 5.3.0
  • Toradex Easy Installer 5.3.0
  • TorizonCore Builder 3.2.0

Best Regards,
Kacper

Greetings @Kacper,

Unfortunately, I can’t quite reproduce your issue. The Dockerfile you provided seems to be incomplete; it has no FROM statement, so I can’t build it.

Despite that, let me make a few comments on what I’m seeing here. First of all, could you provide the exact permission error you’re getting?

Also it seems weird that on initial boot this init container fails, but manually restarting it seems to work. I can’t imagine a scenario off the top of my head where this would occur.

Finally, I understand you’re doing this in order to configure the host system’s Ethernet interface to your needs. However the method you’re using seems quite odd. Why use a container to do all this? Would it not be simpler to just create a systemd service or init script that does this configuration on the host itself?

I don’t see the value of trying to do such a configuration from a container. But maybe there’s something about your use-case/requirements that I don’t know that would make this seem more logical.

Best Regards,
Jeremias

Sorry about that, I missed the first line of the Dockerfile:

FROM --platform=linux/arm torizon/debian:2-bullseye

The error is: SIOCSIFFLAGS: Operation not permitted

> Also it seems weird that on initial boot this init container fails, but manually restarting it seems to work. I can’t imagine a scenario off the top of my head where this would occur.

It is actually not the restart itself that helps: rebooting the system or just restarting the docker-compose service makes no difference. The container image needs to be removed and uploaded again manually.

> Finally, I understand you’re doing this in order to configure the host system’s Ethernet interface to your needs. However the method you’re using seems quite odd. Why use a container to do all this? Would it not be simpler to just create a systemd service or init script that does this configuration on the host itself?

First of all, I would need to compile the whole of TorizonCore myself to provide such init scripts/apps (what I sent you is just one basic operation; the actual script does much more, and the init container provides two additional C apps). I tried, but it will take some time (I ran into some issues), so I left it for later releases. Unless there is another way to provide my own scripts/apps?
The apps in this container also need to communicate with another container. I use IPC, so it would not really be an issue, but that other container would then need to use the host’s IPC namespace.

> First of all, I would need to compile the whole of TorizonCore myself to provide such init scripts/apps (what I sent you is just one basic operation; the actual script does much more, and the init container provides two additional C apps). I tried, but it will take some time (I ran into some issues), so I left it for later releases. Unless there is another way to provide my own scripts/apps?

There is actually a way to do this without recompiling TorizonCore. The TorizonCore Builder tool can capture changes done on a device and create a new TorizonCore image from these changes. Say for example you create a systemd service in /etc that executes your init.sh script, which you could also store in /etc. Then with TorizonCore Builder you can isolate these changes, capture them, and create a brand new image without having to go through a lengthy recompile. This way you can have an image with your host configurations/settings without having to rely on a container to do these operations first.
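As a rough illustration of the host-side approach, the sketch below writes a oneshot systemd unit that runs the script at boot. The unit name set-mac.service, the script location /etc/init.sh and the ordering against network-pre.target are assumptions made for this example, not something prescribed by TorizonCore; it also assumes the tools the script calls are available on the host.

```shell
#!/bin/sh
# Write a oneshot systemd unit that runs the init script at boot.
# The unit name (set-mac.service) and script path (/etc/init.sh) are
# hypothetical; adjust them to your own layout.
write_unit() {
    unit_dir=$1
    cat > "$unit_dir/set-mac.service" <<'EOF'
[Unit]
Description=Configure the Ethernet interface at boot (example)
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
ExecStart=/bin/sh /etc/init.sh

[Install]
WantedBy=multi-user.target
EOF
}

# On the device, as root:
#   write_unit /etc/systemd/system
#   systemctl enable set-mac.service
```

Since both the unit and the script live under /etc, they would be among the changes that TorizonCore Builder can capture from the device. The ordering lines may need adjusting if the interface is not yet present when the unit runs.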

The workflow is described in more detail in this article: Capturing Changes in the Configuration of a Board on TorizonCore

I would recommend this method if possible.

Best Regards,
Jeremias

Okay, you’re right, I can do it that way (I’m familiar with capturing changes).

Nonetheless, as I mentioned, it is not just an init script: there are other apps performing IPC communication, storing internal configuration on a Docker volume, etc. I can design workarounds for all of these problems, but for me that is still a workaround, not a solution.

May I ask you to reproduce the problem? To me it looks like a problem with TorizonCore Builder, and it would be highly appreciated if someone at Toradex could confirm that and fix it in the future.

Well, I was able to reproduce this, and honestly it makes no sense to me why it occurs.

For reference I did the following:

  • Build the container as you specified.
  • Use TorizonCore Builder to create a new image with this container bundled.
  • Install the image with Easy Installer and boot it.
  • Check the docker logs of the init container. I can see the permission errors you were talking about.
  • Manually start the container with a terminal inside via docker run -it --cap-add NET_ADMIN --network=host <image name>.
  • Execute the script from the container’s terminal: I still get the permission error. Even just running ifconfig ethernet0 down gives the permission error.
  • Remove the container image from the device completely and pull the image down again from Docker Hub.
  • Run it again: it now works for some reason.

This more or less sounds like what you experienced. I’ve never seen such behavior before and it’s not obvious why it occurs. I’ll report this to the team for investigation. In the meantime you may need to implement a work-around since I’m not sure how long this will take. But I’ll keep you updated on any major updates.

Best Regards,
Jeremias

Thank you very much for checking! It sounds exactly like what I have experienced.

For now I will use some workaround and wait for any news on that topic.

Best Regards,
Kacper

Also, if you uncover any additional information regarding this issue please let us know on this thread.

Best Regards,
Jeremias

Hello @jeremias.tx

Any news on this topic? On my side I do not have any new clues.

Best regards,
Kacper

Unfortunately I don’t have any news or updates on the issue. The issue is still on the team’s list, but other issues have taken precedence since. Though initial investigations here also didn’t uncover any new useful information.

Your workarounds may just need to be the alternative design here, since I can’t guarantee you any timelines at the moment.

Best Regards,
Jeremias

@Kacper,

We still do not have a fix for the underlying issue, since it seems to be a bit more complex than originally thought. But we have found some workarounds, at least for your specific situation.

As I originally thought, some details seem to get lost in the bundle process. Namely, the effects of setcap do not seem to be retained after bundling. I was able to get rid of the permission issues by running setcap again in the container.
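One way to see whether the capabilities survived is to list them inside the running container with getcap, which comes from the libcap2-bin package the Dockerfile already installs. The following is only a minimal sketch; the helper name check_caps is made up for this example, and the exact output format of getcap differs between libcap versions.

```shell
#!/bin/sh
# Print the file capabilities of each given binary, or a note if none are set.
# check_caps is a hypothetical helper name used only in this example.
check_caps() {
    for bin in "$@"; do
        if ! command -v getcap >/dev/null 2>&1; then
            echo "$bin: getcap not available"
        elif [ ! -e "$bin" ]; then
            echo "$bin: file not found"
        else
            caps=$(getcap "$bin" 2>/dev/null)
            if [ -n "$caps" ]; then
                echo "$caps"
            else
                echo "$bin: no file capabilities set"
            fi
        fi
    done
}

# The three binaries that receive cap_net_admin in the Dockerfile.
check_caps /sbin/ifconfig /bin/ip /usr/sbin/xtables-nft-multi
```

Running this once in a container started from the bundled image and once after re-pulling the image should show whether cap_net_admin is still attached to the binaries in each case.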

A possible workaround is setting the suid bit on those binaries rather than using setcap. For example, using chmod u+s instead of setcap seems to work as well and may carry through the bundle process.
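As a sketch of what that change could look like, the helper below sets the suid bit on a list of binaries. The function name set_suid is hypothetical; in the Dockerfile the equivalent would be a RUN line placed before USER torizon, replacing the three setcap lines.

```shell
#!/bin/sh
# Set the setuid bit on each given binary.
# set_suid is a hypothetical helper name used only in this example.
set_suid() {
    for bin in "$@"; do
        chmod u+s "$bin" || return 1
    done
}

# Equivalent Dockerfile line (must run as root, i.e. before USER torizon):
#   RUN chmod u+s /sbin/ifconfig /bin/ip /usr/sbin/xtables-nft-multi
```

Keep in mind that a suid binary owned by root runs as root for every user in the container, which is broader than the single capability granted by setcap, so this is a trade-off rather than a drop-in equivalent.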

Could you give this a try and let us know whether it’s suitable for you?

Best Regards,
Jeremias

Hello @jeremias.tx ,

Thank you very much. I have tried your approach and it works correctly. This solution is good enough for me. Thanks for your support!

Best regards

Glad to hear it works for you as well.