Real-time scheduler in TorizonCore container

techczech · October 16, 2024, 6:20pm

For the timing-critical application I referred to, we have three threads that need to hand off a message from a sensor that arrives every 5 milliseconds, with minimal latency and jitter. Those threads run at a high real-time priority.

What we’re seeing is that those threads sometimes seem to get swapped out for up to 50 milliseconds. It’s usually less, but it can be that long. Again, this same application with the same threads is very timing-stable on Windows CE on a Colibri T30. We may be doing something wrong. But if you could tell us that yes, even a high-priority thread can get swapped out for as long as 10 milliseconds on TorizonCore with PREEMPT_RT, and there’s nothing we can do about it, it would save us a lot of time investigating something that, in the end, we can’t fix.

alex.tx · October 16, 2024, 7:03pm

Even with PREEMPT_RT, there can be situations where high-priority real-time threads are preempted or experience delays. While the PREEMPT_RT patch significantly improves real-time performance by reducing latency and improving thread scheduling, certain kernel activities (e.g., interrupt handling, certain kernel calls, or drivers with long execution times) can still cause latencies that exceed 10 milliseconds in some cases. If the system’s interrupt handlers are not optimized, they may take too long to execute, delaying real-time threads. Some drivers may not be fully preemptible, leading to delays if they are running with interrupts disabled or in a long, non-preemptible section.
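To put a number on delays like these, one option is to measure how late a periodic thread actually wakes up. The sketch below is a rough illustration only, assuming a 5 ms period like the one in the question; `max_wakeup_lateness` is a hypothetical helper name, and the `cyclictest` tool from the rt-tests suite is the usual way to do this measurement properly.

```python
import time


def max_wakeup_lateness(period_s: float, cycles: int) -> float:
    """Sleep until each absolute deadline; return the worst lateness in seconds."""
    worst = 0.0
    deadline = time.monotonic() + period_s
    for _ in range(cycles):
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # How far past the intended wake-up time are we?
        lateness = time.monotonic() - deadline
        worst = max(worst, lateness)
        # Absolute deadlines keep one late cycle from shifting all later ones.
        deadline += period_s
    return worst
```

Calling `max_wakeup_lateness(0.005, 2000)` would cover about ten seconds of 5 ms cycles. Numbers from an interpreted language will be noisier than from a native test, so treat the result as a coarse indicator.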

Potential Fixes:

Pin your real-time threads to specific cores using taskset or sched_setaffinity. This reduces the risk of thread migration and minimizes context switching.
Move non-critical interrupt handling to specific CPUs, away from the CPUs running real-time tasks.
Use techniques like isolcpus (deprecated) or cpuset to reserve certain cores for real-time tasks and isolate them from general system processes.
Ensure that the watchdog is disabled.
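As a minimal sketch of the first suggestion, assuming a Linux system with Python available, the snippet below pins the caller to a single CPU through the same `sched_setaffinity` call mentioned above. The helper name `pin_to_cpu` is hypothetical, and the choice of CPU is up to you.

```python
import os


def pin_to_cpu(cpu: int, pid: int = 0) -> set:
    """Restrict the given PID (0 = the caller) to one CPU; return the new mask."""
    allowed = os.sched_getaffinity(pid)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

For an already running process, `taskset -cp 3 <pid>` does the same from the shell. Pinning alone does not keep other tasks or interrupts off that core; that is what the isolation suggestions in the list are for.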

https://www.linux.org/threads/how-to-isolate-a-cpu-core-from-other-processes.46893/

jeremias.tx · October 16, 2024, 9:43pm

@techczech

After some conversation with my colleagues I’ve learned the following. It turns out that many of the UART/serial drivers in Linux were not designed to be fully preemptible for real-time purposes. I believe the UART driver always puts the data it receives into a queue, which is then processed by a work-queue to which you can’t assign a priority (because the PID always changes).

This could explain why you are observing instability at higher priorities. Why this only occurs on the lower-rate device and not the higher-rate one, I’m not 100% sure. It could be that different device drivers or processes are being invoked for your two devices. In any case, it appears that the drivers here were probably not designed with real-time in mind.

Now, it is theoretically possible to adapt or patch the respective driver code to be more real-time friendly, for example by processing the data directly instead of putting it into a work-queue. Such a change is not trivial to make and test, though.

Would you be willing to work with a partner of ours on this? This is starting to get outside our own areas of expertise, but we may be able to find a partner who could better assist you.

Best Regards,
Jeremias

techczech · October 28, 2024, 6:06pm

Here’s an update on something we found that might help somebody in the future. This isn’t about the serial port interference mostly discussed above; it’s another timing issue I mentioned, where we were having trouble getting the application to behave as it did under Windows CE on a Colibri T30.

The majority of that problem was resolved when we discovered a difference in OS thread scheduling between Windows CE on a Colibri T30 and TorizonCore with PREEMPT_RT in a Docker container on a Verdin iMX8M Plus.

We have a thread that reads messages from a serial port. These come from a sensor at 200 Hz, 5 milliseconds apart. For one particular configuration the timing is very tight: these messages need to be pulled in and sent to the next thread as quickly as possible. And we don’t have event-driven I/O for serial ports, so it’s a polling thread.
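A polling thread of this kind can be sketched roughly as follows. This is an illustration in Python, not our actual code: `read_nonblocking` and `handle` are hypothetical stand-ins for the serial read and the hand-off to the next thread, and `idle_sleep_s` is how long the thread gives up the CPU when nothing is waiting.

```python
import time
from typing import Callable, Optional


def poll_loop(read_nonblocking: Callable[[], Optional[bytes]],
              handle: Callable[[bytes], None],
              idle_sleep_s: float,
              max_iterations: int) -> int:
    """Poll for messages, sleeping idle_sleep_s whenever none is available.

    Returns the number of messages handed off.
    """
    handled = 0
    for _ in range(max_iterations):
        msg = read_nonblocking()
        if msg is None:
            # Nothing to read: give up the CPU briefly instead of spinning.
            time.sleep(idle_sleep_s)
            continue
        handle(msg)
        handled += 1
    return handled
```

An `idle_sleep_s` of 0 makes the loop spin almost continuously, while a small positive value such as 0.0001 (100 microseconds) leaves gaps in which other threads can run.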

What we’d done originally on Windows CE was to call Sleep(0) in this thread when there was nothing new to read. Windows Sleep only has millisecond granularity, and in this instance sleeping even 1 millisecond was too long. The CE scheduler allowed this: you see core use go up, but not redline, on all 4 cores.

TorizonCore Linux with PREEMPT_RT seems to penalize this. We’re using round-robin real-time scheduling, under which the threads share the 4 cores by being given time slices by the scheduler. It appears there’s some kind of cap on how many time slices a thread gets in a given interval, and when a thread exceeds it, it gets swapped out for a while. We switched to usleep(100), sleeping for 100 microseconds instead of 0, and with that it works mostly as expected. There’s still an occasional glitch where there’s a small delay; we’re looking into that next.
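One candidate for the cap described above, offered as an assumption that nothing in this thread confirms, is the kernel’s real-time throttling. On a stock Linux kernel, `kernel.sched_rt_period_us` defaults to 1000000 and `kernel.sched_rt_runtime_us` to 950000, so SCHED_FIFO and SCHED_RR tasks that never block can be held off for up to 50 ms in every second. A minimal Python sketch for checking the values on a given system (the function names are hypothetical helpers, not part of any library):

```python
from pathlib import Path

PROC_KERNEL = Path("/proc/sys/kernel")


def throttle_window_us(period_us: int, runtime_us: int) -> int:
    """Microseconds per period in which RT tasks may be held off."""
    if runtime_us < 0:
        # A runtime of -1 means RT throttling is disabled.
        return 0
    return max(period_us - runtime_us, 0)


def read_rt_limits(proc_dir: Path = PROC_KERNEL) -> tuple:
    """Return (period_us, runtime_us) as reported by the kernel."""
    period = int((proc_dir / "sched_rt_period_us").read_text())
    runtime = int((proc_dir / "sched_rt_runtime_us").read_text())
    return period, runtime
```

With the default values, `throttle_window_us(1_000_000, 950_000)` gives 50000 microseconds. If `throttle_window_us(*read_rt_limits())` reports something similar on the target, throttling would at least be consistent with the delays we saw.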

jeremias.tx · October 28, 2024, 8:03pm

Good to hear you got that other timing issue sorted out.

Regarding the previous serial port timing issue, did you still want to consult with a partner of ours on this topic?

Best Regards,
Jeremias