We are having problems with the clock on the specified setup. While using the T20 we’ve had some issues, but most of them were resolved by various quick fixes in our software on the board. However, the latest problem started after we removed RTCSync.exe over a year ago. We removed it because we noticed that it was slowing down Windows reboots, making them take minutes instead of seconds. We resorted to reading the time from the RTC ourselves and synchronizing it.
I’ve now run many tests on the T20 clock, and the latest discovery is as follows: after the board boots up, it moves its clock backwards by around 20 seconds every 24 hours. This isn’t a single board behaving this way; we currently have 19 boards running tests, all behaving the same. So it is very clear that some software is tinkering with the clock, as I doubt that any RTC in the world would do such a thing. It’s still possible that our software is doing it, but I’ve removed every part of our software where I could find the WinAPI calls SetSystemTime/SetLocalTime.
So my question is: do you know what could be causing this? We’ve been under the impression that RTCSync.exe is the only thing that does anything with the clock, but is there another process in the image that does some offset-fixing?
How I reproduced this:
1. Removed every part where our software might do anything with the clock.
2. Disabled RTCSync.exe in the registry and rebooted the substation.
3. Every 5 minutes our own application checks the time from an NTP server, but doesn’t set the time.
4. The application calculates the difference in seconds and reports it.
Here are a few lines from our log, where the offset is calculated as NTP server time minus T20 GetSystemTime, in seconds. The device was rebooted around 4.2.2019 9:16:00 and the offset started at -68 seconds.
05.02.2019 09:15:06 SNTP time update from server Ok Offset was -66.5158 sec, (Stratum=2)
05.02.2019 09:20:07 SNTP time update from server Ok Offset was -82.0391 sec, (Stratum=2)
06.02.2019 09:15:22 SNTP time update from server Ok Offset was -81.1062 sec, (Stratum=2)
06.02.2019 09:20:23 SNTP time update from server Ok Offset was -101.3768 sec, (Stratum=2)
07.02.2019 09:15:43 SNTP time update from server Ok Offset was -100.4333 sec, (Stratum=2)
07.02.2019 09:20:44 SNTP time update from server Ok Offset was -120.6950 sec, (Stratum=2)
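As a rough cross-check of the log lines above, the size of each daily step and the slow drift between steps can be computed directly from the logged offsets. The sketch below is only illustrative arithmetic in C; the array and function names are made up for this example, and the values are copied from the log.

```c
#include <stddef.h>

/* Offsets copied from the log above (seconds), in pairs:
   last reading before the daily step, first reading after it. */
static const double before_step[] = { -66.5158, -81.1062, -100.4333 };
static const double after_step[]  = { -82.0391, -101.3768, -120.6950 };

/* Size of the i-th daily step in seconds (i = 0..2),
   in the sign convention used by the log. */
double step_size(size_t i)
{
    return after_step[i] - before_step[i];
}

/* Change of the offset between the step on day i and the
   next step on day i + 1 (i = 0..1). */
double drift_between_steps(size_t i)
{
    return before_step[i + 1] - after_step[i];
}
```

With these values the steps come out at about -15.52 s, -20.27 s and -20.26 s, and the offset moves by roughly +0.93 s to +0.94 s in the 24 hours between steps.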
I would be very grateful for any information on this. Debugging this is slow, as it takes 24 hours for it to happen. It also makes calibrating the RTC hard, because every time this happens the new time is also written to the RTC, so I cannot work around it in any way.
You need to understand that the time is kept in 3 different places in the system:
1. The external RTC M41T0
   - Battery-backed
2. An on-chip RTC clocked with 32.768kHz
   - Clocked independently of any power-saving modes (even during sleep or varying system frequency)
3. A software timer clocked by the main oscillator
   - Fast to access by software
All three timers are clocked independently, and therefore will drift apart from each other over time.
RtcSync takes care of synchronizing between 1. and the rest of the system.
There’s another piece of software, defined by Microsoft, which takes care of the synchronization between 2. and 3., sometimes referred to as “SoftRtc”. This is what you observe: once a day, SoftRtc synchronizes the software timer with the 32kHz timer.
If I remember correctly, there would be a way to disable the SoftRtc, but accessing the on-chip RTC is crazy slow and would significantly impact the system performance.
If you can give me some more details why you need to avoid the daily synchronization, I might be able to guide you to the best solution. Be aware that twice a year there’s a time leap for the change of daylight saving time anyway.
Side Note:
We never observed RtcSync to delay the system boot. The only explanation that comes to my mind is missing pull-up resistors on the i2c signals, which would block the RtcSync.exe from communicating with the M41T0.
Thank you for the quick reply. The confirmation you have given is really valuable, so I will not have to do any more pinpointing.
The reason I’m concerned is that many functions of our system depend on the accuracy of local time, mainly to control different relays at specific times. If there is no access to an NTP server, the clock isn’t very accurate after a year or so. With our software the daily offset is currently less than 20 seconds, but that is still too much. Knowing that there’s a synchronization like this enables me to improve the system. Apart from our software and this synchronization, the offset is semi-regular: every day the system time is off by the same number of seconds, so it could be calibrated in software.
So what would you suggest to get around the jumping? Calibration for each machine is of course always required, but that is beside the point. We have our own DST handling, with the possibility to disable it as well, so there’s no need to address that. My thought is to disable the SoftRtc syncing completely, but you make it sound like a bad idea, so I’m open to suggestions.
About the RTCSync.exe
The pull-ups are there.
At one point we ran several tests to see how our system recovers from a power-down. The idea was to automatically cut power, ping the device, and if the ping didn’t go through within a few minutes after boot → test fails. I do not have access to the logs and results from back then, but from memory the boot slowed down by some seconds after a hundred reboots. Normally systems don’t do this many reboots, but we still decided to look into it.
The symptom was easiest to see when comparing a slowed-down device with a normal one. When the slowed-down device booted, the Windows desktop rendered each icon very slowly, one by one, whereas the normal device rendered them instantly. We found RTCSync.exe to blame when we turned on the serial console in the bootloader and saw that the time from the output “Started RTCsync” (or something similar) to “RTCSync done” was a lot longer on the slowed-down system. When RTCSync was done, the system started to work normally again. Disabling RTCSync.exe in the registry made the slow device work fast again.
So with our own routines to read the RTC in place, we just decided to disable RTCSync.exe in our image. Back then we also thought that it was responsible for this kind of jumping, and didn’t notice until now that it wasn’t.
The only way to completely avoid the time jumping is to disable SoftRTC:
[HKEY_LOCAL_MACHINE\Platform]
"SoftRTC" = 0
As mentioned earlier, this can affect the system performance - mainly the worst-case latency. The reason is that reading the 32kHz RTC is slow, and during this reading the whole scheduler is locked out. If you don’t have fast realtime reaction requirements, you might be fine with this simple solution.
Control Time Synchronization
I also analyzed the behavior of the synchronization, and where the 24h sync period comes from:
The synchronization from the software timer to the 32kHz on-chip RTC happens on every call to SetSystemTime()
The synchronization from the 32kHz on-chip RTC to the software timer happens
24h after the last synchronization (24h after any of the above events)
With this knowledge you could take control over the point in time when the synchronization happens, and take corrective actions against a time jump. For example
1. Read the current GetTickCount() and the current system time.
2. Call RefreshKernelAlarm().
3. Read the current GetTickCount() and the current system time again.
4. Calculate the difference in TickCount minus the difference in SystemTime. This is the time jump you observed.
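The arithmetic of the last step can be sketched as a small, platform-independent C helper. This is only an illustration under assumptions: the function name and the millisecond units are chosen for this example. On the device, the tick values would come from GetTickCount(), and the system time could be read with GetSystemTime() and converted via SystemTimeToFileTime() (100 ns units, so divide by 10000 to get milliseconds).

```c
#include <stdint.h>

/* Returns (tick delta) - (system time delta) in milliseconds.
   tick_before/tick_after: tick counter readings taken before and after
   the forced synchronization (32-bit, may wrap around).
   sys_before_ms/sys_after_ms: system time readings taken at the same
   moments, in milliseconds since any fixed epoch. */
int64_t time_jump_ms(uint32_t tick_before, uint32_t tick_after,
                     int64_t sys_before_ms, int64_t sys_after_ms)
{
    /* Unsigned subtraction stays correct across a tick counter wrap. */
    uint32_t tick_delta = tick_after - tick_before;
    int64_t  sys_delta  = sys_after_ms - sys_before_ms;

    return (int64_t)tick_delta - sys_delta;
}
```

A result of zero means the system time did not jump. A negative result means the system time advanced by more than the tick counter did during the call, a positive result means it advanced by less.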
Another possible solution would be to call RefreshKernelAlarm() frequently, for example once a minute, in order to keep the time jump acceptably small.
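To get a feeling for how small the individual jumps become with frequent synchronization, the daily step can be scaled down to the chosen interval. The sketch below assumes that the two clocks drift apart at a constant rate, which the thread suggests but does not prove; the function name is made up for this example.

```c
/* Expected size of each correction step, in seconds, if the
   synchronization is forced every interval_s seconds.
   Assumes a constant drift rate, so that the daily step is spread
   evenly over the day. */
double step_per_interval_s(double daily_step_s, double interval_s)
{
    const double seconds_per_day = 86400.0;
    return daily_step_s * interval_s / seconds_per_day;
}
```

For a daily step of 20 seconds and a 60 second interval, this gives roughly 14 milliseconds per step.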
RtcSync
With your description I have an idea what could have been the reason for the delayed boot: RtcSync.exe registers itself to be notified on every time change.
In image V1.4 there was a bug, and RtcSync duplicated these registrations, probably on every reboot.
So I suspect that after thousands of reboots there might have been thousands of registrations which could lead to a significant delay.
This is fixed in newer images.
Thank you again for your response. After weighing the options, I’ve decided to go for the latter suggestion, calling RefreshKernelAlarm() every minute to keep the offset sensible. I’ll post the result here once I have had a few days to see what happens.
About rtcsync.exe: can you confirm whether the slowing down happens ONLY on reboots, or also whenever rtcsync.exe is run? I confirmed that rtcsync.exe is not started on boot, but it is still run whenever SetSystemTime() is called. The inconvenience would be that the time set to the M41T0 RTC becomes more and more unreliable over time. Maybe I should just get the latest version of rtcsync.exe and always overwrite it on boot as a workaround?
If RtcSync.exe is registered many times, the slowing effect would occur on every time change, not only on reboot.
However, I quickly checked my system here with a V1.4 image. The error I described in my previous answer did not occur and RtcSync.exe behaved correctly. Maybe it was an even older image where we had this problem.
To be sure, please use the attached program to list all registered notifications. RtcSync.exe is expected to be registered only once.
Thank you for the program. Only one instance of RTCSync came up:
Notification 0x30000006 (not signalled)
\windows\rtcsync.exe
Arguments = AppRunAfterTimeChange
Type:CNT_EVENT
Event:NOTIFICATION_EVENT_TIME_CHANGE (The system time changed.)
Notification Period from 0000-00-00 00:00:00 to 0000-00-00 00:00:00
So I guess that this is OK? RTCSync is not registering itself more than once.
However, these came up thousands of times:
Notification 0x330019a1 (not signalled)
repllog.exe
Arguments = AppRunAtRs232Detect
Type:CNT_EVENT
Event:NOTIFICATION_EVENT_RS232_DETECTED (An RS232 connection was made.)
Notification Period from 0000-00-00 00:00:00 to 0000-00-00 00:00:00
Should I be worried?
Edit: This isn’t the same for all devices. One had only around 20 of these entries.
Edit: I consulted my colleague, and he mentioned that whenever a serial port is opened, it consumes some memory from WCE. Could this be the reason?
As you expected, RtcSync is fine. The multi-registration of repllog.exe is wrong, and probably responsible for slowing down the system under certain conditions.
repllog.exe is responsible for the ActiveSync connection between the Colibri and your PC. ActiveSync is using a virtual COM port, hence the registration to NOTIFICATION_EVENT_RS232_DETECTED (ActiveSync could also be done through a regular COM port).
With the multiple registrations, repllog.exe is started thousands of times on your worst-case module, whenever a new (virtual) UART connection is detected.
We quickly checked the UART driver source code, and it seems that we didn’t include any code to detect a new RS232 connection, so I assume it is all related to the virtual COM port driver used for ActiveSync.
How to continue
You should first understand, whether a USB-to-PC connection is something you use only for development (then the issue is probably not relevant), or whether it is also used for your end product.
Up to image V1.4, notifications survive power cycles. In later images, the notification database is cleared on each reboot.
The notification database can be deleted:
1. Rename the file \Flashdisk\Windows\Registry\default.vol to a different name (deleting is not possible because the file is in use).
2. Reboot and delete the renamed file.
As a workaround, you could write a small application which deletes the extra notification registrations on each boot. The necessary API functions are documented by Microsoft.
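The selection logic of such a cleanup tool could look roughly like the following sketch. It is written as portable C so that it can be tried off-target, and it is an assumption about how one might structure it, not the vendor's tool. The struct and function names are hypothetical. On Windows CE the list would be read with CeGetUserNotificationHandles() and CeGetUserNotification(), the application names would be wide strings, and each returned handle would be passed to CeClearUserNotification().

```c
#include <stddef.h>
#include <string.h>

/* One registration as read back from the notification database.
   Hypothetical struct for this example. */
typedef struct {
    unsigned long handle;  /* notification handle */
    const char   *app;     /* application to launch, e.g. "repllog.exe" */
    unsigned long event;   /* event id the application is registered for */
} registration;

/* Writes the handles of all but the first registration of each
   (app, event) pair into out[] and returns how many were written.
   out must have room for n entries. */
size_t find_extra_registrations(const registration *regs, size_t n,
                                unsigned long *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (regs[j].event == regs[i].event &&
                strcmp(regs[j].app, regs[i].app) == 0) {
                out[count++] = regs[i].handle;  /* an earlier copy exists */
                break;
            }
        }
    }
    return count;
}
```

The pairwise comparison is quadratic, which should still be acceptable for a few thousand entries at boot. Keeping the first registration of each pair leaves the single expected RtcSync.exe entry untouched.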
Thank you for all the support on these cases. I’m sorry that the thread expanded up to three different matters (Clock daily offset, RTCSync.exe and RS232 extra registrations).
RS232 Registrations
As per your suggestion, we’re going to remove these registrations. We haven’t used ActiveSync in many years.
RTCSync.exe
For now we’re going to do nothing about this. Setting the clock has an offset of -1 to +1 seconds whenever the time is written. We might at some point add a workaround by overwriting rtcsync.exe upon boot, but for now we’re going to live with it, as reboots are rare.
Time offset/jumping
I’ve now tested how the clock behaves when calling RefreshKernelAlarm() every minute. The clock isn’t accurate by default, but at least it’s consistent. This means that it’s now possible to calibrate the clock and keep it more accurate.
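For anyone wanting to try the same, a software calibration of a consistent clock could be sketched as a simple linear drift correction. This is an assumption about one possible approach, not what was actually implemented here; the function names are made up, and the offset follows the convention reference time minus local time.

```c
/* Drift rate in parts per million, from two offset readings taken
   elapsed_s seconds apart (offset = reference time - local time). */
double drift_ppm(double offset_start_s, double offset_end_s, double elapsed_s)
{
    return (offset_end_s - offset_start_s) / elapsed_s * 1e6;
}

/* Correction in seconds to add to the local time after elapsed_s
   seconds without a reference, assuming the drift rate is constant. */
double correction_s(double ppm, double elapsed_s)
{
    return ppm * 1e-6 * elapsed_s;
}
```

As a made-up example, an offset that moves from 0 to -8.64 seconds over one day corresponds to -100 ppm, and after one hour without NTP the local time would be corrected by -0.36 seconds.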