Apalis IMX6 wince freeze

We developed a custom base board for the Apalis iMX6 SoM. Our WinCE 7 image is based on BSP 1.0 beta 5.
We are running a .NET application that communicates with a servo drive over RS485.

After 15 minutes or so the operating system freezes completely. Debugging the .NET application results in lost communication.

Debugging the kernel results in the following error:
DEBUGCHK failed in file d:\bt\2644\private\winceos\osshell\commctrl\cmncore\shutil.cpp at line 137

On the debug UART port I also see the following message:
4254 PID:E59D3049 TID:3EE0012 NK Kernel: DEBUGCHK failed in file d:\bt\2518\private\winceos\coreos\nk\kernel\arm\mdarm.c at line 457
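
When comparing logs from several boards, it can help to pull the DEBUGCHK lines out of the captured debug UART output and count how often each file/line pair fires. The following is only a minimal host-side sketch in Python, assuming the UART output was saved as plain text; the function names and the sample log are illustrative (the sample simply reuses the two messages quoted above), and the pattern only covers the message layout seen in this thread.

```python
import re
from collections import Counter

# Matches the "DEBUGCHK failed in file <path> at line <n>" text quoted above.
# Assumption: the source path never contains whitespace.
DEBUGCHK_RE = re.compile(
    r"DEBUGCHK failed in file (?P<path>\S+) at line (?P<line>\d+)"
)


def find_debugchks(log_text):
    """Return a list of (source_path, line_number) for every DEBUGCHK hit."""
    hits = []
    for raw in log_text.splitlines():
        match = DEBUGCHK_RE.search(raw)
        if match:
            hits.append((match.group("path"), int(match.group("line"))))
    return hits


def tally(hits):
    """Count how many times each (source_path, line_number) pair occurred."""
    return Counter(hits)


# Sample input: the two messages from this thread, nothing else.
SAMPLE_LOG = r"""
DEBUGCHK failed in file d:\bt\2644\private\winceos\osshell\commctrl\cmncore\shutil.cpp at line 137
4254 PID:E59D3049 TID:3EE0012 NK Kernel: DEBUGCHK failed in file d:\bt\2518\private\winceos\coreos\nk\kernel\arm\mdarm.c at line 457
"""

if __name__ == "__main__":
    for (path, line), count in tally(find_debugchks(SAMPLE_LOG)).items():
        print(f"{count}x {path}:{line}")
```

Running this over the logs of each crashed device would show whether the same assertion precedes every freeze or whether the failures are spread over different locations.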

Does anyone have tips for finding the exact problem?

We discovered some issues in the SD driver and we have hopefully fixed them. In image 1.0b5 and earlier the registry was stored on eMMC, which is connected to one of the SD controllers. An issue accessing the registry could cause unexpected behaviour in the OS.
In a few weeks we are going to release 1.0 beta 6, which has fixes to the SD driver and also uses a persistent RAM-based registry (as we have on Vybrid and PXA). This should improve the situation.
If you experience the same issue on beta 6, a small application that helps narrow down the issue would be very valuable for fixing it.

Can you update me on when this 1.0b6 is released?

Thank you for the support. I am currently testing with this version and running a simple application. For now it is running stable. To run our real application we need to add .NET CF 3.5 and the .NET CF 3.5 messages to the image. Installing the cab files does not work for the .NET 3.5 messages.

Is it possible to build our own image, or can you build one for us? Or is it better to wait for the full release?

We just released the 1.0 beta 6 image. Usually .NET CF is not part of the image. What issue are you facing with the messages? If you can open a new question on this issue we can work on it and fix the installer, if needed.

At first it looked like the issue was solved in beta 6, but eventually it crashed as well.
Also, beta 6 is not running as smoothly as beta 5. I will try to provide a simple application today that reproduces this problem.

We have built our own image to solve the problem with .NET and to support our own display.
I needed to add the following catalog items to get .NET working:

  • SYSGEN_DOTNETV35
  • SYSGEN_DOTNETV35_SR
  • SYSGEN_DOTNETV35_SUPPORT

I can also provide the image we have built.

We have ordered the T30.

To build the image based on 1.0 beta 6, I made a few changes.
In the IMX6_Core7 properties I removed “prj_enable_fsreghive 1” from the environment variables. Otherwise I couldn’t build with the RAM hive.
I added the following extra catalog items:

  • SYSGEN_DOTNETV35
  • SYSGEN_DOTNETV35_SR
  • SYSGEN_DOTNETV35_SUPPORT

I also added some registry entries to OSDesign.reg:
osdesign.txt

To reproduce the WinCE freeze we ran a few tests.
We set up 10 devices running WinCE 7 1.0 beta 5.
First we ran our full application, and all 10 devices eventually crashed within 24 hours.
Some devices crashed after 10 minutes, some after 10 hours. We power-cycled all devices every few hours, because we noticed that sometimes a device doesn’t crash at first but does after a power cycle. There were also some devices that wouldn’t start up at first and needed a power cycle to run again.

At first we thought the problem was our application, so we wrote a very simple application that just sends some data to the COM port. But after some time WinCE froze again.

Then we set up 2 devices to just boot into Explorer and do nothing. Those boards also crashed after a while. It took much longer than when running the full application, but they crashed eventually.

We did the same test with WinCE 7 1.0 beta 6 and noticed no difference.

example app

We have now been debugging for a few weeks to find the cause of our crashing devices.
The image is completely stripped of all unnecessary drivers.
We are running the IMX6D on the Ixora board.

The crashes mostly appear 5 to 10 minutes after a reboot. If a device hasn’t crashed within an hour, we restart it. We are running Colibri monitor 1.6 to check whether it has crashed.
We noticed that if a board is going to crash, it will most likely crash within 5 minutes after a reboot.

I have attached the nk.bin, config and log.

Image

Config and Log

Thank you for the information, we are trying to reproduce those issues.
Do you have any applications running?
Can you try to disable the L2 cache or run single-core (you can use the pex settings in the bootloader console to do this), to check whether the issues are related to those features?

With the pex.l2enable bit disabled, WinCE will not boot.
We can only disable the pex.mpenable bit.
We are now running a test on 7 devices at the same time with this bit disabled. We are running the Colibri monitor and a simple .NET COM port application
(example app http://www.toradex.com/community/comments/865/view.html).

I am very curious if you can reproduce the issue.

We ran a test for 24 hours on 8 devices with multicore disabled, and everything seems to be running stable now. Can you explain what the problem could be? Is it an OS or a driver issue?

Probably (and hopefully) it’s driver-related. OS-related issues would be considerably more complex to fix.
Does your application use only the serial port, or does it also use other devices?
Would it be possible to disable the serial communication part to see if you experience the same behaviour?

We also ran the test on devices where the serial port wasn’t used, but that resulted in the same errors. Even with default images where no program was running, or only the Colibri monitor tool, we have problems.