Colibri T30 eMMC issue

Hello!

We have about 20 Colibri T30 modules with an eMMC issue.
The problem is the same for all of them: the OS does not start.

Please look at this thread for details:

Could we investigate the cause of the issue more precisely?
What do we have to do to diagnose the problem?

Dear @protasovdg

Am I right that you are asking about the modules which could be recovered by using a new loader.nb0?

The development team is currently preparing an official release of the nvflash package (which loader.nb0 is part of). I will get back to you within the next two weeks with more detailed information. Thank you for your patience.

Regards, Andy

Hi, @andy.tx!

Not quite. I’m asking about any diagnostic tools or a method to pin down the problem more closely. We have faced it since migrating our panel controller’s modules from PXA320 to T30. It is a really big problem, because about 7% of panels were returned under warranty with the eMMC containing a corrupted OS image. The problem arises spontaneously after a few months of normal operation. So far it looks like hardware degradation. Is there maybe some way to localize or prevent this, such as a check utility?

Dear @protasovdg
I’m afraid we don’t have any diagnostic tools for this problem.

We could do some investigations here:

  • Could you please send us the serial numbers of the failing modules?
  • Which OS version was running on these modules (was it WEC7 V2.1 on all modules?)
  • How often does your application write to the eMMC? I would like to understand whether there is a write only once every few days, or whether log data is written every second.

Regards, Andy

  1. I requested the serial numbers from our technicians, but I am unsure whether they wrote them all down. Instead, they use our own panel serial numbers, which are not useful for you. I made some screenshots of the problem and can provide at least 2 serial numbers: 5154535, 5154545. Please look at the attachments.

  2. All modules were running a version prior to WEC7 2.1. They always had the stock image, which was not updated before installation. I can say that some modules were running WEC7 1.4.

  3. Well, writing in our application is event-dependent. In normal (not debug) mode it writes a small number of bytes (<100) to the eMMC a few times per minute. In debug mode it writes log files more than 10 times per second, and from different threads.

Dear @protasovdg

Chances are high that the failure is related to the eMMC chip itself (either the physical flash, or the built-in eMMC firmware).

We used 3 different eMMC types on the modules, which can be distinguished by the manufacturer’s label:

  • V1.1a : Toshiba / Kingston
  • V1.1b : Micron

It would be interesting to see whether the problem is evenly spread between all eMMC types.

Flash Usage

Your application seems to use the flash quite intensively. This might lead to wear-out of the flash. I didn’t check the precise datasheets, and I can’t know exactly how the eMMC firmware behaves, but please have a look at the very rough estimates below:

  • Writing data to the flash (no matter how small the data is) leads to a write of several pages (the data itself, file allocation tables, lower-layer NAND mapping tables). Let’s assume it is 5 pages on average.
  • A flash page can endure 1000 write cycles
  • A page is 8kB in size
  • Let’s assume there is almost no static data on your flash, so all 4GB can be used for dynamically written data.
  • For the calculation, I assume your application writes once every second

How long can your application run before the flash’s lifecycle is over?

  • In total you are allowed to write 1000 x 4GB = 4TB
  • Each write consumes 5 x 8kB = 40kB
  • ==> You can do 4TB/40kB = 100 million writes.
  • One year is 32 million seconds
  • ==> With 1 write every second your flash would last for about (100 million / 32 million) = 3 years
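
The estimate above can be reproduced with a short Python sketch, which makes it easy to plug in other write rates. The function name and its parameters are hypothetical and only restate the rough assumptions listed above (decimal units, 1000 cycles per page, 5 pages per write, continuous 24/7 operation); none of these values come from an eMMC datasheet.

```python
def flash_lifetime_years(
    capacity_bytes=4e9,      # assumed: all 4 GB available for dynamic data
    write_cycles=1000,       # assumed endurance per page
    page_bytes=8e3,          # assumed page size of 8 kB
    pages_per_write=5,       # assumed pages touched per application write
    writes_per_second=1.0,   # assumed application write rate
):
    """Very rough flash lifetime estimate in years, assuming 24/7 operation."""
    total_writable_bytes = write_cycles * capacity_bytes    # 4 TB
    bytes_per_write = pages_per_write * page_bytes          # 40 kB
    total_writes = total_writable_bytes / bytes_per_write   # 100 million
    seconds_per_year = 365 * 24 * 3600                      # about 32 million
    return total_writes / (writes_per_second * seconds_per_year)


# One write per second, as in the estimate above
print(round(flash_lifetime_years(), 1))  # prints 3.2
```

Calling it with a different rate, for example `flash_lifetime_years(writes_per_second=10)`, scales the result linearly, so ten times the write rate gives one tenth of the estimated lifetime under the same assumptions.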

However, there are effects which are not taken into account:

  • The calculation assumes that your device runs 24/7
  • There are additional maintenance operations which the eMMC firmware probably does, such as static / dynamic wear leveling, defragmentation etc.
  • Writes which occur shortly after each other are cached and can be combined.

Again: all the assumptions above are very rough. I just wanted to show you that your flash usage might be critically high.

What to try next

There are additional effects which could degrade the flash content, for example writes to adjacent cells (which probably also happen more often in your use case than for our average customers).

If your modules work fine again after re-flashing them, you might consider updating the bootloader, OS, config block and registry of your modules on a regular basis, in order to refresh the flash contents.


Best Regards
Andy