Colibri iMX6ULL: ECC

Hi team,

A customer is asking very thorough questions regarding “the measures against bit corruption (caused by Read Disturb)”. It started with “What measures does Toradex take to prevent memory corruption?”, to which I answered the following:

  • In the file system layer, UBIFS takes some precautions against continuous usage of the flash through wear leveling, which avoids writing the same blocks again and again.
  • The MTD drivers keep control of the Bad Block tables and update them accordingly if an erase or a write operation cannot be performed (correct me if I’m wrong).

However, the next step has been to check what happens when the error is already there (where I assume ECC should kick in), and they want proof that some kind of correction method is applied.

I saw that the following are used in iMX6ULL config:

CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND=y
CONFIG_MTD_NAND_IDS=y
CONFIG_MTD_NAND_GPMI_NAND=y
CONFIG_MTD_NAND_MXC=y

I’ve been checking our code in drivers/mtd/nand where I believe that all these are covered.

I could find what I believe is the generic ECC code called whenever a failure is found (nand_correct_data in nand_ecc.c), but I couldn’t find any further reference for when it is called:

  • Under imx6ull-colibri.dtsi, in the GPMI node I could only see the nand-ecc-mode = "hw" property
  • nand_correct_data reference can be found only under the soft_ops calls.
  • Under gpmi-nand.c, I could find BCH references in the device data ( bch_max_ecc_strength = 40 ), but I don’t believe they are used.

Sorry to ask you this but at this point a detailed answer would be required with exact points of where this is applied and used. Do we actually offer SW ECC in the iMX6ULL?

If not:

  • Do we have any error correcting measures in case an error is found while reading memory?
  • Do we have any kind of detection method at read time which at least confirms that an error has been found (maybe with the help of the OOB or any parity bits)?
  • Any additional memory error counter-measures that we are implementing in our BSP worth mentioning?

Many thanks and regards,
Alvaro.

A customer is asking very thorough questions regarding “the measures against bit corruption (caused by Read Disturb)”. It started with “What measures does Toradex take to prevent memory corruption?”, to which I answered the following:

In the file system layer, UBIFS takes some precautions against continuous usage of the flash through wear leveling, which avoids writing the same blocks again and again.

Yes, exactly. One part of any raw NAND storage system is about wear levelling (please excuse my ignorance of using proper Cambridge English (;-p).

The MTD drivers keep control of the Bad Block tables and update them accordingly if an erase or a write operation cannot be performed (correct me if I’m wrong).

Yes, while the UBI layer does the wear levelling, the MTD layer concentrates on the bad block management.

However, the next step has been to check what happens when the error is already there (where I assume ECC should kick in), and they want proof that some kind of correction method is applied.

Such proof could be given by running the MTD tests.

I saw that the following are used in iMX6ULL config:

CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND=y
CONFIG_MTD_NAND_IDS=y
CONFIG_MTD_NAND_GPMI_NAND=y
CONFIG_MTD_NAND_MXC=y

I’ve been checking our code in drivers/mtd/nand where I believe that all these are covered.

I could find what I believe is the generic ECC code called whenever a failure is found (nand_correct_data in nand_ecc.c), but I couldn’t find any further reference for when it is called:

Under imx6ull-colibri.dtsi, in the GPMI node I could only see the nand-ecc-mode = "hw" property

Exactly, the whole ECC handling is done in hardware by the GPMI NAND controller within the i.MX 6ULL SoC.

nand_correct_data reference can be found only under the soft_ops calls.

Under gpmi-nand.c, I could find BCH references in the device data ( bch_max_ecc_strength = 40 ), but I don’t believe they are used.

Sorry to ask you this but at this point a detailed answer would be required with exact points of where this is applied and used. Do we actually offer SW ECC in the iMX6ULL?

No, why would anybody want to do slow software ECC if fully integrated hardware ECC is available?

If not:

Do we have any error correcting measures in case an error is found while reading memory?

Yes, “we” use the regular Linux kernel which employs such things as a GPMI NAND controller driver handling the lower-level details of all this.

Do we have any kind of detection method at read time which at least confirms that an error has been found (maybe with the help of the OOB or any parity bits)?

Yes, hardware ECC, just like any ECC mechanism, relies on redundancy information which is usually stored in the raw NAND OOB area.
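
One way to make this visible on a running module is to look at the ECC counters the MTD layer exports through sysfs. Below is a minimal Python sketch, assuming a kernel recent enough to expose the attributes ecc_strength, bitflip_threshold, corrected_bits, ecc_failures and bad_blocks under /sys/class/mtd/mtdX (they are described in the kernel's sysfs-class-mtd ABI documentation). The function names are made up for illustration, and I have not checked which attributes every BSP release provides, so missing ones are simply reported as None.

```python
from pathlib import Path

# Attribute names taken from the kernel's sysfs-class-mtd ABI description.
# Assumption: the running kernel exposes them; older kernels may lack some.
ECC_ATTRS = (
    "ecc_strength",
    "bitflip_threshold",
    "corrected_bits",
    "ecc_failures",
    "bad_blocks",
)


def mtd_devices(sysfs_root="/sys/class/mtd"):
    """Return the mtdN directories, skipping the read-only mtdNro aliases."""
    devices = [
        p for p in Path(sysfs_root).glob("mtd*")
        if p.name[3:].isdigit()
    ]
    return sorted(devices, key=lambda p: int(p.name[3:]))


def read_ecc_stats(mtd_dir):
    """Read the ECC related attributes of one MTD device as integers.

    Attributes that are missing or not numeric are returned as None.
    """
    stats = {}
    for name in ECC_ATTRS:
        try:
            stats[name] = int((Path(mtd_dir) / name).read_text().strip())
        except (OSError, ValueError):
            stats[name] = None
    return stats


if __name__ == "__main__":
    for dev in mtd_devices():
        print(dev.name, read_ecc_stats(dev))
```

A corrected_bits value that grows over time while ecc_failures stays at zero would indicate that bitflips do occur and are being corrected by the hardware ECC.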

Any additional memory error counter-measures that we are implementing in our BSP worth mentioning?

I don’t believe there currently are any. Concerning read disturb in particular, there was some discussion about having a top-level daemon which could apply further counter-measures. However, as far as I know, no conclusion was reached. One possible way to limit read disturb, or at least to avoid further issues caused by it, would be to periodically re-read all data. This would explicitly trigger ECC corrections or failures, upon which blocks could be re-written or, once deemed too bad, abandoned completely.

However, please note that read disturb is believed to mainly (or even only) be an issue on multi-level cell (MLC) flash, and luckily our raw NAND modules have so far only used single-level cell (SLC) parts.
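
The periodic re-read idea mentioned above could be sketched roughly as follows in Python. This is only an illustration and not something shipped in the BSP: the function name is hypothetical, and it assumes the data is read through a file on UBIFS or a UBI volume node, where UBI is expected to schedule scrubbing for blocks that report corrected bitflips. It also assumes the node supports seeking to its end to determine the size.

```python
import os


def reread_all(path, chunk_size=128 * 1024):
    """Read `path` from start to end so that every page passes through ECC.

    Returns (bytes_read, failed_offsets), where failed_offsets lists the
    chunk offsets at which the read raised an OSError (for example an
    uncorrectable ECC error reported by the lower layers).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the readable size.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        bytes_read = 0
        failed_offsets = []
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Remember the failing chunk and continue with the next one.
                failed_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
        return bytes_read, failed_offsets
    finally:
        os.close(fd)
```

Such a pass would have to run rarely, since every read adds to the read count of the blocks it touches.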

Hello,

I have a question related to the use of ECC on this board. We want to improve the affidability of our U-Boot partition. Our dtsi file does not contain the nand-ecc-mode property, and it happened that U-Boot was unable to load the Linux system and reported a bad block error. This was probably connected to an ESD discharge, which we have since solved by paying more attention when mounting the device. To enable this feature in U-Boot, is it only necessary to add the nand-ecc-mode = "hw" property and check the iMX6ULL config?

Thanks

I am unsure what exactly you mean by affidability.

Concerning ECC, this should already all be properly configured, so I am unsure what exactly you are looking for.

And please avoid hijacking old threads, especially ones which have already been answered. Rather, ask a new question stating exactly which hardware (module and carrier board) and software versions you are talking about. Thanks!