SD Card Corruption Issue

We occasionally see data and directories on an SD card become corrupted under WinCE 7 on a Colibri T30. We haven’t found a way to reproduce the issue yet.

My speculation is that we could sometimes be powering down before writes to the card have been finalised.

We power down from within a mixed C/C# application by toggling a discrete output line, although by then enough time should have elapsed for all writes to the card to complete.

A couple of questions:
We close our files, e.g. via fclose(), but is there any way to make sure that any operating system cache in RAM has been written to the SD card before powering down, i.e. a “flush everything” command?

Is there a more graceful way to control the discrete output that powers the system down? Could we do this as part of the postamble when our program exits?

SD cards have a life of their own (i.e. their own controller), and it is hard to say what goes wrong. Some cards take quite a while to store data or to do internal maintenance (wear leveling, …), so powering off could well corrupt the file system. In my view, this could even be the case if the card manufacturer promises power-fail-safe cards: the controller on the card basically does not know whether you are writing real data or metadata (i.e. FAT blocks). So it could happen that the data has been written but the FAT has not been updated yet, or vice versa. It depends on how the controller handles a power loss; maybe the last write gets reverted while the FAT was already updated. Slides 4 and 6 of this presentation may explain this better than I can.

Other than protecting against power loss, there are a few things you could do:

  • Format your SD cards using TFAT.
  • Save your files in subdirectories instead of the root directory. This should reduce the risk, as the root directory listing does not have to be updated as often.

Is there a Flush Command for SD Cards?

There is no such command for the external SD memory device. You could selectively power down the disk or send an SD_CARD_DESELECT_REQUEST. According to the source code in public\common\oak\drivers\sdcard\sdclientdrivers\sdmemory, this should do the job.

Is there a more graceful way to power down?

To do that, you first need a way to detect that a power loss has occurred, and you also need some kind of backup battery or power supply to keep the system running a little longer after the power loss. Without that, this is of no use. The same applies here as above: do a card deselect or power down the card. If you have enough time to shut down, you can also call SetSystemPowerState and power down the whole system.
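To get a feel for how much backup is needed, a rough energy budget helps. As a minimal sketch, assuming the backup is a hold-up capacitor (hypothetical, not something from this thread) feeding a regulator that keeps working down to some minimum input voltage, and a roughly constant power draw, the required capacitance follows from 1/2 · C · (V_start² − V_min²) = P · t. The function name is hypothetical.

```c
/* Hypothetical helper: capacitance (farads) needed to keep a load of
   power_w watts running for hold_s seconds while the capacitor discharges
   from v_start to v_min volts, from
   1/2 * C * (v_start^2 - v_min^2) = power_w * hold_s.
   Assumes constant power draw and an ideal (lossless) converter, so real
   designs need extra margin. Returns -1.0 for meaningless inputs. */
double holdup_capacitance(double power_w, double hold_s,
                          double v_start, double v_min)
{
    if (power_w <= 0.0 || hold_s <= 0.0 || v_min < 0.0 || v_start <= v_min)
        return -1.0;
    return 2.0 * power_w * hold_s / (v_start * v_start - v_min * v_min);
}
```

For example, a hypothetical 3 W load that must keep running for 2 s while the capacitor sags from 12 V to 9 V needs about 0.19 F, before adding margin for capacitor tolerance, ageing and converter losses.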

You would have to check this, but I believe that if the Storage Manager is running, you can find out which store corresponds to the SD card and send it IOCTL_DISK_FLUSH_CACHE (as per the link):

http://developer.toradex.com/knowledge-base/ioctl_disk_flush_cache

You can enumerate the stores using FindFirstStore and FindNextStore:

#include <windows.h>
#include <storemgr.h>
#include <stdio.h>
#include <string.h>

// Example function name; call it from your own code
void ListStores(void)
{
    STOREINFO storeInfo;
    memset(&storeInfo, 0, sizeof(STOREINFO));
    storeInfo.cbSize = sizeof(STOREINFO);

    if (IsStorageManagerRunning())
    {
        HANDLE handle = FindFirstStore(&storeInfo);
        if (handle != INVALID_HANDLE_VALUE)
        {
            // szDeviceName (e.g. "DSK1:") is the name to pass to OpenStore()
            wprintf(L"Found store: %s\n", storeInfo.szDeviceName);
            memset(&storeInfo, 0, sizeof(STOREINFO));
            storeInfo.cbSize = sizeof(STOREINFO);

            while (FindNextStore(handle, &storeInfo))
            {
                wprintf(L"Found store: %s\n", storeInfo.szDeviceName);
            }

            // Close only a handle that FindFirstStore actually returned
            FindCloseStore(handle);
        }
    }
    else
    {
        wprintf(L"Storage manager is not running.\n");
    }
}

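Calling the following should then ensure that pending writes get flushed:

// Pass the szDeviceName of the SD card store found above, e.g. L"DSK1:"
HANDLE hStore = OpenStore(L"<store name here>");
// OpenStore() returns INVALID_HANDLE_VALUE (not NULL) on failure
if (hStore != INVALID_HANDLE_VALUE)
{
    // forceFlush and res are declared as described on the linked page
    if (!DeviceIoControl(hStore, IOCTL_DISK_FLUSH_CACHE, &forceFlush, sizeof(forceFlush), &res, sizeof(res), NULL, NULL))
    {
        wprintf(L"IOCTL_DISK_FLUSH_CACHE failed, error %lu\n", GetLastError());
    }
    CloseHandle(hStore);
}

I’d also check whether creating the file with FILE_FLAG_WRITE_THROUGH makes a difference. This is mentioned here: https://msdn.microsoft.com/en-us/library/aa516906.aspx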

I’d be interested to hear the results.

Cheers

John

@JohnDr: Thanks for the hint about FILE_FLAG_WRITE_THROUGH.

One remark about IOCTL_DISK_FLUSH_CACHE: as far as I remember, it is not implemented in the SDIO memory card driver; at least I did not find it there. On the Tegra family, it is currently implemented for the NAND driver on the T20.

Thanks, Samuel and John, for your comments. We are still investigating; the problem may or may not be due to power loss, but your comments are very useful.

@KCP: Any update? Have you been able to improve the situation with any of the proposed steps?

Other issues took priority for a while …

We have increased the delays in our system after some operations, which also increases the delay before power-off (which is under software control). I am optimistic that this has resolved the issue, but I am not certain yet.

I tried TFAT. A couple of questions:

  • Once a card has been formatted as TFAT, does WinCE 7 recognise it as TFAT automatically?
  • How can I tell that a particular drive is formatted with TFAT? I formatted it from the Control Panel in WinCE and it seemed to work, but I couldn’t find anything that actually told me it was now formatted as TFAT.

Thanks

Unfortunately, the existing control panel does not show this information.
I can provide you with a tool that does: https://share.toradex.com/d78sqkcllb36gfw. If you want to check it in your own software, call CeGetVolumeInfo and check whether lpVolumeInfo->dwFlags & CE_VOLUME_TRANSACTION_SAFE is set.

Please be aware that these flags are buggy on some CE versions.

Thanks again for this utility. One follow-up question: if the TFATCheck.exe utility works OK (which it does), does that also mean that the flags you mention work correctly?

Thanks.

If I format the cards as TexFAT, I don’t get corrupted files or filenames. I can view the files using Mobile Device Center when the device is connected to a Windows PC.

However, if I remove the card and insert it into a PC, Windows shows the message:

Do you want to scan and fix SD/mini-MMC/RS?
The options are “Scan and fix (recommended)” or “Continue without scanning”.

Whichever option I pick, the disk is not accessible. If I attempt to open it in Windows Explorer, I get the message:

{driveletter} is not accessible.
The file or directory is corrupt and unreadable.

So even though the files are intact while the card is in the device, the disk is not accessible after I remove it and insert it into a PC. I can reproduce this 100% of the time.

Any suggestions?

@ed.nafziger: Desktop Windows does not fully support TexFAT or TFAT. In TFAT you basically have two tables that manage the file system, whereas in plain FAT only one is actively used (the second copy is just a mirror). All “desktop” Windows versions only read from and write to one table. I remember using a TFAT-formatted device on a PC, and it worked fine; I tested this on Windows 10. I did not find any official statement on how this is handled. In the worst case, you may read old data, e.g. if the device was removed while the two TFAT tables were not in sync. Maybe this was also not supported on older desktop Windows versions. Which version are you using?

I am using Windows 7 and Windows 10.

Both Windows 7 and Windows 10 can access a newly formatted TexFAT card, but after I write more than a few MB to it under Windows CE, the card becomes inaccessible when put into the desktop PC.

@ed.nafziger: I used a TexFAT-formatted USB stick and was able to write 426 MB without any issues reading it on the PC. I increased the amount of data step by step, without deleting any data. How much data did you write?

About 15 MB.
I’ve only seen this issue with SD cards, not USB flash drives, and I am using 2 SD cards.
For shutdown, I am closing all open file handles, sending the flush command as suggested above, waiting about 2 seconds, then “pulling the plug”. Is there anything else I should be doing as part of the shutdown sequence, such as dismounting the cards? I do not go into suspend mode because I need to shut the system off completely, and I am currently not dismounting the cards because I didn’t think it was necessary.

@samuel.tx: can you verify whether IOCTL_DISK_FLUSH_CACHE is supported by the SDIO driver for the T30? When I call DeviceIoControl as suggested, it fails and GetLastError returns 87 (ERROR_INVALID_PARAMETER). If it is not supported, is there an alternative?

@ed.nafziger: No, it is not supported there either. Correct me if I am wrong, but as far as I remember, the disk cache is handled by the SD device itself. Newer MMC standards (4.5 and later) support cache control via byte 33 of the EXT_CSD register. We have not implemented this yet.
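For reference, here is a sketch of what that cache control involves on the host side. It assumes an MMC/eMMC device (not an SD card) and a host driver that can already read the 512-byte EXT_CSD (CMD8) and issue CMD6 (SWITCH); neither step is shown, and this is not part of the Toradex driver. The byte offsets follow the JEDEC eMMC 4.5 layout, the same ones the Linux MMC driver uses: byte 33 (CACHE_CTRL) enables the cache, writing 1 to byte 32 (FLUSH_CACHE) flushes it, bytes 249..252 hold the cache size, and byte 192 (EXT_CSD_REV) is 6 for eMMC 4.5. The helper function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offsets in the 512-byte EXT_CSD register (eMMC 4.5 and later). */
enum {
    EXT_CSD_FLUSH_CACHE = 32,  /* write 1 to flush the device cache */
    EXT_CSD_CACHE_CTRL  = 33,  /* bit 0: cache enabled */
    EXT_CSD_REV         = 192, /* 6 = eMMC 4.5 */
    EXT_CSD_CACHE_SIZE  = 249  /* 4 bytes, little-endian; 0 = no cache */
};

/* CMD6 access mode "write byte" (bits 25:24 of the argument). */
#define MMC_SWITCH_MODE_WRITE_BYTE 0x03u

/* Hypothetical helper: raw CACHE_SIZE field, or 0 if the device
   predates eMMC 4.5 and therefore has no cache control. */
uint32_t ext_csd_cache_size(const uint8_t ext_csd[512])
{
    if (ext_csd[EXT_CSD_REV] < 6)
        return 0;
    return (uint32_t)ext_csd[EXT_CSD_CACHE_SIZE]
         | (uint32_t)ext_csd[EXT_CSD_CACHE_SIZE + 1] << 8
         | (uint32_t)ext_csd[EXT_CSD_CACHE_SIZE + 2] << 16
         | (uint32_t)ext_csd[EXT_CSD_CACHE_SIZE + 3] << 24;
}

/* Hypothetical helper: true if the device has a cache and it is enabled. */
bool ext_csd_cache_enabled(const uint8_t ext_csd[512])
{
    return ext_csd_cache_size(ext_csd) > 0
        && (ext_csd[EXT_CSD_CACHE_CTRL] & 0x01u) != 0;
}

/* Hypothetical helper: CMD6 argument that writes `value` to EXT_CSD
   byte `index` (command set bits 2:0 left at 0). */
uint32_t mmc_switch_arg(uint8_t index, uint8_t value)
{
    return (MMC_SWITCH_MODE_WRITE_BYTE << 24)
         | ((uint32_t)index << 16)
         | ((uint32_t)value << 8);
}
```

Before removing power, a host driver would check ext_csd_cache_enabled() and, if it returns true, send CMD6 with mmc_switch_arg(EXT_CSD_FLUSH_CACHE, 1), i.e. the argument 0x03200100.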

Updating from V1.4 to V2.0B2 has solved all the SD card issues for me. I am still using a second instance of the SD card driver DLL; I’m not sure whether that makes any difference anymore. FYI: for my shutdown sequence, I close all open files, call DismountStore on both SD cards, and then pull the plug. Using FAT32 vs. exFAT and/or transaction-safe formatting doesn’t seem to make any difference (all combinations now seem to work without issues).

@ed.nafziger: After further investigation, we found that the issue you saw on V1.4 was due to caching problems, which we solved in roadmap issue 18481. This fix also solves the SD card issue. We have added issue 21072 to document that SD cards were affected as well.