Missing data when logging continuously on eMMC

The application is a controller for a vehicle. The critical part of the software runs on the M4, and some processes on the A7 take care of user interactions and logging. The end product is not meant to be easily accessible during or after the vehicle operation, so signals logged are retrieved using the SCP protocol over the Ethernet interface.

Our system performs some computation on the M4 running FreeRTOS, then forwards the results through shared memory (rpmsg-lite) to the A7 running Linux, which logs them to a binary file on the eMMC. The data rate is relatively small (around 160 kB/s). Analysing the logs, we observed that some data was missing: every 30 s there would be several seconds (up to 7) of missing data. Things we have already implemented:

  • Reducing the other Linux processes to a minimum
  • Pre-allocating the file size (this helped a lot)

Any idea how we could achieve lossless logging of our data? Should we modify the strategy and log on an SD card?

Here is the code portion that runs in a Linux process where the data is received from the M4 through RPMSG and the log file is written.

const char dev_filename[] = {"/dev/rpmsg_ept12.1"};

const char *log_filename      = argv[1];
int32_t log_file_size_max_i32 = strtol(argv[3], NULL, 0);

// Open log file (O_CREAT requires a mode argument)
log_fd_i32 = open(log_filename, O_WRONLY | O_CREAT | O_SYNC, 0644);
if(log_fd_i32 < 0) {
  enable_log_i = 0U;
}

// Preallocate space for log file
double ratio = (double)sizeof(APSWin_bus_obj) / (double)sizeof(APSWlog_bus_obj);
posix_fallocate(log_fd_i32, 0, (uint32_t)(log_file_size_max_i32 * (1 - ratio)));

// Open logging char device node file
dev_fd_i32 = open(dev_filename, O_RDWR);
if(dev_fd_i32 > -1) {
  fds[0].fd     = dev_fd_i32;
  fds[0].events = POLLIN;

  // Initialise communication with the M4
  write(dev_fd_i32, &fdi_ui32, sizeof(fdi_ui32));

  // Infinite loop
  while(run_i) {

    // Check if data is already present in the buffer
    dev_bytes_read_i32 = read(dev_fd_i32,
                              read_buffer_i + dev_bytes_read_tot_i32,
                              READ_SIZE_MAX - dev_bytes_read_tot_i32);
    if(dev_bytes_read_i32 == 0) {
      retval = poll(fds, 1, -1);
      if(retval > 0                    &&
          !((fds[0].revents & POLLERR)   ||
              (fds[0].revents & POLLHUP)   ||
              (fds[0].revents & POLLNVAL))) {

        dev_bytes_read_i32 = read(dev_fd_i32,
                                  read_buffer_i + dev_bytes_read_tot_i32,
                                  READ_SIZE_MAX - dev_bytes_read_tot_i32);
      }
      else {
        // FDI Polling error
        fdi_ui32 |= 8U;
        printf("LLS_LOGa: Polling Error\r\n");
      }
    }
    dev_bytes_read_tot_i32 += dev_bytes_read_i32;
    if(dev_bytes_read_tot_i32 > READ_SIZE_MAX) {
      printf("LLS_LOGa: Synchronisation lost with the m4 \r\n");
      dev_bytes_read_tot_i32 = 0;
    }

    // Receive logging structure
    if(dev_bytes_read_tot_i32 == sizeof(APSWlog_bus_obj)) {
      receive_type_i = 0U;
      dev_bytes_read_tot_i32 = 0;
      if(enable_log_i == 1U || enable_tlm_i == 1U) {
        file_bytes_write_tot_i32 = 0;
        file_bytes_write_cur_i32 = 0;

        BytePackAPSWlog_bus_obj((APSWlog_bus_obj *) read_buffer_i,
                                write_buffer_i,
                                &file_write_len_ui32);

        if(enable_log_i == 1U) {
          // Loop to make sure all the data is written to the log file
          while(file_bytes_write_tot_i32 < file_write_len_ui32) {
            // Write to log file
            file_bytes_write_cur_i32 = write(log_fd_i32,
                                             write_buffer_i + file_bytes_write_tot_i32,
                                             file_write_len_ui32 - file_bytes_write_tot_i32);
            if(file_bytes_write_cur_i32 <= 0) {
              // FDI Write error
              fdi_ui32 |= 4U;
              printf("LLS_LOGa: Write file error\r\n");
              enable_log_i = 0U;
              enable_inp_i = 0U;
              break;
            }
            file_bytes_write_tot_i32 += file_bytes_write_cur_i32;
          }
          logfile_bytes_tot_i32 += file_bytes_write_tot_i32;
        }
        // Send back fdi to M4
        if(write(dev_fd_i32, &fdi_ui32, sizeof(fdi_ui32)) != sizeof(fdi_ui32)) {
          fdi_ui32 |= 64U;
          printf("LLS_LOGa: Send fdi to M4 failed\r\n");
        } /*-?|LLS_review_DB_001|beaudette|c3|?*/
        else {
          fdi_ui32 = 0U;
        }
      }
    }

    // Reset file if size exceeds limit
    if(logfile_bytes_tot_i32 > log_file_size_max_i32) {
      printf("LLS_LOGa: Reset file \r\n");
      if(lseek(log_fd_i32, APSWLOG_BUS_OBJ_HEADER_LEN, SEEK_SET) != (APSWLOG_BUS_OBJ_HEADER_LEN)) {
        // FDI
        fdi_ui32 |= 32U;
        printf("LLS_LOGa: Reset file error\r\n");
      }
      logfile_bytes_tot_i32 = APSWLOG_BUS_OBJ_HEADER_LEN;
    }
  }
}


// Close opened files
if(log_fd_i32 != -1) {
  close(log_fd_i32);
}
if(dev_fd_i32 != -1) {
  close(dev_fd_i32);
}

return 0;

As a quick test, we tried removing the rpmsg processing by having only 1 process on the A7 writing dummy data to the eMMC at a rate similar to the real application. The results were better, but we still observed 4 data gaps of almost 1 second over a few minutes, which is quite a large overrun considering that we send a file write command every 0.02 s.
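
To locate the stalls seen in a test like this, one option is to time each write with a monotonic clock and record the worst case. The following is a minimal sketch, not the code used in the test above; `now_us` and `timed_write` are hypothetical helper names introduced here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in microseconds (not affected by wall-clock changes). */
static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical helper: write all len bytes to fd and return the elapsed
 * time in microseconds, or -1 if write() fails. */
static int64_t timed_write(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    size_t done = 0;
    int64_t start = now_us();

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0 && errno == EINTR) {
            continue;               /* interrupted by a signal, retry */
        }
        if (n <= 0) {
            return -1;              /* real write error */
        }
        done += (size_t)n;
    }
    return now_us() - start;
}
```

Calling `timed_write` in place of the inner write loop and printing a message whenever the result exceeds the 20 ms write period would show whether the gaps line up with slow `write()` calls. Since the log file is opened with `O_SYNC`, each `write()` only returns once the data has been handed to the storage device, so any slow write directly delays the next read from the rpmsg device.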

It’s hard to say anything without seeing your code for storing/writing the data. Maybe some internal buffers overrun while the OS is busy with housekeeping tasks. In any case, constantly writing to eMMC at such a rate will wear out the flash very quickly.
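
One common way to tolerate storage stalls like the ones suspected here is to put a large FIFO between the receiving side and the file writer. At 160 kB/s, a 7 s stall corresponds to about 1.1 MB of data, so the buffer has to be at least that large. Below is a minimal sketch of such a byte FIFO (ring buffer). The names `ring_t`, `ring_push`, `ring_pop` and the 2 MiB capacity are assumptions made for illustration, and the sketch is not thread-safe: sharing it between a reader thread and a writer thread would need a mutex or similar, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 2 MiB holds roughly 13 s of data at 160 kB/s. */
#define RING_CAPACITY (2u * 1024u * 1024u)

typedef struct {
    uint8_t data[RING_CAPACITY];
    size_t  head;   /* next position to write */
    size_t  tail;   /* next position to read */
    size_t  used;   /* number of bytes currently stored */
} ring_t;

static void ring_init(ring_t *r)
{
    r->head = 0;
    r->tail = 0;
    r->used = 0;
}

/* Append len bytes. Returns 0 on success, or -1 if there is not enough
 * free space (nothing is stored, so records are never split). */
static int ring_push(ring_t *r, const void *src, size_t len)
{
    if (len > RING_CAPACITY - r->used) {
        return -1;
    }
    size_t first = RING_CAPACITY - r->head;   /* room before wrap-around */
    if (first > len) {
        first = len;
    }
    memcpy(r->data + r->head, src, first);
    memcpy(r->data, (const uint8_t *)src + first, len - first);
    r->head = (r->head + len) % RING_CAPACITY;
    r->used += len;
    return 0;
}

/* Remove up to max bytes into dst. Returns the number of bytes copied. */
static size_t ring_pop(ring_t *r, void *dst, size_t max)
{
    size_t len = (max < r->used) ? max : r->used;
    size_t first = RING_CAPACITY - r->tail;   /* bytes before wrap-around */
    if (first > len) {
        first = len;
    }
    memcpy(dst, r->data + r->tail, first);
    memcpy((uint8_t *)dst + first, r->data, len - first);
    r->tail = (r->tail + len) % RING_CAPACITY;
    r->used -= len;
    return len;
}
```

In such a design the receiving loop would only call `ring_push`, and a separate writer would drain the buffer with `ring_pop` and write larger chunks to the file. A failed `ring_push` is then an explicit, countable overrun instead of a silent gap in the log.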

What would you recommend for a continuous logging application?

How much data do you need to keep on the device?

Ideally in the order of 4 GB, although it does not have to be directly on the device; it could be on the carrier board for example. The objective is really to achieve lossless logging.

Hi @davidbeaudettengc

Could you share some sample code showing how you are logging?

What is your application?
What is the lifetime of your device?

Best regards,
Jaski

Hi @jaski.tx ,

I have updated the question to answer yours. The sample code omits the header writing at the beginning and other details, but the main filewrite operations are there. The lifetime of the device is at least 10 years.

For a small amount of data (up to a few megabytes), FRAM is the best solution, but at 4 GB it would be too expensive. For a large amount of data you can use a mini HDD like this one: https://www.amazon.com/Toshiba-MK2533GSG-Small-Factor-Drive/dp/B00JRJKFJQ . You can connect it to your system using a USB-SATA adapter, either an external one like this: https://www.amazon.com/Micro-SATA-1-8-Adapter-Cable/dp/B0037JACXG , or you can embed the required IC on your carrier board. If shock and/or vibration conditions do not allow a mechanical HDD, you can use a flash memory device such as eMMC or an SSD. However, to get an appropriate lifetime, its size should be significantly bigger than the required log size. For your case (4 GB) it should be no less than 64 GB or so. To do a proper size calculation, please follow https://developer.toradex.com/knowledge-base/emmc-linux#Health_Status
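
As a rough illustration of why the device size matters, the total data written over the product lifetime can be estimated from the figures in this thread (160 kB/s, 10 years). The sketch below assumes continuous 24/7 logging, ideal wear levelling and a caller-supplied write amplification factor; these are simplifying assumptions, and the function names are hypothetical. It does not replace the calculation described in the linked article.

```c
/* Total bytes written over the lifetime at a constant log rate.
 * Assumes logging runs continuously (24/7), i.e. a worst case. */
static double lifetime_bytes_written(double rate_bytes_per_s, double years)
{
    const double seconds_per_year = 365.0 * 24.0 * 3600.0;
    return rate_bytes_per_s * seconds_per_year * years;
}

/* Average number of times the whole device is overwritten, assuming
 * ideal wear levelling. write_amplification is an assumed factor
 * (1.0 means the flash writes exactly what the application writes). */
static double full_device_overwrites(double total_bytes,
                                     double device_bytes,
                                     double write_amplification)
{
    return total_bytes * write_amplification / device_bytes;
}
```

With 160 kB/s over 10 years, `lifetime_bytes_written(160e3, 10.0)` gives about 5.05e13 bytes, roughly 50 TB. With a write amplification of 1.0, that is about 12,600 full overwrites of a 4 GB device but only about 790 of a 64 GB device. The real write amplification for small synchronous writes is likely higher than 1.0, so the result should be compared against the endurance figure in the storage device's datasheet.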

Here is a recap of the comments to close the question:

Thanks to Toradex staff for their support on this topic.

Hi @davidbeaudettengc

Thanks for your input.

You need to test whether there is a buffer overrun or not.
As @alex.tx suggested, you should use external storage, which can be replaced in case of failure, especially if the lifetime of your product is 10 years.

By the way, what is your application? If you are using the SoM just for logging, then you might change to the iMX7D version with NAND flash or even the iMX7S.

Best regards,
Jaski

Hi @jaski.tx ,

The SoM is used for more than logging; the M4 runs guidance and control algorithms, and it is their output that is logged on the A7. It’s pretty clear now that external storage is the most viable solution in the long run.

If there are any means of monitoring file write buffering or any low-level NAND driver operations that could cause the data outages, please let me know.

We are also considering moving all logging to a separate, dedicated device that would have an SSD. Can you recommend any Toradex product that supports either SATA3 or PCIe Gen 3 NVMe 1.2?

The easiest would be to use one of our Apalis family of modules, where mSATA and/or PCIe are available. Then, make sure that whatever storage medium you choose not only meets your transfer rate requirements but can also handle the total amount of data written during its lifetime.