Slow file system

Gerhard · November 8, 2018, 7:13pm

Hi,
finally it turns out that storing data on a USB stick is very slow.

At first I wanted to use a micro SD card for mass data storage, but the driver has a known bug, so I have to use a USB stick.

I am trying to store 630-byte blocks.

The signal GH_DBGD is 1 while a new block is being pushed into a queue.
The signal GH_DBGC is 1 while a task takes a block out of the queue and writes it to an open file.

In the beginning it looks fine, but the data writer task is falling behind.
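For readers following along, the setup described above (one task pushes blocks into a queue, a second task drains the queue into an open file) can be sketched roughly as follows. This is only a minimal illustration in portable C++ with std::thread, not the original code: the names Block, BlockQueue, writer_task and run_pipeline are assumptions made for this sketch, and the GPIO toggling of GH_DBGD/GH_DBGC is left out.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical block type: 630 bytes, matching the block size mentioned above.
using Block = std::array<unsigned char, 630>;

// Hypothetical thread-safe queue between the producer and the writer task.
class BlockQueue {
public:
    void push(const Block& b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(b);
        }
        cv_.notify_one();
    }
    // Signals that no more blocks will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Waits until a block is available; returns false once closed and empty.
    bool pop(Block& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Block> q_;
    bool closed_ = false;
};

// Writer task: drains the queue into an already opened file.
// Returns the number of blocks written.
std::size_t writer_task(BlockQueue& q, std::FILE* f) {
    assert(f != nullptr);
    std::size_t written = 0;
    Block b;
    while (q.pop(b)) {
        if (std::fwrite(b.data(), 1, b.size(), f) != b.size()) break;
        ++written;
    }
    return written;
}

// Pushes n blocks from the calling thread while a writer thread drains them.
std::size_t run_pipeline(std::size_t n, std::FILE* f) {
    BlockQueue q;
    std::size_t written = 0;
    std::thread writer([&] { written = writer_task(q, f); });
    Block b{};
    for (std::size_t i = 0; i < n; ++i) q.push(b);
    q.close();
    writer.join();
    return written;
}
```

In a layout like this, the queue depth over time is a direct indicator of whether the writer is falling behind: if it keeps growing, the file write is the bottleneck rather than the producer.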

I played around with task priorities, but it gets worse, not better.

I also have to do a lot of post-processing, which means reading large amounts of data (2GB) and writing the results to result files. This algorithm is much slower than expected.

I used a SoM with a single core and a 500MHz clock. As this computing power is too low, I am trying to switch over to an iMX7D (dual core, 2 x 1GHz), but … it doesn’t work out for now.

Any ideas to speed up things?

And yes, I used the same USB stick!!!

With best regards

Gerhard

raja.tx · November 9, 2018, 11:34am

Dear @Gerhard,

We are not sure that USB itself is slow. But if the CPUs are 100% loaded, the file writing (including the USB driver) might be interrupted by other tasks and therefore become slow.
We suggest that you find the bottleneck in your overall approach (CPU performance, memory bandwidth, SPI throughput, etc.) to get an initial idea of what to optimize.

Also, I quickly tested copying a 35MB file from the SD card to USB: it took 14 seconds, i.e. ~2.5MB per second.

We would highly appreciate it if you could share a reproducible project for this kind of issue. It would help us a lot to look at the issue specifically, without having to rewrite the algorithm to reproduce it.

Gerhard · November 14, 2018, 7:16am

Hi Raja,
I used ofstream and found some comments saying that this is potentially slower than using ‘fwrite’ and friends. I rewrote my code and yes, that’s right. I now get ~800k/s (I need 400k/s minimum).
But why?
My former SoC was a single-core 500MHz type and was fast enough to write my data to a file using ofstream and friends. The dual-core 1GHz SoC can’t manage that. I had to change the file IO to get roughly the same speed.
The measurement process is still the same, so there are the same interrupts, the same interrupt frequency and the same software running, except for the few lines that write to the file.
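One rough way to compare the two write paths in isolation is to time the same workload once through std::ofstream and once through fwrite. The sketch below is an assumption-laden illustration, not the code from this project: RunResult, time_ofstream and time_fwrite are hypothetical names, and the path and block size are whatever the caller passes in.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

// Result of one timed run: bytes written and elapsed wall-clock seconds.
struct RunResult {
    std::size_t bytes;
    double seconds;
};

// Writes `count` blocks of `block_size` bytes through std::ofstream.
RunResult time_ofstream(const char* path, std::size_t block_size, std::size_t count) {
    assert(block_size > 0);
    std::vector<char> block(block_size, 'x');
    auto t0 = std::chrono::steady_clock::now();
    std::size_t bytes = 0;
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        for (std::size_t i = 0; i < count && out; ++i) {
            out.write(block.data(), static_cast<std::streamsize>(block.size()));
            if (out) bytes += block.size();
        }
    }  // the stream is closed here, so closing is part of the timing
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return {bytes, dt.count()};
}

// Same workload through the C stdio functions.
RunResult time_fwrite(const char* path, std::size_t block_size, std::size_t count) {
    assert(block_size > 0);
    std::vector<char> block(block_size, 'x');
    auto t0 = std::chrono::steady_clock::now();
    std::size_t bytes = 0;
    if (std::FILE* f = std::fopen(path, "wb")) {
        for (std::size_t i = 0; i < count; ++i) {
            std::size_t n = std::fwrite(block.data(), 1, block.size(), f);
            bytes += n;
            if (n != block.size()) break;
        }
        std::fclose(f);  // closing is timed too
    }
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return {bytes, dt.count()};
}
```

Dividing `bytes` by `seconds` gives a throughput figure for each path. Both functions include the close in the timing so that data still sitting in a user-space buffer is not counted as free.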

With best regards

Gerhard

Gerhard · November 14, 2018, 8:25am

Hi Raja,
here are the latest results.

Using ofstream:
[upload|r0tUrUfcjChsWGpO44IIg5vPsP4=]
As you can see, the file IO gets slower. The rest of my program does the same from beginning to end: it receives data packets of 672 bytes each via SPI, puts them into a buffer and, after receiving 16800 bytes, writes them to a file. The signal GH_DBGC is set to 1 just before the out.write( … ) instruction and cleared just after it. A queue between the SPI receiver and the file writer task prevents data loss.
[upload|fxjzANS1c+pPtE5x69fXrye6B7o=]
Here is a detailed view.
This measurement cycle only runs 2 x 960 ms, but we also have cycles of up to 2 x 8 seconds!!
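The buffering described above (672-byte packets collected until 16800 bytes, i.e. 25 packets, are available, then one file write) could look roughly like the following sketch. ChunkWriter and the constant names are hypothetical and only illustrate the idea; the real code also has the SPI receiver and the queue in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

constexpr std::size_t kPacketSize = 672;    // packet size from the post
constexpr std::size_t kChunkSize  = 16800;  // 25 packets per file write
static_assert(kChunkSize % kPacketSize == 0, "chunk must hold whole packets");

// Hypothetical accumulator: collects packets and writes one chunk at a time.
class ChunkWriter {
public:
    explicit ChunkWriter(std::FILE* f) : file_(f) {
        assert(f != nullptr);
        buf_.reserve(kChunkSize);
    }

    // Appends one packet of kPacketSize bytes. Once the buffer holds a full
    // chunk, it is written to the file and the buffer is emptied.
    // Returns true only if a full chunk was written successfully in this call.
    bool add_packet(const unsigned char* packet) {
        buf_.insert(buf_.end(), packet, packet + kPacketSize);
        if (buf_.size() < kChunkSize) return false;
        const bool ok = std::fwrite(buf_.data(), 1, buf_.size(), file_) == buf_.size();
        buf_.clear();
        return ok;
    }

    // Number of bytes collected but not yet written.
    std::size_t pending() const { return buf_.size(); }

private:
    std::FILE* file_;
    std::vector<unsigned char> buf_;
};
```

A debug signal such as GH_DBGC would be set just before the fwrite call and cleared just after it, which is what the scope traces show.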

I changed the file IO method to fwrite and friends. The rest of the software is completely unchanged.
[upload|EGByYI2N8T3JEjORu+9sXW2CI9o=]
As you can see, the write instruction no longer slows down.
Detail:
[upload|qqW/cBv1Dns7lqMk0O/ERYtW8k0=]
Writing 16800 bytes now takes 14ms at the end of a cycle, which gives 1.2MB/s if I am right.

I don’t know the code differences between the ofstream class and native file IO but, as the rest of the software (and the whole system) remains the same, there must be something going on at exactly that place. Sorry.
Maybe cache, management of system memory or resources … who knows.

I tried to write a test project. I managed to get native file IO working but, for whatever reason, I can’t get the streamed file IO running. I am not familiar with C++, but a C++ guru like you will manage that within seconds. Maybe you can tell me afterwards what’s wrong. Thanks. I will prepare an upload within the next hour. I just used the Gpio_Demo project and modified it, so we can do exact run-time measurements using a scope.

With best regards

Gerhard

Gerhard · November 14, 2018, 9:05am

Hi Raja,
you can find my simple test program here

Gerhard · November 14, 2018, 10:46am

Hi Raja,
I just found out a really surprising fact (of interest).

My measurement consists of two phases; between these phases data recording stops for a short time. The slave system sends a signal to mark the last block of data of a phase. After receiving this last block I flush the system’s file buffers for safety reasons. If there is a crash (mainly during development) I may still get useful information.
I also write a marker into the file and I see this marker in the file afterwards, so I am quite sure that this flush is only done once, at the end of a phase.

I found out that after this little break the write cycles to the file (which I leave open all the time) take twice as long as before.

During code review I only found one difference: I do a flush.

Now I have commented out this flush instruction and voila, the write cycles during the second phase take the same time as during the first phase. Ohhh.

Here is the code snippet:
[upload|Dcd1DlvvDqL9Duj9JxYfAtNBf3Q=]
You can see both methods used and the flush instruction.

Measurement results using streams:
[upload|2KplDCvoseIDjbh58qeBAhmmAAA=]
The out.flush() instruction is executed and the time for file IO doubles, see markers.

[upload|8q2M+zQmCpRR8PG3OpPRfOlm7JY=]
Without the flush instruction, it looks fine.

Measurement results using native file IO:
[upload|RPzQM2Dpqd2e/JZZfrtX/SM2ItQ=]
After fflush(…), writing to the file takes twice as long.

[upload|6DrwWQ+I367MoEd0SOo10k0UOjI=]
Without the fflush(…) it looks fine.

It also turns out that the runtime difference between native file IO and streams isn’t that dramatic, but it is there.

As my software does the same things during all these measurements and only the flushes made the difference, there must be something going on in the implementation of the system itself, the block device driver, the USB driver, or …

I can’t use the micro SD card to get closer to the problem, because Toradex told me that there is a known bug and that it needs time to fix.

With best regards

Gerhard

andy.tx · November 20, 2018, 3:27pm

Dear @Gerhard

It makes sense that flushing has a big impact on performance. Usually there’s a lot of caching in place to hide slow write operations. A flush blocks the program flow until all data is actually written.
Especially in a heavily loaded system, there might be a lot of data accumulated in the cache.

We cannot easily analyze the reason for the bottleneck in your system. Raja’s measurements showed a bandwidth of 2.5MBps, and if I got you correctly, 1.2MBps is sufficient for you.

I’m not sure whether your problem is solved by skipping the fflush(). If not, I suggest first finding optimizations for the write performance, independent of the rest of your application. Try writing with different block sizes, or move the write operations into a separate thread so they can run on the second core, …

I’m afraid we cannot do the optimizations for you; we can only give you hints on what to try.

Regards, Andy

Gerhard · November 20, 2018, 3:43pm

Hi Andy,

thanks, but you miss the point.

Yes, flush takes its time.

But what’s the explanation for this behaviour:

Write 10 blocks without flush; each write operation takes some time x. Then do ONE flush, which takes its time, and then write 10 blocks again (without flush): now the write time for each block is ~2x.

It works here, no further optimization needed, but maybe it is a hint to some problem in the file system’s implementation.

I expected that, after the flush operation, the time to write a block would be approximately the same as before the flush().
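The sequence in question (N writes, ONE flush, N writes again) can be reproduced in isolation with a small harness like the one below. It is a sketch under assumptions, not the original measurement code: PhaseTimes, timed_writes and flush_experiment are hypothetical names, it uses the fwrite/fflush path only, and it measures with a software clock instead of a GPIO and a scope.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Mean time per write call, in microseconds, for each phase.
struct PhaseTimes {
    double before_flush_us;
    double after_flush_us;
};

// Writes `count` blocks and returns the mean time per fwrite call (microseconds).
static double timed_writes(std::FILE* f, const std::vector<char>& block, std::size_t count) {
    using clock = std::chrono::steady_clock;
    double total_us = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        auto t0 = clock::now();
        std::fwrite(block.data(), 1, block.size(), f);
        total_us += std::chrono::duration<double, std::micro>(clock::now() - t0).count();
    }
    return count ? total_us / static_cast<double>(count) : 0.0;
}

// Phase 1: `count` writes. Then ONE fflush. Phase 2: `count` writes.
// Returns zeros if the file cannot be opened.
PhaseTimes flush_experiment(const char* path, std::size_t block_size, std::size_t count) {
    assert(block_size > 0);
    PhaseTimes r{0.0, 0.0};
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return r;
    std::vector<char> block(block_size, 'x');
    r.before_flush_us = timed_writes(f, block, count);
    std::fflush(f);  // the single flush between the two phases
    r.after_flush_us = timed_writes(f, block, count);
    std::fclose(f);
    return r;
}
```

Calling, for example, `flush_experiment(path, 16800, 10)` with a path on the USB stick and comparing the two fields shows whether the per-block write time really changes after the flush. Running it a second time with the fflush line removed gives the reference case.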

I also pointed out the difference between native file IO and using streams; maybe this info is useful for someone.

With best regards

Gerhard

andy.tx · November 21, 2018, 8:16am

Hi @Gerhard

Thank you for the conclusion - I really missed the point before.

We will look into this issue, but I fear we will not be able to get rid of this behavior with reasonable effort. Most of the algorithms are hidden in code layers for which we don’t have the source code.

Regards, Andy

Gerhard · November 21, 2018, 8:30am

Hi Andy,
I used WEC2013 running on another SoM (AT-501 by Shiratech) with a BSP from dab-embedded. I could not detect this behaviour there during my development and testing.
So my guess is that it can’t be part of the WEC2013 layers, but maybe I am wrong.

There is also the open issue with writing to and reading from the SD card, which you told me will be fixed by the end of the year; maybe this is related somehow.

With best regards

Gerhard