SPI Write Speed Slow between iMX6 and DAC MCP4922

emmettbrown · May 10, 2019, 8:01am

Hello,

I’m developing an application on the Colibri iMX6S that communicates with the Microchip MCP4922 DAC over the SPI interface.

Everything works, but I have an issue with the communication speed between the iMX6 and the DAC.

Since I need to write more than 10000 values per second, I need the best possible performance, and this is where the problem lies: the speed seems to be lower than I think it should be.

In fact, since the DAC works with a 20 MHz clock and the iMX6 supports up to 23 MHz, writing the 2 bytes of each value should take about 0.8 microseconds, but I have measured that it normally takes 120 microseconds for each written value.

This is a bottleneck for my application, so please help me understand how to speed up this communication.

Here is the configuration of the SPI communication:

        private const float LSB = 1f / 4095; // float division; 1 / 4095 is integer division and yields 0
        public const int MAX_VALUE = 4095; //VOUT = InputCode*LSB/MAX_VALUE 4095
        public const int MIN_VALUE = 0;
        IntPtr spiHandle = IntPtr.Zero;
        gpio.uIo ioCs = new gpio.uIo();
        uint returnValue = 0;

        public uint DeviceAddress;
        bool bReverse_A_Reg;
        bool bReverse_B_Reg;
        bool bStarted;

        public MCP4922()
        {
            spiHandle = spi.Spi_Init("SPI1");                                                               ///< Init SPI library 
            if (spiHandle == IntPtr.Zero)
            {
                Program.cGlobals.cLogging.LogMessage("MCP4922 --> Error initializing the SPI Library");
            }
            spi.Spi_SetConfigInt(spiHandle, "SpiMode", 0, TdxCommon.ParamStorageType.StoreVolatile);        ///< Set SPI on Mode 0
            spi.Spi_SetConfigInt(spiHandle, "BitRateHz", 20*1000*1000, TdxCommon.ParamStorageType.StoreVolatile);  ///< Set SPI clock to 20 MHz
            if (!spi.Spi_Open(spiHandle))
            {
                Program.cGlobals.cLogging.LogMessage("MCP4922 --> Failed to open SPI");
            }
            ioCs.number = 86;
            ioCs.type = (ushort)gpio.tIoType.ioColibriPin;

            spi.Spi_SetConfigInt(spiHandle, "ioCS", ioCs.GenericDefinition, TdxCommon.ParamStorageType.StoreVolatile);          ///< Set Chip Select pin sodimm 86
            spi.Spi_SetConfigInt(spiHandle, "BitsPerWord", 8, TdxCommon.ParamStorageType.StoreVolatile);
        }

And here is the write subroutine (with Stopwatch code added to measure the exact time of each write):

public void WriteRegister(int DACNumber, int BufferControl, int Gain, int Shutdown, int iValue)
        {
            //DACNumber=0 => Register A - DACNumber=1 => Register B
            //BufferControl=0 => Unbuffered - BufferControl=1 => Buffered
            //Gain=0 => VOUT = 2 * VREF * D/4096 - Gain=1 => VOUT = VREF * D/4096, where D is the value of register bits D11-D0
            //Shutdown=0 => Shutdown the selected DAC channel - Shutdown=1 => Active mode operation. VOUT is available

            if (DACNumber==0 && bReverse_A_Reg)
                iValue = MAX_VALUE - iValue;

            if (DACNumber == 1 && bReverse_B_Reg)
                iValue = MAX_VALUE - iValue;

            uint bufferLenght = 2;
            Byte[] DataToWrite = new Byte[bufferLenght];
            int SetupValue = DACNumber * 8 + BufferControl * 4 + Gain * 2 + Shutdown;
            DataToWrite[0] = (byte)(((iValue & 3840) >> 8) | (SetupValue * 16));
            DataToWrite[1] = (byte)(iValue & 255);


            try
            {
                unsafe
                {
                    fixed (byte* for_Casting_Intptr_to_Byte = DataToWrite)
                    {
                        Stopwatch sw1 = new Stopwatch();
                        sw1.Start();

                        returnValue = spi.Spi_Write(spiHandle, (IntPtr)for_Casting_Intptr_to_Byte, bufferLenght);
                        if (returnValue == 0)
                            throw new Exception();                                            ///< If Write operation returns 0

                        long ticks = sw1.ElapsedTicks;
                        double ns = 1000000000.0 * (double)ticks / Stopwatch.Frequency;
                        double micros = ns / 1000.0;
                        double millis = micros / 1000.0;
                        sw1.Stop();
                    }
                }
            }
            catch (Exception ex)
            {
                Program.cGlobals.cLogging.LogMessage("Configuration write failed - Error code: " + ex.Message);
            }
        }
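
For readers porting this to native code, here is a minimal C sketch of the same 16-bit command word that WriteRegister builds: four control bits in the upper nibble, then the 12-bit value. The macro and function names are illustrative only and are not part of any Toradex or Microchip library; the bit positions are taken from the SetupValue arithmetic above.

```c
#include <assert.h>
#include <stdint.h>

/* Control bits of the 16-bit write command, as used by WriteRegister above. */
#define MCP4922_CH_B     (1u << 15)  /* 0 = DAC A, 1 = DAC B */
#define MCP4922_BUFFERED (1u << 14)  /* 1 = buffered reference input */
#define MCP4922_GAIN_1X  (1u << 13)  /* 1 = 1x gain, 0 = 2x gain */
#define MCP4922_ACTIVE   (1u << 12)  /* 1 = output active, 0 = shutdown */

/* Build the command word: control bits followed by the 12-bit value. */
static uint16_t mcp4922_command(unsigned flags, unsigned value)
{
    assert(value <= 0x0FFFu);  /* only D11..D0 carry data */
    /* The masks keep the fields apart even when asserts are compiled out. */
    return (uint16_t)((flags & 0xF000u) | (value & 0x0FFFu));
}

/* Split the word into the two bytes sent on the wire, most significant first. */
static void mcp4922_to_bytes(uint16_t word, uint8_t out[2])
{
    out[0] = (uint8_t)(word >> 8);
    out[1] = (uint8_t)(word & 0xFFu);
}
```

For example, channel B with 1x gain, active output and value 0xABC gives the word 0xBABC, sent as the bytes 0xBA and 0xBC.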

I hope you can help me, since without this improvement my application is not usable.

Thanks in advance

andy.tx · May 10, 2019, 12:11pm

Dear @emmettbrown

The bottleneck is not the SPI communication, as you calculated correctly. The overhead comes from setting up the transfer.
The biggest contributors to this setup delay are:

- The .NET framework: managed code needs to be converted to assembly instructions before it can be executed, and switching between managed code (your application) and unmanaged code (the SPI lib) also takes quite some time.
- Handling the chip select: due to some limitation of the SPI peripheral controller, we need to implement the SPI chip-select signal through a regular GPIO which is toggled in the library code. This creates quite some overhead compared to a hardware-driven chip-select.

If you want to know the details, you should take an oscilloscope and do some measurements.

To achieve better performance, you can try the following approaches (sorted from easy-to-achieve to hard-to-implement):

1. If your application and DAC allow it, do a large transfer which contains multiple samples. E.g. if the DAC accepts continuous data reception (2 bytes per sample), just send 2000 bytes at once to get an output of 1000 samples.
2. Write the code to output a series of samples in native C. Do only one call from C# to your native C function to output a series of samples.
3. Purchase the SPI Library source code and optimize it for your particular use case.
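
As a rough illustration of the first approach, the C sketch below packs a series of 12-bit samples into one byte buffer (2 bytes per sample) that could then be handed to a single SPI write. The function name and the `control` parameter are hypothetical. Whether this helps depends on the DAC: a part that latches each word on the chip-select edge would still need chip select toggled between samples, so check the MCP4922 datasheet before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 12-bit samples into 2-byte frames, most significant byte first.
 * control holds the four control bits already shifted into bits 15..12.
 * Returns the number of bytes written to out. */
static size_t pack_samples(const uint16_t *samples, size_t count,
                           uint16_t control, uint8_t *out, size_t out_size)
{
    assert(out_size >= 2 * count);  /* caller must provide 2 bytes per sample */
    for (size_t i = 0; i < count; i++) {
        uint16_t word = (uint16_t)((control & 0xF000u) | (samples[i] & 0x0FFFu));
        out[2 * i]     = (uint8_t)(word >> 8);
        out[2 * i + 1] = (uint8_t)(word & 0xFFu);
    }
    return 2 * count;
}
```

With 1000 samples this yields the 2000-byte buffer mentioned in the first approach.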

Regards, Andy

emmettbrown · May 10, 2019, 1:23pm

Thank you @andy.tx, I think the only option is to write the native C function, but I don’t know how to do that.

Could you send me an example please?

andy.tx · May 13, 2019, 6:26am

Dear @emmettbrown

You can split your task into three parts. For all parts, there are plenty of public tutorials available on the web. I did a quick Google search and picked some examples as a starting point for you:

1. Learn C / C++
   - Learn C
   - You can also look at the demo applications which come with the ToradexCe libraries to get examples of accessing our libraries in C.
2. Learn how to create a DLL
   - Walkthrough: Create and use your own Dynamic Link Library (C++)
3. Learn how to interface between a DLL and your C# application
   - Look at the .NET demo applications which come with the ToradexCe libraries. The file \dotNet\TdxAllLibraries.cs is especially interesting, as it contains the link between the native C DLL and the .NET application.
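
To make the DLL boundary more concrete, here is a minimal sketch of what an exported native function could look like. The function `Laser_CheckFrame` and the `LASER_API` macro are hypothetical; this one only validates a frame, and the real function would go on to output it. On the C# side a matching declaration would presumably be along the lines of `[DllImport("LaserLibrary.dll")] static extern int Laser_CheckFrame(ushort[] x, ushort[] y, byte[] laser, int count);`, which is an assumption and not taken from the Toradex demos.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_WIN32)
#  define LASER_API __declspec(dllexport)  /* export the symbol from the DLL */
#else
#  define LASER_API                        /* lets the sketch compile off-target */
#endif

#ifdef __cplusplus
extern "C" {                               /* keep the exported name unmangled */
#endif

/* Hypothetical entry point: checks a frame before it is played.
 * Returns the point count, or -1 if an argument is invalid. */
LASER_API int32_t Laser_CheckFrame(const uint16_t *x, const uint16_t *y,
                                   const uint8_t *laser, int32_t count)
{
    if (x == NULL || y == NULL || laser == NULL || count < 0)
        return -1;
    for (int32_t i = 0; i < count; i++) {
        if (x[i] > 4095u || y[i] > 4095u)  /* the DAC is 12 bits wide */
            return -1;
    }
    return count;
}

#ifdef __cplusplus
}
#endif
```

Passing whole arrays in one call is what keeps the managed-to-unmanaged transition cost to a single call per animation.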

Regards, Andy

emmettbrown · May 21, 2019, 1:04pm

Hello @andy.tx ,

I’ve developed a C++ DLL in which I control SPI, GPIO and PWM, but the results are not very good.

My application is a Laser Show so I need to send to a DUAL DAC the X Position, the Y Position and the Laser Status (ON/OFF).

For this reason, for each pair of X, Y points I need to enable the PWM and release the DAC buffer so that the X and Y positions of the laser beam are updated simultaneously.

Therefore I cannot send all the points to the DAC in one transfer; I need to send one point at a time.

My C++ function takes 4 parameters (XPos, YPOS, LaserStatus and arraylenght) and runs the whole animation, so I have fully implemented point 2, since I make only one call from managed to unmanaged code.
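
The per-point sequence described here can be sketched in C roughly as follows. This is not the poster's library: the `laser_hw` hooks are hypothetical stand-ins for the SPI, GPIO and PWM calls, the latch step assumes the DAC has a latch input that updates both outputs together, and the control nibbles assume unbuffered, 1x gain, active channels. The `trace_*` functions only record calls so the loop can be exercised off-target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks; on the target they would wrap SPI, GPIO and PWM. */
typedef struct {
    void (*spi_write16)(uint16_t word, void *ctx);
    void (*latch_outputs)(void *ctx);      /* e.g. pulse the DAC latch pin */
    void (*set_laser)(int on, void *ctx);
    void *ctx;
} laser_hw;

#define DAC_A_ACTIVE 0x3000u  /* channel A, unbuffered, 1x gain, active */
#define DAC_B_ACTIVE 0xB000u  /* channel B, unbuffered, 1x gain, active */

/* Play every point in order: load X and Y, latch both, then switch the laser. */
static void laser_play(const laser_hw *hw, const uint16_t *x, const uint16_t *y,
                       const uint8_t *laser, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        hw->spi_write16((uint16_t)(DAC_A_ACTIVE | (x[i] & 0x0FFFu)), hw->ctx);
        hw->spi_write16((uint16_t)(DAC_B_ACTIVE | (y[i] & 0x0FFFu)), hw->ctx);
        hw->latch_outputs(hw->ctx);
        hw->set_laser(laser[i] != 0, hw->ctx);
    }
}

/* Off-target stand-in that records calls instead of touching hardware. */
typedef struct {
    uint16_t words[16];
    size_t n_words;
    size_t n_latches;
    int laser_on;
} trace;

static void trace_write(uint16_t word, void *ctx)
{
    trace *t = ctx;
    if (t->n_words < 16)
        t->words[t->n_words++] = word;
}

static void trace_latch(void *ctx) { ((trace *)ctx)->n_latches++; }
static void trace_laser(int on, void *ctx) { ((trace *)ctx)->laser_on = on; }
```

Keeping the hardware behind function pointers makes it possible to time the loop itself separately from the cost of each SPI, GPIO or PWM call.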

But surprisingly, the performance is still not very good: previously an animation made of 6400 items took 0.59 seconds, and now the same animation takes 0.54 seconds. This is not good enough for me, since I need the animation to take at most 0.2 seconds.

If needed, I can post the library code.

Hope you can give me any suggestion to solve this problem.

Thanks

andy.tx · May 21, 2019, 3:18pm

Dear @emmettbrown

Your goal is to update a set of data points about every 31µs (0.2s / 6400), but currently you achieve one update every 84.4µs (0.54s / 6400).

The actual data transfer takes roughly 2-3µs (3 x 16 bit / 20MHz), so this is not at all the performance bottleneck.
Your test has shown that .NET didn’t add too much overhead (which is a positive surprise for me).

The conclusion is that most of the time is spent setting up the SPI transfers. This time needs to be reduced from about 80µs (for 3 transfers) to 25µs (for 3 transfers). I believe this can be done, but it requires significant restructuring of the SPI library to optimize it for your use case.
I recommend purchasing the source code of the SPI library, or accessing the SPI registers directly, without using the SPI library.
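
The arithmetic behind these figures can be checked with a small C sketch. The helper names are illustrative only; the inputs are the numbers quoted in this thread.

```c
#include <assert.h>

/* Time on the wire for n_bits at clock_hz, in microseconds. */
static double wire_time_us(unsigned n_bits, double clock_hz)
{
    assert(clock_hz > 0.0);
    return 1e6 * n_bits / clock_hz;
}

/* Average time per point when count points take total_s seconds. */
static double per_point_us(double total_s, unsigned count)
{
    assert(count > 0);
    return 1e6 * total_s / count;
}
```

With the thread's numbers, `wire_time_us(3 * 16, 20e6)` is 2.4µs, `per_point_us(0.54, 6400)` is about 84.4µs and `per_point_us(0.2, 6400)` is 31.25µs, so roughly 82µs per point is setup overhead and the target leaves about 29µs for it.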

Regards, Andy

emmettbrown · May 21, 2019, 4:05pm

Just one more question, @andy.tx.

If I switch from the i.MX6 to the T20, should I replace the Toradex libraries with the ones for the Tegra modules?
Thanks

andy.tx · May 22, 2019, 6:40am

Dear @emmettbrown
I opened a new Post (“Windows CE Libraries Colibri iMX6 vs Colibri T20”) for this independent question.
Regards, Andy

emmettbrown · May 21, 2019, 4:02pm

Thanks @andy.tx, modifying your library could be the best way, but it seems that there is “No SpiLib source code for iMX6 and iMX7”, as stated here:

Toradex CE Libraries and Code Samples

Also, I could not find any information on the internet about accessing the SPI registers directly.

Could you point me in the right direction please?

andy.tx · May 22, 2019, 7:03am

Dear @emmettbrown

I’m sorry, you are correct, I didn’t verify the code availability. The SPI implementation is based on an underlying driver for which we are not allowed to publish the full source code.

The SPI controller is described in the NXP i.MX6 reference manual.
You can use the MapMemory library to get direct access to the SPI controller registers.

If you are looking for examples on how to use the SPI controller, there are two sources that come to mind:

- The Linux implementation for the i.MX6
- The FreeRTOS implementation for the i.MX7: This is for the additional Arm Cortex-M4 core which is inside the i.MX7 chip. I didn’t verify, but I assume the i.MX6 and i.MX7 have quite similar (if not identical) SPI controllers, therefore this i.MX7 code could also be useful.
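
As a very rough idea of what direct register access might look like, here is a C sketch of a polled 16-bit write. Everything hardware-specific in it is an assumption to be verified against the NXP reference manual: the register layout, the XCH and TC bit positions, and that the controller has already been enabled and configured for master mode with 16-bit bursts. The `ecspi_regs` pointer would come from mapping the controller's physical address (for example through the MapMemory library); no mapping call is shown because its exact API is not covered here. Chip select via GPIO is also left out.

```c
#include <stdint.h>

/* Assumed register layout of the SPI controller (offsets 0x00..0x20). */
typedef struct {
    volatile uint32_t rxdata;     /* 0x00 */
    volatile uint32_t txdata;     /* 0x04 */
    volatile uint32_t conreg;     /* 0x08 */
    volatile uint32_t configreg;  /* 0x0C */
    volatile uint32_t intreg;     /* 0x10 */
    volatile uint32_t dmareg;     /* 0x14 */
    volatile uint32_t statreg;    /* 0x18 */
    volatile uint32_t periodreg;  /* 0x1C */
    volatile uint32_t testreg;    /* 0x20 */
} ecspi_regs;

#define ECSPI_CON_XCH (1u << 2)  /* assumed: starts the exchange */
#define ECSPI_STAT_TC (1u << 7)  /* assumed: transfer complete, write 1 to clear */

/* Send one 16-bit word and poll for completion.
 * Returns 0 on success, -1 if completion is not seen within max_polls. */
static int ecspi_write16(ecspi_regs *regs, uint16_t word, unsigned max_polls)
{
    regs->txdata = word;                /* queue the word */
    regs->conreg |= ECSPI_CON_XCH;      /* start the transfer */
    while (!(regs->statreg & ECSPI_STAT_TC)) {
        if (max_polls-- == 0)
            return -1;                  /* give up instead of hanging */
    }
    regs->statreg = ECSPI_STAT_TC;      /* acknowledge completion */
    (void)regs->rxdata;                 /* discard the word clocked in */
    return 0;
}
```

The bounded poll avoids blocking forever if the controller is not configured as assumed.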

Regards, Andy

emmettbrown · May 22, 2019, 3:21pm

Thanks, but I’m on Windows Embedded Compact, so it sounds like the SPI bottleneck will not be solved. Very strange: the clock is 20MHz, but the real speed is far too slow.

andy.tx · May 22, 2019, 3:55pm

Dear @emmettbrown
Again, the reason is clear: the SPI library is not optimized for performance of small transfers.
Regards, Andy

emmettbrown · May 27, 2019, 8:21am

Hello @andy.tx ,

I’m trying to run my application on a Toradex T20 512MB, but the DLL I’ve developed doesn’t work on the T20 module.

I receive the error "Can’t find PInvoke DLL ‘LaserLibrary.dll’".

The 3.9 Framework is installed, and I have also tried reinstalling it. The Windows image is the latest beta.

Could you please tell me what the steps are for using my i.MX6 application on the T20 module?

How can I use my DLL?

Can I use the same Toradex libraries that I’m using now (TdxAllLibrariesDll.dll) or should I use the Tegra libraries?

I hope you will help me.

Kind regards

andy.tx · May 27, 2019, 1:27pm

Dear @emmettbrown
Please move this into a new question, as it is unrelated to the original topic of the question.
Regards, Andy