Accessing External Memory Bus (EIM) on iMX6 via memory map

I am trying to communicate with an FPGA connected to the External Memory Bus (EIM) on an iMX6.

On the T30 I used the Toradex beta DMA driver, which presents the external memory bus as a character device. I could read from a given address using ssize_t pread(int fd, void *buf, size_t count, off_t offset); with fd being the file descriptor of the character device node and offset the relative address within the memory region exposed by the FPGA. E.g., to read 16 bits from the FPGA’s address 0x1FC:

pread( gmi, buf, 2, 0x1FC );

Now I am trying to read from the same FPGA on the iMX6. I read that this is supported using mmap (see External memory bus (EIM) on iMX6). I found some sample code in Connect a ARM Microcontroller to a FPGA using its Extended Memory Interface (EMI).

Let’s say I want to read from FPGA’s address 0x01F8 and 0x01FC. The FPGA is on CS1.

This is my code:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>


#define FATAL do { fprintf(stderr, "Error at line %d, file %s (%d) [%s]\n", \
__LINE__, __FILE__, errno, strerror(errno)); exit(1); } while(0)

#define MAP_SIZE 0x00001000
#define MAP_MASK 0x00000FFC

int main(int argc, char **argv)
{
        int fd;
        void *map_base, *virt_addr;
        unsigned long read_result, writeval;
        //off_t cs = 0x08000000; //CS0
        off_t cs = 0x0a000000; //CS1
        //off_t cs = 0x0c000000; //CS2
        off_t read1 = 0x01F8;
        off_t read2 = 0x01FC;

        if ((fd = open("/dev/mem", O_RDWR | O_SYNC)) == -1) FATAL;
        printf("/dev/mem opened.\n");
        fflush(stdout);

        map_base = mmap(0, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                        cs & ~MAP_MASK);
        if (map_base == MAP_FAILED) FATAL;

        printf("Memory (%#lx) mapped at address %p.\n", (unsigned long) cs, map_base);
        fflush(stdout);

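        virt_addr = (char *) map_base + (read1 & MAP_MASK);

        /* volatile: force a real bus access instead of a cached or elided read */
        read_result = *((volatile unsigned long *) virt_addr);

        printf("Value at address 0x%lX (%p): 0x%lX\n", (unsigned long) read1, virt_addr, read_result);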
        fflush(stdout);
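        virt_addr = (char *) map_base + (read2 & MAP_MASK);

        /* volatile: force a real bus access instead of a cached or elided read */
        read_result = *((volatile unsigned long *) virt_addr);

        printf("Value at address 0x%lX (%p): 0x%lX\n", (unsigned long) read2, virt_addr, read_result);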
        fflush(stdout);

        if (munmap(map_base, MAP_SIZE) == -1) FATAL;
        close(fd);
        return 0;
}

And this is the output:

/dev/mem opened.
Memory (0xa000000) mapped at address 0x76fd0000.
Value at address 0x1F8 (0x76fd01f8): 0x1CFFF
Value at address 0x1FC (0x76fd01fc): 0x1CFFF

However, I expected to get two different values. What do I miss? Do you have any working sample code?

Hi

Unfortunately, we do not have any sample code for mmap and/or WEIM.
As the community post you reference states, you can do the same thing from the
command line with the devmem2 tool.

E.g., writing to a connected SRAM:

root@colibri-imx6:~# devmem2 0x0a0001f8 w 0x01f855aa
/dev/mem opened.
Memory mapped at address 0x76f50000.
Read at address  0x0A0001F8 (0x76f501f8): 0x00000001
Write at address 0x0A0001F8 (0x76f501f8): 0x01F855AA, readback 0x01F855AA
root@colibri-imx6:~# devmem2 0x0a0001fc w 0x01fcaa55
/dev/mem opened.
Memory mapped at address 0x76fbc000.
Read at address  0x0A0001FC (0x76fbc1fc): 0x01F855AA
Write at address 0x0A0001FC (0x76fbc1fc): 0x01FCAA55, readback 0x01FCAA55

Reading this back:

root@colibri-imx6:~# devmem2 0x0a0001f8
/dev/mem opened.
Memory mapped at address 0x76f9f000.
Read at address  0x0A0001F8 (0x76f9f1f8): 0x01F855AA
root@colibri-imx6:~# devmem2 0x0a0001fc
/dev/mem opened.
Memory mapped at address 0x76fc2000.
Read at address  0x0A0001FC (0x76fc21fc): 0x01FCAA55

Using your ‘C’ program:

root@colibri-imx6:~# ./weim_mmap
/dev/mem opened.
Memory (0xa000000) mapped at address 0x76fec000.
Value at address 0x1F8 (0x76fec1f8): 0x1F855AA
Value at address 0x1FC (0x76fec1fc): 0x1FCAA55

One thing that can go wrong is the mapping of addresses to the pins.
Either the byte address appears at the pins, in which case the A0 pin is always 0 on a 16-bit wide bus, or the word address appears at the pins, i.e., the byte address is shifted right by 1.
Compare with ‘22.5.2 Bus Sizing Configuration’ in the i.MX6SDLRM reference manual.

The current configuration in the device tree shifts the addresses, so when the address pins are set to 0 you get byte0 and byte1, and when they are set to 1 you get byte2 and byte3 of your memory-mapped device.

Note the fsl,weim-cs-timing property in the device-tree and its description in the device-tree bindings.

Max

I suggest reading both values at once - for example, using memcpy.

This is some test code I once wrote which you may try; it should be hardened with error handling, though:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>

#define RESET   "\033[0m"
#define BLACK   "\033[30m"      /* Black */
#define RED     "\033[31m"      /* Red */
#define GREEN   "\033[32m"      /* Green */
#define YELLOW  "\033[33m"      /* Yellow */
#define BLUE    "\033[34m"      /* Blue */
#define MAGENTA "\033[35m"      /* Magenta */
#define CYAN    "\033[36m"      /* Cyan */
#define WHITE   "\033[37m"      /* White */
#define BOLDBLACK   "\033[1m\033[30m"      /* Bold Black */
#define BOLDRED     "\033[1m\033[31m"      /* Bold Red */
#define BOLDGREEN   "\033[1m\033[32m"      /* Bold Green */
#define BOLDYELLOW  "\033[1m\033[33m"      /* Bold Yellow */
#define BOLDBLUE    "\033[1m\033[34m"      /* Bold Blue */
#define BOLDMAGENTA "\033[1m\033[35m"      /* Bold Magenta */
#define BOLDCYAN    "\033[1m\033[36m"      /* Bold Cyan */
#define BOLDWHITE   "\033[1m\033[37m"      /* Bold White */

#define MEM_ADDR 0x08000000

#define MAP_SIZE 0x00010000
#define MAP_MASK 0x00000FFF

#define BUFFER_SIZE MAP_SIZE

#define INTERACTIVE_MODE 0
#define VERIFY_MODE 1
#define TEST_CYCLES 4096

unsigned char readbuffer[BUFFER_SIZE] = {0};
unsigned char writebuffer[2][BUFFER_SIZE] = {{0},{0}};
unsigned char zerobuffer[BUFFER_SIZE] = {0};

int clearmem(unsigned char *dataPtr)
{
	// Memory Clear
	#if INTERACTIVE_MODE
	printf("Press ENTER to clear memory\n");
	getchar();
	#endif

	// Write zeroes to memory
	memcpy(dataPtr, zerobuffer, BUFFER_SIZE*sizeof(unsigned char));
	
	return 0;
}

long test(unsigned char *dataPtr,int buf)
{
	clock_t start, end;
	// Write Test
	#if INTERACTIVE_MODE
	printf("Press ENTER to begin the write test\n");
	getchar();
	#endif

	// Write to memory
	start = clock();
	memcpy(dataPtr, writebuffer[buf], BUFFER_SIZE*sizeof(unsigned char));
	
	#if INTERACTIVE_MODE
	end = clock();

	printf("Written to buffer:\n" BOLDCYAN "%.*s" RESET "\n", BUFFER_SIZE, writebuffer[buf] );

	// Read Test
	printf("Press ENTER to begin the read test\n");
	getchar();

	start = clock()-(end-start);
	#endif
	
	// Read from memory
	memcpy(readbuffer, dataPtr, BUFFER_SIZE*sizeof(unsigned char));
	end = clock();
	
	#if INTERACTIVE_MODE
	printf("Read from buffer:\n" BOLDCYAN "%.*s" RESET "\n", BUFFER_SIZE, readbuffer );
	#endif

	#if VERIFY_MODE
	if(memcmp(readbuffer,writebuffer[buf],BUFFER_SIZE))
		printf("verification failed\n");		
	#endif

	return end-start;
}

int main(int argc, char *argv[])
{
	int i = 0;
	int ret = 0;
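	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd == -1) { perror("open /dev/mem"); return 1; }
	unsigned char *dataPtr = mmap(0, MAP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, MEM_ADDR & ~MAP_MASK);
	if (dataPtr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }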
	double time_elapsed = 0.0;
	
	printf("Parallel Bus read/write tests\n");
	
	// Populate write buffers
	for(i=0; i < BUFFER_SIZE-1; i++)
	{
		writebuffer[0][i] = (0x21+(i%94));
		writebuffer[1][i] = (0x21+((i+47)%94));
	}
	writebuffer[0][BUFFER_SIZE-1] = '\0';
	writebuffer[1][BUFFER_SIZE-1] = '\0';
	
	for(i=0; i < TEST_CYCLES; i++)
		time_elapsed += test(dataPtr,i%2);

	printf("Combined R/W Rate = %2.3lf MBps\n",((double)BUFFER_SIZE*sizeof(unsigned char)*2*TEST_CYCLES)/time_elapsed);

	#if INTERACTIVE_MODE
	printf("Press ENTER to Exit\n");
	getchar();
	#endif
	
	munmap(dataPtr, MAP_SIZE);
	close(fd);
	return ret;
}

I cross-checked my oscilloscope setup on the T30 using the GMI. There I can observe the active-low CS signal going down and pin toggling on the address pins if I read from the character device.

But this is not the case with EIM on iMX6.

I would expect CS1 (alias EXT_CS1 or B26 of Toradex Demo Board X3 interface) to go low and some activity on the address pins if I issue the following command:

root@colibri-imx6:~# devmem2 0x0a000001

But I see no activity on CS1 at all. Instead, it seems to be low all the time even though it is active low, and, even worse, I see CS0 and CS2 going high for ~3ns.

If I issue this command, I see CS1 going high for ~3ns:

root@colibri-imx6:~# devmem2 0x08000001

This seems to be totally wrong as 0x0800 0000 to 0x09FF FFFF should pull CS0 low from high instead.

What am I missing? Do I have to load some kernel module? Or isn’t the device tree configured to talk to a device with NOR-Flash-like or PSRAM-like interface?

Hi

Now would probably be a good point to reveal what version of things you are using. HW & SW.

I found a bug in the device tree. With the port to the 4.1 kernel the weim node needs some additions to get the memory map to chip select assignments right. I pushed a patch which corrects that. With the current device tree the full weim address range is mapped to CS0.

However, with or without the patch, I see that CS0, CS1 and CS2 are at a high level.

With the old device tree, an access to any memory location of the weim node generates a low pulse on CS0. (With the patch applied, the pulse appears on either CS0 or CS1, depending on the address. CS2 and CS3 seem to have an issue and accessing them raises a bus exception.)

I do not see a pulse on a not addressed CS.

The weim driver is compiled into the kernel, no kernel module is needed.

The device tree configures weim for an SRAM-like interface with chip select, read/write, output enable, data and address bus.

Max

Now would probably be a good point to reveal what version of things you are using. HW & SW.

Colibri iMX6DL 512MB V1.0A
toradex_4.1-2.0.x-imx

I found a bug in the device tree. With the port to the 4.1 kernel the weim node needs some additions to get the memory map to chip select assignments right. I pushed a patch which corrects that.

I switched to toradex_4.1-2.0.x-imx-next and now it works after adjusting the fsl,weim-cs-timing properties to our needs.

I do not see a pulse on a not addressed CS.

Nor do I after switching to toradex_4.1-2.0.x-imx-next. Never mind, maybe I misconfigured the DT. I did not cross-check with toradex_4.1-2.0.x-imx after I got everything working in toradex_4.1-2.0.x-imx-next.

CS2 and CS3 seem to have an issue and accessing them raises a bus exception.

The bus exception disappears after adding this code to arch/arm/boot/dts/imx6dl-colibri-eval-v3.dts:

diff --git a/arch/arm/boot/dts/imx6dl-colibri-eval-v3.dts b/arch/arm/boot/dts/imx6dl-colibri-eval-v3.dts
index 980bcdab8ddc..91fbf9b7c53f 100644
--- a/arch/arm/boot/dts/imx6dl-colibri-eval-v3.dts
+++ b/arch/arm/boot/dts/imx6dl-colibri-eval-v3.dts
@@ -288,6 +288,26 @@
                fsl,weim-cs-timing = <0x00810001 0x00000000 0x11004400
                                0x00000000 0x04000040 0x00000000>;
        };
+        /* SRAM on CS2 */
+       sram@2,0 {
+               compatible = "cypress,cy7c1019dv33-10zsxi, mtd-ram";
+               reg = <2 0 0x00010000>;
+               #address-cells = <1>;
+               #size-cells = <1>;
+               bank-width = <2>;
+               fsl,weim-cs-timing = <0x00010081 0x00000000 0x04000000
+                               0x00000000 0x04000040 0x00000000>;
+       };
+        /* SRAM on CS3 */
+       sram@3,0 {
+               compatible = "cypress,cy7c1019dv33-10zsxi, mtd-ram";
+               reg = <3 0 0x00010000>;
+               #address-cells = <1>;
+               #size-cells = <1>;
+               bank-width = <2>;
+               fsl,weim-cs-timing = <0x00010081 0x00000000 0x04000000
+                               0x00000000 0x04000040 0x00000000>;
+       };
 };

Thanks for providing this info.

Silly me for not seeing the CS2/CS3 issue.

Max

Thank you for pointing me at memcpy and the sample code. Now that the basics are working, this helps a lot.

I found a bug in the device tree. With the port to the 4.1 kernel the weim node needs some additions to get the memory map to chip select assignments right. I pushed a patch which corrects that. With the current device tree the full weim address range is mapped to CS0.

max.tx, can you please specify the exact commit containing the patch? I think the head of toradex_4.1-2.0.x-imx-next moved.

Is it possible that the patch isn’t in toradex_4.1-2.0.x-imx-next anymore?

Here you go:

http://git.toradex.com/cgit/linux-toradex.git/commit/?id=4e35498518a41c46fe42b686202475f04a6690ef&h=toradex_4.1-2.0.x-imx

With the release of 2.4b4 the commit got acked and merged into the regular toradex_4.1-2.0.x-imx branch.

Max