PCIe DMA driver for FPGA (Xilinx)


Do any of you have experience getting moderately fast data transfer (e.g. 50 MByte/s) going from a Xilinx Artix-7 FPGA to an ARM Cortex CPU, in this case the one on the TK1 board?

I have looked at the Xilinx XDMA driver, but they explicitly state that it is only guaranteed to work on x86 systems. One explanation I found is that ARM systems may not be cache coherent: if the FPGA, acting as DMA master, shoves data into RAM, the CPU might still use old, cached data.
But I’d be “happy” if I was at that stage in the hierarchy of problems :wink:
I don’t know whether other parts of the driver are x86-only in ways that aren’t obvious (i.e. without visibly using x86-only kernel functions or the like) and that might explain unexpected behavior…

So if anyone knows an alternative to the XDMA driver, feel free to mention it. (Although that would also require a matching IP core if not xdma-compatible, right?)
Or if anyone actually has experience with getting this to work on an ARM target…
As the Xilinx forum hasn’t been much help with any details, I’m also looking in other places, unlikely as it may be that somebody else has used this kind of hardware combination, too :slight_smile:

Btw., I’m aware of Xillybus, especially the price tag for its IP core… :wink:

My experience so far:

I got the XDMA driver compiled against Linux kernel 3.10.105 (Linux for Tegra for TK1).

The user interrupts via MSI-X IRQs do get allocated by the driver, but poll()'ing them via the provided character devices does not work. So for now I poll the FPGA’s registers to learn the buffer write position and when I can fetch a block of data… which drains the CPU horribly.
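
For narrowing down the poll() problem, a small user-space test with a timeout can at least tell "the interrupt never arrives" apart from "poll() blocks forever". Below is a minimal Python sketch, not taken from the XDMA sources. The helper name `wait_for_event` and the device path in the usage comment are assumptions; check which event device nodes your driver build actually creates.

```python
import os
import select


def wait_for_event(fd, timeout_ms):
    """Wait until fd is readable or timeout_ms elapses.

    Returns True if the descriptor reported POLLIN, False on timeout.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)  # list of (fd, revents) tuples
    return any(revents & select.POLLIN for _, revents in events)


# Hypothetical usage (the device name is an assumption):
#   fd = os.open("/dev/xdma0_events_0", os.O_RDONLY)
#   if not wait_for_event(fd, 1000):
#       print("no user interrupt within 1 s")
#   os.close(fd)
```

If this times out while the FPGA is demonstrably raising the user interrupt, the problem is more likely in interrupt delivery or in the driver's poll implementation than in the application code.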

Reading/writing registers exposed by the FPGA configuration via /dev/xdma0_user (the device file without DMA) works. Using the DMA (via the AXI memory-mapped interface), however, silently freezes the CPU (no interesting kernel messages), although reading a single block of data from the FPGA’s designated DMA-able memory as a test works, so I guess there is no fundamental error in how I use it.
Whether the CPU freezes depends on a) the driver (legacy vs. current version) and b) changes to the FPGA configuration, even in parts which should have nothing to do with the DMA memory or the data arbiter register file…
Since user interrupts do not work, the polling scheme is easily disturbed by other things going on on the CPU, which makes this harder to debug (with lots of time-consuming kernel prints and such).
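
For anyone wanting to reproduce the register access via /dev/xdma0_user described above, here is a minimal Python sketch. It assumes the device node accepts 4-byte positional reads and writes at register offsets and that the registers are little-endian; whether that holds depends on the driver version. The helper names and the offset in the usage comment are hypothetical.

```python
import os
import struct


def read_reg32(fd, offset):
    """Read one 32-bit little-endian register at a 4-byte-aligned offset."""
    if offset % 4:
        raise ValueError("register offset must be 4-byte aligned")
    data = os.pread(fd, 4, offset)
    if len(data) != 4:
        raise OSError("short read at offset 0x%x" % offset)
    return struct.unpack("<I", data)[0]


def write_reg32(fd, offset, value):
    """Write one 32-bit little-endian register at a 4-byte-aligned offset."""
    if offset % 4:
        raise ValueError("register offset must be 4-byte aligned")
    os.pwrite(fd, struct.pack("<I", value & 0xFFFFFFFF), offset)


# Hypothetical usage (offset 0x10 is made up for illustration):
#   fd = os.open("/dev/xdma0_user", os.O_RDWR)
#   write_pos = read_reg32(fd, 0x10)
#   os.close(fd)
```

When polling a write-position register this way, putting a short sleep between reads keeps the CPU load down, at the cost of extra latency.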

Sorry, but we don’t really have any experience in integrating FPGAs via PCIe. However, you may find other customers on this forum having tried this. One thing you may try is using our mainline BSP and/or even the latest mainline or even -next Linux kernel.

Hi @sktpin,

We are looking for an XDMA driver for an ARM CPU, but the Xilinx one seems to be somewhat “buggy”.
The IRQ mode is not an option for us.
We are posting messages on the Xilinx forum: https://forums.xilinx.com/t5/PCI-Express/XDMA-driver-issue/td-p/902373

If Xilinx cannot help us, we will modify the drivers (if we are good enough) or maybe consider Xillybus.


Look in the Xilinx forum for a user called “dwd_pete” and his thread title containing something like “C2H driver broken”.

They do not support ARM targets, and even on x86 their driver is buggy, at least in streaming mode.

That thread has a sub-discussion between two users on how to make use of a third-party modification of the XDMA driver, called “v2_xdma”, by W. Zabolotny.

I have gotten that to work recently (it only supports C2H, not H2C), but only on an x86_64 platform, not on ARM yet. Whether that’s an ARM vs. x86 specific problem, or a problem of the older Linux kernel that the NVIDIA TK1 boards appear to be stuck on (kernel 3.10) vs. the newer kernel I used on the x86, I do not know yet.

We also provide a 4.14-based BSP for TK1. It’s based on open drivers only, so some features are not supported (CUDA, video encode/decode, LVDS). It may be good enough for you to see whether the old kernel is the issue.

Ah, thanks for that hint. I did make it run on x86 with a 4.4.0 kernel, though, so that was not it.

I would suggest contacting our partner greatcom, since they have good experience with Xilinx FPGAs.