It really depends on what exactly you want to do. If you need to be “in control” between transfers (for example, the CPU has to calculate something), then it is probably hard to get the time between transfers much lower with Linux.
However, if your use case allows you to send lots of messages in one batch without interactions in between, then you should be able to pool the transfers and send them down the stack in one batch.
The header file has some information on how to pack multiple transfers in a single message:
I would expect this to be much faster already by default. There might also be some optimization potential in the driver.
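To illustrate what batching could look like, here is a minimal sketch. It assumes the bus is SPI and that it is accessed from userspace through spidev, which may not match your setup; the names `BATCH`, `fill_batch` and `send_batch` are made up for this example. It describes several transfers in one array and hands the whole array to the kernel in a single `SPI_IOC_MESSAGE` ioctl instead of issuing one ioctl per transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define BATCH 4 /* hypothetical number of transfers per message */

/* Describe BATCH equally sized transfers that read from consecutive
 * slices of tx and write into consecutive slices of rx. */
void fill_batch(struct spi_ioc_transfer xfers[BATCH],
                const uint8_t *tx, uint8_t *rx,
                size_t len_each, uint32_t speed_hz)
{
    assert(len_each > 0);
    memset(xfers, 0, BATCH * sizeof(xfers[0]));
    for (size_t i = 0; i < BATCH; i++) {
        xfers[i].tx_buf = (uintptr_t)(tx + i * len_each);
        xfers[i].rx_buf = (uintptr_t)(rx + i * len_each);
        xfers[i].len = (uint32_t)len_each;
        xfers[i].speed_hz = speed_hz;
        xfers[i].bits_per_word = 8;
        /* Deselect the chip between transfers, but not after the last one. */
        xfers[i].cs_change = (i + 1 < BATCH) ? 1 : 0;
    }
}

/* Submit all BATCH transfers as one message with a single system call.
 * fd is an already opened spidev node. Returns the ioctl result, which
 * is negative on error. */
int send_batch(int fd, struct spi_ioc_transfer xfers[BATCH])
{
    return ioctl(fd, SPI_IOC_MESSAGE(BATCH), xfers);
}
```

With this approach the userspace round trip happens once per batch rather than once per transfer, so the remaining gap between transfers is mostly up to the controller driver. Whether `cs_change` should be set depends on how your device expects chip select to behave.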
If your application allows batching transfers, then I am pretty sure you should be able to achieve a lower time between transfers.
However, if your application needs to do calculations between every transfer (closed loop style), then it will be much harder with Linux and the M4 approach may be worthwhile.