How to select GPU affinity/configuration on i.MX 8QuadMax

Hello,

The i.MX 8QuadMax advertises a GPU that is actually made of two GPUs, which can be used either as two independent GPUs or as a single combined one. Does anybody know how to configure this in Linux? I found a document from NXP (https://www.nxp.com/docs/en/user-guide/i.MX_Graphics_User’s_Guide_Linux.pdf) stating that the environment variable VIV_MGPU_AFFINITY should be used for that purpose. But setting that environment variable in a shell before executing my program does not seem to change anything (I still get the same performance).
Actually, I just want to be sure that I’m using the combined GPU and not just one of the two. By the way, I’m running a rebuilt Yocto BSP 3.0b4.
Thanks.

Hi @Edouard

Thanks for writing to Toradex Support.

Could you provide the exact version of the hardware and software of your module?
What is your application?

Best regards,
Jaski

Hello @jaski.tx

As for the hardware:
SOM: Apalis iMX8QM 4GB WB V1.0B
dev board: Apalis Evaluation Board V1.1C

As for the software:
I rebuilt the Yocto Linux distribution from the latest revision of the “LinuxImage3.0” branch, starting from the “console-tdx-image” image and adding Weston, Vulkan and OpenCV libraries to it.

My application benchmarks the performance of a compute shader that performs an image-processing operation. I want to make sure the benchmark results are obtained with both GPUs in combined mode.

Hi

The Graphics User’s Guide you referenced is the only source we have on the subject.
The way I read it, not setting the environment variable should have the same effect as setting it to 0, i.e. using both GPUs in combined mode.

So I would expect performance to decrease if you set the variable to 1:0.

You did export the variable, didn't you?

export VIV_MGPU_AFFINITY="0"
gpu-benchmark
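The difference between a plain assignment and an exported variable can be seen with a child shell standing in for the benchmark. This is a minimal POSIX sh sketch; `sh -c` is only a stand-in for `gpu-benchmark`, and the initial `unset` makes sure the variable starts out absent:

```sh
#!/bin/sh
# Start from a clean state: unset also drops any earlier export attribute.
unset VIV_MGPU_AFFINITY

# A plain assignment only creates a shell variable; child processes
# (such as the benchmark) do not inherit it.
VIV_MGPU_AFFINITY="0"
sh -c 'echo "without export: ${VIV_MGPU_AFFINITY:-<unset>}"'

# After export, every command started from this shell gets it in its
# environment.
export VIV_MGPU_AFFINITY
sh -c 'echo "with export:    ${VIV_MGPU_AFFINITY:-<unset>}"'
```

The first child prints `<unset>` and the second prints `0`. The one-shot form `VIV_MGPU_AFFINITY=0 gpu-benchmark` also works: it puts the variable into the environment of that single command only.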

Max

Hi @max.tx ,

I have to admit I don’t think I was exporting it. But I tried what you suggested, and I get the same performance with both
export VIV_MGPU_AFFINITY="0"
and
export VIV_MGPU_AFFINITY="1:0"
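When two settings give identical results, it can help to confirm that the value actually reached the running program. A minimal sketch, assuming a Linux `/proc` filesystem and, for the example, exactly one running `gpu-benchmark` process; `show_affinity` is a hypothetical helper name. It would also reveal a copy-paste problem: typographic quotes such as “0” are not shell quotes, so they would end up inside the value.

```sh
#!/bin/sh
# Print the VIV_MGPU_AFFINITY entry from the environment a process was
# started with. /proc/PID/environ stores NAME=value entries separated by
# NUL bytes; it is readable for your own processes (or as root).
show_affinity() {
    if ! tr '\0' '\n' < "/proc/$1/environ" | grep '^VIV_MGPU_AFFINITY='; then
        echo "VIV_MGPU_AFFINITY is not set for PID $1"
    fi
}

# Example, from a second shell while the benchmark is running:
#   show_affinity "$(pidof gpu-benchmark)"
```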

Hi @Edouard

Sorry for the delay. It took us a while to come up with something useful. It seems this combined GPU feature introduces more problems than it solves most of the time. I did some tests with a simple benchmark, and two GPUs are basically exactly as fast as one GPU. The same conclusion came out of a discussion with Qt: the combined mode seems to bring no benefit at all. According to the GPU guide from NXP, it depends on the use case; however, I’m not sure there is a use case where you actually benefit. Maybe a very large GPU workload could benefit from the combined mode.

Even worse, using both GPUs increases the CPU load, because the two GPUs have to be synchronized and that synchronization is done by the CPU.
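One way to see this overhead from the outside is to compare the wall-clock time of a run with the CPU time it consumes. A minimal sketch, assuming bash is installed on the image; `measure` is a hypothetical helper, and this is not necessarily how the numbers below were produced (those came from a modified kmscube):

```sh
#!/usr/bin/env bash
# Report the wall-clock time and the user/system CPU time of one run.
# bash's "time" keyword formats its report according to TIMEFORMAT
# and writes it to stderr.
measure() {
    local TIMEFORMAT='time: %3Rs  CPU user: %3Us  CPU sys: %3Ss'
    time "$@"
}

# Example ("env" sets the variable for the benchmark command only):
#   measure env VIV_MGPU_AFFINITY=0   gpu-benchmark
#   measure env VIV_MGPU_AFFINITY=1:0 gpu-benchmark
```

If the combined mode only adds driver work, the wall-clock times should stay close while the CPU figures grow.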

Here are my test results in combined mode:

time: 7.489401s
CPU usage: 0.172949s
time: 7.489414s
CPU usage: 0.167639s
time: 7.489903s
CPU usage: 0.162080s
time: 7.496867s
CPU usage: 0.162045s
time: 7.490236s
CPU usage: 0.162631s
time: 7.489923s
CPU usage: 0.162110s
time: 7.489679s
CPU usage: 0.165047s
time: 7.489054s
CPU usage: 0.163086s
time: 7.490137s
CPU usage: 0.163588s
time: 7.489239s
CPU usage: 0.163222s
time: 7.489798s
CPU usage: 0.161937s

And with only one GPU:

time: 7.495220s
CPU usage: 0.155143s
time: 7.487847s
CPU usage: 0.143683s
time: 7.487970s
CPU usage: 0.140843s
time: 7.488016s
CPU usage: 0.143670s
time: 7.487516s
CPU usage: 0.142648s
time: 7.487977s
CPU usage: 0.149872s
time: 7.488115s
CPU usage: 0.144729s
time: 7.487528s
CPU usage: 0.142921s
time: 7.487721s
CPU usage: 0.142992s
time: 7.487794s
CPU usage: 0.150275s
time: 7.488179s
CPU usage: 0.144302s
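To compare the two result sets at a glance, the per-run lines can be averaged. A minimal POSIX sh/awk sketch, assuming the benchmark output was saved in the `time: <seconds>s` / `CPU usage: <seconds>s` format shown above; `summarize` and the log file names are hypothetical:

```sh
#!/bin/sh
# Average the "time:" and "CPU usage:" lines of a benchmark log.
# Reads the files given as arguments, or stdin if there are none.
summarize() {
    awk '
        /^time:/      { sub(/s$/, "", $2); t += $2; nt++ }
        /^CPU usage:/ { sub(/s$/, "", $3); c += $3; nc++ }
        END {
            if (nt) printf "runs: %d  mean time: %.6fs\n", nt, t / nt
            if (nc) printf "runs: %d  mean CPU usage: %.6fs\n", nc, c / nc
        }
    ' "$@"
}

# Example:
#   summarize combined.log
#   summarize single.log
```

Applied to the two result sets above (11 runs each), this gives a mean time of about 7.490 s in combined mode versus about 7.489 s with one GPU, and a mean CPU usage of about 0.164 s versus 0.146 s, i.e. roughly 13% more CPU time for practically the same wall-clock time.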

For the benchmark I modified kmscube into a simple loop test. The program doesn’t do any real work, just some matrix multiplications.

So I would only recommend using the second GPU when you are really running two programs on two screens, not in combined mode. Hopefully NXP will fix this massive overhead in a future BSP.

Regards,
Stefan

Hello @stefan_e.tx

Thanks for the investigation. Can you let me know how you switched from combined to single GPU usage (and vice versa) in your benchmark? I would like to try it on my own program, if possible, to see whether I reach the same conclusions. My program uses the GPU to apply a convolution filter to an image with a Vulkan compute shader. I was hoping the combined mode would double the parallelism of the GPU threads and therefore give significantly more performance.

Hi @Edouard

I was using the environment variable, just as you did. I also grepped the libraries, and it seems libgal really does use it, so this is probably the way to go.
Combined mode:

export VIV_MGPU_AFFINITY="0"

One GPU:

export VIV_MGPU_AFFINITY="1:0"
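To run both settings back to back without changing the state of the interactive shell, the prefix form can be used in a loop. A minimal POSIX sh sketch; `compare_modes` is a hypothetical helper and `gpu-benchmark` stands for your own program and its arguments:

```sh
#!/bin/sh
# Run the same command once in combined mode ("0") and once on a single
# GPU ("1:0"), the two values used in this thread.
compare_modes() {
    for affinity in 0 1:0; do
        echo "=== VIV_MGPU_AFFINITY=$affinity ==="
        # Prefix form: the variable is set in this command's environment only.
        VIV_MGPU_AFFINITY="$affinity" "$@"
    done
}

# Example:
#   compare_modes gpu-benchmark
```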

Regards,
Stefan

NXP points this out in the document i.MX_Graphics_User’s_Guide.pdf:

15.34 i.MX 8QuadMax dual-GPU performance
For some legacy applications with small texture/rendering size and less shader complex, dual-GPU performance may become worse than single GPU mode, because the driver needs to take more CPU effort for dual-GPU programming, and the driver overhead is more significant than GPU load in the hardware pipeline.
For such a kind of legacy case, the users can [use] single-GPU to achieve better performance on the i.MX 8QuadMax.