Hi team,
A Japanese partner is looking to implement their AI framework on our Apalis iMX8 using OpenCL and has forwarded me some questions.
They are running SqueezeNet on the CPU and yolo_tiny on the GPU.
Can we change the GPU memory size?
Currently, the GPU memory size is set to 256 MB.
How can we change the setting to 512 MB, 1 GB, etc.?
We have already tried the following methods, but the memory size did not seem to change:
- adding a gpumem argument to the Linux kernel parameters
- changing the register mapping node of imx8_gpu_ss in the device tree
OpenCL driver doesn’t support fp16
According to the announcement for the Vivante GC7000, fp16 throughput (GFLOPS) seems to be twice that of fp32:
Vivante GC7000 GPUs Deliver Desktop-Class Graphics to Mobile Devices
http://www.vivantecorp.com/index.php/en/media-article/news/277-20140403-vivante-gc7000-delivers-desktop-graphics-to-mobile.html
However, the OpenCL driver doesn’t seem to support fp16.
CL_DEVICE_EXTENSIONS only reports the following extensions:
CL_DEVICE_EXTENSIONS :
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_gl_sharing
If the driver supported fp16, the cl_khr_fp16 flag would be included.
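For reference, the check described above can be sketched in plain C. This is a minimal sketch, assuming the extension list has already been read as a space-separated string (for example via clGetDeviceInfo with CL_DEVICE_EXTENSIONS); the helper name has_extension is hypothetical and not part of any OpenCL API. It matches whole tokens only, so a longer name that merely starts with cl_khr_fp16 is not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * space-separated token in `ext_list`, otherwise 0. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0) {
        return 0;
    }

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and must end at the end of the string or right before a space. */
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return 1;
        }
        p += 1; /* keep searching after this partial match */
    }
    return 0;
}
```

With the extension list shown above, has_extension(list, "cl_khr_fp16") returns 0, which matches what we observe on the board.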
Do we have any further information about enabling the fp16 feature on i.MX8?
We’ve checked some older information, but we are not sure whether anything newer is available on the topic:
Kindly let us know.
OpenCL kernel compiler hangs on specific array size
The following kernel code hangs during clBuildProgram.
__kernel void test(__global int* dst, __global int* src, int mode)
{
    int pos = (int)get_global_id(0);
    const int ofstbl[9] = {1, 3, 7, 5, 6, 4, 8, 0, 2}; // NG
    // const int ofstbl[8] = {1, 3, 7, 5, 6, 4, 8, 0}; // OK
    dst[pos] = src[pos] + ofstbl[mode];
}
-----
The following errors are printed to stdout:
-----
double free or corruption (!prev)
Aborted (core dumped)
It seems that the build hangs when the array has 9 or more elements; with 8 elements it builds fine.
Please check and fix this issue.
Which benchmark software do we use for validating OpenCL performance?
Basically, which benchmark software do we recommend for OpenCL on the i.MX8 platform?
Copy from GPU to CPU is around 10 times slower compared to a write from CPU to GPU
We tested the performance of data copy between CPU and GPU.
Transfers from GPU to CPU are about 10 times slower than transfers from CPU to GPU.
We use clEnqueueWriteBuffer for copying from CPU to GPU and clEnqueueReadBuffer for copying from GPU to CPU.
Do we know why this difference occurs and how to solve it?
We think it is related to DMA, but any help would be appreciated.
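To make the read and write numbers comparable, the throughput of each copy can be computed from event timestamps. The following is only a sketch: it assumes the start and end times are the nanosecond values returned by clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) on a queue created with CL_QUEUE_PROFILING_ENABLE. The helper name transfer_mb_per_s is hypothetical, and this is not necessarily how the partner measured their figures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a transfer size in bytes and a pair of
 * nanosecond timestamps into throughput in MiB per second. */
double transfer_mb_per_s(uint64_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns > start_ns); /* timestamps must be ordered and distinct */

    double seconds = (double)(end_ns - start_ns) / 1e9;
    double mebibytes = (double)bytes / (1024.0 * 1024.0);
    return mebibytes / seconds;
}
```

Calling this once for the clEnqueueWriteBuffer event and once for the clEnqueueReadBuffer event, with the same buffer size, gives two figures whose ratio can be compared directly with the roughly 10x difference described above.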
Finally, they also asked for the reference manual for i.MX8 and the latest version I have is from June 2018. Is there any newer version?
Thanks and kind regards,
Alvaro.