We have an application that previously used GLES 1.x and are converting it to GLES 2.x. In the process of porting we’ve eliminated 95% of the matrix math that needs to be done. Our shaders are pretty simple. We know exactly how many vertices can be drawn, and we allocate a VBO of that size. We then update it with glBufferSubData (the buffer uses GL_DYNAMIC_DRAW) and call glDrawArrays. We’re calling glDrawArrays about 420 times per frame and glBufferSubData about 30 times per frame. We have 3475950 vertices.
On GLES 1.x we spent about 10 ms in the GL-related code, which, due to the nature of the fixed-function pipeline, includes a lot of other work as well. Our total runtime per loop/frame is very close to that number. This includes any vertex/matrix math that GL 1 does, whether that’s on the CPU or the GPU.
On GLES 2.x we spend about 11 ms exclusively in the CPU-GPU sync time of glBufferSubData and glDrawArrays. So after everything, we’re about 20 fps lower overall on GLES 2, even though we only need to do 5% of the matrix/vertex math of GL 1.
My assumption is that the Vivante driver is optimized for something that we aren’t doing, whether that’s a different drawing methodology, differently sized data, or changes to the already very simple shader. Maybe it doesn’t like binding textures, although I’m not sure how GL 1 could have gotten around that either.
For context, this is not a computationally intensive application. Running it on our laptops we get 165 fps, capped by the refresh rate, with plenty of headroom.
Any suggestions as to what we could be doing wrong, or info on what Vivante/NXP has optimized the driver for, would be greatly appreciated.
Hi, I’m the Toradex AI Assistant. The Toradex Support team will follow up with you shortly!
Hello,
Thank you for reaching out with your performance concerns regarding the transition from OpenGL ES 1.x to 2.x on the Apalis iMX8QM module. The differences you’re observing can indeed stem from the shift to programmable shaders in OpenGL ES 2.0, which can alter performance characteristics compared to the fixed-function pipeline of OpenGL ES 1.x.
Here are some tailored suggestions to help you optimize your application:
Shader Optimization: Even simple shaders can benefit from optimization. Ensure that your shaders are efficient by minimizing operations and using precision qualifiers where appropriate. This can help reduce the computational load.
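Efficient Buffer Management: While using glBufferSubData with GL_DYNAMIC_DRAW is suitable for dynamic data, consider exploring glMapBufferRange with flags like GL_MAP_WRITE_BIT and GL_MAP_INVALIDATE_RANGE_BIT for potentially better performance. This approach can help reduce CPU-GPU synchronization time. Note that glMapBufferRange is core only from OpenGL ES 3.0; on an ES 2.0 context it is available only through the GL_EXT_map_buffer_range extension (as glMapBufferRangeEXT), so check the extension string before relying on it.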
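As one possible way to cut down the roughly 30 glBufferSubData calls per frame, the sketch below tracks a single dirty byte range over a CPU-side copy of the vertex data, so that each frame ends with at most one upload. The names `DirtyRange`, `dirty_mark`, `dirty_size`, `dirty_clear` and the `shadow` array are hypothetical and not taken from the original application. The GL call appears only in a comment so the bookkeeping compiles without GL headers. Whether one larger upload is actually cheaper than many small ones on the Vivante driver is an assumption that needs measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Byte range of the CPU-side shadow copy that changed since the last
 * upload. begin == end means nothing is dirty. (Hypothetical helper.) */
typedef struct {
    size_t begin;
    size_t end;
} DirtyRange;

/* Grow the dirty range so it also covers [offset, offset + size). */
static void dirty_mark(DirtyRange *r, size_t offset, size_t size)
{
    assert(r != NULL);
    if (size == 0)
        return;
    if (r->begin == r->end) {           /* first write since the last upload */
        r->begin = offset;
        r->end = offset + size;
        return;
    }
    if (offset < r->begin)
        r->begin = offset;
    if (offset + size > r->end)
        r->end = offset + size;
}

/* Number of bytes that would be uploaded. */
static size_t dirty_size(const DirtyRange *r)
{
    return r->end - r->begin;
}

/* Reset after the upload has been issued. */
static void dirty_clear(DirtyRange *r)
{
    r->begin = 0;
    r->end = 0;
}

/* Once per frame, with the VBO bound and `shadow` pointing at the CPU copy
 * (assumed to be an unsigned char array the same size as the VBO):
 *
 *   if (dirty_size(&r) > 0) {
 *       glBufferSubData(GL_ARRAY_BUFFER, (GLintptr)r.begin,
 *                       (GLsizeiptr)dirty_size(&r), shadow + r.begin);
 *       dirty_clear(&r);
 *   }
 */
```

The trade-off is that a single spanning range also re-uploads any untouched bytes between two writes, so this only pays off when the per-call cost dominates the per-byte cost.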
Batching Draw Calls: Try to reduce the number of draw calls by batching them. This can significantly decrease the overhead associated with each call, improving overall performance.
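To make the batching idea concrete, here is a minimal sketch that merges consecutive draw commands before they are submitted. It assumes each draw is described by a hypothetical `DrawCmd` record and that the bound texture is the only state that differs between draws; a real application would compare every piece of state that changes. Two commands are merged only when they use the same list-type primitive (points, lines or triangles) and their vertex ranges are contiguous in the VBO, because strips and fans cannot simply be concatenated. The `GL_*` constants are redefined locally with the standard enum values so the sketch compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as the GL enums; only defined here when the GL
 * headers are not included. */
#ifndef GL_TRIANGLES
#define GL_POINTS         0x0000
#define GL_LINES          0x0001
#define GL_TRIANGLES      0x0004
#define GL_TRIANGLE_STRIP 0x0005
#endif

/* One pending glDrawArrays call (hypothetical record). */
typedef struct {
    unsigned mode;     /* primitive type passed to glDrawArrays */
    unsigned texture;  /* texture bound for this draw; stands in for all state */
    int first;         /* index of the first vertex in the VBO */
    int count;         /* number of vertices */
} DrawCmd;

/* Only independent primitives can be merged by extending the range. */
static int is_list_primitive(unsigned mode)
{
    return mode == GL_POINTS || mode == GL_LINES || mode == GL_TRIANGLES;
}

/* Merge runs of compatible, contiguous commands in place.
 * Returns the number of commands left in cmds. */
static size_t merge_draws(DrawCmd *cmds, size_t n)
{
    if (n == 0)
        return 0;

    size_t out = 0;  /* index of the last command kept so far */
    for (size_t i = 1; i < n; ++i) {
        DrawCmd *prev = &cmds[out];
        const DrawCmd *cur = &cmds[i];

        if (is_list_primitive(cur->mode) &&
            cur->mode == prev->mode &&
            cur->texture == prev->texture &&
            cur->first == prev->first + prev->count) {
            prev->count += cur->count;   /* extend the previous draw */
        } else {
            cmds[++out] = *cur;          /* keep as a separate draw */
        }
    }
    return out + 1;
}
```

After merging, the application would loop over the remaining commands, bind each command's texture, and call `glDrawArrays(cmd.mode, cmd.first, cmd.count)`. How far this reduces the 420 calls depends on how often consecutive draws share state, and sorting draws by texture beforehand would increase the merge rate where draw order does not matter.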
Profiling and Analysis: Utilize profiling tools to pinpoint bottlenecks in your application. This can help determine if the issue lies in CPU-GPU synchronization, shader execution, or another area.
Experimentation: Since the Vivante GPU driver may have specific optimizations, experimenting with different buffer usage patterns and shader configurations could yield performance improvements.
By focusing on these areas, you may be able to enhance the performance of your application. I hope these suggestions prove helpful in your optimization efforts.
Best regards,
Toradex AI Assistant
Please note that the GPU features are inherent to the i.MX 8 SoC and are not unique to the Toradex module. In our OS images, we utilize GPU drivers and other related support packages provided by NXP. Therefore, I recommend posting your question on the NXP community to receive the most accurate and helpful response.