Double precision calculations on NEON or VFP?

I am building a C++ application for an Apalis iMX6Q that makes extensive use of doubles, and I would like to ensure that I am getting the maximum performance. However, I am a bit confused about NEON vs VFPv3 and when/how each of these is used.

Given the following trivial test program:

int main()
{
  double a = 2.234535463524;
  double c = 4.23462354234;
  double b = b/a;

  return 0;
}
With arm-gcc 6.2.0, I can select the FPU with -mfpu=neon or -mfpu=vfpv3. However, when I compile the above code (with -O0 to prevent the code from being optimized away), I see no difference between these two options. In both cases, the division gets translated to a vdiv, like so:

vldr.64 d18, [fp, #-28]
vldr.64 d17, [fp, #-12]
vdiv.f64 d16, d18, d17
vstr.64 d16, [fp, #-28]

This is confusing, since both the NEON and the VFP instruction sets include a vdiv instruction. So what determines whether a floating-point operation is executed on the NEON or the VFP unit?

Thanks in advance,



As I understand it, the NEON engine also provides the VFP functionality. See e.g. the i.MX6 reference manual, chapter Media Processing Engine (MPE - NEON), or the ARM information center.

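Note the explanation of gcc's -funsafe-math-optimizations option here. With -mfpu=neon, gcc's auto-vectorizer does not generate NEON floating-point instructions unless this option is also given, because NEON does not fully implement IEEE 754 (denormal values are treated as zero). Enabling it may or may not give a performance advantage, at the price of not doing IEEE 754 math.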


As far as I remember, there is no VDIV in NEON, only in VFP, and NEON on non-AArch64 is single precision only, so an XXX.f64 instruction is more than likely a VFP instruction.
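
To make the distinction easier to check on your own toolchain, here is a minimal sketch. The function names (divide, scale, fp_target) are made up for illustration. __ARM_NEON and __ARM_FP are the ACLE feature-test macros; I am assuming your arm-gcc 6.2.0 defines them, which I have not verified. The double division should be a scalar VFP operation regardless of -mfpu, while the single-precision loop is the kind of code NEON could handle if the compiler chooses to vectorize it.

```cpp
#include <cassert>
#include <cstddef>

// Scalar double-precision division. On ARMv7 this is expected to become
// vdiv.f64, which is a VFP instruction (ARMv7 NEON has no f64 arithmetic).
double divide(double num, double den) {
    return num / den;
}

// Single-precision loop. This is a candidate for NEON auto-vectorization
// when built with optimization and -mfpu=neon; whether the compiler
// actually vectorizes it depends on the flags in use.
void scale(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Reports what the compiler says about the target's floating-point
// hardware, based on ACLE macros (assumed to be defined by the toolchain).
const char* fp_target() {
#if defined(__ARM_NEON)
    return "NEON enabled (VFP instructions are available as well)";
#elif defined(__ARM_FP)
    return "hardware floating point without NEON";
#else
    return "no ARM hardware floating point reported";
#endif
}
```

Compiling this to assembly (-S) with -O3 -mfpu=neon, once with and once without -funsafe-math-optimizations, and comparing the code generated for scale() against divide() should show which instructions the compiler picks in each case.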