Linux BogoMIPS count on colibri-vf


Why do the Linux boot log and /proc/cpuinfo show only about 331 BogoMIPS?

# dmesg | grep BogoMIPS
[    0.002041] Calibrating delay loop... 331.77 BogoMIPS (lpj=1658880)
# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
BogoMIPS        : 331.77
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 vfpd32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc05
CPU revision    : 1

Hardware        : Freescale Vybrid VF5xx/VF6xx (Device Tree)
Revision        : 0000
Serial          : 02813976

In theory, the CPU clock on the VF610 is 500 MHz, and the Cortex-A5 delivers about 1.5 DMIPS per MHz. So the BogoMIPS count should be at least about 500 (ideally about 750). I checked the CCM/PLL settings in U-Boot and in the Linux kernel: the CPU clock really is 500 MHz.

# cat /sys/kernel/debug/clk/clk_summary 
   clock                         enable_cnt  prepare_cnt        rate   accuracy   phase
                   pll1_pfd1              1            1   500210526          0 0  
                      pll1_pfd_sel           1            1   500210526          0 0  
                         sys_sel           1            1   500210526          0 0  
                            sys_bus           2            2   500210526          0 0  

The VF50 gives very similar results. Is this correct?

BogoMIPS (Bogus MIPS) is used to calibrate delay loops, not to measure CPU core performance in a reliable way. If you want to measure the DMIPS performance of the CPU core, you’ll need heavily optimized benchmark code. Remember that the VF50 does not have an L2 cache.
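To make the link between the delay loop and the printed number concrete, here is a minimal sketch that rebuilds the BogoMIPS figure from the lpj (loops per jiffy) value in the boot log above. It assumes a kernel tick rate of HZ = 100, which is not shown in the log but matches the reported numbers; the function names are only illustrative.

```python
# Rebuild the BogoMIPS figure from the lpj value printed at boot.
HZ = 100  # assumption: kernel tick rate (CONFIG_HZ), not shown in the log


def bogomips_from_lpj(lpj, hz=HZ):
    """Return the figure in the same 'NNN.NN' form the kernel prints."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


def cycles_per_loop(lpj, cpu_hz, hz=HZ):
    """CPU cycles spent on one delay-loop iteration at the given clock."""
    loops_per_second = lpj * hz
    return cpu_hz / loops_per_second


if __name__ == "__main__":
    print(bogomips_from_lpj(1658880))                         # 331.77
    print(round(cycles_per_loop(1658880, 500210526), 2))      # 3.02
```

With the 500210526 Hz clock from clk_summary, one loop iteration comes out at about three CPU cycles. Under the HZ assumption, the 331 figure therefore describes how quickly this core runs that particular loop; it does not indicate a wrong clock setting.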

This really says it all.

OK, let me put it another way:
I have a board in production based on the Freescale i.MX51 (Cortex-A8, 600 MHz CPU clock), and its BogoMIPS count is about 600. Now I want to replace it with a VF61 or VF50. The VF61 (Cortex-A5, 500 MHz CPU clock) has a BogoMIPS count of about 330. When calculating the MD5 of a big file (about 200 MB), performance drops by a factor of 2.5. I think that is wrong; I expected a performance loss of about 20%. I also checked that the VF61 reads the SD card faster than the i.MX515.
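One way to take the SD card out of such a comparison is to hash data that is already in memory. The following is a rough sketch, not a calibrated benchmark; `md5_throughput` and its parameters are hypothetical names chosen for illustration. It hashes the same zero-filled buffer repeatedly, so with a small `chunk_kb` the data stays in cache and the result mostly reflects core speed, while a chunk larger than the L2 cache brings memory bandwidth into play.

```python
import hashlib
import time


def md5_throughput(total_mb=64, chunk_kb=64):
    """Hash total_mb MiB of in-memory data; return (hexdigest, MiB/s)."""
    chunk = bytes(chunk_kb * 1024)           # zero-filled buffer, reused
    n_chunks = (total_mb * 1024) // chunk_kb
    md5 = hashlib.md5()
    start = time.perf_counter()
    for _ in range(n_chunks):
        md5.update(chunk)
    elapsed = time.perf_counter() - start
    return md5.hexdigest(), total_mb / elapsed


if __name__ == "__main__":
    digest, rate = md5_throughput()
    print(f"MD5 {digest}  {rate:.1f} MiB/s")
```

Running the same script on both boards, once with a small chunk and once with a chunk of several MiB, would give a first hint whether the gap comes from the core or from memory.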

And you are running the exact same software on both?

Yes. The same binary-identical ARM build of Gentoo Linux.

And you do run our boot loader and kernel which do enable the L2 cache?

The bootloader and kernel are built from the latest git sources. I have some FDT patches for the kernel: I added an I2C device for an I2C-to-OW converter, removed the SPI CAN controller, and added some gpio-leds, gpio-keys and gpio-poweroff. I also changed some settings in the bootloader (IP parameters for my network and a boot script for my environment). None of that should be critical for performance.
On boot I can see:

[    0.000000] L2C-310 erratum 769419 enabled
[    0.000000] L2C-310 dynamic clock gating enabled, standby mode enabled
[    0.000000] L2C-310 cache controller enabled, 8 ways, 512 kB
[    0.000000] L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x06060000

L2 cache enabled.

OK, so it’s not the L2 cache. Then it is probably the DDR RAM which the Vybrid clocks at 1/3 of the CPU frequency plus it is only 16-bit wide. How about on your i.MX 51?

Thank you.
I’ll check, but unfortunately not quickly: that board runs a very old kernel and bootloader, and its clock system is very complicated. In any case, this is a very plausible reason for such a significant loss of performance.

The DDR RAM clock is actually synchronous with the CPU on the VF50 (400 MHz) and has its own dedicated PLL on the VF61 (also running at 400 MHz). But agreed, bandwidth is rather limited due to the 16-bit bus.

Here are some things which might contribute to this performance difference:

  • CPU architecture (the A8 is a dual-issue in-order machine, the A5 is a single-issue machine)
  • Cache hierarchy (the VF50 in particular lacks an L2 cache, which is especially problematic for memory-intensive workloads)
  • Memory architecture (the Vybrid has only a 16-bit bus width; the memory clock is 400 MHz)

If you care about CPU/memory performance per dollar, I would recommend having a look at our NXP i.MX 7 based modules. The memory throughput is much better due to the 32-bit bus width and 533 MHz memory clock. Also, the Cortex-A7 is clocked higher and delivers more DMIPS than the A5 (it is a dual-issue machine like the A8).
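As a back-of-the-envelope comparison of the memory figures mentioned above, the theoretical peak bandwidth can be estimated from clock and bus width. This sketch assumes the quoted 400 MHz and 533 MHz are DDR clock rates with two transfers per clock; real sustained throughput will be lower, and the function name is only illustrative.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bits, transfers_per_clock=2):
    """Theoretical peak in MB/s: clock * transfers per clock * bytes per transfer."""
    # Assumption: the clock figure is the DDR clock, so data moves on both edges.
    return clock_mhz * transfers_per_clock * bus_bits / 8


if __name__ == "__main__":
    vybrid = peak_bandwidth_mb_s(400, 16)   # 16-bit bus, 400 MHz
    imx7 = peak_bandwidth_mb_s(533, 32)     # 32-bit bus, 533 MHz
    print(f"Vybrid: {vybrid:.0f} MB/s, i.MX 7: {imx7:.0f} MB/s")
```

Under these assumptions the Vybrid peaks at 1600 MB/s and the i.MX 7 at 4264 MB/s, before any controller or refresh overhead.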

On all modules, keep in mind that the display controller does not have its dedicated memory, hence the CPU shares memory bandwidth with the display controller. The default resolution of 640x480 should not make a big difference though.