Spontaneous reboot when testing OpenBLAS

Hi,

While running the unit tests for OpenBLAS with Fortran support, I encountered a bug that causes the entire system to reset spontaneously. The serial debug port shows nothing: no segfault, no kernel panic. The system simply reboots immediately. I’m not sure whether this is a hardware erratum or something else, and I could definitely use a second opinion. Normally I wouldn’t bring this kind of thing to a forum like this, but since I can’t reproduce the issue on other hardware, I’m not sure where else to turn.

Target environment:

• Yocto morty-4.9.51-8qm_beta2 branch (commit 751aa61ad796d3493b49c2adc32cf02d31141ca6).

• Added Fortran support to Yocto by backporting this patch to Yocto Morty and adding ‘FORTRAN_forcevariable = ",fortran"’ to local.conf.

• Added make, gcc, g++, gfortran, etc. to the fsl-image-validation-imx recipe to facilitate compiling code on the target.

• Added libgfortran to the fsl-image-validation-imx recipe.
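For anyone trying to recreate the image, the local.conf part of the setup above could be sketched roughly as follows. This is only an illustration: the function name and the build-directory argument are hypothetical, and the IMAGE_INSTALL_append line is an assumed alternative to editing the fsl-image-validation-imx recipe directly (the original setup edited the recipe, and its package list ended in "etc.", so it is not complete here).

```shell
#!/bin/sh
# Hypothetical helper: append the Fortran settings to a Yocto build's local.conf.
# $1 is the build directory (assumed to contain conf/local.conf).
add_fortran_to_local_conf() {
    conf="$1/conf/local.conf"
    cat >> "$conf" <<'EOF'
# Enable the Fortran compiler (requires the backported patch on Morty).
FORTRAN_forcevariable = ",fortran"
# Assumption: install the tools via local.conf instead of editing the image recipe.
IMAGE_INSTALL_append = " make gcc g++ gfortran libgfortran"
EOF
}
```

Usage would be something like `add_fortran_to_local_conf /path/to/build` before rebuilding the image. Note the straight double quotes and the leading comma in the FORTRAN_forcevariable value.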

Reproduction steps on target:

• Clone the OpenBLAS git repo and check out the v0.3.3 tag.

• Comment out ‘all : run_test’ in OpenBLAS/utest/Makefile. This allows the OpenBLAS build to finish without running the tests.

• Build OpenBLAS v0.3.3 on the target with no additional make arguments. (-j6 is unnecessary; the makefile handles multithreaded building automatically.) Selecting ARMV8 or CORTEXA57 as the target makes no difference; the failure happens with both.

• Manually run ‘OpenBLAS/utest/openblas_utest’. ‘TEST 23/23 fork:safety’ should cause the failure. Running openblas_utest causes the failure to occur 100% of the time in my testing.
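The reproduction steps above can be sketched as a small shell script to run on the target. Treat it as a rough outline rather than the exact commands used: the function names are hypothetical, REPO_URL is an assumption (the post does not say which remote was cloned), and the sync call is an added precaution because the board is expected to reset.

```shell
#!/bin/sh
# Assumption: the post does not name the remote; override REPO_URL if needed.
REPO_URL="${REPO_URL:-https://github.com/xianyi/OpenBLAS.git}"

# Comment out the 'all : run_test' rule so the build does not run the tests.
# $1 is the path to utest/Makefile.
disable_utest_autorun() {
    sed -i 's/^all[[:space:]]*:[[:space:]]*run_test/#&/' "$1"
}

# Clone, build, and run the unit tests; stops at the first failing step.
reproduce() {
    git clone "$REPO_URL" OpenBLAS &&
    cd OpenBLAS &&
    git checkout v0.3.3 &&
    disable_utest_autorun utest/Makefile &&
    make &&
    sync &&   # precaution (not in the original steps): flush the build to disk
    ./utest/openblas_utest
}
```

Calling `reproduce` from an empty working directory runs the whole sequence; on the affected board the reset is expected during ‘TEST 23/23 fork:safety’.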

Additional Info:

• The fork:safety test, when isolated, can also cause the failure, although it doesn’t happen every time.

• The test passes without issue when run on a Dual Cortex-A9 system running a Linux Yocto 2.4 image.

• The test passes without issue when run on an alternative (non-NXP) ARMv8-A SoC, even when using the libgfortran and OpenBLAS binaries from the i.MX8 system that experiences the failure.

Are you guys able to reproduce these results?

Hi

I guess that your use case is CPU intensive.

There is an issue with the peak power that the evaluation board can deliver; have a look here.

Do you have the means to replace the shunt described in the errata document?

Max

I had actually already checked that errata sheet. My carrier board was purchased after October 8th, though, so I figured this wasn’t the issue.

I went ahead and replaced the shunt resistor anyway, and that did indeed fix the issue.

Great that it works now. Thanks for the feedback.