Intermittent Bug (Heisenbug?) in Linux Kernel 4.1

I have enabled a custom module for the Sensoray Model 1012, a mini PCIe multichannel frame grabber that uses the Intersil TW6869 IC.

I followed the steps described in another Toradex Community article:
https://www.toradex.com/community/questions/13268/building-toradex-linux-24-for-apalis-imx6.html


At this point things are mostly working fine.
We can see 8 video devices under /dev/video*.
The gstreamer pipelines also work (using the v4l driver).

However, sometimes after booting the system, when I try to open
or read from a video device using gstreamer, the kernel emits
an oops message. From that point on, it is not possible to access
the video feeds until the system is reset.

The problem is intermittent: sometimes the kernel boots
(up to the graphical desktop) and everything works flawlessly,
and sometimes the oops occurs and the video feeds stay unusable
until reboot.


I am attaching the following files for reference:

1) The boot log.

2) The shell script that is our application.

3) The Kernel oops message that is sometimes emitted.


What is the possible source of this problem and how can I begin troubleshooting it?

Thank you.


The bug is probably not in the kernel itself but in the TW686x driver. I looked into the driver a bit and found the following in its notes:

 * Notes
 * -----
 *
 * 1. Under stress-testing, it has been observed that the PCIe link
 * goes down, without reason. Therefore, the driver takes special care
 * to allow device hot-unplugging.
 *
 * 2. TW686X devices are capable of setting a few different DMA modes,
 * including: scatter-gather, field and frame modes. However,
 * under stress testings it has been found that the machine can
 * freeze completely if DMA registers are programmed while streaming
 * is active.
 *
 * Therefore, driver implements a dma_mode called 'memcpy' which
 * avoids cycling the DMA buffers, and insteads allocates extra DMA buffers
 * and then copies into vmalloc'ed user buffers.
 *
 * In addition to this, when streaming is on, the driver tries to access
 * hardware registers as infrequently as possible. This is done by using
 * a timer to limit the rate at which DMA is reset on DMA channels error.
 */

Can you add “dma_mode=memcpy” to the “defargs” environment variable, the same way you presumably added pci=nomsi and coherent_pool=128M as I recommended in another thread, and check again please?

I have added dma_mode=memcpy to the defargs:

defargs=enable_wait_mode=off vmalloc=400M pci=nomsi coherent_pool=128M dma_mode=memcpy

While this has reduced the frequency of the problem, the problem still exists.

Any other suggestions?

Sorry, but I do not have a solution at the moment. Does the issue go away if you use only four cameras? Can you also contact Sensoray and see if they have any information? I will try to find a fix and check whether using the mainline kernel works. However, this exercise will take a while.

Thank you for looking into it.
I have contacted Sensoray, and will update this thread when I get a positive response.

I tested the Sensoray 1012 on both the Apalis iMX6 IT and non-IT modules for more than 22 hours, with 8 live camera previews and the streams being stored to an mSATA SSD, and saw no issues.

Hi,

I have found out that there is a known bug in the (kdrv) driver from Sensoray.

As per their suggestion, I have switched to the other driver (vbuf2).

I have not seen any crashes or kernel oops messages with this driver.

Thanks for your help, Sanchayan. I used your suggestions to get the vbuf2 driver working.

I am posting the modifications I had to make in the kernel source tree:


Add these lines to arch/arm/configs/apalis_imx6_defconfig:

CONFIG_VIDEOBUF2_CORE=y
CONFIG_VIDEOBUF2_MEMOPS=y
CONFIG_VIDEOBUF2_DMA_CONTIG=y
CONFIG_VIDEOBUF2_VMALLOC=y
CONFIG_VIDEOBUF2_DMA_SG=y
CONFIG_VIDEOBUF2_DVB=y
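
For anyone repeating this, it may be worth confirming that the VIDEOBUF2 options actually ended up enabled in the configuration the kernel was built from, since a defconfig line can be dropped if its dependencies are not met. Below is a minimal Python sketch of such a check. It is not part of my original setup: the function names are hypothetical, and it assumes you pass in the text of whichever config file you want to inspect (for example the generated .config in the build tree).

```python
# Options listed in this post; all are expected to be built in (=y).
REQUIRED = [
    "CONFIG_VIDEOBUF2_CORE",
    "CONFIG_VIDEOBUF2_MEMOPS",
    "CONFIG_VIDEOBUF2_DMA_CONTIG",
    "CONFIG_VIDEOBUF2_VMALLOC",
    "CONFIG_VIDEOBUF2_DMA_SG",
    "CONFIG_VIDEOBUF2_DVB",
]


def parse_config(config_text):
    """Return a dict of KEY -> value for every 'KEY=value' line.

    Blank lines and comment lines (including '# CONFIG_X is not set')
    are skipped, so a disabled option simply does not appear.
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y', in list order."""
    options = parse_config(config_text)
    return [name for name in required if options.get(name) != "y"]
```

Calling missing_options() on the contents of the config file should return an empty list if all six options are set to y; any names it returns are the ones to look at.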




Also, this is my defargs line:

defargs=enable_wait_mode=off vmalloc=400M pci=nomsi coherent_pool=128M dma_mode=memcpy quiet
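
To check that the defargs actually reached the running kernel, the command line the kernel booted with can be read from /proc/cmdline and compared against what is expected. The following is a minimal Python sketch of that comparison, added as an illustration rather than something I ran as part of this fix. The function names and the EXPECTED table are hypothetical, and the parser is a simplification that splits on whitespace and ignores quoted values.

```python
# Arguments from the defargs line above that matter for the frame grabber.
EXPECTED = {
    "pci": "nomsi",
    "coherent_pool": "128M",
    "dma_mode": "memcpy",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict.

    'key=value' tokens map key -> value; bare flags such as 'quiet'
    map to None. Quoted values containing spaces are not handled.
    """
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def missing_args(cmdline, expected=EXPECTED):
    """Return the sorted expected keys that are absent or have another value."""
    actual = parse_cmdline(cmdline)
    return sorted(key for key, value in expected.items()
                  if actual.get(key) != value)


def check_running_kernel(path="/proc/cmdline"):
    """Read the booted command line from `path` and report missing arguments."""
    with open(path) as f:
        return missing_args(f.read())
```

On the target, check_running_kernel() returning an empty list means all three arguments are present. Note that this only confirms the text is on the command line. Module parameters are normally written there as <module>.<param>=<value>, so whether the driver honors a bare dma_mode=memcpy is a separate question that this check does not answer.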