Hi,
I am running into a strange problem where my Linux kernel crashes with memory allocation errors after my program has been running for a while. The failing allocation typically comes from the flexcan driver, but the crash log appears to show that memory is still available in various areas. Here is one such log:
[ 2061.066160] kswapd0: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[ 2061.074920] CPU: 0 PID: 36 Comm: kswapd0 Tainted: G O 4.9.84-2.8.2+gb2a7f2f #37
[ 2061.083362] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 2061.089892] Backtrace:
[ 2061.092375] [<8010ba5c>] (dump_backtrace) from [<8010bd34>] (show_stack+0x18/0x1c)
[ 2061.099953] r7:80b03254 r6:60030193 r5:00000000 r4:80b1b030
[ 2061.105625] [<8010bd1c>] (show_stack) from [<803fe064>] (dump_stack+0x90/0xa4)
[ 2061.112859] [<803fdfd4>] (dump_stack) from [<801c7ef0>] (warn_alloc+0xf0/0x104)
[ 2061.120173] r7:80b03254 r6:00000000 r5:00000000 r4:00000000
[ 2061.125843] [<801c7e04>] (warn_alloc) from [<801c8464>] (__alloc_pages_nodemask+0x4c0/0xc5c)
[ 2061.134285] r3:00000000 r2:00000000 r1:8090fa80
[ 2061.138905] r4:02280020
[ 2061.141449] [<801c7fa4>] (__alloc_pages_nodemask) from [<802010c4>] (new_slab+0x218/0x288)
[ 2061.149721] r10:87c43b48 r9:00000015 r8:00000000 r7:00000000 r6:02080020 r5:00000000
[ 2061.157552] r4:84001e00
[ 2061.160095] [<80200eac>] (new_slab) from [<802025cc>] (___slab_alloc.constprop.5+0x200/0x260)
[ 2061.168626] r10:87c43b48 r9:00000000 r8:02080020 r7:84001e00 r6:87d7f410 r5:00000000
[ 2061.176457] r4:00000000
[ 2061.179000] [<802023cc>] (___slab_alloc.constprop.5) from [<802029d0>] (kmem_cache_alloc+0xf0/0x120)
[ 2061.188140] r10:87c43b48 r9:00000000 r8:60030113 r7:60030113 r6:00000000 r5:02080020
[ 2061.195971] r4:84001e00
[ 2061.198519] [<802028e0>] (kmem_cache_alloc) from [<806dfebc>] (__build_skb+0x30/0x98)
[ 2061.206354] r7:844f9840 r6:00000140 r5:8621a000 r4:87d7c184
[ 2061.212022] [<806dfe8c>] (__build_skb) from [<806e001c>] (__netdev_alloc_skb+0x8c/0x108)
[ 2061.220119] r9:00000000 r8:60030113 r7:844f9840 r6:00000140 r5:8621a000 r4:87d7c184
[ 2061.227875] [<806dff90>] (__netdev_alloc_skb) from [<80577a48>] (alloc_can_skb+0x24/0xb0)
[ 2061.236058] r9:00000020 r8:909f4030 r7:00040080 r6:8621a000 r5:87c43af4 r4:8621a000
[ 2061.243811] [<80577a24>] (alloc_can_skb) from [<8057a450>] (flexcan_poll+0xa0/0x3e8)
[ 2061.251558] r7:00040080 r6:0000000a r5:00000000 r4:8621a000
[ 2061.257228] [<8057a3b0>] (flexcan_poll) from [<806edf4c>] (net_rx_action+0x120/0x2fc)
[ 2061.265064] r10:87c43b48 r9:80b02d00 r8:0000000a r7:0000012c r6:0002afdd r5:8057a3b0
[ 2061.272896] r4:8621a588
[ 2061.275441] [<806ede2c>] (net_rx_action) from [<8012a970>] (__do_softirq+0x100/0x260)
[ 2061.283278] r10:00000003 r9:00000100 r8:80b02080 r7:ffffe000 r6:40000003 r5:80b0208c
[ 2061.291110] r4:00000000
[ 2061.293651] [<8012a870>] (__do_softirq) from [<8012ae08>] (irq_exit+0xe0/0x148)
[ 2061.300966] r10:00000004 r9:f4a01100 r8:86004000 r7:00000001 r6:00000000 r5:00000000
[ 2061.308798] r4:80a77d30
[ 2061.311345] [<8012ad28>] (irq_exit) from [<8016e618>] (__handle_domain_irq+0x68/0xbc)
[ 2061.319183] [<8016e5b0>] (__handle_domain_irq) from [<801014bc>] (gic_handle_irq+0x50/0x94)
[ 2061.327541] r9:f4a01100 r8:87c43c48 r7:f4a00100 r6:f4a0010c r5:80b1b200 r4:80b0344c
[ 2061.335290] [<8010146c>] (gic_handle_irq) from [<8010c88c>] (__irq_svc+0x6c/0x90)
[ 2061.342776] Exception stack(0x87c43c48 to 0x87c43c90)
[ 2061.347835] 3c40: 80b9316c 80b93160 00000000 80b11940 00002d18 87dda0a0
[ 2061.356019] 3c60: 80b930dc 87c43d38 80b11954 80b11954 00000004 87c43cd4 87c43cd8 87c43c98
[ 2061.364201] 3c80: 801fd594 807fd050 60030113 ffffffff
[ 2061.369259] r9:87c42000 r8:80b11954 r7:87c43c7c r6:ffffffff r5:60030113 r4:807fd050
[ 2061.377015] [<801fd520>] (get_swap_page) from [<801fb2f4>] (add_to_swap+0x14/0x64)
[ 2061.384592] r10:00000004 r9:87c43de8 r8:00000000 r7:87c43d38 r6:87c43f00 r5:87dda0a0
[ 2061.392423] r4:87dda0b4
[ 2061.394968] [<801fb2e0>] (add_to_swap) from [<801d2678>] (shrink_page_list+0x654/0xc38)
[ 2061.402975] r5:87dda0a0 r4:87dda0b4
[ 2061.406558] [<801d2024>] (shrink_page_list) from [<801d340c>] (shrink_inactive_list+0x2ec/0x468)
[ 2061.415349] r10:00000000 r9:80b45044 r8:80b443c4 r7:80b45040 r6:00000005 r5:80b443c0
[ 2061.423180] r4:00000020
[ 2061.425721] [<801d3120>] (shrink_inactive_list) from [<801d3cf8>] (shrink_node+0x464/0x8a8)
[ 2061.434078] r10:00000020 r9:87c43f00 r8:80b45044 r7:0000007e r6:00000008 r5:00000000
[ 2061.441910] r4:00000000
[ 2061.444450] [<801d3894>] (shrink_node) from [<801d4960>] (kswapd+0x2a8/0x664)
[ 2061.451592] r10:00000000 r9:80b443c0 r8:80b8b28c r7:80b94930 r6:ffffffff r5:00000000
[ 2061.459424] r4:80b45044
[ 2061.461966] [<801d46b8>] (kswapd) from [<80143114>] (kthread+0x110/0x118)
[ 2061.468761] r10:00000000 r9:00000000 r8:801d46b8 r7:80b443c0 r6:87c42000 r5:87c28140
[ 2061.476592] r4:00000000
[ 2061.479135] [<80143004>] (kthread) from [<80107df0>] (ret_from_fork+0x14/0x24)
[ 2061.486364] r8:00000000 r7:00000000 r6:00000000 r5:80143004 r4:87c28140
[ 2061.493067] Mem-Info:
[ 2061.495355] active_anon:844 inactive_anon:832 isolated_anon:54
[ 2061.495355] active_file:1659 inactive_file:842 isolated_file:32
[ 2061.495355] unevictable:0 dirty:1 writeback:367 unstable:0
[ 2061.495355] slab_reclaimable:1017 slab_unreclaimable:2533
[ 2061.495355] mapped:2182 shmem:1 pagetables:419 bounce:0
[ 2061.495355] free:12049 free_pcp:36 free_cma:11933
[ 2061.528293] Node 0 active_anon:3376kB inactive_anon:3328kB active_file:6636kB inactive_file:3368kB unevictable:0kB isolated(anon):216kB isolated(file):128kB mapped:8728kB dirty:4kB writeback:1468kB shmem:4kB writeback_tmp:0kB unstable:0kB pages_scanned:32 all_unreclaimable? no
[ 2061.552804] Normal free:48196kB min:1368kB low:1708kB high:2048kB active_anon:3376kB inactive_anon:3328kB active_file:6636kB inactive_file:3368kB unevictable:0kB writepending:1472kB present:262144kB managed:249360kB mlocked:0kB slab_reclaimable:4068kB slab_unreclaimable:10132kB kernel_stack:1560kB pagetables:1676kB bounce:0kB free_pcp:144kB local_pcp:144kB free_cma:47732kB
[ 2061.585809] lowmem_reserve[]: 0 0 0
[ 2061.589353] Normal: 56*4kB (MC) 11*8kB (UC) 173*16kB (UC) 232*32kB (UC) 111*64kB (C) 43*128kB (C) 26*256kB (C) 8*512kB (C) 6*1024kB (C) 2*2048kB (C) 1*4096kB (C) 0*8192kB 0*16384kB 0*32768kB = 48200kB
3636 total pagecache pages
[ 2061.609981] 1100 pages in swap cache
[ 2061.613559] Swap cache stats: add 41388, delete 40288, find 17269/29049
[ 2061.620175] Free swap = 464728kB
[ 2061.623491] Total swap = 524284kB
[ 2061.626807] 65536 pages RAM
[ 2061.629603] 0 pages HighMem/MovableOnly
[ 2061.633440] 3196 pages reserved
[ 2061.636582] 32768 pages cma reserved
This only happens if my program is changing the screen somewhat frequently (which appears to cause galcore to allocate more memory, judging by /sys/kernel/debug/gc/meminfo) AND the CAN driver is receiving data. If either of those is not the case, the system runs indefinitely. As seen above, at the time of the crash there still appears to be enough free Normal memory, free blocks, CMA, and cache, yet the allocation fails anyway.
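One thing that can be checked directly from the log is how much of the reported free memory sits outside CMA. The sketch below is only an illustration: it pulls `free`, `free_cma` and `min` out of the "Normal" zone line and subtracts. The function name `non_cma_free_kb` is made up for this example, and the interpretation rests on an assumption, namely that an atomic, unmovable allocation such as the skb allocation in the backtrace cannot be served from CMA pageblocks.

```python
import re


def non_cma_free_kb(zone_line):
    """Return (free, free_cma, min, free - free_cma) in kB for one zone line.

    zone_line is a per-zone line from the kernel's Mem-Info dump, e.g. the
    "Normal free:...kB min:...kB ... free_cma:...kB" line in the log above.
    """
    def field(name):
        # \b keeps "free" from matching inside "free_cma" or "free_pcp".
        match = re.search(r"\b" + re.escape(name) + r":(\d+)kB", zone_line)
        if match is None:
            raise ValueError("field not found: " + name)
        return int(match.group(1))

    free = field("free")
    free_cma = field("free_cma")
    min_watermark = field("min")
    return free, free_cma, min_watermark, free - free_cma


if __name__ == "__main__":
    # Shortened copy of the "Normal" zone line from the log above.
    line = ("Normal free:48196kB min:1368kB low:1708kB high:2048kB "
            "free_pcp:144kB local_pcp:144kB free_cma:47732kB")
    free, free_cma, min_wm, outside_cma = non_cma_free_kb(line)
    print("free:", free, "kB")
    print("free_cma:", free_cma, "kB")
    print("free outside CMA:", outside_cma, "kB (min watermark:", min_wm, "kB)")
```

With the numbers from this log, 48196 kB free minus 47732 kB free_cma leaves 464 kB outside CMA, which is below the 1368 kB min watermark. If the assumption above holds, that may be why the allocation fails even though the totals look healthy.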
After reading this and some other things around the web, I have tried the following, to no avail:
- Increasing vm.min_free_kbytes - the more it is increased, the slower everything runs, and the allocation issues remain
- Decreasing vm.min_free_kbytes to 50 - oddly, this makes the crash happen much less frequently, but it does still happen
- Adjusting the CMA allocation in the command line, kernel config, and device tree - none of these seems to have any effect on the actual CMA allocation
- Setting galcore.contiguousSize to a smaller number - everything runs longer but still eventually fails
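To confirm whether a CMA change actually took effect after a reboot, the `CmaTotal` and `CmaFree` fields in /proc/meminfo can be compared against the expected size. The log above reports 32768 pages cma reserved, which is 128 MiB with 4 kB pages. Below is a minimal sketch of such a check. The helper name `read_meminfo_kb` is hypothetical, and it assumes the kernel is built with CONFIG_CMA so that those fields are present.

```python
def read_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are in kB; unit-less counters (e.g. HugePages_Total) are
    returned as plain integers.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = read_meminfo_kb(f.read())
        with open("/proc/sys/vm/min_free_kbytes") as f:
            min_free = int(f.read().strip())
    except OSError as err:
        print("could not read procfs:", err)
    else:
        # CmaTotal/CmaFree are missing if the kernel has no CMA support.
        for name in ("MemFree", "CmaTotal", "CmaFree"):
            print(name, "=", info.get(name, "not present"), "kB")
        print("vm.min_free_kbytes =", min_free)
```

Running this before and after changing the command line, kernel config, or device tree should show whether `CmaTotal` moves at all.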
Curious whether anyone has seen anything similar or has any thoughts.