Kernel lockups

Paweł Gołaszewski blues w pld-linux.org
Sob, 29 Lip 2006, 15:47:48 CEST


I have problem with my machine. It hangs from time to time (~ once per 
week) - yesterday first time anything left in logs (attached). Earlier I 
had clear logs and sometimes on console was some "Memory Fault" or some 
omm-killer actions.

The system runs various kernels (no difference) with vserver, currently:
2.6.16.23-1smp #1 SMP (from AC)
There is 3 vservers running. 
4xPIII Xeon 700MHz i686 Pentium_III_(Cascades)
# lspci
00:08.0 SCSI storage controller: LSI Logic / Symbios Logic 53c810 (rev 23)
00:09.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
00:0a.0 VGA compatible controller: Cirrus Logic GD 5480 (rev 23)
00:0b.0 PIC: Intel Corporation 683053 Programmable Interrupt Device
00:0c.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:0c.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:0c.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:0c.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:10.0 Host bridge: Intel Corporation 450NX - 82451NX Memory & I/O Controller (rev 03)
00:12.0 Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (rev 04)
00:13.0 Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (rev 04)
01:01.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
01:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53C896/897 (rev 07)
01:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53C896/897 (rev 07)

First of all - the hardware is 100% good. I have few identical machines 
and I've replaced whole box (except HDs). And checked everything.

The problem is similar to mine:
http://www.forum-informatica-pro.info/index.php/m/128695/

I have e1000 on board too and the problems seems to be really similar 
(except I don't use nbd). Something seems to "eat" the memory, not 
releasing it. The load is raising to ~50-100 and system is dead. Mostly 
even Sysrq is not working.

I've increased /proc/sys/vm/min_free_kbytes from ~3800 to 16384, as 
mentioned in link above, and I'm waiting.... but in that link is mentioned 
also that it helps just a little bit.

Any ideas? Someone would like to try decode attached log?

-- 
pozdr.  Paweł Gołaszewski          jid:blues<at>jabber<dot>gda<dot>pl
--------------------------------------------------------------------------
If you think of MS-DOS as mono, and Windows as stereo, then Linux is Dolby
Pro-Logic Surround Sound with Bass Boost and all the music is free.
-------------- następna część ---------
Jul 28 23:05:32 jol kernel: pdflush: page allocation failure. order:1, mode:0x20
Jul 28 23:05:51 jol kernel:  [__alloc_pages+585/864] __alloc_pages+0x249/0x360
Jul 28 23:05:56 jol kernel:  [kmem_getpages+55/176] kmem_getpages+0x37/0xb0
Jul 28 23:05:58 jol kernel:  [alloc_slabmgmt+87/112] alloc_slabmgmt+0x57/0x70
Jul 28 23:05:59 jol kernel:  [cache_grow+194/384] cache_grow+0xc2/0x180
Jul 28 23:05:59 jol kernel:  [cache_alloc_refill+363/544] cache_alloc_refill+0x16b/0x220
Jul 28 23:05:59 jol kernel:  [__kmalloc+184/208] __kmalloc+0xb8/0xd0
Jul 28 23:05:59 jol kernel:  [__alloc_skb+87/272] __alloc_skb+0x57/0x110
Jul 28 23:05:59 jol kernel:  [tcp_collapse+247/896] tcp_collapse+0xf7/0x380
Jul 28 23:05:59 jol kernel:  [tcp_prune_queue+154/496] tcp_prune_queue+0x9a/0x1f0
Jul 28 23:05:59 jol kernel:  [tcp_data_queue+765/2016] tcp_data_queue+0x2fd/0x7e0
Jul 28 23:05:59 jol kernel:  [tcp_rcv_established+535/1984] tcp_rcv_established+0x217/0x7c0
Jul 28 23:05:59 jol kernel:  [tcp_v4_do_rcv+267/272] tcp_v4_do_rcv+0x10b/0x110
Jul 28 23:05:59 jol kernel:  [tcp_v4_rcv+2076/2816] tcp_v4_rcv+0x81c/0xb00
Jul 28 23:05:59 jol kernel:  [ip_local_deliver_finish+0/528] ip_local_deliver_finish+0x0/0x210
Jul 28 23:05:59 jol kernel:  [nf_hook_slow+110/272] nf_hook_slow+0x6e/0x110
Jul 28 23:05:59 jol kernel:  [ip_local_deliver+280/688] ip_local_deliver+0x118/0x2b0
Jul 28 23:05:59 jol kernel:  [ip_local_deliver_finish+0/528] ip_local_deliver_finish+0x0/0x210
Jul 28 23:05:59 jol kernel:  [ip_rcv+698/1424] ip_rcv+0x2ba/0x590
Jul 28 23:05:59 jol kernel:  [ip_rcv_finish+0/688] ip_rcv_finish+0x0/0x2b0
Jul 28 23:05:59 jol kernel:  [netif_receive_skb+730/1008] netif_receive_skb+0x2da/0x3f0
Jul 28 23:05:59 jol kernel:  [pg0+944742897/1068868608] e1000_clean_rx_irq+0x1f1/0x640 [e1000]
Jul 28 23:05:59 jol kernel:  [pg0+944741860/1068868608] e1000_clean_tx_irq+0x124/0x340 [e1000]
Jul 28 23:05:59 jol kernel:  [pg0+944741318/1068868608] e1000_clean+0xb6/0x1b0 [e1000]
Jul 28 23:05:59 jol kernel:  [net_rx_action+149/400] net_rx_action+0x95/0x190
Jul 28 23:05:59 jol kernel:  [__do_softirq+293/320] __do_softirq+0x125/0x140
Jul 28 23:05:59 jol kernel:  [do_softirq+53/64] do_softirq+0x35/0x40
Jul 28 23:05:59 jol kernel:  [do_IRQ+30/48] do_IRQ+0x1e/0x30
Jul 28 23:05:59 jol kernel:  [common_interrupt+26/32] common_interrupt+0x1a/0x20
Jul 28 23:05:59 jol kernel:  [_spin_lock+5/16] _spin_lock+0x5/0x10
Jul 28 23:05:59 jol kernel:  [page_lock_anon_vma+32/48] page_lock_anon_vma+0x20/0x30
Jul 28 23:05:59 jol kernel:  [page_referenced_anon+21/144] page_referenced_anon+0x15/0x90
Jul 28 23:05:59 jol kernel:  [add_to_swap+41/144] add_to_swap+0x29/0x90
Jul 28 23:05:59 jol kernel:  [page_referenced+142/144] page_referenced+0x8e/0x90
Jul 28 23:05:59 jol kernel:  [shrink_list+217/1056] shrink_list+0xd9/0x420
Jul 28 23:05:59 jol kernel:  [__pagevec_release+40/64] __pagevec_release+0x28/0x40
Jul 28 23:06:00 jol kernel:  [refill_inactive_zone+724/1056] refill_inactive_zone+0x2d4/0x420
Jul 28 23:06:00 jol kernel:  [shrink_cache+194/720] shrink_cache+0xc2/0x2d0
Jul 28 23:06:00 jol kernel:  [shrink_zone+182/224] shrink_zone+0xb6/0xe0
Jul 28 23:06:00 jol kernel:  [shrink_caches+127/160] shrink_caches+0x7f/0xa0
Jul 28 23:06:00 jol kernel:  [try_to_free_pages+241/560] try_to_free_pages+0xf1/0x230
Jul 28 23:06:00 jol kernel:  [__alloc_pages+377/864] __alloc_pages+0x179/0x360
Jul 28 23:06:00 jol kernel:  [kmem_getpages+55/176] kmem_getpages+0x37/0xb0
Jul 28 23:06:00 jol kernel:  [cfq_set_request+230/544] cfq_set_request+0xe6/0x220
Jul 28 23:06:00 jol kernel:  [cache_grow+194/384] cache_grow+0xc2/0x180
Jul 28 23:06:00 jol kernel:  [pg0+945824700/1068868608] xfs_bmap_do_search_extents+0xfc/0x460 [xfs]
Jul 28 23:06:00 jol kernel:  [cache_alloc_refill+363/544] cache_alloc_refill+0x16b/0x220
Jul 28 23:06:00 jol kernel:  [kmem_cache_alloc+135/160] kmem_cache_alloc+0x87/0xa0
Jul 28 23:06:00 jol kernel:  [pg0+946151804/1068868608] kmem_zone_alloc+0x4c/0xc0 [xfs]
Jul 28 23:06:00 jol kernel:  [pg0+946151955/1068868608] kmem_zone_zalloc+0x23/0x60 [xfs]
Jul 28 23:06:00 jol kernel:  [pg0+946091049/1068868608] _xfs_trans_alloc+0x19/0x60 [xfs]
Jul 28 23:06:00 jol kernel:  [pg0+946091011/1068868608] xfs_trans_alloc+0xc3/0xd0 [xfs]
Jul 28 23:06:00 jol kernel:  [__make_request+279/1152] __make_request+0x117/0x480
Jul 28 23:06:00 jol kernel:  [mempool_alloc+63/256] mempool_alloc+0x3f/0x100
Jul 28 23:06:00 jol kernel:  [pg0+946020171/1068868608] xfs_iomap_write_allocate+0x12b/0x610 [xfs]
Jul 28 23:06:00 jol kernel:  [pg0+943146426/1068868608] make_request+0x2aa/0x480 [raid1]
Jul 28 23:06:00 jol kernel:  [pg0+946015176/1068868608] xfs_iomap+0x438/0x570 [xfs]
Jul 28 23:06:00 jol kernel:  [pg0+946152600/1068868608] xfs_map_blocks+0x58/0x90 [xfs]
Jul 28 23:06:00 jol kernel:  [pg0+946157244/1068868608] xfs_page_state_convert+0x48c/0x7c0 [xfs]
Jul 28 23:06:00 jol kernel:  [submit_bh+334/416] submit_bh+0x14e/0x1a0
Jul 28 23:06:00 jol kernel:  [find_get_pages_tag+65/144] find_get_pages_tag+0x41/0x90
Jul 28 23:06:00 jol kernel:  [pg0+946160039/1068868608] linvfs_writepage+0x67/0x100 [xfs]
Jul 28 23:06:00 jol kernel:  [mpage_writepages+635/1008] mpage_writepages+0x27b/0x3f0
Jul 28 23:06:00 jol kernel:  [blkdev_writepage+0/48] blkdev_writepage+0x0/0x30
Jul 28 23:06:00 jol kernel:  [pg0+946159936/1068868608] linvfs_writepage+0x0/0x100 [xfs]
Jul 28 23:06:00 jol kernel:  [do_writepages+82/96] do_writepages+0x52/0x60
Jul 28 23:06:00 jol kernel:  [__sync_single_inode+97/512] __sync_single_inode+0x61/0x200
Jul 28 23:06:00 jol kernel:  [__writeback_single_inode+136/432] __writeback_single_inode+0x88/0x1b0
Jul 28 23:06:00 jol kernel:  [schedule_timeout+97/176] schedule_timeout+0x61/0xb0
Jul 28 23:06:00 jol kernel:  [process_timeout+0/16] process_timeout+0x0/0x10
Jul 28 23:06:00 jol kernel:  [finish_wait+48/112] finish_wait+0x30/0x70
Jul 28 23:06:00 jol kernel:  [generic_sync_sb_inodes+462/736] generic_sync_sb_inodes+0x1ce/0x2e0
Jul 28 23:06:00 jol kernel:  [writeback_inodes+205/208] writeback_inodes+0xcd/0xd0
Jul 28 23:06:00 jol kernel:  [background_writeout+122/192] background_writeout+0x7a/0xc0
Jul 28 23:06:00 jol kernel:  [pdflush+0/96] pdflush+0x0/0x60
Jul 28 23:06:00 jol kernel:  [__pdflush+190/432] __pdflush+0xbe/0x1b0
Jul 28 23:06:00 jol kernel:  [pdflush+81/96] pdflush+0x51/0x60
Jul 28 23:06:00 jol kernel:  [background_writeout+0/192] background_writeout+0x0/0xc0
Jul 28 23:06:01 jol kernel:  [kthread+213/224] kthread+0xd5/0xe0
Jul 28 23:06:05 jol kernel:  [kthread+0/224] kthread+0x0/0xe0
Jul 28 23:06:08 jol kernel:  [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
Jul 28 23:06:10 jol kernel: Mem-info:
Jul 28 23:06:10 jol kernel: DMA per-cpu:
Jul 28 23:06:11 jol kernel: cpu 0 hot: high 0, batch 1 used:0
Jul 28 23:06:11 jol kernel: cpu 0 cold: high 0, batch 1 used:0
Jul 28 23:06:12 jol kernel: cpu 1 hot: high 0, batch 1 used:0
Jul 28 23:06:12 jol kernel: cpu 1 cold: high 0, batch 1 used:0
Jul 28 23:06:13 jol kernel: cpu 2 hot: high 0, batch 1 used:0
Jul 28 23:06:13 jol kernel: cpu 2 cold: high 0, batch 1 used:0
Jul 28 23:06:14 jol kernel: cpu 3 hot: high 0, batch 1 used:0
Jul 28 23:06:14 jol kernel: cpu 3 cold: high 0, batch 1 used:0
Jul 28 23:06:14 jol kernel: DMA32 per-cpu: empty
Jul 28 23:06:14 jol kernel: Normal per-cpu:
Jul 28 23:06:15 jol kernel: cpu 0 hot: high 186, batch 31 used:61
Jul 28 23:06:17 jol kernel: cpu 0 cold: high 62, batch 15 used:22
Jul 28 23:06:17 jol kernel: cpu 1 hot: high 186, batch 31 used:66
Jul 28 23:06:18 jol kernel: cpu 1 cold: high 62, batch 15 used:15
Jul 28 23:06:18 jol kernel: cpu 2 hot: high 186, batch 31 used:103
Jul 28 23:06:18 jol kernel: cpu 2 cold: high 62, batch 15 used:24
Jul 28 23:06:18 jol kernel: cpu 3 hot: high 186, batch 31 used:35
Jul 28 23:06:18 jol kernel: cpu 3 cold: high 62, batch 15 used:59
Jul 28 23:06:19 jol kernel: HighMem per-cpu:
Jul 28 23:06:19 jol kernel: cpu 0 hot: high 42, batch 7 used:39
Jul 28 23:06:19 jol kernel: cpu 0 cold: high 14, batch 3 used:4
Jul 28 23:06:19 jol kernel: cpu 1 hot: high 42, batch 7 used:1
Jul 28 23:06:19 jol kernel: cpu 1 cold: high 14, batch 3 used:2
Jul 28 23:06:19 jol kernel: cpu 2 hot: high 42, batch 7 used:16
Jul 28 23:06:20 jol kernel: cpu 2 cold: high 14, batch 3 used:9
Jul 28 23:06:22 jol kernel: cpu 3 hot: high 42, batch 7 used:5
Jul 28 23:06:22 jol kernel: cpu 3 cold: high 14, batch 3 used:13
Jul 28 23:06:23 jol kernel: Free pages:       11292kB (124kB HighMem)
Jul 28 23:06:24 jol kernel: Active:121392 inactive:110607 dirty:63 writeback:0 unstable:0 free:2823 slab:13227 mapped:231628 pagetables:5248
Jul 28 23:06:25 jol kernel: DMA free:3552kB min:68kB low:84kB high:100kB active:4772kB inactive:3488kB present:16384kB pages_scanned:871 all_unreclaimable? no
Jul 28 23:06:25 jol kernel: lowmem_reserve[]: 0 0 880 1007
Jul 28 23:06:26 jol kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jul 28 23:06:26 jol kernel: lowmem_reserve[]: 0 0 880 1007
Jul 28 23:06:26 jol kernel: Normal free:7616kB min:3756kB low:4692kB high:5632kB active:413924kB inactive:379220kB present:901120kB pages_scanned:120821 all_unreclaimable? no
Jul 28 23:06:27 jol kernel: lowmem_reserve[]: 0 0 0 1023
Jul 28 23:06:27 jol kernel: HighMem free:124kB min:128kB low:264kB high:400kB active:66872kB inactive:59720kB present:131048kB pages_scanned:810 all_unreclaimable? no
Jul 28 23:06:27 jol kernel: lowmem_reserve[]: 0 0 0 0
Jul 28 23:06:27 jol kernel: DMA: 28*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3552kB
Jul 28 23:06:27 jol kernel: DMA32: empty
Jul 28 23:06:28 jol kernel: Normal: 1728*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7616kB
Jul 28 23:06:28 jol kernel: HighMem: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 124kB
Jul 28 23:06:28 jol kernel: Swap cache: add 1615553, delete 1614859, find 11770444/11964091, race 0+343
Jul 28 23:06:31 jol kernel: Free swap  = 0kB
Jul 28 23:06:32 jol kernel: Total swap = 996008kB
Jul 28 23:06:32 jol kernel: Free swap:            0kB
Jul 28 23:06:32 jol kernel: 262138 pages of RAM
Jul 28 23:06:32 jol kernel: 32762 pages of HIGHMEM
Jul 28 23:06:32 jol kernel: 3476 reserved pages
Jul 28 23:06:32 jol kernel: 119825 pages shared
Jul 28 23:06:32 jol kernel: 695 pages swap cached
Jul 28 23:06:33 jol kernel: 63 pages dirty
Jul 28 23:06:33 jol kernel: 0 pages writeback
Jul 28 23:06:33 jol kernel: 231628 pages mapped
Jul 28 23:06:34 jol kernel: 13227 pages slab
Jul 28 23:06:38 jol kernel: 5248 pages pagetables
Jul 28 23:06:41 jol kernel: pdflush: page allocation failure. order:1, mode:0x20
Jul 28 23:06:41 jol kernel:  [__alloc_pages+585/864] __alloc_pages+0x249/0x360
Jul 28 23:06:42 jol kernel:  [kmem_getpages+55/176] kmem_getpages+0x37/0xb0
Jul 28 23:06:42 jol kernel:  [__mod_timer+151/192] __mod_timer+0x97/0xc0
Jul 28 23:06:42 jol kernel:  [cache_grow+194/384] cache_grow+0xc2/0x180
Jul 28 23:06:43 jol kernel:  [cache_alloc_refill+363/544] cache_alloc_refill+0x16b/0x220
Jul 28 23:06:43 jol kernel:  [__kmalloc+184/208] __kmalloc+0xb8/0xd0
Jul 28 23:06:44 jol kernel:  [__alloc_skb+87/272] __alloc_skb+0x57/0x110
Jul 28 23:06:45 jol kernel:  [tcp_collapse+247/896] tcp_collapse+0xf7/0x380
Jul 28 23:06:45 jol kernel:  [tcp_prune_queue+154/496] tcp_prune_queue+0x9a/0x1f0
Jul 28 23:06:46 jol kernel:  [tcp_data_queue+1620/2016] tcp_data_queue+0x654/0x7e0
Jul 28 23:06:46 jol kernel:  [tcp_rcv_established+535/1984] tcp_rcv_established+0x217/0x7c0
Jul 28 23:06:46 jol kernel:  [tcp_v4_do_rcv+267/272] tcp_v4_do_rcv+0x10b/0x110
Jul 28 23:06:46 jol kernel:  [tcp_v4_rcv+2076/2816] tcp_v4_rcv+0x81c/0xb00
Jul 28 23:06:46 jol kernel:  [ip_local_deliver_finish+0/528] ip_local_deliver_finish+0x0/0x210
Jul 28 23:06:46 jol kernel:  [nf_hook_slow+110/272] nf_hook_slow+0x6e/0x110
Jul 28 23:06:47 jol kernel:  [ip_local_deliver+280/688] ip_local_deliver+0x118/0x2b0
Jul 28 23:06:48 jol kernel:  [ip_local_deliver_finish+0/528] ip_local_deliver_finish+0x0/0x210
Jul 28 23:06:49 jol kernel:  [ip_rcv+698/1424] ip_rcv+0x2ba/0x590
Jul 28 23:06:49 jol kernel:  [ip_rcv_finish+0/688] ip_rcv_finish+0x0/0x2b0
Jul 28 23:06:50 jol kernel:  [netif_receive_skb+730/1008] netif_receive_skb+0x2da/0x3f0
Jul 28 23:06:55 jol kernel:  [pg0+944742897/1068868608] e1000_clean_rx_irq+0x1f1/0x640 [e1000]
Jul 28 23:07:04 jol kernel:  [pg0+944741860/1068868608] e1000_clean_tx_irq+0x124/0x340 [e1000]
Jul 28 23:07:04 jol kernel:  [ktime_get_ts+109/128] ktime_get_ts+0x6d/0x80
Jul 28 23:07:06 jol kernel:  [pg0+944741318/1068868608] e1000_clean+0xb6/0x1b0 [e1000]
Jul 28 23:07:06 jol kernel:  [net_rx_action+149/400] net_rx_action+0x95/0x190
Jul 28 23:07:13 jol kernel:  [__do_softirq+293/320] __do_softirq+0x125/0x140
Jul 28 23:07:13 jol kernel:  [do_softirq+53/64] do_softirq+0x35/0x40
Jul 28 23:07:13 jol kernel:  [do_IRQ+30/48] do_IRQ+0x1e/0x30
Jul 28 23:07:13 jol kernel:  [common_interrupt+26/32] common_interrupt+0x1a/0x20
Jul 28 23:07:13 jol kernel:  [_spin_lock+10/16] _spin_lock+0xa/0x10
Jul 28 23:07:18 jol kernel:  [page_lock_anon_vma+32/48] page_lock_anon_vma+0x20/0x30
Jul 28 23:07:18 jol kernel:  [page_referenced_anon+21/144] page_referenced_anon+0x15/0x90
Jul 28 23:07:19 jol kernel:  [page_referenced+142/144] page_referenced+0x8e/0x90
Jul 28 23:07:20 jol kernel:  [refill_inactive_zone+948/1056] refill_inactive_zone+0x3b4/0x420
Jul 28 23:07:22 jol kernel:  [shrink_zone+153/224] shrink_zone+0x99/0xe0
Jul 28 23:07:27 jol kernel:  [shrink_caches+127/160] shrink_caches+0x7f/0xa0
Jul 28 23:07:59 jol kernel:  [try_to_free_pages+241/560] try_to_free_pages+0xf1/0x230
Jul 28 23:17:18 jol kernel:  [__alloc_pages+377/864] __alloc_pages+0x179/0x360
Jul 28 23:17:37 jol kernel:  [kmem_getpages+55/176] kmem_getpages+0x37/0xb0


Więcej informacji o liście dyskusyjnej pld-kernel