Zwisy na ostatnich kernelach: < coś > blocked for more than 120 seconds.
Mateusz Korniak
mateusz w ant.gliwice.pl
Czw, 14 Paź 2010, 14:46:34 CEST
Zaczeło sie 3/4 miesiące temu przy zadaniach intensywnych IO/ np (powtarzalne
na jednej maszynie gdy data-check md, na prawie identycznych innych nie
występujący nigdy). Dotychczas były to kernele 64bit i userland 32bit ale
przed chwilą padł mój codziennie dotychczas stabilnie pracujący na 2.6.33.3
laptop w kilka godzin po przejściu na 2.6.35.7.
Ale mam wrażenie że problem dotyczył też poprzednich kerneli, tyle że może nie
był tak widoczny.
Objawia się zwisaniem na operacjach IO kolejnych procesów, load rośnie, a
maszyna się mrozi.
Ma ktoś jakieś pomysły jak się pozbyć ?
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261183] INFO: task kio_file:4489
blocked for more than 120 seconds.
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261187] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261190] kio_file D 00000003
0 4489 2767 0x00000004
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261195] f16dfe7c 00000082 f16dfe80
00000003 ef6a6b44 ef39a1e0 ec89858c f16dfe34
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261201] c7e254a8 0000017a 00005140
00005140 00005140 ef2fe380 c7e23cff 0000017a
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261206] 00005140 00005140 ef2fe380
b3492eea 002ea1e9 00000000 f4e6d500 c2005140
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261210] Call Trace:
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261222] [<c1397a6c>]
io_schedule+0x5c/0xa0
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261227] [<c10cbb35>]
sync_page+0x35/0x40
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261231] [<c13980e7>]
__wait_on_bit_lock+0x47/0x90
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261234] [<c10cbb00>] ?
sync_page+0x0/0x40
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261237] [<c10cbae6>]
__lock_page+0x76/0x80
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261242] [<c10572a0>] ?
wake_bit_function+0x0/0x60
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261247] [<c10e563e>]
handle_mm_fault+0x84e/0x860
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261252] [<c139ce90>] ?
do_page_fault+0x0/0x400
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261255] [<c139cfd6>]
do_page_fault+0x146/0x400
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261258] [<c10e8236>] ?
do_munmap+0x1f6/0x270
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261296] [<c10e7fa0>] ?
arch_unmap_area_topdown+0x0/0x30
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261299] [<c10e8b22>] ?
sys_brk+0x42/0x130
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261302] [<c139ce90>] ?
do_page_fault+0x0/0x400
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261305] [<c139a586>]
error_code+0x76/0x80
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261308] INFO: task kio_file:4490
blocked for more than 120 seconds.
i dalej pododbnie dla kolejnych procesów ...
Dla innej maszyny (tej od padów data-check kernel 2.6.34.3 64bit) wygląda to
tak:
Aug 2 13:40:36 s14 kernel: [100520.949724] md: data-check of RAID array md5
Aug 2 13:40:36 s14 kernel: [100520.949729] md: minimum _guaranteed_ speed:
1000 KB/sec/disk.
Aug 2 13:40:36 s14 kernel: [100520.949731] md: using maximum available idle
IO bandwidth (but not more than 200000 KB/sec) for data-check.
Aug 2 13:40:36 s14 kernel: [100520.949736] md: using 128k window, over a
total of 117193984 blocks.
Aug 2 16:05:15 s14 kernel: [109200.485779] INFO: task flush-9:5:3000 blocked
for more than 120 seconds.
Aug 2 16:05:15 s14 kernel: [109200.485782] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 2 16:05:15 s14 kernel: [109200.485784] flush-9:5 D ffff88000fe87140
0 3000 2 0x00000000
Aug 2 16:05:15 s14 kernel: [109200.485789] ffff8804135b5760 0000000000000046
ffff8804135b5728 ffff8804135b5724
Aug 2 16:05:15 s14 kernel: [109200.485793] 0000000000000400 ffff88000fe87020
ffff88041cbb61c0 ffff8804135b572c
Aug 2 16:05:15 s14 kernel: [109200.485796] ffff8804135b5718 0000000101f1b639
0000000000011140 0000000000011140
Aug 2 16:05:15 s14 kernel: [109200.485800] Call Trace:
Aug 2 16:05:15 s14 kernel: [109200.485815] [<ffffffffa0022749>]
__this_module+0x3689/0x6ff0 [raid10]
Aug 2 16:05:15 s14 kernel: [109200.485821] [<ffffffff8103f5a0>] ?
default_wake_function+0x0/0x10
Aug 2 16:05:15 s14 kernel: [109200.485826] [<ffffffffa0025a62>]
__this_module+0x69a2/0x6ff0 [raid10]
Aug 2 16:05:15 s14 kernel: [109200.485831] [<ffffffffa0009dca>]
md_error+0x69a/0x1000 [md_mod]
Aug 2 16:05:15 s14 kernel: [109200.485835] [<ffffffff8120a27e>]
generic_make_request+0x19e/0x450
Aug 2 16:05:15 s14 kernel: [109200.485840] [<ffffffff810e558e>] ?
mempool_alloc+0x5e/0x130
Aug 2 16:05:15 s14 kernel: [109200.485843] [<ffffffff8120a5b4>]
submit_bio+0x84/0x110
Aug 2 16:05:15 s14 kernel: [109200.485847] [<ffffffff8115e4cf>]
submit_bh+0xef/0x130
Aug 2 16:05:15 s14 kernel: [109200.485850] [<ffffffff811615b0>]
__block_write_full_page+0x1f0/0x390
Aug 2 16:05:15 s14 kernel: [109200.485853] [<ffffffff8115fbb0>] ?
end_buffer_async_write+0x0/0x150
Aug 2 16:05:15 s14 kernel: [109200.485857] [<ffffffff8116182b>]
block_write_full_page_endio+0xdb/0x110
Aug 2 16:05:15 s14 kernel: [109200.485862] [<ffffffffa00c6370>] ?
ext3_count_dirs+0x1a0/0x1090 [ext3]
Aug 2 16:05:15 s14 kernel: [109200.485865] [<ffffffff81161870>]
block_write_full_page+0x10/0x20
Aug 2 16:05:15 s14 kernel: [109200.485870] [<ffffffffa00c708d>]
ext3_count_dirs+0xebd/0x1090 [ext3]
Aug 2 16:05:15 s14 kernel: [109200.485874] [<ffffffff810eb382>]
__writepage+0x12/0x40
Aug 2 16:05:15 s14 kernel: [109200.485876] [<ffffffff810eb937>]
write_cache_pages+0x1c7/0x400
Aug 2 16:05:15 s14 kernel: [109200.485879] [<ffffffff810eb370>] ?
__writepage+0x0/0x40
Aug 2 16:05:15 s14 kernel: [109200.485882] [<ffffffff810ebb8f>]
generic_writepages+0x1f/0x30
Aug 2 16:05:15 s14 kernel: [109200.485885] [<ffffffff810ec815>]
do_writepages+0x25/0x30
Aug 2 16:05:15 s14 kernel: [109200.485888] [<ffffffff81157c8b>]
writeback_single_inode+0xeb/0x330
Aug 2 16:05:15 s14 kernel: [109200.485891] [<ffffffff8115872b>]
writeback_inodes_wb+0x11b/0x400
Aug 2 16:05:15 s14 kernel: [109200.485893] [<ffffffff81158b22>]
wb_writeback+0x112/0x1d0
Aug 2 16:05:15 s14 kernel: [109200.485897] [<ffffffff81158f7c>]
wb_do_writeback+0x1ac/0x1b0
Aug 2 16:05:15 s14 kernel: [109200.485899] [<ffffffff81158fcb>]
bdi_writeback_task+0x4b/0xd0
Aug 2 16:05:15 s14 kernel: [109200.485903] [<ffffffff810fc390>] ?
bdi_start_fn+0x0/0xd0
Aug 2 16:05:15 s14 kernel: [109200.485906] [<ffffffff810fc3fc>]
bdi_start_fn+0x6c/0xd0
Aug 2 16:05:15 s14 kernel: [109200.485909] [<ffffffff810fc390>] ?
bdi_start_fn+0x0/0xd0
Aug 2 16:05:15 s14 kernel: [109200.485912] [<ffffffff810696ce>]
kthread+0x8e/0xa0
Aug 2 16:05:15 s14 kernel: [109200.485915] [<ffffffff81003c94>]
kernel_thread_helper+0x4/0x10
Aug 2 16:05:15 s14 kernel: [109200.485918] [<ffffffff81069640>] ?
kthread+0x0/0xa0
Aug 2 16:05:15 s14 kernel: [109200.485921] [<ffffffff81003c90>] ?
kernel_thread_helper+0x0/0x10
Aug 2 16:05:15 s14 kernel: [109200.485924] INFO: task kjournald:3009 blocked
for more than 120 seconds.
--
Mateusz Korniak
"(...) mam brata - poważny, domator, liczykrupa, hipokryta, pobożniś,
krótko mówiąc - podpora społeczeństwa."
Nikos Kazantzakis - "Grek Zorba"
Więcej informacji o liście dyskusyjnej pld-users-pl