Zwisy na ostatnich kernelach: < coś > blocked for more than 120 seconds.

Mateusz Korniak mateusz w ant.gliwice.pl
Czw, 14 Paź 2010, 14:46:34 CEST


Zaczeło sie 3/4 miesiące temu przy zadaniach intensywnych IO/ np (powtarzalne 
na jednej maszynie gdy data-check md, na prawie identycznych innych nie 
występujący nigdy). Dotychczas były to kernele 64bit i userland 32bit ale 
przed chwilą padł mój codziennie dotychczas stabilnie pracujący na 2.6.33.3 
laptop w kilka godzin po przejściu na 2.6.35.7.
Ale mam wrażenie że problem dotyczył też poprzednich kerneli, tyle że może nie 
był tak widoczny.
Objawia się zwisaniem na operacjach IO kolejnych procesów, load rośnie, a 
maszyna się mrozi.

Ma ktoś jakieś pomysły jak się pozbyć ?


Oct 14 12:14:54 laptop-hp kernel: [ 1800.261183] INFO: task kio_file:4489 
blocked for more than 120 seconds.
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261187] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261190] kio_file      D 00000003     
0  4489   2767 0x00000004
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261195]  f16dfe7c 00000082 f16dfe80 
00000003 ef6a6b44 ef39a1e0 ec89858c f16dfe34
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261201]  c7e254a8 0000017a 00005140 
00005140 00005140 ef2fe380 c7e23cff 0000017a
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261206]  00005140 00005140 ef2fe380 
b3492eea 002ea1e9 00000000 f4e6d500 c2005140
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261210] Call Trace:
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261222]  [<c1397a6c>] 
io_schedule+0x5c/0xa0
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261227]  [<c10cbb35>] 
sync_page+0x35/0x40
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261231]  [<c13980e7>] 
__wait_on_bit_lock+0x47/0x90
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261234]  [<c10cbb00>] ? 
sync_page+0x0/0x40
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261237]  [<c10cbae6>] 
__lock_page+0x76/0x80
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261242]  [<c10572a0>] ? 
wake_bit_function+0x0/0x60
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261247]  [<c10e563e>] 
handle_mm_fault+0x84e/0x860
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261252]  [<c139ce90>] ? 
do_page_fault+0x0/0x400
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261255]  [<c139cfd6>] 
do_page_fault+0x146/0x400
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261258]  [<c10e8236>] ? 
do_munmap+0x1f6/0x270
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261296]  [<c10e7fa0>] ? 
arch_unmap_area_topdown+0x0/0x30
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261299]  [<c10e8b22>] ? 
sys_brk+0x42/0x130
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261302]  [<c139ce90>] ? 
do_page_fault+0x0/0x400
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261305]  [<c139a586>] 
error_code+0x76/0x80
Oct 14 12:14:54 laptop-hp kernel: [ 1800.261308] INFO: task kio_file:4490 
blocked for more than 120 seconds.

i dalej pododbnie dla kolejnych procesów ...

Dla innej maszyny (tej od padów data-check kernel 2.6.34.3 64bit) wygląda to 
tak:
Aug  2 13:40:36 s14 kernel: [100520.949724] md: data-check of RAID array md5
Aug  2 13:40:36 s14 kernel: [100520.949729] md: minimum _guaranteed_  speed: 
1000 KB/sec/disk.
Aug  2 13:40:36 s14 kernel: [100520.949731] md: using maximum available idle 
IO bandwidth (but not more than 200000 KB/sec) for data-check.
Aug  2 13:40:36 s14 kernel: [100520.949736] md: using 128k window, over a 
total of 117193984 blocks.
Aug  2 16:05:15 s14 kernel: [109200.485779] INFO: task flush-9:5:3000 blocked 
for more than 120 seconds.
Aug  2 16:05:15 s14 kernel: [109200.485782] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  2 16:05:15 s14 kernel: [109200.485784] flush-9:5     D ffff88000fe87140     
0  3000      2 0x00000000
Aug  2 16:05:15 s14 kernel: [109200.485789]  ffff8804135b5760 0000000000000046 
ffff8804135b5728 ffff8804135b5724
Aug  2 16:05:15 s14 kernel: [109200.485793]  0000000000000400 ffff88000fe87020 
ffff88041cbb61c0 ffff8804135b572c
Aug  2 16:05:15 s14 kernel: [109200.485796]  ffff8804135b5718 0000000101f1b639 
0000000000011140 0000000000011140
Aug  2 16:05:15 s14 kernel: [109200.485800] Call Trace:
Aug  2 16:05:15 s14 kernel: [109200.485815]  [<ffffffffa0022749>] 
__this_module+0x3689/0x6ff0 [raid10]
Aug  2 16:05:15 s14 kernel: [109200.485821]  [<ffffffff8103f5a0>] ? 
default_wake_function+0x0/0x10
Aug  2 16:05:15 s14 kernel: [109200.485826]  [<ffffffffa0025a62>] 
__this_module+0x69a2/0x6ff0 [raid10]
Aug  2 16:05:15 s14 kernel: [109200.485831]  [<ffffffffa0009dca>] 
md_error+0x69a/0x1000 [md_mod]
Aug  2 16:05:15 s14 kernel: [109200.485835]  [<ffffffff8120a27e>] 
generic_make_request+0x19e/0x450
Aug  2 16:05:15 s14 kernel: [109200.485840]  [<ffffffff810e558e>] ? 
mempool_alloc+0x5e/0x130
Aug  2 16:05:15 s14 kernel: [109200.485843]  [<ffffffff8120a5b4>] 
submit_bio+0x84/0x110
Aug  2 16:05:15 s14 kernel: [109200.485847]  [<ffffffff8115e4cf>] 
submit_bh+0xef/0x130
Aug  2 16:05:15 s14 kernel: [109200.485850]  [<ffffffff811615b0>] 
__block_write_full_page+0x1f0/0x390
Aug  2 16:05:15 s14 kernel: [109200.485853]  [<ffffffff8115fbb0>] ? 
end_buffer_async_write+0x0/0x150
Aug  2 16:05:15 s14 kernel: [109200.485857]  [<ffffffff8116182b>] 
block_write_full_page_endio+0xdb/0x110
Aug  2 16:05:15 s14 kernel: [109200.485862]  [<ffffffffa00c6370>] ? 
ext3_count_dirs+0x1a0/0x1090 [ext3]
Aug  2 16:05:15 s14 kernel: [109200.485865]  [<ffffffff81161870>] 
block_write_full_page+0x10/0x20
Aug  2 16:05:15 s14 kernel: [109200.485870]  [<ffffffffa00c708d>] 
ext3_count_dirs+0xebd/0x1090 [ext3]
Aug  2 16:05:15 s14 kernel: [109200.485874]  [<ffffffff810eb382>] 
__writepage+0x12/0x40
Aug  2 16:05:15 s14 kernel: [109200.485876]  [<ffffffff810eb937>] 
write_cache_pages+0x1c7/0x400
Aug  2 16:05:15 s14 kernel: [109200.485879]  [<ffffffff810eb370>] ? 
__writepage+0x0/0x40
Aug  2 16:05:15 s14 kernel: [109200.485882]  [<ffffffff810ebb8f>] 
generic_writepages+0x1f/0x30
Aug  2 16:05:15 s14 kernel: [109200.485885]  [<ffffffff810ec815>] 
do_writepages+0x25/0x30
Aug  2 16:05:15 s14 kernel: [109200.485888]  [<ffffffff81157c8b>] 
writeback_single_inode+0xeb/0x330
Aug  2 16:05:15 s14 kernel: [109200.485891]  [<ffffffff8115872b>] 
writeback_inodes_wb+0x11b/0x400
Aug  2 16:05:15 s14 kernel: [109200.485893]  [<ffffffff81158b22>] 
wb_writeback+0x112/0x1d0
Aug  2 16:05:15 s14 kernel: [109200.485897]  [<ffffffff81158f7c>] 
wb_do_writeback+0x1ac/0x1b0
Aug  2 16:05:15 s14 kernel: [109200.485899]  [<ffffffff81158fcb>] 
bdi_writeback_task+0x4b/0xd0
Aug  2 16:05:15 s14 kernel: [109200.485903]  [<ffffffff810fc390>] ? 
bdi_start_fn+0x0/0xd0
Aug  2 16:05:15 s14 kernel: [109200.485906]  [<ffffffff810fc3fc>] 
bdi_start_fn+0x6c/0xd0
Aug  2 16:05:15 s14 kernel: [109200.485909]  [<ffffffff810fc390>] ? 
bdi_start_fn+0x0/0xd0
Aug  2 16:05:15 s14 kernel: [109200.485912]  [<ffffffff810696ce>] 
kthread+0x8e/0xa0
Aug  2 16:05:15 s14 kernel: [109200.485915]  [<ffffffff81003c94>] 
kernel_thread_helper+0x4/0x10
Aug  2 16:05:15 s14 kernel: [109200.485918]  [<ffffffff81069640>] ? 
kthread+0x0/0xa0
Aug  2 16:05:15 s14 kernel: [109200.485921]  [<ffffffff81003c90>] ? 
kernel_thread_helper+0x0/0x10
Aug  2 16:05:15 s14 kernel: [109200.485924] INFO: task kjournald:3009 blocked 
for more than 120 seconds.


-- 
Mateusz Korniak
"(...) mam brata - poważny, domator, liczykrupa, hipokryta, pobożniś,
 	krótko mówiąc - podpora społeczeństwa."
				Nikos Kazantzakis - "Grek Zorba"


Więcej informacji o liście dyskusyjnej pld-users-pl