[LINUX_2_6] Badness in local_bh_enable at kernel/softirq.c:144

Marek Guevara Braun marek.guevara at atm.com.pl
Fri, 19 May 2006, 10:47:16 CEST


I have five machines running kernel-grsecurity-2.6.16.16-1.
Three of them are uniprocessor (non-SMP kernel); the other two are
dual-core with an SMP kernel.

The kernel was built a few days ago with:
./builder -ba --with grsec_full -r LINUX_2_6 kernel

I'm tracking down a bug that produces this very frequent message:

May 19 09:46:28 turul Badness in local_bh_enable at kernel/softirq.c:144
May 19 09:46:28 turul [<c01701e4>] local_bh_enable+0x74/0x80
May 19 09:46:28 turul [<c0174f2d>] __exit_signal+0x5d/0x160
May 19 09:46:28 turul [<c016d427>] forget_original_parent+0x17/0x210
May 19 09:46:28 turul [<c016c6ac>] release_task+0x4c/0x110
May 19 09:46:28 turul [<c01689fb>] mm_release+0x8b/0x90
May 19 09:46:28 turul [<c016d787>] exit_notify+0x167/0x2d0
May 19 09:46:28 turul [<c016dae8>] do_exit+0x1f8/0x3f0
May 19 09:46:28 turul [<c016dd0d>] sys_exit+0xd/0x10
May 19 09:46:28 turul [<c014ff99>] syscall_call+0x7/0xb
May 19 09:46:28 turul [<c014007b>] 0xc014007b
May 19 09:46:28 turul [<c015007b>] syscall_trace_entry+0x1b/0x30

A quick symbol lookup on that machine:
[marek@turul SPECS]$ grep local_bh_enable /proc/kallsyms
c0170170 T local_bh_enable
c0393770 r __ksymtab_local_bh_enable
c0398ae6 r __kstrtab_local_bh_enable
c0170170 U local_bh_enable      [bridge]
c0170170 U local_bh_enable      [ebtables]
c0170170 U local_bh_enable      [ip_conntrack_irc]
c0170170 U local_bh_enable      [ip_conntrack_ftp]
c0170170 U local_bh_enable      [ip_queue]
c0170170 U local_bh_enable      [sunrpc]
c0170170 U local_bh_enable      [af_key]
c0170170 U local_bh_enable      [ip_vs]
c0170170 U local_bh_enable      [3c59x]
c0170170 U local_bh_enable      [ip_nat]
c0170170 U local_bh_enable      [ipt_LOG]
c0170170 U local_bh_enable      [xt_limit]
c0170170 U local_bh_enable      [ip_conntrack]
c0170170 U local_bh_enable      [ip_tables]
c0170170 U local_bh_enable      [ip6_tables]
c0170170 U local_bh_enable      [x_tables]
c0170170 U local_bh_enable      [ipv6]
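
As a sanity check on the trace above, the first frame can be resolved by
hand against the kallsyms listing: 0xc01701e4 - 0xc0170170 = 0x74, which
matches local_bh_enable+0x74/0x80. A minimal Python sketch of that lookup
follows. The helper names (parse_kallsyms, resolve) are illustrative and
not part of any kernel tool, and the sample table holds only the lines
quoted above, so only addresses inside local_bh_enable resolve
meaningfully here; a real run would need the full /proc/kallsyms.

```python
import bisect

# Sample lines copied from the grep output above (assumption: a real run
# would read the whole of /proc/kallsyms instead of this excerpt).
SAMPLE_KALLSYMS = """\
c0170170 T local_bh_enable
c0393770 r __ksymtab_local_bh_enable
c0398ae6 r __kstrtab_local_bh_enable
c0170170 U local_bh_enable      [bridge]
"""


def parse_kallsyms(text):
    """Return a sorted list of (address, name), skipping 'U' reference lines."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        # 'U' lines are references from modules, not definitions.
        if len(fields) < 3 or fields[1] == "U":
            continue
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Map an address to 'symbol+0xoffset' using the nearest preceding symbol."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    base, name = symbols[index]
    return f"{name}+{address - base:#x}"


symbols = parse_kallsyms(SAMPLE_KALLSYMS)
print(resolve(symbols, 0xc01701e4))  # first frame of the trace
```

The same lookup applied to the unresolved frame 0xc014007b would need
the complete symbol table, which is not reproduced in this message.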

The other machine:

May 19 01:53:16 piglet Badness in local_bh_enable at kernel/softirq.c:144
May 19 01:53:16 piglet [<c01701e4>] local_bh_enable+0x74/0x80
May 19 01:53:16 piglet [<c0174f2d>] __exit_signal+0x5d/0x160
May 19 01:53:16 piglet [<c016d427>] forget_original_parent+0x17/0x210
May 19 01:53:16 piglet [<c016c6ac>] release_task+0x4c/0x110
May 19 01:53:16 piglet [<c01689fb>] mm_release+0x8b/0x90
May 19 01:53:16 piglet [<c016d787>] exit_notify+0x167/0x2d0
May 19 01:53:16 piglet [<c016dae8>] do_exit+0x1f8/0x3f0
May 19 01:53:16 piglet [<c01b6cb5>] sys_munmap+0x55/0x80
May 19 01:53:16 piglet [<c016dd0d>] sys_exit+0xd/0x10
May 19 01:53:16 piglet [<c014ff99>] syscall_call+0x7/0xb
May 19 01:53:16 piglet [<c014007b>] 0xc014007b

Symbol lookup (no X11 session was running on the machine at the time,
but the grep output was the same with X11 running):

[root@piglet ~]# grep local_bh_enable /proc/kallsyms
c0170170 T local_bh_enable
c0393770 r __ksymtab_local_bh_enable
c0398ae6 r __kstrtab_local_bh_enable
c0170170 U local_bh_enable      [xt_limit]
c0170170 U local_bh_enable      [ipt_LOG]
c0170170 U local_bh_enable      [ip_nat]
c0170170 U local_bh_enable      [ip_conntrack]
c0170170 U local_bh_enable      [ip_tables]
c0170170 U local_bh_enable      [x_tables]
c0170170 U local_bh_enable      [3c59x]
c0170170 U local_bh_enable      [bluetooth]
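
Since both listings are available, the modules that reference
local_bh_enable on each machine can be compared mechanically. The sketch
below is only an illustration in Python; the module names are
transcribed from the two grep outputs above, and the variable names are
hypothetical.

```python
# Modules with a 'U local_bh_enable' line on each machine, transcribed
# from the two /proc/kallsyms excerpts in this message.
TURUL = {
    "bridge", "ebtables", "ip_conntrack_irc", "ip_conntrack_ftp",
    "ip_queue", "sunrpc", "af_key", "ip_vs", "3c59x", "ip_nat",
    "ipt_LOG", "xt_limit", "ip_conntrack", "ip_tables", "ip6_tables",
    "x_tables", "ipv6",
}
PIGLET = {
    "xt_limit", "ipt_LOG", "ip_nat", "ip_conntrack", "ip_tables",
    "x_tables", "3c59x", "bluetooth",
}

common = TURUL & PIGLET        # referenced on both machines
only_turul = TURUL - PIGLET    # referenced on turul only
only_piglet = PIGLET - TURUL   # referenced on piglet only

print("common:     ", ", ".join(sorted(common)))
print("turul only: ", ", ".join(sorted(only_turul)))
print("piglet only:", ", ".join(sorted(only_piglet)))
```

The common set consists of the netfilter modules plus the 3c59x network
driver. A "U" entry only shows that a module references the symbol; it
does not show which caller triggered the warning.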

On the third machine the message looks similar and the backtrace is
identical. The other two (SMP) machines do not show the message.

Interestingly, I have observed that the "Badness ..." message appears
only when the machine runs an X11 session. My first suspect was the
graphics card modules (nvidia and fglrx, respectively), which on two of
the three machines were built with a different version of binutils
(binutils-2.15.94.0.2.2-2) than the kernel itself
(binutils-2.16.91.0.7-1). However, the third machine runs the open
source radeon driver (without drm), and I also ran the fbdev driver on
nvidia hardware with the same effect.

The second suspect is binutils. I have built kernel packages with an
older version of binutils to check this. Have you heard of any problems
with the kernel and userland being built with different versions of
binutils (especially 2.15.94.0.2.2 and 2.16.91)?

Also on my checklist is running one of the machines without the network
and iptables-related modules, but from what I have tested, unloading
all the netfilter modules does not help.

Anyway, how can I track down exactly which part of the kernel generates
the message? Do we have a bcond that enables debugging (I haven't seen
one)?

Greets,
Marek


