SOURCES (LINUX_2_6): linux-2.6-improve-text-size.patch (NEW) - sma...

pluto pluto at pld-linux.org
Wed Dec 28 22:00:24 CET 2005


Author: pluto                        Date: Wed Dec 28 21:00:24 2005 GMT
Module: SOURCES                       Tag: LINUX_2_6
---- Log message:
- smaller/faster code :)

---- Files affected:
SOURCES:
   linux-2.6-improve-text-size.patch (NONE -> 1.1.2.1)  (NEW)

---- Diffs:

================================================================
Index: SOURCES/linux-2.6-improve-text-size.patch
diff -u /dev/null SOURCES/linux-2.6-improve-text-size.patch:1.1.2.1
--- /dev/null	Wed Dec 28 22:00:24 2005
+++ SOURCES/linux-2.6-improve-text-size.patch	Wed Dec 28 22:00:19 2005
@@ -0,0 +1,81 @@
+this patchset (for the 2.6.16 tree) consists of two patches:
+
+  gcc-no-forced-inlining.patch
+  gcc-unit-at-a-time.patch
+
+the purpose of these patches is to reduce the kernel's .text size, in 
+particular if CONFIG_CC_OPTIMIZE_FOR_SIZE is specified. The effect of 
+the patches on x86 is:
+
+    text    data     bss     dec     hex filename
+ 3286166  869852  387260 4543278  45532e vmlinux-orig
+ 3194123  955168  387260 4536551  4538e7 vmlinux-inline
+ 3119495  884960  387748 4392203  43050b vmlinux-inline+units
+  437271   77646   32192  547109   85925 vmlinux-tiny-orig
+  452694   77646   32192  562532   89564 vmlinux-tiny-inline
+  431891   77422   32128  541441   84301 vmlinux-tiny-inline+units
+
+i.e. a 5.3% .text reduction (!) with a larger .config, and a 1.2% .text 
+reduction with a smaller .config.
+
+i've also done test-builds with CC_OPTIMIZE_FOR_SIZE disabled:
+
+   text    data     bss     dec     hex filename
+4080998  870384  387260 5338642  517612 vmlinux-speed-orig
+4084421  872024  387260 5343705  5189d9 vmlinux-speed-inline
+4010957  834048  387748 5232753  4fd871 vmlinux-speed-inline+units
+
+so the more flexible inlining did not result in many changes [which is 
+good, we want gcc to inline those in the optimized-for-speed case], but 
+unit-at-a-time optimization resulted in smaller code - very likely 
+meaning speed advantages as well.
+
+unit-at-a-time still increases the kernel stack footprint somewhat (by 
+about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree 
+gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
+
+so i think the combination of the two patches is a win both for small 
+and for large systems. In fact the 5.3% .text reduction for embedded 
+kernels is very significant.
+
+ arch/i386/Makefile            |    6 +++---
+ include/linux/compiler-gcc4.h |    9 +++++----
+ 2 files changed, 8 insertions(+), 7 deletions(-)
+
+--- a/include/linux/compiler-gcc4.h
++++ b/include/linux/compiler-gcc4.h
+@@ -3,14 +3,15 @@
+ /* These definitions are for GCC v4.x.  */
+ #include <linux/compiler-gcc.h>
+ 
+-#define inline			inline		__attribute__((always_inline))
+-#define __inline__		__inline__	__attribute__((always_inline))
+-#define __inline		__inline	__attribute__((always_inline))
++#define inline			inline
++#define __inline__		__inline__
++#define __inline		__inline
+ #define __deprecated		__attribute__((deprecated))
+ #define __attribute_used__	__attribute__((__used__))
+ #define __attribute_pure__	__attribute__((pure))
+ #define __attribute_const__	__attribute__((__const__))
+-#define  noinline		__attribute__((noinline))
++#define noinline		__attribute__((noinline))
++#define __always_inline		inline __attribute__((always_inline))
+ #define __must_check 		__attribute__((warn_unused_result))
+ #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
+ 
+--- a/arch/i386/Makefile
++++ b/arch/i386/Makefile
+@@ -42,9 +42,9 @@ include $(srctree)/arch/i386/Makefile.cp
+ GCC_VERSION			:= $(call cc-version)
+ cflags-$(CONFIG_REGPARM) 	+= $(shell if [ $(GCC_VERSION) -ge 0300 ] ; then echo "-mregparm=3"; fi ;)
+ 
+-# Disable unit-at-a-time mode, it makes gcc use a lot more stack
+-# due to the lack of sharing of stacklots.
+-CFLAGS += $(call cc-option,-fno-unit-at-a-time)
++# Disable unit-at-a-time mode on pre-gcc-4.0 compilers: it makes gcc use
++# a lot more stack due to the lack of sharing of stack slots.
++CFLAGS				+= $(shell if [ $(GCC_VERSION) -lt 0400 ] ; then echo $(call cc-option,-fno-unit-at-a-time); fi ;)
+ 
+ CFLAGS += $(cflags-y)
+ 
================================================================
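
The percentages quoted in the patch description can be cross-checked from the .text column of the size tables. The sketch below is not part of the patch, and the function names are made up for illustration. Measured against vmlinux-orig, the larger config shrinks by about 5.1%; measured against the new size, the same change is about 5.3%, which appears to be how the quoted figure was derived. The tiny config comes out at about 1.2% either way.

```c
#include <assert.h>
#include <stdio.h>

/* Shrinkage of .text in percent, relative to the ORIGINAL size. */
double text_reduction_pct(unsigned long before, unsigned long after)
{
    assert(before > 0);
    return 100.0 * ((double)before - (double)after) / (double)before;
}

/* The same change in percent, relative to the NEW (smaller) size. */
double text_change_vs_new_pct(unsigned long before, unsigned long after)
{
    assert(after > 0);
    return 100.0 * ((double)before - (double)after) / (double)after;
}

/* Print both views for one pair of builds (hypothetical helper). */
void print_reduction(const char *label, unsigned long before,
                     unsigned long after)
{
    printf("%-8s %.2f%% of original, %.2f%% of new\n", label,
           text_reduction_pct(before, after),
           text_change_vs_new_pct(before, after));
}
```

Calling, for example, `print_reduction("large", 3286166UL, 3119495UL)` and `print_reduction("tiny", 437271UL, 431891UL)` from a `main()` reproduces the comparison of the orig and inline+units rows.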

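The compiler-gcc4.h hunk stops attaching always_inline to every `inline` and adds a separate `__always_inline` for the cases that really need it. A minimal user-space sketch of the three annotations follows, assuming a GCC-compatible compiler. The `demo_` macro names and the functions are hypothetical, chosen so they do not clash with definitions a libc header may already provide; they are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the macros the patch defines. */
#define demo_always_inline inline __attribute__((always_inline))
#define demo_noinline      __attribute__((noinline))

/* Forced inline: the compiler must expand this at every call site. */
static demo_always_inline uint32_t rotl32(uint32_t x, unsigned int n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Plain inline: only a hint, so gcc may keep it out of line,
 * for example when optimizing for size. */
static inline uint32_t scale(uint32_t x)
{
    return x * 31u + 7u;
}

/* Never inlined: always emitted as a real function with its own frame. */
demo_noinline uint32_t checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum = rotl32(sum, 5) ^ scale(buf[i]);
    return sum;
}
```

With plain `inline` left as a hint, the compiler is free to choose per call site, which is where the size savings in the description would come from; only functions marked with the always-inline variant keep the old forced behaviour.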
