packages (Titanium): kernel-desktop/kernel-desktop-sched-bfs.patch - update...

shadzik shadzik at pld-linux.org
Sun Oct 24 15:28:39 CEST 2010


Author: shadzik                      Date: Sun Oct 24 13:28:39 2010 GMT
Module: packages                      Tag: Titanium
---- Log message:
- updated for 2.6.36

---- Files affected:
packages/kernel-desktop:
   kernel-desktop-sched-bfs.patch (1.1.2.21 -> 1.1.2.22) 

---- Diffs:

================================================================
Index: packages/kernel-desktop/kernel-desktop-sched-bfs.patch
diff -u packages/kernel-desktop/kernel-desktop-sched-bfs.patch:1.1.2.21 packages/kernel-desktop/kernel-desktop-sched-bfs.patch:1.1.2.22
--- packages/kernel-desktop/kernel-desktop-sched-bfs.patch:1.1.2.21	Sat Sep 25 21:26:51 2010
+++ packages/kernel-desktop/kernel-desktop-sched-bfs.patch	Sun Oct 24 15:28:33 2010
@@ -1,28 +1,3 @@
-The Brain Fuck Scheduler v0.350 by Con Kolivas.
-
-A single shared runqueue O(n) strict fairness earliest deadline first design.
-
-Ultra low latency and excellent desktop performance for 1 to many CPUs.
-Not recommended for 4096 cpus.
-
-Scalability is optimal when your workload is equal to the number of CPUs on
-bfs. ie you should ONLY do make -j4 on quad core, -j2 on dual core and so on.
-
-Features SCHED_IDLEPRIO and SCHED_ISO scheduling policies as well.
-You do NOT need to use these policies for good performance, they are purely
-optional for even better performance in extreme conditions.
-
-To run something idleprio, use schedtool like so:
-
-schedtool -D -e make -j4
-
-To run something isoprio, use schedtool like so:
-
-schedtool -I -e amarok
-
-Includes accurate sub-tick accounting of tasks so userspace reported
-cpu usage may be very different if you have very short lived tasks.
-
 ---
  Documentation/scheduler/sched-BFS.txt     |  351 +
  Documentation/sysctl/kernel.txt           |   26 
@@ -30,7 +5,7 @@
  fs/proc/base.c                            |    2 
  include/linux/init_task.h                 |   65 
  include/linux/ioprio.h                    |    2 
- include/linux/sched.h                     |  106 
+ include/linux/sched.h                     |   89 
  init/Kconfig                              |   17 
  init/main.c                               |    1 
  kernel/delayacct.c                        |    2 
@@ -38,7955 +13,7809 @@
  kernel/kthread.c                          |    2 
  kernel/posix-cpu-timers.c                 |   14 
  kernel/sched.c                            |    4 
- kernel/sched_bfs.c                        | 6984 ++++++++++++++++++++++++++++++
- kernel/slow-work.c                        |    1 
+ kernel/sched_bfs.c                        | 6933 ++++++++++++++++++++++++++++++
  kernel/sysctl.c                           |   31 
  lib/Kconfig.debug                         |    2 
- mm/oom_kill.c                             |    2 
- 19 files changed, 7591 insertions(+), 28 deletions(-)
+ 17 files changed, 7522 insertions(+), 26 deletions(-)
 
-Index: linux-2.6.35.5-ck1/Documentation/sysctl/kernel.txt
+Index: linux-2.6.36-ck1/arch/powerpc/platforms/cell/spufs/sched.c
 ===================================================================
---- linux-2.6.35.5-ck1.orig/Documentation/sysctl/kernel.txt	2010-02-25 21:51:46.000000000 +1100
-+++ linux-2.6.35.5-ck1/Documentation/sysctl/kernel.txt	2010-09-25 01:17:57.872918484 +1000
-@@ -31,6 +31,7 @@ show up in /proc/sys/kernel:
- - domainname
- - hostname
- - hotplug
-+- iso_cpu
- - java-appletviewer           [ binfmt_java, obsolete ]
- - java-interpreter            [ binfmt_java, obsolete ]
- - kstack_depth_to_print       [ X86 only ]
-@@ -53,6 +54,7 @@ show up in /proc/sys/kernel:
- - randomize_va_space
- - real-root-dev               ==> Documentation/initrd.txt
- - reboot-cmd                  [ SPARC only ]
-+- rr_interval
- - rtsig-max
- - rtsig-nr
- - sem
-@@ -240,6 +242,16 @@ Default value is "/sbin/hotplug".
- 
- ==============================================================
+--- linux-2.6.36-ck1.orig/arch/powerpc/platforms/cell/spufs/sched.c	2010-05-17 18:51:19.000000000 +1000
++++ linux-2.6.36-ck1/arch/powerpc/platforms/cell/spufs/sched.c	2010-10-21 10:28:34.044580421 +1100
+@@ -64,11 +64,6 @@ static struct timer_list spusched_timer;
+ static struct timer_list spuloadavg_timer;
  
-+iso_cpu: (BFS CPU scheduler only).
+ /*
+- * Priority of a normal, non-rt, non-niced'd process (aka nice level 0).
+- */
+-#define NORMAL_PRIO		120
+-
+-/*
+  * Frequency of the spu scheduler tick.  By default we do one SPU scheduler
+  * tick for every 10 CPU scheduler ticks.
+  */
+Index: linux-2.6.36-ck1/Documentation/scheduler/sched-BFS.txt
+===================================================================
+--- /dev/null	1970-01-01 00:00:00.000000000 +0000
++++ linux-2.6.36-ck1/Documentation/scheduler/sched-BFS.txt	2010-10-21 10:28:34.045580378 +1100
+@@ -0,0 +1,351 @@
++BFS - The Brain Fuck Scheduler by Con Kolivas.
 +
-+This sets the percentage cpu that the unprivileged SCHED_ISO tasks can
-+run effectively at realtime priority, averaged over a rolling five
-+seconds over the -whole- system, meaning all cpus.
++Goals.
 +
-+Set to 70 (percent) by default.
++The goal of the Brain Fuck Scheduler, referred to as BFS from here on, is to
++completely do away with the complex designs of the past for the cpu process
++scheduler and instead implement one that is very simple in basic design.
++The main focus of BFS is to achieve excellent desktop interactivity and
++responsiveness without heuristics and tuning knobs that are difficult to
++understand, impossible to model and predict the effect of, and when tuned to
++one workload cause massive detriment to another.
 +
-+==============================================================
 +
- l2cr: (PPC only)
- 
- This flag controls the L2 cache of G3 processor boards. If
-@@ -414,6 +426,20 @@ rebooting. ???
- 
- ==============================================================
- 
-+rr_interval: (BFS CPU scheduler only)
++Design summary.
 +
-+This is the smallest duration that any cpu process scheduling unit
-+will run for. Increasing this value can increase throughput of cpu
-+bound tasks substantially but at the expense of increased latencies
-+overall. Conversely decreasing it will decrease average and maximum
-+latencies but at the expense of throughput. This value is in
-+milliseconds and the default value chosen depends on the number of
-+cpus available at scheduler initialisation with a minimum of 6.
++BFS is best described as a single runqueue, O(n) lookup, earliest effective
++virtual deadline first design, loosely based on EEVDF (earliest eligible virtual
++deadline first) and my previous Staircase Deadline scheduler. Each component
++shall be described in order to understand the significance of, and reasoning for
++it. The codebase when the first stable version was released was approximately
++9000 lines less code than the existing mainline linux kernel scheduler (in
++2.6.31). This does not even take into account the removal of documentation and
++the cgroups code that is not used.
 +
-+Valid values are from 1-5000.
++Design reasoning.
 +
-+==============================================================
++The single runqueue refers to the queued but not running processes for the
++entire system, regardless of the number of CPUs. The reason for going back to
++a single runqueue design is that once multiple runqueues are introduced,
++per-CPU or otherwise, there will be complex interactions as each runqueue will
++be responsible for the scheduling latency and fairness of the tasks only on its
++own runqueue, and to achieve fairness and low latency across multiple CPUs, any
++advantage in throughput of having CPU local tasks causes other disadvantages.
++This is due to requiring a very complex balancing system to at best achieve some
++semblance of fairness across CPUs and can only maintain relatively low latency
++for tasks bound to the same CPUs, not across them. To increase said fairness
++and latency across CPUs, the advantage of local runqueue locking, which makes
++for better scalability, is lost due to having to grab multiple locks.
 +
- rtsig-max & rtsig-nr:
- 
- The file rtsig-max can be used to tune the maximum number
-Index: linux-2.6.35.5-ck1/include/linux/init_task.h
-===================================================================
---- linux-2.6.35.5-ck1.orig/include/linux/init_task.h	2010-08-02 11:12:25.000000000 +1000
-+++ linux-2.6.35.5-ck1/include/linux/init_task.h	2010-09-25 01:17:57.873918535 +1000
-@@ -106,6 +106,69 @@ extern struct cred init_cred;
-  *  INIT_TASK is used to set up the first task table, touch at
-  * your own risk!. Base=0, limit=0x1fffff (=2MB)
-  */
-+#ifdef CONFIG_SCHED_BFS
-+#define INIT_TASK(tsk)	\
-+{									\
-+	.state		= 0,						\
-+	.stack		= &init_thread_info,				\
-+	.usage		= ATOMIC_INIT(2),				\
-+	.flags		= PF_KTHREAD,					\
-+	.lock_depth	= -1,						\
-+	.prio		= NORMAL_PRIO,					\
-+	.static_prio	= MAX_PRIO-20,					\
-+	.normal_prio	= NORMAL_PRIO,					\
-+	.deadline	= 0,						\
-+	.policy		= SCHED_NORMAL,					\
-+	.cpus_allowed	= CPU_MASK_ALL,					\
-+	.mm		= NULL,						\
-+	.active_mm	= &init_mm,					\
-+	.run_list	= LIST_HEAD_INIT(tsk.run_list),			\
-+	.time_slice	= HZ,					\
-+	.tasks		= LIST_HEAD_INIT(tsk.tasks),			\
-+	.pushable_tasks = PLIST_NODE_INIT(tsk.pushable_tasks, MAX_PRIO), \
-+	.ptraced	= LIST_HEAD_INIT(tsk.ptraced),			\
-+	.ptrace_entry	= LIST_HEAD_INIT(tsk.ptrace_entry),		\
-+	.real_parent	= &tsk,						\
-+	.parent		= &tsk,						\
-+	.children	= LIST_HEAD_INIT(tsk.children),			\
-+	.sibling	= LIST_HEAD_INIT(tsk.sibling),			\
-+	.group_leader	= &tsk,						\
-+	.real_cred	= &init_cred,					\
-+	.cred		= &init_cred,					\
-+	.cred_guard_mutex =						\
-+		 __MUTEX_INITIALIZER(tsk.cred_guard_mutex),		\
-+	.comm		= "swapper",					\
-+	.thread		= INIT_THREAD,					\
-+	.fs		= &init_fs,					\
-+	.files		= &init_files,					\
-+	.signal		= &init_signals,				\
-+	.sighand	= &init_sighand,				\
-+	.nsproxy	= &init_nsproxy,				\
-+	.pending	= {						\
-+		.list = LIST_HEAD_INIT(tsk.pending.list),		\
-+		.signal = {{0}}},					\
-+	.blocked	= {{0}},					\
-+	.alloc_lock	= __SPIN_LOCK_UNLOCKED(tsk.alloc_lock),		\
-+	.journal_info	= NULL,						\
-+	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
-+	.fs_excl	= ATOMIC_INIT(0),				\
-+	.pi_lock	= __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock),		\
-+	.timer_slack_ns = 50000, /* 50 usec default slack */		\
-+	.pids = {							\
-+		[PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),		\
-+		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),		\
-+		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),		\
-+	},								\
-+	.dirties = INIT_PROP_LOCAL_SINGLE(dirties),			\
-+	INIT_IDS							\
-+	INIT_PERF_EVENTS(tsk)						\
-+	INIT_TRACE_IRQFLAGS						\
-+	INIT_LOCKDEP							\
-+	INIT_FTRACE_GRAPH						\
-+	INIT_TRACE_RECURSION						\
-+	INIT_TASK_RCU_PREEMPT(tsk)					\
-+}
-+#else /* CONFIG_SCHED_BFS */
- #define INIT_TASK(tsk)	\
- {									\
- 	.state		= 0,						\
-@@ -173,7 +236,7 @@ extern struct cred init_cred;
- 	INIT_TRACE_RECURSION						\
- 	INIT_TASK_RCU_PREEMPT(tsk)					\
- }
--
-+#endif /* CONFIG_SCHED_BFS */
- 
- #define INIT_CPU_TIMERS(cpu_timers)					\
- {									\
-Index: linux-2.6.35.5-ck1/include/linux/sched.h
-===================================================================
---- linux-2.6.35.5-ck1.orig/include/linux/sched.h	2010-09-25 01:17:40.364012559 +1000
-+++ linux-2.6.35.5-ck1/include/linux/sched.h	2010-09-25 01:21:34.406120878 +1000
-@@ -36,8 +36,15 @@
- #define SCHED_FIFO		1
- #define SCHED_RR		2
- #define SCHED_BATCH		3
--/* SCHED_ISO: reserved but not implemented yet */
-+/* SCHED_ISO: Implemented on BFS only */
- #define SCHED_IDLE		5
-+#ifdef CONFIG_SCHED_BFS
-+#define SCHED_ISO		4
-+#define SCHED_IDLEPRIO		SCHED_IDLE
-+#define SCHED_MAX		(SCHED_IDLEPRIO)
-+#define SCHED_RANGE(policy)	((policy) <= SCHED_MAX)
-+#endif
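
For illustration, a userspace program can request the SCHED_ISO policy shown
above through the standard sched_setscheduler() call. A minimal sketch,
assuming the policy value 4 from the patch (glibc headers do not define
SCHED_ISO, and the call simply fails on a kernel without CONFIG_SCHED_BFS):

#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

#ifndef SCHED_ISO
#define SCHED_ISO 4        /* value taken from the BFS patch; not in glibc */
#endif

int main(void)
{
        struct sched_param param = { .sched_priority = 0 };

        /* pid 0 means the calling process */
        if (sched_setscheduler(0, SCHED_ISO, &param) != 0) {
                fprintf(stderr, "SCHED_ISO not available: %s\n", strerror(errno));
                return 1;
        }
        printf("now running under SCHED_ISO\n");
        return 0;
}

From the shell, schedtool -I -e <command> achieves the same thing.
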
++A significant feature of BFS is that all accounting is done purely based on CPU
++used and nowhere is sleep time used in any way to determine entitlement or
++interactivity. Interactivity "estimators" that use some kind of sleep/run
++algorithm are doomed to fail to detect all interactive tasks, and to falsely tag
++tasks that aren't interactive as being so. The reason for this is that it is
++close to impossible to determine that when a task is sleeping, whether it is
++doing it voluntarily, as in a userspace application waiting for input in the
++form of a mouse click or otherwise, or involuntarily, because it is waiting for
++another thread, process, I/O, kernel activity or whatever. Thus, such an
++estimator will introduce corner cases, and more heuristics will be required to
++cope with those corner cases, introducing more corner cases and failed
++interactivity detection and so on. Interactivity in BFS is built into the design
++by virtue of the fact that tasks that are waking up have not used up their quota
++of CPU time, and have earlier effective deadlines, thereby making it very likely
++they will preempt any CPU bound task of equivalent nice level. See below for
++more information on the virtual deadline mechanism. Even if they do not preempt
++a running task, because the rr interval is guaranteed to have a bound upper
++limit on how long a task will wait for, it will be scheduled within a timeframe
++that will not cause visible interface jitter.
 +
- /* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
- #define SCHED_RESET_ON_FORK     0x40000000
- 
-@@ -268,8 +275,6 @@ extern asmlinkage void schedule_tail(str
- extern void init_idle(struct task_struct *idle, int cpu);
- extern void init_idle_bootup_task(struct task_struct *idle);
- 
--extern int runqueue_is_locked(int cpu);
--
- extern cpumask_var_t nohz_cpu_mask;
- #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
- extern int select_nohz_load_balancer(int cpu);
-@@ -1173,17 +1178,31 @@ struct task_struct {
- 
- 	int lock_depth;		/* BKL lock depth */
- 
-+#ifndef CONFIG_SCHED_BFS
- #ifdef CONFIG_SMP
- #ifdef __ARCH_WANT_UNLOCKED_CTXSW
- 	int oncpu;
- #endif
- #endif
-+#else /* CONFIG_SCHED_BFS */
-+	int oncpu;
-+#endif
- 
- 	int prio, static_prio, normal_prio;
- 	unsigned int rt_priority;
-+#ifdef CONFIG_SCHED_BFS
-+	int time_slice;
-+	u64 deadline;
-+	struct list_head run_list;
-+	u64 last_ran;
-+	u64 sched_time; /* sched_clock time spent running */
 +
-+	unsigned long rt_timeout;
-+#else /* CONFIG_SCHED_BFS */
- 	const struct sched_class *sched_class;
- 	struct sched_entity se;
- 	struct sched_rt_entity rt;
-+#endif
- 
- #ifdef CONFIG_PREEMPT_NOTIFIERS
- 	/* list of struct preempt_notifier: */
-@@ -1278,6 +1297,9 @@ struct task_struct {
- 	int __user *clear_child_tid;		/* CLONE_CHILD_CLEARTID */
- 
- 	cputime_t utime, stime, utimescaled, stimescaled;
-+#ifdef CONFIG_SCHED_BFS
-+	unsigned long utime_pc, stime_pc;
-+#endif
- 	cputime_t gtime;
- #ifndef CONFIG_VIRT_CPU_ACCOUNTING
- 	cputime_t prev_utime, prev_stime;
-@@ -1500,6 +1522,67 @@ struct task_struct {
- #endif
- };
- 
-+#ifdef CONFIG_SCHED_BFS
-+extern int grunqueue_is_locked(void);
-+extern void grq_unlock_wait(void);
-+#define tsk_seruntime(t)		((t)->sched_time)
-+#define tsk_rttimeout(t)		((t)->rt_timeout)
++Design details.
 +
-+static inline void set_oom_timeslice(struct task_struct *p)
-+{
-+	p->time_slice = HZ;
-+}
++Task insertion.
 +
-+static inline void tsk_cpus_current(struct task_struct *p)
-+{
-+}
++BFS inserts tasks into each relevant queue as an O(1) insertion into a double
++linked list. On insertion, *every* running queue is checked to see if the newly
++queued task can run on any idle queue, or preempt the lowest running task on the
++system. This is how the cross-CPU scheduling of BFS achieves significantly lower
++latency per extra CPU the system has. In this case the lookup is, in the worst
++case scenario, O(n) where n is the number of CPUs on the system.
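
A minimal userspace sketch of that insertion-time check, assuming made-up
deadline values and a fixed CPU count; it only illustrates the O(n)-in-CPUs
scan for the running task with the latest virtual deadline, not the kernel's
actual code:

#include <stdio.h>

#define NR_CPUS 4

/* illustrative per-CPU deadlines of the currently running tasks */
static unsigned long long running_deadline[NR_CPUS] = { 610, 680, 645, 720 };

static int cpu_to_preempt(unsigned long long new_deadline)
{
        unsigned long long latest = 0;
        int cpu, target = -1;

        /* O(n) in the number of CPUs: look at every running task once */
        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (running_deadline[cpu] > latest) {
                        latest = running_deadline[cpu];
                        target = cpu;
                }
        }
        /* preempt only if the newly queued task's deadline is earlier */
        return (new_deadline < latest) ? target : -1;
}

int main(void)
{
        printf("preempt CPU %d\n", cpu_to_preempt(650));
        return 0;
}
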
 +
-+#define runqueue_is_locked(cpu)	grunqueue_is_locked()
++Data protection.
 +
-+static inline void print_scheduler_version(void)
-+{
-+	printk(KERN_INFO"BFS CPU scheduler v0.350 by Con Kolivas.\n");
-+}
++BFS has one single lock protecting the process local data of every task in the
++global queue. Thus every insertion, removal and modification of task data in the
++global runqueue needs to grab the global lock. However, once a task is taken by
++a CPU, the CPU has its own local data copy of the running process' accounting
++information which only that CPU accesses and modifies (such as during a
++timer tick) thus allowing the accounting data to be updated lockless. Once a
++CPU has taken a task to run, it removes it from the global queue. Thus the
++global queue only ever has, at most,
 +
-+static inline int iso_task(struct task_struct *p)
-+{
-+	return (p->policy == SCHED_ISO);
-+}
-+extern void remove_cpu(unsigned long cpu);
-+#else /* CFS */
-+extern int runqueue_is_locked(int cpu);
-+#define tsk_seruntime(t)	((t)->se.sum_exec_runtime)
-+#define tsk_rttimeout(t)	((t)->rt.timeout)
++	(number of tasks requesting cpu time) - (number of logical CPUs) + 1
 +
-+static inline void sched_exit(struct task_struct *p)
-+{
-+}
++tasks in the global queue. This value is relevant for the time taken to look up
++tasks during scheduling. It will increase if many tasks have a CPU affinity that
++limits which CPUs they may run on and those tasks outnumber the available CPUs.
++The +1 is because when rescheduling a task, the CPU's currently running task is
++put back on the queue. Lookup will be described after the virtual deadline
++mechanism is explained.
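
As a worked example of the formula above, with illustrative numbers: 16 tasks
requesting CPU time on a 4-CPU machine leaves at most

	16 - 4 + 1 = 13

tasks in the global queue at any instant, since 4 tasks are running on CPUs
and the +1 covers the one being put back during a reschedule.
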
 +
-+static inline void set_oom_timeslice(struct task_struct *p)
-+{
-+	p->rt.time_slice = HZ;
-+}
++Virtual deadline.
 +
-+static inline void tsk_cpus_current(struct task_struct *p)
-+{
-+	p->rt.nr_cpus_allowed = current->rt.nr_cpus_allowed;
-+}
++The key to achieving low latency, scheduling fairness, and "nice level"
++distribution in BFS is entirely in the virtual deadline mechanism. The one
++tunable in BFS is the rr_interval, or "round robin interval". This is the
++maximum time two SCHED_OTHER (or SCHED_NORMAL, the common scheduling policy)
++tasks of the same nice level will be running for, or looking at it the other
++way around, the longest duration two tasks of the same nice level will be
++delayed for. When a task requests cpu time, it is given a quota (time_slice)
++equal to the rr_interval and a virtual deadline. The virtual deadline is
++offset from the current time in jiffies by this equation:
 +
-+static inline void print_scheduler_version(void)
-+{
-+	printk(KERN_INFO"CFS CPU scheduler.\n");
-+}
++	jiffies + (prio_ratio * rr_interval)
 +
-+static inline int iso_task(struct task_struct *p)
-+{
-+	return 0;
-+}
++The prio_ratio is determined as a ratio compared to the baseline of nice -20
++and increases by 10% per nice level. The deadline is a virtual one only in that
++no guarantee is placed that a task will actually be scheduled by this time, but
++it is used to compare which task should go next. There are three components to
++how a task is next chosen. First is time_slice expiration. If a task runs out
++of its time_slice, it is descheduled, the time_slice is refilled, and the
++deadline reset to that formula above. Second is sleep, where a task no longer
++is requesting CPU for whatever reason. The time_slice and deadline are _not_
++adjusted in this case and are just carried over for when the task is next
++scheduled. Third is preemption, and that is when a newly waking task is deemed
++higher priority than a currently running task on any cpu by virtue of the fact
++that it has an earlier virtual deadline than the currently running task. The
++earlier deadline is the key to which task is next chosen for the first and
++second cases. Once a task is descheduled, it is put back on the queue, and an
++O(n) lookup of all queued-but-not-running tasks is done to determine which has
++the earliest deadline and that task is chosen to receive CPU next.
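
To make the deadline formula concrete, here is a small userspace sketch (not
the kernel's fixed-point implementation) that applies the documented rule
(prio_ratio starts at 1.0 for nice -20 and grows by 10% per nice level) to an
assumed rr_interval of 6ms:

#include <stdio.h>

int main(void)
{
        const int rr_interval_ms = 6;   /* assumed: the documented minimum default */
        int nice;

        for (nice = -20; nice <= 19; nice += 13) {
                double prio_ratio = 1.0;        /* baseline at nice -20 */
                int level;

                for (level = -20; level < nice; level++)
                        prio_ratio *= 1.10;     /* +10% per nice level */

                printf("nice %3d: ratio %5.2f, deadline offset ~%6.1f ms\n",
                       nice, prio_ratio, prio_ratio * rr_interval_ms);
        }
        return 0;
}

Under these assumptions a nice 19 task ends up with a virtual deadline roughly
41 times further in the future than a nice -20 task using the same rr_interval.
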
 +
-+static inline void remove_cpu(unsigned long cpu)
-+{
-+}
-+#endif /* CONFIG_SCHED_BFS */
++The CPU proportion of different nice tasks works out to be approximately the
 +
- /* Future-safe accessor for struct task_struct's cpus_allowed. */
- #define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
- 
-@@ -1518,9 +1601,19 @@ struct task_struct {
- 
- #define MAX_USER_RT_PRIO	100
- #define MAX_RT_PRIO		MAX_USER_RT_PRIO
-+#define DEFAULT_PRIO		(MAX_RT_PRIO + 20)
- 
-+#ifdef CONFIG_SCHED_BFS
-+#define PRIO_RANGE		(40)
-+#define MAX_PRIO		(MAX_RT_PRIO + PRIO_RANGE)
-+#define ISO_PRIO		(MAX_RT_PRIO)
-+#define NORMAL_PRIO		(MAX_RT_PRIO + 1)
-+#define IDLE_PRIO		(MAX_RT_PRIO + 2)
-+#define PRIO_LIMIT		((IDLE_PRIO) + 1)
-+#else /* CONFIG_SCHED_BFS */
- #define MAX_PRIO		(MAX_RT_PRIO + 40)
--#define DEFAULT_PRIO		(MAX_RT_PRIO + 20)
-+#define NORMAL_PRIO		DEFAULT_PRIO
-+#endif /* CONFIG_SCHED_BFS */
- 
- static inline int rt_prio(int prio)
- {
-@@ -1829,7 +1922,7 @@ task_sched_runtime(struct task_struct *t
- extern unsigned long long thread_group_sched_runtime(struct task_struct *task);
- 
- /* sched_exec is called by processes performing an exec */
--#ifdef CONFIG_SMP
-+#if defined(CONFIG_SMP) && !defined(CONFIG_SCHED_BFS)
- extern void sched_exec(void);
- #else
- #define sched_exec()   {}
-@@ -1993,6 +2086,9 @@ extern void wake_up_new_task(struct task
-  static inline void kick_process(struct task_struct *tsk) { }
- #endif
- extern void sched_fork(struct task_struct *p, int clone_flags);
-+#ifdef CONFIG_SCHED_BFS
-+extern void sched_exit(struct task_struct *p);
-+#endif
- extern void sched_dead(struct task_struct *p);
- 
- extern void proc_caches_init(void);
-Index: linux-2.6.35.5-ck1/kernel/sysctl.c
-===================================================================
---- linux-2.6.35.5-ck1.orig/kernel/sysctl.c	2010-08-02 11:12:25.000000000 +1000
-+++ linux-2.6.35.5-ck1/kernel/sysctl.c	2010-09-25 01:19:19.941498189 +1000
-@@ -115,7 +115,12 @@ static int zero;
- static int __maybe_unused one = 1;
- static int __maybe_unused two = 2;
- static unsigned long one_ul = 1;
--static int one_hundred = 100;
-+static int __maybe_unused one_hundred = 100;
-+#ifdef CONFIG_SCHED_BFS
-+extern int rr_interval;
-+extern int sched_iso_cpu;
-+static int __read_mostly one_thousand = 1000;
-+#endif
- #ifdef CONFIG_PRINTK
- static int ten_thousand = 10000;
- #endif
-@@ -252,7 +257,7 @@ static struct ctl_table root_table[] = {
- 	{ }
- };
- 
--#ifdef CONFIG_SCHED_DEBUG
-+#if defined(CONFIG_SCHED_DEBUG) && !defined(CONFIG_SCHED_BFS)
- static int min_sched_granularity_ns = 100000;		/* 100 usecs */
- static int max_sched_granularity_ns = NSEC_PER_SEC;	/* 1 second */
- static int min_wakeup_granularity_ns;			/* 0 usecs */
-@@ -269,6 +274,7 @@ static int max_extfrag_threshold = 1000;
- #endif
- 
- static struct ctl_table kern_table[] = {
-+#ifndef CONFIG_SCHED_BFS
- 	{
- 		.procname	= "sched_child_runs_first",
- 		.data		= &sysctl_sched_child_runs_first,
-@@ -382,6 +388,7 @@ static struct ctl_table kern_table[] = {
- 		.mode		= 0644,
- 		.proc_handler	= proc_dointvec,
- 	},
-+#endif /* !CONFIG_SCHED_BFS */
- #ifdef CONFIG_PROVE_LOCKING
- 	{
- 		.procname	= "prove_locking",
-@@ -779,6 +786,26 @@ static struct ctl_table kern_table[] = {
- 		.proc_handler	= proc_dointvec,
- 	},
- #endif
-+#ifdef CONFIG_SCHED_BFS
-+	{
-+		.procname	= "rr_interval",
-+		.data		= &rr_interval,
-+		.maxlen		= sizeof (int),
-+		.mode		= 0644,
-+		.proc_handler	= &proc_dointvec_minmax,
-+		.extra1		= &one,
-+		.extra2		= &one_thousand,
-+	},
-+	{
-+		.procname	= "iso_cpu",
-+		.data		= &sched_iso_cpu,
-+		.maxlen		= sizeof (int),
-+		.mode		= 0644,
-+		.proc_handler	= &proc_dointvec_minmax,
-+		.extra1		= &zero,
-+		.extra2		= &one_hundred,
-+	},
-+#endif
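
A small sketch of reading the two tunables this table exposes under
/proc/sys/kernel, assuming a kernel built with CONFIG_SCHED_BFS (the files do
not exist otherwise):

#include <stdio.h>

static void show(const char *path)
{
        FILE *f = fopen(path, "r");
        int val;

        if (!f) {
                perror(path);
                return;
        }
        if (fscanf(f, "%d", &val) == 1)
                printf("%s = %d\n", path, val);
        fclose(f);
}

int main(void)
{
        show("/proc/sys/kernel/rr_interval");   /* bounded to 1..1000 by extra1/extra2 */
        show("/proc/sys/kernel/iso_cpu");       /* bounded to 0..100 */
        return 0;
}
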
- #if defined(CONFIG_S390) && defined(CONFIG_SMP)
- 	{
- 		.procname	= "spin_retry",
-Index: linux-2.6.35.5-ck1/kernel/sched_bfs.c
-===================================================================
---- /dev/null	1970-01-01 00:00:00.000000000 +0000
-+++ linux-2.6.35.5-ck1/kernel/sched_bfs.c	2010-09-25 01:21:48.281964938 +1000
-@@ -0,0 +1,6984 @@
-+/*
-+ *  kernel/sched_bfs.c, was sched.c
-+ *
-+ *  Kernel scheduler and related syscalls
-+ *
-+ *  Copyright (C) 1991-2002  Linus Torvalds
-+ *
-+ *  1996-12-23  Modified by Dave Grothe to fix bugs in semaphores and
-+ *		make semaphores SMP safe
-+ *  1998-11-19	Implemented schedule_timeout() and related stuff
-+ *		by Andrea Arcangeli
-+ *  2002-01-04	New ultra-scalable O(1) scheduler by Ingo Molnar:
-+ *		hybrid priority-list and round-robin design with
-+ *		an array-switch method of distributing timeslices
-+ *		and per-CPU runqueues.  Cleanups and useful suggestions
-+ *		by Davide Libenzi, preemptible kernel bits by Robert Love.
-+ *  2003-09-03	Interactivity tuning by Con Kolivas.
-+ *  2004-04-02	Scheduler domains code by Nick Piggin
-+ *  2007-04-15  Work begun on replacing all interactivity tuning with a
-+ *              fair scheduling design by Con Kolivas.
-+ *  2007-05-05  Load balancing (smp-nice) and other improvements
-+ *              by Peter Williams
-+ *  2007-05-06  Interactivity improvements to CFS by Mike Galbraith
-+ *  2007-07-01  Group scheduling enhancements by Srivatsa Vaddagiri
<<Diff was trimmed, longer than 597 lines>>

---- CVS-web:
    http://cvs.pld-linux.org/cgi-bin/cvsweb.cgi/packages/kernel-desktop/kernel-desktop-sched-bfs.patch?r1=1.1.2.21&r2=1.1.2.22&f=u


