These are patches designed to improve system responsiveness and interactivity
with specific emphasis on the desktop, but suitable for any commodity hardware workload.
Apply to 2.6.39:
Broken out tarball:
All -ck patches:
BFS by itself:
Code blog when I feel like it:
Each discrete patch contains a brief description of what it does at the top of
the patch itself.
The most substantial change since the last public release is a major version upgrade of the BFS CPU scheduler, to version 0.404.
Full details of the most substantial changes, which went into version 0.400, are in my blog here:
This version exhibits better throughput, better latencies, better behaviour with scaling CPU frequency governors (e.g. ondemand) and better use of turbo modes in newer CPUs. It also addresses a long-standing bug that caused fluctuating performance and latencies; the bug affected all configurations, but was only demonstrable on lower Hz configurations (e.g. 100Hz). Mobile configurations (e.g. Android at 100Hz) therefore also perform better. The default round robin interval is now set to 6ms on all hardware (i.e. tuned primarily for latency). This can easily be modified with the rr_interval sysctl in BFS for special configurations (e.g. increase to 300 for encoding / folding machines).
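As a rough illustration of the rr_interval tuning mentioned above, here is a minimal Python sketch that reads and sets the value. It assumes the sysctl is exposed at /proc/sys/kernel/rr_interval (the location is not stated in this announcement, so adjust it if your kernel differs), that the value is in milliseconds, and that writing needs root. The helper names are made up for this example and are not part of BFS.

```python
from pathlib import Path

# Assumed location of the BFS sysctl; not stated in the announcement itself.
RR_INTERVAL = Path("/proc/sys/kernel/rr_interval")


def read_rr_interval(path=RR_INTERVAL):
    """Return the current round robin interval (assumed to be milliseconds)."""
    return int(Path(path).read_text().strip())


def write_rr_interval(value_ms, path=RR_INTERVAL):
    """Set the round robin interval; needs root on a real system."""
    if value_ms < 1:
        raise ValueError("rr_interval must be a positive integer")
    Path(path).write_text(f"{value_ms}\n")


if __name__ == "__main__":
    if RR_INTERVAL.exists():
        print("current rr_interval:", read_rr_interval())
    else:
        print("no rr_interval sysctl found; is this a BFS kernel?")
```

On an encoding or folding machine, calling write_rr_interval(300) as root would correspond to the "increase to 300" suggestion; the change made this way does not persist across reboots.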
Performance of BFS has been tested on hardware ranging from lower power single core machines through various SMP configurations, both threaded and multicore, up to 24x AMD. The 24x machine exhibited better kbuild throughput than mainline at loads up to the optimal level (from make -j1 up to make -j24). Performance beyond this level of load did not match mainline. On folding benchmarks at 24x, BFS was consistently faster for the unbound (no CPU affinity in use) multi-threaded version. On 6x hardware, performance at all levels of load in kbuild and x264 encoding benchmarks was better than mainline in both throughput and latency in the presence of the workloads.
For 6 core results and graphs, see:
(desktop = 1000Hz + preempt, server = 100Hz + no preempt):
Here are some desktop config highlights:
Throughput at make -j6:
Latency in the presence of x264 ultrafast:
Throughput with x264 ultrafast:
This is not by any means a comprehensive performance analysis, nor is it meant to claim that BFS is better than mainline under all workloads and on all hardware. These are simply easily demonstrable advantages on some very common workloads on commodity hardware, and they constitute a regular part of my regression testing. Thanks to Serge Belyshev for the 6x results, statistical analysis and graphs.
Other changes in this patch release include: an updated version of lru_cache_add_lru_tail, as the previous version did not work entirely as planned; dropping the dirty ratio to the extreme value of 1 by default in decrease_default_dirty_ratio; and dropping the cpufreq ondemand tweaks, since BFS now detects scaling CPUs internally and works with them.
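To give a feel for what a dirty ratio of 1 means in practice, here is a small back-of-the-envelope sketch in Python. It treats the ratio as a plain percentage of "dirtyable" memory, which is a simplification of what the kernel actually computes. The 4 GiB figure and the comparison value of 20 (which I believe is the mainline default in this era) are assumptions for illustration only, and the function name is invented for this example.

```python
def dirty_threshold_bytes(dirtyable_bytes, ratio_percent):
    """Approximate amount of dirty page cache allowed before writers are throttled.

    Simplified model: threshold = ratio% of dirtyable memory. The kernel's own
    calculation works in pages and excludes some memory, so treat this as an
    estimate only.
    """
    if not 0 <= ratio_percent <= 100:
        raise ValueError("ratio_percent must be between 0 and 100")
    return dirtyable_bytes * ratio_percent // 100


if __name__ == "__main__":
    four_gib = 4 * 1024**3  # hypothetical amount of dirtyable memory
    for ratio in (1, 20):   # -ck default vs. assumed mainline default
        mib = dirty_threshold_bytes(four_gib, ratio) / 1024**2
        print(f"dirty_ratio={ratio}: roughly {mib:.0f} MiB of dirty data")
```

On a running system the current value can be read from /proc/sys/vm/dirty_ratio.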
EDIT4: For those having hangs, please try this patch on top of ck1: