These are patches designed to improve system responsiveness and interactivity
with specific emphasis on the desktop, but suitable to any commodity hardware workload.
Apply to 2.6.39:
patch-2.6.39-ck1.bz2
Broken out tarball:
2.6.39-ck1-broken-out.tar.bz2
Discrete patches:
patches
Ubuntu packages:
http://ck.kolivas.org/patches/Ubuntu%20Packages
All -ck patches:
http://www.kernel.org/pub/linux/kernel/people/ck/patches/
BFS by itself:
http://ck.kolivas.org/patches/bfs/
Web:
http://kernel.kolivas.org
Code blog when I feel like it:
http://ck-hack.blogspot.com/
Each discrete patch contains a brief description of what it does at the top of
the patch itself.
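As a rough illustration of the "Apply to 2.6.39" step above, here is a minimal Python sketch that decompresses the .bz2 patch and builds a `patch -p1` command line. The helper names (`decompress_patch`, `patch_command`) and the example paths in the comments are assumptions made for illustration; the announcement itself does not prescribe a procedure. The `-p1` strip level is the usual convention for kernel patches.

```python
import bz2
import shutil
from pathlib import Path
from typing import List


def decompress_patch(bz2_path: str, out_path: str) -> str:
    """Decompress a .bz2 patch to a plain-text file and return its path."""
    with bz2.open(bz2_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def patch_command(patch_path: str, kernel_tree: str) -> List[str]:
    """Build the patch(1) invocation for a kernel source tree.

    -p1 strips the leading path component from file names in the patch,
    -d changes into the source tree first, -i names the patch file.
    """
    absolute_patch = str(Path(patch_path).resolve())
    return ["patch", "-p1", "-d", kernel_tree, "-i", absolute_patch]


# Hypothetical usage (paths are examples only):
#   import subprocess
#   plain = decompress_patch("patch-2.6.39-ck1.bz2", "patch-2.6.39-ck1")
#   subprocess.run(patch_command(plain, "linux-2.6.39"), check=True)
```

Building the command as a list rather than a shell string avoids quoting problems if the tree path contains spaces.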
The most substantial change since the last public release is a major version upgrade to the BFS CPU scheduler version 0.404.
Full details of the most substantial changes, which went into version 0.400, are in my blog here:
http://ck-hack.blogspot.com/2011/04/bfs-0400.html
This version exhibits better throughput, better latencies, better behaviour with scaling cpu frequency governors (e.g. ondemand), and better use of turbo modes in newer CPUs. It also fixes a long-standing bug that caused fluctuating performance and latencies; the bug affected all configurations but was only demonstrable on lower Hz configurations (e.g. 100Hz). Thus mobile configurations (e.g. Android on 100Hz) also perform better. The default round robin interval on all hardware is now set to 6ms (i.e. tuned primarily for latency). This can be easily modified with the rr_interval sysctl in BFS for special configurations (e.g. increase to 300 for encoding / folding machines).
Performance of BFS has been tested on hardware ranging from lower power single core machines through various SMP configurations, both threaded and multicore, up to 24x AMD. The 24x machine exhibited better throughput than mainline on optimally loaded kbuild (from make -j1 up to make -j24). Performance beyond this level of load did not match mainline. On folding benchmarks at 24x, BFS was consistently faster for the unbound (no cpu affinity in use) multi-threaded version. On 6x hardware, performance at all levels of load in kbuild and x264 encoding benchmarks was better than mainline in both throughput and latency in the presence of the workloads.
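For anyone wanting to script the rr_interval change mentioned above, a minimal Python sketch follows. It assumes the tunable is exposed at /proc/sys/kernel/rr_interval, which only exists on a BFS-patched kernel, and that writing it requires root. The function names are made up for illustration, and the only validation is that the value is a positive number of milliseconds; any upper bound the kernel enforces is not modelled here.

```python
from pathlib import Path

# Assumed location of the BFS tunable; absent on non-BFS kernels.
RR_INTERVAL_PATH = "/proc/sys/kernel/rr_interval"


def read_rr_interval(path: str = RR_INTERVAL_PATH) -> int:
    """Return the current round robin interval in milliseconds."""
    return int(Path(path).read_text().strip())


def write_rr_interval(ms: int, path: str = RR_INTERVAL_PATH) -> None:
    """Set the round robin interval in milliseconds (needs root on a real system)."""
    if ms < 1:
        raise ValueError("rr_interval must be a positive number of milliseconds")
    Path(path).write_text(f"{ms}\n")


# Hypothetical usage on an encoding / folding box:
#   write_rr_interval(300)
# and to return to the latency-oriented default from this release:
#   write_rr_interval(6)
```

The path is a parameter so the helpers can be pointed at an ordinary file when trying them out without a BFS kernel.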
For 6 core results and graphs, see:
benchmarks 20110516
(desktop = 1000Hz + preempt, server = 100Hz + no preempt)
Here are some desktop config highlights:
[Graphs: throughput at make -j6; latency in the presence of x264 ultrafast; throughput with x264 ultrafast]
This is not by any means a comprehensive performance analysis, nor is it meant to claim that BFS is better under all workloads and hardware than mainline. These are simply easily demonstrable advantages on some very common workloads on commodity hardware, and they constitute a regular part of my regression testing. Thanks to Serge Belyshev for 6x results, statistical analysis and graphs.
Other changes in this patch release: an updated version of lru_cache_add_lru_tail, as the previous version did not work entirely as planned; the default dirty ratio dropped to the extreme value of 1 in decrease_default_dirty_ratio; and the cpufreq ondemand tweaks dropped, since BFS now detects scaling CPUs internally and works with them.
Full patchlist:
2.6.39-sched-bfs-404.patch
sched-add-above-background-load-function.patch
mm-zero_swappiness.patch
mm-enable_swaptoken_only_when_swap_full.patch
mm-drop_swap_cache_aggressively.patch
mm-kswapd_inherit_prio-1.patch
mm-background_scan.patch
mm-idleprio_prio-1.patch
mm-lru_cache_add_lru_tail-1.patch
mm-decrease_default_dirty_ratio.patch
kconfig-expose_vmsplit_option.patch
hz-default_1000.patch
hz-no_default_250.patch
hz-raise_max.patch
preempt-desktop-tune.patch
ck1-version.patch
Please enjoy!
お楽しみください
--
-ck
EDIT4: For those having hangs, please try this patch on top of ck1:
bfs404-test6.patch
A development blog of what Con Kolivas is doing with code at the moment with the emphasis on linux kernel, MuQSS, BFS and -ck.
Thursday, 19 May 2011
Thursday, 24 March 2011
2.6.38-ck1, BFS 0.363 for 2.6.38
TL;DR This post isn't about lrzip at last.
I've ported BFS and all previous -ck patches to 2.6.38.
2.6.38-ck1 can be grabbed here:
2.6.38-ck1
Ubuntu packages here:
Ubuntu Packages
and BFS for 2.6.38 can be grabbed here:
BFS 2.6.38
Apart from slight architectural changes between the kernel versions, and YET ANOTHER mainline rewrite of the CPU offlining code for suspend to ram/disk (which always causes problems with porting BFS since I have to rewrite my own parts of that code), this is the same BFS v0.363 as per the last release.
There is no "autogrouping by SID" in BFS or -ck. I remain unconvinced of any tangible benefit of such an approach for real world usage, and I am wary of its potential for problems and the inability to apportion CPU when you actually want to.
Please report your experiences, but only if they're meaningful. I don't care how your PC performs if you do make -j 4096 unless you happen to have 4096 CPUs :)
Please enjoy! お楽しみ下さい
Thursday, 17 March 2011
2.6.38 and BFS/-ck releases
2.6.38 came out faster than I was expecting, and I've made no effort as of yet to port BFS or -ck to it. It looked like there were still bug reports on the last -rc on the linux kernel mailing list so I wasn't expecting the "stable" release to come out yet. Furthermore, I've been working very hard on the next major release of lrzip which is a massive rewrite so that has completely occupied my time and I am not done with it yet. So there will likely be quite some delay before I can release an official BFS/CK for 2.6.38. I also would like to watch what fallout, if any, there is from the autogrouping code and decide what I should do with respect to that feature. I'm sure some unofficial BFS ports will be available soon enough but obviously I won't be able to say with any confidence whether they can be trusted or not.
Friday, 29 October 2010
2.6.36-ck2
So the one line bug in BFS 357 is big enough that it may affect anyone with a laptop or similar containing Intel wireless. How do I know this? Well, I hit it on my own laptop, on suspend, after about 30 suspend to ram cycles. Nothing better than a big fat OOPS on your own hardware to make you feel obliged to update your own code. So given that 2.6.36-ck1 is barely one week old, and 2.6.36 is likely to be the "stable" 3 point release for 3 months, I've decided to release 2.6.36-ck2 with just one line of code changed. As per my last blog entry, I've just removed one BUG_ON which would cause an oops on BFS 357 when it's run on a 2.6.36 kernel. If you're not hitting this bug, there is no point whatsoever in upgrading from ck1 to ck2. For my own internal testing I'm using a WARN_ON_ONCE in place of the BUG_ON, to see if there's something meaningful in the bug that's a mainline problem, but it's safe to just remove this OOPS for everyone else.
Get it here either as a full patch or split out patches:
2.6.36-ck2
I've also updated the BFS 357 patch on my website with the one line fix, and given that it's still the same BFS otherwise, all I've done is change the filename rather than bump the BFS version number.
Get it here:
2.6.36-sched-bfs-357-1.patch
On an unrelated note, I finally got off my arse and fixed the long-standing install bug if DESTDIR was set on lrzip, which was a two line fix and reported by about a billion different people. I bumped the version up to 0.47 without any other changes.
lrzip on freshmeat
I've been looking at the lrzip code recently just for a change of pace. One of the things that bugged me about it is that I upgraded it a while back to truly be 64 bit in that it accepts 64 bit sized files, and can make the most of all ram on any sized 64 bit machines, but that caused regressions in file compression sizes and speed. What this means long term is that as file sizes get bigger, and machines get more and more ram, the compression of lrzip will get more and more impressive. However the reason it bugs me is that all that 64 bit addressing costs a lot in space. So I'm working on a scaled bitsize compression format for the next version now, which will only use as many bits as the compression window is. I've seen some modest improvements only, but they're worthwhile chasing. More on that front if I make progress soon.
Thursday, 21 October 2010
2.6.36-ck1
I'll keep it brief by just quoting the email I sent to lkml, just to get this announce out quickly.
These are patches designed to improve system responsiveness and interactivity
with specific emphasis on the desktop, but suitable to any workload.
Apply to 2.6.36:
patch-2.6.36-ck1.bz2
Broken out tarball:
2.6.36-ck1-broken-out.tar.bz2
Discrete patches:
patches
All -ck patches:
patches
Web:
kernel.kolivas.org
Code blog when I feel like it:
ck-hack.blogspot.com
Each discrete patch contains a brief description of what it does at the top of the patch itself.
The most significant change is an updated BFS cpu scheduler to BFS 357 (Magnum). It should pretty much behave like the older one, but is tighter with respect to keeping to its deadlines, and will continue to behave fairly when load is more than 8 * number of CPUs.
The other addition is to decrease the default dirty_ratio.
The rest is a resync only since 2.6.35-ck1.
Patch series:
2.6.36-sched-bfs-357.patch
sched-add-above-background-load-function.patch
mm-make_swappiness_really_mean_it.patch
mm-zero_swappiness.patch
mm-enable_swaptoken_only_when_swap_full.patch
mm-drop_swap_cache_aggressively.patch
mm-kswapd_inherit_prio-1.patch
mm-background_scan.patch
mm-idleprio_prio-1.patch
mm-lru_cache_add_lru_tail.patch
mm-decrease_default_dirty_ratio.patch
kconfig-expose_vmsplit_option.patch
hz-default_1000.patch
hz-no_default_250.patch
hz-raise_max.patch
preempt-desktop-tune.patch
cpufreq-bfs_tweaks.patch
ck1-version.patch
For those following the development of the patches for interactivity at massive load: I have COMPLETELY DROPPED them as they introduce regressions at normal workloads, and I cannot under any circumstances approve changes that improve behaviour at ridiculous workloads at the expense of regular ones. I still see precisely zero point in optimising for absurd workloads. Proving how many un-niced jobs you can throw at your kernel compiles is not a measure of one's prowess. It is just a mindless test.
Enjoy!
Saturday, 16 October 2010
2.6.36-rc8-ck1
So another week passes and my attempt to minimise my workload by syncing up with the apparently last -rc for 2.6.36 was only a mild failure with a new "release candidate" coming out. (Does anyone else still have a problem with Linus calling his pre-releases "release candidates" any more? It still annoys the hell out of me). The reason it was only a mild failure for me is that the patches from 2.6.36-rc7-ck1 pretty much apply cleanly to 2.6.36-rc8.
So I've resynced all the 2.6.36-rc7-ck1 patches, and added a couple of things.
Firstly, I added a tiny patch which decreases the default dirty_ratio in the vm from 20 to 5. Here is the changelog in the patch:
The default dirty ratio is chosen to be a compromise between throughput and
overall system latency. On a desktop, if an application writes to disk a lot,
that application should be the one to slow down rather than the desktop as a
whole. At higher dirty ratio settings, an application could write a lot to
disk and then happily use lots of CPU time after that while the rest of the
system is busy waiting on that naughty application's disk writes to complete
before anything else happening.
Lower ratios mean that applications that do a lot of disk writes end up
being responsible for their own actions and they're the ones that slow down
rather than the system in general.
This does decrease overall write throughput slightly, but to the benefit of
the latency of the system as a whole.
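To put rough numbers on the changelog above, the following Python sketch estimates how much dirty data a writing application can accumulate before it is made to do its own writeback at a given dirty ratio. It is a simplification: it treats the ratio as a plain percentage of a single "available memory" figure, and the 4 GiB used here is an assumed example rather than a value from the post. The ratios shown are the old default of 20, the value of 5 introduced in this patch, and the value of 1 used in the later 2.6.39-ck1 release.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_threshold_bytes(available_bytes: int, dirty_ratio: int) -> int:
    """Approximate bytes of dirty data allowed before a writer is throttled.

    Simplified model: threshold = available memory * dirty_ratio / 100.
    """
    if not 0 <= dirty_ratio <= 100:
        raise ValueError("dirty_ratio is a percentage between 0 and 100")
    return available_bytes * dirty_ratio // 100


# Assumed example machine with 4 GiB of available memory.
available = 4 * GIB
for ratio in (20, 5, 1):
    threshold_mib = dirty_threshold_bytes(available, ratio) // MIB
    print(f"dirty_ratio={ratio:>2}: writer slowed after roughly {threshold_mib} MiB dirty")
```

Under this model the backlog of unwritten data that the rest of the system may end up waiting on shrinks in direct proportion to the ratio, which is the latency-for-throughput trade the changelog describes.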
The only other changes are to fold the build fixes into BFS, fix minor typos in the documentation of the BFS 357 patch, and add the bfs357-penalise_fork_depth.patch and bfs357-group_thread_accounting.patch patches as separate entities, but DISABLED by default. The effect of these patches has been discussed at great length on this blog before. See the tunables in /proc/sys/kernel to enable them. I'm pretty sure these patches will be dropped for 2.6.36-ck1 final due to the handful of regressions seen to date.
As per last time, the patches themselves are sneakily hidden within .lrz archives which means you'll have to suffer the pain of installing my lrzip application to use them. The patches are available in here: 2.6.36 prerelease patches
Thursday, 7 October 2010
2.6.36-rc7 with -ck1 and BFS 357
Since I was on the tail end of my hack fest and Linus announced 2.6.36-rc7, saying it was likely the last -rc, I figured it was a good opportunity to sync up my patches with mainline. As always, the porting of BFS brought some unexpected surprises where a simple port would probably work, but likely fail long term. So there were lots of little subtle changes that I had to make to BFS. Functionally this is virtually the same as BFS 357 for 2.6.35.7, apart from some minor tweaks to avoid new warnings. There was one teensy change to niffy_diff to also ensure a minimum difference was observed according to ticks, and the minimum difference was decreased from 1us to anything greater than 0 as the niffy clock may well be updated in less than 1us. One nice thing also came about from the update. I managed to remove some code when I realised the nohz_load_balancer I'd been maintaining in the BFS code was simply me blindly porting it a while back and not even realising what it was for. Of course there is no load balancing on BFS since it has a global runqueue which means all CPUs are always in balance, so there's no need for any special case balancing on nohz configs.
For those who want some overview of what was required to port it, there were some subtle changes to the try_to_wake_up code for notifying when workers are going to sleep with workqueues. Some reshuffling of what happens on context switch was ported. Some sched domains code was updated. rlimit code was tweaked. nohz balancing code was dropped. Checking that apparently idle CPUs were actually online was added to cope with changes on forking idle tasks on .36. And random other stuff I can't remember.
It's worth noting that you'll need a beta driver from nvidia if you're evil like me and use their evil binary drivers. See here: nvnews link for their latest drivers.
Anyway here's a directory that contains lrz compressed versions of all the patches, and an lrz compressed all-inclusive -ck1 and bfs357 patch. It's my secret plan that those wishing to try my pre-release patches must also grab lrzip, which I wrote, to access them :)
http://ck.kolivas.org/patches/2.6/2.6.36/
EDIT2: If you enable schedstats, you will need the patch called 2636rc7ck1-fixes.patch in that directory added to prevent build failures.