Announcing a resync and update of the BFS CPU scheduler for linux-3.12
BFS by itself:
http://ck.kolivas.org/patches/bfs/3.0/3.12/3.12-sched-bfs-443.patch
CK branded BFS:
http://ck.kolivas.org/patches/3.0/3.12/3.12-ck1/
Apologies for the delays. I've been swamped by other projects (o.k. I lie, mainly just cgminer). The changes in this new version, apart from the obvious resync with mainline, are some timing fixes courtesy of Olivier Langlois (Thanks!) and a concerted effort to make suspend to RAM/resume work properly.
Enjoy!
お楽しみください
A development blog of what Con Kolivas is doing with code at the moment with the emphasis on linux kernel, MuQSS, BFS and -ck.
Monday, 18 November 2013
Monday, 9 September 2013
BFS 0.441, 3.11-ck1
Announcing a resync and update of the BFS CPU scheduler for linux-3.11
BFS by itself:
3.11-sched-bfs-441.patch
Full -ck1 patchset including separate patches:
3.11-ck1
Apart from the usual resync to keep up with the mainline churn, there are a few additions from BFS 0.440. A number of changes dealing with wake lists as done by mainline were added that were missing from the previous code. There is a good chance that these were responsible for a large proportion of the suspend/resume issues people were having with BFS post linux 3.8. Of course I can't guarantee that all issues have been resolved, but it has been far more stable in my testing so far.
The other significant change is to check for throttled CPUs when choosing an idle CPU to move a process to, which should improve behaviour, and possibly throughput, when using a scaling CPU governor such as ondemand.
Those of you still using the evil proprietary Nvidia binary driver (as I still do) will encounter some issues and will need to use a patched pre-release driver from them if you build it yourself, until they release a new driver.
That is all for now.
Enjoy!
お楽しみください
Wednesday, 10 July 2013
BFS 0.440, -ck1 for linux-3.10
I finally managed to set up some 3G wireless internet in this remote mountain village I'm staying in (probably the first ever to do so here). After a few revisions I was able to bring BFS into line with mainline. There are no significant changes to the design itself, but hopefully a few minor fixes have come along as a result of the resync, as I also carved out bits of code not relevant to BFS and tinkered with the shutdown mechanism a bit more. As for the new tickless-on-busy-CPU feature from mainline, it is not being offered in BFS: it is quite orthogonal to a design that so easily moves tasks from one CPU to another, and it provides no advantage for the desktop/laptop/tablet/PDA/mobile device/phone/router etc. that BFS is targeted towards.
Some of the configuration code was also changed, since the last version allowed you to generate an invalid configuration. You might get some strange warnings about the IRQ_TIME_ACCOUNTING configuration option, but they should be harmless.
Get BFS by itself for 3.10.0 here:
3.10-sched-bfs-440.patch
After careful consideration, I've decided to remove the remaining -ck patches and just make the -ck patchset BFS with some extra default config options and the -ck tag. As I've said previously, those other patches were from long ago, the kernel has changed a lot since then, and I've been unable to confirm they do anything useful any more, whereas there have been reports of regressions with them.
Get the -ck tagged patchset here:
3.10-ck1
Enjoy!
お楽しみください
Tuesday, 7 May 2013
BFS 0.430, -ck1 for linux-3.9.x
Announcing a resync/update of the BFS and -ck patchsets for linux-3.9
Full ck patch:
http://ck.kolivas.org/patches/3.0/3.9/3.9-ck1/
BFS only patch:
http://ck.kolivas.org/patches/bfs/3.0/3.9/3.9-sched-bfs-430.patch
The full set of incremental patches is here:
http://ck.kolivas.org/patches/bfs/3.0/3.9/Incremental/
The changes to BFS include a resync from BFS 0.428, updated to work with changes from the latest mainline kernel, and numerous CPU accounting improvements courtesy of Olivier Langlois (thanks again!).
For those who tried the -ck1 release candidate patch I posted, this patch is unchanged. The only issue that showed up was a mostly cosmetic quirk with not being able to change the CPU accounting type, even though it appears you should be able to. BFS mandates high res IRQ accounting so there is no point trying to change it.
Lately my VPS provider (rapidxen) has been nothing short of appalling with incredible amounts of downtime, packet loss and IP changes without notification. They also repeatedly send me abuse complaints that I have to respond to for my software being (falsely) tagged as viruses. Luckily I have a move planned in the near future - including where and how - when time permits, but if you find my server doesn't respond, apologies.
Enjoy!
お楽しみください
EDIT: There were some fairly dramatic CPU offline code changes to mainline (YET AGAIN!) and the changes to BFS to make it work were fairly significant so there may once again be issues with power off/reboot/suspend/hibernate. It gets tiresome watching the same code being rehashed in many different ways... "because this time we'll do it right".
Monday, 4 March 2013
BFS 0.428 for linux-3.8.x
Announcing a resync of the BFS and -ck patchsets for linux-3.8
Full ck patch:
http://ck.kolivas.org/patches/3.0/3.8/3.8-ck1/
BFS only patch:
http://ck.kolivas.org/patches/bfs/3.0/3.8/3.8-sched-bfs-428.patch
The full set of incremental patches is here:
http://ck.kolivas.org/patches/bfs/3.0/3.8/Incremental/
The only changes to BFS include a resync from BFS 0.427, and a micro-optimisation to the CPU accounting courtesy of Olivier Langlois (thanks!). See the incremental patch for details.
As for the -ck patchset, I am dropping the patches that set sysctl values, since distributions change those values and the patches no longer seem to work reliably, along with removing patches of dubious utility.
Enjoy!
お楽しみください
Tuesday, 29 January 2013
BFS 0.427 for linux 3.7.x
Announcing an updated BFS patch for linux 3.7, version 0.427
Full patch:
http://ck.kolivas.org/patches/bfs/3.0/3.7/3.7-sched-bfs-427.patch
Incremental patch from bfs 426 (applies to 3.7.x-ck1 as well):
http://ck.kolivas.org/patches/bfs/3.0/3.7/3.7-bfs426-427.patch
The full set of incremental patches, along with a description within each patch is here:
http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/
A number of minor issues have been reported with BFS over time (interestingly, none of them appear to be new). Some were cosmetic, like the reported suspicious RCU warning on startup, and the accounting for close-to-100% CPU-bound tasks flicking between 99 and 101%.
The most interesting actual bug was that clock_nanosleep and timer_create would not work when used with the clock id CLOCK_PROCESS_CPUTIME_ID. This is a timer which fires based on the total CPU used by a process or its thread group, which I have never used myself, nor was I aware of its intricacies. The bug was only picked up by Olivier Langlois as part of building and testing glibc. It was an interesting bug to me for a number of reasons. First, it had never manifested anywhere in the wild as far as I'm aware, despite being a POSIX 2001 function, so presumably it is almost never used. Second, it's one of the few functions that accounts for the total CPU used by a thread group rather than just per thread. Third, you cannot really use clock_nanosleep with this clock id unless it is done from a separate thread to the one consuming CPU (since it puts the calling thread to sleep), so there would be precious few scenarios where it would currently be in use, though coding multithreaded apps that use it for resource monitoring and control would make complete sense. Finally, the most interesting part: I can now tell that the bug had been in BFS since its first release, and no one had ever noticed, as far as I'm aware.
Unfortunately it took me quite a while to find, since I had to dig deep into how the whole system of timers works at a low level in the kernel before finally stumbling across one tiny piece of accounting/reporting that was missing in BFS. It's funny that a bug which directly affected almost no one should be so hard to track down. In the meantime it allowed me to tweak a number of bits of internal accounting, so hopefully that has improved as well.
Please enjoy.
お楽しみください
Saturday, 15 December 2012
3.7-ck1, BFS 426 for linux-3.7
Some degree of normality has returned to my life, so I bring to you a resync of the BFS cpu scheduler for 3.7, along with the -ck patches to date.
Apply to 3.7.x:
patch-3.7-ck1.bz2
or
patch-3.7-ck1.lrz
Broken out tarball:
3.7-ck1-broken-out.tar.bz2
or
3.7-ck1-broken-out.tar.lrz
Discrete patches:
patches
Latest BFS by itself:
3.7-sched-bfs-426.patch
People often ask me why I don't maintain a git tree of my patches, or at least of BFS, to make things easier on myself and those who download it. As it turns out, a git tree would only make things easier for those who download it; it would actually be more work for me to maintain.
While I'm sure most people are shaking their head and thinking I'm just some kind of git-phobe, I'll try to explain (Note that I maintain git trees for lrzip https://github.com/ckolivas/lrzip and cgminer https://github.com/ckolivas/cgminer).
I do NOT keep track of the linux kernel patches as they come in during the development phase prior to the latest stable release. Unfortunately I simply do not have the time, nor the inclination, to care on that level about the linux kernel any more. However I still believe quite a lot in what BFS has to offer. If I watched each patch as it came into git, I could simply keep my fork with BFS and merge the linux kernel patches as they came in, resyncing and modifying along with the changes. When new patches go into the kernel, there is a common pattern of many changes occurring shortly after they're merged: a few fixes going in, some files being moved around a few times, and occasionally the patch being backed out when it's found to introduce some nasty regression that proves a showstopper to release. Each of these changes - fixes, moves, renames, removals - requires a resync if you are maintaining a fork.
The way I've coded the actual BFS patch itself is to be as unobtrusive as possible - it does not replace large chunks of code en bloc, just adds files and redirects builds to use those new files instead of the mainline ones. This is done to minimise the effort of resyncing when new changes come. The vast majority of the time, only trivial changes are needed for the patch to even apply. Thus applying an old patch to a new kernel just needs the rejects fixed for it to apply (even if it doesn't build yet). This is usually the first step I do in syncing BFS, and I end up with something like this after fixing the rejects:
http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/3.7-sched-bfs-425.patch
This patch is just the 3.6 patch with the hunks that failed to apply fixed up.
After that, I go through the incremental changes from mainline 3.6 to 3.7 to see any scheduler related changes that should be applied to BFS to 1. make it build with API changes in mainline and 2. benefit from any new features going into mainline that are relevant to the scheduler in general. I manually add the changes and end up with an incremental patch like this:
http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/bfs425-merge.patch
This patch only merges the 3.6->3.7 changes into BFS itself.
Finally I actually apply any new changes to BFS since the last major release, bugfixes or improvements as the case may be, as per this patch here:
http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/bfs425-updates.patch
Git is an excellent source control tool, but provides me with almost nothing for this sort of process where a patch is synced up after 3 months of development. If I were to have my fork and then start merging all patches between 3.6 and 3.7, it would fail to merge new changes probably dozens and potentially hundreds of times along the way, each requiring manual correction. While merge conflicts are just as easy to resolve with git as they are with patch, they aren't actually easier, and instead of there being conflicts precisely once in the development process, there are likely many with this approach.
However git also does not provide me with any way to port new changes from mainline to the BFS patch itself. They still need to be applied manually, and if changes occur along the way between 3.6 stable through 3.7-rc unstable to 3.7 stable, each time a change occurs to mainline, the change needs to be done to BFS. Thus I end up reproducing all the bugfixes, moves, renames and back-outs that mainline does along the way, instead of just doing it once.
Hopefully this gives some insight into the process and why git is actually counter-productive to BFS syncing.
Enjoy 3.7 BFS.
お楽しみください
Tuesday, 13 November 2012
BFS related
Unfortunately my life remains in disarray with respect to dealing with my brother's issues.
However a couple of BFS related things came in my mail and I thought I'd share them.
One is graysky's performance comparison of current schedulers across various hardware in this comprehensive pdf:
cpu_schedulers_compared.pdf
The second is this rather cute set of T-shirts that Teb had made for himself and his girlfriend:
Thursday, 16 August 2012
3.5-ck1, BFS 424 for linux-3.5
Thanks to those who have been providing interim patches porting BFS to linux 3.5 while I've been busy!
Finally I found some downtime from my current coding contract work to port BFS and -ck to linux 3.5, and here is the announce below:
These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but suitable to any commodity hardware workload.
Apply to 3.5.x:
http://ck.kolivas.org/patches/3.0/3.5/3.5-ck1/patch-3.5-ck1.bz2
or
http://ck.kolivas.org/patches/3.0/3.5/3.5-ck1/patch-3.5-ck1.lrz
Broken out tarball:
http://ck.kolivas.org/patches/3.0/3.5/3.5-ck1/3.5-ck1-broken-out.tar.bz2
or
http://ck.kolivas.org/patches/3.0/3.5/3.5-ck1/3.5-ck1-broken-out.tar.lrz
Discrete patches:
http://ck.kolivas.org/patches/3.0/3.5/3.5-ck1/patches/
Latest BFS by itself:
http://ck.kolivas.org/patches/bfs/3.5.0/3.5-sched-bfs-424.patch
Web: http://kernel.kolivas.org
Code blog when I feel like it: http://ck-hack.blogspot.com/
This is a resync from 3.4-ck3. However, the broken out tarballs above also include the upgradeable rwlocks patch, and a modification of the global runqueue in BFS to use the urwlocks. These are NOT applied in the -ck1 patch, but can be applied manually at the end of the series as indicated by the series file. It is currently of no demonstrable performance advantage OR detriment in its current state, but is code for future development.
Enjoy!
お楽しみください
-- -ck
Tuesday, 31 July 2012
BFS and -ck delays for linux-3.5.0
Once again I find myself writing a post saying there will be delays with the resync of BFS and -ck for the new linux kernel. This time the reason will, for most people, be a quite unexpected development. As you may have read on this blog last year, I was invited to interview with Google for a job as a software engineer, but in the end I was turned down due to lack of adequate breadth of knowledge. This was probably for the best for me anyway, since I have a full time unrelated career and the jump would have been too great. Anyway, a small company noticed the work I had done on cgminer with bitcoin and openCL and asked if I was interested in writing some software for them. The work involves writing openCL frameworks so they can provide distributed computing capability to clients. They were quite happy to forego any of the regular interview details, or pretty much anything else normally involved in employing someone, and before long we were talking contracts instead. Since the work itself actually looked like a lot of fun, I decided to take the opportunity.
Anyway, long story short, I'm doing a little bit of contract work for them and my kernel work will take a slightly lower priority in the meantime. I'm not abandoning it, but it will be delayed some more before the next release. Apologies for any inconvenience this may cause in the interim.
Tuesday, 3 July 2012
BFS 424, linux-3.4-ck3
As seen on this blog previously, a bug showed up in 3.4-ck2/BFS 423 to do with unplugged I/O management that would lead to severe stalls/hangs. I'm releasing BFS 424 officially and upgrading 3.4-ck2 to 3.4-ck3, incorporating just this one change.
BFS 424:
3.4-sched-bfs-424.patch
3.4-ck3:
3.4-ck3/
Others on -ck2 can simply apply the incremental patch to be up to date.
3.4bfs423-424.patch
Enjoy!
お楽しみください
Sunday, 1 July 2012
BFS 424 test
A couple of bug reports, mostly related to disk I/O, have cropped up with BFS 423/3.4-ck2. The culprit seems to be the plugged I/O management within schedule() that I modified going from BFS 420 to BFS 423, when I adopted mainline's approach to managing plugged I/O. It appears that the mechanism I had originally put in place for BFS was the correct one, and mainline's approach does not work (for BFS), so I've backed out that change and bumped the version number. Here is the test patch:
bfs423-424.patch
Those with issues of any sort related to BFS 423 or ck2, please test this patch on top of the previous BFS patched kernel. Thanks!
Monday, 11 June 2012
Upgradeable rwlocks and BFS
I've been experimenting with improving the locking scheme in BFS and had a few ideas. I'm particularly attached to the global runqueue that makes BFS what it is, and obviously having one queue, with one lock protecting all its data structures accessed by multiple CPUs, will end up having quite significant scalability limits with many CPUs. As with a lot of scalability work, it tends to have the opposite effect on smaller hardware (i.e. the more scalable you make something, the more it will tend to harm the smaller hardware). Fortunately most of the scalability issues with BFS are pretty much irrelevant until you have more than 16 logical CPUs, but we've now reached the point where 8 logical CPUs is not unusual for a standard PC. Whether we actually need so many logical CPUs, and can use them appropriately, is an argument for a different time and place, but they're here. So I've always had at the back of my mind how I might make BFS more scalable in the long term, and locking is one obvious area.
Unfortunately time and realistic limitations means I'm unlikely to ever do anything on a grand scale or be able to support it. However I had ideas for how to change the locking long-term but it would require lots of incremental steps. After all that rambling, this post is about the first step to changing it. Like some of the experimental steps of the past (such as skip lists), there is absolutely no guarantee that these are worth pursuing.
I've implemented a variant of the read/write locks used in the kernel, better suited to replacing the spinlock that protects all the global runqueue data. The problem with read/write locks is that they favour readers over writers, and you cannot upgrade or downgrade them. Once you have grabbed one, if you drop the lock and then try to grab the other variant (e.g. going from read to write), the data is no longer guaranteed to be the same. What I've put together (and I'm not remotely suggesting this is my original idea; I'm sure it's been done elsewhere) are upgradeable read/write locks, where you can take either the read lock, the write lock, or an upgradeable variant. Upgradeable locks can be up- or downgraded to write or read locks, and write locks can be downgraded to read locks, but read locks cannot be changed. These URW locks are unfortunately more overhead than either spinlocks or normal rwlocks, since they're actually just taking combinations of spinlocks and rwlocks. However, the overhead of the locks themselves may be worthwhile if it allows you to convert otherwise locked data into sections of parallel read code.
So here is a patch that implements them and can be put into pretty much any recent linux kernel version:
urw-locks.patch
Note that that patch does nothing by itself, but here is a patch to add on top of it, for BFS 423, that modifies the global runqueue to use the URW locks and implements some degree of read/write separation that did not exist with the regular spinlock:
bfs423-grq_urwlocks.patch
It's rather early code, but I've given it fairly thorough testing at home and it at least seems to be working as desired. In preliminary testing I'm unable to demonstrate any throughput advantage on my quad core hardware, but the reassuring thing is that I also find no disadvantage. Whether this translates to some advantage on 8x or 16x is something I don't have the hardware to test myself (hint to readers).
Note that even if this does not prove to be any significant throughput gain, then provided it is not harmful, I hope to eventually use it as a stepping stone to a grander plan I have for the locking and scalability. I don't like vapourware, and since I haven't finalised exactly how I would implement that plan, there's nothing more to say for now. Then there's the time issue... there never seems to be enough, but I only ever hack for fun so it's no problem really.
P.S. Don't let this long post make you miss that BFS 423 and 3.4-ck2 are also out in the previous post.
bfs 0.423, 3.4-ck2
A couple of issues showed up with BFS 0.422, one being the "0 load" bug and the other being a build issue on non-hotplug releases. So here is BFS 0.423 and 3.4-ck2 (which is just ck1 with the BFS update) which should fix those:
3.4-sched-bfs-423.patch
3.4-ck2/
and the increment only:
3.4bfs422-423.patch
Enjoy!
お楽しみください
Saturday, 2 June 2012
BFS 0.422, 3.4.0-ck1
Announcing the release of BFS for 3.4, along with the complete -ck1 patch.
BFS alone:
3.4-sched-bfs-422.patch
Full 3.4-ck1 patches:
3.4-ck1
Alas, I was unable to keep the 420 number for BFS due to a number of minor changes. I also incremented the number beyond the unofficial 421 patch posted to lkml so there would be no confusion. The only changes are some trivial display accounting fixes, along with forcing SLUB in the config by default, as other slab allocators crash with BFS (you should all be using SLUB anyway). The rest of the BFS changes are a resync with the new code going into linux 3.4, along with more merging of code from mainline into BFS where suitable. Note that I have adopted the mainline approach to dealing with unplugged I/O. Previously I had spent a lot of time making it work with BFS, for those who remember that period of instability, so hopefully the mainline approach will work seamlessly now (since mainline ended up having the same bug, but it was harder to reproduce there).
3.4-ck1 is just a resync of the remainder of the patches from 3.3-ck1.
Enjoy!
お楽しみください
EDIT: If you build on SMP without enabling CPU hotplug you will need this patch on top for BFS to build:
bfs422-nohotplug_fix.patch
Saturday, 24 March 2012
3.3.0-ck1
New -ck version for the latest mainline linux kernel, 3.3.0:
3.3-ck1
Changes since 3.2.0-ck1:
New BFS version 0.420 AKA smoking as discussed here:
bfs-420-aka-smoking-for-linux-33
Includes one build bugfix for UP compared to that first release candidate.
Other changes:
These patches have been dropped:
-mm-background_scan.patch
-mm-lru_cache_add_lru_tail-2.patch
The virtual memory subsystem has changed so much that it's hard to know whether these patches still do what they originally intended, or whether they are helpful any more. In the absence of being able to test their validity, it seems safer to just drop them.
The rest of the patchset is just a resync.
Enjoy!
お楽しみ下さい
Thursday, 22 March 2012
BFS 420 AKA smoking for linux 3.3
Here's the release candidate for the first stable release of BFS for linux 3.3. BFS hadn't received much attention lately so I felt a little guilty and worked on a few things that had been nagging me:
3.3-sched-bfs-420.patch
It is not clear whether the bug I posted about here is still present; given that it is extremely rare and hard to reproduce, it is impossible to know for sure. A number of minor issues were addressed in the code which hopefully have fixed it (and hardly anyone was affected anyway).
The changes since BFS version 0.416 include a fairly large architectural change just to bring the codebase in sync with 3.3, but none of those changes should be noticeable in any way. One change that may be user-visible is that high resolution IRQ accounting now appears to be on by default for x86 architectures. System time accounting in BFS is wrong without this feature enabled, so this should correct that problem.
Other changes:
416-417: A number of ints were changed to bool, which, though unlikely to have any performance impact, makes the code cleaner, and the compiled code often comes out different. rq_running_iso was converted from a function to a macro to avoid a separate function call, with its attendant overhead, when compiled in. requeue_task within the scheduler tick was moved under lock, which may prevent rare races. test_ret_isorefractory() was optimised. set_rq_task() was not being called on tasks being requeued within schedule(), which could possibly have led to issues if a task ran out of timeslice during that requeue and should have had its deadline offset. The need_resched() check at the end of schedule() was changed to unlikely(), since it really is that. The scheduler version print function was moved to bfs.c to avoid recompiling the entire kernel when the version number changes.
417-418: Fixed a problem with the accounting resync for linux 3.3.
418-419: There was a small possibility that an unnecessary resched would occur in try_preempt if a task had changed affinity and called try_preempt with its ->cpu still set to the old CPU it could no longer run on, so try_preempt was reworked slightly. The deadline offset based on CPU cache locality for sticky tasks was reintroduced, in a way that is cheaper than how the deadline was previously offset.
419-420: Finally rewrote the earliest_deadline_task code. This has long been one of the hottest code paths in the scheduler, and small changes that made it look nicer would often slow it down. I spent quite a few hours reworking it to use fewer gotos, disassembling the code to make sure it was actually getting smaller with every change. Then I wrote a scheduler-specific version of find_next_bit which could be inlined into this code to avoid another function call in the hot path. The overall behaviour is unchanged from previous BFS versions, but initial benchmarking confirms slight improvements in throughput.
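For readers unfamiliar with find_next_bit, the generic operation it performs looks like the sketch below: scan a bitmap word by word for the first set bit at or after a starting offset. This is a plain C illustration of the textbook approach, not the scheduler-specific inlined version in the patch.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return the index of the first set bit at or after 'offset' in
 * 'bitmap', or 'size' if there is none. Generic sketch only; the BFS
 * version is specialised and inlined into the deadline lookup. */
static unsigned long sketch_find_next_bit(const unsigned long *bitmap,
					  unsigned long size,
					  unsigned long offset)
{
	while (offset < size) {
		unsigned long base = (offset / BITS_PER_LONG) * BITS_PER_LONG;
		/* Mask off bits below 'offset' within the current word. */
		unsigned long word = bitmap[offset / BITS_PER_LONG] &
				     (~0UL << (offset % BITS_PER_LONG));
		if (word) {
			unsigned long idx =
				base + (unsigned long)__builtin_ctzl(word);
			return idx < size ? idx : size;
		}
		offset = base + BITS_PER_LONG;
	}
	return size;
}
```

The win from inlining a helper like this into the hot path is avoiding a function call per lookup; __builtin_ctzl typically compiles to a single bit-scan instruction on x86.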
Now I'll leave it open for wider testing to confirm it's all good, and then I have to think about what to do with the full -ck patchset. As I've said in numerous posts before, I'm no longer sure about the validity of some of the patches in the set, given all the changes to the virtual memory subsystem in the mainline kernel.
Tuesday, 20 March 2012
BFS issues and linux 3.3
Linux 3.3 is out, but I'm not releasing BFS for it at this stage. The reason is that a regression has been reported as showing up in BFS and it's proving hard to track down.
The issue involves a slowdown under load when the load is niced. The problem is that the slowdown does not occur all the time; I've only seen it in the wild once and have not been able to repeat it since. I've audited the code and have not yet found the culprit, but when it does happen it is very obvious, with the mouse stalling for seconds. The 'top' output is usually a giveaway that something has gone wrong, because the 'PR' column should normally report values of 1-41 for a nice 19 load. When it happens, though, it will show much higher values, in the 42-81 range, which should not occur. Unfortunately, the best hint for me would be to find which version of BFS it was introduced in and look for the change responsible, and since I can't even reproduce the problem most of the time, I can't do this regression testing.
So I'm appealing to the BFS users out there: if anyone has this problem more regularly and has the time, please try older versions of BFS. By older versions of BFS, I don't mean the same version of BFS on older kernels, but the first version of BFS that was available for each kernel. Reproducing it requires running something continuously as a niced load, where the load is equal to the number of CPUs; for example, 'nice -19 make -j4' run continuously in a kernel tree on a quad core machine. I'm hoping that someone out there is able to reproduce it and can do the regression testing. Thanks in advance. In the meantime, I'll keep auditing code and comparing new to old versions in the hope something stands out.
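As a concrete sketch of that workload, something along these lines should do (the source path is a placeholder; adjust the directory and targets to your setup):

```shell
# Hypothetical reproduction loop for the niced-load slowdown: run a
# kernel build continuously at nice 19, with one job per logical CPU.
repro_niced_build() {
	cpus=$(nproc)
	cd "${1:-/path/to/linux-source}" || return 1
	while true; do
		nice -n 19 make -j"$cpus" clean all
	done
}
# Invoke as: repro_niced_build ~/src/linux
# then watch the 'PR' column in top for values above 41.
```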
EDIT: An alternative approach was to try moving to 3.3 and make minor fixes along the way to see if the problem persists. Consider this patch a pre-release for now (CPU accounting appears all screwy still):
EDIT2: Fixed CPU accounting, bumped version to 418:
3.3-sched-bfs-418.patch
Monday, 16 January 2012
3.2-ck1
These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but suitable to any commodity hardware workload.
This is an upgrade to BFS 416 with trivial bugfixes, and a resync from 3.1.0-ck1. I've changed the versioning to no longer specify a three-point release, since a given release usually applies to successive point releases for some time.
Apply to 3.2.x:
patch-3.2-ck1.bz2
Broken out tarball:
3.2-ck1-broken-out.tar.bz2
Discrete patches:
patches
BFS by itself:
bfs
Web:
kernel.kolivas.org
Code blog when I feel like it:
ck-hack.blogspot.com
Each discrete patch usually contains a brief description of what it does at the top of the patch itself.
Full patchlist:
3.2-sched-bfs-416.patch
mm-minimal_swappiness.patch
mm-enable_swaptoken_only_when_swap_full.patch
mm-drop_swap_cache_aggressively.patch
mm-kswapd_inherit_prio-1.patch
mm-background_scan.patch
mm-idleprio_prio-1.patch
mm-lru_cache_add_lru_tail-2.patch
mm-decrease_default_dirty_ratio-1.patch
kconfig-expose_vmsplit_option.patch
hz-default_1000.patch
hz-no_default_250.patch
hz-raise_max.patch
preempt-desktop-tune.patch
ck1-version.patch
Enjoy!
お楽しみください
--ck
P.S. What this really means is I finished playing zelda.
Monday, 9 January 2012
Towards Transparent CPU Scheduling
Of BFS-related interest is an excellent thesis by Joseph T. Meehean entitled "Towards Transparent CPU Scheduling". Of particular note is the virtually deterministic nature of BFS, especially in fairness and latency. While this of course interests me greatly because of my extensive testing of the BFS CPU scheduler, it also discusses many aspects of both the current CFS CPU scheduler and the older O(1) CPU scheduler that anyone working on issues of predictability, scalability, fairness and latency should read.
http://research.cs.wisc.edu/wind/Publications/meehean-thesis11.html
Abstract:
In this thesis we propose using the scientific method to develop a deeper understanding of CPU schedulers; we use this approach to explain and understand the sometimes erratic behavior of CPU schedulers. This approach begins with introducing controlled workloads into commodity operating systems and observing the CPU scheduler's behavior. From these observations we are able to infer the underlying CPU scheduling policy and create models that predict scheduling behavior.
We have made two advances in the area of applying scientific analysis to CPU schedulers. The first, CPU Futures, is a combination of predictive scheduling models embedded into the CPU scheduler and user-space controller that steers applications using feedback from these models. We have developed these predictive models for two different Linux schedulers (CFS and O(1)), based on two different scheduling paradigms (timesharing and proportional-share). Using three different case studies, we demonstrate that applications can use our predictive models to reduce interference from low-importance applications by over 70%, reduce web server starvation by an order of magnitude, and enforce scheduling policies that contradict the CPU scheduler's.
Harmony, our second contribution, is a framework and set of experiments for extracting multiprocessor scheduling policy from commodity operating systems. We used this tool to extract and analyze the policies of three Linux schedulers: O(1), CFS, and BFS. These schedulers often implement strikingly different policies. At the high level, the O(1) scheduler carefully selects processes for migration and strongly values processor affinity. In contrast, CFS continuously searches for a better balance and, as a result, selects processes for migration at random. BFS strongly values fairness and often disregards processor affinity.