A development blog of what Con Kolivas is doing with code at the moment with the emphasis on linux kernel, MuQSS, BFS and -ck.
Sunday, 1 July 2012
A couple of bug reports mostly related to disk I/O seem to have cropped up with BFS 423/3.4-ck2. The likely culprit seems to be the plugged I/O management within schedule() that I modified going from BFS 420 to BFS 423, when I adopted mainline's approach to managing the plugged I/O. It appears that the mechanism I had put in place for BFS was the correct one, and mainline's approach does not work (for BFS) so I've backed out that change and increased the version number. Here is the test patch:
bfs423-424.patch
Those with issues of any sort related to BFS 423 or ck2, please test this patch on top of the previous BFS patched kernel. Thanks!
Monday, 11 June 2012
Upgradeable rwlocks and BFS
I've been experimenting with improving the locking scheme in BFS and had a few ideas. I'm particularly attached to the global runqueue that makes BFS what it is, and obviously having one queue with one lock protecting all its data structures, accessed by multiple CPUs, will end up having quite significant scalability limits with many CPUs. As with a lot of scalability work, improving it tends to have the opposite effect on smaller hardware (i.e. the more scalable you make something, the more it will tend to harm the smaller hardware). Fortunately most of the scalability issues with BFS are pretty much irrelevant until you have more than 16 logical CPUs, but we've now reached the point where 8 logical CPUs is not unusual for a standard PC. Whether we actually need so many logical CPUs and can use them appropriately is an argument for a different time and place, but they're here. So I've always had at the back of my mind how I might go about making BFS more scalable in the long term, and locking is one obvious area.
Unfortunately time and realistic limitations mean I'm unlikely to ever do anything on a grand scale or be able to support it. However, I have ideas for how to change the locking long-term, though it would require lots of incremental steps. After all that rambling, this post is about the first step towards changing it. Like some of the experimental steps of the past (such as skip lists), there is absolutely no guarantee that this one is worth pursuing.
I've implemented a variant of the read/write locks used in the kernel that is better suited to replacing the spinlock that protects all the global runqueue data. The problem with read/write locks is that they favour readers over writers, and you cannot upgrade or downgrade them. Once you have grabbed one, if you drop the lock and then try to grab the other variant (e.g. going from read to write), the data is no longer guaranteed to be the same. What I've put together (and I'm not remotely suggesting this is my original idea, I'm sure it's been done elsewhere) are upgradeable read/write locks where you can take either the read lock, the write lock, or an upgradeable variant. Upgradeable locks can be upgraded to write locks or downgraded to read locks, and write locks can be downgraded to read locks, but read locks cannot be changed. These URW locks unfortunately have more overhead than either spinlocks or normal rwlocks since they're actually just taking combinations of spinlocks and rwlocks. However the overhead of the locks themselves may be worthwhile if it allows you to convert otherwise locked data into sections of parallel read code.
So here is a patch that implements URW locks and can be applied to pretty much any recent Linux kernel version:
urw-locks.patch
Note that the urw-locks patch does nothing by itself, but here is a patch to add on top of it for BFS 423 that modifies the global runqueue to use the URW locks and implements some degree of read/write separation that did not exist with the regular spinlock:
bfs423-grq_urwlocks.patch
It's rather early code but I've given it fairly thorough testing at home and it at least seems to be working as desired. In the simplest of preliminary tests I'm unable to demonstrate any throughput advantage on my quad core hardware, but the reassuring thing is I also find no disadvantage. Whether this translates to some advantage on 8 or 16 logical CPUs is something I don't have the hardware to test myself (hint to readers).
Note that even if this does not yield any significant throughput gain, provided it is not harmful I hope to eventually use it as a stepping stone to a grander plan I have for the locking and scalability. I don't like vapourware, and since I haven't finalised the details of exactly how I would implement that plan, there's nothing more to say for now. Then there's the time issue... there never seems to be enough, but I only ever hack for fun so it's no problem really.
P.S. Don't let this long post make you not notice that BFS 423 and 3.4-ck2 are also out in the previous post.
bfs 0.423, 3.4-ck2
A couple of issues showed up with BFS 0.422, one being the "0 load" bug and the other being a build issue on SMP configurations without CPU hotplug. So here is BFS 0.423 and 3.4-ck2 (which is just ck1 with the BFS update), which should fix those:
3.4-sched-bfs-423.patch
3.4-ck2/
and the increment only:
3.4bfs422-423.patch
Enjoy!
お楽しみください
Saturday, 2 June 2012
BFS 0.422, 3.4.0-ck1
Announcing the release of BFS for 3.4, along with the complete -ck1 patch.
BFS alone:
3.4-sched-bfs-422.patch
Full 3.4-ck1 patches:
3.4-ck1
Alas I was unable to keep the 420 number for BFS due to a number of minor changes. I also incremented the number beyond the unofficial 421 patch posted to lkml so there would be no confusion. The only changes are that some trivial display accounting fixes were added, along with forcing SLUB in the config by default as the other slab allocators crash with BFS (you should all be using SLUB anyway). The rest of the BFS changes are a resync with the new code going into Linux 3.4, along with more merging of code from mainline into BFS where suitable. Note that I have adopted the mainline approach of dealing with unplugged I/O. Previously I had spent a lot of time making it work with BFS, for those who remember that period of instability, so hopefully the mainline approach will work seamlessly now (since mainline ended up having the same bug but it was harder to reproduce).
3.4-ck1 is just a resync of the remainder of the patches from 3.3-ck1.
Enjoy!
お楽しみください
EDIT: If you build on SMP without enabling CPU hotplug you will need this patch on top for BFS to build:
bfs422-nohotplug_fix.patch