Announcing an updated BFS patch for linux 3.7, version 0.427
Full patch:
http://ck.kolivas.org/patches/bfs/3.0/3.7/3.7-sched-bfs-427.patch
Incremental patch from bfs 426 (applies to 3.7.x-ck1 as well):
http://ck.kolivas.org/patches/bfs/3.0/3.7/3.7-bfs426-427.patch
The full set of incremental patches, along with a description within each patch is here:
http://ck.kolivas.org/patches/bfs/3.0/3.7/incremental/
A number of minor issues have been reported with BFS over time (interestingly, none of them appear to be new). Some of them were cosmetic, like the reported suspicious RCU warning on startup, and the accounting for CPU-bound tasks close to 100% flicking between 99% and 101%.
The most interesting actual bug was that clock_nanosleep and timer_create would not work when used with the clock ID CLOCK_PROCESS_CPUTIME_ID. This is a timer that goes off based on the total CPU used by a process or its thread group, something I have never used myself, nor was I aware of its intricacies. The bug was only picked up by Olivier Langlois as part of building and testing glibc. It was an interesting bug for a number of reasons. First, it had never manifested anywhere in the wild as far as I'm aware, despite being a POSIX 2001 function, so presumably it is almost never used. Second, it's one of the few functions that tries to get accounting as a total of all the CPU used by a thread group rather than just per thread. Third, you cannot really use clock_nanosleep with this clock ID unless it is done from a separate thread to the one consuming CPU (since it puts the calling thread to sleep), so there would be precious few scenarios where it would be in use currently, though coding multithreaded apps that use it for resource monitoring and control would make complete sense. Finally, the most interesting part is that I can now tell it had been in BFS since its first release and no one had ever noticed, as far as I'm aware.
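For illustration, here is a minimal sketch of the scenario described above (this example is not from the original report; the file name and helper names are arbitrary): one thread spins consuming CPU while another sleeps on the process CPU-time clock, which is why the sleep has to be issued from a thread other than the one doing the work.

/* cputime_sleep.c - minimal sketch, not from the original report.
 * One thread burns CPU; the main thread sleeps on CLOCK_PROCESS_CPUTIME_ID
 * until the whole thread group has consumed roughly 2 more seconds of CPU.
 * Build with: gcc -pthread cputime_sleep.c (add -lrt on older glibc). */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static void *burn_cpu(void *arg)
{
    (void)arg;  /* unused */
    /* Consume CPU so the process CPU-time clock keeps advancing. */
    for (;;)
        ;
    return NULL;
}

int main(void)
{
    pthread_t worker;
    struct timespec ts = { .tv_sec = 2, .tv_nsec = 0 };
    int err;

    pthread_create(&worker, NULL, burn_cpu, NULL);

    /* Relative sleep on the process-wide CPU clock. This blocks the
     * calling thread, so the CPU consumption must come from elsewhere
     * in the thread group (here, the worker thread). */
    err = clock_nanosleep(CLOCK_PROCESS_CPUTIME_ID, 0, &ts, NULL);
    if (err)
        fprintf(stderr, "clock_nanosleep: %s\n", strerror(err));
    else
        printf("thread group consumed ~2 more seconds of CPU time\n");

    return 0;
}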
Unfortunately it took me quite a while to find, since I had to dig deep into figuring out how the whole system of timers works at a low level in the kernel before finally stumbling across one tiny piece of accounting/reporting that was missing in BFS. It's funny that a bug that directly affected almost no one should be so hard to track down. In the meantime it allowed me to tweak a number of bits of internal accounting, so hopefully that has improved as well.
Please enjoy.
Thanks for your devotion and vigilance, CK. I will of course run a head-to-head comparison of 0.426 to 0.427 with the usual 'make' benchmark for you and post the results shortly.
Thanks so much for this new update!
Really love your kernel mods. Thank you for your hard work, Con.
No differences between v0.426 and v0.427. I can post the data if anyone is interested.
People are complaining about glibc failing to compile under -ck kernels: https://bbs.archlinux.org/viewtopic.php?id=154594
Any idea?
That is precisely the bug fixed in this release.
This is the patch that fixes it:
bfs426-fix_cpu_posix_timers.patch
It should apply to older BFS as well.
Understood, thanks.
@graysky: Could you please also benchmark the latest BFQ released today, which includes the new Early Queue Merge feature? Thanks
No idea what you're talking about.
1) BFQ is an I/O scheduler, not a CPU scheduler.
2) There has been no new release of BFQ as far as I can tell.
It's a new addon patch for the BFQ I/O scheduler. You can find it here, if you're interested:
https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/H7PCRpreLAQ
Excellent work! And I love having correct accounting of process times again!
I observe the proprietary nvidia settings GUI showing a less heated graphics processor in my little Mac mini PC!?
Greetings from cold and cloudy Hamburg,
Ralph Ulrich
Oops, got a time accounting overflow again: it began at a kworker thread and then consecutive new threads.
Is there correctly working initialization of the time spent by new jobs in place already? Perhaps it initializes with a random memory value, which would explain why this occurs after some hours?
Ralph Ulrich
If I could tell you, I could fix it.
By the way, if I can't reproduce it, it's very difficult to debug a problem, and time is always an issue. However, since I have had time recently, if you could email me with more details we might be able to debug it together, if you're willing to try various patches.
I have sent you my .config
I have boosted RCU and can reduce time accounting overflows.
Ralph Ulrich
I am doing just fine without time accounting overflows now, by tuning RCU further:
zcat /proc/config.gz |grep RCU
# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_USER_QS is not set
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=4
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=14
CONFIG_RCU_BOOST_DELAY=320
CONFIG_HAVE_RCU_USER_QS=y
# CONFIG_PROVE_RCU_DELAY is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=32
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_RCU_CPU_STALL_INFO=y
# CONFIG_RCU_TRACE is not set
Con, if you want to reproduce it you could try RCU config variations in the opposite direction. Besides that, I also use the deadline I/O scheduler and do not set NO_HZ.
Ralph Ulrich
I tried many .config variations, but get a time accounting overflow anyway after some runtime. No luck so far.
Out of curiosity I tried to apply the BFS patch to the new Linux 3.8-rc:
- many line moves
- 5 fuzzy patches going through
- 2 rejects at ondemand.c
I can observe this overflow happening when compiling stuff for my Gentoo installation. Activity in the ondemand module might be involved when the time accounting failure occurs.
Maybe the next Linux version is error free :)
Greetings from Hamburg, Ralph Ulrich
Is there a patch anywhere for kernel 2.6.22?
The oldest version of BFS is 0.414 and is for kernel 3.1. There's nothing available for older kernels.
DeleteThanks for your updates, CK. As I reported suspicious_rcu_warn issue before, so I retest it with 427. But another dead lock issue is found. I will just post the log here.
[ 0.124430] ======================================================
[ 0.125331] [ INFO: possible circular locking dependency detected ]
[ 0.126233] 3.7.7+ #114 Not tainted
[ 0.126666] -------------------------------------------------------
[ 0.126666] BFS/0/1 is trying to acquire lock:
[ 0.126666] (sched_domains_mutex){+.+.+.}, at: [] sched_init_smp+0x129/0x2b2
[ 0.126666]
[ 0.126666] but task is already holding lock:
[ 0.126666] (&grq.lock){-.....}, at: [] sched_init_smp+0x101/0x2b2
[ 0.126666]
[ 0.126666] which lock already depends on the new lock.
[ 0.126666]
[ 0.126666]
[ 0.126666] the existing dependency chain (in reverse order) is:
[ 0.126666]
[ 0.126666] -> #1 (&grq.lock){-.....}:
[ 0.126666] [] lock_acquire+0x96/0xc0
[ 0.126666] [] _raw_spin_lock_irqsave+0x53/0x90
[ 0.126666] [] get_page_from_freelist+0x587/0x640
[ 0.126666] [] __alloc_pages_nodemask+0xf6/0x820
[ 0.126666] [] new_slab+0x67/0x270
[ 0.126666] [] __slab_alloc.isra.59.constprop.64+0x14d/0x242
[ 0.126666] [] __kmalloc+0x10c/0x150
[ 0.126666] [] pcpu_mem_zalloc+0x2b/0x80
[ 0.126666] [] pcpu_extend_area_map+0x31/0x120
[ 0.126666] [] pcpu_alloc+0x23b/0x9f0
[ 0.126666] [] __alloc_percpu+0xb/0x10
[ 0.126666] [] build_sched_domains+0x66/0xaf0
[ 0.126666] [] sched_init_smp+0x70/0x2b2
[ 0.126666] [] kernel_init_freeable+0x86/0x18b
[ 0.126666] [] kernel_init+0x9/0x100
[ 0.126666] [] ret_from_fork+0x7c/0xb0
[ 0.126666]
[ 0.126666] -> #0 (sched_domains_mutex){+.+.+.}:
[ 0.126666] [] __lock_acquire+0x1cef/0x1db0
[ 0.126666] [] lock_acquire+0x96/0xc0
[ 0.126666] [] mutex_lock_nested+0x72/0x3d0
[ 0.126666] [] sched_init_smp+0x129/0x2b2
[ 0.126666] [] kernel_init_freeable+0x86/0x18b
[ 0.126666] [] kernel_init+0x9/0x100
[ 0.126666] [] ret_from_fork+0x7c/0xb0
[ 0.126666]
[ 0.126666] other info that might help us debug this:
[ 0.126666]
[ 0.126666] Possible unsafe locking scenario:
[ 0.126666]
[ 0.126666] CPU0 CPU1
[ 0.126666] ---- ----
[ 0.126666] lock(&grq.lock);
[ 0.126666] lock(sched_domains_mutex);
[ 0.126666] lock(&grq.lock);
[ 0.126666] lock(sched_domains_mutex);
[ 0.126666]
[ 0.126666] *** DEADLOCK ***
[ 0.126666]
[ 0.126666] 1 lock held by BFS/0/1:
[ 0.126666] #0: (&grq.lock){-.....}, at: [] sched_init_smp+0x101/0x2b2
[ 0.126666]
[ 0.126666] stack backtrace:
[ 0.126666] Pid: 1, comm: BFS/0 Not tainted 3.7.7+ #114
[ 0.126666] Call Trace:
[ 0.126666] [] print_circular_bug+0x1fb/0x20c
[ 0.126666] [] __lock_acquire+0x1cef/0x1db0
[ 0.126666] [] ? sched_clock_cpu+0xa8/0x120
[ 0.126666] [] lock_acquire+0x96/0xc0
[ 0.126666] [] ? sched_init_smp+0x129/0x2b2
[ 0.126666] [] mutex_lock_nested+0x72/0x3d0
[ 0.126666] [] ? sched_init_smp+0x129/0x2b2
[ 0.126666] [] ? sched_init_smp+0x129/0x2b2
[ 0.126666] [] sched_init_smp+0x129/0x2b2
[ 0.126666] [] ? native_smp_cpus_done+0xa2/0xab
[ 0.126666] [] kernel_init_freeable+0x86/0x18b
[ 0.126666] [] ? schedule_tail+0x87/0x130
[ 0.126666] [] ? schedule_tail+0x48/0x130
[ 0.126666] [] ? rest_init+0x140/0x140
[ 0.126666] [] kernel_init+0x9/0x100
[ 0.126666] [] ret_from_fork+0x7c/0xb0
[ 0.126666] [] ? rest_init+0x140/0x140
Hi, CK. Regarding the accounting problem, I am not sure if this is the one you are trying to solve, but I still see it with 427 on the 3.7.7 kernel. To reproduce it:
1. compile the kernel source (I used 'time make -j4')
2. use top to monitor; some 'as' processes will show 9999 %CPU and a huge TIME+,
like this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15120 root 5 0 18860 5924 1168 S 9999 0.1 5124095h as
15111 root 7 0 50556 23m 3784 R 8 0.6 0:00.25 cc1
CK, are you working on a 3.8 patch?
I failed to merge the 3.7 patch into the new kernel. There seem to be some bigger changes in the scheduler interface, probably related to the new RCU stuff... I'm eager to test this with BFS, but there's no way to "brute force" the old code into the new kernel :)
Regarding the RCU stuff... I've experimented with it on my single-core uniprocessor. To start with, there are fewer kernel config options for me -- o.k. But changing the remaining ones did not make any difference in usability. Where before I always used the kernel default of 500:0, I tried 257:10, 331:10 and, most recently, 101:99:
~ # zcat /proc/config.gz |grep RCU
# RCU Subsystem
CONFIG_TINY_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=99
CONFIG_RCU_BOOST_DELAY=101
# CONFIG_PROVE_RCU_DELAY is not set
# CONFIG_SPARSE_RCU_POINTER is not set
# CONFIG_RCU_TRACE is not set
I only take prime numbers for the delay/frequency, as that prevents recurrence. ;-)
Varying these values seems to be useless on systems with one CPU/core. But disabling RCU boost resulted in hiccups in video and audio.
Best regards,
Manuel Krause
BTW, this was done with kernel 3.7.7. Manuel
BFQ v6 for the 3.8 kernel is out, as of 20130219, too:
http://algo.ing.unimo.it/people/paolo/disk_sched/patches/3.8.0-v6/
Best regards,
Manuel Krause
@CK: Do you need help? Is your family doing well? Please tell us how we as a community can support you, wherever needed!
Manuel Krause
We're ok, I'm just too busy right now, sorry. It will come out when it does.
Fine to read that you and your family are o.k.! :-) I hope your brother keeps recovering!
DeleteNo need for a hurry for BFS/CK. We all appreciate your well done work, and of course, only when you'd call it that name.
Best wishes, Manuel