Here's an updated BFS/CK which includes the one test patch I put on this blog after 463 and another trivial fix for the previous release. The patch fixed a lot of regressions including hangs with BTRFS and panics on shutdown.
BFS by itself:
-ck branded linux-4.1-ck1 patches:
Thank you. I am using the patch with 4.1.4 in PCLinuxOS and so far, everything is working well.
Many, many THANKS. It seems that with BFS version 0.464 the stability is back. So far, no crashes during heavy I/O on my server machine. Will test it on my laptop ASAP.
And no problems with ZEN kernel 4.1.5 and BFS on my laptop under heavy I/O.
So again, thanks Con.
I don't know if this would bother Con too much?! Anyways...
Could any of you who had trouble with BFS 463 / 4.1-ck1 try Alfred Chen's -gc branch for comparison, to see whether the issues persist with it?
The only patches you'd need are these two:
I would be glad if we could find some testers for this. Thank you in advance for reporting back, and
The bare -gc branch fails, at least for me, under I/O load, so I'm testing the latest BFS update from Con.
Thanks for this Con.
For me BFS works noticeably better for QuakeLive under wine (latency really matters for this game), but only if I disable the low-power C-states of the CPU. Something like this:
echo 1 | sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state4/disable
echo 1 | sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state3/disable
echo 1 | sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state2/disable
Maybe this is also useful for benchmarking the scheduler....
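The three commands above can be folded into a small loop. This is only a sketch: set_cstates is a hypothetical helper name, and which stateN index corresponds to which C-state differs between CPUs and idle drivers (the name file next to each disable file tells you), so the threshold of 2 is an assumption carried over from the commands above.

```shell
#!/bin/sh
# Sketch: write VALUE (1 = disable, 0 = enable) to the "disable" file of every
# cpuidle state whose index is >= MIN_STATE, for all CPUs found under ROOT.
# set_cstates is a hypothetical helper name, not part of any existing tool.
set_cstates() {
    root=$1
    min_state=$2
    value=$3
    for f in "$root"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        [ -e "$f" ] || continue          # the glob matched nothing
        dir=${f%/disable}                # .../cpuidle/stateN
        idx=${dir##*/state}              # N
        if [ "$idx" -ge "$min_state" ]; then
            printf '%s\n' "$value" > "$f"
        fi
    done
}

# Example (run as root): disable state2 and deeper, later re-enable them.
#   set_cstates /sys/devices/system/cpu 2 1
#   set_cstates /sys/devices/system/cpu 2 0
```

Taking the sysfs root as a parameter makes it possible to try the function on a scratch directory first. The setting does not survive a reboot.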
Thanks a lot Con! -ck2 is stable for me with 4.1.5 and 4.1.6; it's the first stable version since 3.16.7-ck2.
I think there is a bug in BFS, since I can't reproduce this with CFS.
After some time one of my cores stops entering low-power c3/c6 states.
Here is an example output of the 'turbostat --debug' command:
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CoreTmp Pkg%pc3 Pkg%pc6
   -   -      29  1.96    1489    2394   0  50.47   0.23  47.35      56    0.00    0.00
   0   0      29  1.30    2213    2394   6   3.56   0.46  94.69      55    0.00    0.00
   0   2      46  3.48    1331    2394   6   1.37
   2   1      31  2.45    1287    2394   6  97.55   0.00   0.00      56
   2   3      10  0.61    1656    2394   6  99.38
As you can see, core 2 never enters the c3/c6 states. This almost doubles the power consumption of my laptop. Before this happens, the power consumption is about equal to that under CFS. I can trigger this almost reliably by touching a file in a kernel tree and doing a make -j4 while firefox + emacs are open. Can someone reproduce this? Any suggestions on how to debug this?
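One way to cross-check the turbostat numbers is to read the kernel's own per-state idle counters from sysfs. The sketch below assumes the standard cpuidle layout, where each stateN directory has a name file and a time file holding the total microseconds spent in that state. cstate_snapshot is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "cpuN <state name> <total microseconds in state>" for every
# cpuidle state found under ROOT.
# cstate_snapshot is a hypothetical helper name, not part of any existing tool.
cstate_snapshot() {
    root=$1
    for d in "$root"/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -r "$d/time" ] || continue     # the glob matched nothing usable
        rel=${d#"$root"/}                # cpuN/cpuidle/stateM
        cpu=${rel%%/*}                   # cpuN
        printf '%s %s %s\n' "$cpu" "$(cat "$d/name")" "$(cat "$d/time")"
    done
}

# Example: take two snapshots ten seconds apart and compare them.
#   cstate_snapshot /sys/devices/system/cpu > before
#   sleep 10
#   cstate_snapshot /sys/devices/system/cpu > after
#   diff before after
```

A CPU whose deep-state counters do not move between the two snapshots while the machine is idle would match the behaviour described above.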
One more observation.
If I start doing I/O, e.g. 'cat /dev/zero >tmp/zzzzzzzz', the core starts to enter the c3/c6 states.
The core enters the low-power states only during the I/O. If the I/O stops, the core stops entering the low-power states again.
So now I can trigger this reliably.
After a reboot I just need to 'cat /dev/zero >tmp/zzzzzzzz' and one of the cores stops entering the low-power states.
It doesn't happen if I revert bfs463-revert-unplugged.patch
I am currently working on the unplugged_io issue and have come up with a trial patch for testing; @pf and @kernelOfTruth have tested it and reported positive results.
You can try it and see if it helps with your C3/C6 state issue and the unplugged_io issue (if you have it). The patch is at https://bitbucket.org/alfredchen/linux-gc/downloads/sched_submit_work_02.patch
After reverting bfs463-revert-unplugged.patch on BFS 464 the C3/C6 problem disappeared for me. As far as I can see, your patch, in addition to reverting bfs463-revert-unplugged.patch, adds some more checks to sched_submit_work. I guess this is meant to solve the freezes which some people had during I/O on btrfs. I'm using ext4 and I haven't had any freezes so far. It seems that only people on btrfs have issues, so it could be a btrfs bug.
Then maybe you wanna read this:
I am cautiously asking whether anyone has experienced problems with the nvidia blob. I have been running into freezes with 4.1.6 + BFS 464 and nvidia 352.30. I don't know why they suddenly started; I've had temporary freezes in the past, but they went away. This time round I upgraded a lot to try to fix it (Xorg is now 1.17.2 and gcc is 4.9.3) and rebuilt the kernel and modules. Afterwards the freezes were worse and ended in a complete UI freeze. Via SSH I could still work on the system. I noticed the system log had the following entries corresponding to the freeze events:
[ 1898.883851] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000010
[ 1900.889007] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Now I'm pretty sure the nvidia blob is to blame; however, it is scheduler related and so far does not occur with the vanilla kernel (I am still testing).
Any ideas welcome.
My port of BFS to 4.2, with other enhancements on top of it, is done: http://cchalpha.blogspot.com/2015/09/42-sync-up-completed-for-gc-branch.html
You can try it to have fun with bfs in 4.2 before ck's next update.
Multiple people report system freezes when running -ck patched 4.1.9 and 4.1.10 kernels. There is little information to provide because nothing related to the freezes is written to system logs. See here: https://bbs.archlinux.org/viewtopic.php?pid=1566879#p1566879 (linked post and others after it in the thread)
I got RCU stalls on 4.1.8, 4.1.9.
Remote server not responding anymore.
Even with no task running.
[35381.296473] INFO: rcu_preempt self-detected stall on CPU
[35381.296490] 2: (1 GPs behind) idle=5a3/2/0 softirq=211619/211621 fqs=5480121
[35381.296518] (t=16440365 jiffies g=10239 c=10238 q=86010)
[35381.296535] Task dump for CPU 2:
[35381.296536] BFS/2 R running task 0 0 1 0x00000008
[35381.296537] 0000000000000003 ffffffff816244c0 ffffffff8106919f 00000000000027ff
[35381.296538] ffff88041fb14500 ffffffff816244c0 ffffffff816244c0 ffffffff8165b520
[35381.296539] ffffffff8106c668 ffff88041fb03bf8 ffff88041fb14500 ffff88041fb03c08
[35381.296541] Call Trace:
[35381.296541]  ? rcu_dump_cpu_stacks+0x7f/0xc0
[35381.296544]  ? rcu_check_callbacks+0x488/0x870
[35381.296545]  ? rcu_check_callbacks+0x174/0x870
[35381.296546]  ? tick_init_highres+0x10/0x10
[35381.296548]  ? update_process_times+0x31/0x60
[35381.296549]  ? tick_sched_timer+0x41/0x160
[35381.296550]  ? tick_init_highres+0x10/0x10
[35381.296551]  ? __run_hrtimer.isra.37+0x44/0xf0
[35381.296552]  ? hrtimer_interrupt+0xd5/0x210
[35381.296554]  ? smp_apic_timer_interrupt+0x35/0x50
[35381.296555]  ? apic_timer_interrupt+0x68/0x70
[35381.296556]  ? _raw_spin_unlock_irqrestore+0x6/0x20
[35381.296557]  ? try_to_del_timer_sync+0x3f/0x60
[35381.296558]  ? del_timer_sync+0x3a/0x50
[35381.296559]  ? del_timer_sync+0x42/0x50
[35381.296561]  ? inet_csk_reqsk_queue_drop+0x71/0x1e0
[35381.296562]  ? reqsk_timer_handler+0x11b/0x270
[35381.296564]  ? inet_csk_reqsk_queue_drop+0x1e0/0x1e0
[35381.296565]  ? call_timer_fn.isra.28+0x15/0x70
[35381.296566]  ? inet_csk_reqsk_queue_drop+0x1e0/0x1e0
[35381.296567]  ? run_timer_softirq+0x1b0/0x240
[35381.296569]  ? __do_softirq+0xd4/0x1f0
[35381.296570]  ? irq_exit+0x55/0x60
[35381.296572]  ? smp_apic_timer_interrupt+0x3a/0x50
[35381.296573]  ? apic_timer_interrupt+0x68/0x70
[35381.296574]  ? cpuidle_enter_state+0x93/0x140
[35381.296576]  ? cpuidle_enter_state+0x8c/0x140
[35381.296577]  ? cpu_startup_entry+0x201/0x280
EDIT: ^^^4.1.9 only, sorry.
This is a problem in 4.1.9 & 4.1.10 and has *nothing* to do with BFS. See http://www.spinics.net/lists/kernel/msg2087851.html for a fix.
^^ changed this one line in the source.
Running stable so far.
^ Nothing said.
Just a heads up that 4.1.12 will break compilation of BFS because of a change in the common scheduler API. I already sent a patch to Con to fix it, so please don't panic. :)
It would be more than fair of you to also post/upload your patch here for other users and give an address to download it from.
I posted a version for/in Alfred Chen's branch several days ago.
BR Manuel Krause
So 4.1.12 is finally out. You can find the patch at:
Simply apply this on top.
Thank you for your needless double work!
BR Manuel Krause
Looks like it's working just fine:
Of course, it's working, it's just a back-copy of mainline changes that had been tested days ago.
BR Manuel Krause