Here's an updated BFS/CK which includes the one test patch I put on this blog after 463 and another trivial fix for the previous release. The patch fixed a lot of regressions including hangs with BTRFS and panics on shutdown.
BFS by itself:
4.1-sched-bfs-464.patch
-ck branded linux-4.1-ck2 patches:
4.1-ck2 patches
Enjoy!
Please enjoy!
Thank you. I am using the patch with 4.1.4 in PCLinuxOS and so far, everything is working well.
Galen
Hi Con,
many, many THANKS. It seems that with BFS version 0.464 the stability is back. So far, no crashes during heavy I/O on my server machine. Will test it on my laptop ASAP.
CU Mike
And no problems with ZEN kernel 4.1.5 and BFS on my laptop under heavy IO.
So again, thanks Con.
CU Mike
Great Work!
I don't know if this would bother Con too much?! Anyway...
Could any of you who had trouble with BFS 463 / 4.1-ck1 try Alfred Chen's -gc branch for comparison, to see if the issues persist with it?
The only patches you'd need are these two:
No.1: https://bitbucket.org/alfredchen/linux-gc/downloads/bfs_enhancement_v4.1_0463_1.patch
No.2: https://bitbucket.org/alfredchen/linux-gc/downloads/4.1_0463_1_rcu_stall_fix.patch
I would be glad if we could find some testers for this. Thank you in advance for reporting back, and
best regards,
Manuel Krause
The bare -gc branch fails, at least for me, under I/O load, so I'm testing the latest BFS update from Con.
Thanks for this Con.
For me BFS works noticeably better for QuakeLive under Wine (latency really matters for this game), but only if I disable the low-power C-states of the CPU. Something like this:
echo 1 |sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state4/disable
echo 1 |sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state3/disable
echo 1 |sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state2/disable
Maybe this is also useful for benchmarking the scheduler...
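The three commands above could also be written as a single loop. This is only a minimal sketch, assuming the same cpuidle sysfs layout as the paths in the comment; the function name `disable_deep_cstates` and its two arguments are made up for illustration, and which state index corresponds to which C-state differs between machines (each state directory has a `name` file to check).

```shell
#!/bin/sh
# Hypothetical helper: write 1 to the "disable" file of every cpuidle state
# whose index is >= min_state, for every CPU found under sysfs_root.
# Usage: disable_deep_cstates [min_state] [sysfs_root]
disable_deep_cstates() {
    min_state="${1:-2}"
    sysfs_root="${2:-/sys/devices/system/cpu}"
    for f in "$sysfs_root"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        [ -e "$f" ] || continue          # glob matched nothing
        state_dir="${f%/disable}"        # .../cpuidle/stateN
        idx="${state_dir##*/state}"      # N
        if [ "$idx" -ge "$min_state" ]; then
            echo 1 > "$f"                # needs root on a real system
        fi
    done
}
```

On a real system the function has to run as root. Writing 0 to the same files re-enables the states.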
Thanks a lot Con! -ck2 is stable for me with 4.1.5 and 4.1.6, it's the first stable version since 3.16.7-ck2.
I think there is a bug in BFS, since I can't reproduce this with CFS.
After some time one of my cores stops entering the low-power c3/c6 states.
Here is an example output of the 'turbostat --debug' command:
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CoreTmp Pkg%pc3 Pkg%pc6
   -   -      29  1.96    1489    2394   0  50.47   0.23  47.35      56    0.00    0.00
   0   0      29  1.30    2213    2394   6   3.56   0.46  94.69      55    0.00    0.00
   0   2      46  3.48    1331    2394   6   1.37
   2   1      31  2.45    1287    2394   6  97.55   0.00   0.00      56
   2   3      10  0.61    1656    2394   6  99.38
As you can see, core 2 never enters the c3/c6 states. This almost doubles the power consumption of my laptop. Before this happens the power consumption is pretty much equal to that of CFS. I can trigger this almost reliably by touching a file in a kernel tree and doing a make -j4 while firefox + emacs are open. Can someone reproduce this? Any suggestions how to debug this?
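One possible way to cross-check the turbostat numbers is to read the kernel's own cpuidle counters from sysfs. The sketch below is an assumption-laden illustration, not a known fix: the function name `show_cstate_counters` is made up, and these counters reflect the kernel's cpuidle accounting, which may not match the hardware residency that turbostat reports.

```shell
#!/bin/sh
# Hypothetical helper: print "cpu state name usage time_us" for every
# cpuidle state found under sysfs_root. "usage" is the entry count and
# "time" is the accumulated time in microseconds, as exposed by cpuidle.
# Usage: show_cstate_counters [sysfs_root]
show_cstate_counters() {
    sysfs_root="${1:-/sys/devices/system/cpu}"
    for d in "$sysfs_root"/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -d "$d" ] || continue          # glob matched nothing
        cpu_dir="${d%/cpuidle/*}"        # .../cpuN
        cpu="${cpu_dir##*/}"             # cpuN
        state="${d##*/}"                 # stateM
        printf '%s %s %s %s %s\n' "$cpu" "$state" \
            "$(cat "$d/name")" "$(cat "$d/usage")" "$(cat "$d/time")"
    done
}
```

Running it twice a few seconds apart and comparing the output should show which CPU has stopped increasing its usage count for the deeper states.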
One more observation.
If I start doing I/O, e.g. 'cat /dev/zero >tmp/zzzzzzzz', the core starts to enter the c3/c6 states.
More precisely...
The core starts entering the low-power states only during the I/O. If the I/O stops, the core stops entering the low-power states again.
So now I can trigger this reliably.
After a reboot I just need to 'cat /dev/zero >tmp/zzzzzzzz' and one of the cores stops entering the low-power states.
It doesn't happen if I revert bfs463-revert-unplugged.patch.
Currently I am working on the unplugged_io issue and have come up with a trial patch for testing; @pf and @kernelOfTruth have tested it and reported positive results.
You can try it and see if it helps with your C3/C6 state issue and the unplugged_io issue (if you have it). The patch is at https://bitbucket.org/alfredchen/linux-gc/downloads/sched_submit_work_02.patch
Thx,
after reverting bfs463-revert-unplugged.patch on BFS464 the C3/C6 problem disappeared for me. As far as I can see, in addition to reverting bfs463-revert-unplugged.patch, your patch adds some more checks to sched_submit_work. I guess this is meant to solve the freezes which some people had during I/O on btrfs. I'm using ext4 and I haven't had any freezes so far. It seems that only people on btrfs have issues, so it could be a btrfs bug.
@Anonymous:
Then maybe you wanna read this:
http://cchalpha.blogspot.de/2015/08/the-bfs-unpluged-io-issue.html
Best regards,
Manuel Krause
Hi all,
I am cautiously asking if anyone has experienced problems with the nvidia blob. I have been running into freezes with 4.1.6 + BFS 464 and nvidia 352.30. I don't know why they suddenly happened; I've had temporary freezes in the past but they went away. This time round I upgraded a lot to try and fix it (Xorg is now 1.17.2 and gcc is 4.9.3) and rebuilt the kernel and modules. Afterwards the freezes were worse and ended in a complete UI freeze. Via SSH I could still work on the system. I noticed the system log had the following entries corresponding to the freeze events:
[ 1898.883851] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000010
[ 1900.889007] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Now I'm pretty sure the nvidia blob is to blame; however, it is scheduler-related and so far does not occur with the vanilla kernel (I am still testing).
Any ideas welcome.
Martin
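For anyone trying to match freezes against log entries like the two above, a small filter over the kernel log may help. This is just a sketch: the function name `nvrm_lines` is made up, and it assumes only that the driver's messages carry the `NVRM:` tag seen in the quoted lines.

```shell
#!/bin/sh
# Hypothetical helper: keep only the NVRM lines from a kernel log on stdin.
# "|| true" keeps the exit status at 0 when no line matches.
nvrm_lines() {
    grep -F 'NVRM:' || true
}
```

It could be fed from `dmesg`, for example `dmesg | nvrm_lines`, and the timestamps compared with the times of the freezes.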
My port of BFS to 4.2, along with other enhancements on top of it, is done; see http://cchalpha.blogspot.com/2015/09/42-sync-up-completed-for-gc-branch.html
You can try it to have fun with BFS in 4.2 before ck's next update.
BR Alfred
Thx.
Runs flawlessly.
4.1.9
Multiple people report system freezes when running -ck patched 4.1.9 and 4.1.10 kernels. There is little information to provide because nothing related to the freezes is written to system logs. See here: https://bbs.archlinux.org/viewtopic.php?pid=1566879#p1566879 (linked post and others after it in the thread)
Hi,
I got RCU stalls on 4.1.8, 4.1.9.
Remote server not responding anymore.
Even with no task running.
Happens randomly.
[35381.296473] INFO: rcu_preempt self-detected stall on CPU
[35381.296490] 2: (1 GPs behind) idle=5a3/2/0 softirq=211619/211621 fqs=5480121
[35381.296518] (t=16440365 jiffies g=10239 c=10238 q=86010)
[35381.296535] Task dump for CPU 2:
[35381.296536] BFS/2 R running task 0 0 1 0x00000008
[35381.296537] 0000000000000003 ffffffff816244c0 ffffffff8106919f 00000000000027ff
[35381.296538] ffff88041fb14500 ffffffff816244c0 ffffffff816244c0 ffffffff8165b520
[35381.296539] ffffffff8106c668 ffff88041fb03bf8 ffff88041fb14500 ffff88041fb03c08
[35381.296541] Call Trace:
[35381.296541] [] ? rcu_dump_cpu_stacks+0x7f/0xc0
[35381.296544] [] ? rcu_check_callbacks+0x488/0x870
[35381.296545] [] ? rcu_check_callbacks+0x174/0x870
[35381.296546] [] ? tick_init_highres+0x10/0x10
[35381.296548] [] ? update_process_times+0x31/0x60
[35381.296549] [] ? tick_sched_timer+0x41/0x160
[35381.296550] [] ? tick_init_highres+0x10/0x10
[35381.296551] [] ? __run_hrtimer.isra.37+0x44/0xf0
[35381.296552] [] ? hrtimer_interrupt+0xd5/0x210
[35381.296554] [] ? smp_apic_timer_interrupt+0x35/0x50
[35381.296555] [] ? apic_timer_interrupt+0x68/0x70
[35381.296556] [] ? _raw_spin_unlock_irqrestore+0x6/0x20
[35381.296557] [] ? try_to_del_timer_sync+0x3f/0x60
[35381.296558] [] ? del_timer_sync+0x3a/0x50
[35381.296559] [] ? del_timer_sync+0x42/0x50
[35381.296561] [] ? inet_csk_reqsk_queue_drop+0x71/0x1e0
[35381.296562] [] ? reqsk_timer_handler+0x11b/0x270
[35381.296564] [] ? inet_csk_reqsk_queue_drop+0x1e0/0x1e0
[35381.296565] [] ? call_timer_fn.isra.28+0x15/0x70
[35381.296566] [] ? inet_csk_reqsk_queue_drop+0x1e0/0x1e0
[35381.296567] [] ? run_timer_softirq+0x1b0/0x240
[35381.296569] [] ? __do_softirq+0xd4/0x1f0
[35381.296570] [] ? irq_exit+0x55/0x60
[35381.296572] [] ? smp_apic_timer_interrupt+0x3a/0x50
[35381.296573] [] ? apic_timer_interrupt+0x68/0x70
[35381.296574] [] ? cpuidle_enter_state+0x93/0x140
[35381.296576] [] ? cpuidle_enter_state+0x8c/0x140
[35381.296577] [] ? cpu_startup_entry+0x201/0x280
EDIT: ^^^4.1.9 only, sorry.
This is a problem in 4.1.9 & 4.1.10 and has *nothing* to do with BFS. See http://www.spinics.net/lists/kernel/msg2087851.html for a fix.
Ok.
My bad.
Thanks.
^^ changed this one line in the source.
Running stable so far.
Thanks again.
^ Nothing said.
Just a heads up that 4.1.12 will break compilation of BFS because of a change in the common scheduler API. I already sent a patch to Con to fix it, so please don't panic. :)
@holger:
It would be more than fair of you to also post/upload your patch here for other users and give an address to download it from.
Thx.
I posted a version for/in Alfred Chen's branch several days ago.
BR Manuel Krause
So 4.1.12 is finally out. You can find the patch at:
https://raw.githubusercontent.com/hhoffstaette/kernel-patches/master/4.1/bfs-009-add-preempt_offset-argument-to-should_resched%28%29.patch
Simply apply this on top.
Thank you for your needless double work!
BR Manuel Krause
Looks like it's working just fine:
https://github.com/sirlucjan/aur/tree/master/linux-bfs
Of course it's working; it's just a back-copy of mainline changes that were tested days ago.
BR Manuel Krause