-ck hacking: BFS 0.441, 3.11-ck1

Monday, 9 September 2013

BFS 0.441, 3.11-ck1

Announcing a resync and update of the BFS CPU scheduler for linux-3.11.

BFS by itself:
3.11-sched-bfs-441.patch

Full -ck1 patchset including separate patches:
3.11-ck1

Apart from the usual resync to keep up with the mainline churn, there are a few additions since BFS 0.440. A number of changes dealing with wake lists, as done by mainline, were added that were missing from the previous code. There is a good chance that their absence was responsible for a large proportion of the suspend/resume issues people have been having with BFS since Linux 3.8. Of course I can't guarantee that all issues have been resolved, but it has been far more stable in my testing so far.

The other significant change is to check for throttled CPUs when choosing an idle CPU to move a process to, which should impact the behaviour and possibly throughput when using a scaling CPU governor, such as ondemand.
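
As a rough illustration of the idea (not the actual BFS code, which is C in kernel/sched/bfs.c), choosing an idle CPU while taking throttling into account could be modelled as below. The Cpu class, its throttled flag and the fall-back to a throttled idle CPU are all assumptions made for this sketch; the only thing taken from the change itself is that throttled CPUs are now checked.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cpu:
    cpu_id: int
    idle: bool
    # Hypothetical flag: True while a scaling governor keeps the clock below maximum.
    throttled: bool


def pick_idle_cpu(cpus: Sequence[Cpu]) -> Optional[int]:
    """Prefer an idle CPU that is not throttled; otherwise fall back to any idle CPU."""
    fallback = None
    for cpu in cpus:
        if not cpu.idle:
            continue  # busy CPUs are never candidates in this toy model
        if not cpu.throttled:
            return cpu.cpu_id  # best case: idle and running at full speed
        if fallback is None:
            fallback = cpu.cpu_id  # remember the first throttled idle CPU
    return fallback  # None when no CPU is idle
```

With a busy CPU 0, a throttled idle CPU 1 and an unthrottled idle CPU 2, this model picks CPU 2.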

Those of you still using the evil proprietary Nvidia binary driver (as I still do) will encounter some issues and will need to use a patched pre-release driver from them if you build it yourself, until they release a new driver.

That is all for now.

Enjoy!
Please enjoy!

123 comments:

graysky, 9 September 2013 at 18:36
Thanks CK! Glad you were able to find some time to dig into the suspend problem. I know the linux-ck user base will gladly provide feedback.

I will perform the usual "make" benchmarks and post to my blog (linking here). Stay tuned.

Anonymous, 9 September 2013 at 18:38
"which should impact the behaviour and possibly throughput"

positively or negatively?

Anonymous, 9 September 2013 at 19:25
Thanks a lot for this update!

Oleksandr Natalenko, 9 September 2013 at 19:28
It works, but I still needed the following patch to make the kernel bootable:

https://gist.github.com/d2bfec5758573341d656

Anonymous, 9 September 2013 at 22:06
THANKS!

I appreciate your work.

Anonymous, 10 September 2013 at 01:59
Thank you for your work.

Quick question: for those of us using the evil binary driver from NVIDIA, what patched pre-release driver are you referring to?

CS, 10 September 2013 at 05:55
Sleep/suspend still doesn't work well on my machine. How can I debug this?
I'm running Arch Linux with graysky's binaries. When the sleep procedure is initiated, the disks power down, but the PC stays on with the fans running and a blinking white cursor on the screen.

Anonymous, 10 September 2013 at 08:03
Regarding the microsleep ondemand fix, I would like to see a backport to linux-3.10.
(This LTS kernel will stay in use for a long time among many Gentoo users, for example.)

Greetings and thanks from Hamburg, Germany
Ralph Ulrich

Anonymous, 10 September 2013 at 22:15
Regarding the sleep/suspend issue, I got the following message after an
(albeit successful) resume on a ThinkPad R60:

WARNING: CPU: 1 PID: 1804 at kernel/trace/ring_buffer.c:2571 rb_reserve_next_event.isra.48+0x227/0x327()
Delta way too big! 18446741873297462717 ts=18446744063829990609 write stamp = 2190532527892
If you just came from a suspend/resume,
please switch to the trace global clock:
echo global > /sys/kernel/debug/tracing/trace_clock

Hope this may be of help
Greetings from Rome,
Simone
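
The clock switch suggested by the kernel message can also be scripted. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root; parse_trace_clock and set_global_clock are illustrative names, not part of any kernel tooling. The trace_clock file lists the available clocks on one line, with the active one in square brackets.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed location; it depends on where debugfs is mounted.
TRACE_CLOCK = Path("/sys/kernel/debug/tracing/trace_clock")


def parse_trace_clock(text: str) -> Tuple[str, List[str]]:
    """Return (active_clock, all_clocks) from the contents of trace_clock."""
    active = ""
    names = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets that mark the active clock
            active = token
        names.append(token)
    return active, names


def set_global_clock(path: Path = TRACE_CLOCK) -> None:
    """Equivalent of: echo global > /sys/kernel/debug/tracing/trace_clock"""
    path.write_text("global\n")
```

Calling parse_trace_clock(TRACE_CLOCK.read_text()) before and after set_global_clock() shows whether the switch took effect.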

Anonymous, 12 September 2013 at 06:54
Will these changes be ported to kernel 3.10?

As it is a long term kernel I'd like to keep using it for a while.

Qba, 12 September 2013 at 19:23
I'm experiencing high battery drain on BFS. My laptop's battery depletes after ~2h on BFS compared to 4-5h on CFS. :(

Anonymous, 16 September 2013 at 19:08
Kernel 3.11 is soo good that I don't have to use BSF anymore...

Anonymous, 16 September 2013 at 19:08
BFS*

Anonymous, 17 September 2013 at 12:13
3.11 with BFS & BFQ is worse than ever seen.

I've ported my setup to the new machine with 3.10.xyz, and many things got better, but the usual problems remained:
SWAP, SHM & their interaction got worse. Unusable with 3.11.

With 3.10.12 + both patchsets I don't have as many problems.

And I won't use 3.11.z any longer with your promoted patches, CK, until there is a usable kernel for them.

Please also prepare a backport of BFS-422 for the long-term 3.10.x!

Thanks, Manuel Krause

Anonymous, 17 September 2013 at 12:35
I meant BFS 442 (not 422). Sorry for the typo. Manuel

Anonymous, 17 September 2013 at 12:51
And the backport should clarify if it's 3.11 OR 3.10 related, Manuel

techaddicted, 24 September 2013 at 00:31
Dear graysky,
I noticed that if I use the newest linux-ck (Sandybridge), my external USB mouse stops working after about 5 seconds.
With linux-3.11.1-1 everything seems fine.
Has anybody else noticed that? Or is it just me (Lenovo E530, i7)?

techaddicted, 25 September 2013 at 05:32
Dear graysky, it seems that this is a problem with the patchset. I did all the things you said and still have the same problem. But before the update to 3.11.1-2 everything worked fine. So maybe something is wrong with that?

techaddicted, 26 September 2013 at 03:33
Mystery solved!
All that weird stuff happened because of Laptop-Tools (USB Suspend); now that it is disabled, everything works perfectly.

Anonymous, 9 October 2013 at 11:09
I made the backport of
3.11-sched-bfs-442.patch to
3.10-sched-bfs-442-new.patch
for use with the long-term stable Linux 3.10.

It was quite easy. Besides copying, I just
added a spinlock.h function alias.

See the attachment in the Gentoo bug at
https://bugs.gentoo.org/show_bug.cgi?id=487362

Greetings from Hamburg, Germany
Ralph Ulrich

Alfred Chen, 9 October 2013 at 14:17
Here is my patch to remove the deadlock warning at system boot, which I reported some time ago.

diff --git a/kernel/sched/bfs.c b/kernel/sched/bfs.c
index 763d417..3a617be 100644
--- a/kernel/sched/bfs.c
+++ b/kernel/sched/bfs.c
@@ -6940,6 +6940,7 @@ void __init sched_init_smp(void)
BUG();
free_cpumask_var(non_isolated_cpus);

+ mutex_lock(&sched_domains_mutex);
grq_lock_irq();
/*
* Set up the relative cache distance of each online cpu from each
@@ -6953,7 +6954,6 @@ void __init sched_init_smp(void)
for_each_online_cpu(cpu) {
struct rq *rq = cpu_rq(cpu);

- mutex_lock(&sched_domains_mutex);
for_each_domain(cpu, sd) {
int locality, other_cpu;

@@ -6983,7 +6983,6 @@ void __init sched_init_smp(void)
rq->cpu_locality[other_cpu] = locality;
}
}
- mutex_unlock(&sched_domains_mutex);

/*
* Each runqueue has its own function in case it doesn't have
@@ -6999,6 +6998,7 @@ void __init sched_init_smp(void)
#endif
}
grq_unlock_irq();
+ mutex_unlock(&sched_domains_mutex);
}
#else
void __init sched_init_smp(void)

Unknown, 10 October 2013 at 09:28
I finally managed to find at least a workaround for the suspend crash.
It seems there is a problem with process migration in the CPU offline code.
Simply run this command before suspending:
ps -eo pid | xargs -I'{}' taskset -pc 0 {}
This pins all processes to CPU 0, so no migration is needed.
After resuming, run this (change the range to match your CPU count; this one is for a 4-core i7):
ps -eo pid | xargs -I'{}' taskset -pc 0-7 {}
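
The same workaround can be written as a small script. This is a sketch under assumptions, not a tested fix: pin_all and cpu_set are hypothetical helper names, it must run as root on Linux, and like the taskset one-liners it only changes the main thread of each process. Processes that cannot be moved (for example per-CPU kernel threads) are skipped.

```python
import os


def cpu_set(count):
    """Return the set of logical CPU numbers 0..count-1."""
    return set(range(count))


def pin_all(cpus):
    """Set the CPU affinity of every process listed in /proc; return how many were changed."""
    changed = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            os.sched_setaffinity(int(entry), cpus)
            changed += 1
        except OSError:
            # The process exited, permission was denied, or the thread is bound to one CPU.
            continue
    return changed
```

Before suspending, call pin_all({0}); after resuming, call pin_all(cpu_set(8)) for the 0-7 range used above, or pin_all(cpu_set(os.cpu_count() or 1)) to cover whatever the machine reports.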

techaddicted, 15 October 2013 at 02:57
Hi graysky, it's me again.
I updated my linux-ck-sandybridge via repo-ck to 3.11.5, and now I get this problem when I try to load the nvidia module:

sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Exec format error

This doesn't happen if I use the stock Arch kernel (3.11.4).

Should I run nvidia-bug-report and post the results?

Greetings,
Jan

techaddicted, 15 October 2013 at 03:48
Downloaded the new nvidia-ck-sandybridge driver, now works! Thanks!
:)

Anonymous, 16 October 2013 at 01:53
Yes, this is off-topic, but I think it's interesting:

"Here's Why Radeon Graphics Are Faster On Linux 3.12"
http://www.phoronix.com/scan.php?page=article&item=linux_312_performance&num=1

https://patchwork.kernel.org/patch/2670981/
https://patchwork.kernel.org/patch/2670991/
https://patchwork.kernel.org/patch/2671001/

have fun

Anonymous, 19 October 2013 at 06:33
performance drop with ck1, can somebody confirm this?

Phoronix Test Suite v4.8.3
running only point 3: "Run Complex System Test"

3.11.5+stable_queue / 3.11.5+stable_queue+ck1
3048 / 2173 TPS -- PostMark
7545.56 / 7026.06 MB/s -- RAMspeed SMP [Average/Integer]
7598.59 / 7142.49 MB/s -- RAMspeed SMP [Average/Floating Point]
43.35 / 44.09 Seconds -- C-Ray
22213.19 / 14038.33 RPS -- Apache Benchmark

I will test BFS alone (without the rest of ck) later.

Anonymous, 20 October 2013 at 02:17
3.11.5+stable_queue / 3.11.5+stable_queue+ck1 / 3.11.5+stable_queue+bfs
3048 / 2173 / 2083 TPS -- PostMark
7545.56 / 7026.06 / 7037.25 MB/s -- RAMspeed SMP [Average/Integer]
7598.59 / 7142.49 / 7148.92 MB/s -- RAMspeed SMP [Average/Floating Point]
43.35 / 44.09 / 44.05 Seconds -- C-Ray
22213.19 / 14038.33 / 13624.50 RPS -- Apache Benchmark
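
To make the size of the drop easier to see, the figures can be turned into relative changes against the unpatched kernel. A small sketch using only the numbers posted above; pct_change is an illustrative helper name. Note that C-Ray reports seconds, so for that row a positive change means slower.

```python
def pct_change(baseline, patched):
    """Relative change of patched versus baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


# (baseline, ck1, bfs) as posted for 3.11.5+stable_queue
results = {
    "PostMark (TPS)": (3048, 2173, 2083),
    "RAMspeed SMP Integer (MB/s)": (7545.56, 7026.06, 7037.25),
    "RAMspeed SMP Floating Point (MB/s)": (7598.59, 7142.49, 7148.92),
    "C-Ray (seconds, lower is better)": (43.35, 44.09, 44.05),
    "Apache Benchmark (RPS)": (22213.19, 14038.33, 13624.50),
}

for name, (base, ck1, bfs) in results.items():
    print(f"{name}: ck1 {pct_change(base, ck1):+.1f}%, bfs {pct_change(base, bfs):+.1f}%")
```

For PostMark this works out to roughly -28.7% with ck1, and for the Apache benchmark roughly -36.8%.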

Unknown, 20 October 2013 at 05:28
baseline 3.11.6 / bfs 3.11.6
7076 / 7142 -- PostMark
15483.20 / 15483.20 -- RAMspeed SMP [Average/Integer]
15500.74 / 15500.74 -- RAMspeed SMP [Average/Floating Point]
25.46 / 24.25 -- C-Ray
19177.54 / 33466.23 -- Apache Benchmark

Anonymous, 21 October 2013 at 08:25
Kernel: 3.11.6 | Scheduler: cfs/bfs

_1: - .config from siduction
cfs_1: - IRQ_TIME_ACCOUNTING=y
bfs_1: - IRQ_TIME_ACCOUNTING=y SCHED_BFS=y
apart from that, cfs_1 & bfs_1 are identical (except NUMA_BALANCING | CGROUP_CPUACCT | CGROUP_SCHED | SCHED_AUTOGROUP (all disabled by bfs))

_2: - .config by me
cfs_2: - TICK_CPU_ACCOUNTING=y
bfs_2: - IRQ_TIME_ACCOUNTING=y SCHED_BFS=y
bfs_3: - TICK_CPU_ACCOUNTING=y SCHED_BFS=y
apart from that, cfs_2 & bfs_2 are identical (except NUMA_BALANCING | CGROUP_CPUACCT | CGROUP_SCHED | SCHED_AUTOGROUP !!!and!!! TICK_CPU_ACCOUNTING (all disabled by bfs))
bfs_2 & bfs_3 are identical, except IRQ_TIME_ACCOUNTING/TICK_CPU_ACCOUNTING

cfs_1 / bfs_1 / cfs_2 / bfs_2 / bfs_3
2403.00 / 1693.00 / 3086.00 / 2174.00 / 2130.00 TPS -- PostMark
7242.90 / 7463.08 / 7481.70 / 7067.49 / 7260.16 MB/s -- RAMspeed SMP [Int]
7339.97 / 7504.71 / 7576.60 / 7110.45 / 7400.83 MB/s -- RAMspeed SMP [Float]
0044.18 / 0043.27 / 0043.36 / 0043.99 / 0043.33 Sec -- C-Ray
18725.01 / 4364.33 / 23478.09 / 14183.98 / 14443.12 RPSec -- Apache Benchmark

There is a large performance improvement with CFS and my .config (cfs_2).

With BFS and a nearly identical config (bfs_2), performance drops.
With TICK_CPU_ACCOUNTING=y, the loss in the RAMspeed test is not as large (bfs_3).

Unknown, 24 October 2013 at 02:53
I just noticed that while I am running a compile job with -j set to the number of cores in my system, UI responsiveness suffers (e.g. the mouse cursor stutters badly and the web browser takes a very long time to load pages).

I am wondering whether this behavior is surprising, and whether using schedtool -I on the X server would be a good way to work around the issue.

幻影火, 28 October 2013 at 00:59
When applying BFS to the Ubuntu kernel source,
it seems that only this line,
" struct sched_rt_entity rt;"
needs to be moved out of the else,
but I moved all three lines out of the else.

el mariachi, 29 October 2013 at 06:50
I have been experiencing random shutdowns for some time now. I cannot say for sure after which upgrade they started, but I'm positive they only occur in the 3.11 series.
My laptop simply shuts down (not a reboot!), with the battery LED lighting up for a second. After this, pressing the power button doesn't "wake up" the screen, which remains black and without any power (i.e. not even the backlight turns on). I press the power button once again for a hard shutdown, and then everything works again. I cannot see a pattern -- sometimes this happens as soon as I boot into X, other times it can be after hours, even days.
Booting with the vanilla Arch kernel never elicits this behaviour, hence my suspicion that the kernel might be the culprit.
.config
http://pastebin.com/qZFgkiFB

Oleksandr Natalenko, 29 October 2013 at 17:51
Is there an updated urw locks patch for grq? The old one fails with BFS v442.

zed, 6 November 2013 at 20:24
Phoronix: "BFS Scheduler Lost Some Charm With Linux 3.11" http://www.phoronix.com/scan.php?page=news_item&px=MTUwNDI

techaddicted, 7 November 2013 at 04:10
Any thoughts about a release date of linux-ck 3.12?

Anonymous, 9 November 2013 at 01:59
Once again, a question for you latency-addicted people:
Does anyone know of any knobs, hints or web links on how to ease the pain of heavy swapping I/O?
(I've ported my setup to the newer machine with a Core 2 Duo, 4GB RAM, 4GB /dev/shm and 10GB of swap on the second disk. I often use /dev/shm as a RAM disk to decode files to and re-encode them back to disk from there.) I'm now running 3.11.7+BFQ+BFS as of ck1. During swapping, AVI video playback stutters in both picture and sound.

So far I've experimented with /proc/sys/vm/dirty_ratio, .../dirty_background_ratio, .../swappiness, and even with "schedtool -R -p 99 -n -19 `pidofproc -n kswapd0`" or "ionice -p `pidofproc -n kswapd0`" -- but I don't see any direction to follow.

I wonder whether there is an issue in the "communication" between /dev/shm, swap and physical RAM.

Thank you in advance for sharing your ideas/findings,
Manuel Krause

Anonymous, 15 November 2013 at 04:42
Under KDE as a normal user; I'm too lazy for anything else.

Alfred Chen, 15 November 2013 at 20:32
For those waiting for BFS on 3.12: I have ported bfs-0441 to 3.12. There were 3 conflicts, but they seem to be minor ones. After resolving the conflicts and building the kernel, it runs on my Core 2 machine.

Until ck releases a new version of BFS for 3.12, you can try this out.

bfs patch at
https://bitbucket.org/alfredchen/linux-gc/commits/b2912adfc8af58528e5a9d846e4873c1caa67331/raw/

And also my patch to fix the circular deadlock
https://bitbucket.org/alfredchen/linux-gc/commits/6266b7678235c85575cfeeb380fc65ae94b0fd67/raw/

All credit goes to ck. :)

Anonymous, 10 January 2014 at 17:46
@ graysky: I really hope to see an updated version of your report (http://repo-ck.com/bench/cpu_schedulers_compared.pdf) when bfs for 3.13 comes out :-)

thomasmappbe, 12 January 2019 at 02:03
Thanks