-ck hacking: BFS 0.372 test patch

Friday, 1 April 2011

Another day, another BFS test release. This one builds on the ideas in the 0.371-test3 patch I posted about, since they're proving very positive. No April fools here. This looks like it's kicking arse.

Apply to a BFS 0.363 patched kernel such as 2.6.38-ck1:

bfs363-372-test.patch

Changelog from the patch:
---
Add a "sticky" flag for one CPU bound task per runqueue which is used to flag
the last cache warm task per CPU. Use this flag to softly affine the task to
the CPU by not allowing it to move to another CPU when a scaling CPU frequency
governor is in use. This significantly improves throughput at lower loads by
allowing tasks to cluster on CPUs, thereby allowing the scaling governor to
speed up only that CPU. This should also save power. Use the sticky flag to
determine cache distance in earliest_deadline_task and abolish the
cache_distance function entirely. This is proven as, if not more, effective.

Add helpers to the 3 scaling governors to tell the scheduler when a CPU is
scaling.

Replace the frequent use of num_online_cpus() with a grq.noc variable that
is only updated when the number of online cpus changes.

Simplify resched_best_idle by removing the open coded for_each_cpu_mask as
it was not of proven benefit.

Remove warnings in try_to_wakeup_local that are harmless or never hit.

Clear the cpuidle_map bit only when edt doesn't return the idle task.

Abolish the scaled rr_interval by number of CPUs and now just use a fixed
nominal 6ms everywhere. The improved cache warmth of the sticky flag makes
this unnecessary and allows us to lower overall latencies on SMP by doing so.
---
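
To make the soft affinity idea from the changelog a bit more concrete, here is a toy model in Python rather than kernel C. It is only a sketch and not the code from the patch: the Task fields, the STICKY_PENALTY value and the cpu_is_scaling argument are names made up for illustration, and treating "cache distance" as a deadline penalty on a non-scaling CPU is an assumption about how the flag might be used.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical penalty added to a sticky task's deadline when a different,
# non-scaling CPU considers it. The value is illustrative only.
STICKY_PENALTY = 1000


@dataclass
class Task:
    name: str
    deadline: int         # smaller means more urgent
    last_cpu: int         # CPU the task last ran on
    sticky: bool = False  # set for the last cache warm task of last_cpu


def earliest_deadline_task(tasks: List[Task], cpu: int,
                           cpu_is_scaling: bool) -> Optional[Task]:
    """Pick the queued task with the earliest effective deadline for cpu."""
    best = None
    best_dl = None
    for t in tasks:
        dl = t.deadline
        if t.sticky and t.last_cpu != cpu:
            if cpu_is_scaling:
                # Soft affinity: a scaling CPU leaves the task for the
                # CPU it is cache warm on.
                continue
            # Assumption: otherwise only bias against moving the task.
            dl += STICKY_PENALTY
        if best_dl is None or dl < best_dl:
            best, best_dl = t, dl
    return best


queue = [
    Task("warm", deadline=10, last_cpu=0, sticky=True),
    Task("cold", deadline=50, last_cpu=1),
]
```

In this model CPU 0 still picks "warm" first, while CPU 1 picks "cold" even though its deadline is later. If "warm" is the only queued task, a scaling CPU 1 gets nothing back and can stay idle, whereas a non-scaling CPU 1 will still take it.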

Please test this one thoroughly. It's very stable and now heavily tested, but I won't announce any new "release" till it's been tested for maybe 5 days or more. It appears better in all workloads and with and without cpu frequency governors. Again, only SMP will benefit from this patch, but it should change behaviour in all SMP now, not just with ondemand. However, the scaling governors should show the most improvement.

The best example (with ondemand) was a single threaded cpu bound workload that took 126 seconds to complete on an i7 2 core/4 thread machine that now takes 91.5 seconds. The 2 threaded workload dropped from 66 seconds to 51.5 seconds. Note that this more or less addresses a regression in BFS behaviour with cpu frequency scaling on SMP, but it's also been an opportunity to improve behaviour elsewhere.
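
For anyone who wants those timings as percentages, the arithmetic can be sketched as below. The function names are made up for this example; the only inputs are the four timings quoted above.

```python
def reduction_percent(before_s: float, after_s: float) -> float:
    """Percentage of wall clock time saved going from before_s to after_s."""
    return (before_s - after_s) / before_s * 100.0


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the workload completes."""
    return before_s / after_s


# Timings quoted in the post (seconds, ondemand governor).
single_threaded = (126.0, 91.5)
two_threaded = (66.0, 51.5)

for label, (before, after) in (("1 thread", single_threaded),
                               ("2 threads", two_threaded)):
    print(f"{label}: {reduction_percent(before, after):.1f}% less time, "
          f"{speedup(before, after):.2f}x faster")
```

That works out to roughly 27% less time for the single threaded run and roughly 22% less for the 2 threaded run.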

16 comments:

Unknown, 1 April 2011 at 21:33
I can't resume the system correctly after hibernating it. After a while the system hangs and the Caps Lock LED starts blinking.
ck, 1 April 2011 at 21:54
Thanks. Is this different to BFS363 or vanilla 2.6.38?
Anonymous, 1 April 2011 at 23:33
I have that error after hibernating. But only with kernels 2.6.38.2 and 2.6.37.6. 2.6.38.1 works fine.
Unknown, 2 April 2011 at 01:13
I've just tried your patches (ck1 + bfs363-372-test.patch) with 2.6.38 (no .1 nor .2) and I can resume normally
Ralph Ulrich, 2 April 2011 at 01:24
Works!
With newest stable-queue, like this:

release/patch-2.6.38.2
review/alsa-hda-new-ad1984a-model-for-dell-precision-r5500.patch
review/alsa-hda-fix-spdif-out-regression-on-alc889.patch
review/alsa-fix-yet-another-race-in-disconnection.patch
review/alsa-vmalloc-buffers-should-use-normal-mmap.patch
review/perf-better-fit-max-unprivileged-mlock-pages-for-tools-needs.patch
review/myri10ge-fix-rmmod-crash.patch
review/cciss-fix-lost-command-issue.patch
review/ath9k-fix-kernel-panic-in-ar2427.patch
review/sound-oss-opl3-validate-voice-and-channel-indexes.patch
review/mac80211-initialize-sta-last_rx-in-sta_info_alloc.patch
review/ses-show-devices-for-enclosures-with-no-page-7.patch
review/ses-avoid-kernel-panic-when-lun-0-is-not-mapped.patch
review/pci-acpi-report-aspm-support-to-bios-if-not-disabled-from-command-line.patch
review/x86-64-mm-put-early-page-table-high.patch
review/ecryptfs-unlock-page-in-write_begin-error-path.patch
review/ecryptfs-ecryptfs_keyring_auth_tok_for_sig-bug-fix.patch
review/crypto-aesni-intel-fixed-problem-with-packets-that-are-not-multiple-of-64bytes.patch
ck1/2.6.38-sched-bfs-363.patch
ck1/sched-add-above-background-load-function.patch
ck1/mm-zero_swappiness.patch
ck1/mm-enable_swaptoken_only_when_swap_full.patch
ck1/mm-drop_swap_cache_aggressively.patch
ck1/mm-kswapd_inherit_prio-1.patch
ck1/mm-background_scan.patch
ck1/mm-idleprio_prio-1.patch
ck1/mm-lru_cache_add_lru_tail.patch
ck1/mm-decrease_default_dirty_ratio.patch
ck1/kconfig-expose_vmsplit_option.patch
ck1/hz-default_1000.patch
ck1/hz-no_default_250.patch
ck1/hz-raise_max.patch
ck1/preempt-desktop-tune.patch
ck1/cpufreq-bfs_tweaks.patch
ck1/ck1-version.patch
ck2test/bfs363-372-test.patch
Jing, 2 April 2011 at 01:50
Well... it's really soon to see this post, right after I packaged the last one this morning. XD Here is the openSUSE repository:
11.4
http://download.opensuse.org/repositories/home:/jingtw:/kernel-11.4rc/openSUSE_11.4/
11.3
http://download.opensuse.org/repositories/home:/jingtw:/kernel-11.4rc/openSUSE_11.3/

The packages are kernel-ck100hz and kernel-ck1000hz, for server and desktop. Everything seems to be OK. I haven't had any problems in the last 12 hours. openSUSE users may want to give it a try...
ck, 2 April 2011 at 03:27
Thanks for the feedback. I don't think the hibernate issue has anything to do with this patch since it is kernel version related. Sorry I posted a new patch so soon after the other test patch, but I wasn't sure when I'd be able to "finish" the last test patch and managed to do so sooner than expected. I'm hoping not to have to do anything to this current version since it appears to do everything I need quite well.
ck, 2 April 2011 at 03:59
Here, read this link about suspend/resume issues with 2.6.38.2 :

http://marc.info/?l=linux-kernel&m=130151206329088&w=4
Alberto, 2 April 2011 at 09:05
Hi, in another post you wrote you're using kde + nvidia, so I'd like to know if you're experiencing the same problem as me: with compositing active the whole desktop experience is slowed down progressively, up to a point where typing a character requires half a second for it to appear. I noticed this with vanilla and -ck, but -ck seems to make the degradation a bit faster. Without compositing -ck is multiple times more reactive than vanilla.

Thanks
ck, 2 April 2011 at 12:01
Yes that's right. No one admits fault there, neither nvidia nor kde, but it's clear that it's a problem when compositing+kde+nvidia are used on kde4. However I found the problem is much rarer on kde4.6 since upgrading to that. Usually flicking to another TTY/console and then back to my desktop fixes it.
Ralph Ulrich, 2 April 2011 at 12:04
I have kde-4.6 with Composite-kwin, but with most modules of win-effects disabled.

I have been running my Core 2 Gentoo 64-bit system since my first post today without problems!
Jing, 2 April 2011 at 12:37
@Alberto, try changing "Desktop Effects -> Advanced -> Scale method" to Smooth.
Anonymous, 2 April 2011 at 17:08
@Alberto, try it: http://techbase.kde.org/User:Lemma/KDE4-NVIDIA
chogydan, 3 April 2011 at 05:11
FYI Ubuntu users: I have copied a patched kernel into my ppa.

I am currently running it, and I am getting better scores on the kraken benchmark for all frequency governors. ondemand is still a lot slower for me (18s vs 13s for conservative).
Texstar, 3 April 2011 at 17:49
Running like a champ here.
Counter Strike Hacks, 16 April 2011 at 18:28
cool ur champ
Counter Strike Hacks