-ck hacking: BFS 0.376 test

Thursday, 7 April 2011

BFS 0.376 test

TL;DR: Fastest BFS yet for SMP.

After extended testing of BFS 0.373, a number of minor issues came up, but the results were very promising. I now believe I've addressed all the known issues in a newer version. Instead of flagging CPUs as scaling by their governor alone, I now flag them as scaling only when they're actually throttled below maximum speed. This improves throughput further with dynamic scaling governors like ondemand, bringing it very close to that of the performance governor under full load. I also found that sticky-flagged tasks were not keeping their sticky flags if they were rescheduled back to back; fixing that gave me even more of a performance boost in all situations. I addressed the oops that can occur on UP, and finally I updated the docs to match the changes in the scheduler design.
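
To illustrate the "only when actually throttled" idea from userspace, here is a minimal sketch. It is not the BFS kernel code: the function names are made up for this example, and it simply compares the standard cpufreq sysfs values `scaling_cur_freq` and `cpuinfo_max_freq`, assuming a CPU with no cpufreq information runs at full speed.

```python
from pathlib import Path
from typing import Optional

CPU_ROOT = Path("/sys/devices/system/cpu")


def is_throttled(cur_khz: int, max_khz: int) -> bool:
    """Treat a CPU as 'scaling' only when it runs below its maximum speed."""
    return cur_khz < max_khz


def read_khz(cpu: int, name: str) -> Optional[int]:
    """Read one cpufreq attribute in kHz, or None if it is unavailable."""
    path = CPU_ROOT / f"cpu{cpu}" / "cpufreq" / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def cpu_is_scaling(cpu: int) -> bool:
    """Hypothetical helper: the governor alone does not decide the flag."""
    cur = read_khz(cpu, "scaling_cur_freq")
    top = read_khz(cpu, "cpuinfo_max_freq")
    if cur is None or top is None:
        return False  # assumption: no cpufreq info means full speed
    return is_throttled(cur, top)
```

With this rule, a CPU under ondemand that has already ramped up to its top frequency is treated the same as one running the performance governor.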

So hopefully this will be the last test patch (fingers crossed) before I make it official, because... I'm about >< close to burnout. That's not something I want to experience.

Incremental for those on BFS 363 already: bfs363-376-test.patch

Full patch for 2.6.38ish:
2.6.38-sched-bfs-376.patch

Benchmarks as they come to hand...

---
x264 benchmarks courtesy of graysky:
Higher is better: boxplotencodethroughput.png
Lower is better: boxplotencodetime.png
CPU: Intel Xeon X3360 @ 8.5x400=3.40 GHz (4 cores/4 threads)
Linux version: Arch x86_64
x264 version: 0.114.x
handbrake version: svn3853
Base kernel version: 2.6.38.2
CK Patchset: CK1
Source video clip: 720p60 (1280x720) MPEG-PS @ 15 Mbps. 62 seconds long.
Run with ondemand multiplier, 5 times per kernel. Kernels use identical configs with exception of BFS version.
Handbrake CLI: --input test.m2ps --output output.mp4 --no-dvdnav --audio none --crop 0:0:0:0 --preset=Normal
---
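
For anyone repeating this kind of comparison (several timed runs per kernel), a minimal sketch of the arithmetic follows. The function name is made up for this example and no real measurements are included; it only shows how a "faster by N %" figure can be derived from mean encode times.

```python
from statistics import mean


def percent_faster(baseline_s, candidate_s):
    """Percentage by which the candidate's mean time beats the baseline's.

    Both arguments are lists of wall-clock times in seconds, one per run.
    A positive result means the candidate kernel finished sooner on average.
    """
    base = mean(baseline_s)
    cand = mean(candidate_s)
    return (base - cand) / base * 100.0
```

Feed it the five encode times from each kernel. Comparing means says nothing about significance on its own, which is why the boxplots above are worth looking at as well.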

15 comments:

Unknown 7 April 2011 at 17:15
Legendary, thank you CK! Don't burn yourself out :)

TerminX 7 April 2011 at 17:55
Keep up the good work; I've been using your patches since kernel 2.4 and I check for updates every couple of days because they're that damn good.

Alan 7 April 2011 at 18:06
Ah-hah, this made my morning! I don't know which is worse: my kernel-recompiling addiction or my Android-ROM-flashing addiction...

Interactivity on my quad-core Xeon during heavy compiles and encodes has never been better. No more Rhythmbox skipping... whew!

Anyways, thanks from Kenya!

Anonymous 7 April 2011 at 20:29
Yes CK.
Thank you for your work, even though you're not really that appreciated by the mainstream kernel devs.
bfs + bfq is a good combo here on 2.6.35.11

graysky 8 April 2011 at 08:00
Great job, CK! BFS v0.376 slightly outperforms v0.363 in my x264 tests. The difference comes darn close to statistical significance, by the way. As always, both versions of BFS are faster than the corresponding mainline scheduler, by 2.6% and 3.0% respectively.

Here are the data:

http://img705.imageshack.us/img705/2135/boxplotencodethroughput.png
http://img849.imageshack.us/img849/4756/boxplotencodetime.png

Anyway, keep up the great work!

ck 8 April 2011 at 08:24
Excellent, thank you very much everyone for your testing and feedback. It made a massive difference to making sure I tackled all the issues. I'm hoping this release can go gold over the weekend as version 0.4. If I may, graysky, could I post those very pretty graphs on my BFS page?

graysky 8 April 2011 at 08:46
@CK - Always glad to help. Keep up the great work! Please feel free to repost.

graysky 8 April 2011 at 08:58
Forgot to add some context to the data.

CPU: Intel Xeon X3360 @ 8.5x400=3.40 GHz (4 cores/4 threads)
Linux version: Arch x86_64
x264 version: 0.114.x
handbrake version: svn3853
Base kernel version: 2.6.38.2
CK Patchset: CK1

Source video clip: 720p60 (1280x720) MPEG-PS @ 15 Mbps. 62 seconds long.

Run with ondemand multiplier, 5 times per kernel. Kernels use identical configs with exception of BFS version.

graysky 8 April 2011 at 09:00
Handbrake CLI: --input test.m2ps --output output.mp4 --no-dvdnav --audio none --crop 0:0:0:0 --preset=Normal

Anonymous 9 April 2011 at 03:28
Thank you for your work.
When can we expect Ubuntu packages for version 0.376?

Anonymous 9 April 2011 at 03:33
I want to put it on an AMD Phenom II X6 1090T running 10-15 game servers.

graysky 9 April 2011 at 12:01
Did a few more comparisons of mainline vs. bfs.

In x264 encoding, both bfs versions (0.363 and 0.376) beat the mainline scheduler hands down.
For compiling, though, things got interesting: when my quad-core CPU compiled filezilla-3.4.0 using make -j4, the latest bfs clearly beat both its predecessor and mainline; however, adding an extra thread to mainline (make -j5) brought it statistically in line with bfs v0.376 for total compile time. Dunno what to make of that.

http://img854.imageshack.us/img854/4042/compilefilezilla340.png
http://img402.imageshack.us/img402/5769/720p60x264encode.png

ck 9 April 2011 at 12:17
Thanks. That's typical of mainline's inability to utilise CPUs fully when load==CPUs, unlike BFS.

Ralph Ulrich 9 April 2011 at 19:50
I would like to remind all of you what the problem was: latency performance.

1.) Latency on the desktop
2.) Energy efficiency for most of us having notebooks
3.) Throughput performance

Are all of your tests looking only at 3.)?

I now have the same kernels:
2.6.38.2
2.6.38.2 + ck1 + 363
2.6.38.2 + ck1 + 376

And I would like to have a fine test tool at hand to see results regarding 1.) for my Intel Core 2 Mac mini!

ck 9 April 2011 at 20:24
Ralph, that's a very good comment. SSB, who has performed many of the benchmarks to date, is compiling some meaningful benchmarks in this area, and I will post them as they come to hand.
http://ck.kolivas.org/patches/bfs/test/wakeup-latency.c
is a latency-testing benchmark that he has run under various conditions with these different kernels, with different throughput benchmarks running at the same time. The results will be available soon, but suffice to say they're reassuring. Note also that absolute wake-up latency isn't the whole picture, since what matters is waking up and actually getting work done, but the two are intimately related.
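
For a rough feel of what a wake-up latency test measures, here is a minimal sketch. It is not the wakeup-latency.c program linked above and makes no claim to match its method; the function names are made up, and Python's own overhead inflates the numbers, so treat the output as relative only.

```python
import time


def wakeup_latencies(interval_s=0.001, samples=200):
    """Sleep repeatedly and record how late each wake-up was, in microseconds."""
    requested_ns = int(interval_s * 1e9)
    late_us = []
    for _ in range(samples):
        start = time.monotonic_ns()
        time.sleep(interval_s)
        elapsed_ns = time.monotonic_ns() - start
        # Overshoot beyond the requested sleep approximates wake-up latency;
        # clamp at zero in case of coarse clock resolution.
        late_us.append(max(elapsed_ns - requested_ns, 0) / 1000.0)
    return late_us


def summarize(late_us):
    """Return min, (upper) median and max of the recorded latencies."""
    ordered = sorted(late_us)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }
```

Running it while a heavy compile or encode is going, once per kernel, gives a crude comparison of how promptly a sleeping task gets the CPU back.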