TL;DR: Fastest BFS yet for SMP.
After extended testing on BFS 0.373, a number of minor issues came up, but the results were very promising. I believe I've now addressed all the known issues with a newer version. Instead of flagging scaling CPUs by their governor alone, I now flag them as scaling only when they're actually throttled from maximum speed. This improves throughput further with dynamic scaling governors like ondemand, bringing it very close to that of the performance governor under full load. I also found that sticky-flagged tasks were not keeping their sticky flags if they were rescheduled back to back. Fixing that gave me even more of a performance boost in all situations. I addressed the oops that can occur on UP, and finally I updated the docs to match the changes in the scheduler design.
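To make the change in flagging concrete, here is a rough model of the two rules in Python. It is only a sketch of the idea described above, not the kernel code: the function names are made up for illustration, and the assumption that the old rule flagged every governor other than performance is mine.

```python
# Illustrative model only. These names are hypothetical and are not BFS kernel symbols.

def flagged_by_governor(governor: str) -> bool:
    """Old rule (assumed): any governor other than 'performance' marks the CPU as scaling."""
    return governor != "performance"


def flagged_when_throttled(cur_khz: int, max_khz: int) -> bool:
    """New rule: the CPU counts as scaling only while it runs below its maximum speed."""
    return cur_khz < max_khz
```

Under full load, ondemand ramps the CPU up to its maximum frequency, so `flagged_when_throttled(max_khz, max_khz)` is false and the CPU is treated like one running the performance governor, whereas `flagged_by_governor("ondemand")` would still have flagged it.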
So hopefully this will be the last test patch (fingers crossed) before I make it official, because... I'm about >< close to burnout. That's not something I want to experience.
Incremental for those on BFS 363 already:
Full patch for 2.6.38ish:
Benchmarks as they come to hand...
x264 benchmarks courtesy of Graysky:
Higher is better: boxplotencodethroughput.png
Lower is better: boxplotencodetime.png
CPU: Intel Xeon X3360 @ 8.5x400=3.40 GHz (4 cores/4 threads)
Linux version: Arch x86_64
x264 version: 0.114.x
handbrake version: svn3853
Base kernel version: 184.108.40.206
CK Patchset: CK1
Source video clip: 720p60 (1280x720) MPEG-PS @ 15 Mbps. 62 seconds long.
Run with ondemand multiplier, 5 times per kernel. Kernels use identical configs with the exception of the BFS version.
Handbrake CLI: --input test.m2ps --output output.mp4 --no-dvdnav --audio none --crop 0:0:0:0 --preset=Normal
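For anyone wanting to repeat this kind of run, a minimal timing harness might look like the sketch below. It is not Graysky's script. The flags are copied from the line above, but the binary name `HandBrakeCLI`, the helper names, and the choice of min/median/max as the summary are assumptions made for illustration.

```python
import statistics
import subprocess
import time

# Flags copied from the Handbrake CLI line above; the binary name is an assumption.
HANDBRAKE_CMD = [
    "HandBrakeCLI",
    "--input", "test.m2ps",
    "--output", "output.mp4",
    "--no-dvdnav",
    "--audio", "none",
    "--crop", "0:0:0:0",
    "--preset=Normal",
]


def time_encode(cmd):
    """Run one encode and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def summarize(times):
    """Reduce a list of run times to the minimum, median and maximum."""
    ordered = sorted(times)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def benchmark(cmd=HANDBRAKE_CMD, runs=5, timer=time_encode):
    """Time `runs` encodes with `timer` and summarize the results."""
    return summarize([timer(cmd) for _ in range(runs)])
```

Calling `benchmark()` once per booted kernel gives one summary per BFS version to compare. The `timer` argument is there so the harness can be tried with a stand-in function before pointing it at a real encode.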