Sunday, 3 April 2011

BFS 0.373 test

The BFS 0.372 test patch has proved quite a success. There have been no regressions in performance, with slight improvements even with the performance CPU governor, and better latency all round on SMP now. There were some rare crashes that I had to track down, and I believe I've fixed them all, so I'm releasing another test patch, 0.373, which addresses them and is otherwise the same as 0.372.

Apply to a 0.363-patched kernel:
bfs363-373-test.patch
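The patch is applied with `patch -p1` from the top of the (already 0.363-patched) kernel source tree. As a self-contained illustration of the same mechanics, here is a toy unified diff applied the same way; all file names and contents below are made up for the demonstration:

```shell
# Toy demonstration of the `patch -p1` workflow. The real invocation,
# from the top of a 0.363-patched kernel tree, is simply:
#   patch -p1 < bfs363-373-test.patch
# Everything below is a made-up fixture, not the actual BFS patch.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p kernel
printf 'int rr_interval = 6;\n' > kernel/sched_bfs.c
# A unified diff with the a/ b/ path prefixes kernel patches use:
cat > toy.patch <<'EOF'
--- a/kernel/sched_bfs.c
+++ b/kernel/sched_bfs.c
@@ -1 +1 @@
-int rr_interval = 6;
+int rr_interval = 12;
EOF
patch -p1 --dry-run < toy.patch   # verify it applies cleanly first
patch -p1 < toy.patch
cat kernel/sched_bfs.c
```

With the real patch, the same `--dry-run` check is a cheap way to confirm the tree really is at 0.363 before committing to the apply.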

Thanks to those who have tested and reported back so far!

UPDATE: Throughput benchmark results on a 6 core AMD courtesy of Serge Belyshev:
benchmark3.results-cfs-bfs363-bfs373.txt

I am planning on posting latency benchmarks soon too.

47 comments:

  1. Hi,
I've just decided to test your patch... it seems to work, but I noticed a little problem. Scrolling on longer web pages in Chromium is not as smooth as it was with BFS 0.363.

    I'm using Gentoo, linux kernel 2.6.38.2 with your patchset and ondemand CPU governor. My CPU is Intel Core i5-520m.

  2. Are you sure it's BFS-related? I had a kernel panic resuming from hibernation, and slower 2D rendering, with 2.6.38.2. I even tried the latest patches (2.6.38.3-pre) and the problems were still there. 2.6.38.1 works flawlessly with bfs-373.

  3. @vojta - not sure how you feel about Chrome extensions, but SmoothScroll from the Chrome Web Store seems to help the browser scrolling issue. I'm not sure it's entirely related to BFS, as Google shows some hits about this issue outside of BFS.

  4. Well, I have had slower 2D after hibernation with all kernels since the beginning on this notebook, but after a reboot everything works fine.

    However, when I boot kernel 2.6.38.2 with BFS 363 everything works fine, but when I boot the same kernel with 373 it is slower than with 363. I'm going to try kernel 2.6.38.1 with BFS 373.

  5. @Texstar: I've tried that extension and it isn't smooth with BFS 373 and 2.6.38.2, either.

  6. I'm compiling 2.6.38.1 now, and it looks like scrolling is smoother under heavy load (make -j4) than when the CPU is idle.

  7. Same with 2.6.38.1. Btw, I apologize for so many posts.. :/

  8. No apology necessary. I appreciate the feedback. I assume you're sure the issue appears precisely when going from 363 to 373? What happens with the performance governor? Also, I do notice the nvidia+kde+compositing issue comes back more quickly with 373 (presumably because it's brought on by some race condition that is exaggerated by the lower rr interval in 373). Is it slower right from the start, or does it get slower with further usage? I'd be somewhat surprised to learn that it would be slower under the conditions you describe, but anything's possible. The processor you describe is very similar to the i7 620M I run, and I have the opposite experience.

  9. Yes, I'm sure it's the difference between 363 and 373. I've tested it a couple of times. And I have nvidia and I'm using KDE and compositing. :/ When I use BFS 363 the CPU stays on the lowest frequency, 1199 MHz, but with BFS 373 it scales up a little. And yes, it is slow right from the start.

    However, a while ago I did some magic with X restarts and switching to the console (Ctrl-Alt-F1) and back, and suddenly it was working fine with BFS 373. After a reboot it is slow again. So I guess it's related to the nvidia drivers or something around them.

  10. Aha, well that's what I found too. So the problem is not BFS at all; it's just exacerbating the long-term nvidia+kde+compositing problem. To me this is not a 373 regression at all. This is a very frustrating combination of problems that has been around for a while now; as you can see, I've been complaining about it too. It's both good and bad: good because BFS is not responsible, but bad because BFS will make it happen sooner. However, I can't very well "slow down BFS" just to avoid the problem.

  11. The problems I had with kernel 2.6.38.2 (Anonymous above) are caused by two patches in 2.6.38.2: drm-i915-disable-pagefaults-along-execbuffer-relocation-fast-path.patch (rendering) and x86-cleanup-highmap-after-brk-is-concluded.patch (hibernation). After reverting those patches, everything works fine now. I'm just posting it here in case anyone is interested.

  12. @ck: I understand that you can't "slow down BFS" because of this nvidia+kde+compositing problem. I hope I can find at least some workaround (switching to the console etc.) to avoid the slow 2D. Anyway, thank you for these patches. :)

  13. Thanks. I find switching VTs the most effective way of fixing it, immediately after logging in. It's far less common on KDE 4.6, but it sure is freaking annoying that they haven't nailed down what the problem is yet. I haven't tried 4.6.1 yet.

  14. Are you seeing this with the latest beta nvidia drivers as well as stable (270.30 vs. 260.44.19)?

  15. Been compiling en masse all afternoon using 2.6.38.2 + ck1 + bfs v0.373 and no crashes. 0.373 looks like a great improvement stability-wise over 0.372. Will 0.373 go into the ck2 patchset any time soon, or do you see additional testing as required before the release?

  16. Works nicely on the zen kernel. Multitasking feels nicer than with BFS 0.360, which for some reason worked better than 363 in the zen kernel. The ondemand governor definitely has better performance.

  17. @Anonymous: I've tested it only with beta drivers 270.18. I'm going to test 270.30 and then the stable one.

  18. Thanks everyone for your testing.

    I use the latest stable nvidia driver myself.

    @graysky : I'm pretty sure that 373 is stable, but I'm being extra cautious this time. I'm planning on not changing the code itself any further, but I'll likely update the documentation and comments to reflect the changes in the code, and then push the version number up signifying the magnitude of change. Either way, I'd say it would almost certainly be safe to include this in a new package.

  19. Ralph Ulrich, 5 April 2011 08:26

    Works here with nvidia-270.30-beta.

    I doubt anyone knows for sure what BFS does regarding:
    latency - throughput - energy efficiency

    Lower latency is a kind of attentiveness, and it costs energy and throughput. The people on the zen kernel discussion forums arguing that their BFS-patched kernel takes longer to compile and "is not performant" don't realise this. What is performance? It depends on what you expect from that word.

    We need some tools to exactly measure effects!

  20. @Ralph: Indeed, performance on a desktop is NOT just throughput. I care a lot about interactivity, responsiveness AND throughput. It's my aim to primarily get good interactivity and responsiveness, while still preserving good throughput. I have posted some 6 core throughput benchmarks in the main post now, and I will be posting some latency benchmarks once they're at hand.

  21. Ralph Ulrich, 5 April 2011 08:52

    @Con, I would like to have some tools: as an advanced user I like to experiment, and I would like to see the effects exactly.

    Some days ago I asked about the scheduler plugin (mainline), which you have bad feelings about, I know. But I like to test development kernels as soon as -rc3 is out. Such a plugin would make it feasible for me to patch a development kernel, probably ...

  22. Sorry Ralph, the fully pluggable scheduler is almost more work to maintain than BFS itself, which is why I gave it away a long time ago. I have to minimise my workload, and to do that I can only dedicate time to keeping in sync whenever a new "stable" 3-point release kernel comes out.

  23. Hi,

    When using 2.6.38.2 with CFS or BFS 0.363 it works well. However, if I apply the BFS 0.373 test patch, it always panics at boot, showing:

    EIP is at this_cpu_load...

    call trace:
    menu_select
    hrtimer_start_range_ns
    cpuidle_idle_call
    cpu_idle
    start_kernel
    unknown_bootoption

    I tried disabling ACPI with the parameter "acpi=off", and then it boots successfully.

    This machine is UP (AMD Athlon(tm) XP 2500+) with the following configurations:

    CONFIG_TICK_ONESHOT=y
    CONFIG_NO_HZ=y
    CONFIG_HIGH_RES_TIMERS=y
    CONFIG_PREEMPT=y

  24. Most interesting! Thanks for testing on UP.

    Does applying this on top fix it?
    http://ck.kolivas.org/patches/bfs/bfs373-upfix.patch

  25. This patch fixes the issue.
    Thanks.

  26. CK - Bad news with v0.373 vs. v0.363 with regard to x264 encoding. I have seen approximately a 7 % increase in total encode time (it takes longer) and a corresponding decrease in average fps when comparing 2.6.38.2 with ck1 (v0.363) and 2.6.38.2 with ck1 (v0.373). The result is that v0.373 makes x264 encoding slower than v0.363 :(

    Everything except the bfs version was held constant.

    I can post or email you a bash script I use for x264 benchmarks if you want to try it for yourself.

  27. Ralph Ulrich, 6 April 2011 05:12

    > Sorry Ralph, the fully pluggable scheduler is almost more work to maintain

    Thanks for the explanation. When I saw your BFS scheduler mostly residing in one file, I thought it would be easy to turn it into a plugin ....

  28. @graysky: I'm aware it's a tiny bit slower when heavily loaded with ondemand enabled, and the default rr interval is now lower too. Can you confirm the slowdown occurs only with ondemand? See the benchmarks above: it's actually faster on that machine with the performance governor.

  29. Sorry, CK. Similar results when manually forcing the performance multiplier: a 6 % decrease in fps and a 6 % increase in encode time. This isn't a one-off: the script runs 5 replicates, and the data for all 5 runs is very tight. The decrease is statistically significant.

    Please let me know if you want my bash script and sample 720p video for testing on your boxes.

  30. Ooh, in that case it may just be the change in rr interval; it's now 6 on all machines. Can you try resetting it to what it was in previous BFS? If you're on dual core it would be 9, on quad core 12, and so on:
    echo 12 > /proc/sys/kernel/rr_interval
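    For reference, the old per-core-count defaults quoted here (9 on dual core, 12 on quad core) can be reproduced with a small shell helper. The "6 plus 3 per doubling of CPUs" rule below is inferred from just those two figures, not lifted from the BFS source:

```shell
# Sketch of the older BFS rr_interval defaults quoted above: 9 on dual
# core, 12 on quad core. The "6 + 3 per doubling of CPUs" formula is
# inferred from those two data points, not taken from the BFS source.
old_rr() {
    n=$1
    rr=6
    while [ "$n" -gt 1 ]; do
        rr=$((rr + 3))
        n=$((n / 2))
    done
    echo "$rr"
}
old_rr 2    # dual core
old_rr 4    # quad core
# As root it could then be applied with:
#   echo "$(old_rr "$(getconf _NPROCESSORS_ONLN)")" > /proc/sys/kernel/rr_interval
```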

  31. I bumped it to 12 since this is an X3360 and re-ran. Same 7 % decrease as seen with the stock setting of 6 :(

  32. That's okay; your hardware may well respond well to the cache distance in previous BFS. I'll spin another patch for you to try, whenever you find the time. This is why testing is so important. Thanks!

  33. Here you go. Could you please try this patch on top? It brings back the cache_distance feature, which I had tried to get rid of by just caring about sticky tasks.

    http://ck.kolivas.org/patches/bfs/bfs373-reinstate-cache_distance.patch

  34. CK - Patched 2.6.38.2 in this order:

    1) ck1
    2) bfs363-373-test
    3) bfs373-reinstate-cache_distance

    Rebooted into the fresh kernel and redid the test; result: the same 7 % decrease in performance vs. v0.363 :(

  35. Hmm, something's not adding up here and I can't quite put my finger on it. Are you comparing the results of 1) and 3) above, or just with BFS 363? I get better performance on 2.6.35 than on 2.6.38, and I just want to be absolutely certain the only thing different is patch 2) above. If you revert to plain ck1, do you get the 7 % back?

  36. I'm confused. Is this what you want me to do:

    Experiment #1 - 2.6.38.2 + CK1
    Experiment #2 - 2.6.38.2 + CK1 + bfs363-373-test
    Experiment #3 - 2.6.38.2 + CK1 + bfs363-373-test + bfs373-reinstate-cache_distance
    Experiment #4 - 2.6.38.2 + CK1 + bfs373-reinstate-cache_distance

    All with the ondemand governor? Please spell out anything else you'd like to see and I'll gladly run them!

  37. No, I just wanted to confirm that your comparison was between experiments 1 and 2 in your latest comment.

  38. Okay CK. I ran the following experiments and attached links to the data.

    #1 - bfs363
    #2 - bfs373
    #3 - bfs374
    #4 - bfs374+rr_interval set to 12

    All are with kernel 2.6.38.2 with the CK1 patchset and various versions of BFS, using the ondemand governor on my quad-core machine with a different 720p60 mpg clip (this one is 1 min long and differs from the one I emailed you). I ran each kernel 5 times to get a good set.

    As you can see, bfs363 is the fastest in both encode time and throughput.

    Results:
    bfs373 and bfs374 are both about 6 % slower than bfs363.

    If I jack the rr_interval from 6 to 12 on bfs374, the slowdown drops to about 4 %. Still, the newer BFS versions are statistically significantly slower than bfs363 for x264 encoding :(


    I posted the data to this google spreadsheet and provided some barcharts and boxplots with statistics.

    https://spreadsheets.google.com/ccc?key=0AhUjKA6UbmtqdEN3ZTl2NE1aU0hXRFloenZqUGFhc3c&hl=en
    http://img856.imageshack.us/img856/7336/barchartavgencodethroug.png
    http://img695.imageshack.us/img695/1382/barchartavgencodetimese.png
    http://img64.imageshack.us/img64/2135/boxplotencodethroughput.png
    http://img52.imageshack.us/img52/4756/boxplotencodetime.png
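    The per-run fps numbers behind a comparison like this can be reduced to a mean and a percent slowdown with plain awk. The figures below are made up for illustration, not graysky's actual data:

```shell
# Made-up fps figures (NOT graysky's measured data), just to show how a
# mean and percent slowdown over 5 replicates can be computed in the shell.
fps363="38.1 38.2 38.0 38.3 38.1"
fps373="35.8 35.9 35.7 36.0 35.8"
# mean: split the space-separated list into lines and average with awk
mean() { echo "$1" | tr ' ' '\n' | awk '{s+=$1; n++} END {printf "%.2f", s/n}'; }
m363=$(mean "$fps363")
m373=$(mean "$fps373")
# percent slowdown relative to the 363 baseline
awk -v a="$m363" -v b="$m373" 'BEGIN {printf "slowdown: %.1f%%\n", 100*(a-b)/a}'
```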

  39. Hmm interesting. I was doing the benchmarks myself on x264 but this time I tried your script.

    I get a 4.7% drop in speed going from performance to ondemand governor, but I get the same throughput with performance on 363 as I do with 373.

    for i in `seq 0 3` ; do echo ondemand > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor ; done

    Average speed (fps) : 36.48 (0.0375)

    for i in `seq 0 3` ; do echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor ; done

    Average speed (fps) : 38.18 (0.2190)

    As I said, I am getting a slowdown on ondemand, but I don't see a slowdown with the performance governor going from 363 to 373. On the other hand, I also see a speedup going from 2.6.38 to 2.6.35 with the same BFS.

  40. Oh, and I should have said: I am trying to make up the performance with ondemand, but it's difficult to get performance at low loads -and- high loads working optimally, as fixing one tends to offset the other. But that hasn't stopped me trying (and I still am).

  41. Using BFS v363 I get a trace (see below) at boot time. There is no kernel panic; in fact, the system apparently works fine after booting.

    With mainline 2.6.38.2 there is no trace. When I apply BFS v363 the trace appears. It seems to depend only on BFS v363, not on any of the other CK patches, nor on the bfs363-373 patch.

    $ dmesg | grep -B5 -A50 here
    [ 4.287135] hub 5-0:1.0: 2 ports detected
    [ 4.287217] uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
    [ 4.287222] uhci_hcd 0000:00:1d.2: setting latency timer to 64
    [ 4.287225] uhci_hcd 0000:00:1d.2: UHCI Host Controller
    [ 4.287230] uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 6
    [ 4.325120] ------------[ cut here ]------------
    [ 4.325126] WARNING: at kernel/irq/manage.c:291 __enable_irq+0x32/0x5b()
    [ 4.325128] Hardware name: Infoway
    [ 4.325129] Unbalanced enable for IRQ 16
    [ 4.325130] Modules linked in: tpm_tis tpm tpm_bios ehci_hcd(+) uhci_hcd(+) r8169 parport_pc parport processor button snd_timer snd_seq_device mii usbcore jmicron(+) snd soundcore snd_page_alloc intel_gtt agpgart reiserfs sd_mod ata_generic ata_piix libata ide_pci_generic ide_core evdev thermal fan thermal_sys
    [ 4.325142] Pid: 875, comm: modprobe Not tainted 2.6.38.2 #4
    [ 4.325144] Call Trace:
    [ 4.325148] [] ? warn_slowpath_common+0x65/0x7a
    [ 4.325150] [] ? __enable_irq+0x32/0x5b
    [ 4.325152] [] ? warn_slowpath_fmt+0x26/0x2a
    [ 4.325154] [] ? __enable_irq+0x32/0x5b
    [ 4.325156] [] ? enable_irq+0x60/0x80
    [ 4.325165] [] ? ide_probe_port+0x4cb/0x4f1 [ide_core]
    [ 4.325172] [] ? ide_host_register+0x211/0x513 [ide_core]
    [ 4.325179] [] ? ide_pci_init_two+0x4dc/0x594 [ide_core]
    [ 4.325184] [] ? idr_get_empty_slot+0x144/0x215
    [ 4.325186] [] ? ida_get_new_above+0xcc/0x166
    [ 4.325189] [] ? get_parent_ip+0xb/0x31
    [ 4.325191] [] ? get_parent_ip+0xb/0x31
    [ 4.325194] [] ? set_memory_wb+0xb/0x3a
    [ 4.325200] [] ? ide_pci_init_one+0xd/0xf [ide_core]
    [ 4.325204] [] ? jmicron_init_one+0xf/0x11 [jmicron]
    [ 4.325207] [] ? local_pci_probe+0x3e/0x81
    [ 4.325209] [] ? pci_device_probe+0x43/0x66
    [ 4.325213] [] ? driver_probe_device+0x8f/0x117
    [ 4.325216] [] ? __driver_attach+0x43/0x5f
    [ 4.325218] [] ? bus_for_each_dev+0x3d/0x67
    [ 4.325220] [] ? driver_attach+0x14/0x16
    [ 4.325222] [] ? __driver_attach+0x0/0x5f
    [ 4.325224] [] ? bus_add_driver+0x9b/0x1cb
    [ 4.325226] [] ? driver_register+0x7c/0xe3
    [ 4.325230] [] ? notifier_call_chain+0x26/0x48
    [ 4.325232] [] ? __pci_register_driver+0x38/0x95
    [ 4.325235] [] ? jmicron_ide_init+0x17/0x19 [jmicron]
    [ 4.325237] [] ? do_one_initcall+0x71/0x11c
    [ 4.325239] [] ? jmicron_ide_init+0x0/0x19 [jmicron]
    [ 4.325243] [] ? sys_init_module+0xcd3/0xe5f
    [ 4.325247] [] ? sysenter_do_call+0x12/0x28
    [ 4.325249] ---[ end trace 26fc71af13f32d63 ]---
    [ 4.325280] Probing IDE interface ide1...
    [ 4.326032] uhci_hcd 0000:00:1d.2: irq 18, io base 0x0000f900
    [ 4.326056] usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
    [ 4.326057] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
    [ 4.326059] usb usb6: Product: UHCI Host Controller
    [ 4.326060] usb usb6: Manufacturer: Linux 2.6.38.2 uhci_hcd
    [ 4.326062] usb usb6: SerialNumber: 0000:00:1d.2
    [ 4.326126] hub 6-0:1.0: USB hub found
    [ 4.326129] hub 6-0:1.0: 2 ports detected
    [ 4.326786] Floppy drive(s): fd0 is 1.44M
    [ 4.330051] ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
    [ 4.330065] ehci_hcd 0000:00:1a.7: setting latency timer to 64

  42. I forgot to say that with the same BFS v363 on mainline 2.6.36.4 there is no trace at all. My system is a Debian lenny (oldstable) box.

  43. @graysky: Okay I found something that's worth a few more cycles:
    http://ck.kolivas.org/patches/bfs/bfs374-preserve_sticky.patch
    Does this help?

    @Anonymous: Offhand I'm not sure what to make of that warning. Does everything still work?

  44. Hi, CK. I've found a regression on my eee 901 (Atom CPU) using the 373 test patch.

    When compiling git/pidgin, top shows about 6~10 % idle with the 373 test, while 363 gives me near 0 % idle.

    On the other hand, when 373 runs on an Intel Core 2 Duo CPU, it also gives me near 0 % idle.

    So I think the regression is related to HT?

    I'll find some time to test your new 376 patch.

  45. Hi Alfred. Thanks for testing, I'm pretty sure that regression has been addressed in 376.

  46. Thanks. Confirmed that the regression has gone away in 376.

  47. Great, thanks for confirming it.
