Thursday 19 May 2011

2.6.39-ck1

These are patches designed to improve system responsiveness and interactivity,
with specific emphasis on the desktop, but suitable for any commodity hardware workload.


Apply to 2.6.39:
patch-2.6.39-ck1.bz2
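For example, a typical way to apply it (a sketch that assumes the bzip2-compressed patch sits next to a clean 2.6.39 source tree; paths are illustrative):

```shell
# Apply the -ck patch to a vanilla 2.6.39 tree
cd linux-2.6.39
bzip2 -dc ../patch-2.6.39-ck1.bz2 | patch -p1
```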

Broken out tarball:
2.6.39-ck1-broken-out.tar.bz2

Discrete patches:
patches

Ubuntu packages:
http://ck.kolivas.org/patches/Ubuntu%20Packages

All -ck patches:
http://www.kernel.org/pub/linux/kernel/people/ck/patches/

BFS by itself:
http://ck.kolivas.org/patches/bfs/

Web:
http://kernel.kolivas.org

Code blog when I feel like it:
http://ck-hack.blogspot.com/

Each discrete patch contains a brief description of what it does at the top of
the patch itself.


The most substantial change since the last public release is a major version upgrade of the BFS CPU scheduler to version 0.404.

Full details of the most substantial changes, which went into version 0.400, are in my blog here:
http://ck-hack.blogspot.com/2011/04/bfs-0400.html

This version exhibits better throughput, better latencies, better behaviour with scaling cpu frequency governors (e.g. ondemand), and better use of turbo modes in newer CPUs. It also addresses a long-standing bug that affected all configurations but was only demonstrable on lower Hz configurations (i.e. 100Hz), where it caused fluctuating performance and latencies; mobile configurations (e.g. Android at 100Hz) therefore also perform better. The default round robin interval is now set to 6ms on all hardware (i.e. tuned primarily for latency). This can easily be modified with the rr_interval sysctl in BFS for special configurations (e.g. increase it to 300 for encoding / folding machines).
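As a quick sketch, the interval can be inspected and changed at runtime via the sysctl that BFS exposes under /proc (needs root to write):

```shell
# Show the current BFS round robin interval in milliseconds (6 by default)
cat /proc/sys/kernel/rr_interval
# Favour throughput over latency, e.g. on an encoding or folding machine
echo 300 > /proc/sys/kernel/rr_interval
```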

Performance of BFS has been tested on hardware ranging from low-power single-core machines through various SMP configurations, both threaded and multicore, up to a 24-way AMD machine. The 24-way machine exhibited better throughput on an optimally loaded kbuild (from make -j1 up to make -j24); beyond that level of load, performance did not match mainline. On folding benchmarks at 24x, BFS was consistently faster for the unbound (no CPU affinity in use) multi-threaded version. On 6-core hardware, performance at all levels of load in kbuild and x264 encoding benchmarks was better than mainline, in both throughput and latency measured in the presence of those workloads.

For 6-core results and graphs, see:
benchmarks 20110516
(desktop = 1000Hz + preempt, server = 100Hz + no preempt)

Here are some desktop config highlights (see the graphs at the link above):
- Throughput at make -j6
- Latency in the presence of x264 ultrafast
- Throughput with x264 ultrafast


This is not by any means a comprehensive performance analysis, nor is it meant to claim that BFS is better than mainline under all workloads and on all hardware. These are simply easily demonstrable advantages on some very common workloads on commodity hardware, and they constitute a regular part of my regression testing. Thanks to Serge Belyshev for the 6x results, statistical analysis and graphs.


Other changes in this patch release include an updated version of lru_cache_add_lru_tail, as the previous version did not work entirely as planned; dropping the default dirty ratio to the extreme value of 1 in decrease_default_dirty_ratio; and dropping the cpufreq ondemand tweaks, since BFS now detects scaling CPUs internally and works with them.
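The dirty ratio is an ordinary vm sysctl, so anyone who finds the new default of 1 too aggressive can check and restore the mainline value at runtime, e.g. (needs root to write):

```shell
# Inspect the current dirty ratio (1 with this patchset's default)
cat /proc/sys/vm/dirty_ratio
# Restore the mainline default of 20 if 1 causes problems
echo 20 > /proc/sys/vm/dirty_ratio
```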


Full patchlist:

2.6.39-sched-bfs-404.patch
sched-add-above-background-load-function.patch
mm-zero_swappiness.patch
mm-enable_swaptoken_only_when_swap_full.patch
mm-drop_swap_cache_aggressively.patch
mm-kswapd_inherit_prio-1.patch
mm-background_scan.patch
mm-idleprio_prio-1.patch
mm-lru_cache_add_lru_tail-1.patch
mm-decrease_default_dirty_ratio.patch
kconfig-expose_vmsplit_option.patch
hz-default_1000.patch
hz-no_default_250.patch
hz-raise_max.patch
preempt-desktop-tune.patch
ck1-version.patch


Please enjoy!
お楽しみください (Please enjoy!)
--
-ck

EDIT4: For those having hangs, please try this patch on top of ck1:
bfs404-test6.patch

66 comments:

  1. awesome scheduler. great work! thanks!

  2. I just noticed that with 2.6.39-ck1, starting a VirtualBox VM (Windows Server 2008) with IO APIC enabled will put the system into a "limbo" state:



    - the VM gets stuck before the Windows bootup logo appears

    - "top" will get stuck

    - "ps aux" will get stuck when it is about to show the VirtualBox process

    - I can only turn off the system with a forced shutdown



    This doesn't happen with 2.6.39. Is anyone else having the same issue?

  3. Same issue here, but I can't reproduce it reliably (sometimes it happens after 20 mins, sometimes after 6 hours). It happens when I am using chromium: suddenly the browser freezes and I can't kill the process; top and pstree -p get stuck, ls /proc/* gets stuck too when it reaches the chromium pid, and there are no messages in dmesg.

  4. I've built the kernel with the most recent gcc 4.6 and the number of apps that actually run is minuscule. I get into xmonad just fine, but anything that isn't urxvt or limited to the console just won't start – no errors whatsoever, they just immediately freeze (incl. non-GTK/Qt apps like dzen – eclipse's and libreoffice's start screens do show some progress, but then freeze as well). The standard (Arch Linux) kernel works just fine.

  5. Hi Con,

    thanks for your effort. 2.6.39 with the ck patches runs fine here. I've added the BFQ disk scheduler too. As always, the load is over 1 most of the time (but this could be the result of the >80 open tabs in firefox and the nvidia driver ;) ). Most of the time app switching works perfectly without delay, but from time to time there is a delay of 5 sec. or more under heavy load (>5). No clue where this comes from, but it could be an IO bottleneck from my laptop hdd. No matter, I'll live with that ;)

    @Anonymous, no problem with VirtualBox 4.0.8 here and XP as guest, but I switched the IO APIC off some time ago, because users on the VirtualBox forums had mentioned performance drawbacks with it. There is a tool which allows changing it without a BSOD in XP; it could be usable on WS2008 too.

    CU sysitos

  6. Interesting. I'm unable to reproduce any of those problems here. There used to be a problem with ultra-low dirty ratio settings, but I thought it was fixed in newer kernels. Perhaps you're running into a variant of it? Try
    echo 5 > /proc/sys/vm/dirty_ratio
    and see if it helps the problem.

  7. @Mike, I believe IO APIC is needed to run 64-bit guests, and VirtualBox will enable it automatically.

    Here with VirtualBox 4.0.8, I can trigger the "process stuck" bug by creating a new Windows 7 64-bit guest. About 15 minutes into the installation, I will get a "black screen".

  8. dirty_ratio didn't change anything for me. I've got a Core 2 Duo, i.e. 2 parallel threads, if that's important.

  9. Thanks, interesting. I don't use virtualbox so I'm not sure what you're seeing there, but I guess if I can find the time I'll give it a go. I tried chrome but it works perfectly fine here. Perhaps it's a config option? Can any of you having the problem link or email me your configs?

  10. I use the Arch Linux AUR kernel26-ck package. The config is contained in the tar file downloadable here: http://aur.archlinux.org/packages.php?ID=32877

  11. Hmm no difference. I wonder, are you all using gcc 4.6 ?

  12. Arch is a bleeding edge, rolling release.

    $ gcc --version
    gcc (GCC) 4.6.0 20110513 (prerelease)
    Copyright (C) 2011 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

  13. gcc 4.5 here:
    % gcc --version | head -n 1
    gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2

    It happened again (after 20h of uptime), some seconds after I opened a new tab (process) in chromium. I don't know what the trigger could be (I tried to push chromium to the limit after a reboot but nothing happened), so I am currently rebuilding my kernel with BFS disabled to see if that patch is the problem (so I can rule out the rest of the ck patchset as a cause).

  14. Same problem here with gcc version 4.6.1 20110521 (prerelease) (Debian 4.6.0-8). I'm starting to wonder whether some part of userland is the real cause here though, since reverting to kernels that I know for a fact worked previously doesn't clear the issue up any.

    When flinging crap at the wall to see what sticks in an attempt to fix this, I did just notice my XFS /home had some corruption from one too many power outages. Whether the issue will actually go away now remains to be seen.

    I'm pretty glad this is happening to other people. I was starting to think it was something I'd done myself when I recently upgraded from 32-bit to 64-bit userland in-place without using any chroots or debootstrap or rescue media. It was quite the hack so I'm always looking over my shoulder for issues to arise from it.

  15. Perhaps there's another common link, like the filesystem? I'm unable to reproduce the problem locally.

  16. How about the NVIDIA driver? Anyone else with the problem using that?

  17. I'm using the nvidia driver without problems.

  18. There is an application with a heavy I/O load.
    It is a recorder that receives video streams from IP cameras (using buffered I/O).

    I observe one thing: if I use BFS, the application hangs within a few minutes.
    Now I have changed to CFS, and it works fine so far.

    Is this just caused by the different scheduling policy?

    kernel settings:
    preempt
    NO HZ
    1000HZ
    deadline I/O scheduler
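    For reference, the deadline I/O scheduler listed above can be confirmed and switched per block device at runtime (sda here is illustrative; writing needs root):

```shell
# The scheduler shown in [brackets] is the active one
cat /sys/block/sda/queue/scheduler
# Switch the device to deadline at runtime
echo deadline > /sys/block/sda/queue/scheduler
```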

    kernel log: (with BFS)
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    engine_main D 00000000 0 3499 3431 0x00000004
    f2802a00 00000082 ffffffff 00000000 00000000 00000000 0764c525 ee7c5a80
    51bf8469 0000000c 0000a7a3 0000000c c0616320 00000002 00000041 c063f400
    c069ea00 00000000 ee7c5bd8 c069ea00 00000000 00000000 c063fc04 c063f428
    Call Trace:
    [] ? try_to_wake_up+0x6e/0x100
    [] ? __alloc_pages_nodemask+0xf9/0x740
    [] ? rwsem_down_failed_common+0x95/0x100
    [] ? call_rwsem_down_write_failed+0x6/0x8
    [] ? down_write+0x12/0x20
    [] ? jfs_get_block+0x4c/0x2b0
    [] ? kmem_cache_alloc+0x8d/0xa0
    [] ? alloc_page_buffers+0x5f/0xb0
    [] ? nobh_write_begin+0x169/0x3a0
    [] ? jfs_write_begin+0x4f/0xb0
    [] ? jfs_writepage+0x10/0x10
    [] ? generic_file_buffered_write+0xe3/0x220
    [] ? __generic_file_aio_write+0x24b/0x4f0
    [] ? do_wp_page+0x314/0x760
    [] ? generic_file_aio_write+0x6b/0xe0
    [] ? wait_on_retry_sync_kiocb+0x50/0x50
    [] ? do_sync_write+0xb3/0xf0
    [] ? tick_program_event+0x19/0x20
    [] ? vfs_write+0xa0/0x140
    [] ? sys_write+0x41/0x80
    [] ? sysenter_do_call+0x12/0x26

  19. The only likely place I can think of that has to do with I/O is the new plugged I/O flushing. Can those people having hangs please try this patch?

    bfs404-test.patch

  20. Patch works, my VirtualBox 64-bit Windows guest works fine now. Thanks!

  21. Wow, nice quick response, thanks!

    Here's a better patch (replacing that test patch) that is likely what I'll make part of the next release if it fixes all the problems:
    bfs404-test2.patch

  22. I applied bfs404-test2.patch.

    the machine:
    08:21:55 up 37 min, 2 users, load average: 1.40, 1.73, 1.6

    I think the issue ("application with heavy I/O load") has been fixed. Thanks!

  23. Just had my first hardlock using Virtualbox (x86_64 based guest). Patched and will report back.

  24. wfm again as well. Thanks.

  25. Make it 500.

    Don't bundle -- release it standalone. Focus!

    The main advantage of your scheduler is that it lacks the heavy tail in the distribution. It is not about HZ, which is boooooring and trivial.
    So make this point about the tail clear and only talk about this in your announcement. Focus!

    The second figure is the only one that is needed in the announcement. Explain it well. You should not talk about -j24. It is too technical and does not belong to the announcement. Link it and focus on your main point.

  26. Well that sounds like a pretty convincing bugfix for a problem big enough to make a ck2. I'll be working on a new release shortly.

    @anonymous: If it hasn't been obvious so far, I'll say it again - I'm trying to avoid too much spotlight and fanfare on lkml.

  27. It works well with 2.6.39 + bfs404,
    but when I applied bfs404-test2.patch, I got this error:
    INFO: task flush-8:0:798 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    flush-8:0 D 00000000 0 798 2 0x00000000
    00000066 00000046 00000000 00000000 00000000 00000000 defc98d8 b0d95099
    00000052 c060a4c0 def12620 c060e3e0 def12740 def12620 def12620 00000000
    00000000 00000000 00000000 00000282 1e62a13b d9c5c680 00000000 00000200
    Call Trace:
    [] ? submit_bio+0x48/0xd0
    [] ? schedule_timeout+0x12d/0x1a0
    [] ? bio_add_page+0x54/0x70
    [] ? _xfs_buf_ioapply+0x16c/0x1e0 [xfs]
    [] ? wait_for_common+0x88/0x140
    [] ? try_to_wake_up+0x100/0x100
    [] ? _xfs_buf_read+0x58/0x70 [xfs]
    [] ? xfs_buf_read+0x4f/0x80 [xfs]
    [] ? xfs_trans_read_buf+0x1a9/0x2d0 [xfs]
    [] ? xfs_read_agf+0x94/0x1e0 [xfs]
    [] ? xfs_alloc_read_agf+0x37/0xc0 [xfs]
    [] ? xfs_alloc_pagf_init+0x23/0x50 [xfs]
    [] ? xfs_bmap_btalloc_nullfb+0x1f6/0x340 [xfs]
    [] ? xfs_bmap_btalloc+0x408/0x790 [xfs]
    [] ? xfs_bmap_search_multi_extents+0x98/0x110 [xfs]
    [] ? xfs_bmapi+0xcb4/0x1450 [xfs]
    [] ? xfs_iomap_write_allocate+0x1fa/0x3c0 [xfs]
    [] ? xfs_ilock_nowait+0x58/0xd0 [xfs]
    [] ? xfs_map_blocks+0x222/0x230 [xfs]
    [] ? xfs_vm_writepage+0x307/0x510 [xfs]
    [] ? __writepage+0x8/0x30
    [] ? write_cache_pages+0x164/0x340
    [] ? account_page_writeback+0x50/0x50
    [] ? generic_writepages+0x38/0x60
    [] ? do_writepages+0x13/0x40
    [] ? writeback_single_inode+0x159/0x250
    [] ? writeback_sb_inodes+0xae/0x1c0
    [] ? writeback_inodes_wb+0xbf/0x160
    [] ? wb_writeback+0x2cb/0x340
    [] ? global_dirty_limits+0x2f/0xc0
    [] ? wb_do_writeback+0x16a/0x180
    [] ? bdi_writeback_thread+0x49/0xf0
    [] ? wb_do_writeback+0x180/0x180
    [] ? kthread+0x68/0x70
    [] ? flush_kthread_worker+0x70/0x70
    [] ? kernel_thread_helper+0x6/0xd
    INFO: task /usr/bin/deluge:2492 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    /usr/bin/deluge D 00000000 0 2492 1 0x00000000
    00000066 00200082 183b3000 00000000 00001000 00001000 d9badd6c 064754bf
    0000005c c060a4c0 d98b5b90 c060e3e0 d98b5cb0 00000000 d98b5b90 c060e3e0
    c060a4c0 d9baddf0 d9baddb4 c013fefc c060e3e0 00000803 d98b5b90 dd0f3570
    Call Trace:
    [] ? sched_clock_local.constprop.1+0x4c/0x1a0
    [] ? rwsem_down_failed_common+0x8d/0xe0
    [] ? call_rwsem_down_write_failed+0x6/0x8
    [] ? down_write+0x11/0x12
    [] ? xfs_ilock+0x4b/0x60 [xfs]
    [] ? xfs_file_buffered_aio_write+0x4c/0x120 [xfs]
    [] ? generic_file_aio_write+0xb6/0xd0
    [] ? xfs_file_aio_write+0x147/0x2a0 [xfs]
    [] ? futex_wait_queue_me+0xbf/0x100
    [] ? futex_wait+0x156/0x200
    [] ? xfs_file_buffered_aio_write+0x120/0x120 [xfs]
    [] ? do_sync_readv_writev+0xb2/0xf0
    [] ? update_cpu_clock+0x149/0x2d0
    [] ? rw_copy_check_uvector+0x3e/0xe0
    [] ? do_readv_writev+0xa5/0x1a0
    [] ? xfs_file_buffered_aio_write+0x120/0x120 [xfs]
    [] ? do_sync_read+0xf0/0xf0
    [] ? generic_file_llseek+0x4e/0x60
    [] ? vfs_writev+0x37/0x50
    [] ? sys_writev+0x3c/0x70
    [] ? sysenter_do_call+0x12/0x26

  28. Does it eventually complete the I/O or is it hung again with test2?

  29. 404-test2 definitely hangs here as well, just in different ways than regular 2.6.39-ck1. With the test2 patch applied, I don't even get to a desktop without the processes that are supposed to actually spawn the various GNOME processes hanging. Stuff like simple package upgrades with apt hung too.

  30. I use XFS with the delaylog mount option.
    When I applied test2, the application (deluge) hung within a few minutes of starting.

    Then I typed dmesg to see what happened.
    Actually, the following call trace repeated twice in dmesg:
    INFO: task flush-8:0:798 blocked for more than 120 seconds.
    Call Trace:
    [] ? submit_bio+0x48/0xd0
    [] ? schedule_timeout+0x12d/0x1a0
    [] ? bio_add_page+0x54/0x70
    .....

    Thus I saw 3 messages in total:
    task flush-8:0:798 blocked for more than 120 seconds
    ...
    task flush-8:0:798 blocked for more than 120 seconds
    ...
    INFO: task /usr/bin/deluge:2492 blocked for more than 120 seconds.
    ....

  31. Thanks everyone for your feedback. It looks like I have no choice but to try a test3 patch as well:
    bfs404-test3.patch

    Again, this replaces the other test patch and should be applied to a 404 patched kernel on 2.6.39. Could you please test and report? Thanks!

  32. I tested the test3 patch to verify this issue:
    "There is an application with heavy I/O load.
    It is a recorder that receiving video stream from ip cameras (using Buffer I/O)...."

    It seems that the application hangs again.
    INFO: task engine_main:3522 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    engine_main D 00000000 0 3522 3447 0x00000004
    f4d02a00 00000086 00eac42d 00000000 00000000 00000000 07caac96 e928da80
    84549982 0000002c 0002c3e0 0000002c 00000001 f4846120 00000041 c0643800
    c06a5a00 00000001 e928dbd8 c06a5a00 00000000 00000000 c0644004 c0643828
    Call Trace:
    [] ? try_to_wake_up+0x6e/0x100
    [] ? __alloc_pages_nodemask+0xf9/0x740
    [] ? rwsem_down_failed_common+0x95/0x100
    [] ? call_rwsem_down_write_failed+0x6/0x8
    [] ? down_write+0x12/0x20
    [] ? jfs_get_block+0x4c/0x2b0
    [] ? kmem_cache_alloc+0x8d/0xa0
    [] ? alloc_page_buffers+0x5f/0xb0
    [] ? nobh_write_begin+0x169/0x3a0
    [] ? jfs_write_begin+0x4f/0xb0
    [] ? jfs_writepage+0x10/0x10
    [] ? generic_file_buffered_write+0xfa/0x250
    [] ? __generic_file_aio_write+0x24b/0x4f0
    [] ? generic_file_aio_write+0x6b/0xe0
    [] ? wait_on_retry_sync_kiocb+0x50/0x50
    [] ? do_sync_write+0xb3/0xf0
    [] ? vfs_write+0xa0/0x140
    [] ? sys_write+0x41/0x80
    [] ? sysenter_do_call+0x12/0x26

  33. Let me clarify the question - does it crash the machine? Does it become unusable after this warning? Does the warning go away if you increase the dirty ratio?
    echo 20 > /proc/sys/vm/dirty_ratio

  34. It never crashes the machine.
    If the app cannot write data to disk, memory usage grows because there is a data queue.
    I checked the value of dirty_ratio, and it is already 20.


    There is a little difference from before:

    a. with bfs 404, the warning message appears within a few minutes. Checking with iostat, the I/O rate is zero, and the memory usage of this application starts growing.

    b. with bfs 404+test2, it works well (I tested it overnight).

    c. with bfs 404+test3, I checked memory usage and I/O rate. When the I/O rate is zero, the memory usage starts growing. However, the warning message doesn't always appear; I only caught it twice.

  35. Ah okay, so test2 was good for you. Sorry, it was another anonymous poster that had hung tasks with test2 and deluge; I'm losing track of who has posted what. The warning does not actually imply failure, but that I/O is taking quite a while to commit. Interesting that it behaves differently from mainline at all.

  36. I suspect there may be another bug somewhere in there. I can't see what else I can do for that plug flushing code. I need to review more of the changes going into 2.6.39-bfs. Is bfs (401 or 404) on earlier kernels okay for you?

  37. Hi Con!
    I also experienced problems with chromium, as already posted above. Then I tried the test2 patch; for a while it kept going, until the I/O issues appeared.
    The interesting things:
    1. Without the test2 patch, when chromium hung, if I launched htop to find and kill chromium it would lock up also, showing nothing but a blank console.
    2. With the test2 patch, when I noticed applications hanging (specifically: gentoo's emerge command) I tried to issue a "sync" command, which just locked up as if it couldn't sync the data.
    Trying to shut down or reboot the system with some applications hung will always result in a locked system while trying to stop the syslog-ng daemon (at least for me). At that point, I have to use the magic keys: sync, remount ro and reboot.
    Please note that I use "crazy" dirty ratio settings by default:
    vm.dirty_background_ratio = 95
    vm.dirty_ratio = 95
    vm.dirty_writeback_centisecs = 15000
    which is done to save my SSD's remaining life as much as possible. I had no problems with 2.6.38.6-ck3 and those settings.
    I tried to lower them while testing 2.6.39 and the posted patches, but not even the default values would work.
    Thank you for the great work and effort.
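    For anyone wanting to try these exact settings, they can be applied at runtime with sysctl (the values are this poster's, not a recommendation):

```shell
# Apply the poster's write-back settings at runtime (needs root);
# add the same key=value lines to /etc/sysctl.conf to make them persistent
sysctl -w vm.dirty_background_ratio=95
sysctl -w vm.dirty_ratio=95
sysctl -w vm.dirty_writeback_centisecs=15000
```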

  38. Hey Neo2. Thanks for your testing and feedback. Those bugs should be fixed with the test3 patch. There appears to be one more bug though, and I've yet to find it.

  39. I wonder if this is related to a mainline issue and you're just hitting it much easier on bfs?

    https://lkml.org/lkml/2011/5/10/454

  40. test3 fixes the deluge issue.
    Thanks a lot.

  41. No issues with virtualbox running test3 yet...

  42. Well, for me it's broken again.

  43. Frustration.

    Can you try building with slab instead of slub on test3?

  44. Hi,

    I used test3 with SLAB.

    The JFS issue still exist:
    "There is an application with heavy I/O load.
    It is a recorder that receiving video stream from ip cameras (using Buffer I/O)...."

  45. Thank you all for your testing and replies so far. Well this is all starting to seriously piss me off. As far as I can tell these problems started appearing after rc7, and I can't really isolate anything in particular, nor can I reproduce it locally. I may just have to pull 2.6.39-ck1 as a stable release and not support it till I can figure out wtf is wrong :(

  46. For me, these work great:

    2.6.39-ck1 + test
    2.6.39-ck1 + test2

    while these will result in VirtualBox process getting stuck while starting a 64-bit Windows guest:

    2.6.39-rc7 + bfs 403
    2.6.39-ck1
    2.6.39-ck1 + test3

  47. Okay thanks everyone. If you get a hung task, can you try and get the output of sysrq-t and sysrq-p for me please? They might be quite long so email me at kernel@kolivas.org .
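    One way to capture those dumps, assuming magic sysrq is enabled (the keyboard combinations work too, but /proc/sysrq-trigger is easier to script; needs root):

```shell
echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions
echo t > /proc/sysrq-trigger      # sysrq-t: dump all task states
echo p > /proc/sysrq-trigger      # sysrq-p: dump current CPU registers/state
dmesg > sysrq-dump.txt            # collect the output to email
```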

  48. The test5 patch fixes the JFS issue...
    Great!!

  49. The deluge-with-XFS issue happens again with the test5 patch.

    Actually, I have two disks for deluge. One uses XFS, and the other one uses ext4. Only XFS has this issue.

    It seems that the test2 patch works fine for most people. Maybe the deluge-with-XFS issue is not caused by BFS.

    I'm building a kernel with SLAB to verify it.

  50. Not working for me with test5 either. Only test2 has worked so far.

  51. Whoa, test5 already o.o. I just got another hang with chromium. I am not sure if this is useful, as I am still using the test3 patch and SLUB, but here is the sysrq(-t|-p) log: http://pastebin.com/VPvM3PYf

    I have to reboot now, so I will rebuild my kernel with SLAB + test5 to continue the tests.

  52. Did some testing, test3 and test5 don't work. I still get the very same issues (except that with test5 the lockup appears sooner than test2/3).
    After testing test3 I was curious whether these issues would appear also with CFS or not.
    So I did some more testing and found out that sometimes my main ext4 FS exhibits some concurrency problems (leading to BUG and machine lockup), and I'm quite sure that it can be tracked down to the changes done to the FS in the 2.6.39 cycle.
    A lot of work has been done in order to maximize parallelism and delayed allocation with filesystems in general.
    What I did notice is that when I build my kernel with BFS, the lockups occur only on the XFS filesystem that contains some other data (my root and home are on ext4).
    I'm thinking that reverting either one or both of these two commits should partially fix or hide the issues (hopefully not breaking BFS code again).
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7eaceaccab5f40bbfda044629a6298616aeaed50
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b7ed78f56575074f29ec99d8984f347f6c99c914
    Basically, fsync and filesystem syncing have undergone substantial changes and rework, and maybe that is the root cause of all of the troubles.
    While CFS behaves quite well with these new implementations, BFS struggles (and I'm unable to tell why).
    But judging from the problems the ext4 FS is having, I guess this is not at all just BFS' fault...
    If you're wondering, I'm pulling all of my knowledge from here: http://kernelnewbies.org/LinuxChanges.
    I'm hoping for a very fast 2.6.39.1 now =)
    Hope to have helped a little.
    Keep up the good work ;)

  53. Again thanks everyone for your testing.

    Hey Neo2. Thanks for your comments. You are, of course, right in saying the new block plugging code is responsible for this breakage. Having a task go to sleep that still is plugged is the major problem here and I'm trying to find a safe way to unplug it. The fact that BFS rapidly and easily reschedules something on another CPU is what's biting me here, and the subtleties of how best to work with the unplugging code without causing deadlocks or dereferences are failing me. The fact that testX fixes one filesystem while testY fixes another and testZ fixes yet another workload, suggests I'm still not tackling this correctly. All of this is compounded by the fact that I've never been able to reproduce these problems myself.

    So here's test6 :(
    bfs404-test6.patch

  54. Building with SLAB doesn't help with the deluge-on-XFS issue.

    Then I tried test6, and it still has the issue.

    sysrq log:
    http://pastebin.com/1si2AA1u

  55. As much as I hate to say this, I have to give up on 2.6.39 for now. I just don't have the time nor energy to fix this. I'm grateful for all your testing, but it's just going to have to go on hold and I'll have to support .38 kernels in the meantime until I have a revelation of some sort.

  56. If you're having trouble reproducing it - the only thing I've found which hangs is my build of Chromium. I've had days of uptime with BFS .404+BFQ so long as I don't run chrome. It hangs after a few minutes, unkillable even w/ kill -9. "pidof chrome" also hangs, unkillable. The rest of the system appears unaffected until trying to shut down, at which point everything hangs. No panics yet. I'll try debugging (I'm having an unrelated issue building with symbols).

    No rush, I can live with 2.6.38 for now. :)

  57. more information:
    About deluge with XFS issue.

    The machine is a UP (single-processor) system with preemption enabled.
    I observed that the kernel process "flush" hung first, which then caused deluge to die.

    Maybe disabling preemption would help. I will try to find out what I can.

  58. There was one more crazy thing to try. Just disable the flush plugging entirely.

    bfs404-test7.patch

  59. Hi,

    Thank you for BFS.

    I wanted to ask whether anyone was building Ubuntu packages with the different test versions? I'm getting the strange hangs with 2.6.39-ck1, running on hardware. I'd love to test the different patches, but I have too much on my mind right now to rig up a build environment ...

    So are there any packages out there? Thanks!

  60. Did anyone try the test7 patch?

    @krilli: no one is making ubuntu packages from the unstable test patches as far as I'm aware.

  61. test7 http://pastebin.com/CPPjNngB

  62. This one could tentatively be a winner. test1 and test2 left me with the same Chromium problems initially reported and test3-test7 didn't even allow me to get to my desktop. With test8, it's been 15 minutes so far without any noticeable fuckups. Hopefully this is the one and you can put this frustrating mess behind you once and for all soon.

  63. Well that's a VERY reassuring sign, thanks! At least I have a postulated mechanism for what's going on now, but this needs more testing. The final version will be a little cheaper too, but I'll just wait till I get more people testing first. I must have jinxed myself by saying 2.6.39 seemed pretty good :s

  64. Just installed a Windows Server 2008 guest in VirtualBox with test8. Couldn't get the installation to finish with test1-test7. This sure looks like a winner to me. Thanks Con!
