Tuesday, 6 May 2014

BFS 0.447, 3.14-ck1

Announcing a resync and update of BFS for linux kernel 3.14.x:

This is mainly a resync from BFS 0.446, with the addition of patches generously offered in the comments here by Alfred Chen and Oleksandr Natalenko. The changes fix a circular locking issue on bootup that rarely hit some people, fix KVM soft lockups in SMP mode, and remove some config options that should not be used with BFS.
What's interesting about working on this latest BFS is that I ran into all sorts of instability with the new kernel which, ironically, turned out to be caused by a very serious bug in 3.14.0 that was fixed in 3.14.1 with this patch:
 
commit 8e58cd80d042569da7af501de897c5e0538d99b0
futex: avoid race between requeue and wake
As is often the case, BFS is exceptional at bringing out race conditions, and my machine was almost unusable with any significantly multithreaded application such as Firefox, which kept hanging. This was a scenario where my delay in syncing up the code worked to my advantage, as 3.14.2 is working fine.
So here it is:
BFS by itself:
3.14-sched-bfs-447.patch

CK branded BFS:
3.14-ck1

Somehow I still forgot to include PF's patch for uniprocessor builds, though it's so uncommon to come across a uniprocessor these days! His patch is still valid and can be grabbed here to be applied on top if you need it:

0001-ck-3.12-fix-BFS-compiling-with-CONFIG_SMP-n.patch


Enjoy!
Please enjoy (お楽しみください)

76 comments:

  1. kernelOfTruth, 6 May 2014 21:30

    another reason why the kernel devs & Linus could give BFS a run from time to time :D

    thanks Con !

    your work and that of the other devs (Alfred Chen and Oleksandr Natalenko) is very much appreciated =)

  2. So that's what my Firefox hanging was!

    And I think that's the reason why I'm bumping my kernel version since the last major hang 20 minutes ago.

  3. Nice, thanks. Will test it ASAP.

    Would like to know if my machine hangs again after hibernation. Will report anyway.

    Replies
    1. It got stuck with ksoftirqd/0 consuming 100% of CPU. But I suspect it's not BFS's fault but TOI's. Will test more.

    2. For me it also takes a long time when resuming from suspend-to-disk, but it works. Without CK/BFS it wouldn't take much more time (though, I haven't measured the diff). I don't use TOI. The most prominent culprit might be Firefox that consumes and hogs RAM more than ever. I actually use the ESR 24.5.0, but the previous ones were remarkably more calm in these terms. And another one, but it's history now, the first 3.12 kernels never acted that slow.

      Regards, and -- my many thanks for the new BFS/CK to Con himself ! -- Manuel

  4. Hi Con,

    First of all, thanks for all your hard work in bfs and ck-patchset. I've been using it for years and it's really beneficial with all the realtime audio applications I'm using.

    However, ever since 3.13 (and now with 3.14) kernel with ck patches I've had an issue where my system hangs after waking up from suspend/hibernation. It seems like the system isn't completely frozen, as it tries to connect to wireless network (although it fails to do so) according to gnome's network indicator, but keyboard, mouse and everything else is completely unresponsive.

    I'd like to see this issue resolved, but I really don't know where to look for clues, so I was hoping someone could point me in the right direction :)

    I'm using graysky's prebuilt ck-kernel packages for Arch Linux, which also include the BFQ I/O scheduler, on an Asus X501A laptop. I can easily rebuild the kernel without BFQ or change the kernel config if that's something I should do.

    Thanks in advance,

    Replies
    1. Are you 100% sure Con's patches are behind this? I've been having an issue only on my Sandy Bridge laptop (ThinkPad T520). Same symptoms coming out of suspend; I also get the same kind of freeze about once a week if I just let it sit. Switching to a vanilla kernel didn't help.

      Note, the suspend crash sometimes happens after a minute or two of normal responsiveness, so it can connect to wifi sometimes before everything freezes. Screen stays up, can't move mouse, wifi light stays flashing, nothing in logs upon reboot (I'm assuming it can't write).

      I intend to try the 3.10.x vanilla kernel to see if I still have issues. My nearly identically set up, Arrandale-based ThinkPad X201 has never had any of the crash/suspend issues.

    2. I'm also having trouble with this. After resuming from either S3 or S4, a process named ksoftirqd/0 apparently gets overwhelmed with IRQs, using up one core completely. It seems to be some issue with the wifi card, since the problem disappears when this device is disabled. Also, when the problem does occur, it seems some modules are incorrectly reactivated. Every time a process or I try to write anything to disk, the computer freezes.

      The problem is not present in the vanilla kernel, where I can suspend/hibernate and then resume normally.

    3. Yes, this is quite similar to what I saw.

    4. Sorry I actually had to reinstall my system for reasons unrelated to ck/bfs, and didn't have time to investigate this further until now.

      Looks like my suspend issues are caused by ath9k wireless module. Running 3.14.4-ck1+bfq+arch-linux-patches. If I modprobe -r ath9k, it looks like I can suspend and resume without any issues. If I modprobe ath9k after resuming from suspend, my system freezes again, just as it does when resuming with the ath9k modules loaded.

    5. Can confirm the problem with hibernation and BFS on ath9k. Suspend works fine, but hibernation leads to a CPU soft hang, after which further work is nearly impossible. Switching the scheduler to CFS resolves the problem. I tried it with different kernels (plain with ck, zen with extras) and always got the same problem. The ksoftirqd process gets stuck.

      Con, if I could help to resolve the problem, let me know.

      Thanks.
      cu sysitos

    6. Really, I'm glad to find out I'm not alone. Probably, things could be broken in the ath9k driver and not in BFS.

      But.

      ➜ pf-kernel git:(pf-3.14) git shortlog --no-merges v3.13..v3.14 | grep ath9 | wc -l
      135

      135 commits related to ath9k in between 3.13 and 3.14. Something definitely could be broken.

    7. Thanks. If I had time I'd scour through those commits to see if there's something that gives a hint as to where the problem lies.

    8. Yes, it's definitely some problem in the interaction between ath9k and BFS. With the ath9k module blacklisted, waking up after hibernation works fine. Loading the ath9k module afterwards starts the soft hang of the CPU.

      Btw. I replaced the Intel WLAN mPCI card in my Dell Vostro with the Atheros one to resolve the lousy throughput and dropouts in the WLAN connection ;)

      CU sysitos

  5. Thank you again for all your work

  6. Thanks for the new release. I compiled a new kernel earlier this evening: Gentoo sources 3.14.3 with the CK patchset, BFQ, and UKSM patches. I decided to update my Mesa installation, and the system locked up while LTO linking Mesa. I'm on an older, single-core system, and I didn't have many applications open. Basically just Konsole with GCC running, and the system monitor open showing LTO using 43% of the CPU. So something is still causing system hangs.

  7. Thanks Con, 447 working fine here on 3.14.3. So far no issues.

    On an unrelated note to those using the NVIDIA binary driver: I had to add the following patch to make 334.21 compile for 3.14.3: http://pastebin.com/UgnyrrH5

  8. I'm also getting system hangs on BFS 447 when using G+ Hangouts video calls and just doing regular web surfing in the background. It may not be the only hang scenario either.

    Replies
    1. Well, I caught a similar hang for no visible reason :(. No logs available, just a frozen machine.

  9. If you don't mind, I'd like to ask another more-or-less OT question on here:
    Has any of you found a setting to ease resuming from swap/hibernation?
    Are there kernel config options that are critically involved?
    My original problem was that after resuming from hibernation it took about 10+ minutes to get a firefox with "too" many open tabs (100+) being actively usable again. And as a nasty side note: this is longer than a freshly started FF needs to read them from the web.

    By coincidence I've found something that was able to reduce this time to approx. 5 minutes: setting /proc/sys/vm/page-cluster to 5 (the default is 3 on here), which is said to increase the readahead from swap (the value is logarithmic).

    From 10+ down to 5 minutes is a good result. Still, I hadn't faced this issue until the mid 3.12.x kernels, although the cause could also be different memory or online/offline management in the Firefox releases in the meantime.

    Does anyone have any ideas? Thank you for sharing them,
    Manuel
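
As a rough illustration of the page-cluster setting mentioned above: the kernel's sysctl documentation describes it as a logarithmic value, so each swap readahead attempt covers up to 2^page-cluster consecutive pages. The sketch below only does that arithmetic. The function name `swap_readahead_pages` is made up for this example, and it does not read or change any system setting.

```python
def swap_readahead_pages(page_cluster: int) -> int:
    """Return the maximum number of pages read from swap in one attempt.

    Assumes the documented meaning of vm.page-cluster: the value is a
    base-2 logarithm, so 0 means 1 page, 1 means 2 pages, and so on.
    """
    if page_cluster < 0:
        raise ValueError("page-cluster must be non-negative")
    return 2 ** page_cluster


if __name__ == "__main__":
    # Compare the default mentioned in the comment (3) with the tuned value (5).
    for value in (3, 5):
        pages = swap_readahead_pages(value)
        print(f"page-cluster={value}: up to {pages} pages per swap read")
```

Under that reading, going from 3 to 5 raises the window from 8 to 32 pages per attempt. That may account for part of the faster swap-in reported here, though nothing in this thread measures it directly.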

    Replies
    1. I'm so sorry for having bothered you with this one.
      Now I've tried tuxonice as a kind of "workaround" -- but with the result that it's working much much better and faster than the default kernel hibernate/ suspend-to-disk + resume.

      I don't know what the kernel does wrong, such that after resuming from disk Firefox needs ages to, presumably, (re-)read all its allocated memory, leaving it unresponsive until its swap-in I/O is done.
      With tuxonice firefox is responsive almost at once after resume, so that I can recommend it without any doubt: It's, in fact due to its speed, even a real alternative to sleep/ suspend-to-ram in my opinion.

      tuxonice currently works on here on top of ck and bfq-v7r4 with 3.14.4 vanilla.

      Regards, Manuel

  10. Hello all,
    Using the ck kernel from the repo-ck repository under Arch Linux, I noticed my wireless connection keeps on crashing every few hours.
    At times, it will resume on its own after a few seconds and at times a restart is needed.
    This does not happen on the regular Arch kernel.
    dmesg logs and a more comprehensive description can be found on the following Arch forums post:
    https://bbs.archlinux.org/viewtopic.php?pid=1415693

    Thanks, Adam

  11. Greetings.

    I gave BFS another try (had to abandon it due to it causing deadlocks in FFmpeg during movies transcoding), but this time, BFS causes Blender (tested with v2.68 and v2.70) to crash while exporting DAE meshes...

    Back to the vanilla kernel for me !

    Replies
    1. Please give us more information about the kernel version you are running. Maybe even upload your .config somewhere for later review.
      I assume you've already checked that an earlier revision of ffmpeg does NOT give you trouble?!

      Manuel

    2. I am running kernel v3.14, obviously... v3.14.4 to be precise. The kernel config is pretty irrelevant (same issue on 3 different computers with different hardware/config).
      The problem I was reporting here was about Blender.
      I reported on this blog last year about encoding issues (was with mencoder, but also got the same kind of issue (deadlocks) with FFmpeg: and yes, I did try with several versions: IIRC with 0.99.8, 1.0.7 and v1.2.1).
      My guess is that there's a race condition somewhere in the BFS-patched kernel (i.e. it could be a problem with a kernel driver, for example... with ext4fs perhaps ?...).

    3. Just for the crack of it I fired up Blender and exported some small scene to DAE without any problem. (Gosh, they make even the export dialog unbelievably complicated, not to mention the rest of the program. No wonder I can't use it :p)
      Anyway, that result was to be expected since application programs do not interact with CPU schedulers. If anything, BFS exposes races in OP's video driver, but they shouldn't be unique to Blender.

    4. Repeatedly saying that the problems with BFS are just the result of it "exposing race conditions in other software" is not very constructive, especially when BFS renders the system unusable because of such problems.
      I'd expect more serious investigation of those problems by BFS' author, instead of systematically calling the fault on others' work without any proof of such faults.
      Each time I have been testing a BFS-enabled kernel, I ran into issues within hours or at most a couple of days: it should not be hard to reproduce these issues...
      In these conditions, it's not a big surprise that BFS didn't yet make its way (even as a "staging" feature) in the official Linux kernel...

    5. Thanks for your carefully thought out comments. I picked one bug in the last release which was exposed by BFS. I made absolutely no such claims with any of the other problems people here are having, but I know how easily people can just lash out given my history and then say things like "systematically calling the fault on others' work"... Look hard and you will find no such systematic trashing here. I continue to maintain BFS in my own time as purely a fun project that some people find useful. If you expect more serious investigation I suggest you look to enterprise supported projects, not some random bit of code an anaesthetist hacked together in his spare time.

    6. All I can say is that I tried to reproduce anonymous' alleged issue but could not. Since he doesn't provide any helpful information like stack traces, and nobody else has ever observed this alleged flaky behaviour, I can only assume issues specific to his installations.

    7. > In these conditions, it's not a big surprise that BFS didn't yet make its way (even as a "staging" feature) in the official Linux kernel...

      Because people still keep bringing this up: it never will. BFS doesn't scale too well on systems with dozens, let alone hundreds, of cores, and upstream isn't interested in maintaining multiple schedulers (the same reason BFQ isn't mainlined: they want its features merged into CFQ, not placed alongside it).

      This is simply for maintenance and usability reasons.

  12. Hi I'm trying to run the CK1 patchset on vanilla 3.14.4 Arch distro on a Samsung NP535 laptop (AMD Family 15h/Piledriver/BDVER2), but although the initramfs works just fine, I experience a race condition and lockup at login to TTY1.

    I've both compiled the kernel locally and used the precompiled packages in Graysky's repo-ck repo, using both generic and CPU specific kernel configurations and also tried the 3.13.11 kernel. I posted more details on the Arch forum here.

    Any ideas or what further info would be required?

    Replies
    1. OK, problem turns out to be a conflict with my wireless kernel module, specifically ath9k which after blacklisting, everything CK/BFS/BFQ flavored runs just fine (but I'm unwirelessed, so not so fine)

  13. Hi, I used my notebook for the last two days with 3.14.4 and 0447 and noticed a huge regression. The visible impact is that the Intel wifi driver crashes after being active for a while, and error messages flood dmesg. I rolled back to 3.14.3 with 0447 and it became a little better: it lasts longer but still crashes. Then I rolled back to 3.14.2 with *0446* (which I ported in the 0446 thread); the system became stable and the Intel wifi driver held up well enough to let me finish 2 TV plays.
    I checked the change list from 3.14.2 to 3.14.4: from 3.14.2 to 3.14.3 I changed from my 0446 BFS port to 0447, and from 3.14.3 to 3.14.4 there are modifications to the Intel driver files. So I suspect that 0447 may have some issue and that the 3.14.4 Intel driver changes make it worse.
    For further verification, I will rebase my *0446* port onto 3.14.3 and 3.14.4 and test it tonight; I will post back.

    Replies
    1. PS, another machine I used with 3.14.4 and 0447 is running fine.

    2. @Alfred, I wonder if 3.14.5 will play better: some timer tick patches in the stable-queue actually are.

      Everything runs smoothly on my old Apple mini Core 2 Duo. Many thanks to the whole BFS support team!

      Greetings from suddenly hot summer Hamburg,
      Ralph Ulrich

    3. Hi, here are last night's test results.

      #1 Kernel 3.14.2 with my 0446 BFS port: the Intel wifi driver crashed while I was compiling a new kernel. This has been identified as issue https://bugzilla.redhat.com/show_bug.cgi?id=1046495 , which is fixed in the 3.14.4 mainline kernel.

      #2 Kernel 3.14.4 with my 0446 BFS port, 4 test runs: reboot the machine, activate wifi and play TV shows for > 30 mins. All passed.

      So if you want to help test and see whether it solves your BFS issue, please check my 0446 BFS port for 3.14.4 at https://bitbucket.org/alfredchen/linux-gc/commits/d463c14ca74aa93049c7135bbc6bfa7ef7201cfe/raw/ . PS: it's not a patch on top of BFS 0447, it is a replacement patch.

      Or, to keep it simple, you can download the kernel source code from https://bitbucket.org/alfredchen/linux-gc/get/linux-3.14.y-gc-test.tar.bz2 and test with your kernel config.

      I will look into the delta between my 0446 port and 0447, and try to make a patch on top of BFS 0447.

    4. Thanks Alfred, keep us informed of what you find.

    5. I checked the delta over the weekend: there is just a very minor difference in the scheduler get/set attribute syscall functions, and I don't think it is likely to cause hang/crash issues.

      So I recompiled 3.14.4 with BFS 0447 and tested again on my notebook. It turns out that the system is stable, as expected. My best guess is that last week I must have installed a wrong kernel image version to the boot partition as 3.14.4; I have overwritten it during testing and can't check which version it actually was.

      In sum, my issue was caused by an Intel iwlwifi driver bug that has been fixed in 3.14.4 mainline. It is *NOT BFS RELATED*, but it seems to have been worse in 3.14.3 than in 3.14.2, which made me guess it was related to BFS because I upgraded BFS from 0446 to 0447 at that time too.

  14. I forgot that things could be this speedy. Good work.

  15. The 3.14-ck1 kernel crashes all programs running in Wine, as well as any task that needs more processor time, such as compiling programs or drivers, video conversion, and 3D rendering. The kernel without BFS usually works. With the pf kernel the symptom is the same: it crashes with Wine or similar tasks.

  16. Here is my attempt to port BFS to 3.15 kernel: https://gist.github.com/921853bb3e926e3fe5d1

    git tree: https://github.com/pfactum/pf-kernel/commits/pf-3.15

    Boots OK in QEMU, will test on real hardware a little bit later.

    Replies
    1. Ah crap here we go again :P

    2. Feel free to make my crappy port less crappy :D.

    3. LOL I wasn't complaining about your port, just that I have to resync again.

    4. Don't forget about my small extra patches, please.

    5. Please point them out so I can consciously forget them again.

    6. One option is to keep the patch in sync with linux-next

  17. I guess the only patch left unmerged is uniprocessor-related fix here: https://gist.githubusercontent.com/pfactum/9332896/raw/0001-ck-3.12-fix-BFS-compiling-with-CONFIG_SMP-n.patch

    It seems that you've already merged other patches into 447.

  18. My port of .447 to 3.15 is done. There are the "usual scheduler improvements" in the core.c file; I also synced these changes into bfs.c.

    There are 3 additional patches I wrote recently, based on .447:

    #1 [BFS] Add missing attr.sched_flags for sched_getattr.
    This is the minor delta I found when debugging my issue with .447 on 3.14.

    https://bitbucket.org/alfredchen/linux-gc/commits/7c64a1257978efef73271a368ae3a69f4f6a1c51

    #2 [BFS] Refine locality to ranking and siblings/cache idle code.
    One thing I am trying in order to improve BFS.
    https://bitbucket.org/alfredchen/linux-gc/commits/d38a9fd9f97b953ab8a00bc574a9dd23c303300d

    #3 [BFS] locality doesn't need to be kmalloc.
    Another thing.
    https://bitbucket.org/alfredchen/linux-gc/commits/acf66e57edd858a11742a7305b51aaa2b0e9b61c

    The ports and patches have been tested on my work machine; no noticeable regression was found while compiling kernels. Suspend/resume works.

    But there is a huge delay before the system goes to suspend for the first time (by changes?), and I am still trying to figure out whether or not it is a HW setup/kernel version related issue.

    Replies
    1. Could you please post full patch?

    2. OK, got it from your git tree.

      I'd suggest merging two extra patches: 1) the SMP=n fix (mentioned above) 2) this one: https://github.com/pfactum/pf-kernel/commit/74cdef988172bd09abe664323a0890cf417eabba

    3. My git tree is: https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-3.15.y-gc

      You can pick up the commits you are interested in.

      @post-factum
      1) The patch is merged. 2) Thanks for pointing it out. I can't find a raw format of your commit on GitHub, so I simply changed the code and committed it in my git.

      PS, Happy World Cup time! :)

    4. @Alfred Chen:
      I really like your repositories, as they conveniently provide several more useful and sensible extra patches in one place; this already goes for 3.14.y-gc. My assumption is that it is mostly your own patches (meaning not the well-known CK/BFS & BFQ) that stabilize my system, especially when dealing with hibernation/resume.

      Thank you very much for your work !!!
      Manuel

    5. @Manuel
      Thanks for checking my git. Like the pf-kernel, I just pull some well-known non-mainlined patches (BFS, BFQ, PHC, compiler options) into my git, and I have written a few small patches to improve boot-up time for my machines and, recently, to try to improve BFS. I am not sure whether my own patches contribute to stabilizing your system, as I don't have a hibernation setup. It may also come from linux-stable kernel tree updates; I keep my -gc branch synced with the linux-stable tree every 1 or 2 minor releases before the next stable kernel release.

      Anyway, I am glad to know it helps for you. Thank you.

  19. Also, here is a proposed fix for the ARM platform from Ivan Shapovalov:

    https://github.com/pfactum/pf-kernel/commit/80f5dbe7c76aef3c7cb05c381c803eee8024f6b9

  20. An update regarding ath9k issue: http://lists.tuxonice.net/pipermail/tuxonice-devel/2014-June/007501.html

  21. Another small fix from me to avoid a compile error with CONFIG_DEBUG_ATOMIC_SLEEP=y

    https://github.com/pfactum/pf-kernel/commit/b5e2f75f42061a12ab074b5ed87c322c553ad9c1

  22. One more update about deadlocks:

    http://lists.tuxonice.net/pipermail/tuxonice-devel/2014-June/007502.html

  23. @ ac + pf
    thx for your work

  24. It would be great to have BFS, fixes and patches contributed/suggested by AC and PF in a single patchset. Btw, I'd like to hear Con's thought on AC's patches.

    Replies
    1. You can have a look at post-factums kernel here: https://pf.natalenko.name/
      and see a summary of what's included. There's also a link to his related forum with announcements of new releases, reports, etc. To this -pf patch, you can add the small number of specialized patches from Alfred Chen's repo.

      As Alfred Chen's repository clearly shows the entries (that can be reviewed and downloaded separately) and he omits UKSM, that does break my hibernate somehow, I like his approach more. (I'm also unable to save separate commits from post-factums github.)

      That CK/BFS, BFQ, and TuxOnIce, too, are essentially useful for a responsive Linux system, should already be known.

      Best regards, Manuel

    2. Thanks for your reply, though I'm more interested in a BFS-only patchset, with the addition of those fixes Con often forgets to include :-) and AC's improvements.

    3. I think my bfs-v447b-for-3.15.patch has the latest and best for BFS on 3.15, the finds by post-factum and AC's improvements, all in one patch

      https://github.com/apollinaris/random-patches

    4. Thanks, for providing this one.
      But, please, can you transparently name _all_ particular patches _in_more_detail_ that went into your compilation?! Perhaps even on the github front page?
      I won't patch my kernel unless I know the source is trustworthy and this compilation patch may eventually conflict with already applied ones.
      Manuel

    5. OK Manuel, go check Apollinaris again, better readme, patches broken out in a subdirectory
      Tony

    6. @Tony /apollinaris:
      Thank you very much for your additional work! Now it's fine and more clear to all of us what you mean!
      Manuel

  25. Can I extract the ck1 patch from AC's github so I can apply it to the vanilla upstream 3.15?

  26. Sorry, currently travelling...

  27. BTW, don't forget to also check the new BFQ release v7r5:

    [ANNOUNCE] BFQ-v7r5 for 3.13.0-3.15.0: https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/VT96u5pbDLo
    [ANNOUNCE] BFQ-v7r5 for 3.0.0-3.12.0, plus 3.10.8+: https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/n_CqETwVl9w

    Patches per kernel as usual in:
    http://algo.ing.unimo.it/people/paolo/disk_sched/patches/?C=M;O=D

    Have fun, Manuel

  28. I noticed that most of the time tasks take a little while before they run; anyway, it seems faster for desktop tasks than the standard scheduler. Patched kernel

  29. It's too sad that I'm not able to track down why Firefox's CPU usage triples within approx. 10h of its uptime. Of course, I can eliminate even more open tabs or disable addons like ABP. But doing this would falsify the result.

    Maybe, this has nothing to do with ck/BFS at all,
    Manuel Krause
