Thursday, 11 December 2014

BFS 460, linux-3.18-ck1

Announcing a resync and update of BFS for linux-3.18

BFS by itself:

3.18-sched-bfs-460.patch

-ck branded linux-3.18-ck1 patches:

3.18-ck1 patches

Uncharacteristically I found time to resync up quickly for this latest stable linux release. There are no new BFS features, but there have been a number of changes to stay in sync with mainline. Apart from keeping up with the usual churn in new releases, of which there was a modest amount this time, a number of other low level changes were committed making this much less of a trivial resync so some caution is warranted before blindly updating.

Hillf Danton pointed out a bug in the yield_to code (thanks!), which is now fixed. Since almost nothing uses this code, you probably won't notice anything. He also pointed out some other now-outdated components in BFS, which have also been updated. The above_background_load function has also been removed, since the VM tweaks in older -ck releases that used it no longer exist.

More substantially, I've reworked the plugged I/O code to match mainline. I had been reluctant to touch it previously because the unlocking and relocking in the scheduler code path introduced deadlocks when the first plugged I/O code made its way into BFS, and it took several iterations of fixes. Watch for any I/O misbehaviour or stalls. There are also some changes to how mainline responds to idle CPUs, so watch for any unusual behaviour there.

Having said that I've been using it for a while and not noticed anything out of the ordinary, but please report back if there are any issues.

Enjoy!
Please enjoy.

67 comments:

  1. Great job with bfs v460! Been ticking away on my WS 6 days now with no issues.

  2. Thanks so far Con, using it with the ZEN Kernel on 3.18.1. No problem here.

    Btw. Merry Christmas and a happy new year.

    Replies
    1. Hi Con,

      I must revise my assessment. I had no problem on my laptop running the ZEN kernel (maybe I just haven't recognised some quirks as problems ;) ). But on my server, running the "same" kernel, I ran into big trouble with the new BFS. Copying data from my external eSATA/USB drive to my XFS RAID5 leads to a reproducible error. After approx. 10 seconds the system stalls for seconds to minutes and becomes unusable, with high load values of >15 in top.
      Some dmesg output:
      INFO: task kworker/1:0:18 blocked for more than 480 seconds.
      [ 960.118057] Not tainted 3.18.2-zen-server+ #16
      [ 960.118060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 960.118065] kworker/1:0 D 0000000000000001 0 18 2 0x00000000
      [ 960.118140] Workqueue: xfs-data/md0 xfs_end_io [xfs]
      [ 960.118145] ffff88022714fc98 0000000000000046 00000000aad73000 000000000000bdf0
      [ 960.118152] 000000000000bde8 ffff8802270c5fe0 ffff880225ef6740 ffff880227139620
      [ 960.118158] ffff880225e46d28 ffff880227139620 ffff8801ee273ca8 ffff8801ee273c90
      [ 960.118164] Call Trace:
      [ 960.118179] [] schedule+0x24/0x60

      and so on.
      A hard reset of the computer was necessary. (And then the resync of the RAID starts from scratch and takes 10 hours :( )
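
For readers who want to check their own logs for the same symptom, here is a minimal sketch that pulls hung-task reports out of saved dmesg text. It assumes only the message format quoted above ("INFO: task NAME:PID blocked for more than N seconds."); the function name `find_hung_tasks` is made up for this illustration.

```python
import re

# Matches the hung-task header quoted above, e.g.
# "INFO: task kworker/1:0:18 blocked for more than 480 seconds."
# The task name may itself contain colons, so the name group is greedy
# and the PID is the last number before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task name, pid, seconds) for each hung-task report."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports
```

Feeding it the saved output of `dmesg` as a string is enough; an empty list means no hung-task reports were found in that text.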

      After some testing:
      Working: Zen-Kernel 3.17.7 with BFS, Vanilla Kernel 3.18.2, Zen-Kernel 3.18.2 with CFS
      Not Working: Zen-Kernel 3.18.2 with BFS
      And surprise: Working Zen-Kernel 3.18.2 with your bfs460-locked-pluggedio.patch

      Regards sysitos

    2. Hi Con,

      another issue with the BFS for 3.18.
      Using BFS on an old CPU (Pentium M) with 32bit kernel 3.18 leads to a kernel panic during bootup (btw. debug preemptible kernel isn't set)

      Working: CFS and 3.18
      Working too: BFS with 3.17

      Regards sysitos

    3. It seems the -ck patch doesn't work on single-core, single-thread, 32-bit CPUs.
      I have this problem on a VIA C3 CPU (see the post of 22 Jan 2015).
      Thanks

    4. I found that SMP kernel options have to be enabled to avoid the idle task panic. See 3.19 comments. -jwh

  3. Great job. 3.18.1 with ck1 is rock solid.

  4. Hi Con.

    I think I've encountered a problem with your rework of the plugged I/O. I use btrfs, and while I was running a scrub, which is a pretty I/O-intensive task, I got a kernel oops. I've put the kernel log here: http://pastebin.com/xbvaia9a

    With kernel 3.17.7 everything is smooth. The same with kernel 3.18.1 and the CFS scheduler.

    Thanks
    Grzegorz

    Replies
    1. I just tested btrfs scrub on my system after reading this, and it froze my system with 3.18.1 and BFS after a few seconds. I didn't test with vanilla 3.18.1 kernel yet, but I'm sure I was able to run scrub on 3.17.* kernel with BFS without any issues.

      Other than that I haven't had any issues with BFS on 3.18.1 kernel after 5 days of running it on my laptop.

      Thanks again ck for your work.

    2. Thanks for that. I'll try and get a patch that backs out the plugged I/O changes out soon for you to try.

    3. Thanks Con! I ran btrfs scrub twice after applying your patch and it finished successfully. No deadlocks.

      Grzegorz

    4. the patch seems to solve the issue. at least I was able to run scrub on my btrfs partitions without freezes this time :)

      thanks

  5. Hi CK,
    the latest Ubuntu LTS now uses the 3.13 kernel.
    Can you release BFS with the SMT nice patch for 3.13?

    Thanks, I believe you should.

  6. @ post-factum: Any further news upon TuxOnIce, other than the publicly available perhaps?

    @ all: I wish a Happy and Successful New Year to ALL of you,
    Manuel Krause

    Replies
    1. http://lists.tuxonice.net/pipermail/tuxonice-devel/2014-December/007525.html

    2. Funny you. I asked for news, other than publicly available. Thanks. Manuel

    3. Mmmh. Now there appeared TuxOnIce patches for 3.19-rc6 and 3.18.5, but at least the 3.18.5 version is so unreliable, that I can't recommend it for now. I've only had 3 successful resumes of approx. 15 attempts (also trying different TuxOnIce settings) and the successful resumes only occurred with low memory load, not depending on a changed setting. :-(

      Best regards,
      Manuel

  7. BFS is the CPU scheduler discussed here.

    BFQ is a disk I/O scheduler, that you can find there:
    http://algo.ing.unimo.it/people/paolo/disk_sched/
    https://groups.google.com/forum/?fromgroups=#!forum/bfq-iosched

    Manuel

  8. There seems to be a bug that causes plasma-desktop to fail to start correctly with 3.18.1-ck1.

  9. FYI, I'm getting the following panic message using 3.18.1-ck1 w/ BFS .460:
    Kernel panic - not syncing: Attempted to kill the idle task!
    I have been using ck1 and ck2 with respective BFS' since 3.15.x through 3.17.6, without issue.
    -jwh

    Replies
    1. This is now a known problem when "debug preemptible kernel" is enabled in combination with SMT nice. Disabling the former will fix it.

    2. Bertrand Vieille, 22 January 2015 at 03:49

      I have the same problem:
      The system is a VIA C3 CPU, 32 bits, one core, one thread, EPIA motherboard.
      3.18.3 vanilla works fine; here is my kernel config: http://perso.crans.org/~bebert/ck/config-epia-nock
      3.18.3-ck1 crashes on boot (in the first seconds, just after "Loading Linux", BIOS data check); here is my config:
      http://perso.crans.org/~bebert/ck/config-epia-ck

      I have disabled "debug preemptible kernel" in both kernels.
      I have taken a picture of the screen with crash:
      http://perso.crans.org/~bebert/ck/CAM00199.jpg

      3.14.28-ck1 used to work well.

      I hope it can help...
      Thanks for all your work.

      BV

    3. For completeness w/ others who may be searching...
      ------------
      I found that SMP kernel options have to be enabled to avoid the idle task panic. See 3.19 comments. -jwh

  10. That is in reference to these last two matches for 'PREEMP' in my kernel config (which I've not changed, for your reference, versus 3.17.6, eg)?
    CONFIG_TREE_PREEMPT_RCU=y
    CONFIG_PREEMPT_RCU=y
    # CONFIG_PREEMPT_NONE is not set
    # CONFIG_PREEMPT_VOLUNTARY is not set
    CONFIG_PREEMPT=y
    CONFIG_PREEMPT_COUNT=y
    Or is 3.18 exercising something new in reference to this? While I'm here, much thanks for this kernel and BFS! I'm doing a test build of 3.17.7 right now, or I'd already be trying it out. :-)
    -jwh

    Replies
    1. Whoops, I missed the 'debug' part in my kernel config search. So it's this: http://cateee.net/lkddb/web-lkddb/DEBUG_PREEMPT.html
      ...however, CONFIG_DEBUG_PREEMPT isn't in my kernel config at all (or is that the problem?), and CONFIG_DEBUG_KERNEL isn't set; CONFIG_TRACE_IRQFLAGS_SUPPORT is 'y'.
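
The question of whether an option is set, explicitly unset, or missing from the config can be checked mechanically. The following is a rough sketch, assuming the usual kernel .config text format shown in this thread (`CONFIG_X=y` and `# CONFIG_X is not set`); the helper name `kconfig_status` is invented for this example. An option that is absent from a .config usually has unmet dependencies, so it is effectively disabled.

```python
def kconfig_status(config_text, options):
    """Map each option name to its value ('y', 'm', ...), 'not set', or 'absent'."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Explicitly disabled option, e.g. "# CONFIG_PREEMPT_NONE is not set"
            name = line[len("# "):-len(" is not set")]
            found[name] = "not set"
        elif line.startswith("CONFIG_") and "=" in line:
            # Enabled option or value, e.g. "CONFIG_PREEMPT=y"
            name, value = line.split("=", 1)
            found[name] = value
    # Options never mentioned in the file are reported as 'absent'.
    return {opt: found.get(opt, "absent") for opt in options}
```

For the options discussed here, one would pass the contents of the kernel's .config together with names such as CONFIG_DEBUG_PREEMPT, CONFIG_DEBUG_KERNEL and CONFIG_SMP.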

  11. Thanks Con, I appreciate your work!

  12. Just noticed a kernel panic. My eth went down for a few min then came back up by itself.

    http://pastebin.com/ABk8vmQp

    Replies
    1. This is not a kernel panic. This is WARN_ON().

    2. Guess I should have read more closely.

  13. [3.19]

    Porting to 3.19 at first appeared to be challenging, but in the end - I suppose the solution turned out to be quite elegant ;)

    https://github.com/kernelOfTruth/linux/commits/linux-3.19-BFS-460_3.18-to-3.19

    only a few minor changes were necessary:

    https://github.com/kernelOfTruth/linux/commit/51bad878c7245fe2cd803dcff38f3d8be93b1f73

    https://github.com/kernelOfTruth/linux/commit/36171598da5d82f6cc2f33870876cbe686908a52



    Not sure if it's placebo, Alfred Chen's additional patches (https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-3.18.y-gc) or commenting out of most of the cgroups & cpuset stuff (overhead):

    the desktop feels quite snappy :D

    Enjoy !

    Replies
    1. It doesn't seem to be that easy to port. Following your approach, I got a very unreliable KDE desktop (mouse pointer lagging, Xserver crashing) when having 'a bit more' disk I/O including swap & /dev/shm. (CFS is doing well in a similarly configured setup.)

      I hope Con is aware of this to make a better release patch.

      Best regards,
      Manuel

    2. Hm, did you try the branch with Alfred Chen's patches added on top ?

      perhaps that fixes lots of rough edges ?

      I had rsync backups, portage compilations (e.g. firefox) and others compilations in the background and everything is buttery smooth

      instabilities, lagginess, etc. related to swapping & /dev/shm

      seem to be related to things other than solely the cpu scheduler (BFS)

      it's strange though that it works fine with CFS

      can't really explain that

      also I don't have any additional time

      hope you guys figure this out =)

      Regards

      kernelOfTruth

    3. Blah blah blah:

      what I *actually* wanted to write:

      it's at least working as well as on 3.17.8 with BFS and patches from Alfred Chen

    4. Sorry, for making that noise.
      I had additionally adopted one old patch from 3.18 kernel related to my intel-gfx for the test, that apparently doesn't behave well on 3.19, (not only together with _your_ patches).

      Please, accept my apologies,
      Manuel

    5. @Manuel:

      apologies accepted, no harm done

      Thanks for letting me know =)


      Currently I'm trying to figure out why the box locks up with Alfred Chen's ported BFS and not with my attempted port

      might be due to the fact that I added some additional patches that don't play well with ZFSonLinux and screw it up

      symptoms: it hardlocks as soon as X starts up and any serious work is attempted (e.g. git, chromium startup, etc. etc.)

      if I'm able to post a kernel lockup message or anything related or figure out that it is, indeed related to the port I'll post here

      other than that:

      Con, Alfred and all others involved to make BFS the best cpu scheduler for latency & desktop usage

      You guys rock !

      Thanks a lot !

    6. just found out that my port has one BUG that makes it unusable for me:

      it hardlocks during attempt of a stage4 backup (tar & 7z)

      so - since I'm meanwhile also working on improving Alfred Chen's BFS port/version

      please defer to Alfred's BFS =)

      thanks

  14. @post-factum: I've seen, that you've created 3.18-pf1. Thank you for your work!

    But why have you first imported and then reverted the official TuxOnIce patch for 3.18.x, to then patch with a "remote-tracking" git version of TuxOnIce? Please, can you explain your reasons for that?

    Best regards,
    Manuel Krause

    Replies
    1. Addon: I would also be thankful if you could share your TuxOnIce related kernel and (maybe) userspace settings on here, as you've said, it is working well for you. For me it does not. :-(
      Thank you in advance,
      Manuel

    2. > Please, can you explain your reasons for that?

      Merge conflict.

      > For me it does not. :-(

      Logs?

    3. The sandboxed remote-tracking version exactly matches the one derived from the most recent 3.18.x patch on the TuxOnIce server (http://tuxonice.nigelcunningham.com.au/downloads/all/). I've reordered the 105 individual diffs by hand to prove this. So, ... "Merge conflict." cannot be the complete truth.

      I had tested the 3.18.6 version of TuxOnIce and posted the results on here. When it was successful, I got logs of the success, of course. When it failed, it either failed in seeking to free memory, or saving atomic copy (after saving caches), or simply refused to load the saved image (swap). It simply hung in the middle of nowhere, and logs of these crashes were not available. Maybe I get time to retest with a freshly set-up 3.18.x soon.

      Most probably you've also already seen that TuxOnIce for 3.19 differs greatly from the 3.18 ("backported") version. Currently I'm testing 3.19.0, for now without BFS/CK, and with BFQ and a slightly modified version of the most recent patch from Nigel's server (the original didn't apply cleanly). I need to test more, but it seems to work well, also with my userspace settings (from 3.17.x):
      echo 1 > /sys/power/tuxonice/full_pageset2
      echo 1 > /sys/power/tuxonice/no_flusher_thread
      In kernel config I kept the checksumming pageset2 ON.
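
For anyone who wants to apply the same two userspace settings from a script rather than by hand, a minimal sketch follows. It assumes the sysfs path and file names exactly as quoted above and must run as root on a TuxOnIce-patched kernel; the helper name `apply_settings` and the `base` parameter are hypothetical and only there so the function can be tried against a scratch directory first.

```python
from pathlib import Path

# The two settings quoted in the comment above; values are written as text.
TUXONICE_SETTINGS = {"full_pageset2": "1", "no_flusher_thread": "1"}


def apply_settings(settings, base="/sys/power/tuxonice"):
    """Write each value to base/<name>; return the names actually written."""
    written = []
    for name, value in settings.items():
        target = Path(base) / name
        if not target.exists():
            # This kernel does not offer the setting; skip it rather than
            # create a stray file.
            continue
        target.write_text(value + "\n")
        written.append(name)
    return written
```

Skipping missing files is a deliberate choice here: the set of entries under /sys/power/tuxonice may differ between TuxOnIce versions, so the sketch reports what it wrote instead of failing on the first absent entry.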

      Best regards,
      Manuel

    4. > So, ... "Merge conflict." cannot be the complete truth.

      Should I care?

      > In kernel config I kept the checksumming pageset2 ON.

      In 3.18 I did the same for my config.

  15. I finished porting BFS 0460 to 3.19 last weekend, and a kernel built from the -gc branch has been up for 2+ days. Last night, 2 fixes for issues found in the -vrq branch were ported back to -gc, and the new -gc kernel has been up for 14+ hours so far.

    You can check and try my -gc branch with v3.19-gc tag at https://bitbucket.org/alfredchen/linux-gc/commits/tag/v3.19-gc

    I will write up the detailed changes for 3.19-gc later. Have fun with 3.19.

    BR Alfred

    Replies
    1. thanks a lot !

      much appreciated

    2. Really good work, indeed! Thank you very much.
      This port works very well together with BFQ-v7r7 and most recent TuxOnIce patch with kernel 3.19.0. Need to accumulate a bit more uptime, but so far, I don't face any issues (like reported here on 17 February 2015 at 10:14).
      Also, the fact that hibernation with TuxOnIce works like a charm makes me really happy!

      BR Manuel Krause

    3. Those who'd like to test 3.19-pf1 are welcome to git tree:

      https://github.com/pfactum/pf-kernel/tree/pf-3.19

      BTW, had to fix some BFS issue:

      https://github.com/pfactum/pf-kernel/commit/e6d91d4fe07406722e5ba14f3372d253d39cc15d

      https://github.com/pfactum/pf-kernel/commit/023e3065a8c1817bf8d86f2abc3296dd3bcf40fa

      Alfred? Con?

    4. Thanks for pointing out these 2 issues.
      The first one is confirmed. I don't have a NUMA config, so I didn't notice it was missing.
      As for the second: as a Funtoo user, I live happily without systemd, so how do things stand with CGROUP now? I think I can spend some time looking at it and make them less of a dummy. :)

      And @pf, please consider dropping the following commits you merged; they are kind of hard-coded and specifically work well for Core 2 CPUs:

      1. Add XOR_PREFER_TEMPLATE to xor[v2].
      2. Use prefered raid6 gen function.

    5. Hehehe, but they work well for CORE2 Cpus!!!
      Only.

      Manuel

    6. Just pushed a fix to the -gc branch; please check the commit at https://bitbucket.org/alfredchen/linux-gc/commits/81196b0faa1ec127afd182cad2ac645ce9f3bad8?at=linux-3.19.y-gc

    7. BTW, getting small oops on each boot:

      https://gist.github.com/6eca3bbfd39936689a97

    8. I need to add a "Me, too." Haven't noticed it. I should check the full dmesg more often.
      Manuel

    9. Any news on this, Alfred? At least, this WARNING doesn't result in any failures or irregular system's behaviour.

      @post-factum: Thank you for your engagement to improve TuxOnIce!

      Best regards,
      Manuel

    10. Potential lockup fix:

      http://marc.info/?l=linux-kernel&m=141845621613624&w=2

      looks like it also applies to BFS

    11. superseded:

      https://github.com/kernelOfTruth/linux/commit/d2ec9015d103daa31995a39574090f38c380e9d2

      patch for BFS:

      https://github.com/kernelOfTruth/linux/commit/42f3f232dee65641f40d5b06b9e7cd0f07cd0310

      please check against: https://lkml.org/lkml/2015/1/22/465

      thanks

    12. Sorry for the late reply, I was out of town for CNY last week. There are 2 threads here; one at a time.

      The WARNING is introduced by the newly added WARN_ONCE in set_task_cpu(), as CFS doesn't allow setting a task's cpu while it is blocked. For BFS, calling set_task_cpu() does no harm, and set_cpus_allowed_ptr() calls it for non-running tasks.

      This is a reminder of 2 things:
      1. the gap between CFS and BFS: the TASK_WAKING state seems not to be used in BFS, and on_rq has different meanings in CFS and BFS.
      2. a task's cpu is a useful piece of scheduling info that tells us which cpu the task last ran on, and this info is used to choose the best idle cpu for the task.

      IMO, the call to set_task_cpu() for non-running tasks in set_cpus_allowed_ptr() is unnecessary, as the allowed cpumask has already been set and leaving the last-run cpu info in place causes no harm.
      So my simple fix for this issue is to remove the set_task_cpu() call in set_cpus_allowed_ptr() and leave the WARN_ONCE in set_task_cpu(), keeping in mind that the condition may need to be adjusted for BFS.
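
The reasoning above can be illustrated with a small toy model. This is not kernel code and not the actual BFS patch: the `Task` class, the `log` list standing in for the kernel log, and the exact conditions are simplifying assumptions made only to show why dropping the call avoids the warning while keeping the last-run CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    cpu: int                                   # CPU the task last ran on
    running: bool                              # False models a blocked task
    allowed: set = field(default_factory=set)  # allowed CPU mask


def set_task_cpu(task, cpu, log):
    # Toy version of the new check: warn once when the task is not running.
    if not task.running and not log:
        log.append("WARNING: set_task_cpu() on a non-running task")
    task.cpu = cpu


def set_cpus_allowed_old(task, mask, log):
    # Before the fix: a non-running task is moved onto an allowed CPU,
    # which trips the warning in set_task_cpu().
    task.allowed = set(mask)
    if not task.running and task.cpu not in task.allowed:
        set_task_cpu(task, min(task.allowed), log)


def set_cpus_allowed_new(task, mask, log):
    # After the fix: only the mask changes; task.cpu keeps the last-run CPU,
    # so no warning is logged.
    task.allowed = set(mask)
```

In the model, the "new" variant leaves `task.cpu` pointing at a CPU outside the allowed mask, which matches the argument that this is harmless as long as the mask itself is consulted when the task is next scheduled.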

    13. @kernelOfTruth
      Good info. I'd like to try that patch. I've recently been fighting a very weird boot-up issue related to preempt_schedule() or something like it.

    14. @kernelOfTruth works for me, thanks.

      @Alfred Chen shouldn't we consider that to be a locking issue, as it happens only on SMP boot?

    15. glad to be of help =)

      the commit needed an additional change to let it compile:

      https://github.com/kernelOfTruth/linux/commit/8abcf2fe2511b81b2b59ce28ebdb3388dc11d7e4

      so now there's no __cond_resched anymore in bfs.c

  16. Has anyone tried TuxOnIce with kernel 3.19+bfs-ck or 3.18+bfs-ck on openSUSE 13.2, and does it work?
    I think the kernel panic is related to a bug in kernels 3.18 and 3.19 and not to bfs-ck.

  17. Hi Nir,

    I used the 3.19-pf kernel with TuxOnIce on openSUSE Tumbleweed. It starts fine, but hibernating destroyed the ext4 superblock on my sdd. Thank god (or rather Ted Ts'o ;-) ) it could be restored. I don't know if it was an accident or luck, but I will not try it again.

    Regards Mike.

    Replies
    1. That's no good news! :-(
      I'm using a (self-)modified TuxOnIce with Alfred's 3.19.y-gc patches and BFQ v7r7 for several days now, without any issue. I don't use SSDs.

      Good luck,
      Manuel Krause

    2. on opensuse 13.1

    3. Over the last few days I've run a test series of 27 (so far, and still running) hibernates/resumes with this current kernel to possibly uncover a BUG in kscreenlocker_greet. There have been no issues with the kernel (except with kscreenlocker_greet and the WARNING reported above).

      Manuel

    4. Some years ago, I also tried the pf-kernel and then, after it failed, added the uksm patches manually to the same base setup. They led to memory errors for me in those days. I think you can disable them in the kernel config, also in the pf-kernel. If you leave out uksm, you'd most probably get the same kernel setup as I have.

      Manuel
