Tuesday 31 July 2012

BFS and -ck delays for linux-3.5.0

Once again I find myself writing a post to say there will be delays with the resync of BFS and -ck for the new Linux kernel. This time the reason is a development most people would find quite unexpected. As you may have read on this blog last year, I was invited to interview with Google for a job as a software engineer, and in the end I was turned down for lack of adequate breadth of knowledge. This was probably for the best anyway, since I have a full-time unrelated career and the jump would have been too great. Anyway, a small company noticed the work I had done on cgminer with bitcoin and OpenCL, and asked if I was interested in writing some software for them. The work involves writing OpenCL frameworks so they can provide distributed computing capability to clients. They were quite happy to forgo the regular interview details, or pretty much anything that is normally involved in employing someone, and before long we started talking contracts instead. Since the work itself actually looked like a lot of fun, I decided to go with the opportunity.

Anyway, long story short, I'm doing a little bit of contract work for them and my kernel work will take a slightly lower priority in the meantime. I'm not abandoning it, but it will be delayed some more before the next release. Apologies for any inconvenience this may cause in the interim.

34 comments:

  1. Well, as long as the "Massive Power Regression" (tm) in 3.5 doesn't get fixed I don't need it anyway...

    1. FYI: http://lists.freedesktop.org/archives/dri-devel/2012-July/025628.html

  2. have you seen this? I just followed the thread.

    http://lists.freedesktop.org/archives/dri-devel/2012-July/025718.html

    1. 6.5W recovered? Looks like they've found the culprit.

  3. Yes. There's no point in switching to 3.5 before this known bug is fixed.

    And have fun in your contract work, ck.

  4. Thanks for the post, CK. Was starting to worry about you ;) We're all here to test when you're ready.

    @Alfred - I disagree with your assertion that CK should wait. Even though 3.5.0 has a regression, it is likely to be fixed quickly in 3.5.1 or the like. Better to get the drop on the 3.5.x tree sooner than later.

    1. And the bug has been found now anyway.

  5. Thanks for the update, CK. Have fun on the contract work. :)

  6. no hurries, no worries.

  7. no risk, no fun.

  8. My simple port of BFS 424 to Linux 3.5 and 3.5.1. No guarantees.

    http://www.file-upload.net/download-4655945/3.5-sched-bfs-424.patch.html

  9. Just my simple port of BFS 424 to Linux 3.5 and 3.5.1. No guarantees.

    http://www.file-upload.net/download-4655945/3.5-sched-bfs-424.patch.html

    1. Have you tried this patch yourself? This should not work. The error will be the same as applying patch-3.4-ck3 to kernel 3.4.6 or above. (I am not sure. I have not tried)

    2. Also, after applying your patch, there is the following error when compiling:
      kernel/sched/bfs.c: In function 'sd_init_ALLNODES':
      kernel/sched/bfs.c:6233:1: error: 'SD_ALLNODES_INIT' undeclared (first use in this function)
      kernel/sched/bfs.c:6233:1: note: each undeclared identifier is reported only once for each function it appears in
      kernel/sched/bfs.c: In function 'sd_init_NODE':
      kernel/sched/bfs.c:6234:1: error: 'SD_NODE_INIT' undeclared (first use in this function)

    3. I'm running this patch for a few hours on top of Linux 3.5.1 without any problems, but contrary to you I have CONFIG_NUMA unset since it is not needed on my machine.

    4. Please make a diff before doing the porting work. This is very important. The ALLNODES scheduler domain option has been deprecated in mainline, so I have fixed this in the RIFS releases.

  10. Second try, this time with CONFIG_NUMA=y hopefully correctly taken into account. Compiled with CONFIG_NUMA=y, CONFIG_NUMA_EMU=y and booted on a non-NUMA machine. No problems here so far. And no guarantees for you.

    http://www.file-upload.net/download-4659064/3.5-sched-bfs-424-sid-2.patch.html

    1. Can it compile with CONFIG_NO_HZ? CONFIG_NO_HZ cannot be enabled since kernel 3.4.6. Thanks.

    2. At least it works here.

    3. Thank you very much! It really works!

    4. Thank you very much for testing and reporting back. Nice to hear that it works for you.

    5. Thank you very much for testing and reporting back. Nice to hear that it works for you.

    6. It works for me as well (with CONFIG_NO_HZ). Thanks _sid_!

    7. This patch breaks compilation with BFS disabled.

    8. @post-factum

      If you do not enable BFS, why do you use this patch?

    9. @Kelvin

      No excuse: if there's an option to choose whether to enable BFS, the kernel must compile either way.

    10. @post-factum:
      There's an area for "staging", whatever that is supposed to mean: those drivers are more unstable than BFS ever was.

      -ck has never been included in the kernel, but it has kept its stability over releases!!

      Private kernel patching is at your own risk. See below.
      Using unauthorized patches for kernels that Con hasn't checked would not help either.

      Maunel

    11. @post-factum

      Thank you for letting me know. Here is an incremental patch that fixes this issue.

      http://www.file-upload.net/download-4666029/3.5-sched-bfs-424-_sid_-2-3.patch.html

    12. @Maunel

      That's no reason to seethe; I've just posted a bug report.

      @_sid_

      Thanks, will give it a try.

  11. Con, are you completely distracted by contract work, now?

    I had been running a kernel patched to 3.4.8 for 25h when it hard-locked again.
    openSUSE 3.4.6 + BFS single + mm-drop_swap_cache_aggressively.patch + BFQ + inc. patches.
    3.4.7 went well for many many many days.

    I want to highly encourage you to improve your patch!

    Manuel Krause

  12. @ Ralph Ulrich,
    your recently posted settings for BFS-kernels have up- and downsides.

    CONFIG_HZ_1000 works better for 1 CPU in favour of interactivity than _300.

    The following seems to be really only cosmetic:
    CONFIG_RCU_BOOST_PRIO=14
    (default=?)
    CONFIG_RCU_BOOST_DELAY=440
    (default=500)
    in my opinion.

    MPlayer lags in Sound vs. Video. If I revert, it's vice versa.
    These take up Con's considerations about interactivity versus throughput (AGP, PCI, etc.)

    Best regards,
    Manuel Krause


    You proposed @20120720:
    # CONFIG_NO_HZ is not set
    ...
    CONFIG_HZ_300=y
    # CONFIG_HZ_1000 is not set
    CONFIG_HZ=300
    ...
    # RCU Subsystem
    #
    CONFIG_TREE_PREEMPT_RCU=y
    CONFIG_PREEMPT_RCU=y
    CONFIG_RCU_FANOUT=64
    # CONFIG_RCU_FANOUT_EXACT is not set
    # CONFIG_TREE_RCU_TRACE is not set
    CONFIG_RCU_BOOST=y
    CONFIG_RCU_BOOST_PRIO=14
    CONFIG_RCU_BOOST_DELAY=440

  13. But CONFIG_HZ_300 gets way better scores in worldcommunitygrid, however.

    ;-) Just to mention...

    Manuel

  14. Hi Manuel,
    yes I have tested with coreduo. Only one cpu might not profit from my settings!

    CONFIG_RCU_BOOST_PRIO > 1 (default) is needed to not get time overflows with tools like htop.

    Manuel, to your previous observations, I also get this:
    linux-3.4.8 is buggy. linux-3.4.7 was much better with BFS!

    Ralph Ulrich

  15. I used your BFS patch for kernel 3.5 and it works perfectly; it is very good indeed. I also added the urw locks, UKSM and BLD patches, and BFQ, compiled the kernel, and it runs very fast. I'm using my Athlon 64 LE with 1 GB of memory, under KDE 4.9 on Debian. My congratulations.
