Friday, 29 July 2016

BFS 472, linux-4.7-ck1

Announcing an updated BFS for linux-4.7 based kernels.

BFS by itself:
4.7-sched-bfs-472.patch

-ck branded linux-4.7-ck1 patches:
linux-4.7-ck1

This was quite a substantial merge effort this time around, as a fair number of changes in the mainline kernel affected the patch. Nonetheless, everything appears to be working as planned in my limited testing. I'm unsure whether the changes will fix the suspend problems people had with the 4.6-bfs patches, but the new code does touch that area. None of my machines were ever affected, so I was unable to reproduce the problem in the first place.

In addition to the resync, a few minor changes to the way tasks preempt other tasks have made their way into this release. See bfs470-updates.patch for details.

Another fairly significant change was properly hooking into the new schedutil parameters that drive the cpufreq scaling governors. What I committed in bfs470 would not have chosen the correct CPU frequency to run at, and may have led to slowdowns and/or higher power usage. This should be fixed in 472.

I should also mention that if, like me, you use the evil proprietary nvidia driver, the latest version will not build against the current kernel, and you'll need a couple of patches to get it working.

Enjoy!
-ck

EDIT: This patch fixes crashes when the kernel is configured without SMT_NICE enabled:
bfs472-fix_set_task_cpu.patch
It will be applied to the next BFS release.

17 comments:

  1. Hi CK,

    I think we need the following patch to avoid passing an unexpected cpu value to set_task_cpu() when CONFIG_SMT_NICE is not defined.

    diff --git a/kernel/sched/bfs.c b/kernel/sched/bfs.c
    index 249cd0d..8a1fd2d 100644
    --- a/kernel/sched/bfs.c
    +++ b/kernel/sched/bfs.c
    @@ -1405,8 +1405,8 @@ static void try_preempt(struct task_struct *p, struct rq *this_rq)
             }

             if (likely(highest_prio_rq)) {
    -#ifdef CONFIG_SMT_NICE
                     cpu = cpu_of(highest_prio_rq);
    +#ifdef CONFIG_SMT_NICE
                     if (!smt_should_schedule(p, cpu))
                             return;
     #endif

    BR Alfred

    1. Thanks Alfred, well spotted. That looks correct.

    2. Wouldn't this be more efficient instead?

      --- linux-4.7.orig/kernel/sched/bfs.c
      +++ linux-4.7/kernel/sched/bfs.c
      @@ -1418,6 +1418,9 @@ static void try_preempt(struct task_stru
                        * a different CPU set. This means waking tasks are
                        * treated differently to rescheduling tasks.
                        */
      +#ifndef CONFIG_SMT_NICE
      +                cpu = cpu_of(highest_prio_rq);
      +#endif
                       set_task_cpu(p, cpu);
                       resched_curr(highest_prio_rq);
               }

  2. That fixed a kernel panic for me, thanks Alfred.

  3. Still can't put my system to sleep.

    1. Can you tell which 4.6 minor version patch introduced that problem? If we can isolate the patch, I might be able to do something, but none of my systems are affected.

  4. Hi Con,

    Thanks for your work.
    Do your changes in 472 mean that we can now use the new 'schedutil' cpufreq governor, or should we stay with 'ondemand'? Which one would you suggest?

    Thanks and Regards.
    sysitos

    1. Schedutil should work fine.

    2. My system freezes completely if I switch to schedutil on 4.7-ck1.
      No issues with the vanilla 4.7 kernel.

    3. Eek. Looks like my PCs are all using the intel-pstate driver, which is why I didn't notice. Oh well, back to the drawing board.

    4. If you need any further info or help debugging, let me know. I'm using acpi-cpufreq on a Nehalem i5 CPU, if that matters.

      Oh, and thanks again for the BFS update. Keep up the good work as always :)

  5. Hi CK,

    How well does the current patch scale? Should I use it on a 32-core workstation that may be under heavy load? Most discussion on scalability seems to have happened before 2013, and I cannot find any recent clarification on this.

    1. Except for certain applications, BFS should scale quite well on a 32-core workstation, especially for applications that require low latency (e.g. networking-based workloads). If scalability is your prime concern, turning interactivity off improves scalability further on BFS:
      echo 0 > /proc/sys/kernel/interactive
      Try it for yourself; I doubt the less scalable areas will be a problem, but it depends on your workload.

  6. Hi, there are sometimes a bunch of deadlock bugs, so here is a nice overview of newer tracing tools that help with that: http://danluu.com/perf-tracing/

  7. There's a bug when CONFIG_SMT_NICE isn't enabled. See the patches above for fixes, or enable it in your config.

  8. Con, any idea about this?

    https://twitter.com/poige/status/766632930961612800/photo/1

    (SMT patch applied)

    1. Looks like it's trying to do some scheduling before everything's set up. There have been a lot of CPU online/offline changes in mainline. I'll look into it when time permits.
