Wednesday, 7 September 2016

BFS 490, linux-4.7-ck3

Announcing yet another substantial update for BFS for linux-4.7 based kernels.

BFS by itself:
4.7-sched-bfs-490.patch

-ck branded linux-4.7-ck3 patches:
linux-4.7-ck3

Following on from the large update to BFS in 480 to skip lists, numerous regressions became apparent, the bulk of which were related to doing a poor job of signalling CPU load to the various cpufreq governors. Some were affected badly, others less so, but plenty of helpful people gave feedback about those regressions, which encouraged me to slowly but surely chip away at the problems. Additionally, there were some minor behavioural regressions which were oversights during the updates for BFS 480. Finally, the rudimentary cgroup stub patch would crash the system.

As the number of patches required to address these issues grew, it became hard for people on this blog to keep up with the changes, so I've released 490, which should hopefully address the bulk of these issues. There are patches in there that haven't been posted on this blog, but I've included all of them with a brief description in the incremental/ directory for your perusal.

Anyway it is much easier for people to grab the latest version which includes all of those changes, including the updated cgroups stub patch.

EDIT: Here's a patch to make cgroup stubs safer: cgroup-stubs-safe2.patch

Enjoy!
Please enjoy!
-ck

65 comments:

  1. Thanks Con, you are hacking on BFS faster than I can benchmark it!
    So I skipped your last test patches for BFS 480 and went straight to 490.

    I've put my results in a google spreadsheet as they are becoming quite big. You can find it here:
    https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

    BFS 490 has improved a lot over BFS 480! Next I'll test it with interactive=0.

    In the meantime I've run linux 4.4+bfs with interactive=1.
    I've also updated the results for linux 4.7+cfs. The results I posted previously were for the stock Arch Linux kernel (4.7.2-1), whereas the kernel running BFS has several config options changed (NUMA disabled, CONFIG_MCORE2 enabled, DEBUG_KERNEL disabled, FRAME_POINTER disabled, and others...), but that should barely make any difference.
    So now the 'cfs 4.7' kernel has the exact same configuration as the 'bfs' kernels, and the comparison is all the more fair.

    Pedro

  2. Thanks very much for doing those Pedro, it looks much more respectable now. As always my mini-hack to get the massively changed cpufreq code working was hopeless and it's only working better now that I did a more comprehensive patch for it.

    Replies
    1. Thank you for your work.
      I've finished testing BFS 490 with interactive=0. Even if it's not the goal of BFS, throughput is indeed better with interactive=0 for single-threaded workloads.
      As for the difference in responsiveness, I don't know how to test it. I'll wait for other users' input.

      Pedro

  3. I have news: I tested the kernel that was just put in the repo and had only one freeze. I am not sure yet whether it fixes my old problem with ck2 and BFS 480. On the other hand, I get better temperatures with this kernel than with the official kernel or an older -ck kernel. Thank you very much for your work. I will test the new kernel more to see whether the problem is fixed, because when I went back to the older kernel the freezes went away.

  4. ===
    kernel/sched/bfs.o: warning: objtool: __schedule()+0x5f1: duplicate frame pointer save
    ===

    I believe that is OK, but just want to let you know.

  5. Con, also, please take a look at this panic:

    https://gist.github.com/8c65b2c01f7182eb578dbd9b2ef8ffd3

    It occurs after doing poweroff in qemu, and I believe it is related to CPU cgroups support.

    Replies
    1. Thanks pf!

      I'm not sure on the first and this is the second time it's been posted (presumably only shows up on gcc6+), but the second definitely is cgroup related. Can you get a backtrace for both of those?

      gdb vmlinux
      list *__schedule()+0x5f1

      and
      gdb vmlinux
      list *sched_offline_group+0x2a

      Thanks!

    2. Sure, but I need to recompile the kernel locally instead of building it in OBS. I will re-check this in several hours.

    3. Re-compiled the kernel with debug info. I am trying to do what you asked for, but I get this:

      ===
      (gdb) list *__schedule()+0xa1d
      You can't do that without a process to debug.
      ===

      What am I doing wrong?

    4. Also, here is the relevant panic from the debug kernel:

      https://gist.github.com/b89d670535b160b7648d1cd5b16fadf0

      And info obtained from addr2line:

      https://gist.github.com/7e7d152dcdde40470257bf58bfdf37e1

      Hope this helps.

    5. Oh, I managed it. Please see the "list" output:

      ===
      (gdb) list *(__schedule+0xa1d)
      0xba1d is in __schedule (kernel/sched/bfs.c:2237).
      2232 * do an early lockdep release here:
      2233 */
      2234 spin_release(&grq.lock.dep_map, 1, _THIS_IP_);
      2235
      2236 /* Here we just switch the register state and the stack. */
      2237 switch_to(prev, next, prev);
      2238 barrier();
      2239
      2240 return finish_task_switch(prev);
      2241 }
      ===

      Also:

      ===
      (gdb) list *(sched_offline_group+0x2a)
      0x977a is in sched_offline_group (include/linux/list.h:89).
      84 * This is only for internal list manipulation where we know
      85 * the prev/next entries already!
      86 */
      87 static inline void __list_del(struct list_head * prev, struct list_head * next)
      88 {
      89 next->prev = prev;
      90 WRITE_ONCE(prev->next, next);
      91 }
      92
      93 /**
      ===

      Let me know if you need additional info.

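The exchange above turns on how a `symbol+0xoffset` entry from a warning or call trace is handed to gdb. As a minimal sketch (the helper names `parse_frame` and `gdb_list_expr` are made up for illustration and are not part of any tool mentioned in the thread), the following Python splits such an entry into its parts and builds the parenthesised `list *(symbol+0xoffset)` form that worked in the session above:

```python
import re

# Matches entries such as "smpboot_thread_fn+0x173/0x230" or
# "? sort_range+0x30/0x30" (symbol + offset / function size).
FRAME_RE = re.compile(
    r"(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_frame(line):
    """Return (symbol, offset, size, reliable) for a trace line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    question_mark, symbol, offset, size = m.groups()
    # A leading "?" marks an entry the kernel could not verify.
    return symbol, int(offset, 16), int(size, 16), question_mark is None


def gdb_list_expr(symbol, offset):
    """Build the parenthesised expression used successfully in the thread."""
    return "list *({}+{:#x})".format(symbol, offset)


if __name__ == "__main__":
    trace = [
        "[] smpboot_thread_fn+0x173/0x230",
        "[] ? sort_range+0x30/0x30",
        "smpboot: CPU 1 is now offline",
    ]
    for line in trace:
        frame = parse_frame(line)
        if frame is not None:
            symbol, offset, _size, reliable = frame
            note = "" if reliable else "  (unverified entry)"
            print(gdb_list_expr(symbol, offset) + note)
```

The printed lines can be pasted into a `gdb vmlinux` session built with debug info. The script only formats text; it does not resolve addresses itself.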
    6. GCC6 seems to miscompile sometimes, for example firefox crashes with it too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=836533

    7. I have a very similar stack trace while compiling from graysky's AUR package at 9c78234 (4.7.3-3). I'll try to build the kernel without cgroups and see how that goes.

    8. By the way, the panic happens with both GCC 6.2.1 and 5.4.0.

    9. Thanks pf. The duplicate frame pointer save warning is okay. I've posted a safe cgroups patch in the top post to address the other crash (generically.)

    10. Con, the second patch, the one that removes the code, seems to fix the panic; I cannot trigger it anymore.

      I hope that is a reasonable solution.

    11. Great thanks. Hopefully that fixes all the crashes people were experiencing so I can move on and do more fun development :)

  6. No luck, it's just weird and I can't do anything. That was the fourth freeze, this time without games or a browser, just watching videos. I have gone back to the official kernel again.

    Replies
    1. Did you enable the cgroup stub feature?

    2. I installed Arch and left the default options, except for dirty bytes and the I/O scheduler for SSD and HDD. The other thing is that I use the Haswell linux-ck kernel from graysky's repo. I am not sure whether it is enabled, sorry if I am not much help; you can see the options used on the linux-ck AUR page.

    3. The other thing is that linux-ck1 doesn't give me any freezes, while ck2 and ck3 do. The problem started with the change to BFS 480.

    4. Thanks, probably still a bug somewhere in the new core code then. I'll keep looking but hopefully someone will capture a crash/backtrace for me to know where the problem is.

    5. I am also experiencing these freezes, around 4-5 times now, while playing games or watching videos. I am using linux-ck-haswell 4.7.3.

    6. Same repo. Still enables the experimental cgroups feature which I know now is unstable. I've asked graysky to disable them but he's currently busy.

    7. I asked him too, and others have fewer errors when cgroups is disabled. I will test the new kernel when he puts it in the repo and post here whether the error is gone. For now I am using BFS 472 with the older kernel on the Intel PC. The laptops don't have these freezes, and I don't know why they differ with the same Arch install and the same packages; the difference is the CPU and its type.

    8. On the AUR page graysky has changed the PKGBUILD so that the cgroups patches are disabled; it is only a matter of time until the repo gets the new version. I will test in a few hours or tomorrow.

  7. Con, with the new BFS 490 after every suspend / resume this appears in the logs:

    CPU: 1 PID: 16 Comm: migration/1 Tainted: G OE 4.7.3-bfs-skp #1
    Hardware name: Dell Inc. XPS L521X/0880F2, BIOS A16 12/17/2013
    0000000000000286 00000000a55cfa0b ffff88044caf7e48 ffffffff813e79f3
    0000000000000001 ffffffff81cc5b28 ffff88044caf7e78 ffffffff81406895
    ffff88044cae9800 ffff88044e801280 ffffffff81e59a20 0000000000000001
    Call Trace:
    [] dump_stack+0x65/0x92
    [] check_preemption_disabled+0xe5/0xf0
    [] debug_smp_processor_id+0x17/0x20
    [] smpboot_thread_fn+0x173/0x230
    [] ? sort_range+0x30/0x30
    [] kthread+0xd8/0xf0
    [] ret_from_fork+0x1f/0x40
    [] ? kthread_worker_fn+0x180/0x180
    smpboot: CPU 1 is now offline

    I checked the logs, and even 4.7.2 + 480 gave me similar dumps.

    CPU: 0 PID: 9 Comm: migration/0 Tainted: G OE 4.7.2-bfs-skp #1
    Hardware name: Dell Inc. XPS L521X/0880F2, BIOS A16 12/17/2013
    0000000000000086 000000003cd7157e ffff88044c997d88 ffffffff813d8bf3
    0000000000000000 0000000000000000 ffff88044c997dc8 ffffffff8108178b
    0000007d4c997e50 0000000000000001 ffff88045f217dd0 0000000000017d00
    Call Trace:
    [] dump_stack+0x63/0x90
    [] __warn+0xcb/0xf0
    [] warn_slowpath_null+0x1d/0x20
    [] native_smp_send_reschedule+0x3e/0x40
    [] wake_smt_siblings+0x70/0x80
    [] __schedule+0xa01/0xcd0
    [] schedule+0x35/0xc0
    [] smpboot_thread_fn+0xc0/0x160
    [] ? sort_range+0x30/0x30
    [] kthread+0xd8/0xf0
    [] ret_from_fork+0x1f/0x40
    [] ? kthread_create_on_node+0x1a0/0x1a0
    ---[ end trace 639864e7b4173949 ]---
    smpboot: CPU 1 is now offline

    As the system was not affected in any visible way, I didn't even know the errors were there.

    With BFS 472 there were no such errors.

    br, Eduardo

    Replies
    1. Additionally, this gets printed to dmesg before the trace:

      [ 3005.418397] Removed affinity for 631 processes to cpu 1
      [ 3005.418400] BUG: using smp_processor_id() in preemptible [00000000] code: migration/1/16
      [ 3005.418405] caller is debug_smp_processor_id+0x17/0x20

      br, Eduardo

    2. The "using smp_processor_id() in preemptible code" BUG during the suspend/resume cycle is a long-standing bug in mainline. I don't know why no fix has been released; maybe it is not triggered with CFS. Here is my fix for it, which I wrote last month when I couldn't stand it any longer. It's in my new -test branch but not yet released.

      commit 5894464e9238b794a03713046d50e5972d526fa2
      Author: Alfred Chen
      Date: Tue Aug 30 11:11:36 2016 +0800

      bfs: Fix mainline smp_processor_id() called in preempt code issue

      diff --git a/kernel/smpboot.c b/kernel/smpboot.c
      index 13bc43d..fc0d8270 100644
      --- a/kernel/smpboot.c
      +++ b/kernel/smpboot.c
      @@ -122,12 +122,12 @@ static int smpboot_thread_fn(void *data)

               if (kthread_should_park()) {
                   __set_current_state(TASK_RUNNING);
      -            preempt_enable();
                   if (ht->park && td->status == HP_THREAD_ACTIVE) {
                       BUG_ON(td->cpu != smp_processor_id());
                       ht->park(td->cpu);
                       td->status = HP_THREAD_PARKED;
                   }
      +            preempt_enable();
                   kthread_parkme();
                   /* We might have been woken for stop */
                   continue;

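The effect of moving `preempt_enable()` in the patch above can be pictured with a toy model. This is only an illustrative sketch in Python, not kernel code: the `ToyTask` class and both `park_*` functions are hypothetical, and the model assumes preemption was disabled earlier in the loop (which the `preempt_enable()` call in the diff implies). The check mirrors the idea behind the debug warning quoted earlier: the CPU id is only stable while preemption is disabled.

```python
class ToyTask:
    """Toy stand-in for a task's preemption state (hypothetical, not kernel code)."""

    def __init__(self):
        self.preempt_count = 0
        self.warnings = []

    def preempt_disable(self):
        self.preempt_count += 1

    def preempt_enable(self):
        self.preempt_count -= 1

    def smp_processor_id(self):
        # Record a warning when called while preemption is enabled,
        # loosely imitating the debug check seen in the dmesg output.
        if self.preempt_count == 0:
            self.warnings.append(
                "BUG: using smp_processor_id() in preemptible code"
            )
        return 0


def park_original(task):
    """Ordering before the patch: preemption re-enabled before the check."""
    task.preempt_disable()
    task.preempt_enable()
    task.smp_processor_id()
    return task.warnings


def park_patched(task):
    """Ordering after the patch: check runs while preemption is still disabled."""
    task.preempt_disable()
    task.smp_processor_id()
    task.preempt_enable()
    return task.warnings


if __name__ == "__main__":
    print("original:", park_original(ToyTask()))
    print("patched: ", park_patched(ToyTask()))
```

In the model, only the original ordering records a warning, which matches the behaviour reported in the thread.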
    3. Alfred, I'll cherry-pick this one for -pf :). Thanks!

    4. Thx Alfred, I'll incorporate this into my build as well.

    5. You're spot on there Alfred. I think even if you don't trigger it on mainline the code so clearly violates the preempt disabled requirement in such a short space that you could just generically submit a patch for it to mainline anyway.

    6. This is a good example of how BFS can trigger issues that the mainline scheduler may not expose.
      I will submit the patch, or someone please help to submit it. For now, the skip list in BFS is a lot more interesting than this :)

    7. @Alfred:
      If the issue the patch addresses is so long-standing and so obvious, as Con states, you should simply post it to LKML, maybe with links to the relevant supporting posts.
      It's not satisfying to know of a BUG that people may encounter but need not.

      BR, Manuel Krause

    8. @Alfred. I can submit it if you like.

    9. @ck
      Thanks, I got your submit email.

  8. CK, I ran some tests with and without the pstate driver. With both of them (pstate and cpufreq) the kernel keeps freezing. I don't know if it is a problem with memory (I have 8GB) or with swapping (I have 50 in vm for swap), but the problem is there, while with the older kernel no freezes occur. I think I will go back to the older version to see whether it is a new problem only for me or a new config on my PC. I will report back. Sorry for not being more helpful.

    Replies
    1. Nothing: with the older kernel there are no freezes of any kind. I have been writing and gaming for 30 minutes with the same settings and the kernel build from graysky's repo.

    2. Hey Alberto. I checked graysky's repos and they enable the cgroups feature which is still unstable so perhaps that's where your problem is coming from with this latest kernel (though other bugs may also be present that I don't know about.)

    3. I was thinking about this and I have just realised that I had C-states enabled in the BIOS. It is the only thing that differs from my other 2 laptops, one with an older Intel Core 2 Duo and the other with AMD. Could that be the problem? With the change in BFS behaviour, maybe cores idling in the C7 or C6 state don't wake up; this is one of the things that might cause the freeze. I am not sure the problem is cgroups, because my other 2 laptops use the same graysky repo, one with the Piledriver kernel and the other with the Core 2 Duo version, and neither has had any freeze. Thanks again. I'll post on the AUR page that the cgroups patch is still unstable, wait for another build to test, and let you know the result.

    4. If you are talking about CONFIG_CGROUP_SCHED, then I have it, but I don't experience any freezes or crashes on my Dell XPS 15 (i7 CPU).
      My kernel is 500Hz, BFS 490 patch, generic 64-bit compilation, low latency desktop preemption.

      Br, Eduardo

    5. @ck You are saying that cgroups support might be the reason for all these bugs, but I am not sure this is really the reason, as it works fine with BFS 472.

    6. I'm not saying it's the reason for ALL the bugs but it's a KNOWN reason. It can't have worked fine with bfs472 because there was no way to enable CONFIG_CGROUP_SCHED in bfs472.

  9. Have you enabled C-states in the BIOS? I have not tried disabling them because I downgraded the kernel before I thought of that.

    Replies
    1. The c states won't be responsible. There's a real bug there somewhere in the BFS patch.

  10. I did some throughput tests of BFS vs CFS, but BFS is about minimizing latencies. I remember that a while back someone posted about this tool:
    https://github.com/iovisor/bcc/blob/master/tools/runqlat_example.txt
    So I gave it a try on two linux 4.7.2 kernels with the same config. One has CFS, the other has BFS 490. The BFS kernel is compiled with SMT_NICE and CGROUP_SCHED disabled.
    I wrote a basic script that loads the CPU (i5-3210m) by building ffmpeg with make -j4, waits a few seconds, and then runs 'runqlat 10 2'. The build takes place in /dev/shm to prevent disk I/O from interfering with the results.

    The raw results are:
    for cfs: http://pastebin.com/gS8FfnmY
    for bfs490+interactive=1: http://pastebin.com/PsFwvVFn
    for bfs490+interactive=0: http://pastebin.com/zhqV0Kpe

    One can do all kinds of maths and graphs with this (mean, std dev, median, ...), but before I do, I must first dust off my maths and then find out whether these results and the test are of any relevance (I think they are, but I know little about task scheduling and CPUs).
    Con and other users, what do you think of it?

    Thanks
    Pedro

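For the summary statistics mentioned above (mean, median), here is a minimal sketch of how a runqlat-style histogram could be reduced to numbers. It assumes the usual `low -> high : count` bucket layout of the tool's text output; the sample data is invented for illustration and is not taken from the pastebin results, and the function names are hypothetical. Because the buckets are power-of-two ranges, the mean is only an approximation based on bucket midpoints.

```python
import re

# One histogram row looks like: "   4 -> 7   : 40   |*****   |"
BUCKET_RE = re.compile(r"^\s*(\d+)\s*->\s*(\d+)\s*:\s*(\d+)")


def parse_histogram(text):
    """Return a list of (low_usecs, high_usecs, count) tuples."""
    buckets = []
    for line in text.splitlines():
        m = BUCKET_RE.match(line)
        if m:
            low, high, count = (int(g) for g in m.groups())
            buckets.append((low, high, count))
    return buckets


def approx_mean(buckets):
    """Approximate mean latency using each bucket's midpoint."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return 0.0
    weighted = sum(((low + high) / 2) * count for low, high, count in buckets)
    return weighted / total


def median_bucket(buckets):
    """Return the (low, high) bucket that contains the median sample."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return None
    seen = 0
    for low, high, count in buckets:
        seen += count
        if 2 * seen >= total:
            return (low, high)
    return None


# Invented sample data, for illustration only.
SAMPLE = """
     usecs               : count     distribution
         0 -> 1          : 10       |**********                              |
         2 -> 3          : 30       |******************************          |
         4 -> 7          : 40       |****************************************|
         8 -> 15         : 20       |********************                    |
"""

if __name__ == "__main__":
    hist = parse_histogram(SAMPLE)
    print("approx mean (usecs):", approx_mean(hist))
    print("median bucket (usecs):", median_bucket(hist))
```

Reporting the median as a bucket rather than a single value avoids implying more precision than the histogram holds.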
    Replies
    1. Ok I've put the data in the google spreadsheet:
      https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

      It's a bit messy though.

      Pedro

    2. Thanks. Those qlat graphs are mildly interesting, but they're measuring microlatencies which would be unnoticeable by humans. If any of them went beyond the 6 millisecond range they'd start being noticeable. It's good to see the values bound for both schedulers. The results might be more interesting as load is increased progressively further with higher make -j values. Additionally recent BFS patches have not been optimal due to issues with cpu frequency signalling so either try an older BFS, say 469 on linux 4.5, or the latest BFS 490, or disable cpufreq scaling by setting a performance governor while doing tests. Thanks!

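The 6 millisecond figure mentioned above suggests a simple check on a latency histogram: how many samples fall clearly below the threshold, how many clearly above, and how many sit in a bucket that straddles it. The sketch below assumes buckets are given as `(low_usecs, high_usecs, count)` tuples; the function name and the sample numbers are invented for illustration.

```python
def split_by_threshold(buckets, threshold_us=6000):
    """Count samples below, straddling, and above a latency threshold.

    A bucket straddles the threshold when the threshold lies inside its
    range, so its samples cannot be assigned to either side with certainty.
    """
    below = straddling = above = 0
    for low, high, count in buckets:
        if high < threshold_us:
            below += count
        elif low >= threshold_us:
            above += count
        else:
            straddling += count
    return below, straddling, above


if __name__ == "__main__":
    # Invented power-of-two buckets in microseconds, for illustration only.
    sample = [
        (1024, 2047, 50),
        (2048, 4095, 30),
        (4096, 8191, 15),
        (8192, 16383, 5),
    ]
    below, straddling, above = split_by_threshold(sample)
    print("below 6 ms:", below)
    print("straddling 6 ms:", straddling)
    print("above 6 ms:", above)
```

With power-of-two buckets the 4096 to 8191 microsecond bucket always straddles 6 ms, so that count is best read as an upper bound on additional noticeable samples.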
  11. Bad news: I have just tested the new version from graysky's repo and the kernel still freezes, so cgroups wasn't the problem. I looked at journalctl to see if there was a problem, and there isn't any error in the log before the freeze.

    Replies
    1. @Alberto, @Con,

      I have freezes as well with 4.7.3 + BFS490 (CGROUPS enabled), but I can trigger them 100% of the time (at least it seems so) by changing the laptop brightness (press fn+brightness up/down and here we go); it locks up without a blinking caps lock.
      BUT, I can trigger that in Ubuntu 16.10 and NOT in 16.04. Of course a LOT has changed in 16.10 compared to 16.04, so it's difficult to say (for me at least) what the reason is, but in 16.04 I haven't had any lockups so far.

      I compile the kernel in 16.04 with GCC 5.4 (the default that comes with 16.04), then install it on 16.04 and 16.10. Alfred has Arch, so it most likely has the newest stable packages for everything; 16.10 is in development, so it has rather new stuff as well. That's the only thing in common.

      What can I do to help You to track down the problem?
      Btw, Con, which distro/version do You use for development/testing?

      br, Eduardo

    2. Did you add the cgroups safe2 patch?

    3. It is supposed to be disabled by graysky in the PKGBUILD. The freeze is total: I can't do anything, REISUB does not work, and I can't control the brightness because the machine with the problems is a desktop. I am now testing whether the freeze only occurs with the game or can happen with other things (the other kernels with the BFS 480 and 490 patches froze many times, but I have not had time to try this with this kernel). I think the freeze will occur with browsing as well as gaming. I will post the results as soon as I am able to. On the Arch forum there are many people with Haswell desktops and this problem, but Haswell laptops appear not to freeze.

    4. Thanks alberto. In that case it seems to be somehow haswell related. Was bfs472 stable for you? If so I'll have to post a couple of different patchsets to see where the culprit lies.

    5. Yes, the BFS 472 patch was very stable for me, no freezes there. Another thing: I was testing -ck again just now, and the freezes still happen with browsing too. Also, people with other architectures (Silvermont, for example) have turned up who get these freezes with rsync. I get the freezes with gaming, browsing and playing videos. I don't know why.

    6. https://bbs.archlinux.org/viewtopic.php?id=111715&p=113
      (see the latest pages)

    7. Alberto it looks like others are affected withOUT -ck patches. It could just be that the latest BFS makes it happen more easily and that it's a bug from mainline. I've looked hard at all the code I put in and can't see anything wrong so far. Give -ck4 a try when it comes out (shortly.)

    8. @Con,

      safe2 seems ok. I didn't notice safe2 was available, sorry.

      br, Eduardo

    9. No problem. It's hard to keep up when I'm hacking this aggressively... ck4 is about to be released.

    10. A small number of people are affected on mainline, which is surely related to other factors. I have freezes ONLY with the -ck kernel. I have been working for 3 or 4 hours without any freeze with BFS 472 and with the official kernel; since BFS 480 I cannot work for more than 1 hour because the system freezes completely. I am sure the problem is related to the patched kernel because it always occurs with that kernel. I will test the new patch and tell you whether the problem continues. The other thing is that on my other 2 laptops, the Intel Core 2 Duo and the Piledriver CPU, there are no freezes, only on my Haswell desktop.

  12. Either the log stops being written before the freeze, or the freeze is not logged in any form.

  13. Ok I've run the tests with cfs+acpi-cpufreq+performance and bfs490+acpi-cpufreq+performance, which lock the cpu frequency at maximum.
    Here are the results for increasing -j values.
    for bfs: http://pastebin.com/MSKHPYa0
    for cfs: http://pastebin.com/Bdjx6Dnp

    The latencies are higher and the distribution clearly becomes bimodal. Maybe it is because of kernel processes that run at higher priority.
    The upper bound is higher with CFS than with BFS at small -j values.
    I've also put this data in the spreadsheet.

    With the runqlat utility it is possible to monitor the runqueue latencies for a specific PID. I was thinking of loading the system with a make, and then monitoring a specific process like a movie player or web browser. Would this be interesting?

    Pedro

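Since these runs depend on every CPU being pinned to the performance governor, it can help to confirm that before each measurement. The following is a small sketch that reads the standard cpufreq sysfs files (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function names are hypothetical, and the `sysfs_root` parameter exists only so the logic can be tried against a test directory.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return a dict mapping 'cpuN' to its current scaling governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <root>/cpuN/cpufreq/scaling_governor
        cpu = path.split(os.sep)[-3]
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(
        governor == "performance" for governor in governors.values()
    )
```

A benchmark script could call `all_performance(read_governors())` and refuse to start when it returns False. On systems without cpufreq support the dictionary is empty, which the check treats as "not confirmed" rather than as a pass.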
  14. Hey Pedro those latency figures are VERY interesting because this is now showing how BFS is able to keep the latencies bound to under human perception rates while those on CFS start to blow out when the load is only 50% higher than your number of CPUs. I don't think you need to test specific processes as you're already getting the results you need. Out of curiosity was the sched autogroups feature enabled on CFS?

    Replies
    1. Thanks for the explanation, and yes SCHED_AUTOGROUP was enabled on cfs.

      Pedro

    2. That is probably why CFS looks ok when the load gets ridiculously high. It's not a fair comparison with that enabled.
