Tuesday, 7 May 2013

BFS 0.430, -ck1 for linux-3.9.x

Announcing a resync/update of the BFS and -ck patchsets for linux-3.9

Full ck patch:
http://ck.kolivas.org/patches/3.0/3.9/3.9-ck1/

BFS only patch:
http://ck.kolivas.org/patches/bfs/3.0/3.9/3.9-sched-bfs-430.patch

The full set of incremental patches is here:
http://ck.kolivas.org/patches/bfs/3.0/3.9/Incremental/


The changes to BFS include a resync from BFS 0.428, updated to work with changes from the latest mainline kernel, and numerous CPU accounting improvements courtesy of Olivier Langlois (thanks again!).

For those who tried the -ck1 release candidate patch I posted, this patch is unchanged. The only issue that showed up was a mostly cosmetic quirk with not being able to change the CPU accounting type, even though it appears you should be able to. BFS mandates high res IRQ accounting so there is no point trying to change it.

Lately my VPS provider (rapidxen) has been nothing short of appalling with incredible amounts of downtime, packet loss and IP changes without notification. They also repeatedly send me abuse complaints that I have to respond to for my software being (falsely) tagged as viruses. Luckily I have a move planned in the near future - including where and how - when time permits, but if you find my server doesn't respond, apologies.



Enjoy!
Please enjoy!

EDIT: There were some fairly dramatic CPU offline code changes to mainline (YET AGAIN!) and the changes to BFS to make it work were fairly significant so there may once again be issues with power off/reboot/suspend/hibernate. It gets tiresome watching the same code being rehashed in many different ways... "because this time we'll do it right".

42 comments:

  1. Consider moving to Hetzner. Their KVM VPS are cute.

  2. thanks! now waiting for graysky's "official" benchmark before applying the patch :)

    Replies
    1. Passes my tests. Pissed that I cannot post images here, I broke down and started my own blog :p

      Here is the link to the data and analysis.

    2. quick q.

      I see in your tests you are compiling with -j9, isn't the i7-3770 an 8 core cpu and doesn't Con recommend compiling with -j(num of cores) with bfs?

      Also I can't post on your blog as you don't allow anon.

    3. You are a very astute reader. I edited my blog post with the following explanation as well as unlocked it to anonymous posters (here comes the spam).

      "Note for those of you really paying attention: Both benchmarks used `make -j9 ...` even though the 3770k is a quad (4 physical + 4 virtual=8). I am aware that it is recommended NOT to use the x+1 formula for kernels running the BFS but felt that in order to fairly compare both schedulers, this needed to be held constant. That said, I have done some experiments where I varied the make flags (8, 9, 10) and found that there was no statistically significant difference on the BFS."
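
The job-count point in the quoted note can be made concrete with a small sketch. It assumes the advice as summarized in this thread (on BFS, use the number of logical CPUs rather than that number plus one); the helper name `make_command` and its `plus_one` flag are made up for illustration.

```python
import os


def make_command(logical_cpus=None, plus_one=False):
    """Build a `make -jN` argument list.

    logical_cpus counts hardware threads (8 on a 4-core/8-thread 3770K).
    plus_one=True gives the traditional "x+1" job count.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    jobs = logical_cpus + 1 if plus_one else logical_cpus
    return ["make", f"-j{jobs}"]


print(make_command(8))                 # job count matching the logical CPUs
print(make_command(8, plus_one=True))  # the x+1 variant used in the benchmark
```

Called without arguments, `make_command()` picks up the logical CPU count of the machine it runs on.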

  3. The release candidate ran for days without issues on my MacMini.

    Also, I no longer have the time accounting issues I had with previous BFS patch releases. But this could have been a bug of 'make oldconfig': I just learned this method is broken when upgrading major kernel releases!

    A big warning: Use 'make defconfig' instead after a major kernel release! Greetings and many thanks from Hamburg, Germany
    Ralph Ulrich

    Replies
    1. Sounds like delirium. I have kept a self-compiled kernel up to date since 2.6.16 using oldconfig and have had no problems.

  4. "because this time we'll do it right"

    Hehe, so write your own and get it merged :).

  5. Con, graysky has found out that increasing tick rate from 300 Hz to 1 kHz eliminates hanging on reboot. Could that be useful for fixing the issue properly and could you explain why it behaves so… unpredictably?

    Replies
    1. That's hardly a fix if it depends on a certain Hz value being plugged in. I can't think of a reason offhand.

    2. https://bugs.archlinux.org/task/35237

      That is the first bug report to my knowledge that shows this behavior without the BFS patched kernel. Would be really interesting to see if his/her problem goes away by increasing the tick rate...

    3. Graysky that is MOST interesting - assuming he really is running a vanilla kernel. As has been the case before, BFS may simply be more likely to bring out mainline bugs due to its design expressing race conditions more easily.

    4. ck, This is totally the case.

      Remember the problem that I mentioned to you:

      glibc-2.17/posix/tst-waitid.

      With BFS, you get the problem with less than 50 iterations while with vanilla kernel, it is much rarer. It took over 3000 iterations to fail.

  6. Is the breakage in rusage's timers fixed?

  7. Con, another bug report.

    After a hibernate-resume cycle I get strange vmstat output:

    ===
    [pf@spock]:[~][0]% vmstat
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
             r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy id wa
    4294967295  0      0 12262312  51716 1016868    0    0    23    25  269  296  8  1 91  0
    ===

    As you can see, the value 4294967295 is abnormal. The same report came from another user:

    ===
    There is a problem though, now after hibernate-resume vmstat often displays
    4294967295 (0xFFFFFFFF) as the first value (The number of runnable processes),
    in addition to normal values like 0, 1, etc.. So the kernel gives wrong
    process statistics. (I couldn't catch this bogus value in /proc/stat where it
    supposedly comes from, though. Probably because each time I looked at the file
    the process that I used for this was counted as runnable by the kernel, thus
    resetting the number of runnable processes to zero.)

    I definitely don't observe this behaviour before hibernation. And I don't
    remember having this issue at the times of older kernels (3.5? 3.2?) when
    hibernation was usable to me.
    ===

    I observe this with bare -ck. How can I help to get that fixed?
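
For what it's worth, 4294967295 is exactly what -1 looks like when stored in an unsigned 32-bit field, so one guess (not confirmed anywhere in this thread) is that a runnable-task count dropped below zero somewhere, whether in the kernel or in the tool reading it. The sketch below shows the arithmetic and one way to pull procs_running out of /proc/stat contents directly; the function names are illustrative only.

```python
def as_u32(value):
    """Reinterpret an integer as an unsigned 32-bit quantity."""
    return value & 0xFFFFFFFF


def procs_running(stat_text):
    """Return the procs_running value from /proc/stat contents, or None."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "procs_running":
            return int(fields[1])
    return None


print(as_u32(-1))        # 4294967295
print(hex(as_u32(-1)))   # 0xffffffff

# Made-up sample text in the /proc/stat layout, for demonstration only.
sample = "cpu  1 2 3 4\nprocs_running 3\nprocs_blocked 0\n"
print(procs_running(sample))  # 3
```

On a Linux box, passing the contents of /proc/stat to `procs_running` in a loop after resume might help catch the moment the count goes wrong.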

    Replies
    1. Right about now, about the only thing that would help me is another coder since I'm having trouble scheduling in time for BFS due primarily to real life commitments and less so other code distractions. I keep hoping for more time but it seems I only have less time with each kernel release.

    2. Some time ago there was at least one skilled coder by the name Chen (RIFS patch), and also others who posted unofficial bfs ports or patches... Are you still around, guys? Can you lend a hand?

  8. Hello, I am trying to compile kernel 3.9.4 on Ubuntu 12.10 with the ck patchset and BFQ, but I get this error (http://pastebin.com/rJSitgZb). Can someone help me? Thanks

    Replies
    1. This comment has been removed by the author.

    2. Your error is just this one:

      /home/gorneman/Builds/Kernel39/linux-3.9.4/kernel/sched/bfs.c: In function ‘irqtime_account_hi_si’:
      /home/gorneman/Builds/Kernel39/linux-3.9.4/kernel/sched/bfs.c:2333:2: error: implicit declaration of function ‘nsecs_to_cputime64’

  9. @ck:
    I don't like seeing you dropping patches. But when you're on the run...

    The following sub-patches seem to harm performance/interactivity on my old computer:
    mm-kswapd_inherit_prio-1.patch
    mm-idleprio_prio-1.patch

    I'm making heavy use of /dev/shm that gets partially swapped out, as I use it as a ramdisk.
    2GB RAM
    4GB swap (on 2nd disk)
    3GB /dev/shm
    (Read the other setup from my previous postings, please.)

    With your 2 patches mentioned above, I'd get glitches and stuttering in video-playback while using /dev/shm with swapping in parallel.

    Maybe I'm the last unicorn you have to test your patches on an old machine.

    Best regards,

    Manuel

    Replies
    1. Hi Manuel
      I think these special patches target special use cases (battery life etc). I myself don't use them but use my own optimizations:
      RCU boost enabled
      NO_HZ disabled
      300 Hz tick instead of 1000 Hz.

      Greetings from Hamburg, from where in sunshine I can see the flooding bad weather in the east.
      Ralph Ulrich

    2. Hi Ralph,
      nice reading you again!

      What do 300 ticks do better than 1000 (the BFS default)?

      Regarding the RCU boosting, I've had good experiences with the following:
      [CONFIG_TINY_PREEMPT_RCU=y is appropriate for my uniprocessor/-core system, btw.]
      CONFIG_RCU_BOOST=y
      CONFIG_RCU_BOOST_PRIO=99 <-- making it realtime
      CONFIG_RCU_BOOST_DELAY=331 <-- using a prime number to avoid reoccurrences, smaller than default=500

      Greets,

      Manuel

    3. Manuel,
      NO_HZ is meant to save energy: when idle, there are no more periodic IRQ wakeups. But instead I do this:
      - no NO_HZ
      - CONFIG_HZ_1000 down to CONFIG_HZ_300
      This also saves energy when idle, by 70 percent, and without the problems of setting NO_HZ. And, though I'm unsure about this, it may provide better throughput under high load because there are fewer IRQ context switches.

      Isn't a possible intervention 300 times a second enough for me as a user, even when playing games? The unreasonably high 1000 Hz was introduced because of worst-case waiting times of over a second. But BFS as the scheduler prevents this!

      I set CONFIG_RCU_BOOST_PRIO=18, which results in a process rcub/0 with prio -19. I need this lower absolute number than the process irq/21-b43 with prio -51, because I have seen my wifi connection become unreliable otherwise. Manuel, htop shows your system priorities if you press F6 and select the sorting column.

      Ralph Ulrich
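
The numbers in this exchange follow from simple arithmetic, sketched here. The tick part only counts periodic timer interrupts (it says nothing about measured power draw), and the priority part assumes the convention documented in proc(5), where the priority shown for a real-time task is the negated RT priority minus one. The function names are invented for this sketch.

```python
def tick_period_ms(hz):
    """Length of one scheduler tick in milliseconds."""
    return 1000.0 / hz


def tick_reduction(old_hz, new_hz):
    """Fraction of periodic timer interrupts avoided by lowering HZ."""
    return 1.0 - new_hz / old_hz


def displayed_prio(rt_priority):
    """Priority shown for a real-time task with RT priority 1..99."""
    return -1 - rt_priority


print(tick_period_ms(1000))                 # 1.0
print(round(tick_period_ms(300), 2))        # 3.33
print(round(tick_reduction(1000, 300), 2))  # 0.7
print(displayed_prio(18))                   # -19
print(displayed_prio(50))                   # -51
```

Under that convention, CONFIG_RCU_BOOST_PRIO=18 maps to the -19 reported for rcub/0, and a displayed -51 would correspond to an RT priority of 50.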

    4. Ralph,

      thanks for reminding me of CONFIG_HZ_300. I know you proposed this some months or even a year or more ago. Dunno why I abandoned that setting. But what I definitely remember is that I got better scores in WCG (www.worldcommunitygrid.org) with CONFIG_HZ_300.

      I completely don't understand your second paragraph beginning with "Isn't a possible...".

      But currently I'm running a 3.9.4 with my optimizations and your 300 Hz. Let's see.

      BTW: my htop (openSUSE 12.3) seems to have problems that top doesn't show: it displays orphaned processes that aren't really running any more. Maybe htop is dead? Or at least unusable?!

      Manuel

    5. I thought 1000 + nohz was better for responsiveness? Is 300 better for BFS on recent mobile CPUs?

    6. It's the old give-and-take topic: trading interactivity for I/O throughput or vice versa. 300 Hz doesn't make ALL things better on my old system; it stays old. The same holds for the BFQ I/O scheduler. You need to test it for yourself.
      So far we don't even have a 'Responsiveness-Benchmark', which is a real shame!

      Manuel

    7. Responsiveness-Benchmark? Try this one: https://github.com/pfactum/kernelat/

    8. @post-factum:
      Thank you for providing your work to us. Can you please provide a more detailed description of what it does before I use it (on the page and in an in-tree readme), and explain why you call it "silly"?

      Manuel

    9. It's silly because I guess it's better to use the perf subsystem.

      kernelat launches a child process and measures the time from calling system() to entering the child's main(). Launching more children at the same time can give us some responsiveness value.
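
The measurement idea can be sketched roughly as follows. This is not kernelat's code: it is a Python approximation that starts a child interpreter and compares the parent's timestamp taken just before the launch with a timestamp the child prints as its first action, so interpreter startup is included in the figure. `spawn_latency` and `CHILD_CODE` are names made up here.

```python
import subprocess
import sys
import time

# The child reports the wall-clock time at which it starts executing.
CHILD_CODE = "import time; print(repr(time.time()))"


def spawn_latency():
    """Seconds between launching a child and the child's first statement."""
    start = time.time()
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    child_start = float(result.stdout.strip())
    return child_start - start


samples = [spawn_latency() for _ in range(5)]
print(f"min {min(samples) * 1000:.2f} ms, max {max(samples) * 1000:.2f} ms")
```

Running several of these at once, as described above, would show how the latency grows as the machine gets busier.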

  10. Greetings,

    Just to let you know that I tested your BFS patch v0.430 for a couple of weeks, but had to revert to a vanilla kernel because BFS causes mencoder to deadlock when using multiple cores to encode some videos (about 1 in 3 videos)...

    Thierry.

    Replies
    1. @Thierry:
      Have you had any relevant log messages related to this issue? Would be a great help if you posted them. And... are you using additional patches?

      Thanks, Manuel

  11. @Manuel

    No, no log message, just mencoder stopping dead in its encoding and sitting there, using 0% CPU forever. It was always during the second pass of a two-pass encoding (VBR).
    And no, I'm not using any other kernel patch, just the vanilla Linux kernel...

    Thierry.

    Replies
    1. Sounds more like a race condition in the unmaintained mencoder code brought out by BFS rather than a BFS problem itself.

  12. I was reading about some power-efficiency scheduler patches that haven't been merged into mainline yet (at http://lwn.net/Articles/546664/ and http://lwn.net/Articles/552885/)

    I wonder whether BFS needs something like this too, and how it compares with CFS in terms of power usage.

  13. For those of you also using the BFQ I/O scheduler...

    they've brought out a new release: "v6r2 for 2.6.38 - 3.9.0" containing quite a bunch of fixes

    Announcement: https://groups.google.com/forum/?fromgroups=#!topic/bfq-iosched/BcT3HBmQO5M

    Patches: http://algo.ing.unimo.it/people/paolo/disk_sched/patches/3.9.0-v6r2/
    (for older kernels browse the parent directory)

    Best regards, Manuel

  14. Any resyncs against 3.10 tree?

    Replies
    1. Personally I usually wait for 3.x.1 before installing a new kernel unless it fixes some issue I'm having.

  15. I second that:

    would be nice to know, yeah

    CFS is significantly less efficient than BFS, and there are regularly small interruptions in e.g. audio playback, even when heavily tuned ...

    thanks !

  16. Hello,

    I have several problems with the new kernel version (3.9)
    Any help provided is appreciated :)
    https://bbs.archlinux.org/viewtopic.php?pid=1293535

    Replies
    1. Have you also tried with the official 3.9 (not ck)? Do the same errors appear?
