Sunday 2 August 2015

BFS 463, linux-4.1-ck1

Finally, a resync to linux-4.1. Sorry, I was just too preoccupied to get around to doing this. I haven't directly addressed a few known problems that have workarounds, and this release comes with a warning.

BFS by itself:

4.1-sched-bfs-463.patch

-ck-branded linux-4.1-ck1 patches:

4.1-ck1 patches

The usual collection of resyncs and minor updates, including pending fixes since 462.

This includes a fix for some uniprocessor build problems, courtesy of Serge Belyshev. If you still have boot problems with uniprocessor builds, the workaround is to build an SMP kernel.

I've finally bitten the bullet and removed the block flush code from within the main schedule() call, in keeping with how mainline does it. Every time I've removed this code on previous kernels, the same problem has recurred and I've had to re-add it: complete hangs under particularly heavy IO. Please report back if these hangs return with this kernel, hence the warning.
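To illustrate why a missing flush can show up as a complete hang, here is a toy user-space model. It is a minimal sketch under stated assumptions, not BFS or mainline code: a task batches ("plugs") IO requests in a private list, and those requests only reach the device when the plug is flushed. If the task goes to sleep waiting on its own IO while requests are still sitting in its plug, nothing ever completes them. The names `ToyTask`, `queue_io`, `flush_plug` and `sleep_on_io` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical toy model of a task that batches ("plugs") IO requests."""
    plugged: list = field(default_factory=list)    # requests held back in the plug
    submitted: list = field(default_factory=list)  # requests handed to the "device"

    def queue_io(self, req):
        # New requests are only batched; they do not reach the device yet.
        self.plugged.append(req)

    def flush_plug(self):
        # Hand every plugged request to the device and empty the plug.
        self.submitted.extend(self.plugged)
        self.plugged.clear()


def sleep_on_io(task, flush_before_sleep):
    """Toy 'schedule()' for a task about to sleep waiting on its own IO.

    Returns True if the wait can complete, or False if it would hang forever
    because some of the requests it waits on never left the plug.
    """
    if flush_before_sleep and task.plugged:
        task.flush_plug()
    return not task.plugged


a, b = ToyTask(), ToyTask()
for i in range(3):
    a.queue_io(i)
    b.queue_io(i)
print(sleep_on_io(a, True))   # True: plug flushed, IO can complete
print(sleep_on_io(b, False))  # False: IO stuck in the plug, task never wakes
```

In this model the only thing separating a normal wakeup from a permanent sleep is whether something flushes the plug before the task blocks, which is why where the flush lives in the scheduler path matters so much.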

On the previous kernel, some people had crashes unless they enabled NUMA. I have no idea what caused these and have made no specific changes to address them. I don't want people to enable NUMA unnecessarily, but if you have crashes this is the first thing to try, and please report back.

Enjoy!

22 comments:

  1. Links to patches should be 4.0/4.1/ instead of 4.1/4.1. Thanks for your work.

  2. http://ck.kolivas.org/patches/bfs/4.1/4.1/4.1-sched-bfs-463.patch

    How about this one...

  3. Thanks for the update. It's been running without any issues for 24h or so.

    Also, my system feels much smoother (esp. during heavy IO) compared to 4.0-ck1, although I didn't test whether it's the same with the vanilla kernel.

    Replies
    1. Looks like I commented too soon.

      Apparently btrfs scrub freezes my system with -ck patched kernel, just as it did with bfs460/3.18-ck1.

      I couldn't get kernel logs from the crash unfortunately, but I'll try to look into it when I have time.

    2. I think I may have this issue. I was scrubbing btrfs via SSH command, and when I came back, my laptop had hard locked, not even the screen would turn back on from sleep.

    3. There's a pattern here and it's almost certainly related to the warning I gave in the announce post. Here's a test patch that returns the code to the old behaviour which should fix this problem (and possibly others):
      bfs463-revert-unplugged.patch

    4. I scrubbed 3 times in a row without a lockup, so I guess this patch is still needed.

    5. Oops, forgot to mention, this is with the patch you linked included.

    6. Thanks for the quick feedback. Once again, it looks like a pretty significant bugfix. It seems this is the only way to flush unplugged IO in BFS. Let's see if it has any impact on the other issues out there as well, but I'll include this patch in the next BFS sometime in the near future.

    7. Indeed, it looks like revert-unplugged.patch fixes the issue.
      Thanks for the quick fix.

  4. Hello, and sorry for my bad English.
    I have something weird happening with linux-ck k10 that I don't get with the standard kernel. I'm using Arch Linux and repo-ck. It isn't specific to the latest releases, because I had this problem more than a year ago.

    After some uptime (more than 6-8 hours), every site in any browser takes 10-15 seconds to load, yet there is no problem with the connection: no packets are lost when I ping a URL, and curl and torrent clients download normally. At first I thought it was the r8169 module for my Realtek RTL8168, so I switched to r8168-ck, but that didn't help. So maybe it's BFS and my hardware?
    Please tell me what additional information you need to help fix this problem (logs etc.) so I can post it somewhere. Has anyone else ever had the same problem?

    Replies
    1. One person has reported this problem with r8169 (it might have been you), but there is no code that touches that driver. What I suspect is happening is that there's a race condition in the driver itself that is brought out by using BFS and is far less likely to happen on the mainline kernel. This means that unless you can reproduce it on the mainline kernel and report the bug, I unfortunately have no way to help you with it.

    2. Maybe it's not even the network (this module)? As I said, the connection is fine; only the browser behaves strangely. And this never happens with the mainline kernel, so I don't think I can reproduce it there. Anyway, wouldn't logs from the next time it happens help you identify the problem?

    3. Interesting. I have this too. Is there no way this is just -ck related?

    4. I didn't say there's no way... The real answer is I don't know. Perhaps get the output of 'top' running while it's happening to see if it's a problem of wasted CPU cycles or just sitting idle. Try your browser without plugins, try BFS without SMT nice.

    5. OK. I'll get top output next time this happens. But how can I actually disable SMT nice?

    6. Apparently a hyperthreading tweak (Intel), but I'm running AMD Phenom:

      - http://www.phoronix.com/scan.php?page=news_item&px=MTc2NDQ
      - http://ck-hack.blogspot.fi/2014/08/smthyperthreading-nice-and-scheduling.html

    7. I'm running AMD too. So SMT is off?

      Here is the 'top' output from when I try to open a site with this problem:
      https://ptpb.pw/aoXP.txt and https://ptpb.pw/ORRr.txt

  5. Hi Con,

    I'm still having crashes (complete freezes) with BFS under heavy IO (old AMD CPU).
    Copying some data (OK, it was a large amount) from a USB drive to a SATA drive freezes the whole system; only a hard reset works. There was nothing in the systemd journal (maybe because I stripped down the debugging options in my kernel config).
    My second test, on another computer (an i7 laptop), led to a crash during a USB stick test with f3write/f3read.

    Sorry that I have to report this.

    Regards sysitos

    Replies
    1. Try the patch that ck posted in the comments above.
