Announcing a resync and update of BFS for linux-4.0
BFS by itself:
4.0-sched-bfs-462.patch
-ck branded linux-4.0-ck1 patches:
4.0-ck1 patches
The usual collection of resyncs and minor updates only.
It includes the following changes:
- Minor tweaks to uniprocessor build (though enabling SMP will fix breakage if it still exists).
- Fix for tracing build failure
- SMT nice update to ignore kernel threads
- Decrease log level of locality information to debug
EDIT: Fix for 4.0.2+: bfs462-rtmn-fix.patch
Enjoy!
Did a uniprocessor build on Arch for testing (EeePC 701); still panics immediately. Will next verify that enabling SMP still fixes it.
Thanks for the info, and sorry. No surprise, I guess, since I honestly didn't put much effort into it once I found that a UP build would boot on my SMP machines.
I've verified that enabling SMP fixes the panic, as expected. Things do seem a bit more sluggish...though I've only been booted into it for a few minutes. :-)
I also have not yet tried the patch mentioned by kernelOfTruth in the 3.19 thread: http://ck-hack.blogspot.com/2015/02/bfs-461-linux-319-ck1.html?showComment=1427417073374#c5057988316819204350
Update: I see that linux 4.0 already has that patch applied, so I guess it didn't fix my (PREEMPT && !SMP) kernel panic in cpu_startup_entry, which occurs immediately at boot.
Hello Con Kol
I get a kernel panic and/or freeze when the SDDM login manager starts ksplash. Sometimes it is an Intel DRM coredump, sometimes a network driver coredump, under Linux 4.0 CK. I can't log in at all under the 4.0 ck kernel. I see the mouse and the SDDM login splash animation begin, but then it hangs. Only a hard power-off (holding the power button for 5 seconds) worked.
After powering on and booting the stock kernel, I had to go through journal recovery again.
Something is wrong with the CK patch for 4.0.
I patched the -mainline kernel with the GCC patch and it works fine, without any issues.
For the SSD I used NOOP; for the HDD I used CFQ (in stock) or BFQ (in -ck and -mainline), and I had no issues. The I/O scheduler is switched dynamically by a udev rules script.
I'm using these sources (small improvements in the PKGBUILD and some small hiding patches; they work like a charm with all kernels): https://github.com/FadeMind/archpkgbuilds/tree/master/linux-ck
Note: I booted the 3.19.4-ck kernel and it passed fine.
(graysky hasn't committed the AUR package update to 4.0-ck yet.)
Linux 3.19.4-ck boots fine. Here is dmesg from it: https://pastebin.com/pKitMCcK
Lsmod: https://pastebin.com/zwm3gfHS
inxi -Fxz https://pastebin.com/GQiRMaER
Kind Regards
Tomasz Przybył (FadeMind)
Wow, no idea on that one, sorry.
Tomek, your problem probably isn't related to the CK patch. I use this patch on Linux 4.0 on Arch (with some other patches) with SDDM and it works OK. It's more likely that you are using some Linux 3.19-related files to build it and, maybe most importantly, that your NVIDIA proprietary driver isn't built for Linux 4.0.
Try building the kernel with the CK patch against the configs from Arch's linux 4.0 (in testing) and use it with nouveau, or try building nvidia-ck with the patches for Linux 4.0 (also in testing; there is a thread about it on Arch's BBS, too).
...and one more thing, try this: https://bbs.archlinux.org/viewtopic.php?id=195729
Thanks for the reply.
I took a picture of the kernel panic: https://dl.dropboxusercontent.com/u/7244180/bug/Zdj%C4%99cie0004.jpg
(sorry, bad quality)
RIP skb_dequeue 0x4b/0x00
It looks like the WLAN card (Qualcomm Atheros AR9485 Wireless Network Adapter) doesn't get time to handle its interrupts and the driver just freaks out.
It seems the BFS CPU scheduler has a bad config value and the connection is too fast...
The NVIDIA drivers I'm using are fully compatible with the 4.0 kernel.
Take a look at: https://github.com/sirlucjan/aur
I'm a linux-uksm-ck user and I have an AR9485 too; it works well.
PS: Go to archlike.darmowefora.pl, maybe you can solve your problem there.
@Con Kolivas: Intel users are getting random kernel panics under Linux 4.0 CK1;
please read this topic: https://bbs.archlinux.org/viewtopic.php?pid=1523304#p1523304
I have exactly the same issue as in this picture, which is better quality than my screenshot: https://i.imgur.com/VI78toh.jpg
Regards
Hi, I also got kernel panics using archlinux [3] with linux 4.0 + CK1. I was unable to retrieve any useful message from the system logs, but I was able to take some pictures when the kernel panic happened at boot time [1] [2].
--- system information ---
PC: Samsung NP530U3C-A03IT
IO Scheduler: deadline
CPU: Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz
WiFi card: Intel Corporation Centrino Advanced-N 6235
kernel: linux 4.0 + CK1 with microcode update
[1] http://i.imgur.com/VI78toh.jpg
[2] http://i.imgur.com/DeYitN7.jpg
[3] https://bbs.archlinux.org/viewtopic.php?pid=1523363#p1523363
I too tried the Arch Linux package, on Ivy Bridge.
3.19 with ck worked for me too, but 4.0 is all sorts of broken.
http://imgur.com/6MGZoiG
http://imgur.com/jLDENN9
Here is one full dmesg: http://pastebin.com/raw.php?i=kRcYiez1
I also have intel wifi:
04:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)
Me too: kernel panics and system freezes with linux-ck-sandybridge 4.0-1 on Arch Linux x86_64.
No problems found with linux-ck-atom 4.0-1 on Arch Linux i686.
With linux-ck-haswell (4.0-1) on Arch Linux, I see SATA bus related errors (failed command: WRITE FPDMA QUEUED) and SATA link resets which stall all mounts. The kernel does not panic, though; I can escape with Ctrl+Alt+Del (no SysRq needed).
I have posted the relevant syslog part on the Arch Linux forums: https://bbs.archlinux.org/viewtopic.php?pid=1523519#p1523519.
For those with instability, can you try the following patch on top please: bfs462-remove_unlocked_unplug.patch
Just tested with this patch, and I'm still having issues. Grabbed this from attempting to boot with it: http://pastebin.com/05pQJ3Mt
No wireless involved, just ethernet, for what that's worth.
Thanks for testing it. It was a long shot anyway and your code path doesn't remotely look related. I don't have any further leads on your trace at this stage.
I also tested the patch and it does not solve the issue for me either. However, I was able to get some information from the journal; maybe it will be useful:
http://paste2.org/9cm5XPsm
http://paste2.org/dhxkazXP
http://paste2.org/04F17z3s
http://kr4d.com/rack/uploads/IMG_20150427_205003-A9vVmBG.jpg
A few days ago, I installed a system-monitor tool and noticed that CPU usage is strange with BFS. Core 1 is used much less than the others. Core 2 is used somewhat more, but most of the work is done by cores 3 and 4. As an example, here is the CPU utilization while compressing a large file (left is with ck1, right is with CFS):
http://imgur.com/ORSIGmV
Copying many files from HDD to USB HDD with ck1:
http://imgur.com/fcdxC0o
time tar -cjf archiv.tar.gz manjaro-kde-0.9.0-pre5-x86_64.iso
with ck1:
real 8m50.373s
user 8m31.739s
sys 0m3.263s
with cfs:
real 8m28.860s
user 8m9.377s
sys 0m7.819s
(In my previous post I compressed the file with Ark (KDE).)
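To put the two time results above in proportion, here is a small Python sketch (an editorial illustration, not something posted in the thread) that converts bash time values into seconds and computes how much longer the ck1 run took relative to CFS. The function names are made up for this example, and a single run on one machine is anecdotal at best.

```python
import re


def to_seconds(value: str) -> float:
    """Convert a bash `time` field such as '8m50.373s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_change(ck1: str, cfs: str) -> float:
    """How much longer (positive) or shorter (negative) the ck1 time is than CFS, in percent."""
    base = to_seconds(cfs)
    return (to_seconds(ck1) - base) / base * 100


# Timings copied from the comment above (same tar command, ck1 vs CFS kernel).
CK1 = {"real": "8m50.373s", "user": "8m31.739s", "sys": "0m3.263s"}
CFS = {"real": "8m28.860s", "user": "8m9.377s", "sys": "0m7.819s"}

if __name__ == "__main__":
    for field in ("real", "user", "sys"):
        print(f"{field}: {percent_change(CK1[field], CFS[field]):+.1f}%")
```

With these inputs the ck1 run needs roughly 4 to 5% more real and user time while its sys time is lower; repeated runs would be needed before calling that a regression.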
That is interesting. Are you using nice levels at all anywhere in your environment or are they used automatically at all by your applications in question? Alternatively is there anything that might be setting CPU affinity for your applications?
No, I do not use nice levels.
But now that you mention it, I remember that I played with /sys/block/sd*/queue/rq_affinity a while ago. And ***, it is still set to 2; I forgot to reset the value. I'll test it again with the default value.
Also, I have irqbalance installed; I'll test without it. If that does not help, I will build a new kernel with no patches other than BFS.
thx
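For anyone wanting to check for the same leftover tweak, a minimal Python sketch (a hypothetical helper, not from the thread) that lists block devices whose queue/rq_affinity differs from the kernel default of 1 (2 forces completions onto the exact CPU that submitted the request). The sys_block parameter exists only so the function can be pointed at a test directory instead of the real /sys/block.

```python
from pathlib import Path

# Kernel default for /sys/block/<dev>/queue/rq_affinity; 2 forces completion
# on the submitting CPU, 0 disables the affinity logic.
DEFAULT_RQ_AFFINITY = "1"


def nondefault_rq_affinity(sys_block: str = "/sys/block") -> dict[str, str]:
    """Return {device: value} for every block device whose rq_affinity is not the default."""
    found = {}
    for path in sorted(Path(sys_block).glob("*/queue/rq_affinity")):
        value = path.read_text().strip()
        if value != DEFAULT_RQ_AFFINITY:
            # path is <sys_block>/<dev>/queue/rq_affinity, so the device name is two levels up
            found[path.parent.parent.name] = value
    return found


if __name__ == "__main__":
    for dev, value in nondefault_rq_affinity().items():
        print(f"{dev}: rq_affinity={value} (default is {DEFAULT_RQ_AFFINITY})")
```

Resetting a device is then a matter of writing 1 back to the same file as root.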
Unfortunately, none of it helped.
Any ideas?
Since you've made a lot of changes to BFS, maybe I should retest a previous version?
There actually aren't a lot of changes, just syncing with mainline changes. Definitely try an earlier release if you can to see if it behaves differently.
CK: A growing body of evidence seems to point to disabling NUMA as a cause of the panics under Linux 4.0.x with ck1. I will report back once additional folks have a chance to test. At least 3 users have now reported no panics when NUMA was left enabled. More to come.
Thanks, graysky. While most of us don't actually enable NUMA in our configs in the first place, it might help point to where in the code the fault lies.
Not NUMA related, but here's a small change that is worth trying: bfs462-ist-change.patch
The other test patch has been removed as it's of no use.
Thanks CK. I have incorporated this patch into 4.0.1-5-ck and asked those affected users to test it. Link to discussion thread.
OK. Several users (five as I type this) have reported that when NUMA is disabled and they are running linux-v4.0.1 + CK1 + bfs462-1st-change.patch, the kernel panics are back: discussion thread with details.
Hi Con,
I can confirm that zen-kernel 4.0.1 with BFS (and BFQ) is running fine here (no NUMA).
Running on i5 and i7 without a problem.
Regards sysitos
OK, I must amend my earlier post.
These kernels with BFS only seemed stable: under heavy I/O (including network I/O), all my machines with BFS crash.
I first saw it on my server, where writing data over the network to the RAID5 NFS share leads to a crash, but it also occurs on my desktop machine, more rarely. Enabling or disabling NUMA doesn't help; only disabling BFS works. Tested with zen kernel 4.0.0 to 4.0.3.
So my last working kernel with BFS (on my server) is 3.17.x; the crashes started with 3.18. After your 3.18 patch to resolve them, the problems appeared to be gone (as I already wrote on your blog), or I just didn't stress test enough.
Maybe the problem with the current kernel is located in these old changes from 3.18, but this is only my guess.
PS: I know the 3.17 kernel line is out of support, but at the moment I prefer an old kernel with BFS over a current one with CFS ;)
PPS: BFQ is also enabled in zen, but it doesn't affect the crashes (already tested).
Regards sysitos
Interesting. I was having some crashes recently while experimenting with btrfs commands on an external drive (mainly scrub or btrfs-convert). I was starting to think it could be due to BFS bugs, and you have the exact same problem. Do you know if the rtmn patch fixes this bug?
Or maybe the update-inittask patch? Or NUMA? It would be nice to be using kernel 4.0 with BFS instead of reverting to 3.18.
I tested the different patches mentioned here, and NUMA too, but with no success. At the moment I'm using the current zen kernel without BFS, and it's working too ;) I think BFQ is enough for a server.
In my opinion, the breakage with BFS under heavy I/O started with 3.18.
Regards sysitos
Is anyone else experiencing build failures compiling 4.0.2 and ck1?
...
CC [M] drivers/net/wireless/rtlwifi/rtl8821ae/table.o
LD [M] drivers/net/wireless/rtlwifi/rtl8723be/rtl8723be.o
CC [M] drivers/net/wireless/rtlwifi/rtl8821ae/trx.o
LD [M] drivers/net/wireless/rtlwifi/rtl8821ae/rtl8821ae.o
LD drivers/net/wireless/built-in.o
LD drivers/net/built-in.o
LD drivers/built-in.o
LINK vmlinux
LD vmlinux.o
MODPOST vmlinux.o
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
LD init/built-in.o
arch/x86/built-in.o: In function `pvclock_init_vsyscall':
(.init.text+0x1744e): undefined reference to `register_task_migration_notifier'
Makefile:937: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
Here's a fix for that build failure: bfs462-rtmn-fix.patch
Will be interesting to see if this is somehow related to the crashes too, though I still haven't figured out why people are getting them.
There's also this missing change from the original release: bfs462-update_inittask.patch
Thanks for the patches. The users are reporting no effect with these two patches (i.e. still kernel panics) when NUMA is disabled. Just like before, if we enable NUMA, no one has reported a panic. I don't know if the NUMA status + CK1 is to blame for the panics or if it merely catalyzes them. We stand by to test any other patches you can offer up.
Thanks as always. As I've been trying to say, NUMA is just papering over the issue, as I don't expect anyone should have to enable NUMA for an ordinary kernel to work. However, I don't actually know what the issue is, so enabling NUMA is a decent workaround till I happen to find whatever it is. There is no NUMA-specific code in the latest kernel, so it's sheer coincidence, and so far the circumstantial evidence points to the assembly changes in do_fork for x86 being responsible somehow. What exactly, I don't know, and finding time to go through this with a fine-toothed comb is hard.
Related to SMP? It seems that booting with maxcpus=1 stops the panics.
@graysky
From your thread, I noticed that the -3 test kernel with my -gc patch set but NUMA disabled seems to work, right?
If so, would you please try the -gc patches on top of v4.0.2? If that still holds, please narrow the patch set down to this commit:
https://bitbucket.org/alfredchen/linux-gc/commits/54665090c191462f8dd3c1aaedbeea17bef6edfc?at=v4.0.2-gc
This should be the only difference introduced in the 4.0 release.
@AC - Not quite...
4.0.1-3-ck has NUMA enabled and uses your patches --> no panics
4.0.1-4-ck has NUMA enabled and does not use your patches --> no panics
4.0.1-5-ck has NUMA disabled and uses CK's attempted patches --> panics
This trend was also confirmed in 4.0.2...
4.0.2-1-ck has NUMA disabled and uses CK's attempted patches --> panics
4.0.2-2-ck has NUMA enabled and uses CK's attempted patches --> no panics
So the only common thread I am seeing is NUMA disabled = panics :/
Do you still feel that using that commit + CK1 + NUMA disabled would be a worth-while experiment?
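graysky's reasoning can be checked mechanically. The Python sketch below (an editorial illustration, not from the thread) encodes the five builds listed above and finds which single factor perfectly separates panicking from non-panicking builds. The factor names are made up; whether -3 and -4 carried CK's test patches is not stated, so that is recorded as unknown (None), and the builds described only as using CK's attempted patches are assumed not to include the -gc patches.

```python
# Build outcomes as reported above. None = not stated in the thread.
# Assumption: builds described as using "CK's attempted patches" lack the -gc patches.
BUILDS = {
    "4.0.1-3-ck": {"numa": True, "gc_patches": True, "ck_test_patches": None, "panic": False},
    "4.0.1-4-ck": {"numa": True, "gc_patches": False, "ck_test_patches": None, "panic": False},
    "4.0.1-5-ck": {"numa": False, "gc_patches": False, "ck_test_patches": True, "panic": True},
    "4.0.2-1-ck": {"numa": False, "gc_patches": False, "ck_test_patches": True, "panic": True},
    "4.0.2-2-ck": {"numa": True, "gc_patches": False, "ck_test_patches": True, "panic": False},
}


def perfect_predictors(builds: dict) -> list[str]:
    """Return the factors whose value alone separates panicking builds from working ones."""
    factors = sorted({key for build in builds.values() for key in build if key != "panic"})
    predictors = []
    for factor in factors:
        outcomes: dict[bool, set[bool]] = {}  # factor value -> panic outcomes seen with it
        for build in builds.values():
            value = build[factor]
            if value is None:  # unknown for this build, so it cannot count either way
                continue
            outcomes.setdefault(value, set()).add(build["panic"])
        # Both values must be observed, each with one outcome, and the outcomes must differ.
        if len(outcomes) == 2 and all(len(seen) == 1 for seen in outcomes.values()):
            first, second = outcomes.values()
            if first != second:
                predictors.append(factor)
    return predictors


if __name__ == "__main__":
    print(perfect_predictors(BUILDS))
```

It prints ['numa'], matching graysky's observation. As CK points out further down, a correlation across five builds shows that NUMA masks the bug, not where it lives.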
@graysky
Would you please double-check the PKGBUILDs of -3 and -4? As far as I can see, there are differences in the NUMA config section.
You're correct that there was a minor difference in the code, but in each case the variable "$_NUMAdisable" was undefined, so the corresponding sed lines did not get called.
For -3: note that the variable on line 19 was not defined, and see line 164, where the sed lines that disable NUMA are called only if the variable's length is non-zero. Result: NUMA is enabled.
For -4, I actually commented out line 19 and lines 164-178. Result: NUMA is enabled.
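The PKGBUILD logic described above (config edits applied only when _NUMAdisable is non-empty) can be mirrored in a short Python sketch. This is an editorial illustration: the real PKGBUILD does this in bash with sed, the function name is hypothetical, and only CONFIG_NUMA is touched here, which is a simplification of whatever the actual sed lines change.

```python
import re


def apply_numa_toggle(config: str, numa_disable: str = "") -> str:
    """Mirror the PKGBUILD check: only edit .config when the flag is non-empty.

    Hypothetical simplification: only CONFIG_NUMA is switched off here; the real
    PKGBUILD runs several sed lines that may touch further options.
    """
    if not numa_disable:
        # Same as `[ -n "$_NUMAdisable" ]` being false: undefined or empty leaves NUMA alone.
        return config
    # Kconfig writes a disabled bool option as a '# CONFIG_X is not set' comment line.
    return re.sub(r"^CONFIG_NUMA=y$", "# CONFIG_NUMA is not set", config, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = "CONFIG_SMP=y\nCONFIG_NUMA=y\n"
    print(apply_numa_toggle(sample))       # flag undefined, as in -3 and -4
    print(apply_numa_toggle(sample, "y"))  # flag set
```

Because an undefined variable and an empty one take the same branch, both -3 and -4 end up with NUMA enabled, which is graysky's point.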
Thanks for the explanation. It seems you guys have to live with NUMA, but as CK said, that's not reasonable.
Pending/queued/review upstream changes:
http://marc.info/?l=linux-kernel&m=143101842121867&w=2
[PATCH] sched/preempt: fix cond_resched_lock() and cond_resched_softirq()
Jumped right at me while taking a random look at the mailing list ;)
So what is going on? Is the 4.0.x BFS patch causing kernel issues?
If I knew, I'd fix it.
Hi ck,
Wow, fast answer!
Sorry, I also meant: is the 4.0.x kernel by itself a problem, or is this only because of the BFS patch?
Ahhh, sorry to hear that, mate! No worries, don't let it get ya down; you've always done a great job, and I'm sure you'll get it! :)
Well, I am compiling 4.0.3 on my box at the moment with these patches applied:
4.0-sched-bfs-462.patch
bfs462-rtmn-fix.patch
bfs462-update_inittask.patch
And I don't have NUMA compiled in...
I'll be rebooting with it in just a few minutes so I'll report back anything...
Cheers
Hi ck,
Should I use all 3 patches?
4.0-sched-bfs-462.patch
bfs462-rtmn-fix.patch
bfs462-update_inittask.patch
Also, when are people experiencing the crashes: right at bootup, or while running X?
I have an i7-3610QM and so far I have booted it up 3 times, and I'm typing this in X running it; all running well at this point in time...
I also use this in my Openbox autostart when I log into X:
sudo schedtool -n -20 -I `pidof X`
It's either/or. Either you work perfectly well or crash repeatedly. If it works for you now it won't start crashing.
Hi ck,
OK, looks good here...
Would any of my system specs help you, since it runs well for me?
If you need anything, let me know...
Also, should I be using all 3 patches?
4.0-sched-bfs-462.patch
bfs462-rtmn-fix.patch
bfs462-update_inittask.patch
Cheers
Hey ck, for your reference: on my uniprocessor setup, which happens to be using graysky's Arch AUR package, I tested with NUMA enabled to see whether it also resolves my boot panic the way SMP does, in case they are related. But no...I'm running 4.0.3 now: NUMA off, SMP/HT on.
Hi ck,
I'm back (same mate from yesterday), and today X locked up on me. So it looks like it is giving me some problems...
Hmm, crap, guess I'll stick with 3.19.x until this gets worked out.
Keep up the good work!
I'm now going back to retest 3.19.8 to see whether 4.0.x is really that much more unreliable with TuxOnIce. At least it happens to someone (me).
BR, Manuel Krause
OK, this issue is solved for me now, and it's most probably not related to BFS/CK (though they may trigger it more often).
I was able to get rid of the unreliability of TuxOnIce {resume often hanging @"Doing atomic copy/restore"} by changing .config options related to my graphics driver:
The only working combination is to compile DRM into the kernel and i915 as a module (not both into kernel and not both as modules).
Tested on 4.0.4+BFS, 4.0.4 with -gc branch and 3.19.8 with -gc.
Best regards,
Manuel Krause
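To check a .config for the DRM/i915 combination Manuel reports as the only one that works with TuxOnIce on his machine (DRM built in, i915 as a module), a small Python sketch follows. It is an editorial illustration with hypothetical helper names; it only parses .config text and is not a general recommendation for other hardware.

```python
def parse_kconfig(text: str) -> dict[str, str]:
    """Parse .config lines into {option: value}; '# CONFIG_X is not set' maps to 'n'."""
    suffix = " is not set"
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def drm_layout_ok(text: str) -> bool:
    """True if DRM is built into the kernel (y) and i915 is built as a module (m)."""
    opts = parse_kconfig(text)
    return opts.get("CONFIG_DRM") == "y" and opts.get("CONFIG_DRM_I915") == "m"
```

Usage would be along the lines of drm_layout_ok(open(".config").read()) from the kernel source directory.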
The 4.0.5 release removed rt_mutex_check_prio() in favour of rt_mutex_get_effective_prio(). I've fixed BFS for it here:
https://github.com/pfactum/pf-kernel/commit/e32654bb6748455fc112ac6868bec0f9de67c061
Thank you very much! It works well for me on top of Alfred's current -gc.
BTW, can someone check whether this commit is also worth applying to BFS/CK?
"sched: always use blk_schedule_flush_plug in io_schedule_out"
https://github.com/torvalds/linux/commit/22f546a33bac11aea8af5e570f296234ecdd60d4
BR, Manuel Krause
I meant "adapting" or "adopting" rather than "applying". Now you know what I wanted to express. ;-)
Ck patchset for 4.1, please!
Any progress on fixing the issues with the patch? I haven't had any problems with the patches because I've always had NUMA enabled (it's enabled by default in Ubuntu configs), but I would like to see this issue resolved so we can move on to kernel 4.1.
I have ported BFS0462 to 4.1; please check it out at http://cchalpha.blogspot.com/2015/06/time-to-have-fun-with-kernel-41.html and give it a try.
BR, Alfred
I've used all 22 patches from your repository for kernel 4.1.3:
The kernel is running fine on OpenMandriva and ROSA Desktop:
http://mib.pianetalinux.org/forum/viewtopic.php?f=38&t=4602
What is the easiest way to get the patch(es)?
Galen
I should have been more specific. I meant, what is the easiest way to get your patches, Alfred?
Galen
Learn some Git!? :)
Next time maybe I can spend some time packing them into a single patch file for download.
Thanks for your reply. I have experimented a bit with git, but I don't use it frequently, so I tend to forget. ;) I make some rpm packages for PCLinuxOS, so it is much more convenient to have a versioned patch file that can be added to the src.rpm.
Thanks for all of your work. I will likely just wait for the next official -ck release.
Galen
Can we at least get a status update please?
Sure. Nothing's happened for 4.1, and no progress has been made on fixing the non-NUMA build bug. As usual, when I start working on it I will finish shortly afterwards, but so far I've been too busy to do anything.
Thank you. Much appreciated!
Tell me, is BFS better than the other options on 6 cores (Intel 5930)?
Does anyone have benchmarks of CFS vs BFS on 6 physical cores? (Not old benchmarks.)
Send me a machine with 6 cores and I would be glad to test it for you :) The benchmarks I published nearly 3 years ago include a dual quad-core machine with and without HT enabled. If you are so inclined, you may use these underlying bash scripts to benchmark your 6-core machine and post the data here. I am happy to plot the results for you.