A couple of issues showed up with BFS 0.422, one being the "0 load" bug and the other being a build issue on non-hotplug releases. So here is BFS 0.423 and 3.4-ck2 (which is just ck1 with the BFS update) which should fix those:
and the incremental patch only:
Good experience with linux-3.4.2-bfs-423.
No slowdown after some hours, as there was with linux-3.3.
I suspect the earlier sys-time bug had some side effects, but that is just my FUD ...
Con Kolivas, thank you for your work!
Thanks Con - you are really quick :)
What did you mean by '-12queuePatches' in "linux-3.4.2-12queuePatches-bfs423-full-ck2" in the other blog thread?! Is something off my radar?
Just gonna reboot now with 3.4.2+bfs-423+BFQ+mm-drop_swap_cache_aggressively.patch.
I don't use BFQ, but I run my system with:
12 patches from stable-queue
all other ck2 patches
"12 patches from stable-queue"
O.K. Your distro is obviously different from openSUSE. Also, openSUSE patches in things that I'm not aware of anyway.
Let's give my new kernel some uptime.
and many thanks to Con Kolivas for providing us with the results of his work
BTW, thinking about what Ralph Ulrich wrote in the other Blog thread, he felt the need to learn benchmarking...
Do we have any tool to really benchmark "interactivity" within Linux desktop systems?
Isn't it that, up to today, we only have reports that things got better or worse?
@ Con Kolivas:
A big THANK YOU for this BFS-423 patch. It really makes a difference. I now have it running for almost 22h and there is no slowdown, as noticed with previous kernel & BFS combos (also mentioned by Ralph).
In addition, the recovery time of desktop content that was presumably swapped out (after making use of possibly swapped-out shmfs) is greatly reduced here now.
It also looks like the base CPU load of the usual running processes has dropped.
I don't have any insight on how this is related to your patch improvement since 422, but: Very nice experience, indeed.
Again, many thanks for your work!!!
@ Con Kolivas:
NO, nothing o.k. at all.
I again got a complete system failure (like with 3.4.2 plus BFS 422), just some minutes after my last posting, while only watching something from disk via vlc.
There's nothing in the logs; the machine hung completely, like last time.
Please inspect the differences introduced in this transition in detail.
For me in the meanwhile, I'll compile with a fresh install and will come back.
Again I got a complete lockup without any obvious reason (nothing special done, clearly nothing in the logs) yesterday. Around 24h of uptime again.
Time to revert to 3.3.8 + D.H. patches and wait for 3.4.3. and BFS 425.
I have been running linux-3.4-bfs-423 for days without errors, and as of today with linux-3.4.3rc1. This is still better than mainline Linux!
On the LKML, Hillf Danton publishes patches every now and then. These are bfs-420 based - named bfs 421 - but Linux-3.3 is dead now, end of life.
If I hadn't had so many random lockups with 3.4.x + BFS, I wouldn't have had to write about them.
At least 3 lockups in four days. They have never occurred that often, not since before 2.6.39.
I'm now using 3.3.8 with SLUB (instead of my previous SLAB) + my usual BFS setup + all of Hillf Danton's recent patches. Just to see if the same lockups occur, if they're not 4.2.* or BFS 422/423 related.
If it's kernel-3.4 related, I should not suffer them there.
Just a reminder, I do not consider Hillf's patches part of BFS.
NOT only considering.
If H.D. wants his work to become known, he should pack his stuff together and push it to some website.
Why do you insist on using Danton's patches? BFS works great without them.
I definitely do not insist on Hillf Danton's patches per se.
But, given Con Kolivas' self-confident reply, what should I do now if 3.4.2 with BFS fails after 24h, without any help from him or others?
That's the only reason for me to go back to & propagate the last known good.
@Chen / X: Don't start a flame war against H.D. At least his patches work. Your first ones were simply no-gos.
I've now really carefully inspected my changed .config, 3.4.2 vs. 3.3.8, for possible wrong automatic choices. Dunno. There are many diffs, but nothing I'd suspect to be the culprit.
Alas, there's nothing to debug. A lockup after 24h without any logs is very non-specific and gives me nothing to work off. How does it compare to 3.4.2 withOUT BFS with the rest of the config the same?
Have I flamed H.D.? No.
I am advising H.D. to pack up his work and push it to some kind of website (e.g. www.danton.org, GoogleCode, GitHub, ...). It makes it much easier for users to review what he has done.
Try 3.4.1 instead. I also have "lock-ups" with 3.4.2. The lockup comes from X11 (an "EQ overflowing" infinite loop). I have no idea whether this is BFS or NVidia binary driver related. It's so rare that I couldn't be bothered to find out :-P Going to 3.4.1 cured it.
@ Con Kolivas:
Yes really, it's a pity that the system just simply stops working. I would have liked to provide you with some more useful BUG messages if it had been possible.
Now I'm running 3.4.2 with a slightly different config WITH BFS, as I saw I may have messed up some settings on the way from 3.3.8 via 3.4.1 to 3.4.2. In the dumbest case, I may have been cut off from the web by a malfunctioning firewall, in which case I would need to apologize for the noise I made. But it's only up for 3h now.
In the next step, I'd compare with the same kernel withOUT BFS, as you suggested, if it locks up again. BTW, isn't there a kernel command line switch to choose the CPU scheduler (like with the I/O schedulers)?
Thank you for responding,
Compile the kernel with CONFIG_LOCKDEP=y.
When it locks up, use the Magic SysRq key
and have it display all held locks and backtraces.
My openSUSE kernels predefine DEBUG_KERNEL=y if I set EXPERT=y. And then CONFIG_LOCKDEP_SUPPORT=y is already set, too.
Did you mean this one? I don't have a pure "CONFIG_LOCKDEP" in 3.4.2.
But this wouldn't help any further anyway when the _machine_ locks up (it would be different if only the kernel failed). I've even rechecked that there weren't any temperature issues, and the hardware hasn't changed for months.
Mmmh, 3.4.3 is out now: should I wait for the next lockup (after only ~9h uptime) or just try the new kernel?
But, let's give it some uptime...
12 hours are nothing.
P.S. It should read "if they're not 3.4.* or BFS 422/423 related".
So, 3.3.8 with BFS & _SLUB_ is now at 26h of uptime.
That is to say, the lockup after 24h is not caused by SLUB in 3.3.8.
Subject: BFS-O(1) is now a correct algorithm.
Con, please take a look at this mail. ;-)
Recently, with linux-3.4.4rc-bfs, I had some top time overflows.
Is this rpc related?
I am just removing the one patch I see about that and will try again ...
@ Con Kolivas & all others on here as well:
I've now spent some days checking and reverting some config changes I made between 3.3.8 and 3.4.2/3, and testing whether the resulting kernels run longer than 24h. One of them hardlocked after almost 32h.
Then I set CONFIG_JUMP_LABEL back to n (like I had with 3.3.8). And this one, including BFS, ran longer than 49h.
Do you consider that this option may harm BFS, or something else, in such a way that the machine hardlocks? (gcc version is 4.6.2)
Does someone else have experience with this option?
Now I'll still need to test the standard scheduler with this option set to y, although I don't like running kernels without BFS. ;-)
Does BFS have some timing dependencies with side effects?
The last sentence of the help text for "Optimize very unlikely/likely branches":
"update of the condition is slower, but those are always very rare."
PS: I had also disabled this option
This is a low-level change to how inbuilt expect functions are compiled by gcc into assembly, utilising an x86 feature. I can't see how this is affected by BFS, directly or indirectly.
Yes, that is what I wanted to answer Manuel. I looked it up when I saw that I myself had this disabled. And I have these rcuc kernel threads whose systime goes crazy. I'll just compile my kernel with CONFIG_JUMP_LABEL enabled. Perhaps this gives BFS more room to behave normally. As the last sentence in the option's help says:
"update of the condition is slower"
Con, wasn't it you who looked into the Solaris code, only to recognize it as a saner framework? Which could easily lead to the conclusion that the Linux source is more vulnerable to side effects... Ralph
And the other way round: may _this option_ affect the way BFS works?
IIRC, it made BFS snappier on my old hardware. But that's subjective. Perhaps Ralph would share his experience with us.
Thank you for your replies!
Manuel, this option normally brings a performance boost. That is why I disabled it at first, to have a more stable experience. But the last sentence in the option's help says: in rare cases there is a slowdown when updating the condition.
This must have a side effect on BFS: I have had no issues with BFS since I enabled this JUMP_LABEL optimization.
Ciao from happy soccer Germany, Ralph
The "slow down" it mentions is when the branch is the opposite of what is predicted. The idea is that there is a branch point where we know that 99% of the time we do code A and 1% of the time we do code B. Normally it would cost a little overhead to do code A and a little more overhead to do code B. With this optimisation feature enabled, it costs NO overhead to do code A and MORE overhead than before to do code B.
This has nothing to do with BFS.
Con, our issue is not performance here:
1. Manuel's system halts after a day
2. mine (shutting down every night)
ps -e -o pcpu,bsdtime,stat,comm --sort -pcpu
gives me an rcuc/0 thread with 175 million seconds of run time
in rare cases (after hours).
Just briefly, to correct the above:
With BFS and CONFIG_JUMP_LABEL = y my system halts after 23-32h.
With BFS and CONFIG_JUMP_LABEL = n my system keeps running after 49h.
Now running the standard scheduler with CONFIG_JUMP_LABEL = y. 16h so far. I hope it breaks soon; I don't like the experience.
And: I don't claim that it has anything to do with BFS, I only asked if it could possibly be so.
So what about your last test with CONFIG_JUMP_LABEL, Manuel? Results?
Yes, yes. I'm still waiting for 3.4.3 + standard scheduler with CONFIG_JUMP_LABEL = y to fail. But even with 65h of uptime it keeps running without issues (but, of course, with worse interactivity than BFS). Side note: the CFS kernel does slow down after a certain time running, too.
Don't know what to do now. Meanwhile I compiled a 3.4.4 + BFS + CONFIG_JUMP_LABEL, ready for next reboot.
Hi Con, I have hard freezes with BFS.
I use the zen kernel (with BFS and BFQ, linux 3.4.4). The command "mkfs.ext4 -L Diskname -c -v /dev/sdb1" (SATA drive on an eSATA connector) leads to a hard freeze within minutes during the bad block test. Not even the Magic SysRq keys work anymore.
So I compiled zen with CFS, no problem, no freeze.
Zen with BFS but without BFQ -> freeze.
So I tested it with Vanilla Kernel 3.4.4 and CK2 Patches -> freeze too.
Vanilla Kernel 3.4.4 with your patches but without BFS -> no freeze!
PS: I even tested with BFS and the CONFIG_JUMP_LABEL option mentioned in the discussion disabled, but it froze too.
I use openSUSE 12.1, and with the original Tumbleweed kernel 3.4.3 there is no such problem.
Short question: in which of your combinations did you test with CONFIG_JUMP_LABEL disabled?
Short answer: only the vanilla kernel with the CK2 patchset and CONFIG_JUMP_LABEL disabled.
some more tests:
1. Hard freeze with BFS and a USB(2) stick badblock test (command: "mkfs.ext2 -L "Stickname" -m 0 -c -v /dev/sdb1"). Freeze after approx. 5 minutes and 30%. (Remark: with eSATA drives the freeze comes within 2 minutes.)
2. Starting in runlevel 3 does not make a difference; it freezes too with the badblock test command (and no other tasks running).
Btw., running the command with CFS or with the new RIFS from Chen does not show this problem.
Con, if you need additional info or further tests for this bug, I'll try to help you.
Thanks and regards
Just adding to my reports.
I abandoned the 3.4.3 CFS + JUMP_LABEL test after 3d0h10m. Rock stable, but really, predictably unresponsive.
The 3.4.4 with BFS(only) +JUMP_LABEL crashed after ~8h.
Two days ago I had a complete lockup with that kernel+config but WITHOUT JUMP_LABEL after some hours. So it really has nothing to do with that config setting.
I've tried RIFS but that doesn't work on non-SMP systems at all.
I feel a bit unsafe at the moment when using BFS-patched kernels.
Ok looking at the pattern of lockups people are having, I'm reasonably sure it's the block plugging code which I changed going into this BFS release. I will put together an update soon that backs out those changes to the old proven mechanism. Thanks everyone for your bug reports.
thanks for your work. You are right. I recently tested vanilla 3.3.8 with the old -ck1 patchset and the old BFS, and my mentioned problems with the badblock test during mkfs are completely gone.
So it's really the new BFS code.
Thanks so far.