Sunday 5 December 2010

Automated per session task groups comment.

Against my better judgement I sent an email to lkml. Here's the transcript since it summarises my position.

---
Greets.

I applaud your efforts to continue addressing interactivity and responsiveness but, I know I'm going to regret this, I feel strongly enough to speak up about this change.

On Sun, 5 Dec 2010 10:43:44 Colin Walters wrote:
> On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
> wrote:
> > What's your point again? It's a heuristic.
>
> So if it's a heuristic the OS can get wrong,

This is precisely what I see as the flaw in this approach. The whole reason you have CFS now is that the old O(1) scheduler was pretty good at everything else, but needed heuristics to get interactivity right. I put them there. Then I spent the next few years trying to find a way to get rid of them. The reason is precisely what Colin says above. Heuristics get it wrong sometimes. So no matter how smart you think your heuristics are, it is impossible to get it right 100% of the time. If the heuristics make things better 99% of the time, and introduce disastrous corner cases, regressions and exploits 1% of the time, that's unforgivable. That's precisely what we had with the old O(1) scheduler, and that's what you got rid of when you put CFS into mainline. The whole reason CFS was better was that it was mostly fair and concentrated on ensuring decent latency rather than trying to guess what would be right, so it was predictable and reliable.

So if you introduce heuristics once again into the scheduler to try and improve the desktop by unfairly distributing CPU, you will go back to where you once were. Mostly better, but sometimes really badly wrong. No matter how smart you think you can be with heuristics, they cannot be right all the time. And there are regressions with these tty and per-session group patches. Search the forums where desktop users go and you'll see that people are afraid to speak up on lkml, but some users trying these patches are seeing mplayer and amarok skip under light load. If you program more intelligence in to work around these regressions, you'll just get yourself deeper and deeper into the same quagmire. The 'quick fix' you seek now is not something you should be defending so vehemently. The "I have a solution now" just doesn't make sense in this light. I for one do not welcome our new heuristic overlords.

If you're serious about really improving the desktop from within the kernel, as you seem to be with this latest change, then make a change that's predictable, gets it right ALL the time, and is robust for the future. Stop working within all the old fashioned concepts and allow userspace to tell the kernel what it wants, and give the user the power to choose. If you think this is too hard or not doable, or that the user is too uninformed or won't want to modify things themselves, then allow me to propose a relatively simple change that can expedite this.

There are two aspects to getting good desktop behaviour: enough CPU, and low latency. 'nice', by your own admission, is too crude and doesn't really describe how either of these should be modified. Furthermore, there are 40 levels of it and only about 4 or 5 are ever used. We also know that users don't even bother using it.

What I propose is a new syscall, latnice, for "latency nice". It need only have 4 levels: 1 for default, 0 for latency insensitive, 2 for relatively latency sensitive gui apps, and 3 for exquisitely latency sensitive uses such as audio. These should not require extra privileges to use and thus should also not be usable for "exploiting" extra CPU by default. It's simply a matter of working with lower latencies yet shorter quota (or timeslices), which would mean throughput on these apps is sacrificed due to cache thrashing, but then that's not what latency sensitive applications need. These could then be encouraged to be included within the applications themselves, making this a longer term change. 'Firefox' could set itself 2, 'Amarok' and 'mplayer' 3, and 'make' - bless its soul - 0, and so on. Keeping the range simple and well defined will make it easy for userspace developers to cope with, and for users to fiddle with.

But that would only be the first step. The second step is to take the plunge and accept that we DO want selective unfairness on the desktop, but where WE want it, not where the kernel thinks we might want it. It's not an exploit if my full screen HD video continues to consume 80% of the CPU while make is running - on a desktop. Take a leaf out of other desktop OSes' books and allow the user to choose, say, levels 0, 1, or 2 for desktop interactivity with a simple /proc/sys/kernel/interactive tunable, a bit like the "optimise for foreground applications" option seen elsewhere. This could then be used to decide whether the scheduling hints from latnice just ensure low latency while keeping the same CPU usage - at 0 - or actually give progressively more CPU to latniced tasks as the interactive tunable is increased. Then distros can set this on installation and make it part of the many funky GUIs to choose between the different levels. This takes the user out of the picture almost entirely, yet gives them the power to change it if they so desire.

The actual scheduler changes required to implement this are absurdly simple and doable now, and will not cost in overhead the way cgroups do. It also should cause no regressions when interactive mode is disabled and would have no effect till changes are made elsewhere, or the users use the latnice utility.

Move away from the fragile heuristic tweaks and find a longer term robust solution.

Regards,
Con

--
-ck

P.S. I'm very happy for someone else to do it. Alternatively you could include BFS and I'd code it up for that in my spare time.

---
EDIT:

And just for the sake of it, I hacked up what a latnice patch would look like. Of course, being unsupported by userspace means there's no point in my supporting and promoting this change, even on BFS.

http://ck.kolivas.org/patches/bfs/latnice/

28 comments:

  1. Mike Galbraith's reply was a little silly, to say the least, especially considering his closing message was correct and there's no issue with it (besides being written in a derogatory fashion).

    ReplyDelete
  2. "Against my better judgement I sent an email to lkml."

    That was your first mistake. The second one was trying to argue with a brick wall.

    Just keep blazing your own trail. Some of us march to the tune of a different drum with open minds and a genuine desire to make things better. Others can't see the forest for the trees!

    ReplyDelete
  3. I think the most important point for desktop interactivity is priority. And I agree with that syscall and its 4 levels of priority. Let's just have one interface, for the sake of simplicity.

    One scheduler to rule them all will make things complicated, I think.

    ReplyDelete
  4. We all know what's going to happen, so please keep maintaining BFS.

    ReplyDelete
  5. Ah, LKML. I almost forgot what a circus it sometimes is, with all the replies having nothing to do with the core issue and everyone trolling like there's no tomorrow. Fun.

    ReplyDelete
  6. Yeah I know what's going to happen. I just wanted to go on record saying I see it as a bad idea. I expect to largely ignore any responses I get since it'll just be chest thumping and I'm over pissing contests.

    ReplyDelete
  7. Con, have you seen Ingo's reply?

    To me it seems reasonable that the OS provides defaults, as long as those defaults can be changed.

    If I understood Ingo correctly this is not as you fear and Mike's patch is not a reintroduction of hard-coded heuristics into the scheduler.

    BFS could use group scheduling as well, to give better interactivity. Have you considered that or is this a CFS-only feature?

    ReplyDelete
  8. "They can be used by users as well, using cgroup tooling."

    I think this would be a problem for the common (non tech-savvy) user.
    All they understand is double-clicking the program to run it. The power lies with the developers. On one hand, that latnice is a good idea. But on the other hand, desktop-aimed distros can use cgroups to specify which apps run in a particular group, and leave the common user in peace.

    Looking at the cgroup documentation, it seems kind of strange and complicated.
    I wonder why kernel devs keep making things more complicated.
    Why not create a specific scheduler for each specific purpose? Server, desktop, mobile, etc.

    ReplyDelete
  9. "I think this would be a problem for the common (non tech-savvy) user. All they understand is double-clicking the program to run it."


    Right, I've read the LWN summary of the previous discussion of this feature and apparently there's a Gnome patch that does something quite similar to that. There's a systemd patch as well that makes use of cgroups. I have not tried them myself.

    Those solutions do not require the user to run or configure anything.

    So it's not getting in the way of user-space solutions.

    BFS devs should listen to lkml feedback here. Sometimes it's a brick wall, sometimes it's a bunch of smart people trying to improve Linux. You are doing BFS users a disservice by ignoring everything that comes from upstream scheduler devs and from Linus.

    Forget the old flamewars and get on with life, instead of seeing a conspiracy in everything.

    ReplyDelete
  10. > Sometimes it's a brick wall, sometimes it's a bunch of smart people trying to improve Linux.

    You can be smart, have good intent, and yet be a brick wall at the same time. Mike is obviously very very smart, but he is a dick that will get beaten up silly in many workplaces.

    > You are doing BFS users a disservice...

    Now you are just being silly.

    ReplyDelete
  11. I do understand full well what the session patch does. I'm not that stupid :( If it becomes the default on desktops, though, it is a heuristic that affects everyone.

    ReplyDelete
  12. @Tom, as you can see in the replies on LKML, the responsiveness with BFS is even better than CFS with this patch. And Con has done grouping with BFS. The results were good, but in some (random?) situations mplayer stuttered, gnome applets didn't work correctly, and so on. He wrote up his experience on LKML, but it seems nobody is interested. Shame.
    Autogrouping is a heuristic. It supposes that the apps in a group are similar, and then the scheduler reacts accordingly. If that hypothesis is wrong, then the reaction is wrong. And come on, don't argue that most of the time it's slightly better. (If my car mechanic says the new brakes are mostly perfect and only fail in some situations, I will switch garages instantly ;-)). A scheduler should react to the actual situation and not to suppositions; simple priorities should be given by the user (or some defaults by the distribution), because the user knows what he wants (most of the time ;-) ). And that is what Con has written.

    I hope Con will support BFS in the future too. In my opinion (and you can google test comparisons for numbers), BFS is the best scheduler for Linux desktops.

    CU sysitos

    ReplyDelete
  13. I'm satisfied with BFS just as it is. IMNSHO Mike and the other LKML trolls should really get a life... or they can wait until Con comes up with a new idea and copy it as usual.

    ReplyDelete
  14. As a desktop user I still find BFS far better than CFS (with or without these fancy new group patches).

    So thanks a lot for BFS, makes the linux desktop a bit more awesome ;)

    ReplyDelete
  15. "@Tom, as you can see in the replies on LKML, the responsiveness with BFS is even better than CFS with this patch."

    That's not the impression I got from reading the thread - got a URL to the reply you are referring to?

    ReplyDelete
  16. It seems obvious to me (and I'm probably wrong so don't take this too seriously), that any latency heuristic should be based around hardware I/O patterns. Reacting to keyboard/mouse input frequently should put your program at a higher priority, and audio I/O above that.

    ReplyDelete
  17. @Tom, sorry, I must pass. It seems my memory was wrong. I was looking for the test, but couldn't find it on LKML anymore. I had read some other weblogs about the 200-line kernel patch, though, and I remember there were some results; I smiled and then forgot about the patch. (But I can't find it in my Firefox history anymore. I checked some sites. Sorry about that.) Maybe others can provide some numbers?

    CU sysitos

    ReplyDelete
  18. @Tom, sorry, my fault. I had in mind that I'd read it on LKML, but it seems it was on another blog. I checked my browser history for the "miracle kernel patch" and read some of the sites again, but didn't find it. So sorry. Maybe someone can give some test results?

    CU sysitos

    ReplyDelete
  19. Con, can you stop complaining for once at lkml? Ingo DID reply to your answer, and you are just plain wrong:
    http://lkml.org/lkml/2010/12/5/179

    If you just stepped back and took a deep breath you would understand that Linus and the other excellent Linux developers are listening to you, but you need to give a good explanation, not just some silly opinion you need to air.

    You are welcome on lkml when you have something of importance to say, instead of only telling others how it should be without providing any deeper technical comments. If not, you only seem to be a whiner. Linus DID create Linux; he knows what he is doing. He is the God of Linux. Many are competing for Linus's attention, and sometimes you just don't get it. I know it hurts, but please stop complaining about him. Without Linus you would not be famous; you must pay Linus some respect.

    ReplyDelete
  20. Greetings,

    To Con Kolivas:

    Although these patches might be using a "heuristic" approach, I found some nice performance improvements on my laptop. My I/O throughput (on a slow 4200 RPM HDD) has increased significantly when copying/moving files between partitions. When I'm browsing with Google Chrome and compiling programs using all my cores, the system seems significantly more responsive than it was with the "magic 200+ line" kernel patch. Furthermore, boot times are slightly faster ATM than they were before. =)

    To Linux lover:

    Yes, I agree we all have to respect Linus as the "God" of Linux, but we shouldn't forget he didn't create Linux all by himself; it was the community that turned Linux into something useful. So I think you're not being a Linux lover but, instead, a Linux FUD'er...

    Keep up the good work with your BFS patches CK...

    Kudos from Portugal

    ReplyDelete
  21. @Anonymous. I know those patches do help some workloads. I have never disputed that.

    ReplyDelete
  22. @ck
    I was just wondering how BFS distributes load on a multicore CPU. I have a 6-core AMD CPU. I am running Folding with schedtool on cores 4, 5 and 6 in idle mode. That leaves cores 1, 2 and 3 doing (almost) nothing.
    When I am streaming a 1080p movie from my server, I notice the load from the LAN streaming and mplayer is distributed on cores 5 and 6 as well. I was forced to use schedtool -I -a 0 to make mplayer use core 1 (which is idling). My question is: is there any way BFS can distribute the load to unoccupied cores (or the least busy ones) automatically, without needing schedtool -a X?

    One other thing: I am guessing you're using an Intel Core CPU with the ondemand CPU frequency governor when developing BFS. IIRC, AMD and mobile CPUs are recommended (according to the kernel documentation) to use the conservative governor. If I use conservative, I sometimes notice a pause when playing an HD 1080p movie in a complex scene, a scene where the CPU clock needs to kick up to the highest frequency. So I am using ondemand instead.

    Thank you for BFS, Con! :)

    ReplyDelete
    Hi. BFS distributes tasks evenly across all CPUs, much more than a per-CPU design like mainline does. Thus if you use an app which sets affinity unnecessarily, it ends up being less effective. If you are using a distributed computing client that does not use all CPU cores, you're actually better off letting BFS do its own spreading of work across the cores than letting it (Folding etc.) bind to just some CPUs. Your question about BFS using the "least busy ones" does not apply, as that is exactly what BFS does by default: it uses ANY CPU that is idle in preference to waiting. If you ever see a CPU idle with BFS (unlike mainline), it's because there isn't enough work to keep all CPUs busy.

    As for the conservative governor, I think it is a poorly thought out one by someone who doesn't really understand scheduling. Ramping the CPU up to the fastest frequency whenever there is load is the best way to use CPU time (as the ondemand governor does), instead of slowly stepping it up. If a CPU suddenly becomes busy, it actually takes less time, uses less power, and generates less heat to do that work at the highest frequency, not a lower one.

    ReplyDelete
  24. @ck
    I understand now. Thanks :)

    ReplyDelete
  25. In testing CFS with the per-TTY patches for a few days I did notice (subjectively) some distinct cases of interactivity improvement. The most marked of these was in scrolling pages in Chromium, whose jerky rendering I had previously blamed on the browser or perhaps GTK. I do know that Chrome does a lot of client-side rendering, dumping large amounts of bitmap data into the X server, which is why it's essentially unusable over a network. Might something like that explain the improvement?

    All other issues where interactivity is affected seem related to high-ish I/O loads especially when dealing with lots of small files when compiling with ccache or running an svn up. The system sometimes grinds to a halt for minutes at a time thrashing away as though it's swapping even though it isn't, during which I can't even move the mouse. (there's plenty of disk cache headroom.)

    I should disclaim that I'm using BFS by way of the Zen kernel patches which make quite a few other modifications at least optionally. I'm sure that throughput and interactivity are better under BFS in many cases even though I would probably have to benchmark in order to detect them. I'm just curious why things flake out under some specific (and IMHO non-insane -j64) workloads.

    ReplyDelete
  26. hey ck, regarding the lack of userspace support for your approach: would it be possible to hardcode some of the latency values into your code? I.e., is it possible to identify a select few processes and give them latency values? Pick a few audio/video programs and give them a high latency priority, and pick a few idle folding@home-type programs and give them a low priority.

    That way, folks could use and test your approach, if in just a few cases.

    ReplyDelete
  27. @chogydan that would involve heuristics and heuristics fail to work 100% of the time so I'm dead set against them, sorry. Best it's done by userspace.

    ReplyDelete