Saturday 4 November 2023

EEVDF & the mainline linux kernel scheduler

 A number of people have already asked me my opinion on the development of EEVDF on top of CFS for the mainline kernel. All of my previous schedulers - staircase first, followed by BFS, and finally MuQSS, were all EEVDF designs, so in principle at least you can imagine I'm mildly intrigued and pleased with this direction. I think it's the best known way to tackle interactivity and responsiveness in a CPU process scheduler.

Any qualms I may have about it would be the reluctance to move processes/threads from one CPU to another to achieve said goal of tackling the earliest eligible virtual deadline process first. As it is common for processes to be relatively "sticky" to per-CPU runqueues for cache warmth and throughput reasons, this ends up being orthogonal to the demands of scheduling for minimal latency first. 

In my original BFS design there was only one runqueue  for all CPUs which was the optimal design for a global EEVDF design but this would eventually not have scaled to many CPUs in throughput. This led to the development of MuQSS for which I moved to multiple runqueues, but soon found that sharing runqueues for latency reasons was more important than worrying about the last bit of throughput. This is why in configuration it was possible to choose the degree to which runqueues were shared to choose to optimise primarily around throughput or latency - the more sharing, the more latency focused the scheduler behaved. Sharing runqueues between shared cache CPUs provided the best compromise at the time, though modern CPUs have far more cores and threads which all share various levels of cache. 

Much like there are sorting algorithms which excel at different sizes (nothing beats insertion sort for up to ~16 variables), I expect runqueue sharing would exhibit a similar phenomenon and that it would actually be disadvantageous to have many runqueues for small numbers of CPUs. My random prediction based on older anecdotal observation is that number is also up to about 16 threads/cores per runqueue (provided they're all sharing at least some form of cache.)

As I've not looked at mainline kernel code at depth in years, and not at all at this new EEVDF development I cannot comment with any authority at all on the code nor implementation at this stage, but it's certainly an admirable goal and I'm cautiously optimistic about it.

1 comment:

  1. It interests me simply because I *can't use* a kernel that doesn't support cpu cgroups, and I'm sure that applies to zillions of people by now.

    ReplyDelete