Friday 26 May 2017

linux-4.11-ck2, MuQSS version 0.156 for linux-4.11

Announcing a new -ck release, 4.11-ck2, with the latest version of the Multiple Queue Skiplist Scheduler, version 0.156. These are patches designed to improve system responsiveness and interactivity with specific emphasis on the desktop, but configurable for any workload.

linux-4.11-ck2

-ck2 patches:

Git tree:


MuQSS

Download:

Git tree:


MuQSS 0.156 updates

- Fixed failing uniprocessor (UP) builds.
- Removed the last traces of the global run queue data, moving nr_running, nr_uninterruptible and nr_switches to each runqueue. nr_running is now calculated accurately only once, at the end of each context switch, and the variable is reused in place of rq_load (this may improve reported load accuracy).

4.11-ck2 updates

- Make full preempt the default on all arches.
- Revert the inappropriately reverted part of the vmsplit patch.

Enjoy!
お楽しみ下さい (Please enjoy)
-ck

I seem to have unintentionally deleted the -ck1 post, sorry about that.

34 comments:

  1. 4.11.3-ck1 (and probably other versions too) seems to move the running process(es) between the CPUs frequently. This causes a problem with CPU load accounting and thus CPU frequency scaling (conservative governor).

    The symptom is that on a 2-CPU system (FUJITSU ESPRIMO Mobile V6555 laptop w/Intel Core2 Duo T6570) with a single CPU-intensive process, the frequency does not get raised at all. In this case both cores probably see around 50% load each, which is lower than the 80% default threshold of the conservative governor.

    If the process is pinned to one of the cores, the frequency of the core the process is pinned to rises to the maximum as expected.
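
    (For illustration, a minimal way to do that pinning from a shell, assuming the standard util-linux taskset tool is available; <pid> is a placeholder for the CPU-intensive process:)

    # pin an already-running process to CPU 0 only
    taskset -cp 0 <pid>
    # or start a test busy loop already pinned to CPU 0
    taskset -c 0 sh -c 'while true; do true; done'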

    Running two cpu-intensive processes on this 2 core system raises the frequency of both cores as expected.

    Any ideas how to fix this?

    By the way, powertop seems to mess up something in the kernel and frequencies stay low after starting powertop. Changing the governor to something else and then back again to conservative fixes this issue.
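
    (For reference, that governor round-trip can be done through the usual cpufreq sysfs paths, assuming they exist on this system and you have root; this is only a sketch of the workaround described above:)

    # switch every core away from conservative and back again
    for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo performance > "$g"; done
    for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo conservative > "$g"; done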

    thanks
    Gabor

    Replies
    1. It's intrinsic to the latency-minimising design that tasks will move around to get the lowest-latency scheduling for them. If you want it to do that less, disable interactive mode:
      echo 0 > /proc/sys/kernel/interactive
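
      (A small sketch for making that stick across reboots, assuming your distro applies /etc/sysctl.d at boot; /proc/sys/kernel/interactive corresponds to the sysctl key kernel.interactive, and the file name below is just an example:)

      sysctl -w kernel.interactive=0                                  # apply immediately
      echo 'kernel.interactive = 0' > /etc/sysctl.d/90-muqss.conf     # persist (example file name)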

    2. Thanks, setting interactive to 0 indeed improves the situation a lot.

    3. Thank you both for bringing this up and for clarifying! The combination of symptoms, design intentions and possible successful solution makes it easier to understand how MuQSS works under certain conditions. :-)
      BR, Manuel Krause

    4. Indeed disabling interactive mode may also prolong battery life.

  2. Ever since I started using 4.11 I keep getting these kernel panics:

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 449 at net/ipv4/tcp_input.c:2819 tcp_fastretrans_alert+0x8e7/0xad0
    Modules linked in: ip6table_nat nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle ip6table_raw nf_conntrack_ipv6 nf_defrag_ipv6 xt_recent ipt_REJECT nf_reject_ipv4 xt_comment xt_multiport xt_conntrack xt_hashlimit xt_addrtype xt_mark xt_nat xt_tcpudp xt_CT iptable_raw nf_log_ipv6 xt_NFLOG nfnetlink_log xt_LOG nf_log_ipv4 nf_log_common nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ip6table_filter ip6_tables iptable_filter iptable_mangle ipt_MASQUERADE
    nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic btrfs xor adt7475 hwmon_vid iTCO_wdt gpio_ich iTCO_vendor_support evdev mac_hid raid6_pq nouveau led_class mxm_wmi wmi video psmouse ttm i2c_i801 drm_kms_helper lpc_ich skge sky2 drm syscopyarea sysfillrect asus_atk0110 sysimgblt fb_sys_fops i2c_algo_bit button shpchp intel_agp intel_gtt acpi_cpufreq tpm_tis tpm_tis_core tpm sch_fq_codel coretemp msr nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sd_mod ata_generic pata_acpi serio_raw atkbd libps2 uhci_hcd ehci_pci ehci_hcd ahci libahci usbcore usb_common pata_jmicron mpt3sas raid_class libata scsi_transport_sas scsi_mod i8042 serio
    CPU: 1 PID: 449 Comm: irq/19-enp5s4 Tainted: G W 4.11.3-1-ck-core2 #1
    Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238 09/30/2008
    Call Trace:

    dump_stack+0x63/0x83
    __warn+0xcb/0xf0
    warn_slowpath_null+0x1d/0x20
    tcp_fastretrans_alert+0x8e7/0xad0
    tcp_ack+0xe57/0x14f0
    tcp_rcv_established+0x11f/0x6f0
    ? sk_filter_trim_cap+0xb7/0x270
    tcp_v4_do_rcv+0x130/0x210
    tcp_v4_rcv+0xb39/0xcc0
    ip_local_deliver_finish+0xa1/0x200
    ip_local_deliver+0x5d/0x100
    ? inet_del_offload+0x40/0x40
    ip_rcv_finish+0x1eb/0x3f0
    ip_rcv+0x2b3/0x3c0
    ? ip_local_deliver_finish+0x200/0x200
    __netif_receive_skb_core+0x507/0xa70
    ? tcp4_gro_receive+0x11a/0x1c0
    ? try_preempt+0x160/0x190
    __netif_receive_skb+0x18/0x60
    netif_receive_skb_internal+0x81/0xd0
    napi_gro_receive+0xdb/0x150
    skge_poll+0x397/0x880 [skge]
    net_rx_action+0x242/0x3d0
    __do_softirq+0x104/0x2e1
    ? irq_finalize_oneshot.part.2+0xe0/0xe0
    do_softirq_own_stack+0x1c/0x30

    do_softirq.part.4+0x41/0x50
    __local_bh_enable_ip+0x88/0xa0
    irq_forced_thread_fn+0x59/0x70
    irq_thread+0x12f/0x1c0
    ? wake_threads_waitq+0x30/0x30
    kthread+0x108/0x140
    ? irq_thread_dtor+0xc0/0xc0
    ? kthread_create_on_node+0x70/0x70
    ret_from_fork+0x2c/0x40
    ---[ end trace a181bdf0ee69c250 ]---

    After a while the computer just hangs.

    Replies
    1. Try it on mainline and if it still happens report it upstream. MuQSS, like BFS, brings out races very easily so it may be hard to reproduce on mainline though.

    2. https://bugzilla.kernel.org/show_bug.cgi?id=195835

      It doesn't lead to hang for me, however.

  3. I get really high CPU load with 4.11. These processes have huge CPU spikes:

    rcu_preempt
    kworker/u8:
    irq/279-s-iwlwi
    ksoftirqd/1

    My fans are screaming. The stock Arch kernel is calm. Can anybody else confirm?

    Replies
    1. My CPU

      processor : 0
      vendor_id : GenuineIntel
      cpu family : 6
      model : 78
      model name : Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
      stepping : 3
      microcode : 0xba
      cpu MHz : 400.000
      cache size : 4096 KB
      physical id : 0
      siblings : 4
      core id : 0
      cpu cores : 2
      apicid : 0
      initial apicid : 0
      fpu : yes
      fpu_exception : yes
      cpuid level : 22
      wp : yes
      flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
      bugs :
      bogomips : 3601.00
      clflush size : 64
      cache_alignment : 64
      address sizes : 39 bits physical, 48 bits virtual
      power management:

    2. Try without threaded IRQs enabled. It could be a driver issue brought out by threaded IRQs.
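
      (A quick way to check whether forced IRQ threading is in effect, assuming the kernel config is exposed via /proc/config.gz; otherwise look in /boot/config-$(uname -r):)

      zgrep -E 'FORCE_IRQ_THREADING|IRQ_FORCED_THREADING' /proc/config.gz   # is forced threading configured in?
      ps -eo pid,comm | grep 'irq/'                                         # are handlers running as irq/ kernel threads?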

    3. Disabling CONFIG_FORCE_IRQ_THREADING? I still have these processes at the top.

    4. Hmm, wasn't there an issue for iwlwifi users some time ago, discussed on here? I don't remember completely.
      Maybe switching from built-in to module or vice versa may help, or getting fresh firmware.

      BR, Manuel Krause

    5. Blacklisting iwlwifi does not help either.

  4. Hopefully I'm not the only one to report this, but load averages are fixed. While the workload is constant, the load average for the last minute now roughly represents the total amount of CPU being used.

    This also fixes high loads while nothing is happening - instead of a solid 1.00 or something close to it like 0.96, loads while idle basically look correct, either completely 0.00 or under 0.10.

    Thanks Con!

    Replies
    1. Thanks for reporting back. I was pretty sure I'd fixed them but I thought I'd wait for users to confirm :)

  5. Yeah, I've been logging/graphing (munin, because it works for what I need) ever since the -ck2 bump, and when I'm actually AFK and things are idle it does seem to show reasonable "very close to zero" loads. Good job it's working sanely :)

  6. After months I have again tried a kernel patched with MuQSS on a netbook with an Atom Z520.

    Again, a kernel panic at almost every boot.
    The laptop works without problems with an unpatched kernel and also with an old kernel with the BFS patch.

    Is there anything that I could try?

  7. Fwiw, I've been using MuQSS on my old netbook (Eee 701 with Celeron M ULV 353) for a long time, and with BFS before that. However, mine is UP, vs the Z520's SMT. Perhaps providing the panic info would help. Did you use the vanilla kernel's config?

  8. I noticed something weird about how 'htop' reports CPU% for processes. In comparison, 'top' reports things close to what you'd think it should report.

    I tested with a busy loop in bash like this:

    while true; do true; done

    This shows 100% in 'top' but shows 83% in 'htop' in the CPU% column.

    Then next, I experimented with spawning a bunch of sub-shells with those busy loops, like in this example:

    for x in {1..8}; do while true; do true; done& done; sleep 10; kill $(jobs -p)

    This example is for 8 processes. They run for ten seconds and then get killed.

    I repeated this starting with 1 process and up to 12 processes, and this is what 'htop' and 'top' report in their CPU% column for those processes:

    num, htop, top
    1, 83, 100
    2, 70, 100
    3, 60, 100
    4, 52, 100
    5, 42, 84
    6, 35, 70
    7, 30, 60
    8, 26, 53
    9, 23, 46
    10, 21, 42
    11, 19, 38
    12, 17, 35

    I have a quad-core CPU (and no SMT). The output of 'top' seems to be kind of right, but I have a hunch it's also off and always calculating a result that's double what's in 'htop'. It might just get clamped to 100%, and that makes the numbers up to 4 processes look good.
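
    (As a rough cross-check independent of top/htop, assuming a single busy-loop PID in $pid: fields 14 and 15 of /proc/<pid>/stat are utime and stime in clock ticks, so the delta over an interval gives the process's CPU%:)

    t1=$(awk '{print $14+$15}' "/proc/$pid/stat"); sleep 10
    t2=$(awk '{print $14+$15}' "/proc/$pid/stat")
    echo "CPU%: $(( 100 * (t2 - t1) / ( $(getconf CLK_TCK) * 10 ) ))"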

    Kernel is Linux 4.11.3 with 4.11-ck2 patches.

    Replies
    1. I did another experiment with this perl one-liner:

      perl -E 'use Time::HiRes qw/time sleep/; $t = 0.010; while (1) { $t0 = time; while ($t > time - $t0) {}; sleep $t; }'

      This is supposed to be in a busy loop for 10ms, then sleep for 10ms, and then this all repeats. It's supposed to show 50% in the CPU% column of 'top' and 'htop', and it behaves exactly like that with a 4.11.3 kernel using CFS.

      When changing that "$t" in the while loop to "$t/2", "$t/3", "$t*2", "$t*3", it's supposed to result in 33%, 25%, 66%, 75% CPU usage, and that's again what happens with CFS.

      Then going to the kernel using MuQSS, the displayed numbers are jumping around a lot, so I have to guess the average. With CFS, the percentage shown was quite stable. The numbers I see with MuQSS are like this:

      expected, htop, top
      25, 33, 35
      33, 43, 47
      50, 55, 69
      66, 66, 81
      75, 71, 91

      The numbers displayed in top/htop were changing by over 10% from second to second, so they are really just guesses. I wrote down min/max values that I saw and used the average between those two. This wasn't needed at all when testing with CFS where top and htop showed pretty stable numbers.

    2. The CPU accounting is done completely differently in MuQSS, but you're possibly also seeing sampling differences between 100Hz kernels and kernels built with different Hz settings.

  9. This is the first time I'm using MuQSS, and my workload is just focused on virtual machines and compilation tasks (QEMU/KVM).

    I've noticed the following:

    - Same performance as CFS, but probably snappier (just a feeling)
    - qemu/kvm processes are using much more CPU than before:

    These numbers (CPU utilization) were measured with htop while leaving the VMs idle.

    Before (CFS):

    Windows 7 Idle: 3-7% CPU
    Windows 10 Idle: 4-8% CPU
    Ubuntu 17.04 Idle: 1-2% CPU

    After (MuQSS-156):

    Windows 7 Idle: 18-38%
    Windows 10 Idle: 11-41%
    Ubuntu 17.04 Idle: 1-3%


    Is MuQSS affecting KVM performance somehow? I am not sure why the Windows CPU utilization is so high while Ubuntu's CPU utilization hasn't changed.

    @ck Do you have any fix for this?

    - Nick

    Replies
    1. The scheduler can't physically make the virtualised operating system use any more CPU, and what you are seeing is almost certainly just the sampling error differences between CFS and MuQSS. The CPU accounting is performed differently by the two schedulers.

    2. Ah, I see. Do you have any idea why this sampling error is affecting the Windows VM but not the Ubuntu VM?

      And most importantly, is it possible to fix this? Actually, I don't care about the high CPU reading, but the large difference between the Ubuntu and Windows VMs worries me somewhat.

    3. Windows' timers work differently to Linux's, and I'm guessing they happen to be landing at exactly the sampling points used in MuQSS. Fixing it is unlikely any time soon without knowing exactly what's causing it, and I'm afraid I don't have the spare time to dedicate to it.

  10. Hi, long-time BFS/ck patches user here.
    With this patchset my Gentoo box has 3 of my 4 cores always at 50% (atop, htop and top report the same).
    The 1-minute load average is below 1. Any idea why? Is this normal?

  11. https://docs.google.com/spreadsheets/d/14nHLMeJXOqxj-mlMk_vdb7yLhLOPRdNS4JkYMU0ArHI/edit?usp=sharing

    Just as I finish benchmarking, 4.11.9 comes out. I'm a very sad man.

    There seem to be some latency regressions compared with my last test, but I really haven't been keeping track of what the causes are. Colors aside, ck2 still relatively* meets more deadlines.

    Input latency with CFS on Optimus is still quite noticeable with primusrun, but no longer as much with PRIME (no sync); barely any difference with respect to MuQSS.

  12. Hi, I've been using the -ck patches for a long time. I was experiencing a freeze in Xorg (Firefox + GNOME) every 2-3 seconds with the default 100Hz tick even before MuQSS, so I had been manually setting it to 1000Hz. The first iterations were better, I think, but I'm still getting the freezes now with the default 100Hz. Even with 1000Hz it's still there, but much less than before, so I might not even notice it. Any idea where I can track down its source? CPU: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
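
    (To confirm which tick rate a given kernel was actually built with, assuming the config is exposed in one of the usual places:)

    zgrep 'CONFIG_HZ' /proc/config.gz           # needs CONFIG_IKCONFIG_PROC
    grep 'CONFIG_HZ' /boot/config-$(uname -r)   # common distro location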

  13. Hi, any plans for a 4.12 release?

  14. MuQSS increases my CPU usage when idling; in other words, it's not stable and goes from 3 to 50 percent. Without MuQSS the idle is at 0-1%. Any suggestions?

    Replies
    1. My suggestion is to ignore it; it's a sampling difference and it doesn't actually use more CPU.
