Comments on -ck hacking: BFS issues and linux 3.3
Feed updated 2024-02-09T16:24:46.087+11:00; 18 comments, newest first.

ck, 2012-03-23T00:11:19.663+11:00:
No that's fine, thanks! If you were using schedtool -D then those numbers are normal and you have not experienced the rare bug. Reboot and enjoy!

Anonymous, 2012-03-23T00:01:08.989+11:00:
I have just now compiled linux-3.3 with bfs-420 and want to reboot and test by recompiling chromium on my Gentoo using

PORTAGE_NICENESS=19
MAKEOPTS=" -j2 --quiet"
CFLAGS=" -march=native -O2 -pipe -Wno-unused "
CXXFLAGS="-march=native -O1 -pipe -Wno-unused "

in my /root/.bashrc
schedtool -D $$
I will comment this out for the next reboot!

Or do you want me to test 418 further?

Ralph Ulrich

ck, 2012-03-22T22:51:05.549+11:00:
Thanks Ralph. None of the 420 changes will go beyond possible fixes for that issue compared to 418 that you're already running. I see you have some at 42+ in your testing, but I need to confirm that you have not started anything SCHED_IDLEPRIO as well, as those numbers would be normal for that. The other thing is whether you experience a dramatic slowdown with everything else while it happens. Do you have that?

Anonymous, 2012-03-22T22:47:21.336+11:00:
I see there is a new BFS-420, but my experience was with bfs-418, so I am reporting it here. Last night I compiled chromium on my dual-core 64-bit Mac mini using nice, which resulted in this htop output:

 1734 ral      1  0  374M 19396 8360 S  0.0  0.5   1:34.62 ├─ /usr/bin/yakuake
 4599 ral      1  0 24132  2416 1864 S  0.0  0.1   0:00.34 │ ├─ /bin/bash
10703 root     1  0 43372  1416 1072 S  0.0  0.0   0:00.01 │ │ └─ su -
11165 root    42  0 24928  3356 1964 S  0.0  0.1   0:00.15 │ │ └─ -su
26710 root    47  0 24248  2488 1436 R  4.0  0.1   0:01.50 │ │ └─ htop
 4586 ral      1  0 23960  2220 1752 S  0.0  0.1   0:00.00 │ ├─ /bin/bash
 4612 root     1  0 43368  1412 1068 S  0.0  0.0   0:00.01 │ │ └─ su -
 4626 root    42  0 24928  3360 1928 S  0.0  0.1   0:00.73 │ │ └─ -su
30480 root    42 19  249M  146M 3760 S  0.0  4.0   0:37.86 │ │ └─ /usr/bin/python3 /usr/bin/emerge -auvDN wor
11059 portage 42 19  4216   608  472 S  0.0  0.0   0:00.03 │ │ └─ [www-client/chromium-17.0.963.83] sandbo
11077 portage 42 19 28892  4784 1704 S  0.0  0.1   0:00.07 │ │ └─ /bin/bash /usr/lib64/portage/bin/ebui
11122 portage 42 19 29036  4104  944 S  0.0  0.1   0:00.00 │ │ └─ /bin/bash /usr/lib64/portage/bin/e
11138 portage 42 19 26532  2376 1652 S  0.0  0.1   0:00.00 │ │ └─ /bin/bash /usr/lib64/portage/bi
11140 portage 42 19  404M  387M 1012 S  0.0 10.5   0:31.78 │ │ └─ make -j2 --quiet chrome chro
29272 portage 42 19 12680  1368  932 S  0.0  0.0   0:00.00 │ │ └─ /usr/x86_64-pc-linux-gnu/
29274 portage 42 19 21380  6000 1232 S  0.0  0.2 184467440│ │ ├─ /usr/lib/gcc/x86_64-pc
29273 portage 80 19  170M  141M 4920 R 81.0  3.8   0:01.79 │ │ └─ /usr/libexec/gcc/x86_6

I hope you can copy-paste and see. Perhaps it is easier to see in these percentage numbers, which are much too high for syslog (the first number):

41263268 21114581:29 Ss syslog-ng
73.0 0:00 RN+ cc1plus
77.5 0:01 RN+ cc1plus
7.0 52:09 Ss+ X
3.1 23:25 Sl kwin
1.7 13:16 Sl firefox
0.8 0:37 SN+ emerge
0.7 0:31 SN+ make

Ralph Ulrich

ck, 2012-03-22T09:50:57.393+11:00:
Thanks for those as usual. Note that you can add links to your comments with ordinary html by linking anova.png (http://s14.postimage.org/s3vhdkxlt/anova.png)

I'll be posting a release candidate shortly for BFS with a few minor code changes as well. I notice how easy it is to type cfq when you meant cfs too ;)

graysky, 2012-03-22T05:23:05.011+11:00:
CK - I ran the standard make benchmark on 3.2.11/3.2.11+bfs v0.416 and 3.3.0/3.3.0+bfs v0.418, running it a total of 15 times. As expected, there is a statistically significant difference between both pairs of kernels (cfq/bfs under 3.2.11 and cfq/bfs under 3.3.0), with the bfs patched kernels out-performing the cfq kernels.

Kudos as usual!

The 'make benchmark' is compiling linux 3.3.0 via 'make -j16 bzImage' and timing the result, then repeating it 14 times. This is done via a bash script. Why 16 threads? I tested this on a dual quad system (with HT enabled).

.post h3 {
background:url(http://s14.postimage.org/s3vhdkxlt/anova.png) no-repeat;
margin:.25em 0 0;
padding:0 0 4px 30px;
font-family:trebuchet ms;
font-size:140%;
font-weight:normal;
line-height:1.4em;
color:$titlecolor;
}

Here is the ANOVA plot of the data: http://s14.postimage.org/s3vhdkxlt/anova.png

Here is the data table for those who want to analyze it on their own: http://pastebin.com/7W0HADvH

vojta, 2012-03-21T20:38:33.672+11:00:
I have just tried compiling the kernel and KDE 4.8.1 on my Intel Core i7-2670QM with nice -n19 make -j8 on 3.3 with BFS 418. It looks like I am not affected by this bug either. I will try to compile something bigger as soon as possible and see if it works fine.

ck, 2012-03-20T23:28:25.227+11:00:
Thanks for trying. This is why I was appealing for help here, because I can't reproduce it either and I haven't found what specific circumstances create the problem. Make -j4 niced to 19 is a very very very common workload on my quad core and I don't see it.

tux9656, 2012-03-20T23:23:10.803+11:00:
I just tried testing for the bug with 3.2-ck1 by compiling a kernel with "make -j4" niced to 19 on a quad cpu/core VM. The highest value I saw for priority was 41. It seems I am unable to induce the behavior of this bug.

ck, 2012-03-20T20:32:31.718+11:00:
Okay, fixed the CPU accounting and bumped the version up to 418. There's an incremental called 417 stuff in the test/ directory if anyone just wants the fix.
http://ck.kolivas.org/patches/bfs/test/bfs417-stuff.patch

ck, 2012-03-20T19:47:52.538+11:00:
Thanks. Fixed. There is a known bug to do with displayed accounting but that shouldn't affect its behaviour.

Alfred Chen, 2012-03-20T19:02:29.846+11:00:
ck. Please fix the patch link. You added the blogger website link before the real link. :D

ck, 2012-03-20T17:49:18.131+11:00:
Looks like I overestimated how much work it would be. Putting a link in the post for a preview patch.

ck, 2012-03-20T15:57:01.640+11:00:
Furthermore, 3.3 appears to have one of the biggest CPU scheduler code churns in a long time, meaning it will involve quite some work.

ck, 2012-03-20T11:33:38.173+11:00:
I haven't done the port to 3.3 yet because all my BFS time is being consumed with tracking down this bug instead.

Oleksandr Natalenko, 2012-03-20T11:32:36.716+11:00:
Could you release a BFS beta for 3.3 to give it a test? I don't have the problem you mentioned.

ck, 2012-03-20T11:27:34.127+11:00:
That would be much appreciated. I'm guessing it happened somewhere in the transition from late BFS 3xx to 416 so you wouldn't have to go back very far. I'm expecting it to be in the early BFS 400s, guessing linux 2.6.39ish. Judging by the symptom, I don't think it would matter whether it's hyperthread or not, but it might need to at least be SMP, dual core or dual thread or more. I'm led to believe it shows up after some time, so perhaps there's some accumulated counter that needs to go off before it becomes a problem. The giveaway would be a PRI > 41 for a non SCHED_IDLEPRIO task, and it exhibits this only in the setting of a niced load, where nice 19 has the most dramatic effect. It should be the same within a VM as a real machine.

tux9656, 2012-03-20T11:19:20.181+11:00:
I might be able to do some regression testing. What exactly would I be looking for? Higher values in the PR column? Would a virtual machine work to test this or could that create false positives? Would older hardware be more likely to expose this bug? My main machine is quite new (socket FM1) so I don't think that I would be able to run older kernels too far back, but I do have a PowerPC G3 Mac that should easily be able to run anything from 2.6.0 and up. What is the earliest release you suspect would exhibit this bug? Could the bug be specific to compiling with a particular version of gcc? Did the bug show up for you on a machine with hyperthreading?
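
Editor's sketch: the giveaway ck describes in this thread (a PRI above 41 on a task that is not SCHED_IDLEPRIO) could be checked mechanically instead of by watching htop. The Python below is only an illustration under stated assumptions, not anything posted by the commenters: it assumes the PRI column corresponds to field 18 of /proc/<pid>/stat, that the scheduling policy is field 41, and that SCHED_IDLEPRIO uses policy number 5 on a BFS kernel. The function names are hypothetical.

```python
import os

PRI_LIMIT = 41       # from the thread: PRI > 41 is the giveaway
SCHED_IDLEPRIO = 5   # assumption: BFS reuses the SCHED_IDLE policy number


def parse_stat(text):
    """Extract pid, comm, priority, nice and policy from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = tail.split()  # fields[0] is stat field 3 (state)
    return {
        "pid": int(pid),
        "comm": comm,
        "pri": int(fields[15]),     # stat field 18: priority
        "nice": int(fields[16]),    # stat field 19: nice
        "policy": int(fields[38]),  # stat field 41: scheduling policy
    }


def is_suspect(task, limit=PRI_LIMIT, idle_policy=SCHED_IDLEPRIO):
    """True when PRI exceeds the limit and the task is not idle-priority."""
    return task["pri"] > limit and task["policy"] != idle_policy


def scan_proc(proc="/proc"):
    """Return the tasks under `proc` that match the symptom."""
    suspects = []
    if not os.path.isdir(proc):
        return suspects
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                task = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited meanwhile, or the line had an unexpected shape
        if is_suspect(task):
            suspects.append(task)
    return suspects
```

Calling scan_proc() periodically during a niced build and printing any returned tasks would be one way to use it. As ck notes above, tasks started under schedtool -D are expected to show values above 41, which is why the policy check is part of the test.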