tag:blogger.com,1999:blog-6469704299235308349.post1082901681163482039..comments2024-02-09T16:24:46.087+11:00Comments on -ck hacking: BFS 0.400ckhttp://www.blogger.com/profile/02904761195451530213noreply@blogger.comBlogger52125tag:blogger.com,1999:blog-6469704299235308349.post-65341724867374314922011-04-20T01:36:46.889+10:002011-04-20T01:36:46.889+10:00Oh, sorry, I've missed that.
I guess then, a b...Oh, sorry, I've missed that.<br />I guess then, a better strategy would be splitting the list_for_each_entry loop into two loops: one for the idx < RT and the rest.<br />(I don't really know if that will be beneficial at all, that was just something small that bugged me & I thought to share).<br /><br />btw,<br />Thanks again for your wonderful work on BFS !Igalnoreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-26639529511190557212011-04-19T21:10:01.538+10:002011-04-19T21:10:01.538+10:00Thanks, but that breaks affinity so I can't us...Thanks, but that breaks affinity so I can't use it.ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-44119275347898849182011-04-19T21:05:18.389+10:002011-04-19T21:05:18.389+10:00Hi, as long as we are pasting patches here,
This o...Hi, as long as we are pasting patches here,<br />This one might help: <br /><br />--- linux-2.6.35.12-ck/kernel/sched_bfs.c 2011-04-19 11:47:07.666203441 +0300<br />+++ linux-2.6.35/kernel/sched_bfs.c 2011-04-19 11:51:54.250203441 +0300<br />@@ -2754,15 +2754,16 @@<br /> if (idx >= PRIO_LIMIT)<br /> goto out;<br /> queue = grq.queue + idx;<br />+ if (idx < MAX_RT_PRIO) {<br />+ /* We found an rt task */<br />+ edt = list_first_entry(queue, struct task_struct, run_list);<br />+ goto out_take;<br />+ }<br />+<br /> list_for_each_entry(p, queue, run_list) {<br /> /* Make sure cpu affinity is ok */<br /> if (needs_other_cpu(p, cpu))<br /> continue;<br />- if (idx < MAX_RT_PRIO) {<br />- /* We found an rt task */<br />- edt = p;<br />- goto out_take;<br />- }<br /> <br /> /*<br /> * Soft affinity happens here by not scheduling a task withIgal Shilmanhttps://www.blogger.com/profile/09060864716595936467noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-51651267391959190092011-04-18T21:26:42.108+10:002011-04-18T21:26:42.108+10:00I do agree to enable dynamic tick (tickless) to sa...I do agree to enable dynamic tick (tickless) to save more power. I think ordinary desktop user (office apps, internet, movies) could benefit from it. Since office apps and internet doesn't use much cpu power. 
While internet streaming via flash and playing movie can use GPU instead of CPU.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-71384792186456834832011-04-18T11:21:12.169+10:002011-04-18T11:21:12.169+10:00Yes, 1000Hz absolutely is recommended for anything...Yes, 1000Hz absolutely is recommended for anything that is used as a desktop, no matter how many CPUs you have.ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-65467515372697852412011-04-18T11:18:17.602+10:002011-04-18T11:18:17.602+10:00Short question ck -- CONFIG_HZ_1000 is recommended...Short question ck -- CONFIG_HZ_1000 is recommended, is CONFIG_HZ_250 preferable for a quad-core? I've asked about multi-core processor behavior wrt hertz before but was told by #kernel that they didn't know. /headscratch<br /><br />Thanks for your efforts and code, as always.Ranguvarhttps://www.blogger.com/profile/10394067304455247206noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-86137643236057244922011-04-16T23:56:38.067+10:002011-04-16T23:56:38.067+10:00--- linux-2.6.35.orig/kernel/sched_bfs.c 2010-08-3...--- linux-2.6.35.orig/kernel/sched_bfs.c 2010-08-31 16:32:43.859976211 +0200<br />+++ linux-2.6.35/kernel/sched_bfs.c 2010-08-31 16:33:53.943976073 +0200<br />@@ -129,7 +129,7 @@ int sched_iso_cpu __read_mostly = 70;<br /> * The quota handed out to tasks of all priority levels when refilling their<br /> * time_slice.<br /> */<br />-static inline unsigned long timeslice(void)<br />+static inline int timeslice(void)<br /> {<br /> return MS_TO_US(rr_interval);<br /> }_sid_noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-62325264227783799622011-04-16T20:13:16.896+10:002011-04-16T20:13:16.896+10:00Hi Con!
I'm on a Phenom II X4 CPU and I can co...Hi Con!<br />I'm on a Phenom II X4 CPU and I can confirm that 0.400 brings substantial improvements with scaling governors (I'm using conservative).<br />I can also confirm what Ralph Ulrich said, 0.400 seems way smoother than 0.376 to me. I'm on 2.6.38.3 and the patch applies cleanly.Neo2http://www.faskatech.netnoreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-12513260852007232432011-04-16T15:08:14.638+10:002011-04-16T15:08:14.638+10:00Well spotted _sid_ , thanks!. (To everyone wonderi...Well spotted _sid_ , thanks!. (To everyone wondering about this, it's an optimisation, for an unnecessary test, not a bugfix, but it's worth doing).ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-49635728570681677372011-04-16T11:25:37.225+10:002011-04-16T11:25:37.225+10:00--- linux-2.6.38.orig/kernel/sched_bfs.c 2011-04-1...--- linux-2.6.38.orig/kernel/sched_bfs.c 2011-04-15 00:42:27.000000000 +0200<br />+++ linux-2.6.38/kernel/sched_bfs.c 2011-04-15 00:45:15.521436353 +0200<br />@@ -1379,8 +1379,8 @@ static void try_preempt(struct task_stru<br /> if (rq_prio < highest_prio)<br /> continue;<br /> <br />- if (rq_prio > highest_prio || (rq_prio == highest_prio &&<br />- deadline_after(rq->rq_deadline, latest_deadline))) {<br />+ if (rq_prio > highest_prio ||<br />+ deadline_after(rq->rq_deadline, latest_deadline)) {<br /> latest_deadline = rq->rq_deadline;<br /> highest_prio = rq_prio;<br /> highest_prio_rq = rq;_sid_noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-67923985611251641992011-04-16T07:50:54.329+10:002011-04-16T07:50:54.329+10:00yeah, turbo mode is only on the core i7 i5, im not...yeah, turbo mode is only on the core i7 i5, im not sure about i3, but only on the latest generation of intel, and I know amd has its own turbo in its latest generation, but I don't know anything about 
itchronniffhttps://www.blogger.com/profile/16926752650876677664noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-658865554751393732011-04-15T23:29:06.578+10:002011-04-15T23:29:06.578+10:00Core2 did not have turbo mode. Dynticks will save ...Core2 did not have turbo mode. Dynticks will save you more power (when idle) but is unlikely to have any performance advantage.ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-56820583974727104572011-04-15T17:52:56.015+10:002011-04-15T17:52:56.015+10:00chronniff, what is the keyword of that cpu capabil...chronniff, what is the keyword of that cpu capability turbo if you cat /proc/cpuinfo ?<br /><br />Are these dyntick effects probably also for<br />Intel core2 duo 2009 (I think without turbo) ?Ralph Ulrichnoreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-58683039643394217382011-04-15T11:32:09.875+10:002011-04-15T11:32:09.875+10:00haha, my approach may not have been scientific but...haha, my approach may not have been scientific but I did run the test a second time on each kernel to make sure I was getting consistent data, and each time it was nearly identical. 
Yeah, I think you were right when you said that it lets the threads that aren't in use stay more idle, since they don't have to wake up as much. If you look at Intel's in-depth explanation of turbo mode, when all threads are active it will overclock all of the cores up to a certain frequency (I think mine is 3.34 GHz), but if a single thread is under load while the others stay under a certain frequency, that thread will be able to overclock even further (I think mine is 3.46 GHz).chronniffhttps://www.blogger.com/profile/16926752650876677664noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-6873739663296959082011-04-15T10:04:31.249+10:002011-04-15T10:04:31.249+10:00@Ralph: There is overhead to dynticks, but if your...@Ralph: There is overhead to dynticks, but if your machine spends any time idle, there are advantages to it as well. On balance, it's probably better to enable it, so I'll be changing the FAQ to suit.ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-18062261059334263382011-04-15T00:51:50.357+10:002011-04-15T00:51:50.357+10:00chronniff has better performance with
dynamic tick...chronniff has better performance with<br />dynamic ticks on<br />obviously!<br />But I don't understand one thing:<br />Shouldn't dynamic ticks have some overhead?Ralph Ulrichnoreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-2748402200162347942011-04-14T11:28:45.167+10:002011-04-14T11:28:45.167+10:00Looks pretty convincing, thanks! Of course in the ...Looks pretty convincing, thanks! Of course in the ideal world you'd conduct multiple benchmarks, average them out, check standard deviation, do a statistical test and blah blah blah, but sometimes things are brutally obvious, like this.ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-14312666798981319822011-04-14T11:28:38.010+10:002011-04-14T11:28:38.010+10:00oh, and I forgot to mention that the cpu is a qu...oh, and I forgot to mention that the CPU is a quad-core with both Hyper-Threading and turbo mode enabledchronniffhttps://www.blogger.com/profile/16926752650876677664noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-40909915904993392932011-04-14T11:20:20.737+10:002011-04-14T11:20:20.737+10:00here is the output for the kernel build test, I bo...here is the output for the kernel build test, I booted into recovery mode in Ubuntu, which I believe is supposed to emulate runlevel 1 since Ubuntu uses upstart. 
<br />cpuinfo (first couple lines):<br /><br />vendor_id : GenuineIntel<br />cpu family : 6<br />model : 26<br />model name : Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz<br />stepping : 5<br />cpu MHz : 3201.000<br />cache size : 8192 KB<br /><br />results:<br /><br />dynamic ticks off:<br /><br />real 1m26.390s<br />user 1m17.076s<br />sys 0m8.056s<br /><br />dynamic ticks on (tickless):<br /><br />real 1m23.710s<br />user 1m13.180s<br />sys 0m9.012schronniffhttps://www.blogger.com/profile/16926752650876677664noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-7391548863519867112011-04-13T21:02:40.278+10:002011-04-13T21:02:40.278+10:00The overhead of dynticks is why I disabled it and ...The overhead of dynticks is why I disabled it and at the same time reduced to Hz_300 for better throughput of my MacMini-2009.<br /><br />As the discussions make clear there are a lot of possibilities to tune and to get an optimized configuration at the end for your very particular machine ...Ralph Ulrichnoreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-13299017913668539052011-04-13T16:02:10.484+10:002011-04-13T16:02:10.484+10:00Thanks chronniff. While it may "appear" ...Thanks chronniff. While it may "appear" to use threads less, that's simply the way the work will be spread out to the cores. Instead of jumping to any free core, it will try to keep the same running thread on the same core. So if your load is only 3 out of 8 threads, the same 3 cores should appear loaded. That's actually an improvement. As for the benchmark, I need something that's actually single threaded to be run on that machine with and without dynticks if possible. The good old kernel build is simple enough. Boot into init 1 (if possible) and go into a clean kernel directory and do:<br />make clean && make allnoconfig && make -j8 && make clean && time make<br />with and without dynticks enabled and report the results of the time output please. 
The make -j8 is just to cache all the files in RAM, but the 'make' without jobs will run mostly single threaded, which should invoke turbo mode.<br /><br />Thanks!ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-90320232094853475052011-04-13T15:53:47.597+10:002011-04-13T15:53:47.597+10:00I would be more than happy to benchmark the tickle...I would be more than happy to benchmark the tickless and non-tickless kernels with bfs400 on my i7 960 quad with HT and Turbo mode (v1). Just let me know what test to use; should I just time an x264 encoding? I should mention that, despite what I was seeing the other day with i7z, the system does in fact seem to 'feel' a bit snappier with dynamic ticks turned off.<br /><br />The Hyper-Threading thing I mentioned is that on previous bfs versions load was always balanced almost perfectly evenly over all 8 logical cores on my system. But this new release seems to keep all the load on the first four cores (especially the first two), while the remaining logical cores are almost always left idle. After reading your post more thoroughly, I'm guessing this is due to trying to overload one CPU while keeping the other threads relatively idle, so that the one under load can take advantage of the turbo mode overclocking most effectively, since it can hit even higher frequencies if it is just one thread overclocking at a time. The CPU, however, behaves the same as always if I use make -j8 or anything putting the CPU under full load.<br /><br />(oh yeah, and the fglrx bug in dmesg was just because I had preemptive kernel debugging enabled in the kernel by accident)<br /><br />After a day of real-world everyday use I certainly notice the increase in my system's responsiveness. 
Thanks again for your hard work, and congrats on yet another significant improvementchronniffhttps://www.blogger.com/profile/16926752650876677664noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-21613433814292080912011-04-13T10:09:44.266+10:002011-04-13T10:09:44.266+10:00@graysky: Dynamic ticks add overhead. The main pu...@graysky: Dynamic ticks add overhead. The main purpose of dynticks is to reduce power consumption on mobile devices, and it doesn't reduce power by any significant amount on an ordinary desktop, so the added overhead and latency of using it normally isn't worth it. As discussed earlier, though, if you have a turbo enabled CPU, it *might* help it to go turbo more. As for the timer frequency, no, I mean 1000 absolutely, regardless of the number of CPUs.<br /><br />@vojta: That's very interesting indeed. I don't have a HT CPU on my desktop, but I'm sure the bugs to do with nvidia slowdown are related.<br /><br />@anonymous: CPU frequency scaling is not reliably fast to speed up on desktop CPUs (as opposed to mobile CPUs), and desktop CPUs don't throttle very much speed-wise, and the power savings are minuscule compared to the intrinsic ability of the CPU to idle its unused parts. This is all on a desktop CPU. Mobile CPUs are very different. However, nothing is absolute. The newer CPUs are much better in this regard (last year or so).ckhttps://www.blogger.com/profile/02904761195451530213noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-48399939681135414162011-04-13T06:51:04.316+10:002011-04-13T06:51:04.316+10:00Hey,
I was trying to figure out what's wrong ...Hey,<br /><br />I was trying to figure out what's wrong with the slowed-down 2D performance on nvidia cards. bootopper suggested (http://code.google.com/p/chromium/issues/detail?id=71276) disabling and then re-enabling the HT cores on the CPU, and it really helped, at least for me. :)<br /><br />I'm posting this just for your information. :)vojtahttps://www.blogger.com/profile/10363081974754584260noreply@blogger.comtag:blogger.com,1999:blog-6469704299235308349.post-21684792976277672702011-04-13T06:06:07.237+10:002011-04-13T06:06:07.237+10:00CK - two questions:
1) Why disable dynamic ticks?...CK - two questions:<br /><br />1) Why disable dynamic ticks?<br />2) According to the help item for Timer frequency, "the timer interrupt occurs on each processor in an SMP environment leading to NR_CPUS * Hz number of timer interrupts per second."<br /><br />As I read that, my quad core chip would have a timer frequency 4x higher than what I select. You recommend a setting of 1000 Hz for desktops. Does this mean I need to select 300 Hz for my case (4x300=1,200 Hz)?grayskyhttps://wiki.archlinux.org/index.php/Kernel26-cknoreply@blogger.com
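For anyone following graysky's second question, here is a rough sketch of the arithmetic, taking the help text's "NR_CPUS * Hz" formula at face value. The function names are made up for illustration and are not kernel code. The point it tries to show is that the machine-wide interrupt count grows with the number of CPUs, while the tick period each CPU sees depends only on the Hz setting, which fits ck's reply that 1000 is recommended regardless of the number of CPUs.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the kernel): total timer interrupts per
 * second across the whole machine, following the Kconfig help text's
 * "NR_CPUS * Hz" formula.
 */
unsigned long total_timer_irqs_per_sec(unsigned int nr_cpus, unsigned int hz)
{
	assert(nr_cpus > 0 && hz > 0);
	return (unsigned long)nr_cpus * hz;
}

/*
 * Hypothetical helper: tick period seen by each individual CPU, in
 * microseconds (integer division). It depends only on hz, not on how
 * many CPUs there are.
 */
unsigned long tick_period_us(unsigned int hz)
{
	assert(hz > 0);
	return 1000000UL / hz;
}
```

With a quad-core at 1000 Hz this gives 4000 timer interrupts per second machine-wide, but each core still ticks every 1000 microseconds. Dropping to 300 Hz would bring the total down to 1200, yet each core would then only tick about every 3333 microseconds, so the total is not the number to match against the 1000 Hz recommendation.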