Comments on -ck hacking: BFS 0.449
Feed updated 2024-02-09T16:24:46.087+11:00

2014-08-04T00:02:44.759+10:00
So I switched my minipci WLAN card again. Hopefully it's the last one, because doing this on a Dell Vostro 3750 is a nightmare. In other words, I can no longer test the hibernate bug with the ath9k wifi card and BFS.

Btw., I'm now using an Intel 7260 wifi card with bluetooth: good/excellent performance on 2.4 GHz, and even in the 5 GHz range it's great, and this with only 2 antennas. (I wasn't aware that this works at all.)
And hibernate/suspend does work with BFS ;)

PS: But remember, there are different models with different specs on the market!

Regards sysitos
Author: Mike

2014-07-28T18:25:42.951+10:00
I was having lockups too with the latest kernel + ck, this fixed it for me

https://bugzilla.kernel.org/attachment.cgi?id=142231

probably unrelated but might help someone else.
Author: Anonymous

2014-07-27T16:15:36.245+10:00
Thanks. Looks convincing. I might run with it without even bothering with the check at all then.
Author: ck

2014-07-25T15:53:12.050+10:00
Alfred, plz read my detailed bug report here:

http://lists.tuxonice.net/pipermail/tuxonice-devel/2014-June/007500.html (several emails there for June and July).

I've reported that unloading/modprobing the module raises the same issue, in an even more controlled manner.
Author: Oleksandr Natalenko

2014-07-25T13:21:30.818+10:00
Sorry, the detailed steps should be:
1. unload modules before suspend.
2. load modules after resume.
3. deal with upper-layer services/applications.
Author: Alfred Chen

2014-07-25T13:18:36.310+10:00
Guys, would you like to try whether this kind of workaround works for you? It's not a solution anyway.

When I played with an eee-pc years ago, the wifi driver didn't work after resume from suspend. So I had to unload the wifi driver module (and maybe the iw module) and modprobe it back after system resume.

I think you need to play with plugin scripts in /etc/pm/sleep.d; you also need to restart upper-layer services/applications to get it to work.

Good luck.
Author: Alfred Chen

2014-07-25T13:09:55.210+10:00
Let's explore it in a little detail.
a. Expand the task_rq() macro: task_rq(p) => cpu_rq(task_cpu(p)), task_cpu(p) => task_thread_info(p)->cpu, task_thread_info(task) => ((struct thread_info *)(task)->stack); cpu_rq() is the final step, and it is a per_cpu variable, which is basically var_address + per_cpu_offset. So rq is looked up by the cpu index in the task's thread info.
b. If we consider all the possibilities where we call task_rq() without any lock or irq, it turns out it may crash the kernel, or rq could be any value rather than zero. Checking for a NULL "dereference" is not enough for all cases here. I have done a little test to show the return values of cpu_rq(0~19) while my machine has just 2 CPUs.
[ 14.903959] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 21.450042] cpu_rq(0) ffff88013fc12880.
[ 21.450049] cpu_rq(1) ffff88013fd12880.
[ 21.450055] cpu_rq(2) 0000000000012880.
[ 21.450057] cpu_rq(3) 0000000000012880.
[ 21.450061] cpu_rq(4) 0000ffff00012880.
[ 21.450063] cpu_rq(5) 0000000000012880.
[ 21.450066] cpu_rq(6) 0000000000012880.
[ 21.450068] cpu_rq(7) 0000000000012880.
[ 21.450071] cpu_rq(8) ffffffff817c858b.
[ 21.450074] cpu_rq(9) 0000000000012880.
[ 21.450077] cpu_rq(10) 0000000000012880.
[ 21.450080] cpu_rq(11) 0000000000012880.
[ 21.450083] cpu_rq(12) 0000000000012880.
[ 21.450086] cpu_rq(13) ffffffff8103fc50.
[ 21.450090] cpu_rq(14) ffffffff8103fbd0.
[ 21.450092] cpu_rq(15) 0000000000012880.
[ 21.450096] cpu_rq(16) ffffffff8103fc10.
[ 21.450099] cpu_rq(17) 0000000000012880.
[ 21.450102] cpu_rq(18) 0000000000012880.
[ 21.450105] cpu_rq(19) 0000000000012880.
c. There is no such check in mainline wait_task_inactive() either.
If BFS causes an issue here that mainline does not, and this check is only for a specific arch, it should be wrapped in an arch macro.
Author: Alfred Chen

2014-07-25T10:11:51.233+10:00
@pf No difference with ps_enable=1; I always tested with this switch, because for me it seems to give somewhat better connection stability. Same experience as ooo.
Author: Mike

2014-07-25T07:32:42.016+10:00
@pf, okay, that's what I did except I didn't enable power_save with iw.
That didn't seem to make any difference though: still a lockup after the second suspend.
Author: ooo

2014-07-25T06:20:03.581+10:00
@ooo, I've done this:

modprobe ath9k ps_enable=1
iw dev wlp1s0 set power_save on
Author: Oleksandr Natalenko

2014-07-25T05:05:27.365+10:00
@pf, how did you enable the powersaving? With the ps_enable=1 option?
I tried that but my system still locks up after the second suspend.

However, this time my system actually worked a few seconds longer after resume than usual, and only locked up when wireless started connecting, which apparently takes a bit longer with ps_enable=1.
Author: ooo

2014-07-25T03:52:19.220+10:00
You'd laugh, Con, but… the issue is fixed by turning on ath9k powersaving (default is off).

I don't know what to say ;). All I know about it is that ath9k powersaving has been implemented using timers, and they are probably related to the CPU scheduler locking issue as well.

Anyway, I'm not 100% sure, but my machine survived two hibernation cycles.
Will test more.
Author: Oleksandr Natalenko

2014-07-24T13:54:04.066+10:00
Then how are you going to know if there's a codepath during the unlocked IRQs context switch on other architectures that doesn't dereference rq briefly enough for it to be NULL, which is all that is testing for?
Author: ck

2014-07-24T13:46:06.918+10:00
I am just talking about #2 [BFS] Add WARN_ON_ONCE if rq is dereferenced in wait_task_inactive() in my second reply.
Author: Alfred Chen

2014-07-24T07:40:26.528+10:00
Thanks guys. @PF I've been staring at the lock debugging photos you took as they seem the key to finding this issue though I'm still stumped as to what it is about that particular module that makes it happen.
I'm still looking whenever I have time and we'll nail this sooner or later.
Author: ck

2014-07-24T07:13:10.808+10:00
Unfortunately, this patch doesn't fix the issue for me either.

Nevertheless, I've found out that ksoftirqd hangs just here:

http://lxr.free-electrons.com/source/kernel/softirq.c#L663

The __cond_resched() call from within run_ksoftirqd() eats lots of CPU time when the system hangs.
Author: Oleksandr Natalenko

2014-07-24T06:27:31.283+10:00
Hi Con,
I tested only with the new patch. For me it seems to be better, but the CPU hang still remains (for suspend and hibernate). But now the WLAN does reconnect after hibernating, which was not the case without it. Dmesg shows multiple times that CPU0 hung for min. 23 sec; top shows a kernel thread eating the CPU ;) and goes to values of 20 and more. The system isn't usable anymore.
But it's good to see that you are working on it. Thanks again.

Regards sysitos
Author: Mike

2014-07-24T03:52:45.296+10:00
I tested this against 3.15.5 and the sisrevert patch (nothing else from pending/ or test/). However, the suspend issue remains with ath9k.

There seems to be some change though: previously I could usually suspend and resume once without any issues, and the system would freeze only after the second resume.

With the sched_affinity_locks patch, 3 out of 5 times my system froze right after the first suspend/resume. The other two times my system froze after the second resume.

So maybe there is some change for the worse, or it could just be coincidence.

Anyway, thanks again for your continuing efforts to solve this issue :)
Author: ooo

2014-07-24T01:42:19.250+10:00
Hi Con,

for testing the ath9k suspend bug, which patch should I use on top of the BFS 3.15 patch? Only this one, or both from /testing, with or without all the patches from /pending?

Thanks.
Regards sysitos
Author: Mike

2014-07-23T23:32:16.819+10:00
Here's another test patch; there is a small chance it might help with this issue:
bfs449-sched_affinity_locks.patch (http://ck.kolivas.org/patches/bfs/3.0/3.15/test/bfs449-sched_affinity_locks.patch)

Note to others: unless patches are in the pending/ directory, I'm not planning on including them. These test patches are just experiments for this particular issue.
Author: ck

2014-07-23T18:02:54.181+10:00
What's the percpu macro in reference to? Do you mean the optimisation you removed in your first set of patches? Checking to see if SMT siblings exist is a CPU-architecture-dependent action and not one based on hard code. Running an SMT kernel on an AMD CPU will only find no siblings once it is running.
Author: ck

2014-07-23T17:56:32.206+10:00
@ck
Thanks for the review.
1. I am not adding the extra function, mainly just syncing up the function name changes like ttwu_post_activation -> ttwu_do_wakeup. As for the flags, I will check if they are used by other functions and try to clean them up.
2. After expanding the percpu macro, it is basically var_address + per_cpu_offset; despite the different ways of determining per_cpu_offset for different archs, it is very unlikely that it returns a zero pointer.

I have 2 more minor changes these two weeks and will post them after testing.
Author: Alfred Chen

2014-07-20T17:11:48.742+10:00
I am just testing these patches:
bfs449-fix_get_nohz_tt.patch
bfs449-fix_idle_warn.patch
bfs449-fixnr_cpu_ids.patch
bfs449-pstate_scaling.patch
bfs449-remove_wlist.patch
bfs449-sisrevert.patch
bfs449-sis-test.patch

Suspend to RAM is OK; hibernate I don't use and have disabled in .config.
My system, as ever: MacMini Core2Duo

Thanks from a happy user from hot summer Hamburg, Germany
Ralph
Author: Anonymous

2014-07-16T15:48:40.248+10:00
For me the issue happens after TOI usage, but as you can see other ppl face it even without TOI.
Author: Oleksandr Natalenko

2014-07-16T09:08:26.509+10:00
Hi Con. No, I don't use Tux On Ice, so BFS + ath9k isn't okay. Tested it with TOI too, but same hang problem.

regards sysitos
Author: Mike
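
For anyone who wants to try the workaround Alfred Chen describes in this thread (unload the wifi module before suspend, load it again after resume), here is a rough sketch of what a pm-utils hook in /etc/pm/sleep.d might look like. It is only an illustration, not something posted by anyone above: the file name, the `wifi_hook` function and the `WIFI_MODULE`/`MODPROBE` variables are made up for this example, and ath9k is assumed only because it is the module discussed here. pm-utils passes the action (suspend, hibernate, resume or thaw) as the first argument.

```shell
#!/bin/sh
# Hypothetical hook, e.g. saved as /etc/pm/sleep.d/50-wifi-reload
# (the file name is an assumption, not taken from the thread).

WIFI_MODULE="${WIFI_MODULE:-ath9k}"   # module discussed in this thread
MODPROBE="${MODPROBE:-modprobe}"      # can be overridden for a dry run

wifi_hook() {
    case "$1" in
        suspend|hibernate)
            # Step 1: unload the driver before the system goes down.
            "$MODPROBE" -r "$WIFI_MODULE"
            ;;
        resume|thaw)
            # Step 2: load the driver again once the system is back.
            "$MODPROBE" "$WIFI_MODULE"
            # Step 3: restart upper-layer services here if they need it.
            ;;
        *)
            # Unknown or missing action: do nothing.
            ;;
    esac
}

wifi_hook "${1:-}"
```

The hook has to be executable to be picked up. As said in the thread, this is a workaround and not a fix, and whether it avoids the lockup with BFS is untested here.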