Monday 25 October 2010

Minor BFS 357 Bug(?) on 2.6.36

I received a bug report from someone running BFS on 2.6.36. They hit this BUG_ON in the new code present only in 2.6.36:

+static void try_to_wake_up_local(struct task_struct *p)
+{
+	struct rq *rq = task_rq(p);
+	bool success = false;
+
+	BUG_ON(rq != this_rq());

This looks fairly straightforward, and this code path is only used by the new worker code in 2.6.36. However, the BUG_ON shouldn't be hit unless something else is wrongly calling this function (indirectly, via schedule()). Anyway, it seems they hit it via the iwlwifi code. I have no idea how, but it's actually harmless to wake up a task on another runqueue in BFS, so simply removing this BUG_ON fixes it.

Here's a patch to apply to BFS if you're running it on 2.6.36 and run into this bug:
bfs357-worker_fix.patch, which just removes the BUG_ON. It does make me wonder whether this bug is also in mainline. Only those who hit it can confirm that one way or the other, by running vanilla 2.6.36, and there's no point reporting it while it's this vague.
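
For readers who just want to see the shape of the change, here is a rough sketch reconstructed only from the fragment quoted earlier in this post. It is not the actual contents of bfs357-worker_fix.patch: the file path, hunk header and the rest of the function are left out because they aren't shown here, so treat it as an illustration, not something to feed to patch(1).

```diff
 static void try_to_wake_up_local(struct task_struct *p)
 {
 	struct rq *rq = task_rq(p);
 	bool success = false;
 
-	BUG_ON(rq != this_rq());
```

The only change is dropping the assertion that the task's runqueue is the current CPU's runqueue, which is fine if, as described above, waking a task on another runqueue is harmless in BFS.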

3 comments:

  1. I searched bugzilla.kernel.org yesterday but didn't find anything. Before your worker_fix I had to restart my wifi (the Funtoo wpa_supplicant service) hourly.

  2. Thanks for the feedback. I assume the patch fixed the problem for you then?

  3. Yes, fixed! Before, using kernel 2.6.35 with BFS, I had to restart my Broadcom wifi once every 5 hours. But I think this also happened with kernels not patched with BFS. It really seems better now...

    Yesterday I upgraded KDE on my Funtoo (Gentoo) system from 4.4.5 to 4.5.2 using a 1.8 GiB tmpfs, without errors and without feeling interrupted or disturbed while surfing the internet.
