Monday, 25 July 2011

3.0 BFS delays

Hi all

I haven't blogged much lately because I've been distracted from kernel hacking by bitcoin mining. For some crazy reason I took it upon myself to make mining software that did what I wanted, writing it the way I write kernel code. Anyway since it's unrelated I haven't posted about it here before, but if anyone's interested, the development thread is here:

http://forum.bitcoin.org/index.php?topic=28402.0

Now about the kernel. To be honest I've hardly followed the development of 3.0 at all, being totally preoccupied with other things: I've taken time out from work as a sabbatical while I reassess work-life balance and long-term career management (even to the point of considering changing line of work - anyone need a C programmer?) and spend time with family, friends and random other personal development things. No, I'm not quitting kernel development any time soon (again).

Anyway the thing is I'm going with Interplast next week to Nauru (of all places) as a volunteer anaesthetist for needy children for 10 days. I'm not sure if I'll find time to port BFS to 3.0 before then, or if I'll be able to do it while I'm actually there (doubtful). So just a heads up that it might be a while before we BF the 3.0 kernel.

24 comments:

  1. I upgraded to 3.0 just two days ago, and it isn't as bad as I expected without BFS.

    Is it just me, or did upstream get much better in the last release with multimedia/desktop workloads?

    Oh, and best wishes for Nauru :-)

  2. How much did you expect and how much did you get?
    Whatever it is you are talking about.

    "Show me the money!" -- from Jerry Maguire

  3. @RealNC

    Perhaps it is due to improvement in DRM?

  4. I haven't played with bitcoins yet but I'm going to give your new app a go. If I get any I will send them over :)

  5. Does BFS take into consideration NUMA?

  6. @MikeyB, quoting from http://ck.kolivas.org/patches/bfs/bfs-faq.txt

    NUMA aware?

    It is NOT NUMA aware in the sense that it does any fancy shit on NUMA, but
    it will work on NUMA hardware just fine. Only the really big NUMA hardware
    is likely to suffer in performance, and this is theoretically only, since
    no one has that sort of hardware to prove it to me, but it seems almost
    certain. v0.300 onwards have NUMA enhancements.

    Try it for yourself

  7. @myself
    Nah, I spoke too soon. Starting Jack + LMMS + various synths shows audio latency glitches that don't exist with BFS :-P

  8. I am planning a new Debian unstable derivative with others. I had hoped we could launch it with a linux-3.0-bfs kernel. Too bad there are no other people taking care of bfs :(

  9. @Ralph
    It is too bad BFS can't get into mainline as an optional scheduler, just as we have three different disk schedulers (noop, deadline, cfq). This would take some pressure off CK for development of BFS and also allow others to contribute collaboratively.

    @CK
    Everyone has a deep appreciation for the work you do! Thank you for it!

  10. I can wait a couple of weeks. Thanks for bfs and for your volunteering for needy children.

  11. apropos mainline:
    I think Linus rejected its introduction into mainline because he thought having different scheduler models would complicate development. So I am waiting for the moment when a new Linux version can be patched with a previous bfs version without errors. That would indicate the end of Linux scheduler development: the optimal moment to request inclusion of bfs into mainline!

    But if the current mainline scheduler never reaches an optimum, this moment will never occur ...

  12. The BFS second anniversary is upon us. Here are some gift requests.

    Lobby phoronix.com to rerun their benchmarks: http://www.phoronix.com/scan.php?page=article&item=bfs_scheduler_benchmarks

    Then redo the only scientific comparison I know: http://www.cs.unm.edu/~eschulte/data/bfs-v-cfs_groves-knockel-schulte.pdf

    Unfortunately, a one-off test and a plot does not do it for BFS. We need a whole box nicely wrapped and presented to make BFS a happy kid.

    Any takers?

  13. Bfs is not about benchmarks: we want a responsive desktop even if benchmarks get worse. Because it is about the desktop, and many of us use battery-powered notebooks, the longer the CPUs sleep, the better!

    Are there any tests out there that take these preferences into account?

  14. Try to clock your battery (not watch) while encoding 1000 or so mp3's. That's a benchmark for you.

  15. Rerunning the benchmarks is a good idea; however, if Phoronix don't feel like spending time on it, we could do it ourselves ...

    Phoronix might be willing to share the setup code etc., so at best it is just a question of setting up a machine and doing ./run_benchmark.sh :)

  16. (I have emailed Phoronix, btw, giving them a little background story and asking if they're interested in rerunning the benchmark.)

  17. >Try to clock your battery (not watch) while encoding 1000 or so mp3's

    Yeah, phoronix had some experiments using a watt meter ...

  18. Gah, I reverted from 3.0 back to 2.6.39-bfs. Vanilla 3.0 still has big problems under load. It seems they're never going to fix their issues.

  19. @MikeyB:

    I have been running it on IBM x445 summit, not numaq. 2 smp nodes, both with 2 xeon HT enabled processors, works like a charm, unfortunately no time for performance statistics.

    @con:

    "anyone need a c programmer", yes i do, what's your salary. pm me on irc.

  20. Uuuuh, I had a look at patching linux-3.0 with Bfs: this is going to be a big task for Kolivas.

    At first it seems easy, one function to transfer. But there is a new feature which will cost Con a lot. Work has been done in mainline linux-3.0 to implement restrictions for Linux containers. It will be a huge task, I guess, to make a Bfs patch now ....

  21. Let's hope not :-(

  22. I don't understand all the talk about 4096-CPU Linux servers. All of them are HPC servers, essentially large clusters. The largest SMP servers for sale today have 32 or 64 CPUs. There are no larger SMP servers for sale. The biggest IBM mainframe, the z196, has 24 CPUs.

    The Linux trick is to connect a lot of nodes on a fast switch, and then use software to make it look like a single kernel. This is how Linux servers can have 4096 CPUs: it is a large cluster on a network emulating a single kernel.

    For instance, the SGI Altix Linux server works this way. If you study the SGI Altix Linux customers, they all do HPC work (embarrassingly parallel work). None use such servers for SMP work.

    Also ScaleMP, which has up to 8192 cores, works this way:
    http://www.theregister.co.uk/2011/09/20/scalemp_supports_amd_opterons/
    "Since its founding in 2003, ScaleMP has tried a different approach. Instead of using special ASICs and interconnection protocols to lash together multiple server modes together into a shared memory system, ScaleMP cooked up a special hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes. Rather than carve up a single system image into multiple virtual machines, vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space."

    Replies
    1. Indeed, mentioning 4096 CPUs in the BFS patch is mostly tongue-in-cheek because virtually all systems with any realistic availability are <= 64 logical devices. Nonetheless, scalability is an issue at 64 devices with the current BFS patch, though how big an issue, and in what areas, is up for debate. At low loads I expect BFS will not remotely have any scalability issue even at this number of cores/threads.
