Yesterday, I submitted my first program modification to the computer language shootout game, which is a change to the thread-ring program. If you've followed the shootout at all, you will probably know that haskell dominates at this benchmark, with the next contender (Mozart/OZ) being 50% slower than haskell already.
While i was having a look at the code, I noticed that the haskell entry was forkIO'ing a lot of threads, which is exactly what we want. But we weren't quite forking enough; one of the 503 threads was the main thread. Bad idea!
I learnt a few weeks ago, while working on my AVar package, that communicating between bound and non bound threads (forkIO'd vs. main) when using things like MVars can be really slow. And this is exactly what was happening:
main = do a <- newMVar . read . head =<< getArgs z <- foldM new a [2..ring] thread 1 z a
as you can see, the last/first thread was being run in main, so once per loop around the ring, we were talking to main via an MVar, and then talking to another forkIO'd thread via another MVar. So here's what I did:
main = do a <- newMVar . read . head =<< getArgs z <- foldM new a [2..ring] ret <- newEmptyMVar forkIO (thread 1 z a >>= putMVar ret) takeMVar ret
Two and a half lines of code. What were the results? Well, when the program was run with n = 50,000,000, the original ran in ~14.2s, with -N{1,2,3,4}, my modified one took on average ~9.3s, which is about a 33% improvement. It should be noted that when both programs are compiled without -threaded, there is almost no difference in speed at all (and they both run considerably faster than the non threaded version). Need to talk to the shootout peeps and find out if it's against the rules to compile without -threaded, and get an even more massive speed boost.
-- Axman

Woah, amazing. It looks like Haskell won't lose this benchmark anytime soon (yeah, benchmarks are not *that* important, but they make a very nice marketing - and it's aways nice to be able to have speed when you need it).
Did you also checked if the chameneo benchmark could be improved in the same way?
Cool. Back when I submitted the first Haskell entry for this benchmark Haskell actually beat C (which was the fastest back then) by six times. It was six times faster! Those were fun times :-)
Hey Fernando,
I just had a quick go at seeing if i could speed up the chameneo-redux benchmark at all by using the same idea, but it didn't make any difference :(
Haha, wow, that really is amazing Josef. Can't have been that great though right? i mean, the current one is 19 tomes faster, and my one should bring it to more than 20x :P
Of course, I'm kidding, that really is awesomeness. I bet i know just how good that would feel too :D
-- Axman
Alex, I realize didn't make my point very well. Six times the speed of C seems minuscule these days. What I was hoping to communicate was the initial shock of just how much faster Haskell was compared to everything else in the benchmark at the time. And it wasn't like I had tried to optimize the program or anything. I just wrote the most obvious thing I could come up with since Haskell didn't have a submission for that particular benchmark and it just beat the crap out of C. So, yes, times are even better now compared to C but other languages are much closer. Back then we were six times better than the closest combatant.