Parallel programming isn't hard, optimisation is.

Created on Jan. 17, 2012, 2:31 p.m.

I'm standing in the living room and I ask my flatmates if any of them want to go to the pub. I then count the positive replies. I've just computed something in parallel. An exact instance of map-reduce. And it comes as naturally (if not more naturally) to me as the sequential approach.
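To make the analogy concrete, here is a minimal sketch of that pub headcount as a map-reduce, written in C using OpenMP's reduction clause. This is just one way to express it, and the replies data is made up for illustration:

```c
#include <stdio.h>

int main(void) {

  /* 1 = "yes, I'll come to the pub", 0 = "no" */
  int replies[] = { 1, 0, 1, 1, 0 };
  int n = sizeof(replies) / sizeof(replies[0]);

  int count = 0;

  /* Map: each iteration inspects one reply.
     Reduce: OpenMP sums the per-thread counts into `count`. */
  #pragma omp parallel for reduction(+:count)
  for (int i = 0; i < n; i++) {
    count += replies[i];
  }

  printf("%d of us are going to the pub\n", count);
  return 0;
}
```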

We have the human intuition and analogies for this. Almost all parallel problems can be expressed as "don't touch that while I'm doing something with it", or "everyone do something and then let me gather the results". The only remaining hard part is the locality information: for example, a particle system where a single particle is only affected by the nearest ten other particles. And even then, most locality can be expressed simply by how the data is partitioned in the map-reduce.

Parallel concepts don't need to be hard, but the great contradiction is that whenever we are concerned with making processing parallel, we are also concerned with performance - and reasoning about the performance of parallel computation is hard. Any sequential assertions you make can be thrown out the window. Suddenly the hardware and cache become a major issue, and what was before a relatively small search space, comprising special data structures, C tricks and inline assembly, becomes a huge hulking monster of possibilities.

I've been working on using OpenCL for non-conventional rendering techniques, and my exploration has led me to needing to implement an instance of the Marching Cubes algorithm. Something I've done before. What has me tied up this time is the vast number of possible parallel approaches within arm's reach. I can at least vaguely reason about how difficult the various approaches are to implement, but when it comes to predicting how expensive they are I'm somewhat in the dark. Big O notation can't help me here.

To give you an idea, one approach is to fit marching cubes into a map-reduce pattern - which is tempting for its conceptual simplicity, but unfortunately reduction can't be completely automated in OpenCL and would require a variable number of reduction passes depending on the data size. It ends up more complicated than it seems. Another option is to essentially put a lock around two buffers and fill them up as workers finish, but I have no idea how much contention this would cause. I could even do most of the computation in a very fast, single dispatch, but then I would need a final clean-up pass which would have to run sequentially and may end up very expensive. None of the approaches are really trivial enough to just mock up and test out, and I can't find many testimonies from people who've tried them either.
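To give a flavour of the second approach, here is a hypothetical OpenCL kernel sketch of the shared-buffer idea, reduced to just the append step. It assumes each cell's triangles have already been generated into a padded per-cell buffer, and shows work-items reserving space in a single compact vertex buffer with an atomic counter rather than a lock. The contention on that counter is exactly the cost I can't predict:

```c
__kernel void gather_cells(__global const int*    tri_counts,   /* triangles emitted per cell       */
                           __global const float4* cell_tris,    /* padded per-cell triangle vertices */
                           __global float4*       vertices,     /* compact shared output buffer     */
                           __global int*          vertex_count) /* shared atomic counter            */
{
  int cell = get_global_id(0);
  int tris = tri_counts[cell];
  if (tris == 0) return;

  /* Reserve a range in the shared buffer. Every work-item that emits
     geometry hits this single counter - the potential contention point. */
  int base = atomic_add(vertex_count, tris * 3);

  /* Copy this cell's triangles into the reserved range.
     15 = 5 triangles * 3 vertices, the marching cubes maximum per cell. */
  for (int v = 0; v < tris * 3; v++)
    vertices[base + v] = cell_tris[cell * 15 + v];
}
```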

Until programmers can begin to reason about the expense of parallel programming without resorting to benchmarks, it is always going to be something of a struggle for them to adopt it. This is something that hardware developers, language developers and academia all need to work on together.

For a great lecture on the subject, and a fascinating delve into the hardware, take a look at this talk by Bill Dally.
