Icon Neural Network Ambient Occlusion


Icon Three Short Stories about the East Coast Main Line


Icon The New Alphabet


Icon "The Color Munifni Exists"


Icon A Deep Learning Framework For Character Motion Synthesis and Editing


Icon The Halting Problem and The Moral Arbitrator


Icon The Witness


Icon Four Seasons Crisp Omelette


Icon At the Bottom of the Elevator


Icon Tracing Functions in Python


Icon Still Things and Moving Things


Icon water.cpp


Icon Making Poetry in Piet


Icon Learning Motion Manifolds with Convolutional Autoencoders


Icon Learning an Inverse Rig Mapping for Character Animation


Icon Infinity Doesn't Exist


Icon Polyconf


Icon Raleigh


Icon The Skagerrak


Icon Printing a Stack Trace with MinGW


Icon The Border Pines


Icon You could have invented Parser Combinators


Icon Ready for the Fight


Icon Earthbound


Icon Turing Drawings


Icon Lost Child Announcement


Icon Shelter


Icon Data Science, how hard can it be?


Icon Denki Furo


Icon In Defence of the Unitype


Icon Maya Velocity Node


Icon Sandy Denny


Icon What type of Machine is the C Preprocessor?


Icon Which AI is more human?


Icon Gone Home


Icon Thoughts on Japan


Icon Can Computers Think?


Icon Counting Sheep & Infinity


Icon How Nature Builds Computers


Icon Painkillers


Icon Correct Box Sphere Intersection


Icon Avoiding Shader Conditionals


Icon Writing Portable OpenGL


Icon The Only Cable Car in Ireland


Icon Is the C Preprocessor Turing Complete?


Icon The aesthetics of code


Icon Issues with SDL on iOS and Android


Icon How I learned to stop worrying and love statistics


Icon PyMark


Icon AutoC Tools


Icon Scripting xNormal with Python


Icon Six Myths About Ray Tracing


Icon The Web Giants Will Fall


Icon PyAutoC


Icon The Pirate Song


Icon Dear Esther


Icon Unsharp Anti Aliasing


Icon The First Boy


Icon Parallel programming isn't hard, optimisation is.


Icon Skyrim


Icon Recognizing a language is solving a problem


Icon Could an animal learn to program?




Icon Pure Depth SSAO


Icon Synchronized in Python


Icon 3d Printing


Icon Real Time Graphics is Virtual Reality


Icon Painting Style Renderer


Icon A very hard problem


Icon Indie Development vs Modding


Icon Corange


Icon 3ds Max PLY Exporter


Icon A Case for the Technical Artist


Icon Enums


Icon Scorpions have won evolution


Icon Dirt and Ashes


Icon Lazy Python


Icon Subdivision Modelling


Icon The Owl


Icon Mouse Traps


Icon Updated Art Reel


Icon Tech Reel


Icon Graphics Aren't the Enemy


Icon On Being A Games Artist


Icon The Bluebird


Icon Everything2


Icon Duck Engine


Icon Boarding Preview


Icon Sailing Preview


Icon Exodus Village Flyover


Icon Art Reel




Icon One Cat Just Leads To Another

You could have invented Parser Combinators

Created on Dec. 1, 2014, 5:19 p.m.

In my book Build Your Own Lisp I use a Parser Combinator library I wrote for C called mpc to teach readers about languages and parsing. Lots of people have asked me how this works, curious as to if they might be able to do it themselves. A few even dived into the mpc source code!

But unfortunately the implementation of mpc in C is quite complex due to various language limitations. Additionally most resources online about Parser Combinators can be quite cryptic for a beginner. This is because much of the text is geared toward people who already know Haskell, the language they were invented using. For those unfamiliar with Haskell, following along can be pretty difficult.

But the ideas behind Parser Combinator libraries are really quite simple. It is more than possible for a beginner to implement a Parser Combinator library in a high level language such as Javascript in just a few hundred lines of code. In fact, you could have invented Parser Combinators...

You start with a function which reads the first character from some input. If this first character is a, then it returns a and advances the input by one. If it isn't, it doesn't advance the input, and returns a failure.

But your boss wants this function to work for any character, not just a. Now normally you'd add a parameter to this function which specifies which character to look for, but today you try something different. You create a new function which takes this character as input, and returns the function you had previously - adapted to look for this specific character. This function you call lit.

function lit(c) {
	return (function(input) {
		var r = inputRead(input);
		if (r == c) {
			inputAdvance(input, 1);
			return c;
		} else {
			return failure;

Your boss thinks you're mad, but you think you're pretty clever, because instead of writing the function you needed, you generated it using another function.

var parser = lit('a');
var result = parser(input);

Now your boss wants you to check if the first character is one of two given characters. He tells you to stop messing around and update the original function to take two character parameters, but instead you decide to again write a function that generates the one your boss was talking about. This one takes as input two functions generated from lit, and returns a function that tries each one in turn, returning the first one that succeeds. This new function you call or.

function or(parser0, parser1) {

	return (function (input) {
		var result0 = parser0(input);
		if (result0 != failure) { return result0; }
		var result1 = parser1(input);
		if (result1 != failure) { return result1; }
		return failure;

You use this in combination with lit to do what your boss asked of you.

var parser = or(lit('a'), lit('b'));
var result = parser(input);

You also make a function that takes as input two calls to lit, and returns a new function that tries each in series. If any of them fail, it fails and rewinds the input, while if both of them succeed it returns the string of both the characters joined together. This you call and.

function and(parser0, parser1) {

	return (function (input) {
		var pos = inputGet(input);
		var result0 = parser0(input);
		if (result0 == failure) { inputSet(pos); return failure; }
		var result1 = parser1(input);
		if (result1 == failure) { inputSet(pos); return failure; }
		return result0 + result1;

You show your boss that this can be used with lit when you want to parse something and then something else.

var parser = and(lit('a'), lit('b'));
var result = parser(input);

Your boss says he has a new request that will break your approach. He asks you to check for either ab or cd at the start of the input, and if either one is present convert the characters to integers and return a list of them. He says this can, and should, be done with just one function, the conventional way.

Annoyingly all your functions are set up to deal with characters and strings, not integers. You wonder if you can make them more general to work across different types. You think that or is probably already quite general - it just returns the result of one parser or the other. So the problem is just with the and parser, but this can be made more general if it returns a list of the results. Now neither and or or rely on character types as output of the parsers.

function and(parser0, parser1) {

	return (function (input) {
		var pos = inputGet(input);
		var result0 = parser0(input);
		if (result0 == failure) { inputSet(pos); return failure; }
		var result1 = parser1(input);
		if (result1 == failure) { inputSet(pos); return failure; }
		return [result0, result1];

This means and and or can take as input any functions that act in a given way - these are any functions that take some input and either fail, or produce some output. These type of functions you call parsing functions and they're what you've been generating. But wait a second - the functions returned by and and or are parsing functions themselves! All they do is take some input and either fail or produce output. So we can construct a parser from and or or, and because the output itself would be a parsing function we could give it as input back to and or or without worrying things will break.

var parser = or(
	and(lit('a'), lit('b')),
	and(lit('c'), lit('d')));
var result = parser(input);

Even so, you are still a bit unsure about how to approach conversion to integers. It would be nice if you could have a function which calls some other function with the result of a parse. Then you could just pass in a integer conversion function to it and use lit as before. No worries - we'll generate it again. You write a function which takes as input some parsing function, and some other function, and returns a function that runs the parsing function and, providing it didn't fail, returns the other function applied to the result. This function you call apply.

function apply(f, parser) {
	return (function (input) {
		var result = parser(input);
		if (result == failure) { return failure; }
		else { return f(result); }


Now you can finally write your boss' request as the following.

var parser = or(
	and(apply(toInt, lit('a')), apply(toInt, lit('b'))),
	and(apply(toInt, lit('c')), apply(toInt, lit('d'))));
var result = parser(input);

You can even tidy up all those repeated function calls by making a lit_to_int function.

function lit_to_int(c) {
	return apply(toInt, lit(c));

var parser = or(
	and(lit_to_int('a'), lit_to_int('b')),
	and(lit_to_int('c'), lit_to_int('d')));
var result = parser(input);

To deal with daily requests you add more and more functions to your library. You add a function that takes one parser and returns a function that runs it zero or more times, collecting all the results in a list (many). You add a function takes three parsers and returns a function that runs them in order and returns the result of the middle one (between). You add utility functions for matching identifiers and for ignoring whitespace around other parsers. The more you use your library the more you realize the code you write is more or less identical to the textual specification from your boss. For complex parsing tasks you can just look up a grammar and map it more or less directly to your library. It is as you give it the problem and the solution is found as if by magic.

A sweet name is only one thing is left before you release your invention upon the world - Parser Combinators.

github twitter rss