The Halting Problem and The Moral Arbitrator

24/03/2016

A few days ago I wondered if the halting problem could be applied to moral philosophy. I wondered if it was possible to reason about the existence of a moral arbitrator - an action which could always determine if another action was moral or not. To my surprise it sort of worked, and with some constraints you can adapt the arguments used in the halting problem to reason about the real world. But first, let me recount the halting problem in its original form.

The halting problem states that there cannot exist a function which, for any other function and any input to that function, decides if that other function will halt on that given input.

To show this is the case, let us first imagine the opposite - that after bro'ing down and crushing some code I manage to do the impossible and write (in javascript obviously) a function which decides, for any javascript function in string form and any input to that function in string form, if that function, when called with that input, halts or runs forever. It looks something like this (with the source code of very.clever.thing omitted for brevity):

function halts(test_function, test_input) {
    if (very.clever.thing(test_function, test_input)) {
        return true;
    } else {
        return false;
    }
}

Boom!
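If it really worked, you could ask it things like this (hypothetical usage - these example functions are made up, and of course very.clever.thing is still omitted):

halts("function loop(x) { while (true) {} }", "'hello'"); // false - loop runs forever
halts("function id(x) { return x; }", "'hello'");         // true - id halts immediately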

Naturally my first priority was to upload this baby to npm packaged with a sweet name like haltr - second priority was to get on the front page of Hacker News - and third priority was to get recruited as a rockstar at the hottest new startup (as I rightly deserve).

But shortly after I uploaded haltr to npm, a user called @alan opened an issue on my haltr github repo, linking to another package uploaded to npm called troublr, containing a single function - trouble - and complaining that my function kept returning the wrong result for it. I opened up the package and took a look.

function trouble(input) {
    if (halts(input, input)) {
        while (true) {}
    } else {
        return;
    }
}

All this trouble function was doing was taking some input string and calling my halts function with it as both arguments. If my halts function returned true, it would enter an infinite loop, otherwise it would return.

My first thought was that trouble might be supplying halts with a nonsense input string, but I'd been careful when designing halts such that if you passed it an invalid javascript function it would simply declare that the function halted. This made sense - because if the javascript function was invalid then there would be a syntax error, causing the evaluation to never even start.
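That guard might have looked something like this (a sketch - very.clever.thing is still the omitted magic, and wrapping the source in new Function is just one way to check it parses):

function halts(test_function, test_input) {
    try {
        // if the source string isn't valid javascript this throws a
        // SyntaxError before anything runs...
        new Function("return (" + test_function + ");");
    } catch (e) {
        // ...so evaluation never even starts, and we declare it halted
        return true;
    }
    return very.clever.thing(test_function, test_input);
}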

And provided the input to trouble was a string representing a valid javascript function, there wasn't even anything too odd going on. It seemed perfectly reasonable to call a function with the string of its own source as input.

Instead what I realized is that no matter what halts does, it can never provide the correct answer when trouble is called with itself as its argument: trouble(trouble.toString()).

You see there are two possible things that could happen when calling trouble with itself as input. Either it could halt or it could run forever.

If trouble(trouble.toString()) runs forever, it means that halts(trouble.toString(), trouble.toString()) must have evaluated to true, which means that halts thought trouble called with itself should have halted. But it didn't!

But if trouble(trouble.toString()) halts, it means that halts(trouble.toString(), trouble.toString()) must have evaluated to false, which means halts thought that trouble called with itself should have run forever. But it didn't!
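Spelled out in code, the contradiction looks like this:

var source = trouble.toString();
trouble(source);
// if halts(source, source) returned true, trouble loops forever -
// right after halts predicted it would halt!
// if halts(source, source) returned false, trouble returns -
// right after halts predicted it would run forever!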

So regardless of how my function halts works, there is no way halts can ever give the correct answer in all cases!

There is nothing else to conclude other than that writing a correct version of halts is impossible! And with that I was forced to remove haltr from npm, give up my dreams of JS ninja-dom, delete my twitter, and facebook my lawyer at the gym.

So if that is the original halting problem, how can it be applied to morality?

Well, the basic idea is to replace the concept of functions with that of actions, the concept of inputs with that of worlds (the complete state of the world at a given time), and the concept of running forever with that of performing an immoral action.

Then the problem of the moral arbitrator can be formulated as follows: is there an action that can always decide if performing a given action in a given world is moral or immoral, without performing any immoral actions itself?

Like before, let us imagine this is the case - that there is a person called the moral arbitrator, who knows how to perform an action called arbitrate which works as follows:

action arbitrate {
    read note "world"
    read note "action"
    if (very clever test) {
        press green button
    } else {
        press red button
    }
}

First the arbitrator reads a description of the world off a note labelled "world". Then he reads a description of the action off a note labelled "action". Then, using his very clever test, he decides if performing that action, in that world, would result in something immoral happening. If nothing immoral will happen he presses the green button, otherwise he presses the red button.

The arbitrator requests that the description of the world be written in a special format - as a list of actions to be performed to construct such a world. This means that many descriptions of actions are also interpretable as descriptions of worlds (although not all). For example, "place a table in the room, then place an apple on the table" works equally well as an action to perform or as a recipe for building a world.

And just like with syntax errors in javascript, if the description of the world or the action is invalid in any way or doesn't make any sense (such as picking up an object which is not there), the arbitrator simply assumes no actions are performed, discards it, and presses the green button. In this way, all worlds can be interpreted by the arbitrator as valid actions, and vice versa.

Trouble arises when we introduce another character - the contrarian, who knows a single action called contrary.

action contrary {
    write description of the world on note "action"
    write description of the world on note "world"
    arbitrate
    if (green button pressed) {
        do something immoral
    } else if (red button pressed) {
        do nothing
    }
}

The contrarian writes a description of his current world onto the note labelled "action", as well as the note labelled "world", and then asks the arbitrator to arbitrate. If the arbitrator presses the green button then the contrarian does something immoral, otherwise he does nothing.

Now comes the contradiction.

Let us assume the contrary action is performed in the world we get if we interpret the action contrary as a world. This can result in one of two things - either an immoral action happening, or no immoral action happening.

If performing the contrary action in the world described by the contrary action results in something immoral happening, then the arbitrator must have pressed the green button - indicating that he thought performing the contrary action in the world described by the contrary action would not be immoral!

And if performing the contrary action in the world described by the contrary action does not result in anything immoral happening then the arbitrator must have pressed the red button, indicating that he thought performing the contrary action in the world described by the contrary action would be immoral!

In both cases the arbitrator is forced to make the wrong decision!

Intuitively the problem can be thought of like this - the contrarian always checks what the arbitrator thinks is going to happen and does the opposite. This means that the arbitrator can never be correct in all cases with his prediction of the future, because in the case where the contrarian is involved, the future relies on his very prediction!
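In fact, if you translate the characters straight back into javascript you get almost exactly the trouble function from before (a sketch - the moral test and the immoral act are stand-ins for things that can't actually be written down):

// worlds and actions are both strings; "runs forever" has become
// "does something immoral"
function arbitrate(world, action) {
    // omitted very clever test: true if performing `action` in
    // `world` results in nothing immoral happening
    return very.clever.moral.test(world, action);
}

function contrary(world) {
    if (arbitrate(world, world)) {
        doSomethingImmoral(); // hypothetical stand-in
    } else {
        return; // do nothing
    }
}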

This argument relies on a few basic assumptions. The first is that it must always be possible for the contrarian to perform some immoral action in the world described by the contrary action (although it can be shown this is simply an issue of encodings). The second is that it must always be possible to avoid doing something immoral. I skip over this detail by assuming that simply doing nothing can never be immoral - but again, it is arguable that this is not always the case.

Finally, although it is an interesting example, it should be clear that this argument is actually not at all specific to morality! It is a much more general argument about the decidability of the future, and the "immoral action" can be replaced with any question about the world - "will this door be opened?" - "will Bob go to the shops later?". Providing there is always some contrarian who can ask some arbitrator what he thinks is going to happen and ensure the opposite happens, this same argument can apply to more or less any question about the future state of the world.

As with the halting problem it can be shown that all of these questions can end up being more or less equivalent - but that is another blog post.