A problem that takes the age of the universe to calculate and can be done on your home computer

28/06/2011

In the theory of computation we are taught about a kind of fallacy to watch out for when reasoning about the growing power of computers: the assumption that faster hardware will eventually make any problem manageable. For some problems, as the input size grows, the time taken to calculate an answer grows far, far faster, meaning that even for a fairly modest input size, calculations can be predicted to take many times the current age of the universe.

(Conversely, it is interesting to note that this kind of thing can work in the opposite direction. We've calculated pi to several trillion decimal places, but pi to 39 decimal places is enough to work out the circumference of any circle that fits in the observable universe to within the radius of a hydrogen atom.)

I was talking to a friend about this and made what seemed, at the time, a fairly reasonable claim -

"I could fairly easily conceive of and write a program for my home laptop that would take longer than the age of the universe to run."


Thinking back to the conversation, I realized this was a rather more ambitious claim than I had first thought. I began to think about how I might actually do this.

My initial knee-jerk reaction was something like a program that naively loops over the numbers between zero and a bazillion and adds them together. This would be sure to take a long, long time provided I picked a range big enough. But there are several issues with this. First of all, we already have a formula for the sum of the numbers between n and m, namely the sum of an arithmetic sequence, and it takes essentially the same time no matter how many numbers we want to sum. Secondly, I figured I'd probably run out of RAM before the universe ended.
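To make that first objection concrete, here is a small sketch comparing the naive loop with the arithmetic-sequence formula. The function names are my own, picked purely for illustration.

```python
def naive_sum(n, m):
    """Add up every whole number from n to m inclusive, one at a time."""
    total = 0
    for k in range(n, m + 1):
        total += k
    return total


def closed_form_sum(n, m):
    """Sum of the arithmetic sequence n, n+1, ..., m without looping."""
    count = m - n + 1              # how many terms there are
    return count * (n + m) // 2    # number of terms times the average term


# Both give the same answer, but only the loop slows down as the range grows.
print(naive_sum(0, 1000), closed_form_sum(0, 1000))
```

The loop does one addition per number in the range, while the formula does a handful of operations however wide the range is, which is why this idea is no good for wasting the lifetime of the universe.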

It was clear that whatever problem I picked had to be difficult to solve (geek: run in exponential time, be NP-Complete or, even better, NP-Hard) and had to be in some way incremental so as not to blow up my RAM. As well as this, it still had to be explainable and comprehensible to the average person.

I began to consider some of the usual areas in which hard problems crop up and which people tend to understand. Prime numbers are always a good thing to look at, as they are easy to understand, even if some theories about them are complicated. Another area that looked promising was cryptography, but computers have gained so much in power these days that strong cryptographic algorithms tend to be really quite complicated to explain. And, to keep my RAM from blowing up, the problem still had to be in some way incremental, or a "best fit" problem.

I considered some dense, high-dimensional search space problems of my own invention, but most of the things I could come up with didn't seem that natural, and wouldn't really be understood by the layman. As usual, doing some research on the internet proved a better bet, and I found the perfect example.

 

The Traveling Salesman problem

This is a well-known and well-studied problem, notorious for being hard to solve. It goes like this -

Given a list of towns, and a list of distances between the towns, what is the shortest route that visits every town?


It turns out there is no known efficient technique for solving this problem exactly; the straightforward approach is simply to try every combination of routes. There are various techniques for finding a route that is very good, or even almost optimal, but confirming that a route really is the shortest comes down to some form of exhaustive search. Due to the nature of the problem, the number of possible routes grows extremely fast with the number of towns. In fact we can be more specific and say that for n towns, the number of routes that need to be checked is proportional to n factorial, or n!.

The reason for this is that we have to construct routes that visit all n towns, and at each point along a route we need to consider continuing on to every town not already visited. If you imagine being at a point where m towns have already been visited, there are n-m unvisited towns to continue to, and for each of these we have to repeat the process. So there are n choices for the first town, n-1 for the second, n-2 for the third, and so on down to 1, giving a total number of routes of n * (n-1) * (n-2) * ... * 1, which is exactly n!.
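As a rough sketch of what "trying every route" looks like in practice, the snippet below checks every ordering of a handful of towns. The distance table and the function names are made up for illustration, and a route here is a one-way trip that does not return to its starting town.

```python
from itertools import permutations
from math import factorial


def route_length(route, dist):
    """Total distance of visiting the towns in the given order."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def brute_force_shortest(dist):
    """Try every ordering of the towns and keep the shortest one."""
    towns = range(len(dist))
    best = min(permutations(towns), key=lambda r: route_length(r, dist))
    return list(best), route_length(best, dist)


# Hypothetical distances between four towns (dist[a][b] is the distance a to b).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

route, length = brute_force_shortest(dist)
print("shortest route:", route, "with length", length)
print(factorial(len(dist)), "orderings were checked")
```

With four towns there are only 24 orderings to check, so this finishes instantly. Each extra town multiplies the work by the new number of towns, which is where the trouble starts.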

Because of how massive n! gets when n is even slightly large, by the time we reach about 25 towns the solution to this problem would already take longer than the age of the universe to compute. We can demonstrate this with some simple maths:

  • We can calculate 25 factorial, and find we have 15,511,210,043,330,985,984,000,000 different possible paths between the towns.
  • We can imagine that checking the distance of a route would probably take around a hundred cycles of CPU time
  • So we get a total number of calculations for the problem that is around 1,500,000,000,000,000,000,000,000,000 cycles
  • Modern processors can run at about 3 GHz, which is 3,000,000,000 cycles per second
  • This gives us a total running time for the calculation of around 500,000,000,000,000,000 seconds
  • The current age of the universe is estimated to be about 14 billion years.
  • This is equal to 441,796,964,000,000,000 seconds, which for argument's sake we will round to 450,000,000,000,000,000 seconds
  • As you can see, 500,000,000,000,000,000 is bigger than 450,000,000,000,000,000
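The arithmetic above is easy to reproduce. This is only a back-of-the-envelope sketch: the figure of 100 cycles per route is an assumption (it is the value that gives the cycle total quoted in the list), and the year length is the one implied by the seconds figure for the age of the universe.

```python
from math import factorial

towns = 25
cycles_per_route = 100               # assumed cost of measuring one route
cycles_per_second = 3_000_000_000    # a 3 GHz processor
seconds_per_year = 31_556_926        # year length implied by the figure above
age_of_universe_years = 14_000_000_000

routes = factorial(towns)
total_cycles = routes * cycles_per_route
run_time_seconds = total_cycles / cycles_per_second
universe_seconds = age_of_universe_years * seconds_per_year

print(f"routes to check:  {routes:,}")
print(f"run time:         {run_time_seconds:.2e} seconds")
print(f"age of universe:  {universe_seconds:.2e} seconds")
print(f"run time is {run_time_seconds / universe_seconds:.2f} times the age of the universe")
```

Changing `towns` to 24 or 26 shows how sharp the cliff is: each town added or removed changes the running time by a factor of about 25.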

This means that if you had started your calculation at the beginning of the universe, it would still be running today, with roughly another two billion years left to go.

Even if we had a computer that was a million times more powerful (according to Moore's law, computing power doubles roughly every two years, so this would take about 40 years) it would still take around 16,000 years to find the answer, which is far beyond the span of a lifetime. Even if we had a computer several trillion times more powerful we would struggle with 30 towns, and calculating for 100 towns would more or less always be out of our grasp.
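Here is a quick sketch of how little a faster computer helps, using the same assumed figures as before (100 cycles per route on a 3 GHz processor). The function name and the `speedup` parameter are my own inventions for the example.

```python
from math import factorial, log2

CYCLES_PER_ROUTE = 100               # assumed cost of measuring one route
CYCLES_PER_SECOND = 3_000_000_000    # a 3 GHz processor
SECONDS_PER_YEAR = 31_556_926


def years_to_check_all_routes(towns, speedup=1):
    """Years needed to check every route on a machine `speedup` times faster."""
    seconds = factorial(towns) * CYCLES_PER_ROUTE / (CYCLES_PER_SECOND * speedup)
    return seconds / SECONDS_PER_YEAR


# A million-fold speed-up is about 20 doublings, so about 40 years of Moore's law.
doublings = log2(1_000_000)
print(f"{doublings:.1f} doublings, or about {2 * doublings:.0f} years of waiting")

print(f"25 towns, a million times faster:  {years_to_check_all_routes(25, 1e6):,.0f} years")
print(f"30 towns, a trillion times faster: {years_to_check_all_routes(30, 1e12):,.0f} years")
```

Going from 25 to 30 towns multiplies the work by 26 * 27 * 28 * 29 * 30, which is about 17 million, so five extra towns swallow a million-fold speed-up and then some.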

The other surprising thing about this problem is that it isn't one of a kind; there is a whole class of similar problems. In reality, though, most of the time we don't need the optimal solution and can settle for a good but less perfect one. These are often far easier to calculate.
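As one example of a "less perfect" approach, here is a sketch of the nearest-neighbour heuristic: always travel to the closest town not yet visited. This particular heuristic is my choice of illustration rather than anything special, and the distance table is made up.

```python
def nearest_neighbour_route(dist, start=0):
    """Build a route by repeatedly moving to the closest unvisited town."""
    unvisited = set(range(len(dist))) - {start}
    route = [start]
    while unvisited:
        here = route[-1]
        closest = min(unvisited, key=lambda town: dist[here][town])
        route.append(closest)
        unvisited.remove(closest)
    length = sum(dist[a][b] for a, b in zip(route, route[1:]))
    return route, length


# Hypothetical distances between four towns (dist[a][b] is the distance a to b).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

route, length = nearest_neighbour_route(dist)
print("route found:", route, "with length", length)
```

This takes roughly n * n steps for n towns, so thousands of towns are no problem at all. The catch is that nothing guarantees the route it finds is the shortest one; on some maps it can be noticeably worse.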

 

Note: Some people have pointed out to me the existence of an algorithm that solves the Traveling Salesman problem faster, by working out the best routes through small groups of cities and reusing them rather than re-checking every ordering. It is called the Held–Karp algorithm and it still runs in exponential time (roughly n * n * 2^n steps rather than n!). Unfortunately this is beyond my maths to explain properly, and even with this algorithm the problem grows fast enough for the main point of the article to remain; we're just looking at several dozen cities rather than a couple of dozen.
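For the curious, here is a minimal sketch of the idea behind Held–Karp, written as my own adaptation rather than a textbook version: it finds the shortest one-way route (no return leg, any starting town) to match the counting used earlier, whereas the classic formulation fixes a starting town and returns to it. The distance table is made up.

```python
def held_karp_shortest(dist):
    """Length of the shortest route that visits every town exactly once."""
    n = len(dist)
    # best[(visited, last)] is the shortest route that covers exactly the towns
    # in `visited` (a bitmask, one bit per town) and finishes at town `last`.
    best = {(1 << town, town): 0 for town in range(n)}
    for visited in range(1, 1 << n):
        for last in range(n):
            if (visited, last) not in best:
                continue
            so_far = best[(visited, last)]
            for nxt in range(n):
                if visited & (1 << nxt):
                    continue  # town already on this route
                key = (visited | (1 << nxt), nxt)
                candidate = so_far + dist[last][nxt]
                if candidate < best.get(key, float("inf")):
                    best[key] = candidate
    everything = (1 << n) - 1
    return min(best[(everything, last)] for last in range(n))


# Hypothetical distances between four towns (dist[a][b] is the distance a to b).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print("shortest route length:", held_karp_shortest(dist))
```

The saving comes from remembering only the best way to reach each (set of towns, final town) pair instead of every ordering. The price is memory: the table holds up to n * 2^n entries, which in practice runs out long before the patience does.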


Sources

http://en.wikipedia.org/wiki/Pi
http://www.lycos.com/info/traveling-salesman-problem.html
http://en.wikipedia.org/wiki/Travelling_salesman_problem