The Ratiocinator: The Universal Universe, Part I: Turing machines

In the mid-1930s, English mathematician Alan Turing concerned himself with the question: Can mathematical reasoning be subsumed by a mechanical process? In other words, is it possible to build a device which, if any human mathematician can carry out some computation, can carry out that computation as well?
To this end, he proposed the concept of the automated or a-machine, now known more widely as the Turing machine. A Turing machine is an abstract device consisting of an infinite tape partitioned into distinct cells, and a read/write head. On the tape, certain symbols may be stored, which the head may read, erase, or write. The last symbol read is called the scanned symbol; it determines (at least partially) the machine's behaviour. The tape can be moved back and forth through the machine.
It is at first not obvious that any interesting mathematics at all can be carried out by such a simplistic device. However, it can be shown that at least all mathematics that can be carried out using symbol manipulation can be carried out by a Turing machine. Here, by 'symbol manipulation' I mean roughly the following: a mathematical problem is presented as some string of symbols, like (a + b)². Now, one can invoke certain rules to act on these symbols, transform the string into a new one; it is important to realize that the meaning of the symbols does not play any role at all.
One rule, invoked by the superscript 2, might be that anything that is 'under' it can be rewritten as follows: x² = x·x, where the symbol '=' just means 'can be replaced by'; another rule says that anything within brackets is to be regarded as a single entity. This allows to rewrite the original string in the form (a + b)·(a + b), proving the identity (a + b)² = (a + b)·(a + b).
You can see where this goes: a new rule pertaining to products of brackets -- or, on the symbol level, to things written within '(' and ')', separated by '·' -- comes into effect, allowing a re-write to a·a + b·a + a·b + b·b, then rules saying that 'x·y = y·x', 'x + x = 2·x', and the first rule (x² = x·x) applied in reverse allow to rewrite to a² + 2·a·b + b², proving finally the identity (a + b)² = a² + 2·a·b + b², known far and wide as the first binomial formula.

This provides a glimpse of how 'mathematical reasoning' can be carried out by a Turing machine: it reads the symbols on its tape; those symbols determine the rules to apply; those rules prompt the rewriting of the string of symbols into a new one; these new symbols again determine manipulations to be carried out on them; and so on, up to the point where either some predetermined target is hit, or no further manipulations can be carried out.
The next important realisation is that every calculation or derivation you can write down uniquely relies on such manipulations; otherwise, the preceding line would not imply the next one unambiguously, and you would have to make a choice of which 'path' to take, which leads to multiple possible results, each of which as justified as each other. This would be rather troublesome in mathematics -- typically, one would like one result to be the 'right' one, rather than being presented with a 'choose your own derivation' playbook...
At this point, I should perhaps fix some notation, in order to minimize possible ambiguities. I will abbreviate a string of symbols with the symbol x; in the example above, x = (a + b)².
The action of a Turing machine on some string of symbols is written as f(x), so in the above case, f(x) = f((a + b)²) = (a + b)·(a + b), for instance. If this is confusing to you, because you're used to f(x) standing for a function taking a number to another number (such as 'f(x) = x²' takes 2 to 4, 3 to 9 etc.), just consider that we can set up a code taking each string of symbols to a string of 0s and 1s -- since again, the question answering game can be used to uniquely determine the symbols --, which is effectively nothing but a binary number, i.e. a number expressed in base 2 rather than the usual ('decimal') expression in base 10; so, on this view, f(x) does exactly what you're accustomed to.
Repeated action of a Turing machine on a string of symbols is again a Turing machine action: f(f(x)) = f(f((a + b)²)) = f((a + b)·(a + b)) = a² + 2·a·b + b² = g(x), which effectively just means that you can 'skip steps', and invent a single rewrite rule equivalent to the consecutive action of multiple rewrite rules (for instance, having shown using the application of multiple rewrite rules that (a + b)² = a² + 2·a·b + b², we could now include this as a rewrite rule directly).
Though far from a formal proof, I hope this at least makes plausible the notion that a Turing machine with a sufficiently rich set of symbols it accepts and rewrites it can perform indeed is capable of carrying out any computation that can be carried out at all. This notion is also known under the name Church-Turing thesis.
Now imagine you had such a 'universal' Turing machine. It has an alphabet of symbols it accepts, and a set of rules governing its behaviour. But, as we have already seen, the set of rules can be modified: certain rules may be 'chunked' together to give new rules; other rules might be broken apart further into sub-rules; it might even be the case that certain rules can be entirely replaced by different ones. Also, of course, the choice of alphabet is arbitrary: setting up a proper code, each string of symbols can be mapped to a different representation -- perhaps a binary one using 0s and 1s, or a decimal one using numbers, or anything else.
Thus, we can build another Turing machine, which can compute everything that the first one can compute, and hence, can also compute everything that is computable at all.
But if both can compute everything computable, that means they must always agree in their computations -- for every computation the first machine ('T₁') carries out, there must be a corresponding one on the second ('T₂'); so if T₁ maps a string x to another string y = T₁(x), there must be a corresponding string w that T₂ maps to an again corresponding string z = T₂(w). Here, corresponding means that w and x, and z and y are related as the two sides of a code: knowing one and performing some operations on it yields the other.
But these operations are again only of the symbol-manipulating kind, and symbol manipulation is what Turing machines do! So, there exists a Turing machine that takes w to x, and z to y, and vice versa: t(x) = w, t(y) = z. The output and input of T₁ and T₂ can be translated into one another -- which only makes sense, as there must be some way to check whether or not both actually do compute all the same things.
Putting it all together, we then get: T₁(x) = y, T₂(w) = T₂(t(x)) = z = t(y). Viewed through the 'lens' of the translator t, T₂ thus acts on x the same way T₁ does; it is thus able to fully emulate the action of T₁. This also applies the other way around, and more generally: any universal Turing machine is able to emulate any other. There is no Turing machine that is able to do anything any generic universal one can't do, which is only sensible, since all any Turing machine can do is compute, and a universal Turing machine can compute anything computable.
But, just how 'married' are we to the construction of 'infinite tape and read/write head' characteristic for Turing machines? There are certainly other ways of embodying the concept of symbol-manipulating computations -- one is for instance 'you, a pencil, and a stack of paper', which after all is the system Turing set out to imitate.
Indeed, people have come up with all sorts of clever schemes -- notably, Church's λ-calculus, register machines, μ-recursive functions, etc. And indeed, all of these can be shown to be equivalent with regards to their computational capacity: i.e. all of them can compute exactly the same things. In turn, they can all be emulated on Turing machines, and are able to emulate Turing machines themselves.
This is embodied in the notion of computational universality: every computationally universal system is capable of emulating every other (at most) computationally universal system.
The parenthetical 'at most' here references two things: first, there are systems 'less' than computationally universal, which trivially can be emulated by computationally universal systems as well; and second, it is at least conceivable that there may be systems that are 'more' than computationally universal. For instance, there exist problems that no computationally universal system can solve (about which, much more later), and one may postulate the existence of a system able to find a solution; such a system is typically termed a hypercomputer.
In this sense, there is no real difference between different kinds of realisations of computational universality, so I will simply use the term universal system as a catch-all in the future, or occasionally speak of Turing machines as a kind of pars pro toto for the whole class of universal systems.
A caveat ought to be attached here: in the definition of Turing machines, it was stated that the length of the tape had to be infinite; in general, any given computation may be arbitrary complex, and thus, need arbitrarily great resources. Of course, in the real world, all resources we can muster are necessarily finite. This means that no true Turing machine can ever be built; at best, they exist as a limiting concept: whenever memory threatens to become scarce, more may be added, using (eventually) all of the resources available in the universe; the question of the finiteness of resources then becomes the question of the finiteness of the universe.
Nevertheless, automata that 'could be' Turing complete (which is a shorter way of saying 'equivalent to a universal Turing machine') if given sufficient memory can be built -- you're sitting at one now, though its architecture (based on work by the Hungarian-American mathematician John von Neumann) is rather different from that of a Turing machine.
Computational universality is the reason that it doesn't matter whether or not you read this on an Apple- or Windows device, or on something else entirely: all of them, being able to emulate one another, yield essentially the same results (or at least, are capable of doing so -- the implementation can in practice be a bit crude occasionally).
Universality also accounts for the possibility of creating virtual machines, simulations of one computer on another. The most common example of this is given by the Java environment. Basically, one might imagine a 'Java machine', a Turing machine that takes inputs and produces outputs according to rules set by the Java programming language, and a 'translator', which tells the machine you use how to interpret inputs in such a way as the Java machine would do, and produce the appropriate outputs. Essentially, this creates a hierarchy-like structure: at bottom, your Windows-, OS X-, Linux- or whatever-system is happily chugging along doing its own idiosyncratic thing, while the translator accounts for the simulation of the same Java machine on a higher level, uniformly on all underlying platforms. (In general, though, the hierarchy is far more complicated than that: at the bottom lies the machine code, which is, essentially, your computer's 'native language', consisting mostly of maps between strings of 0s and 1s; 'above' this, compilers and interpreters translate between different programming languages; programs, machines in themselves, are executed on yet higher levels, in parallel or serially; some may implement virtual machines of varying sophistication, etc. -- but this is unimportant for the main point.)
This has the advantage that, given a proper translator, a programmer does not have to take into account possible differences in end-user systems, as the Java machine works the same way everywhere.
A final thought: earlier on, I had mentioned that 'you, a pencil, and a stack of paper' are just as much a computationally universal system as a Turing machine is -- actually, the pencil and the paper are just auxiliaries, they're not necessary functionally. But then, this means that you can emulate any other universal system -- and especially, do any computation a Turing machine can do, so there's really no reason to be afraid of maths. But even more than that -- if the universe itself is 'nothing but' a universal system, i.e. if it or its salient features can be simulated on a computer, then, at least in principle, everything that goes on within it can be emulated by the human brain -- meaning that the universe is, at least in principle, understandable. It is sometimes said that there is no reason to expect humans to be able to understand the universe any more than there is reason to expect dogs to understand calculus; here, then, there is such a reason.

Freitag, 8. Juli 2011

The Universal Universe, Part I: Turing machines

Keine Kommentare:

Kommentar veröffentlichen