The Ratiocinator

The Origin of Force

2012-07-16T11:29:00.000-07:00

As I have expanded upon lengthily in this previous post, interference is a key phenomenon in quantum theory. In this post, we will see how it can be used to explain the existence of forces between certain objects, using the example of the electromagnetic force in particular.

The usual popular account of the quantum origin of forces rests on the notion of virtual particles. Basically, two charged particles are depicted as 'ice skaters' on a frictionless plane; they exchange momentum via appropriate virtual particles, i.e. one skater throws a ball over to the other, and both receive an equal amount of momentum imparted in opposite directions. This nicely explains repulsive forces, i.e. the case in which both skaters are equally charged. In order to explain attraction, as well, the virtual particles have to be endowed with a negative momentum, causing both parties to experience a momentum change in the direction towards the other. Sometimes, this is accompanied by some waffle about how this is OK for virtual particles, since they are not 'on-shell' (which is true, but a highly nontrivial concept to appeal to for a 'popular level' explanation).

In this post, after the introduction, I will not talk about virtual particles anymore. The reason for this is twofold: first, the picture one gets through the 'ice-skater' analogy is irreducibly classical and thus, obfuscates the true quantum nature of the process, leaving the reader with an at best misleading, at worst simply wrong impression. Second, and a bit more technically, virtual particles are artifacts of what is called a perturbation expansion. Roughly, this denotes an approximation to an actual physical process by means of taking into account all possible ways the process can occur, and then summing them to derive the full amplitude -- if you're somewhat versed in mathematical terminology, it's similar to approximating a function by means of a Taylor series. The crucial point is that the virtual particles are present in any term of this expansion, but the physical process does not correspond to any of those terms, but rather, to their totality. So the virtual-particles analogy can't give you the full picture.

Interference, on the other hand, can, or at least so I believe. In order to make this as self-contained as possible (though I would urge you to read the already-linked previous post), let's briefly review a few facts about interference.

Interference

Interference is a phenomenon present in waves. Waves can interfere constructively, destructively, or in various intermediate ways. In order to determine the interference of two waves, you simply superimpose their graphs, and add the amplitudes in the appropriate places. Consider these three different waves:

Fig. 1: Three waves.

If they interfere, they give rise to this new waveform:

Fig. 2: Wave interference.

As you can see, the waves reinforce one another in some places, and cancel each other out in others. For another picture, consider waves upon a surface of water: where a peak encounters another peak, they create a higher (as high as both peaks put upon one another) peak, while where a valley encounters another valley, there will be a deeper resulting valley; while if a peak meets a valley of an appropriate depth, flat sea may result.

Interference depends on the relative phase between the waves. If pictured on the x-axis, the phase denotes the shift of the waveform:

Fig. 3: Two waves, differing only in their phase.

Plainly, the above two waves, if brought into superposition, will interfere completely destructively. The phase of a wave can be effectively depicted as an angle in the so-called phasor representation:

Fig. 4: Three waves, depicted by means of their phases and amplitudes. (Image credit: wikipedia.)

The angle of each little arrow represents the corresponding wave's phase; its length, the wave's amplitude. Adding the arrows produces a new arrow, representing the phase and amplitude of the wave resulting from their interference.
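
The arrow addition described above can be tried out with complex numbers, which behave exactly like such arrows. The following is only a minimal sketch; the helper names `phasor` and `superpose` are made up for this illustration, and the amplitudes and phases are arbitrary sample values.

```python
import cmath
import math

def phasor(amplitude, phase_deg):
    """Represent a wave as an arrow: length = amplitude, angle = phase."""
    return cmath.rect(amplitude, math.radians(phase_deg))

def superpose(*phasors):
    """Interference is the sum of the arrows; return (amplitude, phase in degrees)."""
    total = sum(phasors)
    return abs(total), math.degrees(cmath.phase(total))

# Two equal waves in phase: the amplitudes add (constructive interference).
in_phase_amp, _ = superpose(phasor(1.0, 0.0), phasor(1.0, 0.0))

# Two equal waves half a cycle apart: they cancel (destructive interference).
opposite_amp, _ = superpose(phasor(1.0, 0.0), phasor(1.0, 180.0))

# A quarter cycle apart: an intermediate case.
quarter_amp, quarter_phase = superpose(phasor(1.0, 0.0), phasor(1.0, 90.0))
```

The half-cycle case corresponds to the two waves of Fig. 3: the arrows point in opposite directions, and their sum has (numerically almost exactly) zero length.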

Now, you can associate a phase to each quantum state. As the state evolves in time, the phase rotates, as in the figure. Thus, quantum states can interfere with one another. This is demonstrated in Young's classical double slit experiment:

Fig. 5: Double slit experiment with phases shown.

I have indicated the evolution of the phase as each state propagates. In the end, the interference pattern is obtained by adding the phases, as shown. At the point where the two sample trajectories meet on the screen, constructive interference occurs, leading to a peak.

Phase Invariance

Another property of quantum mechanics is that it is invariant under arbitrary, global phase changes. Thus, global phase change is a symmetry of the theory. Symmetry is a very important concept in modern physics, perhaps even the most important one. Basically, a symmetry is something you do to a system such that it doesn't change observably.

In this post, we will be concerned with the symmetries of a very simple object, the circle. These are simply the rotations through an arbitrary angle: if I show you a circle, tell you to close your eyes, and then do something to the circle, you will not be able to tell whether or not I have rotated it -- the circle still looks the same. (You can also reflect a circle through an arbitrary axis going through its center, but this will not concern us here.)

Symmetries form a mathematical object called a group. Roughly, this just means that you can compose symmetries in a certain way and have the composition be a symmetry again -- in the present case, subsequent rotation through 90° and 45° is equivalent to a single rotation through 135°. There exists an identity element, e, which corresponds to 'doing nothing' -- rotating through an angle of 0°, if you will. There is also the notion of an inverse element, i.e. an operation that undoes a previous one -- rotating by 45° rightwards, if you have previously rotated through 45° leftwards. Just to be complete, for technical reasons, we also require associativity -- mathematically, this is written as (R₁ • R₂) • R₃ = R₁ • (R₂ • R₃), and doesn't mean anything other than that the grouping doesn't matter: whether we first combine rotations one and two and then combine the result with three, or first combine two and three and then combine one with that result, we end up with the same overall rotation.
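
For the rotations of the circle, these group properties can be checked numerically. The sketch below represents a rotation through an angle as a complex number of length one, which is one common way to write elements of this group; the function names are hypothetical and chosen only for this example.

```python
import cmath
import math

def rotation(angle_deg):
    """A rotation of the circle, written as a unit complex number."""
    return cmath.exp(1j * math.radians(angle_deg))

def compose(r1, r2):
    """Performing one rotation after the other multiplies the numbers."""
    return r1 * r2

def close(a, b, tol=1e-12):
    return abs(a - b) < tol

identity = rotation(0)

# Closure: 90 degrees followed by 45 degrees is a rotation through 135 degrees.
closure_ok = close(compose(rotation(90), rotation(45)), rotation(135))

# Inverse: rotating by 45 degrees one way, then 45 degrees back, does nothing.
inverse_ok = close(compose(rotation(45), rotation(-45)), identity)

# Associativity: the grouping of three rotations does not matter.
r1, r2, r3 = rotation(10), rotation(20), rotation(30)
assoc_ok = close(compose(compose(r1, r2), r3), compose(r1, compose(r2, r3)))
```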

The particular group that describes the symmetries of the circle is aptly named the circle group, or, somewhat more opaquely, U(1).

The phase invariance of quantum mechanics is then encapsulated in the statement that quantum mechanics has a global U(1)-invariance. To see this, let's look at what happens if we change the global phase in the double slit experiment:

Fig. 6: Interference with a global phase shift.

Note that the absolute phases change -- by 90° --, but the relative phase, between the upper and lower path, stays the same. Thus, in the end, the amplitude, which determines the size of the peaks on the screen, is the same as before -- the global phase shift did not change the interference pattern.
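
A small numerical check of this statement, under the simplifying assumption that each path contributes an arrow of unit length: the intensity at a point on the screen is the squared length of the summed arrows, and it only depends on the difference between the two phases. The specific phase values below are arbitrary sample values, not read off the figure.

```python
import cmath
import math

def intensity(phase_upper_deg, phase_lower_deg):
    """Squared length of the sum of two unit arrows (upper and lower path)."""
    upper = cmath.exp(1j * math.radians(phase_upper_deg))
    lower = cmath.exp(1j * math.radians(phase_lower_deg))
    return abs(upper + lower) ** 2

GLOBAL_SHIFT_DEG = 90.0

before = intensity(30.0, 75.0)
# The same shift is applied to both paths, so the relative phase is untouched.
after = intensity(30.0 + GLOBAL_SHIFT_DEG, 75.0 + GLOBAL_SHIFT_DEG)
```

Both values agree up to rounding, whatever global shift is chosen.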

Interlude: The Aharonov-Bohm Effect

Now, we've got the machinery in place to talk about another 'weird' consequence of quantum theory: the Aharonov-Bohm effect, in which the presence of an electromagnetic potential alters the relative phases of quantum states. The setup is a slight modification of the double slit experiment:

Fig. 7: Aharonov-Bohm setup. Note the shifted interference pattern.

Basically, we introduce a solenoid (the circle), idealized to be infinitely long, into the setup. This solenoid introduces a certain electromagnetic potential, denoted A_μ. As a consequence, we observe a shift in the interference pattern. (The effect of the potential A_μ is shown as a change in speed in the evolution of the phases.)

This is quite curious! Classically, one would not expect an effect to occur. This is because the magnetic field B outside of an infinite solenoid is equal to zero (it is only nonzero inside), and thus, there is no influence by the field on the particles going around the solenoid.

I will not, at this point, attempt to provide an explanation of the Aharonov-Bohm effect. We will simply take it as an experimental datum to be incorporated into our theory, and proceed.

Local Phase Changes

After this, let's consider what happens if we change the phase not globally, but locally, i.e. apply our phase shift only to one of the beams. Quantum mechanics is not invariant under such a transformation; it is not a symmetry of the theory. Thus, the interference pattern changes:

Fig. 8: Non-invariance under local phase transformations. Again, note the shifted interference pattern.

But what if, for whatever reason, we want a theory invariant under local U(1) phase changes? Well, we can use the Aharonov-Bohm effect to our advantage, and introduce an appropriate electromagnetic potential such that the effect of the local phase change is exactly cancelled, and the original interference pattern is recovered:

Fig. 9: Introducing an electromagnetic potential restores the original interference pattern.
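
The same kind of toy calculation illustrates the local case. This is a sketch under strong simplifications: each beam is a unit arrow, the local phase change is applied to the upper beam only, and the effect of the potential is modelled as nothing more than an extra phase on that beam (the variable `potential_shift_deg` is a stand-in introduced for this example, not a computed Aharonov-Bohm phase).

```python
import cmath
import math

def screen_intensity(upper_deg, lower_deg):
    """Squared length of the sum of two unit arrows."""
    upper = cmath.exp(1j * math.radians(upper_deg))
    lower = cmath.exp(1j * math.radians(lower_deg))
    return abs(upper + lower) ** 2

LOCAL_SHIFT_DEG = 90.0

# Both beams arrive in phase: a peak.
original = screen_intensity(0.0, 0.0)

# Local phase change on the upper beam only: the pattern changes.
shifted = screen_intensity(0.0 + LOCAL_SHIFT_DEG, 0.0)

# A potential chosen to contribute the opposite phase undoes the change.
potential_shift_deg = -LOCAL_SHIFT_DEG
restored = screen_intensity(0.0 + LOCAL_SHIFT_DEG + potential_shift_deg, 0.0)
```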

Thus, if we want a theory that is invariant under local U(1) transformations, we must necessarily include an electromagnetic potential! And with the potential, we get the rest of electromagnetism, as well. (For the mathematically inclined, the electromagnetic potential I have introduced is a four component quantity which, in SI units and up to sign conventions for the spatial components, reads A_μ = (Φ/c, A₁, A₂, A₃), where Φ denotes the electric (scalar) potential, the negative gradient of which is, in the static case, the electric field E, and A = (A₁, A₂, A₃) denotes the magnetic (vector) potential, the curl of which is the magnetic field B.)

This is an amazing result -- most of the phenomena we experience in everyday life (excluding those related to gravity) are a result of the electromagnetic force: not just obvious things like lighting, TVs and other electric appliances, but also the fact that you can't move through walls or fall through the floor, every optical effect, etc.

That merely postulating local U(1) invariance serves to explain things as majestic and magnificent as rainbows and lightning storms speaks to the great power of the principle.

Test case: Electrostatic Repulsion

To this point, this all probably seems rather abstract: with careful juggling of phase transformations and suitably introduced potentials, one can ensure the invariance of interference patterns on a screen. This does not seem to have much to do with what we ordinarily think of as electromagnetism. So let's consider a paradigmatic example, the electrostatic interaction between two point charges (for instance, electrons) and see how our new understanding sheds light on the matter.

First, another reminder. Previously, I've told you how the speed of the phase change of a particle dictates the path it takes through spacetime. To briefly recap, consider not just a double, but triple, quadruple, and so on, slit experiment, with not just one, but many perforated screens in between.

Fig. 10: Multi-slit experiment.

Clearly, at every point, the amplitudes of all quantum states have to be summed, according to their respective phases. This is the kernel of the path integral formulation of quantum mechanics -- imagine infinitely many slits and infinitely many screens; what you get is empty space, since at every point in space, there will be a slit. Thus, in order to calculate the propagation of a particle through space, you have to sum up the amplitudes to propagate through every point in space, according to the relative phases. In this way, there will emerge a most likely path for the particle to take, and this path will be the classical path -- in empty, flat space, a straight line.

This occurs because the more a path deviates from this classical path, the faster the phase of the particle following that path changes; but this means that on these non-classical paths, the particle grows ever more likely to destructively interfere with itself the more the path deviates from the classical one. Thus, the classical path emerges as the most likely one.

The 'speed' of the particle's phase evolution is given by a quantity known as the action, which depends on the difference between its kinetic and potential energy. The greater the action, the faster the phase rotates, the less likely the path is to be taken. Thus, the principle of least action emerges from quantum mechanics.
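
The cancellation of strongly deviating paths can be seen in a toy model. Everything in the following sketch is an assumption made for illustration: paths are labelled by a single number d, their deviation from the classical path, the action is taken to grow as `stiffness * d**2` (in units where Planck's constant is absorbed), and the arrows of all paths within a window of deviations are added up.

```python
import cmath

def window_amplitude(center, width=1.0, samples=2001, stiffness=50.0):
    """Length of the summed arrows exp(i*S) for deviations d in a window.

    Toy action S(d) = stiffness * d**2; each sample is weighted by the
    step size, so the sum approximates an integral over the window.
    """
    step = width / (samples - 1)
    total = 0j
    for n in range(samples):
        d = center - width / 2 + n * step
        total += cmath.exp(1j * stiffness * d * d) * step
    return abs(total)

# Paths close to the classical one (d around 0): the phase barely changes
# from path to path, so the arrows point in similar directions and add up.
near = window_amplitude(0.0)

# Paths far from the classical one (d around 5): the phase spins rapidly
# from one path to the next, so the arrows largely cancel.
far = window_amplitude(5.0)
```

In this model, the window around the classical path contributes far more to the total amplitude than an equally wide window of strongly deviating paths.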

Now let us look at the aforementioned case of two like charges, say electrons. For simplicity, assume that one, the leftmost, is fixed to its position by some means.

Fig. 11: Two charges. Which of the possible paths will the right one take?

Note that in the figure, time runs upward, space to the right; so any particle stationary in space will follow a straight upwards-directed trajectory in the plane, as indicated for the left charge. Our task is now to determine the path of the second, free, charge from what we have learned so far. Will it be attracted to the first (path (1))? Will it remain stationary, as well (path (2))? Or will it be repelled, and if so, how much (paths (3) and (4))?

To answer this question, we need to take a closer look at the quantity I have previously introduced, the action. For a particle following a certain path between the times t₁ and t₂, the action can be written in the form S = (T - V)(t₂ - t₁), where T and V denote the average kinetic and potential energy along the path, respectively. Plainly, this increases with increasing kinetic energy -- which is a good thing, otherwise, the action could always be minimized by going faster, so everything in the universe would spontaneously accelerate without bound for no good reason. So Newton was on a good track with his first law -- things on which no force acts indeed don't accelerate.

Now for the potential energy. Since we are in an electrostatic setup, we can simply consider the electric potential Φ. An electron in a potential Φ has a potential energy V = -eΦ. Φ gets bigger the closer the two electrons are together; thus, moving them apart lowers the potential, and with it, the action (note the all-important minus sign!). Thus, there is a 'sweet spot' for the action: it will be minimal on paths that take the second electron farther away from the first one -- but not too fast, or otherwise, the kinetic energy gets too large!

Qualitatively, this thus nets us the immediate conclusion that like charges repel. Just as immediately, by exchanging the minus sign I just drew your attention to with a plus, corresponding to replacing the left, negative charge with a positive one (say a positron), we obtain that opposite charges attract.

I should, perhaps, say a few words about what, exactly, I mean by 'charge'. Essentially, the charge of an object sets a 'speed scale' for the rotation of its phase: the higher the charge, the faster the rotation. Thus, more highly charged particles react more strongly to electromagnetic fields (and uncharged ones don't react at all) -- which is as it should be. Charge thus determines the strength of the coupling to the electromagnetic potential.
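
To get a feeling for the action as 'kinetic minus potential energy, accumulated over time', one can evaluate it for a few sample paths. The sketch below is a rough discretization with made-up numbers: a path is a list of positions at equally spaced times, the mass is set to one, and the potential defaults to zero (a free particle), so it only illustrates the kinetic part of the argument above.

```python
def action(positions, dt, mass=1.0, potential=lambda x: 0.0):
    """Discretized action: sum of (kinetic - potential) * dt over the segments."""
    total = 0.0
    for x0, x1 in zip(positions, positions[1:]):
        velocity = (x1 - x0) / dt
        kinetic = 0.5 * mass * velocity ** 2
        midpoint = 0.5 * (x0 + x1)
        total += (kinetic - potential(midpoint)) * dt
    return total

# Two paths between the same start and end points, in the same total time.
straight = [0.0, 1.0, 2.0, 3.0, 4.0]   # constant velocity
hurried = [0.0, 2.0, 3.0, 3.5, 4.0]    # rushes ahead, then slows down

s_straight = action(straight, dt=1.0)
s_hurried = action(hurried, dt=1.0)
```

With no force acting, the path of constant velocity has the smaller action of the two, in line with the remark about Newton's first law.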

Recap

Since we have come quite a way from where we started, it is useful to take a moment and recapitulate what brought us here. We started out with the phase invariance of quantum mechanics. The Aharonov-Bohm effect taught us that changes in local phase, i.e. changes by local U(1) transformations, can be countered by the introduction of an electromagnetic potential. Thus, we obtained a theory invariant under local phase changes. This theory, as we have seen, successfully predicts the basic principles of electromagnetism, such as attraction/repulsion of opposite/like charges. In fact, it is easy to go a step further: the least action principle straightforwardly implies Newton's second law, F = ma; in this particular case, F = -dV/dx = e dΦ/dx = -eE, and thus, the Lorentz force law (here in its electrostatic form) follows. (And since we could have equally well undertaken these considerations for the other charge, Newton's third law follows just as well.) It is remarkable that all this can be deduced from quantum interference and the simple symmetry principle we have introduced!
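
For readers who want to see the step from least action to F = ma spelled out, here is a short sketch in one spatial dimension. It assumes the standard non-relativistic Lagrangian L = T - V and the electron's potential energy V = -eΦ used earlier; the symbols L, m and x(t) are introduced here for the derivation.

```latex
\begin{align}
  S[x] &= \int_{t_1}^{t_2} L \,\mathrm{d}t ,
  \qquad L = \tfrac{1}{2} m \dot{x}^2 - V(x) , \\
  \delta S = 0 \;&\Longrightarrow\;
  \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot{x}}
  - \frac{\partial L}{\partial x} = 0
  \;\Longrightarrow\;
  m \ddot{x} = -\frac{\mathrm{d}V}{\mathrm{d}x} , \\
  V = -e\Phi ,\; E = -\frac{\mathrm{d}\Phi}{\mathrm{d}x}
  \;&\Longrightarrow\;
  m \ddot{x} = e\,\frac{\mathrm{d}\Phi}{\mathrm{d}x} = -eE .
\end{align}
```

The middle line is the Euler-Lagrange equation, i.e. the condition that the action does not change to first order under small variations of the path with fixed endpoints.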

However, using the Aharonov-Bohm effect threatens the argument with circularity: it uses a dependence of the phase on the electromagnetic potential to show that particles react to the electromagnetic potential. Understanding this as an experimental input removes the circularity. Moreover, the argument can be made fully independently, though at the expense of a little math. Basically, in order to ensure invariance under local U(1) transformations, we find that we must modify our notion of differentiation; in order to do so, we must introduce a quantity that has the effect of removing the position-dependence of the phase change from the theory again. Mathematically, this quantity is known as a connection, and physically, it turns out to be just the electromagnetic potential A_μ.

Thus, the argument is complete in itself, and all the input that is needed is the local U(1) invariance.
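
The 'little math' alluded to above can be sketched as follows. The conventions are an assumption on my part (units with ħ = 1, a particle of charge q, and one particular choice of signs; textbooks differ on these), and θ(x) denotes the position-dependent phase angle.

```latex
\begin{align}
  \psi(x) &\to e^{i\theta(x)}\,\psi(x)
  &&\text{(local U(1) transformation)} \\
  \partial_\mu \psi &\to e^{i\theta}\left(\partial_\mu \psi
      + i\,(\partial_\mu\theta)\,\psi\right)
  &&\text{(the ordinary derivative picks up an extra term)} \\
  D_\mu &:= \partial_\mu + i q A_\mu ,
  \qquad A_\mu \to A_\mu - \frac{1}{q}\,\partial_\mu\theta
  &&\text{(modified derivative and compensating shift)} \\
  D_\mu \psi &\to e^{i\theta}\left(\partial_\mu\psi + i(\partial_\mu\theta)\psi
      + i q A_\mu \psi - i(\partial_\mu\theta)\psi\right)
   = e^{i\theta}\, D_\mu \psi .
\end{align}
```

The unwanted term produced by the position dependence of θ is cancelled by the shift of A_μ, so anything built from ψ and D_μψ in a globally invariant way is also locally invariant. The shift of A_μ is exactly the kind of arbitrariness that leaves the electric and magnetic fields unchanged.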

What is A_μ?

Up to now, I have been cagey on the nature of the mysterious quantity we needed to introduce into the theory to ensure invariance. The reason for this is that it's not easy to grasp. Physically, quantities like A_μ are called 'gauge potentials', for historical reasons. The defining characteristic of gauge potentials is that they're arbitrary to a certain degree -- meaning, there exists more than one choice of A_μ that leads to the same electric and magnetic fields, which, after all, are the only things we observe physically. So it seems as if A_μ can be little more than a bookkeeping device, and classically, that is indeed the way it is most commonly treated.

However, as we have already seen, in quantum mechanics, the potential takes center stage. Electrons don't couple to magnetic or electric fields, they couple directly to the potential. Furthermore, in the Aharonov-Bohm setup, an effect is observed even though no particle ever had a non-negligible amplitude to be in any region where the electromagnetic field differs from zero.

For the moment, we will leave the question of the 'reality' of the gauge potential aside, and instead consider its properties. In quantum field theory, each field is accompanied by a particle, corresponding to an excitation of the field: picture the field as a three-dimensional array of points, connected with springs; if you pull on one of the points, and then release it, it will start to oscillate, causing nearby points to oscillate as well, and so on. This excitation corresponds to the particle.

So, what's the particle associated to A_μ? First of all, we note that the electromagnetic interaction appears to be infinite in range. This means that the particle must be massless, due to an elementary consideration involving the energy-time uncertainty principle, ΔEΔt ~ h, where h is Planck's constant. The energy of the particle, given by the relativistic mass-energy relationship, is mc², and thus, it can't exist as an intermediary ('virtual') state for a time longer than h/mc², during which it can propagate a distance of at most hc/mc² = h/mc (the heuristic nature of this argument makes me not worry about factors of 2π and such). This distance grows without bound as the mass m goes to zero. Thus, only massless particles can mediate forces across infinite distances.

Second, it is itself electrically neutral, at least to a high degree of certainty. If that weren't the case, it would couple to itself strongly, leading to novel features, like possibly confinement. (Confinement is a property of the strong interactions, where the force mediating particles -- the gluons -- indeed carry appropriate charges themselves; effectively, its effect is to 'shield' quarks from direct observation. Since we observe electrons directly, we know the electromagnetic interaction is not confining.)

Third, its spin is 1. This essentially follows because A_μ is a vector field; scalar fields (just numbers at certain spacetime points) lead to spin zero particles, 2-tensor fields (like the metric of general relativity, g_μν) have a spin of two, etc.

Thus, the particle, or field quantum, of the electromagnetic potential is massless, chargeless, and has spin 1 (and is thus a boson). That particle is what we normally call the photon. So this, too, comes out of our simple requirement of local U(1) symmetry -- let there be light, indeed.
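
The range estimate used in the first step of this argument can be made concrete with a few numbers. This is an order-of-magnitude sketch only: it uses the reduced constant ħ instead of h (the factor of 2π the argument chose to ignore), the approximate value ħc ≈ 197.327 MeV·fm, and approximate rest energies for two massive particles that are not discussed in the text, the charged pion and the W boson, purely for comparison.

```python
HBAR_C_MEV_FM = 197.327  # hbar * c in MeV * femtometres (approximate)

def force_range_fm(rest_energy_mev):
    """Heuristic range hbar*c / (m*c^2) of a force carried by a particle of this mass."""
    if rest_energy_mev == 0:
        return float("inf")  # a massless carrier has unlimited range
    return HBAR_C_MEV_FM / rest_energy_mev

# Approximate rest energies in MeV (illustrative comparison values).
photon_range = force_range_fm(0.0)
pion_range = force_range_fm(139.57)    # roughly the size of a nucleus
w_boson_range = force_range_fm(80400)  # far smaller than a proton
```

Only the massless case gives an unlimited range; the heavier the carrier, the shorter the reach of the force it mediates.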

Postscript -- Other Forces

Besides the electromagnetic force, there exist (at least) three more forces in nature, which combined serve to explain the full multitude of phenomena hitherto observed (or close to that, anyway). These forces are the two nuclear forces, rather matter-of-factly called 'weak' and 'strong', and gravity. The strong force keeps the nuclei together and thus, ensures the stability of matter; the weak force is responsible for nuclear processes, and among other things, makes the sun shine. Gravity, of course, keeps our feet on the ground -- perhaps the most important of the three.

For the two nuclear forces, a story similar to the one I have just related can be told. The major difference is in the symmetry groups -- the weak force requiring local SU(2) symmetry, which is closely related to the rotations of a sphere in three-dimensional space, while the strong force is described by a SU(3) gauge theory called quantum chromodynamics. Both accounts differ from the simple one I have given in this post in some ways: one difference is that the symmetry transformations no longer commute, i.e. doing one transformation, then the other, is not the same as doing them in reversed order. This is not that unusual: rotations in three dimensions behave the same way. Take any object, such as a book, and rotate it first about one axis, then about another; put it back into its original state, and do the rotations in reverse order, and you will generally get a different final orientation.

Both forces also have their own quirks. For instance, there is, strictly speaking, no theory of the weak interaction on its own; instead, it is described in a unified way together with the electromagnetic interaction, via the so-called electroweak theory, based on the direct product of the groups SU(2) and U(1), SU(2) x U(1) -- we speak of a unified theory in this case. In order to get the usual phenomenology from the theory, this symmetry has to be broken, which is the main job of the recently-discovered Higgs field (the mass-giving aspect comes in almost accidentally). An added complication is that the U(1) responsible for electromagnetism is not the straightforward one in SU(2) x U(1), but actually a different subgroup.

The strong force, on the other hand, shows the peculiar phenomenon known as confinement -- since its gauge bosons carry the charges appropriate to the theory themselves (conventionally denoted as the three colors red, green, and blue), they conspire, via self-interaction, to hide the explicit workings on the quark level from view, making only color-neutral ('white') states observable.

Apart from these differences, though, the conceptual structures of these theories are as described in this post, and their conceptual similarity has led them to be subsumed in one immensely successful meta-theory called simply the standard model, based on the gauge group SU(3) x SU(2) x U(1). This is, however, not a genuine unification in the way the electroweak theory is; thus, the quest for a more fundamental, and perhaps theoretically more appealing so-called grand unified theory (GUT), the basic strategy for which is to embed the standard model group in some larger group, such as SU(5), SO(10) or the exceptional E(6), is ongoing.

Gravity, on the other hand, seems much more reluctant to join into the reign of the other forces. Our understanding of gravity is essentially macroscopic, and so far, it has resisted all attempts to be brought in line with the quantum paradigm. While there are tantalizing similarities with the gauge structure of the other forces, every attempt to join them on traditional terms so far is either inconclusive, or has ended in outright failure. Thus, the quest for quantum gravity -- and the even larger endeavor to unify all the forces into one single theory of everything -- remains today unfulfilled, and the greatest challenge for physics.
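
The book experiment mentioned above, for rotations that do not commute, can also be carried out on paper, or in a few lines of code. In this sketch, quarter turns about the x and z axes are written as 3x3 matrices with whole-number entries; the choice of axes and angles is arbitrary and only meant as an example.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Quarter turns (90 degrees) about the x axis and about the z axis.
ROT_X = [[1, 0, 0],
         [0, 0, -1],
         [0, 1, 0]]
ROT_Z = [[0, -1, 0],
         [1, 0, 0],
         [0, 0, 1]]

# The rotation applied first stands on the right of the product.
x_then_z = matmul(ROT_Z, ROT_X)
z_then_x = matmul(ROT_X, ROT_Z)

rotations_commute = (x_then_z == z_then_x)
```

The two products differ, so the order matters. For rotations of a circle, by contrast, the order never matters, which is one reason the U(1) case treated in this post is the simplest one.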

The Interpretation of Quantum Mechanics

2012-01-16T09:26:00.000-08:00

So far on this blog, I have argued that quantum mechanics should be most aptly seen as a generalization of probability theory, necessary to account for complementary propositions (propositions which can't jointly be known exactly). Quantum mechanics can then be seen to emerge either as a generalization (more accurately, a deformation) of statistical mechanics on phase space, or, more abstractly (but cleaner in a conceptual sense) as deriving from quantum logic in the same way classical probability derives from classical, i.e. Boolean, logic.

Using this picture, we've had a look at how it helps explain two of quantum mechanics' most prominent, and at the same time, most mysterious consequences -- the phenomena of interference and entanglement, both of which are often thought to lie at the heart of quantum mechanics.

In this post, I want to have a look at the interpretation of quantum mechanics, and how the previously developed picture helps to make sense of the theory. But first, we need to take a look at what, exactly, an interpretation of quantum mechanics is supposed to accomplish -- and whether we in fact need one (because if we find that we don't, I could save myself a lot of writing).

Fundamentally, this is a question of what one expects of a physical theory, i.e. what one thinks a physical theory is, and what it supposedly provides. Broadly, one can identify two strands of answers in regard to this question: one is realist, basically the conception that a theory tells us something about things that are out there in the world, that really exist, in some sense; the other is anti-realist or perhaps non-realist, a position which maintains that either there is no such thing as an 'out there', or that its existence is wholly immaterial to physical theory -- science mainly concerns itself with what we can say about nature, not with how nature actually is.

This latter position is also known as instrumentalism, and, when it comes to quantum mechanics, it is most closely associated with the figure of Niels Bohr. Basically, it amounts to the position that physical theories in general, and quantum mechanics especially, should be treated as a kind of black box, into which one can input a precise statement of a physical problem, such as an experimental setup, and which then outputs the expectation for the experiment's outcome. This is a consistent and in principle adequate point of view that one always can take recourse to; conceptual problems raised by any given theory may essentially be treated as of little consequence, as the theory is only an artifice, a construction to relate initial conditions to observable outcomes, where no ontological weight is put on the in-between machinery.

Nevertheless, if one expects explanations, as opposed to mere predictions, from physical theories, an answer not only to what happens, but also to how it happens, and perhaps even to why, then this point of view strikes me as deeply inadequate. In particular, the correctness of any given theory becomes an article of faith alone: even though your black box has output the right predictions a thousand times in a row, one does not have license to infer that it will do so again the one-thousand-and-first time, whereas with a theory that explains what happens through explaining how it happens, one can stake one's faith at least in the presumption that if the mechanism is correct, i.e. the answer to the how is in some way a faithful representation of how things actually happen, then the prediction should be expected to be correct any given time. There is also a more aesthetic flaw -- on an instrumentalist reading, correct physical theories do not alleviate our ignorance about the world, they in fact only compound it: not only do we not know how nature works, we also don't know why a certain theory appears to describe the outcome of experiments as well as it does.

Thus, it strikes me as far more desirable to find an account for a physical theory that explains its correctness in terms of its relation to nature, i.e. to what 'actually' happens. Nevertheless, I find myself compelled to use scare quotes whenever talking about things 'actually' happening or being 'really there'. The reason for this, as I have previously argued, is that I don't necessarily believe that there is a unique matter of fact to what 'actually' happens: there may be different accounts, different models, or gauges, if you will, that lead to the same observable reality (to reiterate an example I used before, consider the way the knight moves in a game of chess: there are many different accounts one can give of the rule -- one straight, one diagonally; in the form of an L; two straight, one straight in an orthogonal direction, etc. --, which all lead to the same game of chess). Ultimately, this is because of computational equivalence: different computations yield the same output, indeed any universal computation device can be used to emulate, or can be seen as, any other, and thus, computes all and only the same things.

Be that as it may, this puts me in the position of trying to understand what quantum mechanics is trying to tell us about the world -- which is to interpret it.

The Schizophrenic Quantum Picture

The main problem in the interpretation of quantum mechanics is known as the measurement problem. Broadly, it can be formulated as the problem of how a theory with all the apparent fuzziness and vagaries of quantum mechanics can account for our experience of a determinate, definite reality, or even for the specific outcomes every measurement generates.

More specifically, it can be seen as the tension between two ways quantum systems seem to evolve: one is the unitary evolution dictated by the Schrödinger equation; the other is the sudden 'collapse' of this description to settle on a definite answer to a question posed in the form of measurement.

To make this more explicit, recall the notion of superposition: a quantum system may be not merely in either of two dichotomic states, but also in an arbitrary linear combination of them. So if |0⟩ and |1⟩ denote two possible states for a qubit, then its general state can be written as |ψ⟩ = α|0⟩ + β|1⟩, where α and β are in general complex coefficients such that |α|² + |β|² = 1, and where |α|² and |β|² give the probability that experiment will find the qubit in the state |0⟩ or |1⟩, respectively (see also the entry on interference). (In the following, I will not bother with carrying these factors through; sticklers may freely introduce them as needed, or multiply everything with 1/√2 to ensure equal probability and normalization.)
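
For those who do want to carry the factors through, the bookkeeping is simple enough to sketch in code. The function names below are invented for this illustration, and the coefficients are arbitrary sample values.

```python
import math

def normalize(alpha, beta):
    """Rescale two complex coefficients so that |alpha|^2 + |beta|^2 = 1."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    return alpha / norm, beta / norm

def probabilities(alpha, beta):
    """Probabilities of finding the qubit in |0> and |1>, respectively."""
    return abs(alpha) ** 2, abs(beta) ** 2

# The unnormalized state |0> + |1>: multiplying by 1/sqrt(2) gives equal odds.
equal_p0, equal_p1 = probabilities(*normalize(1, 1))

# A complex phase on a coefficient does not change the probabilities.
phase_p0, phase_p1 = probabilities(*normalize(1, 1j))

# Unequal weights: 3|0> + 4|1> becomes 0.6|0> + 0.8|1>.
skew_p0, skew_p1 = probabilities(*normalize(3, 4))
```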

The problem with a state like |ψ⟩ is that whenever we undertake a measurement, what we actually find is either |0⟩ or |1⟩, and that within the usual quantum dynamics, there is no process that can account for this.

To see this, consider that there is good reason to believe that quantum mechanics ought to apply to all physical systems equally. The simplest argument is that ultimately, everything is made out of particles and quanta, and there is no rule so that if a system reaches a critical size, it fails to be described by quantum mechanics (though there are attempts to introduce just such a rule to account for the problems). So whatever we use to measure a quantum system should itself be describable by quantum mechanics -- and thus, the interaction between the measuring apparatus and the system ought to be no different from any other quantum interaction. So if we write the state of the measuring apparatus pre-measurement as |ready⟩, and we make a measurement on a system in the (non-superposed) state |0⟩, then we ought to expect, if the apparatus is faithful, i.e. always indicates the right state after the measurement, an evolution like the following:

|ready⟩|0⟩ → |"0"⟩|0⟩,

where |"0"⟩ just means the state of the apparatus that indicates that it has measured the qubit to be in the state |0⟩, say by having a certain light lit, or a pointer in a certain position. Similarly, if the qubit's state is in fact |1⟩, then the evolution should be:

|ready⟩|1⟩ → |"1"⟩|1⟩,

where again |"1"⟩ means that the measurement apparatus is in a certain state that indicates the outcome of its measurement was to find the qubit in the state |1⟩. So far, this is all well and good.

There is a notion here that I should introduce, known as the eigenvalue-eigenstate link. This means nothing else than that a system can only be said to have a certain property if it is in an eigenstate of having that property; thus, for a qubit to have a certain value, it needs to be in an eigenstate of having that value, i.e. either |1⟩ or |0⟩ for a value of 1 or 0 respectively. Thus, if it is in a superposed state, it does not have any definite value.

But now, consider the qubit to be in the superposed state |ψ⟩ = |0⟩ + |1⟩, where as mentioned above I have neglected to normalize the state. Now, knowing that the quantum dynamics is linear, the evolution for the total state is given by the combination of the evolutions of the two components as described above. Thus, we have:

|ready⟩(|1⟩ + |0⟩) = |ready⟩|1⟩ + |ready⟩|0⟩ → |"1"⟩|1⟩ + |"0"⟩|0⟩.

But notice what's now happened: if |"1"⟩ indicates a definite state of the measurement apparatus, and likewise does |"0"⟩, then the above state is one in which the apparatus is in neither of those states -- it fails to have a definite state at all, and is in a superposition of being in the states indicating having measured the qubit as |1⟩ and having measured it as |0⟩, corresponding to the superposition the qubit itself is in.
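
The linearity argument can be sketched in a few lines of Python. This is a toy model under stated assumptions: a state is a dict from (apparatus, qubit) labels to amplitudes, and the functions measure and is_product are hypothetical helpers written for this illustration.

```python
def measure(state):
    """Apply the faithful-measurement map linearly to a state.

    A state is a dict mapping (apparatus, qubit) labels to amplitudes.
    On basis states the map sends ('ready', q) to ('"q"', q); linearity
    means the same rule is applied term by term.
    """
    out = {}
    for (apparatus, qubit), amp in state.items():
        if apparatus != "ready":
            raise ValueError("apparatus must start in the ready state")
        key = ('"' + qubit + '"', qubit)
        out[key] = out.get(key, 0) + amp
    return out

def is_product(state):
    """True if the state factorizes as (apparatus state) x (qubit state).

    For amplitudes c[a, q] this holds exactly when every 2x2 minor
    c[a,q]*c[a2,q2] - c[a,q2]*c[a2,q] vanishes.
    """
    apps = sorted({a for a, _ in state})
    qubits = sorted({q for _, q in state})
    c = lambda a, q: state.get((a, q), 0)
    return all(
        abs(c(a, q) * c(a2, q2) - c(a, q2) * c(a2, q)) < 1e-12
        for a in apps for a2 in apps for q in qubits for q2 in qubits
    )

definite = measure({("ready", "0"): 1})
superposed = measure({("ready", "0"): 1, ("ready", "1"): 1})
print(definite)
print(superposed)
print(is_product(definite), is_product(superposed))  # True False
```

The second result cannot be written as one apparatus state times one qubit state, which is the formal counterpart of saying that the apparatus has no definite state of its own.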

However, that's very different from our experience with actual measurements -- where no matter what state the qubit is in, we always get a definite outcome, with the measurement apparatus exclusively ending up in the state of either |"1"⟩ or |"0"⟩. Plainly, the linear quantum dynamics does not account for this. What to do?

The first, and perhaps most obvious, attempt at fixing this situation was to introduce, besides the usual linear dynamics, another, second dynamical process, the so-called collapse dynamics. According to this idea, the description above, upon measurement, probabilistically 'collapses' to one of its components, with a probability given by the usual squared-modulus rule. That is, somehow nature picks out one of the components of the superposition as the 'real' one, and discards the other(s).
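
For comparison, the collapse rule itself is easy to simulate. The following is a minimal sketch, assuming that a state is a dict from outcome labels to amplitudes; the function collapse and the labels are illustrative, and the pseudo-random generator merely stands in for whatever it is that nature is supposed to do.

```python
import random

def collapse(state, rng=random):
    """Pick one component of a superposition with Born-rule probability.

    `state` maps outcome labels to (possibly unnormalized) amplitudes.
    Returns the chosen label and the post-collapse state.
    """
    labels = list(state)
    weights = [abs(state[label]) ** 2 for label in labels]
    chosen = rng.choices(labels, weights=weights, k=1)[0]
    return chosen, {chosen: 1}

rng = random.Random(0)  # fixed seed so that runs are repeatable
state = {'"1"|1>': 1, '"0"|0>': 1}
counts = {label: 0 for label in state}
for _ in range(10_000):
    outcome, _post = collapse(state, rng)
    counts[outcome] += 1
print(counts)  # roughly 5000 each
```

Every single run ends in a definite state, and the frequencies follow the squared-modulus rule; what the sketch cannot supply is a reason why any particular run came out the way it did.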

This process is sufficient to explain the above conundrum -- the measurement apparatus, along with the qubit, always ends up in a probabilistically determined, definite state. But it has its own severe shortcomings.

First of all, there is no hard-and-fast rule that determines which interactions, exactly, are supposed to count as 'measurements', and which aren't -- so that there is a certain degree of arbitrariness regarding which dynamical rule to apply in what case. This leads to situations of a somewhat paradoxical character: for instance, if the measurement apparatus and the system it measures are both quantum systems as stipulated, I should be able to describe the evolution of the combined system using the ordinary linear quantum evolution, as long as I don't execute any measurement on this system; but this seems flatly inconsistent with the need of applying the collapse dynamics within the system as the measurement apparatus measures the object system.

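This leads to, for instance, the famous thought experiment known as Schrödinger's cat: a cat is, probably in gross violation of animal rights, put in a box, along with a devious mechanism consisting of a quantity of radioactive material, a detector, and a vial of poison such that, if an atom decays within the radioactive substance within some fixed time span, say an hour, the detector triggers a mechanism that breaks the vial and kills the cat. If the atom is in a superposition of having decayed and not having decayed, the linear dynamics carries that superposition over to the detector, the vial, and finally the cat, which ends up in a superposition of being dead and being alive.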

The argument is often given that the cat, the detector, or 'the environment' constitutes an observer executing a measurement, leading to the collapse of the wave function; but this is in fact not sufficient, since I, standing outside the system, careful not to accidentally measure it, should be able to apply the usual linear dynamics to account for the system as a whole. (Of course, in reality, I would almost immediately end up 'accidentally measuring' it, as the macroscopic nature of the system means it will develop correlations with the environment, and in turn, me, extremely rapidly.)

Another problem is that it is hard to make such a formulation consistent with the special theory of relativity, which allows influences to propagate no faster than c, the speed of light in vacuum. The conflict exists because, for instance, when one performs a position measurement on an electron, the wave function goes to zero instantaneously at every point in space, except that at which the electron was indeed found.

However, a more troubling problem for me, philosophically, is the indeterminacy the collapse introduces into the description, though I seem to be somewhat alone in my worry there. Basically, the linear dynamics of the usual quantum evolution are perfectly deterministic -- which especially entails that there exists sufficient reason for any state of the system, given by the prior state of the system and whatever it interacts with. But in the collapse, this principle of sufficient reason (due to our old friend Leibniz) fails: there is nothing that determines which state a superposition collapses to; the collapse dynamics thus introduces indeterminism into the description. This is deeply troubling to me, because it implies that some things just can happen without reason; there is no use in further questions. There is no more 'why', no matter of fact regarding why the wave function collapsed to this particular state rather than another. It just happened that way.

This seems problematic to me for two reasons. First, if there is no answer to why something happens a certain way, I can't see a way how it could possibly happen -- there can't be any mechanism according to which it happens, as such a mechanism would ultimately determine the outcome, would give an answer to why by answering how. There must be some decision in some sense, for one possibility over another -- as otherwise the issue would remain undecided --, and yet, there can't be any process by which that decision is reached.

Second, it just seems to defeat the purpose of the whole scientific endeavor -- we've reached a point beyond which there is no more answer other than 'it just happens that way'; but then, we might as well never have started to go down that path, and just look at every phenomenon and explain it by 'it just happens that way'. It just happens that way that planets orbit the sun on ellipses; it just happens that way that like charges repel; it just happens that way that quantum particles can produce interference effects. If that is an acceptable answer anywhere, it should be an acceptable answer everywhere.

So to me, the collapse can't be the right answer to the measurement problem -- even if it can be made to work consistently, the price to pay just seems too high.

The Universal Wave Function

Another problem with the collapse (or rather, the extension of an already mentioned one), to me, seems to point the way to a better resolution. If quantum mechanics is a theory that truly applies to all physical systems, then it should also apply to the universe as a whole. But if only measurement causes the collapse of the wave function, then, since nothing measures the universe -- the universe being all there is --, the wave function of the universe can never collapse, but is described exclusively by the linear 'dynamics' (though the question of what, exactly, 'dynamics' might mean when considering the universe as a whole is not easily answered). But then we have the same situation as we had with me, as an outside observer, describing the 'cat in a box'-system using the linear dynamics, while an 'inside' observer, such as the cat or the detector, might want to use the collapse dynamics in order to describe his apparently determinate experience.

One should note that there is an in principle measurable difference between a system in which the collapse has already occurred, and a system still in superposition, so that I could in principle undertake a measurement that tells me whether the cat-in-the-box system is in a superposition (and thus, contains a cat for which there is no definite matter of fact of whether it is alive) or not -- so it's not the case that both descriptions give the same answer; in fact, they are inconsistent.

But if it is then right that the universal wave function never collapses, we are led to consider a point of view in which no collapse ever occurs. This is the position of Hugh Everett III, and at first, it must seem like utter nonsense, as it appears to manifestly fail the requirement of providing an explanation for the appearance of a determinate world out of the quantum description -- because whatever is in superposition, on this account must stay in superposition, and we should thus generally fail to have any determinate experiences at all.

Nevertheless, in his 1957 doctoral dissertation, Everett proposes to do exactly that: derive the appearance of a collapse from the linear dynamics of quantum mechanics. The problem is: it is never made exactly clear how this is supposed to work. Some claim that the different components of the universal wave function should be understood to be distinct universes or worlds, which split apart from each other every time a supposed 'collapse' happens; others think the split occurs only on the level of the 'minds' of an observer; and yet others just flatly deny the existence of any objective facts, arguing that any fact can only be relative -- I measure spin up relative to the particle being spin up; the cat is dead relative to the atom being decayed. I will not enter deeply into the field of Everett exegetics here; rather, I will just mention some general problems faced by all Everettian interpretations, and then focus on a specific approach that I find most interesting.

In looking for an apparent collapse, a couple of features stand out: the first is the appearance of a definite experience, which brings with it the continuity of said experience (a wave function, once collapsed, will yield the same results on repeated measurements), and the intersubjective agreement on this experience (if I measure the particle to be spin up, so will you); the second, somewhat more subtle, is the production of entropy. This is because the collapse is a non-information-preserving process: once the wave function has collapsed to a certain state, that state does not contain enough information to reconstruct the previous state -- many different states can collapse to one and the same final state. By contrast, the linear dynamics is completely information-preserving (what physicists call 'unitary'), and thus, in particular, deterministic and reversible.

Ways To Slice The Quantum Cake

The intention of Everett, arguably, was to show that while objectively, no wave function collapse ever occurs, subjectively, things may well appear as if it did. In particular, if we model an observer as a quantum system who looks at the measurement apparatus (is in the state '|looking⟩'), from our above considerations, the evolution would be the following, if a superposed system is measured:

|looking⟩|ready⟩(|1⟩ + |0⟩) → |1!⟩|"1"⟩|1⟩ + |0!⟩|"0"⟩|0⟩

The observer will thus evolve to a state with both components of |1!⟩ and |0!⟩, where for instance |1!⟩ means 'sees the outcome of the measurement to be 1'. He thus would determinately believe to have seen either state, and subjectively, it would appear to him as if a collapse had occurred (since it appears to him that way if he ends up in the state |1!⟩, as well as if he ends up in the state |0!⟩, by the linearity of the dynamics, it must appear to him that way in any superposition of these states).

So let us at least provisionally grant Everett that he indeed accomplishes this. An urgent question remains: why does the observer see the world he does? The above decomposition of the quantum state is not unique; one can write it in a different basis, which may entail a very different picture of the world.

In general, an arbitrary quantum state |ψ⟩ can be written as a linear combination of basis states: |ψ⟩ = Σ_i c_i|ψ_i⟩, where the c_i are complex coefficients. Thus a qubit state can, as we have already done above, be written as |ψ⟩ = 1/√2|1⟩ + 1/√2|0⟩, where I have explicitly reinstated the coefficients, 1/√2 in both cases. However, I can introduce a different basis, |+⟩ = 1/√2|1⟩ + 1/√2|0⟩ and |-⟩ = 1/√2|1⟩ - 1/√2|0⟩, from which I can just as well construct every possible qubit state. And the superposed qubit state from before, written in the new basis, can now simply be expressed as |ψ⟩ = |+⟩ -- manifestly not a superposed state!
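
The change of basis can be checked numerically. This is a minimal sketch; the tuple representation (amplitude on |1⟩ first, then amplitude on |0⟩) and the helper names inner and components are assumptions made for the illustration.

```python
import math

s = 1 / math.sqrt(2)

# Vectors as (amplitude on |1>, amplitude on |0>), following the text's ordering.
one, zero = (1, 0), (0, 1)
plus = (s, s)
minus = (s, -s)

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def components(psi, basis):
    """Expansion coefficients of psi in an orthonormal basis."""
    return [inner(b, psi) for b in basis]

psi = (s, s)  # 1/sqrt(2)|1> + 1/sqrt(2)|0>
in_computational = components(psi, [one, zero])
in_plus_minus = components(psi, [plus, minus])
print(in_computational)  # two equal amplitudes, about 0.707 each
print(in_plus_minus)     # about 1 and 0: a single term
```

The same vector has two nonzero coefficients in one basis and only one in the other, so 'being superposed' is a statement about a state relative to a basis, not about the state alone.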

So it seems that, again, we can tell two equally valid, but apparently contradictory stories. In one, the qubit, and hence, the measurement apparatus and the observer are superposed -- in the many-worlds picture, there has been a split into two 'copies' of each, differing with respect to 'seeing 1' or 'seeing 0' as measurement result --, while in the other, there is no superposition, and there's a unique observer in a definite state, so no split has taken place.

Of course, in a measurement, it is ultimately the measuring apparatus that defines the basis (for instance, through its orientation in space), and since we're (as observers) the ultimate buck-stops-here measuring devices, that means us; so if we don't ask why we are the way we are, we can postulate a basis defined through us that solves the problem (in the literature known as the 'preferred basis'-problem). In this sense, it would be our point of view that determines the basis, and thus, the way we see the world.

But a better answer is possible. In order to understand this, we must first realize that ultimately, every realistic quantum system is open -- i.e. there is always interaction with an environment not taken to be part of the experimental setup. This environmental interaction introduces decoherence, the (apparent) loss of quantumness: decoherent states can no longer interfere, and thus, behave like systems governed by classical probability theory. Effectively, the interaction with a large system, such as the macroscopic environment (which may include measuring devices, cats, humans...) greatly increases the total number of states available to the total system; but the capacity of two states to interfere is described by their overlap in the state space, and with the increased number of states, that overlap will tend to very small values very quickly.
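
How quickly the overlap shrinks can be seen in a toy calculation. The model below is an assumption made purely for illustration: each environment particle ends up in one of two states, depending on the state of the system, and these two states differ by a fixed angle, so that the single-particle overlap is cos(angle). Since overlaps of product states multiply, the total overlap falls off exponentially with the number of particles.

```python
import math

def environment_overlap(n_particles, angle=0.3):
    """Overlap <E_0|E_1> of two n-particle product states.

    Toy assumption: the single-particle overlap is cos(angle) for every
    particle, and overlaps of product states multiply.
    """
    return math.cos(angle) ** n_particles

overlaps = {n: environment_overlap(n) for n in (1, 10, 100, 1000)}
for n, value in overlaps.items():
    print(n, value)
```

Even with a single-particle overlap above 0.95, a thousand particles suffice to push the total overlap far below anything measurable, and a macroscopic environment contains vastly more than a thousand particles.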

However, decoherence doesn't treat all states equally. Some states very quickly evolve into mixtures of one another -- a sentient being in such a world would not have time to perceive any given state of the world; there would be no basis for perception, or cognition, in such a reality. But certain kinds of states, so-called 'pointer states' (because they correspond to measurement devices, whose pointers indicate a certain outcome), are more robust to such environmental interactions. These states can then be used to construct a preferred basis, in which a classical reality emerges -- objects are well-localized, interference effects (almost) vanish, etc. This process has been given the nickname 'einselection' (from environment induced superselection) by Wojciech Zurek.
Here, the quantum cake slices itself, so to speak -- a point of view, and with it, a way to view the world, emerge jointly and dynamically. In some sense, the observer and the observed determine one another.

Decoherence is then a mechanism that may lead to the appearance of wave function collapse in Everettian interpretations, by essentially removing different branches from one another through precluding their mutual interference. And indeed it can generate the entropy production we have surmised is necessary to give the appearance of a collapse: the information is dissipated into the environment; the loss of coherence is an irreversible process, though only effectively so -- a being with perfect knowledge of and absolute control over all degrees of freedom of both the system and the environment could reconstruct the original state from the final one. However, decoherence only accounts for the emergence of definite experiences within a framework such as the many-worlds interpretation -- while it causes the quantumness of the system to 'leak' into the environment, the global wave function is still in superposition. Thus, contrary to what is sometimes claimed, it does not on its own solve the measurement problem.

Everything's (Im)probable

Another problem that's facing any Everett-like interpretation is the so-called problem of probability. In a nutshell, probability, as usually understood, is a measure of how much we should expect some event to occur, to the exclusion of other, incompatible events -- this understanding of probability then only makes sense if one thing happens, rather than another. But without a collapse, in a measurement, all possible alternatives do, in fact, occur, as terms in the global superposition. What can we then mean by the probability of getting a certain outcome?

Or, for another take on the problem, Everettian quantum mechanics is a deterministic theory; thus, the only way for probability to arise is through ignorance. However, one can in principle know the complete state of a quantum system -- by simply preparing it in the appropriate state -- and nevertheless be only able to predict the outcome of certain experiments in a probabilistic way. But if we've got total knowledge, and the theory is deterministic, we should be able to give exact predictions!

One response to this problem has been given by Lev Vaidman. He considers a setup in which the experimenter is given a sleeping pill before the experiment; depending on the outcome, he will then, in his sleep, be moved to one of two different rooms. Upon waking up, but before he opens his eyes, he will be asked: 'In which room are you?'

Clearly, he can't definitively answer this question -- all he can do is to calculate the probability of being in either room. Thus, the observer is in fact ignorant about the state of the world, and the interpretation of probability as arising due to ignorance is restored.

However, to precisely quantify the probability, one still has to postulate the usual Born rule, perhaps bolstered by an interpretation for the probabilities in which they give the 'weight' or 'measure of existence' of distinct branches, or worlds. Other approaches, most notably the one by Deutsch, expanded upon by Wallace, attempt to even derive the specific form of the Born rule from the linear dynamics -- in particular, they adopt a decision-theoretic approach, showing that expecting future events according to the probabilities given by the Born rule is the most rational approach.

This has a certain subjective character, and thus, may worry some who think that physics should be concerned with objective truths about the world; but I think to the contrary, it's a step in the right direction, that however does not go quite far enough. The reason for this is, while a physical theory may well pertain to objective reality, what it ultimately must explain is our experience within that reality -- which is necessarily subjective. I have previously pointed to the example of a rainbow to illustrate this: there is no actual 'thing' corresponding to the rainbow in the outside world; it is entirely a product of how we perceive the world, and thus, in particular, is different for different observers. Nevertheless, a theory that doesn't explain rainbows would be incomplete.

Taking the Inside View

Thus, I believe the only way to fully understand quantum mechanics is to view it from the inside. A good starting point -- since our aim is still to deduce the apparent collapse of a superposition from the linear dynamics -- would be to investigate what a superposition looks like, if viewed from inside. What is it like to be superposed? How does it feel?

These may not be the questions science usually asks, but I believe they are necessary; ultimately, if science is to explain our experience, it must answer these kinds of questions, since the way things feel, what things are like to us, are precisely what constitutes our experience.

To be able to make progress on this issue, however, we need a model for how we get to know our own state (of mind). How do we know how something feels to us? As I have previously argued, the most straightforward model for such introspection is just asking questions of yourself.

So, let us go back to our observer, observing a qubit (in order to avoid an unnecessary proliferation of terms, I will suppress the state of the measurement apparatus, and pretend the observer could somehow observe the qubit 'directly'). Let's first say the qubit is in the definite state |1⟩. The observer looks at the qubit, discovers its state, and then asks himself whether or not he got any definite result.

|Definite?⟩|looking⟩|1⟩ → |Definite?⟩|1!⟩|1⟩ → |Yes!⟩|1!⟩|1⟩

The observer is in the state |1!⟩ of having observed 1, and correctly concludes that he is in a definite state. The same works for the state |0⟩ of the qubit:

|Definite?⟩|looking⟩|0⟩ → |Definite?⟩|0!⟩|0⟩ → |Yes!⟩|0!⟩|0⟩

Now let's look at the case of a superposed qubit. The first step works just as before:

|Definite?⟩|looking⟩(|1⟩ + |0⟩) → |Definite?⟩(|1!⟩|1⟩ + |0!⟩|0⟩)

The observer enters into a superposition of observing 1 and observing 0. But what if he now asks himself: 'Have I observed a definite value of the qubit?', or equivalently: 'Am I in a definite state of observing a value of the qubit?' Because of the linearity of the dynamics, the following happens:

|Definite?⟩(|1!⟩|1⟩ + |0!⟩|0⟩) → |Yes!⟩|1!⟩|1⟩ + |Yes!⟩|0!⟩|0⟩

Even though the observer is 'in fact' in a superposed state, if he asks himself if he has observed a definite outcome, he will conclude that yes, he has -- he is in an eigenstate of experiencing a definite result, so to speak.
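
The two steps above can be traced in a toy model. This is a sketch under stated assumptions: a state is a dict from (belief, observer, qubit) labels to amplitudes, and look, introspect and belief_is_definite are hypothetical helpers that simply apply the rules from the text term by term.

```python
def look(state):
    """Observer looks at the qubit: the record becomes 'q!' in each term."""
    return {(belief, q + "!", q): amp for (belief, _obs, q), amp in state.items()}

def introspect(state):
    """Observer asks 'Definite?': any record 'q!' produces the answer 'Yes!'."""
    out = {}
    for (_belief, obs, q), amp in state.items():
        answer = "Yes!" if obs in ("0!", "1!") else "Huh?!"
        out[(answer, obs, q)] = amp
    return out

def belief_is_definite(state):
    """True if every term carries the same belief label, i.e. the belief
    register factors out and is in an eigenstate of that answer."""
    return len({belief for belief, _, _ in state}) == 1

start = {("Definite?", "looking", "1"): 1, ("Definite?", "looking", "0"): 1}
final = introspect(look(start))
records = {obs for _, obs, _ in final}
print(final)
print(belief_is_definite(final))  # True
print(len(records))               # 2: the record itself is not definite
```

The belief about definiteness is the same in every term, while the record of what was seen differs from term to term.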

Now, the usual interpretation of this is that he 'mistakenly' believes himself to be in a definite state, since actually, he isn't. But it seems to me that this is a lot like mistakenly believing that one has a migraine -- it is just indistinguishable from the real thing, since in our minds, the subjective beliefs are the only real things we have (in particular, the apparent migraine would hurt just as much). So I would prefer to interpret this as leading to the emergence of the appearance of a definite experience (and thus, to a definite experience): even though 'underneath' the level of our access, everything is a chaotic muddle of superpositions, at a higher level, a few islands of definiteness stand out -- such as the invariable belief of experiencing a definite outcome. That ultimately, things are not really that way does not play a greater role than that ultimately, there is no rainbow out there.

Thus, being in a superposition feels exactly like being in a definite state; subjectively, i.e. with regards to our experience, there is nothing to tell them apart. If then the impression of being in a definite state is one of the hallmarks of an apparent collapse, the usual linear dynamics, viewed from the inside, produces exactly this.
However, this might seem like just a parlor trick at first blush. Surely, the mere impression of being in a definite state can't lead to the richness of experience, of determinate experience, we receive through our interaction with the world?
And yet, more or less this is actually what I wish to argue for. First, let us tease out some more consequences of this crazy idea (which David Albert, who introduced it in his book Quantum Mechanics and Experience under the name 'the bare theory', called 'amazingly cool').
One particular consequence of the occurrence of a collapse is that if I repeat the same measurement, I will with certainty get the same result. If we believe an actual collapse has occurred, this is easily explained: the system now actually is in the state it collapsed to, and thus, a repeated measurement simply re-detects that state. When a collapse dynamics is absent, though, this agreement requires explanation. In many-worlds theories, this explanation is provided by the stipulation that the observer is now in a certain world, or branch, associated with a specific measurement outcome, and thus, a specific state of the system; so again, a repeated measurement only confirms this fact.
However, in the bare theory, no collapse happens, and no worlds are split -- all we have is the linear dynamics, and consequently, physical systems will rarely be in an eigenstate of having a particular property. So at first, it seems that if the first measurement did not reveal a definite property of the system, the second measurement has no hope of repeating the result. But it is in fact easy to see that if the observer asks themselves whether or not they got the same measurement result as before, the answer will unambiguously be 'yes': after two measurements on the same system, which started out in a superposition, the general state will be |1!⟩|1!⟩|1⟩ + |0!⟩|0!⟩|0⟩; so in fact, any observation corresponding to the question 'Are both measurement outcomes identical?' will return the answer that they indeed are -- however, there will in general not be a fact of the matter regarding what those outcomes were.
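
The agreement of repeated measurements can be checked the same way. This is a minimal sketch, assuming a state is a dict from (first record, second record, qubit) labels to amplitudes; measure_twice and same_result_answers are illustrative names.

```python
def measure_twice(qubit_state):
    """Two successive faithful measurements, applied term by term:
    q becomes (record1, record2, q), with both records equal to 'q!'."""
    return {(q + "!", q + "!", q): amp for q, amp in qubit_state.items()}

def same_result_answers(state):
    """Set of answers to 'Are both records identical?' across all terms."""
    return {"yes" if r1 == r2 else "no" for (r1, r2, _q) in state}

state = measure_twice({"1": 1, "0": 1})
answers = same_result_answers(state)
first_records = {r1 for (r1, _r2, _q) in state}
print(answers)             # {'yes'}
print(len(first_records))  # 2: no fact of the matter about which result
```
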
This argument can easily be extended to cover more complicated cases -- say, if the observer first measures a system in the (definite) state |1⟩, then another system in the state |0⟩, and finally, a system in the superposition |0⟩ + |1⟩, and is then asked (or asks himself) whether his measurement result agrees with one of the measurements undertaken before, he will claim that this is indeed the case -- i.e. he will report (and believe) that the measurement result he got in the third case is equal to either one of |1⟩ or |0⟩. Thus, there is no subjective distinction between measurements carried out on systems in definite states versus measurements carried out on systems in superposition -- their results will seem just as 'definite' in both cases; however, in the latter case, there won't be any actual matter of fact regarding the measurement outcome. But the subjective appraisal of the outcomes of the three measurements -- either two times 1, once 0, or the other way around -- will thus agree with what is expected in quantum mechanics with the collapse dynamics.
More generally, one can show that for infinitely many measurements, any observer will tend towards being in an eigenstate of believing to have made measurements with statistics equal to those given by ordinary quantum mechanics (see for example Jeffrey Barrett's The Quantum Mechanics of Minds and Worlds -- one of the best expositions of the problems and virtues of different Everettian interpretations, if not the best --, chapter 4). Of course, what exactly it means to believe something in the limit of infinite measurements is somewhat difficult to interpret -- not to presume too much about the reader's capabilities, but I am only capable of accomplishing distinctly finite tasks, and would therefore typically fail to have any definite belief about the statistics of my measurements at all.
Another critical question is, given two observers, whether they will agree on the measurement results they have obtained. However, that this must be so is in fact shown by the same argumentation as before, where now the two measurements are not to be interpreted as repetitions by a single observer, but rather, as distinct observations, undertaken by different experimenters. And once again, it is the case that, while there is no definite matter of fact regarding what the measurement outcomes were, nevertheless both observers will agree that their measurement records coincide.
The bare theory thus explains three things with regards to our experience: its definiteness, its continuity, and its intersubjective coherence. In other words, why, even though the world is typically in a severely superposed state, we nevertheless appear to have a definite experience; why this experience seems to be (more or less) the same from one moment to the next; and why my experience appears to agree with yours.
One thing that it crucially does not seem to explain is the fact that we don't merely have some definite experience, but that this experience has a well-defined content: I don't just experience undefined somethings, but concrete objects, out there in the world; I don't merely see spin up or down in any given experience, but either definitely spin up or spin down. As it says in the song, 'I see trees of green/ red roses, too', not 'I see [some definite things]/ [other definite things], too'!
Failure to explain this basic fact of our most immediate experience seems quite outrageous. But nevertheless, consider how things would appear if they only appeared to us as if our experience had a definite content: I would raise these very same complaints, as certainly, I would be convinced of the definite content of my experience of the world! In other words, if, in an experiment, I would get the result spin up or down, I would be utterly adamant that I did not, in fact, get this indefinite result, but a completely definite one -- while simultaneously failing to point definitely to either. Yet, this failure I would again be wholly ignorant about!
This is certainly a very strange view, but, to me at least, not without its charm. And later on, for those entirely too uncomfortable with this sort of picture, I will give an argument to somewhat ameliorate the consequences of this idea. But for now, let's talk about some of the problems the bare theory, despite its astounding successes, faces.

The Theory, Stripped
There are two common factors in almost all accounts of the bare theory that I am familiar with: 1) excitement about its 'amazingly cool' properties, and 2) its utter rejection as a resolution of the problems of quantum theory. There are several objections that are usually raised against the theory; I will only consider the two that I believe are most severe. For a good discussion of the rest, see again the already mentioned book by Barrett.
The first one is the accusation of empirical incoherence, and I want to come right out here and admit that I'm not exactly sure I understand it. Basically, the argument is that the reasons for postulating, and accepting, quantum theory are the results of certain measurements we have made. But, in the bare theory, those measurements typically do not have any definite result at all; thus, they can't be sufficient for us to accept quantum theory, much less the bare theory reading of it.
This, to me, seems like an utter non-problem: while it is true that our measurements have no definite outcome, that fact itself, and how it comes to be that we nevertheless have a definite belief in their outcomes, may be taken as an empirical datum; and this datum is completely explained by the bare theory. The data that leads us to postulate quantum mechanics and the bare theory then is not the data created by measurements, but the data gained from our definite beliefs about these measurements -- to wit, that they have definite outcomes.
The second objection strikes me as more serious: if the bare theory is true, then typically, one would not expect the world to be in a state in which any given observer is conscious and ready to undertake some certain measurement. Rather, the typical state of the world would consist of an enormous superposition of many possible states for any observer, where he may either be asleep, or distracted when he intended to read the measurement result, or be home sick, or maybe even not exist at all.
Similarly, any given experiment does not have a neat, clean outcome as we have previously supposed, but typically, between the experiment yielding (or not) any certain outcome, there is also the possibility that the experiment may fail to work correctly, or blow up, or that a meteor strikes the lab, obliterating both the experiment and the poor experimentalist conducting it -- such that, after any given interaction, the observer is not merely in a superposition of having gotten one result or another, but also of not having gotten any result at all, or even of having died in the process. In which case, he can hardly 'ask himself' afterwards whether or not he has gotten a definite result -- the question does not make any sense if asked of a pile of meat scraps.
So, concretely, if the observer is after the experiment in a state of |0!⟩ + |1!⟩ + |Blown to bits!⟩, then acts on that with |Definite?⟩, we get the following evolution:

|Definite?⟩(|0!⟩ + |1!⟩ + |Blown to bits!⟩) → |Yes!⟩|0!⟩ + |Yes!⟩|1!⟩ + |Huh?!⟩|Blown to bits!⟩,

and consequently, the observer would fail to be in an eigenstate of 'believing to have made a definite observation', and, by the eigenvalue-eigenstate link, thus not have this belief. So, the bare theory apparently does not account for definite beliefs after all!
At first sight, this objection seems quite damning -- all the bare theory's 'amazingly cool' properties go out of the window, once one starts considering even slightly more realistic cases.
However, I believe that this argument is the relapse into a laboriously exorcised notion: that of the special nature of the observer in quantum mechanics, where observer here means 'human' or even 'conscious human'. In the above, we have assumed that it makes no sense to ask of the environment whether it is in a definite state, or has gotten a definite result. But ultimately, the environment -- broadly defined as anything that has a chance to enter in the above superposition -- is simply an observer, too. Or, the other way around, a human, conscious observer's belief is ultimately just a physical thing, a certain configuration of a physical system, i.e. the brain, as well -- just that some of these configurations correspond to states in which a certain person has a certain belief does not change anything about that.
So, after any given interaction, there exists a certain system -- a 'meta-observer' -- of which one can 'ask' the question whether it is in a definite state; and this whole system will then 'answer' in the affirmative. Only a subset of this system can meaningfully be considered as identifiable with the original experimenter; but to this subset, the 'yes' will mean that it is in a definite belief-state of having performed a measurement, and gotten a certain result. Only a part of the superposition of the 'meta-observer' can be regarded as having beliefs; but to that part, those beliefs appear definite.
In a certain sense, this seems to invite some of the many worlds back in through the backdoor -- and one could view it like that. For instance, both the observer experiencing his apparently definite measurement result and his being blown to bits would seem to have to be regarded as equally real, and certainly, one has difficulty imagining both being real in the same world. However, the number of such worlds is greatly reduced: rather than there being multiple 'copies' of the observer, experiencing each possible outcome in separate worlds, there is only one observer, in one world. Also, the number of 'worlds' is no absolute quantity, but depends on the resolution with which you view the system: just as there is only one observer, there is only one laboratory containing the observer, but on this level, several of the 'worlds' on the observer level -- those in which, for instance, the observer had a heart attack, but the laboratory as a whole was not damaged -- are now unified. The worlds depend on what is taken to be constant across them -- the existence of the observer, or the existence of the laboratory. But ultimately, at the level of the universe, there is only one single world; so it is questionable how justified talking about 'different worlds' is in this case.
There is one more interesting aspect to this idea of 'constancy across branches'. This involves the stability of meaningful information -- where 'meaningful' here just means being of a certain form for a certain reason. A key is of a certain form, because only this form opens the lock it is made to open. The key does something with the lock, and thus, it has meaning for the lock, and that meaning lies in the information about its form (a description of this form, sufficiently precise, would enable its receiver to construct an equivalent key).
Such meaningful information tends to be the same across many branches, whereas random sequences typically vary strongly. In his book The Fabric of Reality (where he lobbies strongly for a many-worlds view of quantum mechanics), David Deutsch uses the example of coding vs. non-coding ('junk') sequences of DNA: a gene that is necessary for the functioning of an organism, say one which determines insulin production, is likely to be the same across many different branches, as changes, i.e. mutations, typically will be disadvantageous to the organism carrying it, leading to them being selected against. But junk sequences of DNA are copied independently of their utility, and thus, any change to them will typically have no effect on an organism's reproductive fitness.
If thus any species in our 'branch' carried a coding DNA sequence that happens to be identical to a non-coding one, in other branches, the coding sequence typically will be the same -- being necessary for the species' presence in the first place -- while the non-coding one may vary wildly.

Know Thyself
Of the features of an apparent collapse that the bare theory provides, so far we have not accounted for the entropy a measurement with definite outcome generates. Since, on the bare theory, measurements do not in fact have definite outcomes, it might seem that it can't possibly reproduce this aspect. But one can amend the theory easily to account for this.
In order to do that, recall the argument I have introduced in the previous post: a system, consisting of two entangled sub-systems, may be in a pure state, and consequently, have zero entropy. However, each of the sub-systems regarded on its own has a nonzero entropy, because in regarding only this system, one effectively discards the information contained in the correlations between the systems; and of course, hidden information always means entropy.
In fact, the amount of entropy is a measure for the amount of entanglement between the two sub-systems: the more entangled both are, the higher the entropy of each.
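To make the relation between entanglement and entropy a little more tangible, here is a minimal sketch in plain Python. The family of two-qubit states cos(θ)|00⟩ + sin(θ)|11⟩ and all function names are my own choices for illustration, not anything taken from the previous post. The total state is pure by construction, so its entropy is zero; the code only computes the entropy of one sub-system regarded on its own.

```python
import math

def reduced_density_matrix(amps):
    """Trace out the second qubit; amps[i][j] is the amplitude of |i>|j>."""
    return [[sum(amps[i][k] * amps[j][k].conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

def entropy_bits(rho):
    """Von Neumann entropy (in bits) of a 2x2 density matrix."""
    trace = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    # Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant.
    disc = math.sqrt(max(trace * trace / 4 - det, 0.0))
    eigenvalues = [trace / 2 + disc, trace / 2 - disc]
    return sum(-p * math.log2(p) for p in eigenvalues if p > 1e-12)

def entangled_state(theta):
    """cos(theta)|00> + sin(theta)|11> (illustrative family of pure states)."""
    return [[complex(math.cos(theta)), 0j], [0j, complex(math.sin(theta))]]

# theta = 0 is a product state, theta = pi/4 is maximally entangled.
for theta in (0.0, math.pi / 8, math.pi / 4):
    s = entropy_bits(reduced_density_matrix(entangled_state(theta)))
    print(f"theta = {theta:.3f}: entropy of one sub-system = {s:.3f} bits")
```

For the product state, the sub-system entropy vanishes; it grows as the state becomes more entangled, and reaches one full bit for the maximally entangled state.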
But now consider that measurement, i.e. the acquiring of information about a system (the object) by another system (the observer), is a physical process -- both systems must interact in order for it to take place. And in this interaction, entanglement is created -- indeed, entanglement can be viewed as the information about the total system not contained in either sub-system.
A minor digression. There is a certain controversy regarding whether the wave function, i.e. the mathematical object used to represent the state of a physical system, describes a real, physical object, or merely the knowledge of an observer (that he uses to predict certain experimental outcomes, etc.) -- in the jargon, whether it is ontic or epistemic in nature. Considerations like the above show, in my opinion, that there is not much of a difference between the two. Certainly, information is physical; the brain of an observer having some knowledge thus is physically different from the brain of an observer lacking that knowledge. But this physical difference must have been acquired by the interaction with a physical system -- the quantum system under study (perhaps via appropriate intermediaries). So if this brain contains, in its physical configuration, the knowledge of the state of some quantum system, encoded in a wave function, and this wave function is in fact a complete specification of the system, then this information must have been both physically present in the system, and it must encapsulate the whole of the system -- but then, it is in one way or another identical to the system (or at least the observable part thereof).
As an analogy, one might think of a footprint in mud: the mud here being the observer's brain, while the foot is the quantum system (after all, feet are also colloquially called 'Quanten' in German...). After an interaction, the mud contains knowledge of the foot in the form of its imprint -- this form is physical, as is the altered state of an observer's brain after an interaction with a quantum system. By making a plaster cast, the form of the foot can be completely recovered. Of course, it is always possible that there might be a hidden reality beyond the footprint: such as the person the foot was (presumably) attached to. But this would only correspond to unobservable parts of reality.
Also, the wave function may be epistemic in the sense a probability distribution on phase space is: it may represent our ignorance regarding a more fundamental, ontic layer. The information about such a distribution is not contained in a single system; consequently, it can only exist in the brain after many interactions with physical systems. And indeed, one single throw of a coin does not tell you whether it is fair or biased. Regarding this line of explanation, however, a recent result by Pusey, Barrett, and Rudolph appears to rule out such a possibility (see this excellent explanation on Matt Leifer's blog).
So we see that, since information is physical, there is no clean break between 'epistemic' and 'ontic' views of the wave function; having or not having information about some physical system means being in a different physical state, and if we believe in the causal closure of the observable universe, then physical state transitions can only be effected by interactions between physical systems.
In order to acquire information about a quantum system, the observer then has to interact with it, and this interaction generates entanglement. One can thus no longer describe the observer and the object system separately, but must consider them both part of a larger, entangled system.
However, if we now take the point of view of the observer, this must mean that the description of the object system is, after measurement, no longer complete -- it acquires entropy, as a part of its information is now stored in the correlations to the observer. This gives us an origin for the entropy production in the apparent collapse process.
The total system of observer and object may thus be in a zero-entropy state and evolve, according to the linear dynamics as required, without picking up any entropy. Subjectively, however, it will look to the observer as if the system he measures picks up entropy in the course of measurement, merely as a result of the increasing correlations between him and the measured system, and of his own ignorance about the total state.
But is this actually true? Could the observer not somehow possess perfect self-knowledge, and thus, perfect knowledge of the complete system?
The startling answer to this question is: no, it is impossible for an observer to acquire perfect knowledge about any system he himself is part of (and thus, in particular about the system composed of himself and the object he measures). This is the essential content of a theorem due to Maria Dalla Chiara, elaborated upon by Thomas Breuer.
Interestingly, this is not a consequence of quantum-mechanical weirdness: the result exists just as well for entirely classical theories (though one could interpret it as implying that there are no entirely classical theories, if by 'classical theory' one means a theory in which it is in principle possible to acquire perfect knowledge about every observable). Essentially, it follows from the assumption of a theory's universal validity: if the theory applies equally well to observer and observed, then one necessarily encounters the problems of self-reference. In fact, it is essentially a Gödelian (diagonalization) argument by which it follows that an observer can't distinguish all states of a system he himself is part of.
This also poses a restriction on the thought experiment known as Laplace's demon: a sufficiently powerful intellect, in possession of complete knowledge and equipped with perfect reasoning skills, could, in a deterministic universe, predict the future exactly from the current state of the world. But as we see now, such complete knowledge is impossible -- is, in fact, logically contradictory: since the demon must be part of the world to acquire information about it (information is physical), the above considerations make it impossible for him to know the state of the world perfectly. This introduces an apparent indeterminism, in the form of the demon's inability to make perfect predictions, into the theory.
Bringing it all together, now, we seem to have much of what we want from a theory in which an apparent collapse is realized entirely within the linear dynamics: the bare theory explains our apparently determinate experience, its continuity, and the agreement between different observers, while the entropy production is taken care of by the impossibility of perfect state self-knowledge.
I also want to re-iterate that I do not think of the bare theory as deceptive, as it is generally portrayed in the literature. The general point of view is that this way of thinking suggests that we are deceived into believing we have definite experience, while in fact, we typically don't. But being in a superposed state does not mean that any given outcome does not occur, and neither does it mean that it does occur -- it merely means that there is no fact of the matter regarding which outcome occurs. So it may very well simply be true that a definite outcome occurs, while there is just no fact determining which outcome occurs (perhaps this kind of 'ω-inconsistency' is just the price one has to pay for insisting our experience is complete, i.e. always definite...). Indeed, in a world with limited information content, this does not seem such a strange proposal, at least not to me.
Besides, we do have definite experience: the experience of having a definite experience is definite, regardless of whether the 'underlying' experience is. Again, all there is to experiences is how they seem to us; it simply makes no sense to claim that we are 'deceived about our experiences'. We experience what we experience, and the bare theory provides a mechanism for definite experience to emerge out of the indefinite quantum world.
It is thus a lot like the way I have argued definite laws emerge from indefinite, random fundamental dynamics. If we take a random bit string, it is utterly impossible to predict whether the next digit will be 1 or 0 -- it is completely lawless. Nevertheless, moving up a level, for any given bit string, I can predict the relative ratios of 1s and 0s, with a reliability that increases with the bit string's length. This lawfulness is not imposed; rather, it emerges directly from the more fundamental lack of laws.
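As a small illustration of this emergent lawfulness, the following sketch computes exactly (no sampling involved) how likely it is that a fair random bit string has a fraction of 1s close to one half. The function name and the tolerance of 0.05 are arbitrary choices of mine, made only for the sake of the example.

```python
from math import comb

def prob_fraction_within(n, eps):
    """Exact probability that a fair random n-bit string has a
    fraction of 1s within eps of 1/2."""
    # |k/n - 1/2| <= eps, rewritten as |2k - n| <= 2*eps*n.
    favourable = sum(comb(n, k) for k in range(n + 1)
                     if abs(2 * k - n) <= 2 * eps * n)
    return favourable / 2 ** n

for n in (10, 100, 1000):
    print(n, prob_fraction_within(n, 0.05))
```

No single bit is predictable, yet for these lengths the prediction 'about half of the bits are 1s' becomes more reliable as the string gets longer.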
In a similar sense, the definiteness of our experience emerges from the indefinite nature of quantum mechanics. There is no definiteness to the question whether or not we saw a 1 or a 0 as the result of a spin experiment on a superposed particle; but our experience of a definite result is definite. Indeed, one may take this as implying that without an observer, there is nothing to be observed; the observer and the observed are two sides of the same coin, the result of some specific way to slice the quantum cake.
Yet still, two questions seem to loom large: one is to give a satisfactory account for the probabilities encountered in quantum mechanics; the other is the apparent discrepancy between our experience of a definite phenomenal content, and the bare theory's prediction of an essentially 'contentless' phenomenology -- there is something definite, yet no further fact as to what, exactly, that is; but subjectively, these facts exist (and indeed, seem to be all that we have direct knowledge of).
I am not too sure the latter is actually a question. As I have already said, even if we did not have any definite phenomenal content, we would be asking the same questions: it would appear just as ludicrous to us to suggest that in fact, we don't have any definite experiences, while it is so immediately clear that we actually do.
Yet I can see that this argumentation, while perhaps satisfying on a certain level, leaves something visceral to be desired. Luckily, an idea due to Sven Aerts may provide an answer to both open problems.

B.Y.O.P. (Bring Your Own Phenomenology)
Aerts essentially takes to heart the lessons from Dalla Chiara's and Breuer's results, and thus, considers the outcome of a measurement as a function not merely of the measured system, but rather, of the state of the system composed of both the observer and the observed. With this in mind, he considers a procedure to arrive at an outcome for a measurement that minimizes the influence of the observer, such that the selected outcome is that outcome for which it is the most probable that it pertains to the system under study.
Consider the process of observation as an observer assigning to the system he observes a certain experimental outcome, based on both his state and the state of the system. This assigns the observer a more active, participatory role than on the usual accounts of observation: he chooses, rather than reveals, a measurement outcome. But this is only to be expected in regimes where the coupling between the observer and the system is no longer negligible, i.e. where it no longer can be assumed, as is done in classical physics, that the observer passively receives information broadcast by the system.
If we now assume that the observer chooses the measurement outcome in such a way as to be maximally certain that the outcome pertains to the system -- Aerts calls such an observer 'Bayes-optimal' -- then one can show that the usual quantum outcomes and statistics are recovered. To such an observer, the world looks much like it would look to an observer in a collapse theory: definite outcomes with probabilities following the Born rule. This framework also provides a natural explanation for the origin of the probabilities: they quantify the observer's ignorance -- but not about the system he observes, but the irreducible ignorance about his own state.
I'm still not entirely convinced that this last addendum is strictly necessary to derive the appearance of our experience from the quantum formalism; however, those (understandably) uncomfortable with the notion of a definite-but-contentless experience may take recourse to this framework in order to justify their apparently definite experiences.
This is then the closest I can come to providing an answer to the question of how our definite, macroscopic world emerges from the quantum dynamics. The bare theory, and similarly, the Gödelian impossibility of perfect state self-knowledge, ensure that only a part of the quantum world is accessible to any observer; in this way, the appearance of a definite, repeatable, and communicable experience emerges. It is just the linear quantum dynamics that is necessary to account for our experience; we neither have to invent selection rules that break the quantum evolution to comfort us with an objective reality, nor do we have to postulate the existence of a plethora of worlds, populated with slightly different copies of each and every one of us. The observer arises, along with the appearance of the observed, out of itself from the quantum realm, just as regular, lawful behavior emerges from fundamental randomness.
I find this view to be immensely satisfying.

Untangling Entanglement

2011-12-29T07:25:00.000-08:00

What to Feynman was interference (see the previous post), to Erwin Schrödinger (he of the cat) was the phenomenon known as entanglement: the 'essence' of quantum mechanics. Entanglement is often portrayed as one of the most outlandish features of quantum mechanics: the seemingly preposterous notion that the outcome of a measurement conducted over here can instantaneously influence the outcome of a measurement carried out way over there.
Indeed, Albert Einstein himself was so taken aback by this consequence of quantum mechanics (a theory which, after all, he helped to create), that he derided it as 'spooky' action at a distance, and never fully accepted it in his lifetime.
However, viewing quantum mechanics as a simple generalization of probability theory, which we adopt in order to deal with complementary propositions that arise when not all possible properties of a system are simultaneously decidable, quantum entanglement may be unmasked as not really that strange after all, but in fact a natural consequence of the limited information content of quantum systems. In brief, quantum entanglement does not qualitatively differ from classical correlation; however, the amount of information carried by the correlation exceeds the bounds imposed by classical probability theory.

Entanglement is deeply related to another, allegedly uniquely quantum property, known as superposition. Simply put, superposition is the capacity of a quantum system to exist in an arbitrary combination of states, rather than, like a classical system, in only one definite state at any given time. Superposition, in fact, follows very directly from the picture of quantum-mechanics-as-probability-theory, and it's there that we start today's foray into quantum weirdness.
Afterwards, we're going to think about correlations -- the question of when and why knowledge about one thing or event carries with itself knowledge about another thing or event. These can be explained fully within classical probability theory, which has the advantage of allowing intuitive examples; as already hinted at, the quantum case will not introduce any great conceptual revolutions here. Together with the concept of superposition, this will allow us to arrive at a simple, yet powerful picture of quantum entanglement, which essentially will amount to realizing that in the quantum case, unlike the classical one, all the information in a system may be contained entirely in the system's internal correlations.

Superposition: Indecision in the Quantum World
As usual, we will start by focusing our attention on the simplest system imaginable: a bit, which can be in either of two classical states. If the precise state of the bit is unknown, we can consider it to be in a state of 'classical superposition': a measurement of the bit might yield either state with some probability. However, note that the bit is, 'underneath' our ignorance, all the time really in a definite state, which we just don't happen to know! This is one crucial difference to quantum theory, where, as we have already seen, probabilities are irreducible (to briefly recap, this is due to the fact that any system with a finite information content, once one has exhausted the number of simultaneously decidable propositions, i.e. once one has asked it as many questions as it can answer at any one time, must answer perfectly randomly to each further question, as anything else would amount to extracting extra information -- which, however, just isn't there).
The other main difference is related to the fact that in the classical theory, probabilities are real, positive numbers, obeying the constraint that their sum must equal one -- i.e. if the bit is in the state 0 with a probability of 70%, it follows that it must be in the state 1 with a probability of 30% -- while the 'probabilities', i.e. the amplitudes, of the quantum theory are complex numbers, whose squared magnitudes sum to one.
This means that a quantum bit has a state space that is much greater than just the two possibilities the classical theory offers: the set of all pairs of complex numbers α and β such that |α|² + |β|² = 1, which can be represented as the surface of a unit sphere in three dimensions. (If you recall, in the last post, I introduced complex numbers as rotations, so if one complex number is a rotation around one axis, two complex numbers suffice to characterize rotations around two axes, and any given point on the surface of a three-dimensional sphere is related to any other by two rotations -- around latitude and longitude, for example.)
This representation of the state space of a qubit is known as the Bloch sphere, and you'll recognize it from the top of this and prior posts:

Fig. 1: The Bloch sphere. (Image credit: Wikipedia.)

Perhaps this is more easily understood by considering first the state space of a 'real' bit, whose 'probabilities' are irreducible, can adopt arbitrary real values between -1 and 1 (negative numbers to allow for interference, see the previous post), and obey the constraint a² + b² = 1. This is nothing but a unit circle:

Fig. 2: State space of a 'real bit'.

Complex numbers, despite in a sense being equal to two real numbers each, only add one more degree of freedom to the state space, making it into a two-dimensional surface, because, while each complex number comes with its own phase, only relative phases are physically relevant, eliminating one degree of freedom.
The quantum probability theory thus leads to a much richer description than the classical one does. While the classical bit can only ever be in either of the states denoted |0⟩ and |1⟩, the qubit can be in arbitrary combinations with complex coefficients α and β, so that the general qubit state has to be written as |ψ⟩ = α|0⟩ + β|1⟩. Again, while a similar notation in the classical case only denotes uncertainty about the true state, in the quantum case, because of the irreducibility of quantum probabilities, there is no more fundamental 'true' state.
To drive this point home, consider that every qubit can only ever yield one classical bit of information upon measurement; that's all there is. Now say we have 'exhausted' this bit by a measurement along the x-axis in the picture, and have obtained the state |0⟩. If we now make another measurement along either of the other axes, there is no information 'left' in the qubit to determine the measurement outcome -- it must be perfectly random, i.e. both possible measurement outcomes must be equally likely. With respect to the other two directions, the qubit thus must be in an equal superposition of both possible outcomes, i.e. it must be the case that
|α|² = |β|² = 0.5: the superposition is 'real' in this sense, as opposed to the classical case, where it can only ever be apparent, the bit having a 'real' classical state underneath, that just happens to be unknown.
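For those who like to check such claims by hand, here is a rough sketch of the calculation. It assumes the conventional labelling in which |0⟩ and |1⟩ are the two outcomes along the z-axis (the figure may label its axes differently; all that matters is that the three directions are mutually orthogonal), and the names BASES and born_probabilities are made up for this example.

```python
import math

S = 1 / math.sqrt(2)
# Outcome states for measurements along three mutually orthogonal axes
# of the Bloch sphere. Assumption: conventional labelling, with |0> and
# |1> taken as the outcomes along z.
BASES = {
    "z": [(1 + 0j, 0j), (0j, 1 + 0j)],
    "x": [(S + 0j, S + 0j), (S + 0j, -S + 0j)],
    "y": [(S + 0j, S * 1j), (S + 0j, -S * 1j)],
}

def born_probabilities(state, axis):
    """Probability of each of the two outcomes when `state` is measured along `axis`."""
    probs = []
    for outcome in BASES[axis]:
        # Overlap <outcome|state>; its squared magnitude is the probability.
        overlap = sum(o.conjugate() * s for o, s in zip(outcome, state))
        probs.append(abs(overlap) ** 2)
    return probs

state = BASES["z"][0]  # the qubit after a measurement that yielded |0>
for axis in ("z", "x", "y"):
    print(axis, born_probabilities(state, axis))
```

Along the axis that was already measured, the outcome is certain; along either of the other two, both outcomes come out equally likely (up to floating-point rounding), which is just the statement that the single bit has been used up.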
One thing is common to both the classical and the quantum case: whenever we perform a measurement, we will only find a definite state, both for the classical and the quantum bit. It is as if the quantum state suddenly forgets about all its beautiful extra structure in order to give us something our classical brains can more easily deal with. This appearance has led to the notion of 'wave-function collapse', and its occurrence (or the appearance thereof) is known as the measurement problem, which we will not yet tackle, but file under 'uncomfortable and unresolved' for the time being.
To sum up, superposition is thus a feature of the quantum probability theory, in which the system does not contain enough information to decide which of a number of possible states it is in, and thus, must be considered to not be definitely in either; only upon measurement does this superposition 'collapse', and a definite state emerge (at the expense of indefiniteness elsewhere). This straightforwardly generalizes from qubits to more complicated systems that have more states available to them.

Correlation: Information-Jelly Smeared on Spacetime-Bread
Two events A and B are correlated if, knowing that A occurred, you assign to the occurrence of B a different probability than if A hadn't occurred. So if I have two balls, one red, and one green, and two boxes, and put each ball into one of the boxes, shuffle them, and hand you one of the boxes, if you open your box, and find the red ball, you will assign to the possibility that I find the green ball once I open my box a probability of 100% -- the two events are perfectly correlated.
This already allows us to head off one frequent misconception about quantum entanglement, which is that it somehow may be used to transmit information faster than light. But ultimately, entanglement is just a correlation like the one above -- so while you instantly know what will happen when I open the box, even if you are so far away from me that a signal, travelling at the speed of light, never could transfer this information to you in time, it is nevertheless clear that this scheme can't be used to transfer any information. In order to do so, you would have to be able to determine in advance what ball you will find in your box, and thus, what ball I will find -- but, barring supernatural abilities, the possession of which probably would mean that you wouldn't have to rely on such a clumsy scheme to communicate instantaneously across great distances anyway, this is of course impossible.
But let's return to the issue of correlations. The example I have presented is one of perfect correlation, but less than perfect correlations are possible, as well. For instance, consider a case in which there are three balls, two of which are green, one of which is red. I randomly select two, and put them in one box each. The probability that in my box, there is a green ball, is about 67%; the probability of my box containing a red ball is about 33%. Now, if you open your box, and find a red ball, you know that my box must contain a green ball; but if you find a green ball, you know that my box contains a red or a green ball with equal likelihood. Thus, you finding a green ball makes me finding a red ball more likely -- the probability jumps from 33% to 50%. The two events are correlated, but it's not the case that whenever you find a green ball, I find a red one; but if you find a green ball, I find a red one more often than if you don't find a green ball.
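The numbers in the three-ball example can be checked by simply counting cases. The sketch below is one way of doing so; the helper names are mine, and the only assumption is that every ordered draw of two of the three balls is equally likely.

```python
from fractions import Fraction
from itertools import permutations

balls = ["green", "green", "red"]
# Every ordered way of drawing two of the three balls: (your ball, my ball).
# Equal-coloured balls are treated as distinct objects, so all six draws
# are equally likely.
draws = [(balls[i], balls[j]) for i, j in permutations(range(3), 2)]

def probability(event, given=lambda draw: True):
    """P(event | given), computed by counting equally likely draws."""
    relevant = [d for d in draws if given(d)]
    return Fraction(sum(1 for d in relevant if event(d)), len(relevant))

mine_red = lambda d: d[1] == "red"
yours_green = lambda d: d[0] == "green"
yours_red = lambda d: d[0] == "red"

print(probability(mine_red))               # 1/3, before you open your box
print(probability(mine_red, yours_green))  # 1/2, after you find a green ball
print(probability(mine_red, yours_red))    # 0, after you find the red ball
```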
Probabilities change with the acquisition of new information. Thus, in order for the probability you assign to the content of my box to change, opening your box must have given you information about my box. But how can there be information about my box in your box? Your box contains one bit of information: the color of your ball is either green or red, and with finding out its color, that one bit is exhausted, and it hasn't told you anything about my box. In principle, any color of your ball is compatible with any color of my ball; there's no fundamental law of nature that prohibits arrangements of two red, or two green balls.
But, in the case of perfect correlation, there is an extra bit of information in a proposition that does not pertain to either of the two boxes, but to the system of both boxes as a whole: the color of my ball is the opposite of your ball's color. This is a correlation, a bit of information not contained in either box, and it together with the color of your ball determines the state of the system completely: 'my ball is red' and 'your ball is green' is equivalent to 'my ball is the opposite color of your ball' and 'your ball is green', even though neither proposition pertains to the color of my ball alone. In the first case, the information is completely local; in the second case, the information 'in your box' is local, as well, but one bit of information is 'shared' between both your ball and mine, correlating their color; this bit thus does not encode a property of either box, but expresses a relation between them.
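One way to put a number on this 'shared' bit is Shannon's mutual information; this is my choice of bookkeeping for the sake of illustration, and the function names are hypothetical. The sketch compares the perfectly correlated boxes with two boxes whose colors are chosen independently.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Distribution of one box alone, obtained by ignoring the other box."""
    out = {}
    for colours, p in joint.items():
        out[colours[index]] = out.get(colours[index], 0) + p
    return out

def mutual_information(joint):
    """Bits shared between the boxes: H(yours) + H(mine) - H(both)."""
    return (entropy(marginal(joint, 0)) + entropy(marginal(joint, 1))
            - entropy(joint))

# Keys are (your colour, my colour).
perfectly_correlated = {("red", "green"): 0.5, ("green", "red"): 0.5}
independent = {(a, b): 0.25 for a in ("red", "green") for b in ("red", "green")}

print(mutual_information(perfectly_correlated))
print(mutual_information(independent))
```

In the correlated case, each box on its own looks like one bit, but both together also amount to only one bit, so one bit is shared between them; in the independent case, nothing is shared.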
This does not surprise us: we have already learned that information is a relational entity. A sea of indistinguishable red balls can't be used to encode any information; only if one ball differs from the other, if a distinction is introduced (say, one ball is painted green), can the system be used to represent information, and then as many bits as there are distinctions. Properties that are the same across the 'sea' are invisible in a sense: there is no observable difference between the indistinguishable red balls, and a similar sea of red sports cars -- though the latter have more 'structure', they nevertheless don't contain more information. The extra structure forms an unobservable background, until, through the introduction of a distinction, a property is made observable. An object thus only has properties with respect to other objects that lack them; thus, its information does not inhere in itself, but in the relations between it and its environment. From this point of view, as we will see, quantum entanglement is natural; it is in fact its absence in the classical theory that will need explanation!
One question that still needs to be addressed is: when are two systems correlated? In the thought experiment, the correlation arises through the existence of two balls of different color, and through me putting each in one of the boxes. That is, the balls' colors have a common origin. In general, two systems, in order to become correlated, must have interacted in some way -- and conversely, generally, when two systems have interacted with one another, they will be correlated. This only makes sense: if the systems had never interacted, neither system would have knowledge of the other; thus, it can't be possible to extract information about one system through observations made on the other, since it simply does not contain such information. But correlations between the systems would allow this -- thus, there can't be any correlations.
A word of caution here: even though one ball's color allows us to infer the other's, it's strictly speaking wrong to say that because your ball is, say, green, mine is red -- both have their respective colors because I put them into the boxes, and arranged things that way. This is the essence of the old (but true) adage that 'correlation does not equal causation'. Just because the occurrence of event A means that one should expect event B to occur with greater likelihood, does not mean that A causes B in any way; there could, for instance, be a third event, C, which is the cause for both A and B -- in the thought experiment, that would be me putting the balls into the boxes.
A classic example of such a faulty inference was the conclusion that the fields emitted by power lines cause poor health: it was observed that, in the neighborhood of power lines, the general health was lower than the population average. Thus, 'living near power lines' and 'having poor health' are correlated. However, the reason is simply that the cost of living near power lines is lower -- power lines are ugly things, and thus, their presence devalues property. So people living near power lines tend to be somewhat poorer, simply because poorer people can more easily afford to live there. But less wealth means less health care, which means poorer health on average. Thus, 'lack of money' is correlated with both 'living near power lines' and 'having poor health'; and in this case, the correlation is actually a causative one.

Entanglement: Show Us Your Bits
We have seen that the system made from your box and mine, each with their respective balls in them, is a system that contains two bits of (relevant) information; these two bits determine the color of each ball. They can be 'distributed' in two different ways: they can either be both local, if the system is described by the two propositions 'your ball is green' and 'my ball is red', or one can be non-local, distributed between both boxes in the form of a correlation, corresponding to propositions such as 'your ball is green' and 'my ball is the opposite color of yours'. Clearly, both descriptions are completely equivalent: they uniquely determine the color of each ball.
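To make the equivalence of the two descriptions concrete, here is a small Python sketch. Encoding green as 0 and red as 1, and writing the correlation as an exclusive-or, are choices made here for illustration; the function names are hypothetical.

```python
from itertools import product

GREEN, RED = 0, 1

def to_correlated(yours, mine):
    """(your colour, my colour) -> (your colour, 'are the colours opposite?')."""
    return yours, yours ^ mine

def to_local(yours, opposite):
    """Inverse map: recover my colour from your colour plus the correlation bit."""
    return yours, yours ^ opposite

# All four arrangements of two two-coloured balls.
states = list(product([GREEN, RED], repeat=2))

# Converting to the correlated description and back loses nothing ...
round_trips = [to_local(*to_correlated(y, m)) == (y, m) for y, m in states]

# ... and distinct arrangements get distinct correlated descriptions.
distinct_descriptions = {to_correlated(y, m) for y, m in states}
```

Both descriptions use exactly two bits, and the map between them is one-to-one; the only thing that changes is where the second bit 'lives'.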
But, there seems to be a third option looming, which is anticipated by the relational nature of information: can both bits be distributed non-locally?
Classically, this is not possible. We will discuss why in more detail later, but for the moment, think of it in the following way: a classical system contains, for all practical purposes, infinitely much information. So if we tried to 'remove' the one local bit that determines the truth value of the proposition 'your ball is green', we would find that this proposition is still true -- the bit is still there, because the classical ball contains an infinite reservoir. And the bit we've removed then doesn't help if we put it into the correlation, as the whole system is already exactly determined; it can't tell us anything we don't already know.
In quantum mechanics, however, the amount of information in any given system is finite, and a system of two 'quantum balls' ('quballs') would actually only contain two bits of information. So, what would happen if we removed the one remaining local bit in this case?
Well, as we have discussed above, this means that the state of your quball would then no longer be determined -- there would be no information 'left' to determine the truth value of the proposition 'your quball is green'. It would thus enter a superposed state, red and green with equal probability. But if your quball is in a superposed state, and my quball's color is determined by the proposition 'my quball is the opposite color of your quball', then this must mean that my quball is in a superposition, as well.
And this is then the origin of the basic phenomenology of entanglement: once you open your box, you will observe a ball of a definite color, as the superposition 'collapses'; instantaneously, you know the color of my ball, as well, thanks to the correlation existing between both. Given the relational nature of information, together with the realization that there is only a finite amount of information in any given quantum system, there is nothing mysterious about this; the correlation between both balls, in particular, is not qualitatively different from the correlation in the classical case.
However, there is an added snag in the quantum case, which stems from the fact that the qubit has such a large state space. What I've called 'quballs' aren't really qubits -- they're a sort of hybrid quantum/classical chimera, which has the characteristically finite information, but otherwise has been treated using ordinary probability theory, essentially. Concretely, quballs were assumed to only have one property with regard to which they may differ: their color. Hence, the bit we 'pulled out' of the local description of your quball essentially was left dangling, and did not contribute to the phenomenology anymore; the situation was completely described by the correlation and the random color you observe once you open the box.
However, essentially because of the complex nature of quantum probability amplitudes, things are different if you consider proper qubits. The value of a qubit can be measured along three orthogonal axes, the x-, y-, and z-directions in the Bloch sphere picture above. Nevertheless, the qubit only contains one bit of information; as you recall, this means that if, say, its value along the z-axis (for historical reasons, one often uses the word 'spin' in this context, because electrons having a certain spin were among the first examples of two-level quantum systems, i.e. qubits; but all such systems are isomorphic, i.e. have the same description, so one can talk 'as if' one were considering electron spins for concreteness) is absolutely determined, its spin along the x- and y-axes must be completely undetermined, i.e. it is in a superposed state with respect to these directions, these particular properties.
Thus, if, in the quantum case, only one bit were contained in the correlations, the behavior of the second qubit would still be insufficiently determined -- only one direction of spin would be correlated. So if you 'opened your box', i.e. did a measurement in order to determine the value of the spin of your qubit along, say, the x-axis, it may be the case that my qubit's spin is perfectly correlated, while, if you chose to measure along the y-axis, my qubit's spin along that axis might not be correlated with yours at all. But luckily, we still have that second bit left over, which is sufficient to fix this problem: both bits now encode the truth values of the propositions 'my qubit's spin along the x-axis is opposite to your qubit's spin along the x-axis' and 'my qubit's spin along the y-axis is opposite to your qubit's spin along the y-axis'. Note also that the third direction, the spin along the z-axis, is not independent of the spins along the x- and y-axes; this can be understood by realizing that, in the case of a single qubit, the indefiniteness of the spin along these axes implies the definiteness of the spin along the z-axis -- the bit of information associated with the qubit has to be somewhere. In short, the two bits of correlations between the two qubits suffice to fix my measurement outcome, if the outcome of your measurement is known. This is discussed in more detail and rigor in the paper The Essence of Entanglement by Caslav Brukner, Marek Zukowski, and Anton Zeilinger, which was the major inspiration for the point of view I have so far presented.
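For readers who like to see numbers, the following sketch checks these statements for one particular two-qubit state. The choice of the so-called singlet state as the state in which all spins are 'opposite' is an assumption made here for illustration, as are the helper names; the code uses plain Python lists rather than a linear algebra library.

```python
import math

# Pauli matrices and the identity as nested lists.
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Tensor (Kronecker) product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def expectation(op, psi):
    """Average measured value <psi| op |psi> for a state vector psi."""
    size = len(psi)
    op_psi = [sum(op[r][c] * psi[c] for c in range(size)) for r in range(size)]
    return sum(p.conjugate() * q for p, q in zip(psi, op_psi)).real

# Singlet state (|01> - |10>)/sqrt(2), in the basis |00>, |01>, |10>, |11>.
h = 1 / math.sqrt(2)
singlet = [0j, h + 0j, -h + 0j, 0j]

xx = expectation(kron(X, X), singlet)       # product of both spins along x
yy = expectation(kron(Y, Y), singlet)       # ... along y
zz = expectation(kron(Z, Z), singlet)       # ... along z
z_alone = expectation(kron(Z, I), singlet)  # your spin along z, on its own
```

All three products come out as -1 (up to rounding), i.e. the spins are always found opposite along each axis, while `z_alone` is 0: taken by itself, your qubit is equally likely to point either way. The dependence of the third axis on the other two can also be seen algebraically, since the operator for the z-z product equals minus the product of the x-x and y-y operators.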
So we see that the difference between the quantum and classical case is merely that in the quantum case, all the information about the system can be in the correlations between the parts of the system, while in the classical case, every single part of the system contains at least the information necessary to define its own state.
It's perhaps useful to recapitulate the possibilities of correlation between two systems. To this end, let's look at some pictures. This is the case of two uncorrelated one-bit systems:

Fig. 3: Uncorrelated systems.

The dots stand for the balls, quballs, or qubits, and the lines signify correlations. As you can see, in this case, every system is correlated only with itself; thus, every system only encodes the truth value of one proposition, i.e. one bit of information. For instance, if the left one stands for 'your ball', the line might be the proposition 'your ball is green', and if the right one is 'my ball', the line might signify 'my ball is red'.
Now, we introduce a correlation between the two parts.

Fig. 4: Classical correlation.

The line connecting both parts now encodes a proposition that does not refer to either part alone, but to the system as a whole; thus, if again the left dot is your ball, the line starting and ending on it signifies 'your ball is green', while the line connecting both dots means 'my ball is the opposite color from your ball'. Note that the color of my ball on its own is now not completely determined; the color of your ball is a necessary input in order to fix it uniquely. This is as far as classical correlations go.

Fig. 5: Entanglement.

Going beyond this, we get a picture of quantum entanglement: each dot now represents a qubit, and the entire information about the system is contained in the correlations. Indeed, as we will see, this can be thought of as a characteristic of quantum systems. To describe this, David Mermin coined the phrase 'correlation without correlata', and based his 'Ithaca interpretation' of quantum mechanics on this point of view.
So again, we see that the only difference between the quantum and the classical case is the amount of correlation, not the kind; and because all (or most) of the information about the system is contained in the correlations, there is no information left to determine uniquely the state of each of the system's parts -- they must be in superposition. As Schrödinger put it in his essay The Present Situation in Quantum Mechanics:

"Maximal knowledge of a total system does not necessarily include total knowledge of all its parts, not even when these are fully separated from each other and at the moment are not influencing each other at all."

Entanglement and Entropy
Entropy, as we have learned, can be thought of as a measure for the information one lacks about a system. This suggests an immediate connection to entanglement: if there is information about the system contained in the correlations, then only having access to a part of the system may not allow you to access the full information about even that part -- thus, if you only have access to one of the dots in the pictures above, you may not have full information about even the one dot (cf. Schrödinger's quote above), meaning that it must be in a state of nonzero entropy (remember how the color of my ball above was only described by 'the color of my ball is the opposite of the color of yours'; without access to your ball, this obviously fails to uniquely specify the color of my ball).
However, the total system, composed of both dots and the correlations between them, may be in a completely known state, and thus, have zero entropy! Indeed, this is a characteristic of von Neumann entropy, which is the direct generalization of the concept of entropy in classical probability theory to quantum probability theory.
Due to quantum entanglement between two subsystems, blocking off access to one of them, even if they are spatially distant, may lead to a description of the other in which it has a certain entropy -- the reason for that simply being that the information about the subsystem is not entirely contained in the subsystem, but also in the correlations between both systems. Entropy, in this sense, can be seen as a measure of entanglement; from this point of view, it is easy to see that, if I 'hide' the left dot, the right dot must have the same entropy as the left dot would have, if the right dot was hidden -- as both are entangled to the same degree.
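A short numerical sketch may help here. It computes the von Neumann entropy (in bits) of one qubit of a two-qubit system after the other has been 'hidden'. The three example states and all function names are chosen here for illustration; since each joint state is given as a state vector, it is pure by construction and its total entropy is zero.

```python
import math

def reduced_density_matrix(psi, keep):
    """Density matrix of one qubit of the two-qubit state [a00, a01, a10, a11].

    keep=0 keeps the first qubit and hides the second; keep=1 does the reverse.
    """
    def amp(kept, other):
        return psi[2 * kept + other] if keep == 0 else psi[2 * other + kept]
    return [[sum(amp(i, k) * amp(j, k).conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

def entropy_bits(rho):
    """Von Neumann entropy -Tr(rho log2 rho) of a 2x2 density matrix."""
    a, d = rho[0][0].real, rho[1][1].real
    b = abs(rho[0][1])
    # Eigenvalues of a Hermitian 2x2 matrix with diagonal a, d and off-diagonal b.
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return sum(-p * math.log2(p) for p in (mean + radius, mean - radius) if p > 1e-12)

h = 1 / math.sqrt(2)
product_state = [0j, 1 + 0j, 0j, 0j]                           # |0>|1>, uncorrelated
singlet = [0j, h + 0j, -h + 0j, 0j]                            # maximally entangled
partial = [math.sqrt(0.8) + 0j, 0j, 0j, math.sqrt(0.2) + 0j]   # partially entangled

s_product = entropy_bits(reduced_density_matrix(product_state, keep=0))
s_singlet = entropy_bits(reduced_density_matrix(singlet, keep=0))
s_left = entropy_bits(reduced_density_matrix(partial, keep=0))
s_right = entropy_bits(reduced_density_matrix(partial, keep=1))
```

The uncorrelated pair gives zero entropy for the visible qubit, the maximally entangled pair gives one full bit, and for the partially entangled pair `s_left` and `s_right` agree (roughly 0.72 bits), whichever dot is hidden.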
In quantum mechanics, a state of zero entropy is called a pure state, because it describes a quantum system which is known to be in some state with certainty. Conversely, a state with nonzero entropy is called mixed, as it can be regarded as a statistical mixture of pure states -- i.e. there is a set of states, each of which a quantum system might be in with a certain (classical!) probability. It is a feature of quantum mechanics that pure states only ever evolve into pure states, as quantum evolution is reversible, i.e. every process can also occur in a time-reversed version. This is the case because quantum evolution conserves information: if the state at a given time t contains a certain amount of information, at a later time t' it must still contain the same amount; but this means that knowing the state at time t is equivalent to knowing the state at time t', and vice versa.
More generally, quantum evolution always leaves the total amount of entropy constant. But in our everyday world, entropy rises constantly! Indeed, I have previously argued that this rise of entropy is one of the most fundamental laws of the universe, if not the most fundamental one. How can this apparent incompatibility be reconciled?
This is, in fact, the already mentioned measurement problem in disguise. The classic answer is that, during the measurement process, the wave-function collapses, which is a process that does not conserve information, and hence, leads to rising entropy and is not time-reversible. The reason is simply that before the collapse, there are many different states that may collapse to one and the same observed state; thus, knowing the observed state, it is not possible to reconstruct the state pre-collapse. We have lost information, and entropy has risen. The acceptance of this as a genuine, random process (which state to collapse into is chosen at random, which introduces the oft-quoted indeterminism into quantum mechanics) is essentially what the so-called Copenhagen interpretation of quantum mechanics is all about.
But it is possible, using our picture of entanglement as correlations, and of entropy as a measure of the correlatedness between two systems, to attack this issue from a different angle. For what the above discussion neglects is the observer: essentially, things are portrayed as if they could be viewed without interacting with the system, as if observers were non-corporeal spirits that had direct knowledge of physical reality. But this isn't so; in order to observe, we must interact, and the interaction between observers and physical systems is of exactly the same kind as the interaction between all other physical systems -- this is just the recognition that observers are physical, too.
Now, the observer in quantum mechanics is unfortunately often regarded with a certain mystical air, which leads to things like Deepak Chopra, What The Bleep Do We Know?!, and nonsense about how each observer chooses their own reality, and so on. In fact, observer effects are not magic -- they exist in a classical context just as well. If you want to know the voltage in a circuit, you have no choice but to wire an appropriate instrument into the circuit; this act, however, will influence the actual voltage. Similarly, observer-dependent 'realities' are quite possible without invoking quantum magic: when you look at a rainbow, you will see a different rainbow than I do -- you will observe it in a slightly different place, for one. But this is just simple optics.
It is in this spirit that we will treat observer effects in quantum mechanics. Thus, the observer is a physical system like any other, that interacts with the system she observes. And, as we have learned, interaction will in general lead to correlation, and moreover, to entanglement. But since the observer observes a system, which has now become a subsystem of an entangled observer-observed system, she will miss the correlations that exist between her and the system -- she observes only part of the system, after all, which, as we now know, may not contain the complete information about itself. Thus, the observer, having become entangled with the system she observes, will observe it in a state carrying nonzero entropy, even if it was in a pure state before.
This is not a contradiction with the information-conserving (which physicists call 'unitary') evolution of quantum mechanics: the complete system of observer and observed is still in a pure state; before the act of observation, such was each of its subsystems, but afterwards, that's no longer the case, as both subsystems have become entangled. So we see that the apparent non-unitarity is merely an artifact of looking only at a part of the system: the complete system, consisting of observer and observed, evolves unitarily. Nevertheless, the observer sees entropy production -- without the need of postulating any wave-function collapse, or similar hacks.
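This can be illustrated with the smallest possible toy model, sketched below under loudly stated assumptions: the 'observer' is a single qubit acting as a pointer, and the measurement interaction is modeled as a controlled-NOT gate that flips the pointer whenever the system is in state 1. Neither choice comes from the discussion above; they are just the simplest stand-ins, and the function names are hypothetical.

```python
import math

def apply(gate, psi):
    """Multiply a 4x4 gate into a two-qubit state vector."""
    return [sum(gate[r][c] * psi[c] for c in range(4)) for r in range(4)]

def system_entropy_bits(psi):
    """Entropy of the first qubit (the observed system), ignoring the second."""
    # Diagonal entries p0, p1 and off-diagonal entry c of the reduced state.
    p0 = abs(psi[0]) ** 2 + abs(psi[1]) ** 2
    p1 = abs(psi[2]) ** 2 + abs(psi[3]) ** 2
    c = psi[0] * psi[2].conjugate() + psi[1] * psi[3].conjugate()
    mean = (p0 + p1) / 2
    radius = math.sqrt(((p0 - p1) / 2) ** 2 + abs(c) ** 2)
    return sum(-p * math.log2(p) for p in (mean + radius, mean - radius) if p > 1e-12)

# Basis order |system, observer>: |00>, |01>, |10>, |11>.
# Controlled-NOT: the observer's pointer flips if the system is in state 1.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

h = 1 / math.sqrt(2)
before = [h, 0.0, h, 0.0]     # system in (|0> + |1>)/sqrt(2), observer 'ready' in |0>
after = apply(CNOT, before)   # the measurement interaction

entropy_before = system_entropy_bits(before)
entropy_after = system_entropy_bits(after)
norm_after = sum(abs(a) ** 2 for a in after)
```

Before the interaction the system's entropy is zero; afterwards it is one bit, even though the joint state is still a single normalized state vector (`norm_after` is 1), i.e. still pure.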
Note, however, that this does not solve the measurement problem completely. For one, while it shows a phenomenology that is certainly similar to that of wave-function collapse, it could be an entirely coincidental similarity -- collapse might need to occur anyway, perhaps because the scheme proposed here is unable to account for all the entropy production. Also, there is no way given to determine which state the wave function collapses to. These and other problems will be considered in a future post about the interpretation of quantum mechanics.
This point of view of the apparently non-unitary dynamics of measurement is also proposed by Carlo Rovelli in his relational interpretation of quantum mechanics. He further justifies the 'incomplete' picture the observer has of the dynamics of her interaction with the measured system in part by invoking arguments by Marisa Dalla Chiara and Thomas Breuer, which appeal to the problems inherent in logical self-reference in order to demonstrate that complete quantum mechanical self-measurement is impossible. See also the Stanford Encyclopedia of Philosophy article on relational quantum mechanics, sects. 4.1-2.
We are now in a position to answer the question, left over from above, of why classical objects can't become entangled (or are at least very hard to entangle), while for quantum objects, it is natural. First, let's review the reason why it is natural for quantum objects to entangle easily. In the quantum realm, we approach the limit of information and interactions -- a typical quantum system contains only a few bits of information, and can be regarded as isolated from other quantum systems. We can thus see the relational nature of information more easily: a quantum object has its properties with respect to some other quantum object. In the case of entanglement, one qubit may have its properties relative to another. But what about the case of a single qubit? Well, the view I want to propose is that this qubit has its properties relative to the observer.
This goes as follows: a qubit in an indeterminate state is measured, and becomes through this act correlated with the observer. The observer now 'knows' the qubit's state; that knowledge is expressed by the correlation between the observer and the qubit. This is why I put 'know' in quotation marks: the observer need not be human, or indeed a conscious being, in order for this description to hold -- indeed, this observer might be some mechanical measuring apparatus, indicating its 'observation' via the position of some pointer, for instance.
This correlation has the consequence that observing the observer -- the measuring apparatus -- tells you something about the state of the qubit; observing this meta-observer -- say, seeing you scribble down the value of the measurement -- in turn tells me something about the qubit, and so on. The information about the qubit's state is distributed through a web of correlations. It 'leaks into the environment', as it is sometimes phrased.
But this web has the effect of 'fixing' the state; in order to bring the qubit again into an indeterminate state, it has to be 'untangled', isolated from the environment. This is possible in the quantum case, but classical systems are big -- they contain lots of information, and thus are correlated with the environment in lots of different ways. It is practically hopeless to undo all these correlations and perfectly isolate a classical system from the environment. Hence, it is not the case that the classical system consisting of the balls in the boxes is only a two-bit system -- for all practical purposes, both balls contain infinitely much information, and thus, undoing one correlation can't put a classical system into a superposed state; there is still an untold number of correlations fixing the property 'my ball is green'.
In summary, we can say that quantum mechanics is really all about correlations -- because it is all about information, and information one system has about another is a correlation between those systems. The higher the correlations (with the environment or any other observer), the more information the system carries, the more its behavior will tend towards classicality. The picture is simple, but its explanatory power is huge; indeed, it goes far beyond what we have touched upon up to now.

A Bit of Holography
The concept of the holographic principle was proposed in response to Jacob Bekenstein's discovery that, contrary to prior belief, black holes must be extremely high-entropy objects; in fact, in conjunction with work done by Stephen Hawking, he could show that the entropy of a black hole is the maximum entropy any object of the same size can have, and that this entropy is proportional to the area of the black hole's horizon (the 'point of no return', beyond which nothing, not even light, can escape the black hole's gravitational pull). Because of the connection between entropy and information, this is interpreted as the information of everything that falls into a black hole being 'stored' on its horizon -- since the horizon is a two dimensional surface, but the things that had the misfortune of crossing it are three dimensional, this means that three dimensional information is stored on a two dimensional medium, whence the phenomenon got the name 'holography'.
What does this have to do with entanglement? Well, perhaps nothing. But perhaps, rather a lot: consider that what a horizon does is effectively to hide things from view. However, as we have seen, hiding things -- subsystems of an entangled system, for instance -- generically leads to entropy production. One can thus imagine a region of space enclosed in a sphere, which has the property of hiding everything behind its radius, and then compute the entropy 'generated' by this hiding of information. And indeed, the calculation was done in 1993 by Mark Srednicki, who found that the entropy thus produced actually does scale with the area of the sphere, as is the case with black holes!
Actually, a simple argument will make this heuristically plausible. Consider a universe, filled with some gas, the atoms of which are randomly entangled among each other. Hiding a spherical volume behind a horizon produces entropy. Generally, one might expect this entropy to scale with the volume of the sphere. But recall that the situation is symmetrical -- the gas inside the sphere is entangled with the rest of the gas as much as the gas outside is with the gas inside; thus, if I were 'inside' the sphere, and hid the outside gas, it must have the same entropy. However, the gas inside the sphere obviously has a different volume from the gas outside the sphere. In fact, the only quantity the two have in common is the area of their boundary.
A picture will make this even clearer.

Fig. 6: Entanglement entropy.

In this picture, for simplicity, only one 'object' is assumed to be inside the sphere, which however is heavily correlated with its surroundings. For some correlations, I have indicated the correlated systems, while others I have just left dangling, assuming they go off somewhere into the environment. Also, I have shown the 'punctures' where a correlation crosses the imaginary horizon. The number of punctures indicates the number of correlations, so to somebody who can't see beyond the horizon, it appears as an object correlated with the environment in a way that depends on the number of links puncturing the horizon's surface.
This does not straightforwardly imply holography: it is clear that, in the picture, I could vary the size of the horizon, without varying the number of correlations, at least to some degree. Only if there were some connection between correlations and area could the proportionality be guaranteed -- this is a hint we should keep in mind. However, for a black hole, the size of the horizon is fixed by the amount of mass inside, such that the only way to have it grow is to throw more mass into it -- and the mass, being approximately classical, will itself be highly correlated with the environment, and thus, entail an increase in correlations.
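To see how a connection between correlations and area could come about, consider a deliberately crude toy model, which is an assumption introduced here and not Srednicki's calculation: sites arranged on a cubic grid, each correlated only with its nearest neighbors. Counting the links that puncture the surface of a cubic region (a cube rather than a sphere, for simplicity) can be sketched as follows; the function name is hypothetical.

```python
from itertools import product

def crossing_links(side):
    """Count nearest-neighbor links that leave a cube of side x side x side sites."""
    inside = set(product(range(side), repeat=3))
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return sum(
        (x + dx, y + dy, z + dz) not in inside
        for (x, y, z) in inside
        for (dx, dy, dz) in steps
    )

# For each size: (number of sites inside, number of punctures).
table = {side: (side ** 3, crossing_links(side)) for side in range(1, 6)}
```

In this toy model the number of punctures grows as six times the side length squared, i.e. with the surface of the region, while the number of sites inside grows with its volume. With long-range links the count would look different, which is exactly the caveat raised above.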
Some more interesting hints come from string theory, out of the so-called black hole/qubit correspondence, due to Mike Duff and collaborators: there seems to exist a connection, at least in the mathematics, between entangled systems of qubits and so-called 'extremal' black holes, where extremal means that they have the property that their mass is equal to their charge (in appropriate units) (and since we're talking string theory here, there isn't just the one electric charge we're familiar with, but there's also magnetic charge, and due to supersymmetry, there are actually multiple different ones of each kind). There is much that is interesting about this unexpected relation -- in particular, for a certain such black hole, its entropy is given by a measure of entanglement between three qubits, the so-called 3-tangle!
But still, the identification of entanglement entropy and Bekenstein-Hawking entropy is somewhat controversial; nevertheless, the idea should be enough to make one wonder.

What is Quantum Mechanics?

2011-12-17T03:46:00.000-08:00

So far, I've told you a little about where I believe quantum theory comes from. To briefly recap, information-theoretic incompleteness, a feature of every universal system (where 'universal' is to be understood in the sense of 'computationally universal'), introduces the notion of complementarity. This can be interpreted as the impossibility for any physical system to answer more than finitely many questions about its state -- i.e. it furnishes an absolute restriction on the amount of information contained within any given system. From this, one gets to quantum theory via either a deformation of statistical mechanics (more accurately, Liouville mechanics, i.e. statistical mechanics in phase space), or, more abstractly, via introducing the possibility of complementary propositions into logic. In both cases, quantum mechanics emerges as a generalization of ordinary probability theory. Both points of view have their advantages -- the former is more intuitive, relying on little more than an understanding of the notions of position and momentum; while the abstractness of the latter, and especially its independence from the concepts of classical mechanics, highlights the fundamental nature of the theory: it is not merely an empirically adequate description of nature, but a necessary consequence of dealing with arbitrary systems of limited information content. For a third way of telling the story of quantum mechanics as a generalized probability theory see this lecture by Scott Aaronson, writer of the always-interesting Shtetl-Optimized.
But now, it's high time I tell you a little something about what, actually, this generalized theory of probability is, how it works, and what it tells us about the world we're living in. First, however, I'll tell you a little about the mathematics of waves, the concept of phase, and the phenomenon of interference.
Let's start with some pictures. This is a wave:

Fig. 1: A wave.

And this is another wave:

Fig. 2: Another wave.

A wave is characterized by two basic quantities: its amplitude, i.e. the height of its peaks, and its frequency, i.e. the number of peaks or valleys over a given interval. As you can see, the two waves above agree with respect to those quantities. Nevertheless, there is a difference: one is shifted relative to the other. To better appreciate this, let's put both in the same picture:

Fig. 3: Two waves.

The offset between a peak of one wave and the nearest peak of the other, or equivalently between their valleys, is called a phase difference. These two waves are completely out of phase: where one has a peak, the other has a valley, and vice versa. This means that if both represent the value of some physical quantity at a certain location, say sound pressure, or light intensity, or just water depth, then the total value of that physical quantity is the sum of both waves -- in this case, the following:

Fig. 4: No wave.

Since every value at every point is added to an equal, but oppositely signed value, the complete result is zero. This phenomenon is known as destructive interference: two waves cancel each other out, if they are equivalent in magnitude, but opposite in phase. More generally, waves of differing magnitudes and phases can yield complicated results when they are added. For instance, consider these three waves:

Fig. 5: Three waves.

If allowed to interfere (that is, if superimposed, or brought into superposition), they give rise to this 'wave':

Fig. 6: New wave.

It is not so easy to glean the original three waves just from this graph! (However, there exists a mathematical technique, called Fourier analysis, to do precisely that -- even for superpositions of infinitely many waves.)
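As a rough sketch of how this works in practice, the following Python snippet superimposes three sine waves and then recovers their frequencies and amplitudes with a discrete Fourier transform. The three frequencies and amplitudes are made-up values, not those of the figures above, and the transform is written out naively instead of using a library routine.

```python
import cmath
import math

N = 64  # number of samples over one unit of time
components = {1: 1.0, 3: 0.5, 5: 0.25}  # frequency -> amplitude (made-up values)

# The superposition: at each sample point, simply add up the three waves.
signal = [
    sum(a * math.sin(2 * math.pi * f * n / N) for f, a in components.items())
    for n in range(N)
]

def dft(samples):
    """Naive discrete Fourier transform (slow, but fine for a small example)."""
    size = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * n / size) for n, s in enumerate(samples))
        for k in range(size)
    ]

spectrum = dft(signal)

# A sine wave of amplitude a at frequency k shows up as a peak of height a*N/2.
recovered = {
    k: round(2 * abs(spectrum[k]) / N, 6)
    for k in range(1, N // 2)
    if 2 * abs(spectrum[k]) / N > 1e-9
}
```

The dictionary `recovered` contains exactly the three frequencies that went in, each with its original amplitude, even though they are hard to pick out by eye from the summed curve.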
A useful way to think about waves and their relative phases is in terms of circular motion. Imagine a particle moving around a circle at constant speed. The projection of its motion onto a fixed axis, graphed over time, will be a simple wave, and the position of the particle along that axis at any given time corresponds to the height of the graph at the corresponding point.

Fig. 7: A phasor. (Image credit: wikipedia.)

This kind of entity, i.e. the graphical representation of a wave as an arrow pointing to a position on a circle, is known as a phasor (no relation to the similarly-named device from the Star Trek franchise, unfortunately). The name comes from the fact that it gives us complete information about the phase, amplitude (through the length of the arrow) and frequency (through the speed of its rotation) of a wave; thus, the phasors of two waves suffice to determine their interference.

Fig. 8: Interference as a sum of phasors. (Image credit: wikipedia.)

The wave denoted y(t) is the result of the interference of the waves y₁(t) and y₂(t), arrived at by adding the vectors of their individual phasors (which I will from now on only refer to as phases, since the conceptual differences are immaterial for our purposes). This addition of phases proceeds essentially by adjoining the arrows, i.e. placing the 'bottom' of one arrow at the 'tip' of the other, keeping the directions fixed.
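The tip-to-tail addition of arrows can be sketched in a few lines. This is an illustration under assumptions of my own choosing: two arrows of length 1, a quarter turn apart, and an arbitrary moment in time. It checks that the wave belonging to the summed arrow agrees with the sum of the two individual waves.

```python
import math

def phasor(amplitude, phase):
    """An arrow of the given length and angle (in radians), as (x, y) components."""
    return (amplitude * math.cos(phase), amplitude * math.sin(phase))

def add(p, q):
    """Tip-to-tail addition: place the bottom of q at the tip of p."""
    return (p[0] + q[0], p[1] + q[1])

# Two hypothetical waves of equal amplitude, a quarter turn out of phase.
y1 = phasor(1.0, 0.0)
y2 = phasor(1.0, math.pi / 2)
y = add(y1, y2)

# Length and angle of the resulting arrow.
amplitude = math.hypot(y[0], y[1])
phase = math.atan2(y[1], y[0])

# Compare against adding the two waves directly at some arbitrary time t.
omega, t = 2 * math.pi, 0.3
direct = math.sin(omega * t) + math.sin(omega * t + math.pi / 2)
via_phasor = amplitude * math.sin(omega * t + phase)

print(amplitude, math.degrees(phase))
print(direct, via_phasor)
```

The two numbers on the last line agree for any choice of `t`, which is why the arrows alone suffice to determine the interference.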

Wave or particle, or is that the wrong question to ask?
After these preliminary remarks, let's now move on to some physics. It is somewhat customary to introduce quantum theory by considering its historical development -- the explanation for the anomaly in black body spectra by Planck, the application of Planck's idea to the photoeffect by Einstein, Bohr's model of the atom, and so on. However, since this historical development is necessarily somewhat convoluted, involving many false starts and almost-right ideas, this regrettably often leads to a convoluted and almost-right picture of quantum theory. I will thus only consider one single empirical ingredient, and build up the rest through judicious cherry picking of ideas and concepts in a manner that to me appears most logical and clean-cut.
The empirical ingredient I want to consider is the classical double slit experiment. Historically, there had been dissent between two theories on the nature of light: Newton, influenced by Pierre Gassendi, considered light to be made out of particles, or corpuscles ('small bodies'), while Descartes, Hooke, Huygens and others considered light to be a wave, in analogy to sound or water.
As we now know, waves have the unique property of being capable of interference. So it is a natural question to ask whether one can devise an experiment utilizing this effect in order to decide which theory is the right one (interestingly, Newton already had made observations in favor of a wave model of light, in the form of what today are called Newton rings, but had nevertheless held on to his corpuscular model). This turns out to be possible, and at the beginning of the 19th century, Thomas Young carried out an experiment in which he shone light onto a plate with two parallel slits in it, theorizing that if light were a wave, circular waves emanating from the slits would lead to a characteristic interference pattern, while particulate light would only illuminate two straight lines as images of the slits, the same way baseballs thrown through two openings would only hit the wall at the points straight behind them.
To understand this, we must now look at two-dimensional waves instead of one-dimensional ones; fortunately, this does not cause any serious new difficulties.

Fig. 9: Young's double slit experiment. (Image credit: wikipedia.)

From each of the two slits to some given point on the wall, a wave will undergo a number of oscillations depending on the distance; thus, the wave's phase, relative to a wave arriving at the wall from the other slit, will be a function of this distance. As we have learned, at this point, we must then add the amplitudes, and careful reasoning shows that the waves will reinforce at some points, and cancel at others, leading to the characteristic dark-bright-dark pattern in the figure.
When Young carried out his experiment, this pattern was indeed what he saw -- thus, or so it seemed, he had shown that light is indeed a wave, not made out of particles.
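The 'careful reasoning' behind the bright and dark bands can be sketched numerically. All numbers below (wavelength, slit separation, distance to the wall) are illustrative assumptions rather than Young's actual values, and the sketch ignores the finite width of the slits and the weakening of each wave with distance.

```python
import math

WAVELENGTH = 500e-9       # metres; roughly green light (assumed)
SLIT_SEPARATION = 50e-6   # metres (assumed)
WALL_DISTANCE = 1.0       # metres from the slits to the wall (assumed)

def brightness(x):
    """Relative brightness at height x on the wall: 0 is dark, 4 is brightest."""
    # Distance from each slit to the point on the wall.
    r1 = math.hypot(WALL_DISTANCE, x - SLIT_SEPARATION / 2)
    r2 = math.hypot(WALL_DISTANCE, x + SLIT_SEPARATION / 2)
    # One full turn of the arrow per wavelength travelled.
    phase1 = 2 * math.pi * r1 / WAVELENGTH
    phase2 = 2 * math.pi * r2 / WAVELENGTH
    # Add the two unit arrows tip to tail and take the squared length.
    sx = math.cos(phase1) + math.cos(phase2)
    sy = math.sin(phase1) + math.sin(phase2)
    return sx * sx + sy * sy

# Scan the wall in steps of 2.5 mm.
for k in range(9):
    x = k * 0.0025
    print(f"x = {x * 1000:5.1f} mm  brightness = {brightness(x):.3f}")
```

With these numbers the bright bands are about a centimetre apart, with dark bands halfway between them.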
However, let's fast forward to modern times, where we have light sources whose intensity can be very accurately tuned. According to the wave model of light, if we successively dim the lights, what we ought to see is a gradual weakening of the interference pattern, until it becomes too faint to observe (we can use various tricks, such as a prolonged exposure on a photographic plate, in order to prolong this time). And at first, this is indeed what we see. But at some point, instead of the continuous, uniform pattern we expect, we see discrete points of illumination, popping up randomly on our photoplate. If we let this build up for a while, we'll get something like this:

Fig. 10: Long-term exposure with a weak source. (Image credit: wikipedia.)

This is very much un-wavelike behavior -- in fact, it looks a lot like particle impacts. Nevertheless, the interference pattern persists! However, suppose we decide to be clever and catch whatever it is that goes through the slits in flagrante delicto, by positioning a detector at each slit, to see if only one slit is traversed at a time -- as would be expected from particles -- or if both slits are traversed at the same time -- as a wave would do. Then yet another surprising thing happens: we only ever get one detection -- one 'click', as it is still called, though modern detectors rarely click anymore -- at one slit or the other, but the interference pattern goes away, and instead, we only observe two illuminated areas, as would indeed be expected from particles!
This is the origin of what is often called wave-particle duality: sometimes, light seems to behave like a wave, interference and all, while at other times -- especially if we try to sneak a peek at what goes on behind the curtain -- it appears to behave like a particle. This is another example of complementarity, though of a slightly different quality than those we have already encountered.
However, another interpretation is possible. We can look at the situation in terms of probabilities. The probability of one point x of the wall being illuminated is equal to the probability of it being illuminated by light going through slit 1 plus the probability of it being illuminated by light going through slit 2, or: P(x bright) = P(light through S1) + P(light through S2). Clearly, this does not leave room for interference: probabilities are never negative, so for any given point, the sum of both probabilities is at least as large as either probability alone. We thus get the 'two bright areas' picture we would expect for particles: directly behind each slit, the probability for light to get there via that slit is large, and the probability for it to get there via the other slit is small -- tending to 0 for slits far enough apart; right in the middle, both probabilities will be small (they can again be made arbitrarily small with the right arrangement); behind the other slit, we get the same picture again as behind the first one.
But, as we now know, quantum theory is essentially a generalization of probability theory, so what does it tell us about this situation? Well, first of all, quantum probabilities, as represented by the Wigner density on phase space, can indeed become negative, so it is no longer true that the sum of both probabilities is necessarily larger than either probability on its own. However, one must note that this only happens in such a way that no experiment gives a certain outcome with '-20% probability'! That would be a nonsensical notion, and luckily, these areas are protected against observation by the uncertainty principle.
Even more, though, quantum 'probabilities' -- one typically, for reasons that will become clear soon, talks about 'probability amplitudes' or just amplitudes for short -- can generally take on complex values.
This is very intriguing, because complex numbers have a direct relationship with circular motion. To see this, we must conceive of numbers as being related to transformations. If we have a stick, of length x, with its left end fixed at the origin of the number line, i.e. at 0, then any positive real number tells us to stretch that stick by a factor equal to its magnitude, i.e. the number '3' is understood as the instruction 'make the stick three times longer'.
Negative numbers, on the other hand, can be understood as an instruction to flip the stick over, i.e. rotate it by 180° around the origin (remember, the stick is fixed there). So the number '-0.5' tells us to apply a half rotation to the stick, then shrink it to half its previous size. We can also check that this interpretation of numbers gels with our usual one: for instance, applying a half rotation twice is the same as not rotating at all, thus (-1)*(-1) = (-1)² = 1, as we would expect.
However, why would we limit ourselves to 180° rotations? Let's consider what happens if we rotate our stick by 90°. We have now left the number line, and are in a plane; our stick stands orthogonal to the numbers we considered before. But this is no great mystery -- rotations are quite simple operations. Nevertheless, this simple extension introduces the full formalism of complex numbers. Let's call the number that effects our rotation of 90° i, for convenience. (Clearly, it can't be any of the numbers on the number line, so we have to invent a new name for it.)
Now, the simplest thing in the world is that if you rotate by 90° twice (in the same direction), you have in total effected a rotation by 180° -- or, i*i = i² = -1. Thus, i, somewhat unluckily called the imaginary unit (which can, as we have seen, be tied to the quite real concept of rotation), is the square root of -1. It's a simple concept; nevertheless, it was regarded with suspicion even by mathematicians for centuries.
Anyway, in the end, it turns out that we can represent arbitrary rotations using numbers that are sums of real and imaginary parts, or compositions of stretchings, flips, and 90° rotations. For instance, the number 1 + i corresponds to a 45° rotation (combined with a stretch by a factor of √2):

Fig. 11: 1 + i in the complex plane.

This should begin to remind you of something. Indeed, every complex number can be represented by a magnitude and a phase, where the former relates to stretching, the latter to rotation. These phases behave exactly like the ones we're already familiar with, and thus, show interference -- if two amplitudes differ in phase, the phase of the sum of the amplitudes is arrived at the same way as above, and may thus interfere constructively or destructively. The phasors discussed above can thus be represented very naturally as complex numbers; conversely, complex numbers exhibit the same phenomena we are already familiar with from phasors and their associated waves.
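The 'numbers as stretches and rotations' picture is easy to try out, since Python has complex numbers built in (it writes the imaginary unit as `1j`). The stick of length 2 is an arbitrary choice for illustration.

```python
import cmath
import math

i = 1j  # Python's spelling of the imaginary unit

# Two quarter turns make a half turn.
print(i * i)                    # (-1+0j)

# A stick of length 2 lying along the positive number line.
stick = 2 + 0j
print(stick * 3)                # stretched to three times its length
print(stick * -0.5)             # flipped over and shrunk to half its length
print(stick * i)                # rotated by 90 degrees, now pointing 'up'

# Every complex number is a stretch (magnitude) plus a rotation (phase).
magnitude, phase = cmath.polar(1 + i)
print(magnitude, math.degrees(phase))

# Multiplying two numbers multiplies the stretches and adds the rotations.
magnitude2, phase2 = cmath.polar((1 + i) * (1 + i))
print(magnitude2, math.degrees(phase2))
```

The last two lines show 1 + i as a stretch by √2 with a 45° rotation, and applying it twice as a stretch by 2 with a 90° rotation.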
It is thus the different theory of probability, furnished by quantum mechanics, that is at the origin of the phenomenon of interference, and the apparent wave-particle duality. This is also the reason for the historically motivated terminology that refers to the state of a quantum system as its 'wave function', often denoted by the Greek letter ψ (psi), sometimes written as |ψ⟩, where the strange brackets just mean 'this is a quantum object'.
In the most familiar versions of quantum mechanics, there is a slight hitch here that I must confess I don't exactly know how to motivate using an intuitive argument. Basically, probabilities must be positive real numbers, so one takes the square of the absolute value of the amplitude in order to extract the physical prediction; this is known as Born's rule. This is not necessary in the two formalisms I have so far introduced: in doing quantum mechanics on phase space, the Wigner distribution yields probabilities in the same way as any ordinary probability distribution does, and in deducing quantum mechanics from quantum logic, the probabilities are obtained from the density matrices via a rule that is a natural generalization of the equivalent rule in classical probability theory. Of course, this rule is equivalent to the squared-modulus one, but it would take a bit of math to exhibit both in full detail -- it's important to note, however, that this is not an additional or ad hoc assumption in order to make things come out right, but a straightforward, if a little technical, consequence of the theory. For more technically versed readers, Saul Youssef has constructed an argument that this rule uniquely provides a relative-frequency interpretation of complex probabilities (see here).
The upshot of this is that it is now clear how to resolve the puzzles of the double slit experiment: the probability of any given point being illuminated is equal to the squared modulus of the amplitude for light to arrive there, which is equal to the squared modulus of the sum of the amplitudes for light to arrive there via slit 1 and light to arrive there via slit 2. Or: P(x bright) = |A(x bright)|² = |A(light through S1) + A(light through S2)|². Since both amplitudes are complex numbers, they show interference, which is not changed by the squaring: if both sum to zero, then zero squared is still zero; if both sum to 1, the same holds, thus, there will be maxima and minima of illumination on the wall. Also, we immediately see the reason why the observation of light at the slits destroys the interference: the probability to observe light at any slit is equal to |A(light through S1)|² or |A(light through S2)|²; thus, the observation having been made, the probability of light arriving at any point on the screen is equal to the sum of the probabilities of the light going there through either slit, thus: P(x bright) = |A(light through S1)|² + |A(light through S2)|². Since these are both positive real numbers, there is no interference, and we observe merely two bright bands behind the slits.
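The difference between the two rules can be made concrete with a few lines of code. This is only a sketch: the amplitudes of magnitude 0.5 are arbitrary illustrative values, and the results are relative brightnesses rather than properly normalized probabilities.

```python
import cmath
import math

def amplitude(magnitude, phase):
    """A complex amplitude with the given magnitude and phase (in radians)."""
    return cmath.rect(magnitude, phase)

def p_unobserved(a1, a2):
    """Nobody watches the slits: add the amplitudes first, then square."""
    return abs(a1 + a2) ** 2

def p_observed(a1, a2):
    """Detectors at the slits: square each amplitude first, then add."""
    return abs(a1) ** 2 + abs(a2) ** 2

a1 = amplitude(0.5, 0.0)
cases = [("in phase", 0.0), ("quarter turn apart", math.pi / 2), ("out of phase", math.pi)]
for label, shift in cases:
    a2 = amplitude(0.5, shift)
    print(f"{label:20s} unobserved: {p_unobserved(a1, a2):.3f}   observed: {p_observed(a1, a2):.3f}")
```

The 'observed' column does not care about the relative phase at all, while the 'unobserved' column swings between zero and twice the observed value, which is the pattern of dark and bright bands.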
However, this should not be taken as an argument for an 'all particle' version of quantum theory -- the view that, say, an electron is after all a particle, and it is just the weirdness in the quantum mechanical probability that causes interference patterns to appear. Indeed, I am not sure if the question of whether the electron is a wave, a particle, both, or neither, is a sensible one to ask. After all, there are seemingly empirically adequate models in which it is either of those: in the de Broglie-Bohm or 'pilot wave' theory, the electron is a particle, albeit guided by a 'quantum potential', while Carver Mead has proposed a model of 'collective electrodynamics', in which only wave phenomena exist fundamentally, the discrete appearance of particles of matter being related to quantization effects.
Maybe this question is of the same kind as an inhabitant of the Matrix asking what programming language his world is written in, and what the program is that computes it: there can be no unique answer to it, as its object is not what one might call an 'element of reality' in somewhat antiquated terminology. The experimental phenomena are independent of whether the electron is a wave or a particle, just as the experience of a person living in the Matrix is independent of the implementation of his simulated environment. I've considered the question of what, in such a context, it is reasonable for scientific theories to address elsewhere.

The Path Not Not Taken
Let's recap: we have seen that, in order to explain the seemingly paradoxical behavior of light in the double slit experiment, it suffices to appeal to a generalized theory of probability, such that the observed interference effects are in fact due to the relative phases associated with each path a photon, i.e. a particle of light, can take to the screen. This led us to consider probability amplitudes, whose squared modulus gives us ordinary probabilities, and which simply have to be added in order to determine the total amplitude for a photon arriving at any given point on the screen.
About the double slit experiment, Richard Feynman, along with Einstein perhaps the most-quoted physicist of the 20th century, was reportedly 'fond of saying that that all of quantum mechanics can be gleaned from carefully thinking through the implications of this single experiment' (source). As we will now see, he had good reason to think so.
Consider, besides the first two, cutting another slit into the screen, for a 'triple-slit experiment'. Our prescriptions don't have to be changed: now, all that we have to do is to add three amplitudes in order to determine the likelihood of a photon hitting a given point on the wall. The same is true for four, five, six, etc., holes. Nothing qualitatively new emerges here.
Let's add a second screen behind the first one. Now we have a two-step process: in order to determine the amplitude for a photon hitting the wall, we must first consider the amplitude of it traversing the first screen, then the amplitude of it traversing the second -- i.e. in order to derive the amplitude of the photon, emitted at point A, arriving at some point B on the wall, we must sum over the amplitudes of the photon going through each hole in the first screen, then through each hole in the second, then to point B. Again, there is nothing qualitatively new here.

Fig. 12: Multiple slit experiment.

But this actually exhausts the possibilities of modifying the experiment -- we can add more screens with ever more slits, but this will change nothing of the essence of our prior reasoning; we'll just have to evaluate ever more sums, which might get tedious, but does not introduce any new conceptual troubles.
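The bookkeeping for several screens can be sketched as follows. Two things here are assumptions of the sketch rather than statements from the text: the amplitude of a path is taken to be the product of the amplitudes of its straight legs, and each leg contributes a unit arrow that turns once per wavelength (ignoring any weakening with distance). The slit positions are invented for illustration.

```python
import cmath
import math
from itertools import product

WAVELENGTH = 1.0  # arbitrary units (assumed)

def leg(p, q):
    """Amplitude for a straight leg from p to q: a unit arrow, one turn per wavelength."""
    return cmath.exp(2j * math.pi * math.dist(p, q) / WAVELENGTH)

def total_amplitude(source, screens, target):
    """Sum over every way of picking one slit in each screen."""
    total = 0j
    for slits in product(*screens):
        points = [source, *slits, target]
        path_amplitude = 1 + 0j
        for p, q in zip(points, points[1:]):
            path_amplitude *= leg(p, q)
        total += path_amplitude
    return total

source, target = (0.0, 0.0), (10.0, 0.0)
one_screen = [[(5.0, 1.0), (5.0, -1.0)]]
two_screens = [
    [(3.0, 1.0), (3.0, 0.0), (3.0, -1.0)],   # three slits in the first screen
    [(6.0, 0.5), (6.0, -0.5)],               # two slits in the second
]

print(abs(total_amplitude(source, one_screen, target)) ** 2)
print(abs(total_amplitude(source, two_screens, target)) ** 2)
```

Adding a screen or a slit only means adding an entry to a list; the function that does the summing stays the same, which is the sense in which nothing qualitatively new appears.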
So let's now consider the ultimate limiting case -- infinitely many screens with infinitely many holes in them. Clearly, we must sum over every path the photon can take to every point in between two points A and B, since at every point in space, there will be one of the infinitely many slits, and we have learned that we must sum over all slits. Again, this is nothing conceptually new (though in practice, things can get somewhat complicated when having to evaluate these infinite sums).
Now, the crucial thing to realize is that this case, in which there are infinitely many screens with infinitely many slits, is the same case as if there were no screens and no slits, but just empty space -- since at every point in space, there is a slit, and thus, at no point, there is a bit of screen. But this means that, in order to obtain the amplitude of a particle propagating from some point A to a point B, we must sum over all paths that it could take to get there, no matter how absurd they seem!
This is the germ of the idea behind the so-called path integral formulation of quantum mechanics, due to none other than Richard Feynman. (The above story is told in more detail, and with the math to back it up, in Anthony Zee's excellent Quantum Field Theory in a Nutshell, under the heading: 'The professor's nightmare: a wise guy in class'.)
Now, why do I introduce yet another formulation of quantum mechanics? There are two main reasons. First, the path integral formulation, while mathematically challenging, has the great virtue of lending itself well to intuition, better than other formulations at least. The second is that, using path integrals, or heuristic path sums, it is easy to show how quantum mechanics actually is necessary to establish a firm footing for classical mechanics, and to explicitly show the emergence of physical laws from a stochastic process, as discussed in the previous post.
To show this, I will borrow an example from Feynman's excellent 1985 popular science book 'QED: The Strange Theory of Light And Matter', which provides to this day the best introduction that I am aware of to the challenging concepts of quantum field theory, aimed at the general reader without mathematical background, showcasing Feynman's admirable skill at exhibiting high-level concepts without needing high-level mathematics. (The lectures the book was based on, by the way, are available as streaming video here.)
Let us consider the elemental phenomenon of reflection. Most readers will probably be familiar with the law that says that the angle of incidence has to equal the angle of reflection -- i.e. a beam of light, incident on a reflective surface under an angle α, will be reflected in such a way that the reflected beam will again form an angle of α with the reflective surface (or, as it is more usually defined, with a direction orthogonal to it).
The question is -- how does the light know to do this? Does it need to know in advance how the surface is tilted in order to be reflected appropriately? Is there a law that, from all possible angles of reflection, a priori selects the 'proper' one?
The answer can be found by thinking in sums-over-paths. The key is that the phase of every path depends on a quantity called action -- the larger the action, the higher the 'frequency', i.e. the faster the rotation of the little arrow in the phasor diagram. The action is a somewhat abstract quantity -- for our purposes, it is only necessary to know that it is dependent on the path the system takes.
Now take the following figure:

Fig. 13: Sum over possible reflection paths. Adapted from Feynman's 'QED'.

It shows the phase and the action for different possible reflection paths. As you can see, the larger the action, the more the phases vary among adjacent paths. As we know by now, in order to get the total amplitude, we have to add the individual contributions, which can be done graphically:

Fig. 14: Sum of phases. Adapted from Feynman's 'QED'.

There are three different regions to this diagram. The arrows in the middle one, labelled E to K, all point roughly in the same direction; their associated actions are small, so there is not much change in phase between the paths they represent. However, the arrows at the left and right ends, A to D and L to O, all point in progressively more different directions, leading to them 'going around in a circle'. But this means that their sum is equal to 0 -- they interfere destructively. Conversely, the arrows in the middle region reinforce one another -- they interfere constructively. The sum over paths will thus be dominated by those paths for which the action is small, since those paths get reinforced, while other paths get cancelled.
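This cancellation at the ends and reinforcement in the middle can be reproduced with a rough numerical sketch. Several things are assumptions made for illustration: the positions of source and detector, the wavelength, a mirror split into three equal strips (standing in for the three regions of the figure), and the use of path length as a stand-in for the action, so that each arrow turns once per wavelength travelled.

```python
import cmath
import math

WAVELENGTH = 0.01        # arbitrary units (assumed)
SOURCE = (-1.0, 1.0)     # the mirror lies along the x axis (assumed geometry)
DETECTOR = (1.0, 1.0)

def arrow(x):
    """Unit arrow for the path source -> mirror point (x, 0) -> detector."""
    length = math.dist(SOURCE, (x, 0.0)) + math.dist((x, 0.0), DETECTOR)
    return cmath.exp(2j * math.pi * length / WAVELENGTH)

def strip_sum(lo, hi, steps=2000):
    """Length of the tip-to-tail sum of the arrows for mirror points between lo and hi."""
    dx = (hi - lo) / steps
    return abs(sum(arrow(lo + (k + 0.5) * dx) for k in range(steps)))

left = strip_sum(-1.0, -1 / 3)
middle = strip_sum(-1 / 3, 1 / 3)
right = strip_sum(1 / 3, 1.0)

print(f"left third:   {left:8.1f}")
print(f"middle third: {middle:8.1f}")
print(f"right third:  {right:8.1f}")
```

Each strip contains the same number of arrows, yet the middle strip, where the path length hardly changes from one path to the next, ends up with by far the longest sum.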
This is actually something quite remarkable: without putting it in, without postulating it, the law of reflection pops out, merely from the stochastic considerations on all possible paths of reflection! This law is thus a 'passive' one in the sense of the last post -- it is obeyed by the system, without having to be stipulated a priori; it is thus an emergent law.
But the consequences of this picture run much deeper than this. It can be used immediately to explain Fermat's principle, which says that 'light travels between two given points along the path of shortest time'. This principle, sufficient to explain all phenomena of reflection and refraction, poses, without the quantum-mechanical justification, a puzzle analogous to the previously formulated one: how can light know which path takes the shortest time to traverse? The answer is now clear: it actually traverses, in a manner of speaking, all possible paths -- but only those for which the travel time, and the action, is minimal, give a significant contribution!
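As a quick check that 'shortest time' really does single out the familiar law of reflection, here is a sketch that searches for the quickest route from a source to a detector via a flat mirror. The positions are invented for illustration, the speed of light is taken to be the same everywhere (so shortest time means shortest path), and the ternary search is just one simple way of finding the minimum.

```python
import math

SOURCE = (-1.0, 2.0)     # the mirror lies along the x axis (assumed geometry)
DETECTOR = (3.0, 1.0)

def path_length(x):
    """Length of the route source -> mirror point (x, 0) -> detector."""
    return math.dist(SOURCE, (x, 0.0)) + math.dist((x, 0.0), DETECTOR)

def minimise(f, lo, hi, iterations=200):
    """Ternary search for the minimum of f between lo and hi (f has a single dip here)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x_best = minimise(path_length, SOURCE[0], DETECTOR[0])

# Angles between each beam and the mirror surface at the point of reflection.
angle_in = math.degrees(math.atan2(SOURCE[1], x_best - SOURCE[0]))
angle_out = math.degrees(math.atan2(DETECTOR[1], DETECTOR[0] - x_best))

print(f"reflection point: x = {x_best:.6f}")
print(f"angle in:  {angle_in:.4f} degrees")
print(f"angle out: {angle_out:.4f} degrees")
```

The two angles agree, even though nothing about angles was put into the search.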
But indeed, this is not yet the most general formulation. What holds for particles of light, in fact holds for all quantum mechanical systems -- in all cases, the 'path' (which may be more general than the notion of a single-particle path, i.e. a sequence of configurations of the system -- of some fields, say -- often more generally called a 'history') for which the action is minimal yields the greatest contribution to the amplitude -- so much so that, in the classical approximation (where quantum effects are deemed too small to care about), it suffices to consider only this path for the system. This is known as the least action principle, and it is arguably one of the most powerful tools in the toolbox of modern physics. (So much so that Bee Hossenfelder, over at Backreaction, has discussed it as a possible 'principle of everything', though simultaneously cautioning against the notion.)
The supreme importance of this principle can be gauged by realizing that all theories of modern physics, from Maxwell's electrodynamics to the quantum field theory (the already-mentioned QED or quantum electrodynamics) that encompasses it, from Newtonian mechanics to general relativity to the whole of the standard model of particle physics, can be derived through its application. One only needs some characteristic of the system one is discussing -- in the simplest case, its kinetic and potential energy --, and out pop the equations of motion, i.e. the laws governing the system's behavior. Since the principle of least action has its roots in the stochastic nature of quantum mechanics, thus all the laws of modern physics can be seen to be emergent ones -- eliminating the necessity of the laws being somehow set and fixed in an a priori way. Rather, they emerge from the behavior of the systems themselves.
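A toy version of this can be tried numerically. The sketch below relies on the standard textbook recipe, which is not spelled out in the text: the action is the kinetic energy minus the potential energy, added up along the path. It compares the action of the familiar parabola of a thrown ball with that of some 'wobbly' paths between the same endpoints. Mass, duration, and the shape of the wobble are illustrative assumptions.

```python
import math

M, G = 1.0, 9.81          # mass and gravitational acceleration (assumed values)
T, STEPS = 1.0, 1000      # duration of the flight and number of time steps
DT = T / STEPS

def action(path):
    """Discretised action: (kinetic - potential energy) * dt, summed along the path."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        kinetic = 0.5 * M * ((x1 - x0) / DT) ** 2
        potential = M * G * 0.5 * (x0 + x1)
        total += (kinetic - potential) * DT
    return total

def classical(t):
    """Height of a ball thrown up at t = 0 that comes back down to height 0 at t = T."""
    return 0.5 * G * t * (T - t)

def wobbly(t, size):
    """The classical path plus a bump that vanishes at both endpoints."""
    return classical(t) + size * math.sin(math.pi * t / T)

times = [k * DT for k in range(STEPS + 1)]
s_classical = action([classical(t) for t in times])

for size in (-0.5, -0.1, 0.1, 0.5):
    s_wobbly = action([wobbly(t, size) for t in times])
    print(f"wobble {size:+.1f}: action exceeds the classical one by {s_wobbly - s_classical:.4f}")
```

In this simple example, every wobble makes the action larger, so the parabola that Newton's laws would give is exactly the path that the least action principle picks out.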
As Bee notes, this is an enormously elegant tool for 'sensemaking': the universe follows the laws that it does, because of the principle of least action, because, in a sense, all possible laws a system could follow are implemented -- but only those that do not cancel each other out 'survive'. Thus, the laws emerge from lawlessness.

From Principle to Theory of Everything?
However, there is one question that is not addressed by the least action principle: while, given the particulars of a system, such as the universe, it can be used to derive the behavior of the system and the laws it follows, it is silent on why a system is the way it is, rather than some other way; this is generally considered as an input to be determined empirically. While it is possible that this is a hard and fast boundary, of either the principle or of science itself, the idea of extending its reach has been discussed occasionally.
Seth Lloyd, for instance, here considers the possibility of regarding spacetime as a computation, and answers the question of which computation is supposed to correspond to our particular spacetime by appealing to a superposition of all possible computations -- of all possible paths a computer might take through an abstract 'computational space'. This superposition is dominated by the programs with the shortest description -- by spacetimes with the smallest algorithmic complexity as discussed in this post. Thus, this produces a natural explanation for the fact that our universe seems to be governed by fairly simple laws -- these are the laws that correspond to the shortest programs.
A related perspective is investigated by Jürgen Schmidhuber in his paper on 'Algorithmic Theories of Everything', and also by his brother Christof, who considers the possibility of deriving 'Strings from Logic'.
In such a picture, science would essentially be a process of data compression -- the effort to find the shortest program that gives rise to a certain set of data, in this case that data being the experience of the universe, concentrated into observations and measurements, since this program will be the one that dominates the 'sum over programs'. This is reminiscent of the tale of Leibniz and the inkblots, as told in the very first post of this blog: a couple of ink blots on a piece of paper may be considered lawful if they admit a complete description in such a way that the description is significantly shorter than just noting down the position of each blot -- i.e. if, for instance, there exists a short computer program that draws the distribution faithfully.
Also note the similarity of this construction to Chaitin's 'number of wisdom', the halting probability Ω, which is a sum over all halting programs on some computing machine, in which each program contributes a weight that shrinks exponentially with its length (2^(-n) for a program n bits long).
As a slight digression, this throws light on an aspect of science that is sometimes neglected: you can never be certain you have the complete picture. The reason for this is, quite simply, that compression isn't computable -- there is no universal program able to compress all strings by the maximal amount. So for each compression you find -- each candidate theory -- you will never be able to tell if it is the best possible compression; there always might be a better one lurking around. So the fun in science never ends!
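The idea of lawfulness as compressibility can be illustrated with an off-the-shelf compressor. Note the limits of this sketch: zlib only gives an upper bound on how short a description can be, in keeping with the point that the best possible compression cannot be computed, and the two data sets are invented stand-ins for 'lawful' and 'lawless' observations.

```python
import random
import zlib

def description_length(data: bytes) -> int:
    """Size in bytes of the data after zlib compression: an upper bound, not the true minimum."""
    return len(zlib.compress(data, 9))

# 'Lawful' data: a simple repeating pattern.
regular = bytes(k % 7 for k in range(10_000))

# 'Lawless' data: pseudo-random bytes (seeded, so the run is repeatable).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(10_000))

print("regular:", len(regular), "->", description_length(regular))
print("noisy:  ", len(noisy), "->", description_length(noisy))
```

There is a twist: the 'noisy' data were produced by a short program with a fixed seed, so a much shorter description exists than the one zlib finds. The compressor simply cannot discover it.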
Returning to the original point: there may be hope, if the universe is computable, to relate its complete description -- i.e. both its composition and the laws that it follows -- to a kind of 'least action principle' by considering all formally describable theories -- all possible computations --, which gives rise to a unique universe following a unique set of laws in the same sense that considering all possible paths gives rise to a unique path and angle of reflection for a beam of light. Such a universe would then be self-sufficient in the sense that neither its laws, nor its constituents, need any external justification -- they both emerge naturally out of the most general considerations possible.

The Emergence of Law

2011-12-05T11:18:00.000-08:00

For many scientists, the notion of a lawful, physical universe is a very attractive one -- it implies that in principle, everything is explicable through appeal to notions (more or less) directly accessible to us via scientific investigation. If the universe were not lawful, then it seems that any attempt at explanation would be futile; if it were not (just) physical, then elements necessary to its explanation may lie in a 'supernatural' realm that is not accessible to us by reliable means. Of course, the universe may be physical and lawful, but just too damn complicated for us to explain -- this is a possibility, but it's not something we can really do anything about.
(I have previously given a plausibility argument that if the universe is computable, then it is in principle also understandable, human minds being capable of universal computation at least in the limit; however, the feasibility of this understanding, of undertaking the necessary computations, is an entirely different question. There are arguments one can make that if the universe is computable, one should expect it to be relatively simple, see for instance this paper by Jürgen Schmidhuber, but a detailed discussion would take us too far afield.)
But first, I want to take a moment to address a (in my opinion, misplaced) concern some may have in proposing 'explanations' for the universe, or perhaps in the desirability thereof: isn't such a thing terribly reductionist? Is it desirable to reduce the universe, and moreover, human experience within the universe, to some cold scientific theory? Doesn't such an explanation miss everything that makes life worth living?
I have already said some words about the apparent divide between those who want to find an explanation for the world, and those who prefer, for lack of a better word, some mystery and magic to sterile facts, in this previous post. Suffice it to say that I believe both groups' wishes can be granted: the world may be fully explicable, and yet full of mystery. The reason for that is that even if some fundamental law is known, it does not fix all facts about the world, or more appropriately, not all facts can be deduced from it: for any sufficiently complex system, there exist undecidable questions about its evolution. Thus, there will always be novelty, always be mystery, and always be a need for creativity. That an underlying explanation for a system's behaviour is known does not cheapen the phenomena it gives rise to; in particular, the value of human experiences lies in the experiences themselves, not in the question of whether they are generated by some algorithmic rule, or are the result of an irreducible mystery.

In the previous discussion, I brought up, as an example for a rule-guided system that nevertheless can give rise to complex and unforeseen phenomena, simple 'games' known as cellular automata. A cellular automaton consists of a grid of cells, and a simple rule that determines what colour to paint each cell, depending on the previous state of the grid -- in the simplest cases, only the state of the cell itself, and its immediate neighbours.
The most basic such automata are those in which the grid only consists of a one-dimensional array of cells; their evolution is typically depicted as a two-dimensional grid, where each line represents the next 'time-step' in the evolution of the line before it, i.e. a one-time application of the rule. A typical evolution of such a cellular automaton looks like this:

Fig.1: Evolution of Rule 110

(The picture was generated using Wolfram|Alpha; if you fancy playing around with it a little, you can just type in 'rule' followed by a number between 0 and 255 -- there are 256 elementary cellular automata --, and it will show you the rule the automaton follows, and generate an example evolution.)
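For those who would rather generate such a picture themselves, here is a minimal sketch in Python. It assumes the usual numbering convention for elementary cellular automata (the three cells left, centre and right are read as a binary number, which picks out one bit of the rule number). It also assumes that the row of cells wraps around at the edges, which is a choice of this example and not something the picture above necessarily shares. The names `step` and `evolve` are made up for illustration.

```python
def step(cells, rule):
    """Apply an elementary cellular automaton rule once.

    `cells` is a list of 0s and 1s; the row is assumed to wrap around at the edges.
    """
    n = len(cells)
    new = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The neighbourhood, read as a number 0..7, selects one bit of the rule number.
        index = 4 * left + 2 * centre + right
        new.append((rule >> index) & 1)
    return new


def evolve(cells, rule, steps):
    """Return the initial row followed by `steps` successive applications of the rule."""
    rows = [cells]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    width = 31
    start = [0] * width
    start[-1] = 1  # a single black cell at the right edge
    for row in evolve(start, 110, 15):
        print("".join("#" if c else "." for c in row))
```

Each printed line is one time-step, so the output is read from top to bottom, just like the figure.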
These automata form the paradigmatic example of what I will call an active or prescriptive law. Their evolution is described by a fixed rule, and every step of their evolution looks the way it does precisely because of that rule; they have no freedom, they could not have done otherwise (though it is possible to soften that condition by introducing a probabilistic law that, say, paints a certain cell black with 30% probability). The law determines their evolution.
When we think about (physical) laws, we usually think about active laws -- the stone fell down because of gravity, unstable atoms emit beta radiation because of the weak force, a massive object does not change its state of motion if no forces act upon it because of the law of inertia. Indeed, most people would perhaps hold that this is the only kind of law, or at least the only kind truly worthy of that name.
But if this were so, the proposal of a lawful, physical universe would face an apparently insurmountable obstacle. The reason for this is that the laws do not explain themselves: any explanation of a physical universe in these terms would be faced with the question, 'Why these laws?', and unable to answer it -- and hence, would be incomplete as an explanation. Stephen Hawking, in A Brief History of Time, laconically posed the question as: "What is it that breathes fire into the equations and makes a universe for them to describe?"
If there is a fundamental law, who or what put it there?
Certain attempts at ameliorating the problem have been made, which pursue broadly opposing directions: one is to insist that ultimately, there is only one set of laws that could possibly give rise to a universe, singled out by some criterion, often mathematical or logical in nature; the other one is to assume that all possible laws lead to universes, but we just happen to inhabit this one, because it is suited to our needs -- i.e. we couldn't exist anywhere else, hence, we exist here (this is subsumed under the umbrella of the 'anthropic principle', the subtleties of which I have no intention of getting into here).
Both paths, in my opinion, face difficulties that are interestingly similar. The most important is the lack of testability. If, in the first case, there is no way that the universe could have been otherwise, we lose the Popperian criterion of falsifiability that is supposed to differentiate good science from mere speculation (though here, also, a bit of discussion is lurking that has to be shelved for the moment), as there is no possible experiment that could falsify it -- for if there were one, then there would be a different way the universe could have been, and the theory would not be unique. The second case suffers from the same problem, but here, it is due to an embarrassment of riches: for every conceivable experiment, there exists a set of laws consistent with it, so an experiment can only tell us which universe we inhabit, not whether or not the notion of the existence of a 'multiverse' is true or false.
Fortunately, I believe that there is another possibility.

Passive laws
Not all laws are of the previously-defined active kind; there are also those laws that are what I will call passive or descriptive. The clearest examples are those obtained by some sort of 'averaging' procedure over a system's true fundamental dynamics. In certain cases, it may not matter what the system is doing in detail; an approximate, 'coarse-grained' description may be fully appropriate. This is the case, for instance, in statistical physics: if you consider a gas, it is usually immaterial what each and every one of its constituent atoms is doing; rather, we are more interested in the gas' macroscopic properties.
These macroscopic properties are defined by the aggregate dynamics of the gas' microscopic constituents: the temperature is proportional to the average kinetic energy of the atoms; the pressure is the average force per unit area exerted on the wall of a container by the atoms colliding with it; etc.
One can find various relationships between these macroscopic properties, the most important of which is the ideal gas law, which states that the product of pressure and volume is proportional to the temperature, with the proportionality constant being related to the amount of gas we are considering.
These relationships have the same form as the rules of the cellular automata we discussed earlier: given some characteristics of the system, it is possible to use them to derive others; in particular, it is possible to describe the time evolution of the system, given the laws.
However, the interpretation of both kinds of laws must be different: while the laws governing the cellular automaton evolution exactly determine that evolution -- i.e. the evolution is a certain way because the laws say so --, laws such as the ideal gas law are the way they are because the evolution of the system happens the way it does; these laws do not prescribe a certain evolution, they merely describe the evolution that occurs. It is not the case that a future state of some gas is the way it is because the ideal gas law says so; rather, the ideal gas law has to be formulated the way it is in order to be able to describe the state of the gas.
The reason for this is that the cellular automaton's rule is a fundamental law, while the ideal gas law isn't -- it can be derived, as a relationship between statistical expectation values, from the dynamics governing the microscopic constituents of the gas -- i.e. the atoms it is made of. This entails the possibility that the ideal gas law may be violated! It is arrived at by 'throwing away' information about the fundamental laws, by going to a statistical description. But, statistics are only right on average; there is a certain probability that things might be different (however, a gas contains so fantastically many atoms, that violations of the statistics are spectacularly unlikely).
So far, we seem to have only managed to lose something, without any apparent gain: the new kind of laws we have found, the passive laws, only provide an approximate description of the system they apply to; it is only the forces of probability that make them hold. Nothing in the world says they can't be violated, they just will tend not to be, for sufficiently great sample sizes.
However, it is interesting to note that one very important law -- some say, the most important one -- is of just this passive kind: namely, the second law of thermodynamics, the law of entropy increase. Recall the discussion in this post: entropy is a measure of the number of microstates that yield the same observable macrostate -- for instance, the number of ways atoms in a volume of gas can be rearranged, without the gas looking any different. Macrostates having a greater number of associated microstates are more likely than macrostates that only few microstates give rise to -- for instance, the macrostate that corresponds to a gas occupying only half the volume of its container has far fewer ways to arrange the atoms (each atom has only half as many places to be, so for N atoms the count shrinks by a factor of 2^N) than the macrostate that fills the whole volume. Any change in state will typically lead to a more likely state -- simply by force of there being more of those to choose from --, which is a state of higher entropy. Thus, entropy always increases.
However, there is no fundamental law that says a gas can't spontaneously occupy just half the volume of the room -- it just will tend not to.
The interesting thing now is that this is not a law that needs to be built in, but one that will arise spontaneously, no matter the underlying dynamics -- it is thus a law exempt from the question: 'Why this law?'
In being descriptive rather than prescriptive, the second law justifies itself: it does not cause the system to behave the way it does; rather, it emerges out of the behaviour the system shows by itself. Unlike a cellular automaton rule, which has to be built into the cellular automaton by whoever created it, it arises spontaneously out of the system's behaviour -- indeed, even out of a cellular automaton's -- without having been put in beforehand!
That's all well and good, you might say, but underneath it all, there still is the fundamental rule governing the cellular automaton; whether or not we choose to forget about the microscopic laws, they still need to be there for any descriptive laws to emerge, don't they?
Well, actually, and perhaps (hopefully!) quite surprisingly, the answer to that question is: no, they don't! Even in the complete absence of a fundamental law, supervening, descriptive, passive laws can emerge.

Law Without Law
The argument is extremely simple. Consider a perfectly lawless object; any will do. In previous discussions, we have identified lawlessness and randomness: if there is no way to predict the behaviour of a certain system, then it is lawless; and equivalently, then it is random (as any prediction can then only be as good as chance allows). So, as a lawless object, we may take a random binary string, i.e. a sequence of 1s and 0s such that knowing the first (n-1) bits, guessing the nth is only successful with a probability of 50%. We may think of this string as being the record of the evolution of a certain physical system, or of a series of experiments performed on the system, but we can just as well consider it in the abstract -- as you recall, anything can be coded in binary.
Now let's imagine, instead of seeing the string up close, like this: 101001011110101110100..., we were sensitive only to certain 'macroscopic' properties of the string, for instance, the total number of 1s versus 0s. Imagine for instance that a macroscopic experiment corresponds to a great many microscopic ones -- a realistic model, if you consider that measuring, for instance, the temperature of a gas corresponds to measuring the kinetic energies of billions and billions of its constituent atoms. Surprisingly, while we can't predict anything on the microscopic bit-level, because indeed, the string is utterly lawless at that level, on the macroscopic level, we now gain the capacity to make predictions -- if only probabilistic ones. And even more strikingly, that capacity for prediction emerges precisely because of the fundamentally random, i.e. lawless nature of the string!
Because of this randomness, we know that, at each position, a 1 is as likely to show up as a 0; for a sufficiently long string, thus, there will be very nearly as many 1s as there are 0s. We can thus predict that whenever we make our macroscopic experiment, we will receive a string in which there are about as many 0s as there are 1s, and not a string that consists only of 0s, or in which 1s greatly outnumber the 0s. The more macroscopic our experiment, i.e. the longer the string, the more accurate this prediction will be, in the sense that the relative deviation from an even split shrinks. This is a law that emerges from fundamental lawlessness -- a law without need for justification. A universe built on such laws thus may be both physical and lawful, and hence, explicable at least in principle.
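This is easy to try out numerically. The following Python sketch draws strings of fair random bits of increasing length and reports the fraction of 1s. Python's pseudo-random number generator is of course only a stand-in for a truly lawless source, and the function name `fraction_of_ones` is made up for illustration.

```python
import random


def fraction_of_ones(n, rng):
    """Draw n fair random bits and return the fraction of them that are 1."""
    ones = sum(rng.getrandbits(1) for _ in range(n))
    return ones / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only so that runs are repeatable
    for n in (10, 100, 10_000, 1_000_000):
        f = fraction_of_ones(n, rng)
        print(f"n = {n:>9}: fraction of 1s = {f:.4f}, deviation from 1/2 = {abs(f - 0.5):.4f}")
```

For fair bits, the typical deviation of the fraction from 1/2 falls off like one over the square root of the string length, which is why the prediction sharpens as the experiment becomes more 'macroscopic'.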
However, the possibility to predict the uniformity of a bit string may not sound overly impressive, at first. But, consider that bits can be used to represent anything. Fundamentally, a bit is nothing but a distinction: between up and down, red and green, round and square. Whatever differs in one characteristic can be used to store one bit of information. Conversely, one bit of information can be used to differentiate two things in one property. Bit strings can thus be thought of as representing all the properties of one object, as distinct from other objects (this necessity, of referring to other objects, introduces an interesting, relational aspect into the description: in a set of identical elements, none can be told apart from the other (obviously), so there is no point of reference for ascribing to these elements properties of their own; you first need to introduce objects that differ from an element in at least one characteristic in order to meaningfully speak about that characteristic, and in order to be able to represent information using these objects. We will return to this notion at some later point.).
A whole string of bits might, in aggregate, then stand for a macroscopic property; and just as only aggregate properties of the object matter macroscopically, only aggregate properties of the bit string may matter to determine them. So, consider the proposition 'the bit string consists of equally many 0s and 1s' to stand for 'the moon is made of rock', and the proposition 'the bit string consists only of 0s' to stand for 'the moon is made of green cheese'. Whenever you look at the moon, you are fed a new bit string -- the property has no independent existence apart from a measurement context. Nevertheless, with overwhelming probability, you will observe a rocky moon, rather than one made out of green cheese, even though the fundamental laws of the universe don't require it -- even if nobody ever decreed it to be this way, rather than any other.
Or, as a last example, consider a set of bit strings, each of which is determined separately. From any state whatsoever -- say, all bit strings start out all 0, or all 1 -- this system will, bit string by bit string, evolve towards a state in which almost all bit strings are composed of equally many 0s and 1s, showing only small fluctuations away from this state. This already comes very close to the thermodynamic phenomenon of equilibration, i.e. of evolution towards a certain state of equilibrium, say one of uniform temperature and pressure, in the case of a gas.
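A toy version of this equilibration can be sketched in a few lines of Python. The model is an assumption of this example: every string starts out as all 0s, and the strings are then re-drawn at random one after the other, while we keep track of the overall fraction of 1s. The name `equilibrate` and its parameters are made up for illustration.

```python
import random


def equilibrate(num_strings, length, rng):
    """Start from all-zero strings and re-draw them at random, one string at a time.

    Returns the overall fraction of 1s in the whole set after each re-draw.
    """
    strings = [[0] * length for _ in range(num_strings)]
    history = []
    for i in range(num_strings):
        strings[i] = [rng.getrandbits(1) for _ in range(length)]
        total_ones = sum(sum(s) for s in strings)
        history.append(total_ones / (num_strings * length))
    return history


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only so that runs are repeatable
    history = equilibrate(num_strings=50, length=200, rng=rng)
    for i in range(0, len(history), 10):
        print(f"after {i + 1:>2} re-draws: fraction of 1s = {history[i]:.3f}")
    print(f"after {len(history)} re-draws: fraction of 1s = {history[-1]:.3f}")
```

The overall fraction drifts from its special starting value towards one half and then only fluctuates slightly around it, which is the toy analogue of a gas settling into uniform temperature and pressure.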
This, then, is the main message of this post: law can come from non-law, and not all kinds of law beg the question of their own origin. In the next post(s), I will discuss how quantum mechanics implies that in fact, all of the laws that govern our universe can be considered to be of this passive type -- but first, we'll have to think a little about what quantum mechanics actually is, and come to terms with some of its effects and implications.

The Origin of the Quantum, Part III: Deviant Logic and Exotic Probability

2011-11-19T05:50:00.000-08:00

Classical logic is a system concerned with certain objects that can attain either of two values (usually interpreted as propositions that may be either true or false, commonly denoted 1 or 0 for short), and ways to connect them. Though its origins can be traced back in time to antiquity, and to the Stoic philosopher Chrysippus in particular, its modern form was essentially introduced by the English mathematician and philosopher George Boole (and is thus also known under the name Boolean algebra) in his 1854 book An Investigation of the Laws of Thought, and intended by him to represent a formalization of how humans carry out mental operations. In order to do so, Boole introduced certain connectives and operations, intended to capture the ways a human mind connects and operates on propositions in the process of reasoning.
An elementary operation is that of negation. As the name implies, it turns a proposition into its negative, i.e. from 'it is raining today' to 'it is not raining today'. If we write 'it is raining today' for short as p, 'it is not raining today' gets represented as ¬p, '¬' thus being the symbol of negation.
Two propositions, p and q, can be connected to form a third, composite proposition r in various ways. The most elementary and intuitive connectives are the logical and, denoted by ˄, and the logical or, denoted ˅.
These are intended to capture the intuitive notions of 'and' and 'or': a composite proposition r, formed by the 'and' (the conjunction) of two propositions p and q, i.e. r = p ˄ q, is true if both of its constituent propositions are true -- i.e. if p is true and q is true. Similarly, a composite proposition s, formed by the 'or' (the disjunction) of two propositions p and q, i.e. s = p ˅ q, is true if at least one of its constituent propositions is true, i.e. if p is true or q is true. So 'it is raining and I am getting wet' is true if it is both true that it is raining and that you are getting wet, while 'I am wearing a brown shirt or I am wearing black pants' is true if I am wearing either a brown shirt or black pants -- but also, if I am wearing both! This is a subtle distinction to the way we usually use the word 'or': typically, we understand 'or' to be used in the so-called exclusive sense, where we distinguish between two alternatives, either of which may be true, but not both; however, the logical 'or' is used in the inclusive sense, where a composite proposition is true also if both of its constituent propositions are true.

A useful method to decide the truth value of a composite proposition is the so-called method of truth tables. A truth table is a table containing the truth values of elementary propositions and their compositions, like the following:

Fig. 1: Example of a truth table

In this table, two additional connectives to the already familiar 'and' and 'or' have been defined, the conditional → and the biconditional ↔. Their interpretation is that p → q (read 'p implies q' or 'if p then q') is true whenever q follows from p, and p ↔ q (read 'p if and only if q') is true when both p → q and q → p, i.e. if (p → q) ˄ (q → p) is true. These need not worry us too much, however, as p → q is equivalent to ¬p ˅ q, as can be easily checked with the truth table method, thus p ↔ q is equivalent to (¬p ˅ q) ˄ (¬q ˅ p) (where the brackets just mean that the expressions within them have to be evaluated first). Thus, they don't bring anything essentially new to the table; they can be regarded as merely convenient shorthand for our purposes.
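Such a table can also be generated, and the two equivalences checked, mechanically. Here is a minimal Python sketch; the helper names `implies`, `iff` and `truth_table` are made up for illustration, and the conditional is deliberately defined in a different way ('false only when p is true and q is false') so that the comparison with ¬p ˅ q is a genuine check.

```python
from itertools import product


def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return not (p and not q)


def iff(p, q):
    """Biconditional: true when p and q have the same truth value."""
    return p == q


def truth_table():
    """Rows of (p, q, p and q, p or q, p -> q, p <-> q) for all four assignments."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, p and q, p or q, implies(p, q), iff(p, q)))
    return rows


if __name__ == "__main__":
    print("p q | and or -> <->")
    for row in truth_table():
        p, q, *rest = (int(v) for v in row)
        print(p, q, "|", *rest)
    # Check the two equivalences from the text for every assignment.
    for p, q in product([True, False], repeat=2):
        assert implies(p, q) == ((not p) or q)
        assert iff(p, q) == (((not p) or q) and ((not q) or p))
```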
Within this system of classical logic, one can carry out certain deductions -- essentially, one can use the method of truth tables to decide the truth values of arbitrarily complicated composite propositions, given knowledge of the truth values of their constituting propositions. Though simple, one should not underestimate this system -- essentially, it is all your computer ever does!

The Algebra of Sets
One interesting realization of the structure of Boolean logic is based on set theory, and has the advantage of being relatively easy to grasp intuitively. Any proposition can be interpreted as positing the membership of a certain object to some collection of things for which that proposition holds true -- i.e. a proposition asserts something to be an element of a particular set, the set of all things having the property the proposition ascribes to that something. So let the set be the set of all green things; the proposition 'grass is green' thus posits that a thing, grass, belongs to a set, the set of green things. The proposition is true, since grass is in fact a member of the set of all green things. Similarly, 'a ball is round' posits that balls belong to the set of round things, which is again obviously true. Conversely, 'fire is cold' is false, since fire is not in the set of cold things (which, for definiteness, might be considered to be the set of all things having a temperature below the melting point of ice).
And again, we can compose propositions out of more elementary ones: 'this ball is round and green' posits that a certain object, this ball, belongs both to the set of round and to the set of green things -- or alternatively, to the set of things that are both round and green. This illustrates that we can transfer the operations of Boolean logic to operations on sets -- the set of things that are both round and green is the set of all things that are both in the set of things that are round, and the set of things that are green. If the ball belongs to this set, then the proposition is true. The logical 'or' has a similarly simple interpretation: 'the toy is round' or 'the toy is green' is true if the toy either belongs to the set of round things, or to the set of green things -- or both. In particular, it is true if the toy is a round green ball.
There is an easy way to visualize this, known as Venn diagrams. The logical 'and', variously known as the intersection or meet of two sets, is represented as follows:

Fig. 2: Set intersection

The left circle can be interpreted as the set of all round things, while the right circle can be considered to represent the set of all green things; thus, whatever belongs to both sets -- whatever is both round and green -- lies in the red area. This is also often denoted as A ∩ B, where A and B are the two sets; in order not to confuse matters, though, we will stick with our original notation, A ˄ B.
The logical 'or', following the same conventions, can then be pictured as follows:

Fig. 3: Set union

It is alternatively known as the union or join of two sets, and an analogous alternative notation exists, which, however, we again won't bother with. Again, the red area marks the set of all things that a proposition built from the disjunction of two elementary propositions holds true for; i.e. it is the set of all things that are either green, or round, or both.
Negation, it should be noted, in this formalism is represented by the complement -- it is arrived at by 'inverting the colors' of any diagram, as the set of all things that are not in a set is the set of all things that are outside the set.
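Python's built-in sets give a quick way to play with this correspondence. In the sketch below, the 'universe' and its members are made up purely for illustration; the operators `&`, `|` and `-` are Python's set intersection, union and difference.

```python
# A small, made-up universe of things; the particular objects are only for illustration.
universe = {"grass", "frog", "ball", "orange", "lime", "brick"}
green = {"grass", "frog", "lime"}
round_things = {"ball", "orange", "lime"}


def holds(thing, a_set):
    """The proposition 'thing has the property' is true exactly if thing is in the set."""
    return thing in a_set


both = green & round_things      # logical 'and': intersection
either = green | round_things    # logical 'or' (inclusive): union
not_green = universe - green     # negation: complement within the universe

if __name__ == "__main__":
    print("round and green:", sorted(both))
    print("round or green: ", sorted(either))
    print("not green:      ", sorted(not_green))
    print("'the lime is round and green' is", holds("lime", both))
```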

Probability
Armed with these notions, we can entertain some qualitatively new questions. Two features play a role here: one is that in everyday life, we rarely only consider 'simple' sets corresponding to elementary propositions; the other is that typically, we will have less than perfect information in any given situation.
In order to deal with the first, we will introduce the notion of subsets -- i.e. 'simpler' sets that form part of some larger, more complex set. For instance, the set of green or round things has as its subsets both the set of all green things, and the set of all round ones. In order to deal with the second, we will introduce the concept of probability -- basically, a notion quantifying how much you should expect a given proposition to be true, or a given thing to be an element of a certain (sub)set, given insufficient information to deduce the actual truth value.
Let us, for concreteness, look at the set of all cats. This is a decidedly non-simple set: not all cats are the same, so saying of something 'it is a cat' does by no means entail having complete information about that something. Cats have different sizes, shapes, colors, genders, etc. All of these form subsets of the set of all cats that each individual cat may either belong to or not. So, a question one might ask is: "Given that this is a cat, how much should I expect it to be black? Or female? Or bigger than 30cm?"
This question is a question about the subsets of a set, asking how much one should expect an element of a set to be in some particular subset of that set. The answer can be determined easily -- by counting! To wit, just count the number of elements in the whole set, then count the number of elements in the subset you're interested in. So let's say there are 1000 cats in total, 200 of which are black. This implies that every fifth cat is a black cat. Thus, we can define our sought probability of a random cat being black as the proportion of all cats that also are black cats.
Of course, in practice, we won't have access to either the set of all cats or the subset of black cats; however, we can nevertheless estimate the probability by taking random samples, i.e. picking out cats at random, and noting how many of them are black. The more cats we pick, the more accurate our estimate of the probability for cat-blackness will become. It is like reaching into a bag containing all cats (though beware of cats in bags), pulling out one after the other, and noting their colors: about once in five times, you will grab a black cat.
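Both the counting and the sampling can be sketched in Python. The population below simply reuses the numbers from the example (1000 cats, 200 of them black); lumping all other cats together as 'other', drawing with replacement, and the function names are assumptions of this sketch.

```python
import random


def make_cats(total=1000, black=200):
    """A made-up population: `black` black cats, the rest lumped together as 'other'."""
    return ["black"] * black + ["other"] * (total - black)


def exact_probability(cats, colour):
    """Probability by counting: size of the subset divided by size of the whole set."""
    return cats.count(colour) / len(cats)


def estimate_probability(cats, colour, draws, rng):
    """Estimate by drawing cats at random (with replacement) and counting the hits."""
    hits = sum(rng.choice(cats) == colour for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    cats = make_cats()
    rng = random.Random(0)  # fixed seed, only so that runs are repeatable
    print("by counting:", exact_probability(cats, "black"))
    for draws in (10, 100, 10_000):
        print(f"from {draws:>6} draws:", estimate_probability(cats, "black", draws, rng))
```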
Thus, we can clarify the notion of probability as being the ratio of the number of elements of a subset to the number of elements of a set. In mathematics, the number of elements of a set is called its measure, though the term is valid in much more general circumstances than we consider here. In these terms, the probability of something being an element of a certain subset is the measure of that subset, divided by the measure of the total set. This is a naive notion of probability, and I don't pretend to have given a full introduction here; however, for our purposes, it will prove sufficient.
We can now go on and derive the fundamental notions of probability theory. First of all, the probability of something being an element of the whole set -- the probability of a cat being a cat -- is obviously 1. We can use this to normalize our probabilities, and assign each subset a measure smaller than one, according to its 'relative size' with regard to the total set. This relative size we write as P(A), the probability of (something being in) the subset A. In the Venn diagrams above, this measure corresponds to the portion of the image that is red; one may interpret it as the probability of hitting the red area with an arrow fired randomly at the image (the reader is asked not to verify this for themselves, as monitors can be expensive). Thus, we can extend our notion of probability to continuous quantities, as well.
Another immediate consequence is that the probabilities of mutually exclusive events that together exhaust all possibilities (an 'event' is just another word for a subset, here) sum to 1 -- here, the mutually exclusive events correspond to disjoint subsets, i.e. sets that have no overlap, such as black and white cats (if you talk only about solid colors, that is). Clearly, if you unify all the possible non-overlapping subsets, you get the whole set back. This doesn't mean anything other than 'something has to happen', i.e. one event out of the probability space (the whole set) must occur, i.e. you draw a cat out of the bag.
Furthermore, if P(A) is the probability of event A occurring, it is clear that hence, the probability of A not occurring must be 1 - P(A) -- if P(A) is the amount of red in the Venn diagram, and 1 is the total area, then 1 - P(A) is the not-red area.
Next, we can combine probabilities in just the same way as we can combine propositions, with similar interpretation. If we have a proposition of the form 'the cat is black, and the cat is female', which you'll recall is true if the cat is both in the subset of black cats and in that of female cats, then the probability of that proposition being true is given by the measure of the intersection of the two subsets -- i.e. if A is the set of black cats, and B the set of female cats, the set of cats that are both black and female is given by A ˄ B, and its probability is denoted P(A ˄ B). If the two properties are independent of one another, i.e. if black cats are just as common among female cats as among cats in general, this probability is given by the product of both individual probabilities, P(A ˄ B) = P(A)P(B). This can easily be seen if you consider that P(B) is the fraction of all cats that are female; of those, a fraction P(A) are black, so the fraction of all cats that are black and female is the fraction of cats that are black of the fraction of cats that are female, i.e. P(A)P(B).
Similarly, one can join propositions by the logical 'or', obtaining a value for the quantity P(A ˅ B). Since A ˅ B is true whenever A is, and whenever B is, this is equivalent to the total area both sets occupy within the whole set; thus, P(A ˅ B) = P(A) + P(B). But we must be more careful here -- as can be seen in fig. 2, both sets may overlap, and in the formula we just gave, this overlap is counted twice -- once as part of P(A), and once as part of P(B). The formula is thus only valid if both sets don't overlap, i.e. if there is no thing such that both A and B are true of it -- if, for instance, there were no cat both black and female. In the general case, we must subtract the intersection once, so the general formula is P(A ˅ B) = P(A) + P(B) - P(A ˄ B). For independent properties, where the intersection is equal to P(A)P(B), this becomes P(A ˅ B) = P(A) + P(B) - P(A)P(B).
Another useful notion is that of conditional probability -- roughly, the probability that A happens, given that we know B has happened. If B and A don't intersect, we know that A can't happen, if B does -- the two are exclusive. Thus, the conditional probability of A given B -- written as P(A|B) -- must be proportional to the intersection of A and B. Since B has happened, we can ignore all events outside B, and thus, set P(B) equal to one, which amounts to dividing by P(B). Thus, we arrive at P(A|B) = P(A ˄ B)/P(B); this can be understood as the fraction of the area of B that lies within A.
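These rules can be checked by plain counting. The Python sketch below uses two made-up populations of ten cats each: one in which colour and sex are independent by construction, and one in which they are not. The numbers, the names and the helper `rules` are all assumptions of this example. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Made-up populations of (colour, sex) pairs.
independent = (
    [("black", "female")] * 1 + [("black", "male")] * 1
    + [("other", "female")] * 4 + [("other", "male")] * 4
)
dependent = (
    [("black", "female")] * 3 + [("black", "male")] * 1
    + [("other", "female")] * 2 + [("other", "male")] * 4
)


def prob(event, population):
    """Probability by counting: size of the subset divided by size of the whole set."""
    return Fraction(sum(1 for cat in population if event(cat)), len(population))


def rules(population):
    """Compute P(A), P(B), P(A and B), P(A or B) and P(A|B) for A = black, B = female."""
    a = prob(lambda c: c[0] == "black", population)
    b = prob(lambda c: c[1] == "female", population)
    a_and_b = prob(lambda c: c[0] == "black" and c[1] == "female", population)
    a_or_b = prob(lambda c: c[0] == "black" or c[1] == "female", population)
    return {"a": a, "b": b, "and": a_and_b, "or": a_or_b, "a_given_b": a_and_b / b}


if __name__ == "__main__":
    for name, population in (("independent", independent), ("dependent", dependent)):
        r = rules(population)
        print(name)
        print("  P(A and B) =", r["and"], " P(A)P(B) =", r["a"] * r["b"])
        print("  P(A or B)  =", r["or"], " P(A)+P(B)-P(A and B) =", r["a"] + r["b"] - r["and"])
        print("  P(A|B)     =", r["a_given_b"])
```

In both populations the 'or' rule with the intersection subtracted holds exactly; the product rule for the intersection holds only in the independent one.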
This completes our short survey of probability theory.

Quantumness
The notions we have used so far seemed quite general -- but implicitly, they relied on assumptions rooted in a classical understanding of the world. One concept in particular is not well captured by the mechanism developed so far, and that is the concept of complementarity.
If you recall, in the previous two posts in this series, complementarity was forced upon us by the notion of information-theoretic incompleteness. Information-theoretic incompleteness means, roughly, that there exist questions that a formal system can't answer, because of their complexity. We exhibited one particular set of such questions, the values of the bits of a halting probability's binary expansion beyond a certain point. This means there is a maximum amount of information that can be obtained about any given system (and if that amount is exhausted, all following measurement results must be maximally uninformative, and thus, random); thus, it follows that certain observables are engaged in a kind of back-and-forth: obtaining more information about one entails less precise information about the other. This is at the root of Heisenberg's famous uncertainty principle. (For a more formal discussion of the connection between incompleteness and complementarity see this paper by Christian Calude and Michael Stay.)
So, does the notion of complementarity bring anything new to the table? It does indeed!
First, we need to look at a straightforward consequence of the apparatus of logic we have discussed above. Using truth tables, the following identity can easily be verified:

p ˄ (q ˅ r) = (p ˄ q) ˅ (p ˄ r)

This is known as the distributive law. It is relatively intuitive, spelled out with concrete propositions: 'it is raining, and I am at home or I am outside' is equivalent to 'it is raining and I am at home, or it is raining and I am outside'. However, the notions of complementarity and distributivity do not play well with one another.
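For classical propositions, the distributive law in its standard form, p ˄ (q ˅ r) = (p ˄ q) ˅ (p ˄ r), can be verified by brute force over all eight truth assignments. A minimal Python sketch (the function name is made up for illustration):

```python
from itertools import product


def distributive_law_holds():
    """Check p and (q or r) == (p and q) or (p and r) for all eight truth assignments."""
    return all(
        (p and (q or r)) == ((p and q) or (p and r))
        for p, q, r in product([False, True], repeat=3)
    )


if __name__ == "__main__":
    print("distributive law holds classically:", distributive_law_holds())
```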

Let's consider this picture:

Fig. 5: Complementarity in phase space

It is a representation of the phase space of a one dimensional quantum system -- i.e. a quantum particle moving only in one direction. The particle's position is denoted on the horizontal, its momentum on the vertical axis. Momentum and position are complementary observables, and hence, can only simultaneously be known to a certain, maximum precision; this is encapsulated in the fact that there is a minimum area, sharper than which the particle is not localizable in phase space. This area is given by Planck's constant h, leading to the uncertainty principle ΔxΔp > h (this should be thought of as a heuristic, rather than exact, relation).

Now consider the following three propositions:

p: the particle's momentum is within Δp
q: the particle's position is within Δx₁, meaning the left half of the interval Δx
r: the particle's position is within Δx₂, meaning the right half of the interval Δx

In these, the phrase 'is within' may be interpreted as 'the value found by experiment will lie in the range of'. Now consider the composite proposition p ˄ (q ˅ r): it is clearly true, since it is essentially just a restatement of the uncertainty principle.
However, distributivity would tell us that this is equivalent to the proposition (p ˄ q) ˅ (p ˄ r) -- but this is clearly false, as both (p ˄ q) and (p ˄ r) are false! It is not the case that experiment will find the particle's momentum within Δp, and its position within Δx₁ -- this would violate the uncertainty principle. Similarly, it is not the case that experiment will find the particle's momentum within Δp, and its position within Δx₂, as again uncertainty would be violated. One may imagine the particle, or its position, prior to measurement, to be 'too big' to fit in either Δx₁ or Δx₂, yet comfortably within Δx as a whole.
This is in stark contrast to the classical case. The difference is that in a classical context, every particle has a definite position (and momentum) at all times, we just might be ignorant about it -- but in quantum mechanics, it is not right to talk about a particle's momentum and position apart from a measurement context at all.
The machinery we have developed, while adequate in the classical case, thus fails to capture the quantum reality. In order to account for this discrepancy, the notion of quantum logic has been developed -- which is equivalent to classical logic, except that the distributive law does not hold. What, exactly, this quantum logic is is an entirely different discussion, and one I don't want to go into here -- some have suggested that it is the empirically adequate logic to describe reality, and that thus, it ought to replace classical logic (Hilary Putnam has argued for this point of view in his paper 'Is Logic Empirical?'); while others merely see the whole endeavor as an exercise in the manipulation of symbols.
However, a simple argument against the existence of any 'true' logic to use in reasoning about the world is that one can build computers whose architecture corresponds to different logical frameworks, which nevertheless end up being able to compute the same things. (For example, the Russian Setun was a computer built on ternary, rather than binary, logic, i.e. used three instead of two 'truth values'.)
For us, it is enough to realize that quantum logic is able to deal with certain awkwardnesses better than classical logic; that it is in principle possible to reason about quantum systems using classical logic is demonstrated by the existence of hidden variable models, i.e. theories explaining quantum behavior by appealing to certain fundamental, but inaccessible parameters of the theory, our ignorance of which leads to the apparent weirdness of quantum theory.
But this puts us in a bit of a pickle with respect to our interpretation of logic as being about set membership: the algebra of sets is clearly distributive! So, how can propositions be modeled in a quantum context? Can an analogous notion of probability be found?
We will dodge the bullet by simply defining appropriate objects to model quantum propositions -- call them q-sets. They have all the properties of classical sets, except for distributivity. Thus, their algebra is equivalent to quantum logic the same way the algebra of classical sets is equivalent to classical logic. Mathematically, this is an easy step -- the algebra of sets forms an abstract structure known as a Boolean lattice; repealing distributivity merely means moving to an orthocomplemented lattice. Everything else works much as it did before, so, q-sets have elements, and if a certain element is in a q-set, the proposition 'element x is in q-set A' ('the particle's momentum is within Δp') holds true for it.
Now we can, again, erect a theory of probability -- of q-probability -- upon our theory of q-sets. Again, we will find probability measures, rules for the composition of probabilities, and so on. I will skip to the punchline here, as the detailed way to get there is a bit mathematical: in the end, the theory of q-probability one arrives at, is nothing else but quantum mechanics itself!
This is a remarkable result. From nothing but the complementarity of observables, arrived at via information-theoretic incompleteness, the whole formal apparatus of quantum mechanics emerges. Just mentioning some elements of this derivation, the q-propositions will turn out to be so-called 'projection operators' on Hilbert space (the quantum mechanical state space analogous to classical phase space); the q-sets will be given by (closed) subspaces of Hilbert space, each of which is associated with a certain projection operator; and the probability measure will turn out to be determined by the density operator, a certain representation of the quantum mechanical state of a system. Two particularly important results necessary to arrive at this derivation are Solèr's theorem, which essentially limits the choice of Hilbert space to those over the real numbers, complex numbers, or quaternions (which we'll meet again eventually), and Gleason's theorem, which roughly says that the appropriate probability measures are given by density operators. For more details, see the article at the Stanford Encyclopedia of Philosophy here, or the paper by Itamar Pitowsky here.
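To get a feeling for how subspaces can fail to be distributive, here is a toy sketch of my own (it is not taken from the references above). It uses three rays in an ordinary two-dimensional real plane rather than a genuine Hilbert space, reads 'or' as the span of two subspaces and 'and' as their intersection, and only tracks dimensions. The helper names `rank`, `dim_join` and `dim_meet` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of the given vectors (Gaussian elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def dim_join(a, b):
    # 'A or B': the span of both spanning sets together.
    return rank(a + b)

def dim_meet(a, b):
    # 'A and B': dim(A ∩ B) = dim A + dim B - dim(A + B).
    return rank(a) + rank(b) - dim_join(a, b)

# Three one-dimensional subspaces (rays) of the plane.
p = [[1, 1]]   # diagonal ray
q = [[1, 0]]   # horizontal ray
r = [[0, 1]]   # vertical ray

# p AND (q OR r): q OR r is the whole plane, so the meet is p itself.
lhs_dim = dim_meet(p, q + r)

# (p AND q) OR (p AND r): the dimension of a join is at most the sum of
# the dimensions of its parts, so this sum bounds the right-hand side.
rhs_dim_upper_bound = dim_meet(p, q) + dim_meet(p, r)

print(lhs_dim, rhs_dim_upper_bound)
```

The left-hand side is one-dimensional, while the right-hand side cannot be more than zero-dimensional: the two sides of the distributive law describe different subspaces.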
There is, however, one distinction between classical and quantum probability that must be made: classical probabilities are wholly due to ignorance, while quantum probabilities are irreducible. Every classical system has a definite state at all times, and experiment can reveal this state with arbitrary precision; that we can make only probabilistic statements is only due to our not knowing that definite state. In quantum mechanics, however, there is no 'deeper level' at which all probabilities are washed away through greater knowledge -- beyond a certain level, as required by complementarity, no more accurate statements can be made. Quantum randomness is fundamental.

Maxwell's Demon, Physical Information, and Hypercomputation

2011-11-12T10:30:00.000-08:00

The second law of thermodynamics is one of the cornerstones of physics. Indeed, even among the most well-tested fundamental scientific principles, it enjoys a somewhat special status, prompting Arthur Eddington to write in his 1928 book The Nature of the Physical World rather famously:

The Law that entropy always increases—the second law of thermodynamics—holds, I think, the supreme position among the laws of nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell's equations—then so much the worse for Maxwell's equations. If it is found to be contradicted by observation—well these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.

But what, exactly, is the second law? And what about it justifies Eddington's belief that it holds 'the supreme position among the laws of nature'?
In order to answer these questions, we need to re-examine the concept of entropy. Unfortunately, one often encounters, at least in the popular literature, quite muddled accounts of this elementary (and actually, quite simple) notion. Sometimes, one sees entropy equated with disorder; other times, a more technical route is taken, and entropy is described as a measure of some thermodynamic system's ability to do useful work. It is wholly unclear, at least at first, how one is supposed to relate to the other.
I have tackled this issue in some detail in a previous post; nevertheless, it is an important enough concept to briefly go over again.

To me, it's most useful to think of entropy as a measure of how many microstates there are to a given macrostate of some thermodynamic system. Picture a room full of gas, like the one you're probably in right now: what you observe is primarily that the gas has a certain volume, temperature, and pressure. These characterize the macrostate. However, at a level unobservable to you, the gas consists of a huge number of molecules (roughly 10²⁵ in a cubic meter). The positions of all of these molecules, their speeds, and the direction of their motions make up the microstate of the gas.
It's plain to see that I could change many details of this microstate without causing any observable change in the macrostate -- I could exchange an O₂ molecule in the upper right corner with one in the lower left, or more generally a molecule here with a molecule there in a myriad ways, and nobody would notice. So there are a great number of microstates to the macrostate of the gas in your room that you observe; hence, the gas' entropy is very high (maximal, in fact).
However, if I were to move all the molecules of air in your room to one half of it, leaving the other utterly empty, you most certainly would notice a difference -- especially if you happen to be in the now empty half of the room!
But this situation (luckily) would not persist -- if I 'stopped the clock', carried every molecule laboriously towards the back half of the room, then restarted the clock again, the gas would nearly immediately expand in order to fill out the whole room again. The reason is that the gas, bunched up in one half of the room, has a lower entropy than if it fills out the whole room. It's fairly intuitive: there are now fewer changes I could make to the configuration of the molecules that would go unnoticed -- there are fewer places for the molecules to be, for starters. The number of configurations of gas molecules that correspond to the gas being bunched up in one half of the room is much lower than the number of configurations that correspond to the gas filling the entire room -- there are fewer microstates to the former macrostate than there are to the latter.
In fact, there are rather enormously fewer states available to the gas that are bunched-up than that are room-filling. Thus, if I were to choose a random state for the gas from a hat, I would with a much higher likelihood draw one that fills the whole room, than one that only fills half of it. This entails, however, that any change to the state of a gas will likely lead towards states of higher entropy -- since there simply are more of those. Thus, the gas expands.
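Just how lopsided the count is can be seen in a rough sketch. As a simplifying assumption of mine, a 'microstate' here records only whether each molecule sits in the left or the right half of the room, and the toy gas has a mere 100 molecules.

```python
from math import comb

N = 100  # toy number of molecules; a real room holds vastly more

# A microstate is a left/right assignment for each molecule.
total = 2 ** N                   # all possible assignments
all_in_one_half = 1              # every molecule in the left half
evenly_spread = comb(N, N // 2)  # exactly half of them on each side

print(evenly_spread // all_in_one_half)  # how many more spread-out states
print(all_in_one_half / total)           # chance of drawing the bunched-up state
```

Even for this tiny gas, the evenly spread macrostate has on the order of 10²⁹ microstates against a single one for the bunched-up case, and the disparity grows exponentially with the number of molecules.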
This connects immediately to the notion of entropy as measuring the ability of a system to do work -- if I were to insert a piston into the evacuated half of a room, the gas' expansion would drive the piston, which, for example, might help pull a load, or do any other kind of work. If, however, the gas fills the whole room, and I were to insert a piston, it would be pushed on from both sides equally, thus not doing any work.
It's important to notice that ultimately, the second law of thermodynamics is thus a simple law of probability -- more frequent (in the sense of configurations of the system), i.e. more probable, states occur more often; that's all there is to it. It seems impossible to conceive of any way to violate this law -- Eddington's confidence thus was well placed.

Maxwell's Demon
Despite the second law's firm foundation, however, for more than 100 years a simple thought experiment stood against it, seemingly irrefutable. This experiment was conceived by James Clerk Maxwell, most well known as the originator of the mathematical theory unifying electricity and magnetism, and it came to be known as Maxwell's Demon. Maxwell imagined an intelligent being that, unlike you, is capable of directly observing the microstate of the molecules in your room, in particular, their positions and velocities (we are, for the moment, imagining the molecules as classical entities; thus, the demon is not limited in his observations).
Now picture a wall separating both halves of the room, and in the middle of the wall, a very tiny trapdoor that the demon can open and close at will; since it turns on well-oiled (i.e. frictionless) hinges, no work is required to open or close it. Whenever the demon sees a fast molecule arrive from the left side, he opens the door and lets it through; when he sees a slow molecule approaching from the right side, he does the same. Gradually, the right side will heat up, and the left side will cool down. This heat differential, however, can be used to do work -- the demon has managed to find a way to get a system whose entropy was originally maximal to perform useful work, flying in the face of the second law!
But is that really what happened? Almost everybody, upon first being told of this thought experiment, suspects something fishy is going on; nevertheless, exorcising the demon has proven surprisingly hard.
In order to get a better handle on the problem, let us look at a variation devised by the Hungarian physicist Leó Szilárd known as Szilárd's Engine. He considered a greatly simplified version of Maxwell's original thought experiment: in it, a single molecule moves along only one dimension through the room. The demon measures which half of the room the molecule is in. If it is, say, in the left half, he slides (frictionlessly, so as not to require work expenditure) in a wall dividing the room; then, he slides (again frictionlessly) a piston into the right half, and opens the wall again. The molecule will now bump against the piston, each time moving it a little; this motion can again be used to do work. The 'room', i.e. the engine, is in contact with a heat bath, some environment at a certain temperature; thus, the molecule picks the energy it transfers to the piston back up from the heat bath, leading to a cooling of the environment, and hence, a reduction of entropy. Szilárd was able to calculate the precise amount of work as equal to kTln(2), where k is Boltzmann's constant, T is the (absolute) temperature, and ln(2) is the natural logarithm of 2.
(It's a simple calculation -- the work a gas does through expansion is equal to the pressure times the change in volume; since the pressure changes with the volume, one has to perform an integration, i.e. sum up very small changes in volume multiplied with their corresponding pressure. Thus the work W = ∫pdV, where the integral runs from V/2 to V, V being the room's volume. By the ideal gas law, pV = kT for a 'gas' consisting of a single molecule, thus p = kT/V, and W = ∫kT/V dV = kT(ln(V) - ln(V/2)) = kTln(2V/V) = kTln(2), where I've used a nice property of the logarithm, ln(a) - ln(b) = ln(a/b).)
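If you distrust the logarithm trick, the integration can also be done by brute force. The following sketch sums up many thin slices of volume, exactly as described above, working in units where kT = 1 and V = 1 (an arbitrary choice of mine); the function name is hypothetical.

```python
import math

def szilard_work(n_steps=100_000, kT=1.0, V=1.0):
    """Midpoint-rule estimate of W = ∫ p dV from V/2 to V, with p = kT/V."""
    dV = (V - V / 2) / n_steps
    work = 0.0
    for i in range(n_steps):
        v_mid = V / 2 + (i + 0.5) * dV   # volume in the middle of this slice
        work += (kT / v_mid) * dV        # pressure times small volume change
    return work

W = szilard_work()
print(W, math.log(2))
```

The two printed numbers should agree to many decimal places, and the result does not depend on the value chosen for V, since only the ratio of final to initial volume enters.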

Physical Information
The interesting thing about this result is that it directly connects information with the physical notion of work, and thus, of energy -- the demon obtains one bit of information about the system, and, using nothing else, is able to extract energy from it. Indeed, it is not hard to see where the value for the extracted energy comes from: kln(2) is essentially just a conversion factor between entropy measured in bits, i.e. information-theoretic Shannon entropy, and thermodynamic entropy; multiplying this by the temperature T gives the amount of work one would need to perform on the system in order to reduce the entropy by kln(2) (or equivalently, one bit), or conversely, the amount of work the system can perform if its entropy is reduced by that amount.
Still, the disturbing conclusion persists: the demon can extract work from a system in thermodynamic equilibrium, i.e. at maximum entropy. But the realization that information is a physical entity is the key to saving the second law.
The equivalence between '1 bit of information' and 'kTln(2) Joules of energy' does not hold just in the special case of Szilárd's Engine; rather, as Rolf Landauer, working at IBM, first noted, it applies generally. To see this, consider how information is stored in a physical system. For each bit, there must exist a distinguishable state of the system -- remember, information is 'any difference that makes a difference'. Now imagine the deletion of information. In order to achieve this, consequently, the number of states of the system must be reduced. But if one reduced the number of (micro)states of the system, this would entail a forbidden entropy decrease -- thus, the entropy elsewhere, i.e. either in an environment acting as an 'entropy dump' or in a part of the system not making up those states that are used to represent information, must increase by (at least) a compensating amount.
For a more concrete picture, consider a system with a 'memory tape' consisting of a set of boxes, each of which is like the room in Szilárd's engine, containing a '1-molecule gas'. If the molecule is in the left half of a box, it corresponds to a logical 0; conversely, if it is found in the right half, its logical state is interpreted as 1. Re-setting the tape, i.e. transferring it to the logical state 'all 0', for instance, then corresponds to halving each box's volume, and thus, halving the number of microstates available to each box (all gas molecules can now only be in the left half of their boxes, versus being anywhere in the box). To do so, the gas in each box has to be compressed to half its volume -- which is the inverse of the expansion process analysed in the previous section, and thus, necessitates a work equal to kTln(2) done on each box.
Information deletion thus always incurs the price of entropy increase, and consequently, the production of waste heat. One way to view this is that a 'deleted' bit is expelled into the environment in the form of kTln(2) Joules of heat.
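To put numbers on this, here is a small sketch converting bits into thermodynamic entropy and into the minimum heat released on deletion. The temperature of 300 K and the gigabyte example are assumptions of mine for illustration, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K

def bits_to_entropy(n_bits):
    """Thermodynamic entropy in J/K corresponding to n_bits of Shannon entropy."""
    return n_bits * K_B * math.log(2)

def erasure_heat(n_bits, temperature):
    """Minimum heat in joules released when n_bits are deleted at this temperature."""
    return temperature * bits_to_entropy(n_bits)

T_ROOM = 300.0  # assumed room temperature in kelvin
one_bit = erasure_heat(1, T_ROOM)
one_gigabyte = erasure_heat(8 * 10**9, T_ROOM)
print(one_bit, one_gigabyte)
```

At room temperature a single bit costs a little under 3 × 10⁻²¹ Joules, which is why this price goes unnoticed in everyday computing and only shows its teeth in arguments of principle like the present one.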
It was Charles Bennett, a colleague of Landauer at IBM, who noticed that using these ingredients, the puzzle of Maxwell's demon could finally be solved. Key to this resolution is the realization that the demon itself must be a physical information-processing system (of course, one could posit it to be some supernatural being, but this spoils the debate, as nothing sensible can be said by physics about the supernatural by definition). In particular, he thus must have a finite number of internal states he can use to represent information, in other words, a finite memory. So, at some point after he has started taking data and extracting work, he will have to start making room for new data -- delete information, in other words. But this will create an entropy increase, and thus, waste heat, of precisely kTln(2) Joules per deleted bit -- just the same amount he has previously extracted from the system for free! Thus, all the work he got out of the system, he eventually will have to pay for in entropy increase. The second law is, finally, safe!

Hypercomputation
Or is it? *cue dramatic music*
Bennett's analysis essentially assumes Maxwell's demon to be a finite-state automaton, a notion of computing machine somewhat weaker than a Turing machine in that it has only a bounded amount of memory available. This certainly seems reasonable, but it is not in principle impossible that there are physical systems that exceed the computational capacity of such a system. Ruling them out amounts to assuming that the (physical) Church-Turing thesis holds, which roughly says that the notion of computability as embodied by Turing machines or equivalents exhausts physical computability, i.e. that there is no function that can be computed by some means that can't be computed by a Turing machine, or that there are no means of computation more powerful than Turing machines. Such a means, implemented as a concrete device, is called a hypercomputer.
This is closely tied to one of the central theses I wish to explore on this blog: that the universe itself is computable in Turing's sense, i.e. that in principle a Turing machine exists that can compute ('simulate') the entire evolution of the universe. Certainly, if this is true, then there can be no physical hypercomputers.
The current state of matters is such that physical theories seem to imply that the universe isn't computable; I have previously argued against this view, and will now try to use what we have learned in this post to mount another attack.
The main culprit standing against the computability of the universe is the (apparent) continuous nature of spacetime. This continuity implies that there are as many points in an interval of spacetime as there are real numbers; however, there are only as many Turing machines as there are natural numbers, so most real numbers can't be computed -- the continuum is not a computable entity. This can be exploited in order to achieve computational power greater than that of any Turing machine; two concrete proposals along this line are Blum-Shub-Smale machines and Hava Siegelmann's Analog Recurrent Neural Networks, or ARNNs.
Now let's suppose that things actually are that way -- the continuum is real, the Church-Turing thesis false. If we gave Maxwell's demon access to the computational power inherent in the continuum, does this have any bearing on Bennett's conclusion?
It is easy to see that this is indeed the case. Imagine that, for his memory, we gave the demon a certain continuous interval to work with. He could now use the following procedure to encode his observations of the molecule: if the molecule is found in the left half, discard the left half of the interval; if it is in the right half, correspondingly discard the right half of the interval (this 'discarding' is not meant in the sense of deleting something -- rather, one might imagine two dials, such that if the left half is discarded, the left dial is moved half the remaining space to its maximum value, and analogously for the right one). Since the continuous interval is infinitely divisible, this process can go on forever. Knowledge of the original interval and the remaining part encodes the precise series of measurement outcomes.
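The bookkeeping of the two dials can be sketched as follows. This is only an illustration under assumptions of mine: the interval is taken to be [0, 1), outcomes are written as 'L' and 'R', and exact fractions stand in for the continuum; the function names are hypothetical.

```python
from fractions import Fraction

def record(outcomes):
    """Encode 'L'/'R' outcomes by repeatedly halving the interval [0, 1)."""
    lo, hi = Fraction(0), Fraction(1)
    for side in outcomes:
        mid = (lo + hi) / 2
        if side == "L":
            lo = mid   # molecule found left: discard the left half
        else:
            hi = mid   # molecule found right: discard the right half
    return lo, hi

def read_back(lo, hi):
    """Recover the outcome sequence from the remaining interval."""
    outcomes = []
    cur_lo, cur_hi = Fraction(0), Fraction(1)
    while (cur_lo, cur_hi) != (lo, hi):
        mid = (cur_lo + cur_hi) / 2
        if lo >= mid:
            outcomes.append("L")   # the left half was discarded at this step
            cur_lo = mid
        else:
            outcomes.append("R")   # the right half was discarded at this step
            cur_hi = mid
    return "".join(outcomes)

dials = record("LRRLLRL")
print(dials, read_back(*dials))
```

Note that the fractions only imitate a continuum: their numerators and denominators grow with every step, so the sketch shows the encoding, not a memory that comes for free.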
The demon thus never has to delete any information, and consequently, never incurs the entropy penalty for deletion. He can produce useful energy forever, creating true perpetual motion. Thus, in a non-computable universe, violation of the second law seems possible!
However, the strength of the argument on which the foundations of the second law rest -- remember, stripped to its essentials, it is just the observation that more likely states occur more often -- lets me conclude that this argument in fact should not count in favor of the possibility of perpetual motion, but rather, against the possibility of a non-computable universe. If we remember our Eddington:

If your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.

This conclusion, I must admit, is not quite rigorous -- it might be the case that the universe is non-computable without admitting physical hypercomputation. This is, however, a strange kind of position: it forces us to affirm the existence of a phenomenon for which no physical evidence, i.e. no observably hypercomputational processes, can be found -- thus in principle leaving open the possibility of formulating a description of the world in which the phenomenon is entirely absent, without being any less in agreement with observational data. As Ockham would tell us, in that case, the latter hypothesis is the one we should adopt -- meaning that the reasonable stance in this case would be to believe in the computability of the universe, too.

The Origin of the Quantum, Part II: Incomplete Evidence

2011-10-31T15:28:00.000-07:00

In the previous post, we have had a first look at the connections between incompleteness, or logical independence -- roughly, the fact that for any mathematical system, there exist propositions that that system can neither prove false nor true -- and quantumness. In particular, we saw how quantum mechanics emerges if we consider a quantum system as a system only able to answer finitely many questions about its own state; i.e., as a system that contains a finite amount of information. The state of such a system can be mapped to a special, random number, an Ω-number or halting probability, which has the property that any formal system can only derive finitely many bits of its binary expansion; this is a statement of incompleteness, known as Chaitin's incompleteness theorem, equivalent to the more familiar Gödelian version.
In this post, we will exhibit this analogy between incompleteness and quantumness in a more concrete way, explicitly showcasing two remarkable results connecting both notions.
The first example is taken from the paper 'Logical Independence and Quantum Randomness' by Tomasz Paterek et al. Discussing the results obtained therein will comprise the greater part of this post.
The second example can be found in the paper 'Measurement-Based Quantum Computation and Undecidable Logic' by M. Van den Nest and H. J. Briegel; the paper is very interesting and deep, but unfortunately, somewhat more abstract, so I will content myself with just presenting the result, without attempting to explain it very much in-depth.

As we have already seen, there is a deep connection between randomness and incompleteness, brought to light mainly by the work of Gregory Chaitin. It is intuitive in a way (though of course we are exploring areas to which intuition is a far from reliable guide): as far as a formal system is concerned, the answers to questions it does not determine may be chosen at will, i.e. randomly -- somewhat more formally, if T is a formal system, and G its Gödel sentence, i.e. a sentence which T can neither prove true nor false, then both (T + G) and (T + ~G), i.e. T with either G or ~G (the negation of G) added as an axiom, will be consistent systems. In the framework of algorithmic information theory, a system T can only derive the value of finitely many bits in the binary expansion of a halting probability; the remaining ones, as far as one can consider them to have a 'true' value, do have that value for 'no reason', at least as far as T is concerned. Importantly, the number of bits T can derive is related to T's Kolmogorov complexity, i.e. its information content: loosely speaking, you can't get more information out of T than it already possesses. (This prompted Chaitin to formulate his 'heuristic principle': "You can't get a 10-pound theorem from 5 pounds of axioms".)
This principle is central to the paper by Paterek et al. As axiomatic systems, they consider simple propositions about Boolean functions of a single variable -- in many ways, the most simple systems imaginable. They encode axioms and propositions into quantum states in such a way that a measurement can be regarded as a test of logical dependence -- whenever the proposition is dependent, i.e. its truth value can be deduced from the axioms, measurement yields a definite outcome; however, if the axioms do neither prove nor disprove the proposition, the outcome of the measurement is random!
I find this to be a remarkable result; however, before we discuss it in greater depth, perhaps a few words are necessary to ward off some potential confusions.
First, as Paterek et al. note, the systems they study are not subject to Gödel's theorem (in a sense, they are 'too simple' or too weak for Gödel's proof to apply); indeed, they can even be completed -- i.e. augmented with further axioms in such a way that they indeed are able to decide truth or falsity of any theorem that can be expressed in their language. But, it is important to note that Gödelian incompleteness, while unfortunately being subject to quite a bit of mysticism, is nothing special -- it is not a different kind of incompleteness, but fundamentally the same kind of thing as is present in 'ordinarily' incomplete systems, that can in principle be completed. What's interesting about it really is that it applies, by the Gödelian construction, to all systems of a certain power -- to all universal systems, where universality here means, as introduced previously, the capacity to emulate all other at most universal systems (since if a system is universal, it can 'emulate' the natural numbers, i.e. derive all theorems derivable about them, and, since the theory of natural numbers contains undecidable statements, i.e. is incomplete, it must be incomplete itself). (Which is equivalent to requiring the system to be capable of universal computation.)
One can draw an analogy to the notion of size after Cantor: the fact that the set of real numbers is bigger than the set of natural numbers is no different from the fact that a set of seven elements is bigger than a set of five elements; the surprising thing is that the notion of 'bigger' does not stop once you get to infinity -- that some infinities are bigger than others.
Both the existence of a 'biggest' set and the existence of a set of axioms capable of proving all mathematical truths are reasonable expectations -- however, both turned out to be wrong, in related ways.
One might also worry that the identification of random measurement outcomes and independent propositions is trivial -- certainly, after the fact, one can always find such an association, even in a classical setting. One can always find a mapping between measurement outcomes and truth values of propositions, simply by interpreting these propositions as being about the measurement outcomes. But this is not what's being done here. Rather, at least in principle, one can extend the scheme to experimentally test whether any given proposition is formally independent from some axioms, even without knowing the answer beforehand!
Now let's take a look at what Paterek et al. use to make this work.

Boolean Functions
A Boolean function is a function of Boolean variables, i.e. 'binary' variables that can take one of two possible values, commonly denoted as '0' and '1'. The simplest of these functions only take one argument, and return one value. It is easy to see that there can be only four such functions, each of which can be specified by two bits of data. The first function takes the argument 0 to the value 0, and the argument 1 to the value 0, as well -- we can abbreviate this as (0, 1) → (0, 0). In this notation, the other three functions are (0, 1) → (0, 1), (0, 1) → (1, 0), (0, 1) → (1, 1). Thus, to specify each of these functions, it suffices to specify its image, the second pair of values; the image (0, 1) then uniquely picks out the second function, (1, 1) the fourth, etc. One can interpret this as each function being described by two bits of axioms, with each one bit axiom taking the form 'the argument x is taken to y'. The fourth function would be described by the following axioms:

The argument 1 is taken to 1
The argument 0 is taken to 1

Note that two bits of axioms is the minimum amount of information necessary to specify any one function -- as it must be, since two bits of information distinguish between 2² = 4 alternatives. Thus, knowing any one axiom, the other is left completely unspecified -- it is logically independent. Knowing that axiom 1 is 'the argument 1 is taken to 0' gives you no information of whether axiom 2 is 'the argument 0 is taken to 0' or 'the argument 0 is taken to 1' -- both are consistently possible. Thus, you can extend the 'axiom system' with either proposition, and obtain a consistent system.
These things might all seem rather trivial -- certainly, the axiom systems you can construct from Boolean functions of a single argument are not very interesting. But they do serve as a conceptual model, and it is often useful to analyse the simplest possible cases before moving on to more interesting things.
And some simple derivations are already possible even here -- for instance, from the axioms above, the proposition 'the function is constant' follows. Note that this proposition again does not serve to characterise the function, as it does not tell us whether the function's value is 1 or 0; it is thus a one bit proposition, and only augmented with one more bit of data (such as 'the argument 1 is taken to 0') can it serve as a replacement axiom system to characterise the function.
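Because there are only four functions, logical dependence can be checked by simply listing every function compatible with the axioms. The sketch below does this; it is my own illustration of the idea, not the procedure used by Paterek et al., and all names in it are hypothetical.

```python
from itertools import product

# Each function of one bit is fixed by its image (f(0), f(1)).
FUNCTIONS = list(product([0, 1], repeat=2))

def status(axioms, proposition):
    """Classify a proposition as 'true', 'false' or 'independent'.

    axioms: dict mapping an argument (0 or 1) to the value it is taken to.
    proposition: predicate on a function given as the pair (f(0), f(1)).
    """
    # All functions that satisfy every axiom.
    models = [f for f in FUNCTIONS
              if all(f[x] == y for x, y in axioms.items())]
    values = {proposition(f) for f in models}
    if values == {True}:
        return "true"
    if values == {False}:
        return "false"
    return "independent"

def is_constant(f):
    return f[0] == f[1]

def zero_goes_to_zero(f):
    return f[0] == 0

both_axioms = {1: 1, 0: 1}   # the two axioms of the fourth function
one_axiom = {1: 0}           # 'the argument 1 is taken to 0' only

print(status(both_axioms, is_constant))
print(status(one_axiom, zero_goes_to_zero))
```

With both axioms in hand, 'the function is constant' comes out as a theorem; with a single axiom, the proposition about the other argument is independent, since functions satisfying the axiom disagree about it.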
In order to talk about these axiom systems in a quantum mechanical setting, we need some quantum system capable of encoding their properties; such a system is found in the elementary two-level system, more familiar as a qubit.

Qubits and Qubit Observables
A qubit is a quantum system that, like a bit, can be in one of two states, written commonly as |0⟩ and |1⟩; however, unlike a bit, and unlike anything familiar from the classical world, it can also be in a superposition of these two states, |ψ⟩ = a|0⟩ + b|1⟩, where a and b are complex numbers such that |a|² + |b|² = 1. We will not at this point worry about what, exactly, a superposition is -- it is enough to know that performing an appropriate measurement on the qubit will find the state |0⟩ with probability |a|², and conversely the state |1⟩ with probability |b|².
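As a small illustration of the rule just stated, the following Python sketch computes the two probabilities from the amplitudes a and b and simulates a single measurement. The function names are hypothetical, and the normalisation step is a convenience added here so that unnormalised amplitudes can be passed in.

```python
import math
import random

def measurement_probabilities(a, b):
    """Probabilities (|a|^2, |b|^2) for |psi> = a|0> + b|1>, after normalising."""
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / norm, b / norm
    return abs(a) ** 2, abs(b) ** 2

def measure(a, b, rng=random):
    """Simulate one measurement: returns 0 with probability |a|^2, else 1."""
    p0, _ = measurement_probabilities(a, b)
    return 0 if rng.random() < p0 else 1

# An equal-weight superposition with a relative phase; the phase does not
# change the probabilities of this particular measurement.
p0, p1 = measurement_probabilities(1, 1j)
print(p0, p1)  # both close to 0.5
```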
The relation between the wave function of a quantum system and the outcome of experimental measurements is a bit of a technical one, and I will only give a brief overview here, without attempting to introduce the full Hilbert space operator formalism of quantum mechanics.
The central notion is that of an observable. In classical mechanics, an observable is a physical property of a system that can be determined by some manipulations, i.e. experiments, on the system. Observables thus may be things like the speed and position of a classical particle, or its energy, etc. In quantum mechanics, every observable is associated with a linear operator -- a mapping on Hilbert space that takes it to itself, i.e. that takes a state of a quantum system and maps it to another state --, for which an equation such as the following holds: Ô|o⟩ = o|o⟩, where Ô ('O-hat') is the operator associated to the observable O, o is the value a measurement will return, and |o⟩ is a state of the quantum mechanical system in which it can be taken to have a definite value with respect to the observable O.
So, in order to encode our axiom systems into quantum mechanics, we need operators that return dichotomic values upon measurement; for convenience, one uses the values -1 and +1 rather than 0 and 1, which does not really mean more than a change of notation. Thus, we want operators such that Ŝ|ψ_±⟩ = ±1·|ψ_±⟩. These can be found in the form of the three so-called Pauli matrices, σ₁, σ₂, σ₃. In fact, together with the identity operator I, whose action on a state is essentially to leave it as it is, these can be combined to form any observable of a two-level system.
The key property of these Pauli matrices is that if a system is prepared in a state |ψ_3±⟩, by which I mean a state such that a measurement corresponding to σ₃ yields either +1 or -1 with certainty (called an eigenstate of σ₃), i.e. σ₃|ψ_3±⟩ = ±|ψ_3±⟩, then measurements corresponding to σ₁ and σ₂ yield a completely random outcome, i.e. either +1 or -1 with 50% probability. One can interpret this as being a result of the finiteness of information contained within the system: a qubit has an information content of just one bit; this bit is exhausted by giving a definite answer to the question represented by the σ₃-measurement. Thus, one cannot extract any information through performing the other two measurements -- their outcomes must be maximally uninformative, i.e. random.
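This property can be checked directly. The sketch below uses the standard matrix representation of the Pauli matrices, which is not spelled out in the text above, and lists their normalised eigenvectors by hand; `EIGENSTATES` and `outcome_probability` are illustrative names. It prepares the +1 eigenstate of σ₃ and evaluates the outcome probabilities for all three measurements.

```python
import math

S = 1 / math.sqrt(2)

# Normalised eigenvectors (eigenvalues +1 and -1) of the Pauli matrices in
# their standard representation, as (amplitude of |0>, amplitude of |1>).
EIGENSTATES = {
    "sigma_1": {+1: (S, S), -1: (S, -S)},
    "sigma_2": {+1: (S, S * 1j), -1: (S, -S * 1j)},
    "sigma_3": {+1: (1, 0), -1: (0, 1)},
}

def outcome_probability(observable, outcome, state):
    """Probability |<e|state>|^2 of `outcome` when measuring `observable`."""
    e = EIGENSTATES[observable][outcome]
    overlap = e[0].conjugate() * state[0] + e[1].conjugate() * state[1]
    return abs(overlap) ** 2

prepared = EIGENSTATES["sigma_3"][+1]  # sigma_3 yields +1 with certainty
for name in ("sigma_1", "sigma_2", "sigma_3"):
    probs = {o: outcome_probability(name, o, prepared) for o in (+1, -1)}
    print(name, probs)
# sigma_1 and sigma_2 give (up to rounding) 0.5 for each outcome;
# sigma_3 gives probability 1 for +1 and 0 for -1.
```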
This can be interpreted as providing a direct link between information-theoretic incompleteness and quantum randomness -- as a formal system can't derive the values of more bits of the binary expansion of a halting probability than it has information itself, a quantum system can't give definite answers to all questions one can ask of it via measurement. This introduces the notion of complementarity into quantum mechanics: roughly, complementary observables have the property that gaining information on one observable entails a loss of information about the other. This should remind you of the well-known example of uncertainty in the measurements of position and momentum -- the better you know one, the less well-defined the other becomes.

Logical Independence and Quantum Randomness
We are now ready to take a closer look at the paper's main result. The idea is to encode a one-bit axiom -- like those exhibited above -- into a quantum state, such that subsequent measurement reveals the logical dependence of a proposition on this axiom. The procedure through which this is done is essentially to prepare the qubit in a well-defined state, then have it propagate through a 'black box' that encodes the values a given binary function yields on it through suitable transformations, and then perform a measurement equivalent to a certain proposition. As we have seen, this measurement will yield a definite answer if and only if the qubit is in an eigenstate of the corresponding Pauli matrix.
All that is left then is to establish a mapping between propositions and eigenstates, such that whenever a proposition is independent, measurement results are random. It is crucial here that this mapping is not arbitrary; indeed, the scheme can be extended to multi-qubit systems, such that one can perform experiments that correctly determine logical dependency even if it is not known beforehand whether or not a proposition can in fact be derived from a certain axiom system. In principle, it is thus possible to use quantum systems to decide the derivability of propositions within axiomatic systems.
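The following Python sketch shows one way such a scheme can work. The encoding used here, a black box applying U = σ₁^f(0) · σ₃^f(1), is an assumption chosen for illustration; the text above does not spell out the paper's actual construction, and all names are hypothetical. A qubit prepared in the +1 eigenstate of σ₃ comes out carrying the bit f(0), so the σ₃ measurement is definite, while the σ₁ and σ₂ measurements, standing for propositions independent of that single axiom, come out 50/50. A qubit prepared in the +1 eigenstate of σ₂ instead gives a definite answer to 'the function is constant'.

```python
import math

I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))        # sigma_1
SY = ((0, -1j), (1j, 0))     # sigma_2
SZ = ((1, 0), (0, -1))       # sigma_3

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(m, v):
    """Apply a 2x2 matrix to a two-component state."""
    return tuple(sum(m[i][k] * v[k] for k in range(2)) for i in range(2))

def black_box(f0, f1):
    """ASSUMED encoding of the function (0,1) -> (f0, f1): U = SX^f0 . SZ^f1."""
    u = I2
    if f0:
        u = matmul(u, SX)
    if f1:
        u = matmul(u, SZ)
    return u

def prob_plus(observable, state):
    """Probability of outcome +1, from P(+1) = (1 + <state|O|state>) / 2."""
    o_state = apply(observable, state)
    expectation = sum(state[i].conjugate() * o_state[i] for i in range(2))
    return (1 + expectation.real) / 2

S = 1 / math.sqrt(2)
ZERO = (1, 0)          # +1 eigenstate of SZ
Y_PLUS = (S, S * 1j)   # +1 eigenstate of SY

def run(f0, f1, prepared, observable):
    """Prepare, send through the black box, return P(+1) for the measurement."""
    return prob_plus(observable, apply(black_box(f0, f1), prepared))

for f0, f1 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print((f0, f1),
          "SZ after |0>:", round(run(f0, f1, ZERO, SZ), 3),
          "SX after |0>:", round(run(f0, f1, ZERO, SX), 3),
          "SY after Y+:", round(run(f0, f1, Y_PLUS, SY), 3))
```

In this toy model the σ₃ column is 1 exactly when f(0) = 0, the σ₁ column is always 0.5, and the σ₂ column is 1 exactly when the function is constant.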
A qualitatively new aspect emerges when one considers more complicated axioms and propositions. An n-bit proposition may be thought of as consisting of n one-bit propositions; these, individually, may be either derivable or independent from the axioms. Thus, the composite proposition can be taken to be 'partially derivable', meaning measurements corresponding to this proposition will be 'partially random' -- some outcome may be more likely than another. This could hint at an explanation for the origin of quantum probabilities.
As in the previous post, it is thus the notion of independence, of incompleteness, that introduces quantum mechanical complementarity into the physical world -- and with it, the many strange and fascinating phenomena of quantum theory.

Graphs, Logic, and Quantum Computation
The second paper I mentioned in the beginning of this post, Measurement-Based Quantum Computation and Undecidable Logic, again offers a hint at a connection between undecidability and quantum phenomena, though the connection is less clear, and the discussion more abstract. The discussion takes place within the framework of quantum computation. It is widely believed that quantum systems, utilised for computation, can outperform classical devices significantly -- even though they can't compute more than classical systems, they can in some situations do so faster.
An often-cited example is the factorisation of numbers: for any integer n, find those prime numbers that, if multiplied, are equal to n. This is an important problem in cryptography, because it is hard to solve, but its solution is easy to verify -- multiplying a few numbers takes, on a modern computer, barely any time at all. So the knowledge of the solution to a particular factorisation problem serves as a convenient ID -- you are likely to only have this knowledge if you have been told the solution beforehand, so a quick answer to the question 'What are the prime factors of n?' validates you as trustworthy.
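The asymmetry between finding and checking a factorisation can be seen even in a toy example. The sketch below uses plain trial division, chosen only because it is short; it is not how serious factoring is done, and the function names are illustrative.

```python
def prime_factors(n):
    """Factor n by trial division: slow for large n, fine for small examples."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def verify(n, claimed):
    """Checking a claimed factorisation only needs multiplication.

    (This checks the product only, not that each claimed factor is prime.)
    """
    result = 1
    for p in claimed:
        result *= p
    return result == n

n = 101 * 103
print(prime_factors(n))       # [101, 103]
print(verify(n, [101, 103]))  # True
```

The number of trial divisions grows with the size of the smallest prime factor, whereas the check is a handful of multiplications regardless.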
Quantum computers have the promise (or, depending on how you look at things, pose the threat) to change this; they can quickly answer this question, without having been told the solution beforehand.
Van den Nest and Briegel, in their paper, then investigate under what conditions quantum resources may outperform classical ones. To do so, they focus their attention on a special kind of quantum state, known as a graph state. These are states that can be uniquely described by a graph that depicts their 'history', so to speak -- an n-qubit graph state is described by a graph with n vertices that encodes the correlations between the qubits, i.e. roughly which qubits have been brought into contact with one another.
In order to perform their investigation, they note that a graph can be interpreted as encoding a certain logic. Such a logic contains statements about properties of the graph, such as whether or not two vertices are connected by an edge, but also, through suitable connection and quantification, whether or not a graph can be 2-coloured (i.e. have its vertices painted alternately with only two different colours, such that no two vertices having the same colour are connected by an edge).
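For a single concrete graph, a property like 2-colourability is easy to test. A minimal sketch, using a breadth-first search and an edge-list representation that are choices made here and not taken from the paper:

```python
from collections import deque

def two_colouring(n, edges):
    """Try to 2-colour a graph with vertices 0..n-1.

    Returns a list of colours (0 or 1) if possible, otherwise None.
    """
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    colour = [None] * n
    for start in range(n):
        if colour[start] is not None:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None  # an odd cycle: no 2-colouring exists
    return colour

square = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(0, 1), (1, 2), (2, 0)]
print(two_colouring(4, square))    # [0, 1, 0, 1]
print(two_colouring(3, triangle))  # None
```

The question of decidability discussed here concerns statements about whole families of graphs, which is a much harder matter than checking one graph at a time.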
If all such statements (or their negation) can be derived from the logic associated to a certain graph (or more properly, a family of graphs), then the logic is said to be decidable; if conversely there exist statements that can't be derived -- that is, if they are independent --, the logic is undecidable.
Now, the remarkable result that Van den Nest and Briegel arrive at, which I will just state here, is that only if the logic is undecidable, i.e. contains independent propositions, is it possible for the graph state, used as a resource for quantum computation, to outperform a classical computing device -- or, put differently, to not be efficiently simulatable classically. Note that this doesn't guarantee a speedup -- in fact, graph states with undecidable logics that don't perform better than classical systems are known. However, no graph state whose logic is decidable can possibly outperform classical systems. An undecidable logic thus appears to be a necessary, but not sufficient condition for a speedup. From a computational standpoint, graph states with a decidable logic are thus equivalent to classical systems.

The Takeaway
In this post, we have examined two examples of a connection between undecidability and quantum phenomena. Put succinctly, they are:

Quantum measurements yield random results, if they correspond to propositions independent of the axioms encoded in the quantum state
Quantum resources of a certain kind (graph states) may outperform classical ones, if the logic they are associated with contains independent propositions

These results seem to share a similar spirit. In particular, they both point in the direction of incompleteness, as being instrumental to the difference between the classical and the quantum world. As long as we don't ask undecidable questions, we will receive only determinate measurement outcomes that reproduce classical expectations; if we don't consider undecidable logics, we will only perform computations equivalent to classical ones.
Of course, this comes as no surprise: in the previous post in this series, we have already seen strong hints of the same connection. Incompleteness, the inability of a system to answer questions beyond a certain complexity, there caused the system to be only imperfectly localizable in phase space, leading to the emergence of Planck's constant h, the uncertainty principle, and with it, quantum complementarity, and ultimately, utilising the mathematical mechanism of deformation, to the emergence of the full formalism of quantum mechanics -- at least in a heuristic kind of way. The results discussed in this post then serve to bolster confidence in the considerations of the last one.

The Origin Of The Quantum, Part I: An Incomplete Phase Space Picture

2011-10-11T07:07:00.000-07:00

In the last post, we have familiarized ourselves with some basic notions of algorithmic information theory. Most notably, we have seen how randomness emerges when formal systems or computers are pushed to the edges of incompleteness and uncomputability.
In this post, we'll take a look at what happens if we apply these results to the idea that, like computers or formal systems, the physical world is just another example of a universal system -- i.e. a system in which universal computation can be implemented (at least in the limit).
First, recall the idea that information enters the description of the physical world through viewing it as a question-answering process: any physical object can be uniquely identified by the properties it has (and those it doesn't have); any two physical objects that have all the same, and only the same, properties are indistinguishable, and thus identified. We can thus imagine any object as being described by the string of bits giving the answers to the set of questions 'Does the object have property x?' for all properties x; note that absent an enumeration of all possible properties an object may have, this is a rather ill-defined set, but it'll serve as a conceptual guide.
In particular, this means that we can view any 'large' object as being composed of a certain number of 'microscopic', elementary objects, which are those systems that are completely described by the presence or absence of one single property, that may be in either of two states -- having or not having that particular property. Such a system might, for instance, be a ball that may be either red or green, or, perhaps more to the point, either red or not-red. These are the systems that can be used to represent exactly one bit of information, say red = 1, not-red = 0. Call such a system a two-level system or, for short, and putting up with a little ontological inaccuracy, simply a bit.

Doing a 'measurement' on a two-level system then amounts to finding out which state it is in; that is, asking of it the question: 'Are you in state r?', where r here may be taken for red, for instance. The outcome of the measurement then represents the answer to this question: 1 for yes, 0 for no. The measurement on a compound object may then be taken as the composition of measurements 'asking' about all the properties the object might have; it will thus yield a bit string uniquely identifying the object.
Now, this is, in a sense, an overly literal presentation: the gist really is just that any object has a unique description, and that description contains a finite amount of information; I needn't have talked about bits or two-level systems at all, but for concreteness, it is convenient to frame the discussion in these kinds of terms.
In any case, we now see that making a measurement on some object or physical system amounts to asking questions of this system. But what if it doesn't know the answer? That is, what if, the information contained in a system being taken to correspond to something akin to the information contained in some set of axioms, the answer to some question asked of the system through measurement doesn't follow from that information, i.e. is formally independent of it? Putting it yet differently, what if the system can't compute an answer?
Thanks to the discussion in the previous post, we know that what we should expect in such a case is the emergence of randomness -- the emergence, thus, of facts not reducible to the information contained in a system (by the way, I largely use the terms 'object' and 'system' in an interchangeable manner -- mostly, whenever I think of something as just passively sitting there, being subject to some kind of procedure, it's an object to me, while if I think of it as actively doing things, such as computing an answer to a measurement, I think of it as a system -- but these two views really just constitute a change of narrative viewpoint, nothing of greater substance). From the simple argument that if physical reality comprises a universal system, we should expect the same limitations and boundaries to hold for it as hold for other universal systems, like computers and formal axiomatic systems, we conclude that we ought to expect to find randomness in the physical world.
This is a conclusion very different from the one that is usually drawn when considering the possibility that the world might be computable -- in general, it is thought that the appearance of randomness throws a spanner in the works, since after all, randomness isn't computable (that's what makes it random)!
The key here is that one should realize that the randomness does not occur 'within' the computation, but rather, at the edge of it -- and in this sense, it is possible to at least obtain finite amounts of randomness, i.e. approximate some random number, like, for instance, a Chaitin Ω-number, to a finite precision, as Calude et al. do in this paper (NB: That an Ω-number can only be computed to finite precision is of course equivalent to Chaitin's incompleteness theorem, as we saw in the last post). The conclusion that randomness implies the impossibility of computational physics is thus a rather premature one!
However, it is difficult to see what this appearance of randomness 'at the edges' might mean for physics. In order to see it in action, we first must set the stage, and for this, I'll need the concept of phase space.

Phase Space Dynamics
The classic Newtonian model system is a point mass moving through space. At any given moment in time, it has a certain location, described by three numbers, its coordinates. These represent basically how far away it is from the floor, from the left wall, and from the back wall of your room; these three references make every point in space uniquely identifiable.
However, this description alone tells us nothing about how the point mass is going to behave, i.e. where it will be at some later point in time. In order to be able to know that, you need more data; specifically, you need to know how the point mass changes its position with time -- that is, you need to know its velocity. Since there are three numbers determining its place, there also need to be three numbers determining its velocity -- one each corresponding to the change of position parallel to the floor, left wall, and back wall. It is usually more convenient to refer to the particle's momentum, which is, in the simplest case, its velocity times its mass (here, 'particle' just means 'Newtonian point mass', with no relation to the more sophisticated things known as 'elementary particles').
Thus, three numbers relating to the particle's position, and three numbers relating to its momentum, are enough to characterize all possible states of motion of the particle. These are the axes that span the particle's phase space. A state of the particle is a point in this phase space; the particle's evolution, being its change of state over time, then traces out a line in this phase space.
For simplicity, we can imagine the particle being constrained to moving in just one direction; since the other directions are fixed, we can safely forget about them, and the particle's state is determined simply by one number for its position, and one for its momentum along the relevant direction -- the six dimensional phase space collapses to a two dimensional phase plane. This has the great virtue of being easy to draw on a two dimensional monitor:

Fig. 1: A particle in phase space

As you can see, the state of the particle is completely described by the two numbers (x, p) corresponding to the particle's position and momentum, respectively. The particle's exact location and momentum can be determined experimentally, via measurement. But how are we to think of this measurement, within the model of measurement as asking questions of the system?
One way would be to ask questions like: 'Are you at position x₀? Do you have momentum p₀? Are you at x₁?...'
It is clear that this would be a very ineffective way of measuring the particle's properties: you stand to ask many questions before you ever get an answer that gives you any information (worse, even, since there are in theory infinitely many positions for the particle to be, and infinitely many momenta to have, you ought to expect having to ask infinitely many questions -- and really, who's got that kind of time these days). It would be like playing 20 questions by going through a list of all possible people: 'Are you Aaron A. Aaronson? Are you Aaron B. Aaronson?' and so on. Not a winning strategy!
It's better to phrase your questions in such a way that each of them gives you valuable information -- each of them needs to constrain the possibilities such that the possible follow-up questions to each question are limited by the answers you receive. The simplest trick is not to ask whether the particle is at a given position, but whether it is within a given range of positions (or momenta), an interval (for this, we need to assume the range of possible positions is bounded somehow, which translates to the assumption that in the experiment, the system you're experimenting on is at least present; we'll just call the maximum value 1000, for convenience). Your starting question might be: 'Are you between 0 and 500?'
Now, in each case, whatever the answer, you will learn something: either that the particle is between 0 and 500; or, that it is between 500 and 1000! Depending on this answer, you can narrow down the scope of your following question: for yes, you ask: 'Are you between 0 and 250?', and for no, you ask: 'Are you between 500 and 750?' This is called a nesting of intervals.
Now if there is some smallest interval, then this process will eventually terminate, giving you the interval (of either momentum or position) the particle is in. But why would there be? In general, the assumption is that space is continuous, and that momentum is likewise; so there exists an exact real number for each of the particle's coordinates in phase space. But real numbers have the property that between every two real numbers, there are infinitely more -- which means that you can infinitely refine your interval, and nevertheless never exactly pinpoint the particle's location! In other words, it would still take asking and answering an infinite number of questions in order to precisely know the particle's properties.
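The nesting of intervals is just a bisection search, and a short Python sketch makes the cost in questions explicit. The function name `locate`, the bounds 0 and 1000, and the `smallest` resolution parameter are assumptions for illustration; each answer contributes one bit to the description of the particle's position.

```python
def locate(position, low=0.0, high=1000.0, smallest=1.0):
    """Narrow down `position` by yes/no questions 'are you in the lower half?'.

    Stops once the interval is no wider than `smallest` (an assumed resolution
    limit). Returns the final interval and the list of answers as bits.
    """
    answers = []
    while high - low > smallest:
        middle = (low + high) / 2
        if position < middle:   # answer 'yes': keep the lower half
            answers.append(1)
            high = middle
        else:                   # answer 'no': keep the upper half
            answers.append(0)
            low = middle
    return (low, high), answers

interval, answers = locate(333.3)
print(interval, len(answers))   # 10 questions for a resolution of 1 in 1000
```

Every halving of `smallest` costs exactly one more question, so the number of questions grows without bound as the resolution is refined; only a nonzero smallest interval makes the loop terminate.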
Of course, one could be contented with just knowing them to finite accuracy -- after all, every real-world measurement has some finite error, thus, all one ever gets is really a range of positions, or respectively of momenta, for the particle to be in. But still, there is something profoundly strange to the notion that it should take an infinite amount of information to find a particle in phase space.
But what if the information needed turns out to be finite? What if, after a certain point, the system is just not able to answer any more questions? We have, after all, reason to suspect it ought to be so: if the physical world is a universal system, it is limited by incompleteness. Some things can't be known to arbitrary precision, and at the edge of these unknowables, we find the notion of randomness.
So let's now suppose that the system in question were described by a random number, so that the bit string obtained by measuring it is random in the sense of the previous post (i.e. it is incompressible and contains a maximum amount of information). It is then a direct consequence of Chaitin's incompleteness theorem that this bit string can only ever be known to finite accuracy through computable means; this is so because, as we have seen, any random (and left-c.e.) number is an Ω-number, and Ω-numbers can only be approximated to a finite degree of accuracy by computation. This poses a fundamental limit on the information that can be contained in a system; in particular, returning to the example at hand, it means that there is a fundamental limit to the localizability of a particle in phase space. This is not to be thought of as a restriction applying only to measurement, as if there existed a definite phase-space location of the particle about which we are just forever doomed to be ignorant; rather, the definite location does not exist: nature herself, if she is a universal system as stipulated, doesn't know the location with greater accuracy. The questions regarding smaller intervals are not answerable by the system; they are undecidable, uncomputable, an intrinsic expression of the system's incompleteness.
This means concretely that neither x nor p can be known with perfect accuracy; i.e. that the particle can only be confined to a minimum area within phase space, but not further than that. Putting it into pictures:

Fig. 2: Minimum area of phase space localizability

It's important to notice that this does not put a bound on either momentum or position measurement individually, but only on the two of them combined; the restriction is on knowing the exact state the particle is in, which is comprised of both position and momentum -- intuitively, you can view this as the 'ellipse of ignorance' being arbitrarily deformable, so that you can gain more exact knowledge of the position at the expense of exactness in your knowledge of the momentum. It's the area of the ellipse that must stay constant.
If this sounds familiar to some, there's a good reason: the area of the phase plane has units of [momentum · position] = [energy · time] = [Joule · second], which are, not coincidentally, the units of Planck's constant h. The simple assumption of a fundamental limit to the amount of information that is extractable from a system, which is in turn a direct consequence, via the Chaitin/Gödel limitative theorems, of the assumption that the physical world is a universal system, introduces the core quantity of quantum theory into our description.
But we can do much better than that.

Phase Space Quantum Mechanics
First, observe that the postulate that the minimum area within which a system can be localized in phase space is equal to h introduces the constraint that at minimum, the product of uncertainty in position and momentum equals h, or, expressed in an equation:

ΔxΔp ≥ h

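This is, up to a numerical factor, Heisenberg's famous uncertainty principle, a key relation of quantum mechanics (in its precise form, stated for the standard deviations of position and momentum, it reads σ_x σ_p ≥ ħ/2, with ħ = h/2π).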

Now, imagine a system composed of multiple 'elementary' systems, for each of which this relation holds. It is natural to expect for this system's phase space area to be a multiple of h; this gives the condition:

∮ p dx = nh,

which is the quantization condition of what's nowadays known as the old quantum theory; it's used to select the quantum-mechanically possible states of the system from the classically allowed ones -- i.e. only if the system obeys the condition above is it in a quantum mechanically allowed state. The old quantum theory is a historical predecessor of the full quantum theory, and many important systems can already be treated rather well with it: the hydrogen atom, for example, or classic textbook problems like the potential well and the harmonic oscillator, where it yields the important realization that energy is quantized in units of hν (ν being the frequency of, for instance, light). Also, it prompted de Broglie to propose the duality between particles and waves, another famous and deep aspect of quantum mechanics.
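For the harmonic oscillator the condition can be checked numerically. A classical orbit of energy E is an ellipse in the phase plane, and its enclosed area, the integral of p dx around one period, works out to E/ν; setting this equal to nh gives E = nhν. The Python sketch below evaluates the area by a simple midpoint sum. The mass and frequency are arbitrary illustrative values, and the function name is hypothetical.

```python
import math

def phase_space_area(energy, mass, omega, steps=20_000):
    """Numerically evaluate the integral of p dx over one oscillator period."""
    x_max = math.sqrt(2 * energy / (mass * omega ** 2))
    p_max = math.sqrt(2 * mass * energy)
    area = 0.0
    for k in range(steps):
        t0 = 2 * math.pi * k / steps
        t1 = 2 * math.pi * (k + 1) / steps
        # Orbit: x = x_max sin(t), p = p_max cos(t); midpoint value of p
        # times the change in x over the step.
        p_mid = p_max * math.cos((t0 + t1) / 2)
        dx = x_max * (math.sin(t1) - math.sin(t0))
        area += p_mid * dx
    return area

h = 6.62607015e-34     # Planck's constant in J s
nu = 5.0e14            # assumed frequency in Hz (roughly visible light)
omega = 2 * math.pi * nu
mass = 9.1093837e-31   # electron mass in kg, an arbitrary choice here

for n in (1, 2, 3):
    energy = n * h * nu
    print(n, phase_space_area(energy, mass, omega) / h)  # close to n
```

The result does not depend on the mass: it drops out of the product of the two semi-axes of the ellipse.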

But, one can go further even than that and recover the full formalism and phenomenology of quantum theory, using as input only the realization that there exists a minimum quantum of phase space area. To understand this, we must take a look at a mathematical technique called deformation.

Deforming Phase Space

Roughly speaking, a deformation is about what you'd expect it to be -- you take something and push, pull or press it along some axis such that in the end, you have an object clearly related to, but still different from, the original one, which, if the deformation is undone, returns to the form you started out with. Typically, in mathematics, these deformations depend on some parameter, such that if the parameter is extremized in some appropriate sense (generally, taken to zero or infinity), you get the original structure back. An example is the deformation of a circle, which yields an ellipse, and depends on a parameter known as eccentricity: the lower the eccentricity, the more 'circle-like' an ellipse becomes, such that an ellipse with eccentricity 0 is nothing but a circle.

In physics, too, deformations are not unknown: for instance, Einstein's special relativity can be seen as a deformation of Newtonian mechanics, where the deformation parameter is the speed of light c (or more exactly, the quotient of a system's speed v and c). When you formally let the speed of light tend to infinity, or only look at systems with speeds v << c, you get the Newtonian theory back.
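A quick numerical illustration of this limit, using kinetic energy as the example quantity (a choice made here, not taken from the text): the ratio of the relativistic to the Newtonian kinetic energy approaches 1 as v/c goes to zero.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def kinetic_energy_newton(mass, v):
    """Newtonian kinetic energy, m v^2 / 2."""
    return 0.5 * mass * v ** 2

def kinetic_energy_relativistic(mass, v, c=C):
    """Relativistic kinetic energy, (gamma - 1) m c^2."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma - 1.0) * mass * c ** 2

def ratio(beta, mass=1.0):
    """Relativistic over Newtonian kinetic energy at speed v = beta * c."""
    v = beta * C
    return kinetic_energy_relativistic(mass, v) / kinetic_energy_newton(mass, v)

for beta in (0.5, 0.1, 0.01, 0.001):
    print(beta, ratio(beta))  # tends to 1 as beta shrinks
```

Much smaller values of v/c are not used in the loop because the subtraction gamma - 1 loses floating-point precision there; that is a limitation of this naive formula, not of the physics.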
Now, Newtonian physics has a very natural and quite beautiful formulation in phase space, known as Hamiltonian mechanics. Given the preceding considerations, an interesting question is: does there exist a deformation of Hamiltonian mechanics, whose deformation parameter is Planck's constant h?
It turns out that the answer to this question is yes -- and the deformation turns out to be quantum mechanics! More concretely, there exists an essentially unique (unique up to isomorphism) mathematical structure depending on the parameter h (or again more accurately, h/S, where S is the system's action), and that structure is phase space quantum mechanics, a version of quantum mechanics that is equivalent to the more usual Hilbert space version, but has the virtues of introducing less abstract formalism, and working in the same 'arena' as classical mechanics -- which, for instance, makes it easy to make the transition between classical and quantum mechanics explicit: it just amounts to undoing the deformation, i.e. taking the h → 0 (or S → ∞) limit, i.e. either letting the minimum phase space area tend to zero, or looking at things on a large enough scale such that it becomes negligible.
It's beyond the scope of this blog to provide a full introduction into phase space quantum mechanics, which, in my opinion, is a quite beautiful and remarkable formalism that is less well known than it perhaps ought to be; we'll just content ourselves with the following two remarks:

There exists a mapping, called the Wigner-Weyl transformation, which associates each quantum mechanical operator in Hilbert space with an ordinary function on phase space; these functions, respectively operators, are the observables of quantum mechanics -- they dictate what you measure, effectively
Under the deformation, the usually commutative multiplication of functions on phase space is replaced by a noncommutative star product (commutativity means that the order in which you do multiplication doesn't matter, i.e. that a∙b = b∙a; it was Heisenberg who first realised that you needed something noncommutative to do quantum mechanics)
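To see the noncommutativity at work, here is a minimal Python sketch of the star product for polynomials in x and p. It uses the standard Moyal series in powers of iħ/2, which terminates for polynomials, in units where ħ = 1. The dictionary representation `{(i, j): c}` for c·xⁱpʲ and all function names are assumptions made for illustration.

```python
from math import comb, factorial

HBAR = 1.0  # units with hbar = 1, an assumption made for readability

def derivative(poly, dx, dp):
    """Differentiate {(i, j): c} (meaning c x^i p^j) dx times in x, dp times in p."""
    result = {}
    for (i, j), c in poly.items():
        if i < dx or j < dp:
            continue
        factor = (factorial(i) // factorial(i - dx)) * (factorial(j) // factorial(j - dp))
        key = (i - dx, j - dp)
        result[key] = result.get(key, 0) + c * factor
    return result

def multiply(f, g):
    """Ordinary (commutative) product of two polynomials."""
    result = {}
    for (i, j), a in f.items():
        for (k, l), b in g.items():
            key = (i + k, j + l)
            result[key] = result.get(key, 0) + a * b
    return result

def star(f, g):
    """Moyal star product f * g of two polynomials in x and p."""
    degree = max((i + j for i, j in f), default=0)
    result = {}
    for n in range(degree + 1):          # higher orders vanish on f
        for k in range(n + 1):
            coeff = (1j * HBAR / 2) ** n / factorial(n) * comb(n, k) * (-1) ** k
            term = multiply(derivative(f, n - k, k), derivative(g, k, n - k))
            for key, c in term.items():
                result[key] = result.get(key, 0) + coeff * c
    return {key: c for key, c in result.items() if abs(c) > 1e-12}

def subtract(f, g):
    """Difference f - g, dropping terms that cancel."""
    result = dict(f)
    for key, c in g.items():
        result[key] = result.get(key, 0) - c
    return {key: c for key, c in result.items() if abs(c) > 1e-12}

x = {(1, 0): 1}
p = {(0, 1): 1}
print(star(x, p))                        # x p + i hbar / 2
print(subtract(star(x, p), star(p, x)))  # i hbar: the canonical commutator
```

At order zero the star product is the ordinary product, so setting ħ to zero undoes the deformation and restores commutativity.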

From there, quantum mechanics can essentially be regarded as statistical mechanics on deformed phase space -- in classical statistical phase space mechanics, the Liouville density gives you the probability of finding the system at a given point in phase space, i.e. in our example, at a given momentum and position. The Liouville density evolves according to Liouville's theorem, which basically states that the change of the Liouville density over time is given by the Poisson bracket of the Liouville density with the Hamiltonian, which encodes the system's dynamics.

With the concept of a phase space point no longer being meaningful, you need to generalize the Liouville density to the Wigner distribution, whose time evolution is given by the deformed version of Liouville's theorem. The bracket appearing in it, called the Moyal bracket, is related to the star product in the same way the Poisson bracket is related to ordinary phase space multiplication. This equation is the equivalent of the von Neumann equation in traditional Hilbert space quantum mechanics, which is in turn equivalent to the famous Schrödinger equation that describes the time evolution of quantum mechanical systems (and consequently, the Wigner distribution is the Wigner-Weyl transform of the quantum mechanical density matrix). Thus, we get a formalism fully equivalent to Hilbert space quantum mechanics, just from a simple deformation.

If that all seemed a bit much, don't worry, it won't be on the test!

Maximum Information

In this post, we have seen how quantum mechanics follows from the existence of a minimum phase space area, which in turn we have concluded must exist because of the fact that no universal system can approximate an Ω-number to greater than finite precision -- because of incompleteness, in other words.
However, I don't wish to claim this as a derivation of quantum mechanics in the proper sense. The argument, as such, is heuristic at best, and only intended to serve as a motivation for a direction in which to look for reasons for quantum mechanics -- a principle of quantumness, so to speak.
In the literature, an approach towards finding such a 'principle of quantumness' in a principle of maximum information is somewhat common, and has been shown to be quite fruitful. The basic idea is to limit the amount of information an observer can obtain from a system, and several approaches have managed to glean large chunks of quantum mechanics from not much more than this assumption. The interested reader might for instance want to take a look at Rovelli's Relational Quantum Mechanics, Chris Fuchs' Quantum Mechanics as Quantum Information, Zeilinger's Foundational Principle for Quantum Mechanics, or just peruse Alexei Grinbaum's review paper which compares, contrasts, and to a certain extent unifies these approaches.
The connection between 'maximum information' principles and the view I propose may seem obvious, but can be made more precise by appealing to what Chaitin called his 'heuristic principle' (since proven by Calude and Juergensen), that one cannot derive a theorem from a set of axioms if the Kolmogorov complexity of the theorem significantly exceeds that of the axioms.
The principle is intuitively obvious -- Kolmogorov complexity provides a measure of the information content of the axiom system, so obtaining a theorem with a higher Kolmogorov complexity would seem to create information 'out of thin air'; and the manipulations one performs in order to make a formal deduction, which can be cast in terms of purely symbolic operations, certainly seem information-preserving, since they work on an entirely syntactic, as opposed to semantic, level.
Thus, incompleteness really can be seen as being equivalent to a 'maximum information' principle for formal axiomatic systems -- corroborating the notion that incompleteness in axiom systems and quantumness in the real world are ultimately the same kind of thing.

Information through the Lens of Computation: Algorithmic Information Theory

2011-09-18T05:19:00.000-07:00

Up to now, previous discussions about information and computation have been rather informal -- I have neglected to give detailed mathematical introductions to concepts like Shannon entropy, Turing universality, codes etc. for the simple reason that for our purposes, a qualitative understanding has been fully sufficient.
However, if we want to make any quantitative statements about the notions of information and computation, it will benefit us to take another look at the subject in a slightly more formal context.
To this end, the most efficient framework seems to me to be that of algorithmic information theory, introduced and developed mainly by the mathematicians Ray Solomonoff, Andrey Kolmogorov, and Gregory Chaitin in the 1960s. According to Chaitin, AIT is "the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously"; as such, it will aid us in achieving a better understanding, in a quantitative way, of both subjects.
In a way, AIT is concerned with a concrete realisation of the somewhat abstract-seeming notion of information or information content within the realm of computation. To this end, its central concept is the so-called Kolmogorov complexity, which, loosely speaking, provides a measure of the information content of an object -- say, a string of binary digits -- by considering the shortest program, on a given computer, that outputs this object. This might seem at first like a rather arbitrary definition: different computers generally will have different programs, of different lengths, that generate the string, leading to different complexities for one and the same object. But it turns out that despite this ambiguity, Kolmogorov complexity is useful independent of the concrete computational implementation it is based on.
First, some technicalities. We fix a notion of computer as a machine that takes as input binary strings, and outputs them, as well -- a computer thus is a (partial) function from binary strings to binary strings (where 'partial' means that not for all possible inputs, an output necessarily exists). We require that the admissible binary strings have the property that no valid string x is a prefix of another valid string y -- i.e. that y can't be written as xz, obtained from x through extending it with another string z. This ensures that programs are self-delimiting: when a machine has received the string x, it 'knows' that the input is complete, and there is thus no need for a special stop-character. Such machines are sometimes called prefix-free, self-delimiting, or simply Chaitin machines. There are universal Chaitin machines, so we can do everything with them we can do with any other concrete realisation of computation.
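A small Python sketch may make the prefix-free requirement more tangible. The four-program set below is made up purely for illustration, and the helper names are mine; the point is only that with such a set, a reader of the input stream can tell where a program ends without any stop-character.

```python
def is_prefix_free(programs):
    """True if no program is a prefix of a different program."""
    return not any(a != b and b.startswith(a) for a in programs for b in programs)


def read_program(bits, programs):
    """Consume bits from the front until they spell a valid program.

    Returns (program, remaining_bits), or (None, bits) if no valid program
    is found. For a prefix-free set, at most one prefix of the input can be
    a valid program, so the first hit is the only possible one.
    """
    for end in range(1, len(bits) + 1):
        if bits[:end] in programs:
            return bits[:end], bits[end:]
    return None, bits


# Hypothetical set of valid programs, chosen only for this example.
PROGRAMS = {"0", "10", "110", "111"}

print(is_prefix_free(PROGRAMS))           # True
print(is_prefix_free({"0", "01"}))        # False: "0" is a prefix of "01"
print(read_program("1100101", PROGRAMS))  # ('110', '0101')
```

Note that `read_program` never needs to look ahead: once the bits read so far form a valid program, the machine 'knows' the input is complete.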
Let now U be such a universal Chaitin machine. We write the action of U on a string p, the program, as U(p) (a program, here, comes equipped with all its input). If U(p) is defined, the result will be another string, the output x, and we write U(p) = x. We denote the length of a string with vertical bars, such that |x| = n means that the string x is n bits long.
This is enough already to define Kolmogorov complexity formally. The Kolmogorov complexity K_U(x) of a given string x relative to a given universal Chaitin machine U is:

K_U(x) = min {|p|: U(p) = x},

which is nothing but the length of the shortest (minimal) program p that, if executed on U, produces x as an output.
Thus, Kolmogorov complexity measures the power of a given Chaitin machine to compress a certain string. This has an intuitive connection with the notion of information content: the more information a string x contains, the harder it should be to compress it, as there is less redundancy to be exploited for the compression process. There are also obvious connections to a notion of 'orderliness': very highly ordered strings can be very highly compressed, while strings showing little or no order are barely compressible.
We've already met the precursor to these ideas, when I related the anecdote of Leibniz and the ink blots, back in the first post: a collection of ink blots on a piece of paper can be said to follow a law if there exists a simple mathematical description for their distribution; otherwise, it must be judged random.
Kolmogorov complexity can be used to formalize this: a string is random if there exists no program of significantly smaller length producing this string, i.e. if K_U(x) ≈ |x|.
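One can get a rough feel for this with an ordinary compressor. The sketch below uses Python's zlib as a stand-in, which is an assumption of the example and not part of the theory: a compressed length only gives an upper bound on the complexity (up to the constant size of the decompressor), never the true value of K.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


# A highly ordered string: the two characters "01" repeated 5000 times.
ordered = b"01" * 5000
# A string with no exploitable order: 10000 bytes from the OS random source.
noisy = os.urandom(10000)

print(len(ordered), compressed_length(ordered))  # shrinks to a tiny fraction
print(len(noisy), compressed_length(noisy))      # stays at about the original size
```

The ordered string is described by something like 'repeat 01 five thousand times', and the compressor finds this; for the noisy string, there is nothing to find.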
But there's still the issue to consider that as defined up to now, Kolmogorov complexity only considers one specific universal machine as reference -- so it seems like it might well be that results obtained for one machine U might differ greatly from results obtained for a different machine V!
Luckily, however, that turns out not to be so. A celebrated result in AIT is the so-called invariance theorem, which roughly states that no matter what machine you choose to concretely evaluate the Kolmogorov complexity, the obtained results will hold for all possible machines -- at least qualitatively.
Concretely, by universality, we know that any given universal machine can emulate any other. In our concrete formalization, there exists a string v, such that the machine U, given v and a program, will execute the same computation as V, if V is given that program. Formally: if V(p) = x, then there exists v such that U(vp) = x (here vp simply means the concatenation of the strings v and p; note that this entails that v alone is not a valid string for U). But this means that Kolmogorov complexities relative to U and V can differ from one another at most by the length of the 'simulating' string v (there are typically many such strings; we'll just take the shortest)! Formally, there is a constant c = |v| such that: K_U(x) ≤ K_V(x) + c. Often, results in AIT will only depend on the asymptotic behaviour, rather than the concrete value, of K; thus, we can ignore the constant, which is always small in the asymptote, and write independently of any machine implementation K(x) for the Kolmogorov complexity of some string x.
In any case, the exact value of K(x) is in general rather immaterial, as K(x) is not a computable function. To see this, imagine you have a machine that, upon being given a string x, outputs K(x). Now, we can write a program that takes as input a natural number n, and produces a string with complexity greater than n, simply by trying every string in succession until it finds one. This program has a constant length c, plus the length of its input, |n|. So, we have a program of length c + |n|, which outputs a string of complexity -- as stipulated -- greater than n. But this is a contradiction, since evidently, there exists a value for n such that this program is shorter than the shortest one possible, which is more than n bits long, since n grows much faster than |n|!
Or in other words, if K(x) were computable, we could find a string of complexity n -- i.e. a string for which the shortest program that computes it is of n bits length --, and create a program that computes this string, whose length is |n| + c < n bits -- shorter than the stipulated shortest possible program -- an obvious contradiction. Hence, K(x) can't be computable.
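To see how quickly n outruns |n|, here is a tiny calculation. The value c = 1000 is an arbitrary assumption standing in for the length of the search program, and |n| is taken to be the number of binary digits of n.

```python
def first_contradicting_n(c):
    """Smallest n for which a program of c + |n| bits is shorter than n bits."""
    n = 1
    while c + n.bit_length() >= n:
        n += 1
    return n


print(first_contradicting_n(1000))  # 1011
```

So even with a thousand bits of overhead, the contradiction already bites at n = 1011: the search program would be 1010 bits long, yet output a string that supposedly needs more than 1011.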
This is related to Berry's paradox: n is the smallest number not definable in under sixty-six symbols. Yet it's just been defined using only sixty-five (counting blanks as a symbol)!

The Number of Wisdom
Let us now focus our attention on a curious object, constructed as follows: for some universal machine, if p denotes a program that eventually terminates and produces an output (rather than, say, loop forever), form 2^-|p|; then, add up all these factors for all halting programs. 2^-|p| is the probability of hitting on the program p randomly, through a series of coin tosses (recall that, since we're working with self-delimiting programs, the coin tossing sequence automatically stops when you find a valid program); the sum of all these probabilities is thus the probability that a given machine, if fed a random program, halts. Thus, it is fittingly known as the halting probability; other names include Chaitin's constant (though since every machine has its own unique halting probability, calling it a 'constant' is a bit of a misnomer), or simply Ω. Formally:

Ω = Σ2^-|p|,

where the sum is taken over all halting programs.
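The coin-tossing reading of this sum can be tried out on a toy machine. Everything about the machine below is an assumption made for illustration: its valid programs are some number k of 1s followed by a 0, and it is simply declared to halt exactly when k is even. Such a machine is of course nowhere near universal, and its halting probability is perfectly computable (it is 2/3); the sketch only shows that tossing coins and summing 2^-|p| give the same number.

```python
import random
from fractions import Fraction


def toy_halts(program):
    """Toy rule (an assumption): '1'*k + '0' halts exactly when k is even."""
    k = len(program) - 1
    return k % 2 == 0


def random_program(rng):
    """Toss coins until the bits so far form a valid program (i.e. end in '0')."""
    bits = []
    while True:
        bit = rng.choice("01")
        bits.append(bit)
        if bit == "0":
            return "".join(bits)


def estimate_halting_probability(samples, seed=0):
    """Fraction of randomly tossed programs that halt on the toy machine."""
    rng = random.Random(seed)
    hits = sum(toy_halts(random_program(rng)) for _ in range(samples))
    return hits / samples


def partial_sum(max_k):
    """Exact sum of 2^-|p| over halting toy programs with at most max_k ones."""
    return sum(
        (Fraction(1, 2 ** (k + 1)) for k in range(max_k + 1) if k % 2 == 0),
        Fraction(0),
    )


print(partial_sum(4))                        # 21/32
print(estimate_halting_probability(100000))  # close to 2/3
```

For a genuinely universal machine, the function corresponding to `toy_halts` cannot be written, which is where all the interesting properties come from.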
This might seem at first to be a rather baroque construction of dubious significance, but it has some quite remarkable properties:

It is algorithmically random -- meaning that the Kolmogorov complexity of the first n bits of its binary expansion is at least n - c, i.e. it is asymptotically incompressible -- no program significantly shorter than n bits can compute significantly more than its first n bits
It is, however, left-computably enumerable (left-c.e.), i.e. there exists a program enumerating the set of rational numbers q where q ≤ Ω
Knowledge of the first n bits of its binary expansion enables deciding the halting problem for all programs up to n bits in size

Also, it can be shown (see here) that all random left-c.e. real numbers are halting probabilities.
It follows immediately from these points that there exists no axiom system that can derive the value of more than finitely many bits of Ω's binary expansion -- since if there were, such a system could be automated in the form of a Turing machine, which then would be able to compute Ω with arbitrary precision.
This is actually a surprisingly deep statement -- it is formally equivalent to the better known concept of Gödel incompleteness, which roughly states that for any axiomatic system S of sufficient complexity, there exists a sentence G, expressible in the system, which asserts 'G can't be proven in S', leading to an obvious paradox: if G can be proven in S, then S is inconsistent, since it proves a falsehood; if G can't be proven in S, then it is incomplete, since there are truths that it is incapable of proving.
The existence of bits in the binary expansion of Ω whose value a given axiomatic system can't determine similarly exhibits that system's incompleteness; this result, in slightly different form, is consequently known as Chaitin's incompleteness theorem.
This points towards a connection between randomness and incompleteness: if there were a system that could derive arbitrarily many bits of Ω, or equivalently, a machine that could compute Ω to arbitrary accuracy, then Ω wouldn't be random -- since the Kolmogorov complexity of its bits would then be bounded by the number of bits needed to specify the system.
One should take note of the fact that both incompleteness results crucially depend on self-reference: the Gödel sentence is essentially a formalization of the liar paradox, i.e. the impossibility to assign a consistent truth value to the sentence 'this sentence is false' -- if it is false, it is true, however, if it is true, it is false --, while Chaitin incompleteness depends on the uncomputability of Kolmogorov complexity, which can be traced back to the aforementioned Berry paradox.
Alternatively, it can be seen to follow directly from the fact that if there were an axiomatic system able to enumerate Ω to arbitrary accuracy, this system could be used to solve the halting problem -- which of course is known to be unsolvable.
Which is a shame, really -- and here we get to the reason why Ω is also sometimes called 'the number of wisdom' --, because if one could solve the halting problem, the solutions to lots of mathematics' most famous problems would instantly become trivial. Take Goldbach's conjecture: Every even integer greater than two can be expressed as a sum of two primes.
We could imagine writing a simple program that, for every even integer greater than two in turn, checks whether or not it is expressible as a sum of two primes -- by simply doing a brute-force test, i.e. trying all pairs of primes less than that number --, and stops as soon as it encounters one that is not. Clearly, if Goldbach's conjecture is false, it will eventually find such a number, and then terminate. However, if it is true, then it will never halt, but continue searching forever.
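Such a Goldbach checker really is simple. Here is a minimal Python sketch; the function names are mine, and the search is given an upper limit so that the example terminates, whereas the program the argument needs would run without any such bound.

```python
def is_prime(n):
    """Trial-division primality test; slow, but fine for an illustration."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_pair(n):
    """Return primes (p, q) with p + q == n, or None if there is no such pair."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None


def search_counterexample(limit):
    """Check even numbers 4, 6, 8, ... up to limit; return the first failure.

    The bound is an assumption added so the example stops. Remove it, and the
    loop halts if and only if Goldbach's conjecture is false.
    """
    n = 4
    while n <= limit:
        if goldbach_pair(n) is None:
            return n
        n += 2
    return None


print(goldbach_pair(10))            # (3, 7)
print(search_counterexample(2000))  # None: no counterexample this small
```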
Now, if one had a way to solve the halting problem -- i.e. for any arbitrary program, find out whether or not it will eventually halt --, then one could just write the program that checks for Goldbach numbers, run it through the halting checker, and instantly know the truth (or falsity) of Goldbach's conjecture: if the halting checker proclaims that the Goldbach checker halts, then Goldbach's conjecture is false -- there exists an even number greater than two such that there is no way to express it as the sum of two primes. However, if the halting checker judges the Goldbach checker to be non-halting, then Goldbach's conjecture is true.
If this has a sort of eerie feel to it, it's well justified: the program that checks whether or not there is a counterexample to Goldbach's conjecture is never run, yet we know what would have happened if it had been run. This is a strange way of coming into knowledge -- it seems as if information had been created out of nothing.
The most curious property of Chaitin constants then is that knowing Ω entails exactly the ability to do the above -- a halting probability serves as an oracle for the halting problem. How is that possible?
Well, assume you knew the first n bits of some machine's halting probability. Then, on that machine, you run all possible programs in an interleaved fashion -- a few steps of each at a time, since running them one after the other would get stuck on the first program that never halts -- and record those that halt. For each halting program p, you add the term 2^-|p| to a running total. Once your total reaches the number corresponding to the first n bits of Ω, you know that the programs of up to n bits length that haven't halted up to now never will halt, as each of them would add a contribution of at least 2^-n to the total, which would make it exceed Ω.
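The procedure can be played through on a toy machine. As a loud caveat: the machine below is made up for this example and is not universal. Its programs are k ones followed by a zero, and it is declared to halt after k² + 1 steps if k is even and never if k is odd, so its halting probability is exactly 2/3 and we can write down its first n bits directly. For a real Ω, those bits are precisely what nobody can compute.

```python
from fractions import Fraction

OMEGA = Fraction(2, 3)  # halting probability of the toy machine below


def halts_within(k, steps):
    """Toy rule (an assumption): '1'*k + '0' halts after k*k + 1 steps iff k is even."""
    return k % 2 == 0 and k * k + 1 <= steps


def omega_lower(t):
    """Stage t of the interleaved run: programs k < t, each run for t steps."""
    return sum(
        (Fraction(1, 2 ** (k + 1)) for k in range(t) if halts_within(k, t)),
        Fraction(0),
    )


def first_bits(omega, n):
    """The number given by the first n bits of omega's binary expansion."""
    return Fraction((omega.numerator * 2 ** n) // omega.denominator, 2 ** n)


def decide_halting(omega_n, n):
    """Use the first n bits of omega to decide halting for programs up to n bits."""
    t = 1
    while omega_lower(t) < omega_n:
        t += 1
    # Any program of at most n bits that has not halted by stage t never will:
    # it would add at least 2^-n and push the total past omega.
    return {"1" * k + "0": halts_within(k, t) for k in range(n)}


print(first_bits(OMEGA, 5))                     # 21/32
print(decide_halting(first_bits(OMEGA, 5), 5))
```

The verdicts agree with the rule built into the toy machine, but `decide_halting` never consults that rule directly; it only watches which programs have halted by the time the running total catches up with the known bits.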

The Takeaway
In this post, we have made contact with a number of results that we will meet again in the (hopefully) not too distant future. First, we have found a formalization of the notion of randomness, and a connection between randomness and the notions of incompleteness and uncomputability. Secondly, and more specifically, we have learned that a certain construction of a random number, Ω, can, due to incompleteness, be known to only finite accuracy by computable means. Thirdly, all (computably enumerable) random real numbers are Ω-numbers. And finally, knowing Ω to arbitrary precision entails being able to solve the halting problem.
These results interlock in interesting ways: if Ω were computable, it wouldn't be random; if the halting problem were solvable, Ω would be computable; if there were no incompleteness, the halting problem would be solvable, etc. Somewhere at the heart of all this, lies the concept of self-reference -- though to bring it out clearly into the open will require some more work.

The Universal Universe, Part IV: Computational Complexity

2011-09-03T02:28:00.000-07:00

Let me start this post on a personal note and apologize for the unannounced break -- I was on vacation in Norway, and, due to poor planning on my part, had an exam right afterwards, which kept me from attending to the blog as I planned.
It all turned out for the best in the end, though, since during my stay in Norway, I stumbled across this great essay by Scott Aaronson (discussed on his blog here), which alerted me to something I haven't paid a lot of attention to, but should have -- the field of computational complexity, and especially its implications for philosophy. Having now had some time to digest the contents, I think there are some important issues for me to think, and write, about.
Of course, I was aware of the field of computational complexity and its basic notions prior to reading that essay, but thought of it as in some way concerning 'merely' problems of practicality -- making the classic philosopher's error (not that I'm a philosopher, but this apparently doesn't keep me from committing their errors) of being content with showing something 'in principle' possible, ignoring the issues involved in making it possible 'in practice' as mere technicality.
Aaronson, now, has taken up the challenge of breaking this unfortunate habit, and does so with great clarity, showing that often enough, thinking about merely what is possible in principle, or, in more appropriate terms, what is computable, without thinking about the difficulty involved in actually undertaking that computation, misses most of the good stuff. One particularly interesting argument is his resolution of the so-called 'waterfall problem', which essentially poses that any physical process can be interpreted as implementing any computation, making thus the proposal that sentience can come about merely through computation somewhat suspect -- apparently forcing us to conclude that if there is a computation that gives rise to sentience, every (or nearly every) physical system can be viewed as implementing that computation, and hence, giving rise to sentience.

But first, let's have a little look at the fundamentals of computational complexity. Basically, computational complexity concerns itself with the question of how difficult it is for a computer to solve a given problem. To that end, one takes a measure of the size of some instance of the problem, and investigates how the resources the computer needs to solve said problem scale with this size.
For instance, if the problem is multiplying two numbers of n bits length, a 'naive' algorithm takes roughly n² steps; however, if the problem is conversely factoring a number of length n bits, i.e. finding those prime numbers that if multiplied yield said number, a similarly naive algorithm takes 2ⁿ steps to completion, which is hugely less efficient (in both cases, there exist more efficient algorithms, but the qualitative difference remains so far).
One should not underestimate the magnitude of the difference between the two examples: for just a 32-bit input, the multiplication problem can be solved in roughly 1000 steps, while factoring needs a whopping 4.3 billion steps -- depending on input size and the speed of the computational hardware you have access to, this distinction between polynomial and exponential dependence on problem size may well equate to the distinction of a few seconds or minutes of computation versus a time greater than the age of the universe (of course, to a modern computer, 4.3 billion steps isn't actually all that much).
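The two step counts can be checked by simply counting. The sketch below is one way of making 'naive' precise, and the choice is mine: schoolbook multiplication looks at every pair of bits, and naive factoring tries every candidate divisor in turn. To keep the run short, the counting is done for 16-bit numbers; 65521 is used because it is the largest prime below 2^16, i.e. a worst case for the divisor search.

```python
def schoolbook_multiply(a, b, n):
    """Multiply two n-bit numbers bit pair by bit pair; return (product, steps)."""
    steps = 0
    product = 0
    for i in range(n):
        for j in range(n):
            steps += 1
            if (a >> i) & 1 and (b >> j) & 1:
                product += 1 << (i + j)
    return product, steps


def naive_factor(number):
    """Try every candidate divisor from 2 upwards; return (smallest factor, steps)."""
    steps = 0
    for d in range(2, number):
        steps += 1
        if number % d == 0:
            return d, steps
    return number, steps  # no divisor found: the number is prime


print(schoolbook_multiply(40503, 65521, 16))  # (2653797063, 256)
print(naive_factor(65521))                    # (65521, 65519)
print(32 ** 2, 2 ** 32)                       # 1024 4294967296
```

For 16 bits, that is 256 steps against roughly 65 thousand; for 32 bits, the last line gives the 'roughly 1000' and '4.3 billion' quoted above.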
To sort problems according to their difficulty, complexity theorists have devised a number of complexity classes, whose ordering principle is that two problems lie in the same complexity class provided their solution is (asymptotically) similarly resource-intensive. One of the most well-known such classes is called simply P, the class of problems to which a solution can be found in 'polynomial time' -- i.e. for which the number of steps it takes to find a solution scales as some power of the problem size; another is called NP, which, contrary to what is occasionally claimed, does not mean 'non-polynomial', but rather, 'nondeterministic polynomial', and roughly comprises those problems for which it takes polynomial time to check whether a proposed solution is correct. Thus, multiplication lies in P, factoring in NP (since in order to check whether a proposed factorization is correct, one has to multiply the factors). The question of whether or not P equals NP is one of the central open questions in complexity, with strong evidence -- but no conclusive proof -- pointing towards the case of both being unequal, i.e. to the existence of problems whose solutions can be checked, but not found, efficiently. The importance that is placed on this problem can be gauged by the fact that it is one of only seven 'Millennium Problems', the solution of which carries a prize of US$ 1,000,000. (If you'd like to know more about complexity and complexity classes, you could do worse than peruse the Complexity Zoo, created and curated by the very same Scott Aaronson.)

The Meaning of Computations
Now, what good does that do us?
There's one problem in particular that I have previously had only a somewhat muddy response to that really benefits from Aaronson's discussion, which is loosely the question: given a certain computation, what does this computation mean?
At first, this hardly seems like a deep question: computations that we are used to present their meaning to us in a straightforward way, such as in the form of a printout, or display on a screen. But remember the lessons of computational universality: any universal machine can implement any other, so there is a way of translating between the computation done by one machine and the computation done by another, such that each computation of one gets mapped to a computation by the other. The display on the screen translates our computer's computation into something a human brain can understand; however, other translations are possible.
We've met this ambiguity before, in the previous post in this series, in Deutsch's argument regarding the alleged impossibility of finding any true explanation for the nature of our world -- for any theory of everything one might come up with, many others can be found, making reference to different fundamental entities and interactions, yet being just as phenomenologically adequate. We'd be in the position of entities in a computer simulation trying to figure out the hardware and software their universe runs on -- but their universe might be written in LISP and run on a Mac just as well as it could be a Perl program executed on a Windows PC, with no way for them to tell.
Here, we're faced, in a sense, with the flipside of this issue: just as there is ambiguity in what computation realizes a given physical system or process, there is a similar ambiguity in the computation that a physical system carries out.
To my knowledge, the argument was first raised by Hilary Putnam, who formulated the following theorem:

Every physical system implements every finite state automaton (FSA).

Here, a finite state automaton is an abstract model of computation, similar to, but strictly weaker than, a Turing machine: as the name implies, it can be in one of finitely many states; thus, it in a sense works with finite resources, as opposed to the Turing machine's infinite memory tape. Strictly speaking, all physical computers are finite automata, since there's no such thing as infinite memory; they only acquire the characteristics of a Turing machine in the limit, if you imagine continuously adding more memory to them.
It's perhaps not obvious upfront that the above theorem is true. But consider the evolution of, say, a waterfall. At the start, the waterfall may be in one of finitely many states s_i, and after some time has elapsed, it will have evolved to a state t_i. The waterfall's evolution thus constitutes a function from initial to final states f(s_i) = t_i, and, if we accept the laws of physics to be reversible, that function will associate each initial with exactly one final state. But what we take these final states to mean is arbitrary; we could, for instance, set up an interpretation such that the initial states represent natural numbers in some ordering, and the final states represent natural numbers in another -- meaning that we can take the evolution of the waterfall to implement any function from natural numbers to natural numbers (or, more accurately, from a finite subset to another finite subset thereof). The same goes, for example, for strings of bits -- and computing functions from strings of bits to strings of bits is ultimately all our computers do, as well. So the waterfall's evolution can be viewed to implement any computation our computers can execute.
This, or so the argument goes, deals a deadly blow to computational theories of mind: for, if there exists a computation that gives rise to a mind, then every physical system -- provided it possesses sufficiently many states -- can be thought of as executing that computation; thus, either all physical systems must be conscious -- moreover, must be conscious in all the ways it is possible to be conscious: for, if there is a computation that produces my conscious experience, and there is a computation that produces your conscious experience, or anybody else's, then a system can be seen to implement all those possibilities --, or, there can be no computation that gives rise to a mind. In other words, computation is purely syntactic -- works just at the level of symbols -- but leaves arbitrary the semantics, the meanings of those symbols; yet those meanings are what's central to consciousness and subjective experience and all that.
This would seem to be a quite devastating conclusion if you're a computationalist -- like myself, as is obvious from past postings -- but Aaronson sees the crux of this argument in the complexity of the implementation of interpreting a physical system as implementing a given computation.
To see this, we first need to introduce the concept of a reduction. In complexity theory, a problem is said to reduce to another problem roughly if, given the solution to that second problem, the first problem's solution becomes simpler. Thus, if you had in hand a list of solutions to the second problem, or a device that readily provides these solutions (called an oracle), the complexity of solving the first problem becomes greatly reduced.
In the present case, the waterfall can be viewed as such an oracle -- it provides solutions to a certain kind of problem (the evolution of a waterfall), which then are used to solve a totally different kind of problem (for definiteness, let's say that problem is coming up with a good chess move). Now, the crucial point is that the reduction is useful just in the case that appealing to the oracle confers an advantage -- if the asymptotic complexity of the problem remains the same, then one might just as well do without the oracle. The algorithm that 'interprets' the oracle's output -- the waterfall's final state -- as a solution to the problem of finding a good chess move does the same work an algorithm that just computes that chess move would -- the waterfall doesn't really share any of the computational load. However, it's clear that for certain other problems -- say, the problem of simulating a waterfall -- the waterfall-oracle is extremely useful: the computational task, quite difficult without aid, becomes trivial.
So there are computations that a physical system implements in a more natural way than others, and for some -- one might conjecture the majority -- it will be no help at all; the work needed to interpret the system as implementing that computation is the same as just doing the computation itself.
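The relabelling trick, and the catch, can be put into a few lines of Python. All names here are hypothetical: `waterfall_step` is a made-up reversible evolution on sixteen states, and `target` is an arbitrary stand-in for the computation we care about (think: picking a chess move).

```python
NUM_STATES = 16


def waterfall_step(state):
    """Made-up 'physics': a fixed one-to-one map of the 16 states onto themselves."""
    return (7 * state + 3) % NUM_STATES


def target(x):
    """Arbitrary stand-in for the computation we want to 'find' in the waterfall."""
    return (x * x + 1) % 5


def build_interpretation(inputs):
    """Label initial states with inputs and final states with the desired outputs."""
    encode, decode, target_calls = {}, {}, 0
    for state, x in enumerate(inputs):
        encode[x] = state
        decode[waterfall_step(state)] = target(x)  # the real work happens here
        target_calls += 1
    return encode, decode, target_calls


def compute_via_waterfall(x, encode, decode):
    """'Run' the waterfall on the encoded input and read off the labelled result."""
    return decode[waterfall_step(encode[x])]


inputs = list(range(NUM_STATES))
encode, decode, target_calls = build_interpretation(inputs)

print(all(compute_via_waterfall(x, encode, decode) == target(x) for x in inputs))  # True
print(target_calls)  # 16
```

Under this labelling the waterfall 'computes' `target` flawlessly, but setting up the labelling took one evaluation of `target` per input. Swap in any other function, and the same waterfall computes that instead, with the interpretation again doing all of the work.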
An analogy may help to fully drive this point home. Consider a water-driven flour mill. At its heart, there is a simple apparatus that translates the energy imparted by, say, a waterfall driving a mill-wheel, into the grinding action of two stones versus one another, in order to grind the grain down to flour. Now, a miller, intimately familiar with this construction, comes -- say, through inheritance -- into possession of a plot of land. Naturally, he will want to build a mill on his new land. However, there is no waterfall available anywhere on it to drive his mill!
But, the miller is a clever sort, so, single-handedly advancing the state of technology, he invents an ingenious motor-driven pump system capable of pumping water up to a certain height, from which it is then allowed to fall, driving the mill wheel to grind the grain; at the bottom, the water is once more pumped up, closing the circuit (of course, this will require additional fuel for the motor; on this blog, we obey the laws of thermodynamics).
Of course, the ludicrousness of this scheme is apparent -- instead of using his motor to pump water, he could just have directly used it to drive the mill wheel, saving himself a lot of trouble. The waterfall drives the mill only apparently -- it could be done without entirely, just as the waterfall can be done without whenever a chess program computes chess moves as efficiently on its own as the program that interprets the waterfall's evolution as a chess-move computation does.
The conclusion is then that if the waterfall does not yield a reduction in the complexity of the problem of chess-move computation, then it can't be said to implement this computation. There thus are computations that it is natural to associate with a physical system -- the system's own evolution being the prime example --, while for the vast majority of other computations, a system can't be meaningfully said to implement them, as they are as efficiently (or inefficiently, as the case may be) possible without the system's aid at all.
Thus, most physical systems won't implement computations giving rise to consciousness after all -- saving us from having to apologize to every stone we stub our toes on, adding insult to injury.

Reasonable and Unreasonable Explanations
Having now a bit of a handle on what one might mean by the 'meaning' of a computation, we can extend the reasoning to further ameliorate the problem raised by Deutsch of finding a good explanation of the fundamental nature of the world in a computational universe.
Again, the argument goes something like: if we live in a universe that is in some sense equal to a computer executing a computation, then it is impossible for us to deduce the precise nature of this computation -- every computational theory we might come up with is in principle (but see above!) equally suited to doing so.
In the last article in this series, I have argued that this problem can be addressed by considering only theories that are 'well-suited' to human processing -- Newtonian gravity thus yielding a good explanation of planetary orbits, while electrodynamics doesn't, even though it is in principle possible to use it to compute them -- if through no other way than calculating the evolution of a computer, which essentially is nothing but an evolving electromagnetic field, that is tasked with computing orbital trajectories (alternatively, one might consider using the equations of hydrodynamics to calculate electromagnetic fields, by calculating the evolution of a waterfall implementing the necessary computation).
With the tools of computational complexity in hand, this reasoning can be made much more precise. Now, we can say that a theory of some physical phenomenon is only a good theory if it facilitates efficient computation of this phenomenon; in particular, a reduction of a phenomenon to some known theory is only practical if it actually lessens the computational load -- thus, hydrodynamics is not a good theory of electromagnetism, since there exists a theory (Maxwell's electrodynamics) able to implement the same calculations at less computational cost. Reducing electrodynamics to hydrodynamics is thus as pointless as reducing chess-move computations to waterfalls -- all the work is done by the reduction itself, so to speak.
Thus, contrary to Deutsch, we can find theories able to describe the world that are 'more natural' than others, at the very least. This does not address the question of 'what's really out there', perhaps: the universe might be 'really' something quite different from the mathematics we use to describe it. But I'm not sure there is a way the universe really is any more than there is a path the photon really takes in a double-slit experiment. Perhaps the fact that there is a most efficient, i.e. least (or at least less) complex, way to describe it, is all we really get -- and need: just as the evolution of the waterfall being most efficiently seen as a computation of the waterfall itself justifies seeing the waterfall as a waterfall, rather than a chess-move computing system, or a conscious mind, maybe the computation running 'beneath' the observable reality of the universe being most efficiently seen as the evolution of the universe justifies seeing this computation as the universe. In this view, the computation would be primary, and complexity considerations would justify picking out one 'meaning' of this computation (or a small class of possibly related ones, perhaps).

Computing Universes
There's yet another problem that might be helped by these insights. In recent years, a few proposals have been put forward for what are sometimes called 'ensemble theories of everything', or 'ETOE's (some examples are due to Jürgen Schmidhuber, Russell K. Standish, and Max Tegmark). The basic idea of most ETOEs is that it is easier, in a well-defined way, to compute/generate all possible universes, than it is to just compute one -- the reason for this is that essentially it's easy to write a short program that produces all possible outputs, but producing any given output may be hard -- indeed, most possible outputs don't have a short program describing them (this ties in with a concept known as Kolmogorov complexity, about which much more in the next post).
So what you can do is to set up a computer that first generates, then executes all possible programs in a dovetailing fashion (i.e. something like: perform the first step of the first program; then, perform the first step of the second program, and the second step of the first program; then, perform the first step of the third, the second step of the second, and the third step of the first program; and so on). Eventually, any possible program will be generated, and run -- thus, if, say, there is a program computing our universe, this program, too, will be generated and run.
For definiteness, let's fix a Turing machine taking as input binary strings, and outputting binary strings as well. It's fed with a master program, whose task it is to generate every possible binary sequence in some order, interpret each such sequence as a program, and execute it in the dovetailed manner outlined above. Sooner or later, one of its computations will correspond to our universe -- we'd have to wait a bit, sure, but it saves a whole lot of effort in creating universes.
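The dovetailed schedule described above can be made concrete with a small Python sketch. It is only an illustration under loud assumptions: the 'programs' are plain binary strings, and toy_run is a hypothetical stand-in for an interpreter that merely labels each step instead of executing anything. Only the scheduling order is meant to match the text.

```python
from itertools import count, islice, product

_DONE = object()  # sentinel marking a program that has halted


def binary_strings():
    """Yield every finite binary string in order: '0', '1', '00', '01', ..."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def toy_run(program):
    """Hypothetical stand-in for executing `program`.

    Yields one (program, step_number) label per step and never halts.
    """
    for step in count(1):
        yield (program, step)


def dovetail(programs, run):
    """Yield executed steps in dovetailed order.

    In round k the k-th program is started and takes its first step, the
    (k-1)-th program takes its second step, and so on down to the first
    program, which takes its k-th step.
    """
    running = []  # one step-generator per program started so far
    for program in programs:
        running.append(run(program))
        for runner in reversed(running):
            step = next(runner, _DONE)
            if step is not _DONE:  # halted programs are simply skipped
                yield step


if __name__ == "__main__":
    for program, step in islice(dovetail(binary_strings(), toy_run), 6):
        print(f"program {program!r}: step {step}")
```

Because every program that is ever generated receives one more step in every later round, no single non-halting program can block the others, which is the whole point of the scheme.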
But wait a moment -- how do we tell which of those computations corresponds to our universe? Sure, we could fix some interpretation -- some way to look at the output generated by all the programs --, and under this interpretation (say, a translation of the binary strings to output on a screen in some way), eventually, we'll find our universe within the outputs. But, this interpretation is completely arbitrary! In fact, we can find an interpretation such that any output conforms to our universe. Worse, in the absence of an interpretation, everything we have really are just binary strings -- Turing machines don't output universes, they output symbols! Thus, we seem to be dependent on somebody or something interpreting the outputs in some certain, arbitrary way in order for this scheme to give rise to our universe, or in fact to any universe at all. I've called this problem in the past the 'Cartesian computer', because of its resemblance to Dennett's Cartesian theater (see this previous post): some meaning-giving entity must be invoked that 'looks at' the computer's output, and recognizes it as our universe. This is a deeply unsatisfactory situation, in my opinion.
But, knowing what we know now, we can point to a possible resolution of this problem: for most (probably almost all) possible interpretations, the computational work needed to implement them will be intractable; contrariwise, natural interpretations exist that correspond to true reductions of the task of interpreting -- to a certain extent, then, this removes the arbitrariness of the interpretation, and helps fix one computation, or a small class of related computations, as 'computing our universe'.

Consciousness Explained Anyway

2011-07-26T10:11:00.000-07:00

Today, we are going on a slight diversion from the course of this blog so far, in order for me to write down some thoughts on the nature of human consciousness that have been rattling around in my head.
In a way, this is very apropos to the overarching theme of computationalism (which I personally take to be the stance that all of reality can be explained in computable terms, a subset of physicalism) that has pervaded the posts so far (and will continue to do so), because the idea that consciousness can't be reduced to 'mere computation' is often central to supposed rebuttals.
In another way, though, consciousness is far too high-level a property to properly concern ourselves with right now; nevertheless, I wanted to write these things down, in part just to clear my head.
My thoughts on consciousness basically echo those of the American philosopher Daniel Dennett, as laid out in his seminal work Consciousness Explained. However, while what Dennett laid out should perhaps most appropriately be called a theory of mental content (called the Multiple Drafts Model), I will in this (comparatively...) short posting merely attempt to answer one question, which, however, seems to me the defining one: How does subjective experience arise from non-subjective fundamental processes (neuron firings, etc.)? How can the impression of having a point of view -- of being something, someone with a point of view -- come about?

On this question, there seem to be two intuitions, both very reasonably motivated on their own terms: 1) it's simple, and 2) it's impossible. I'll discuss two representative thought experiments, one motivating each point of view.
The first is what I call the 'brain of Theseus'-experiment (after the well-known 'ship of Theseus' paradox): Imagine one single neuron. Its functional characteristics consist essentially of a list of conditions under which it fires. This can be easily modelled artificially, with a little electronic circuit, or a computer program. Suppose one can match the characteristics of the neuron exactly. Now, in such a way as to make the transition appear seamless, the 'fake' neuron is substituted for a 'real' neuron in a living brain. None of the surrounding neurons notices anything: they receive the same firings when the same conditions are met as before. Thus, there is no difference to the brain as a whole. So, let's go on: replace a second neuron. And a third. And so on. If the previous considerations were correct, at no point should the brain notice any change -- everything continues to work the way it always did.
Now, imagine at some point, one hemisphere of the brain has been entirely replaced. The other, still, won't notice anything off -- any signals it sends across the corpus callosum elicit the same responses they would if there were still a 'real' hemisphere there, instead of a 'fake' one.
Then imagine completing the transition from 'real' to 'fake' brain. If the first exchanged neuron did not make a difference, and neither did the second, and so on, then it's hard to avoid the conclusion that the new, 'fake' brain will still work the exact same way the original one did -- if the brain with one neuron replaced still thought, experienced, felt and behaved like before, and so did the brain with two neurons replaced, etc., then the new, fake machine-brain will do, as well.
Since the new machine-brain is essentially nothing but a complicated computer, consciousness and sentience can thus be generated by a computational structure, and could, for instance, be simulated on a computer.
The other thought experiment, which is in some sense the exact opposite to the preceding one, is known as the 'zombie argument'. I'll give a slightly modified form to better fit the context of computation. A philosophical zombie, somewhat removed from the brain-gluttons of horror movie lore, is a being that is, in its actions and behaviour, indistinguishable from any ordinary human, but lacks any sort of consciousness, or subjective experience. That is, if the zombie is subject to certain stimuli -- such as, for instance, being poked with a sharp stick --, he will react in exactly the same way as a human would -- recoiling from the offending stimulus, uttering a cry of pain, and possibly a choice selection of profanities directed at whoever is at the other end of the stick. However, he would not feel the pain; he would not mean the profanities, at least not in the same way we do. His reaction will be entirely an 'automatic' response, triggered by the presence of a certain stimulus.
Ultimately, all human behaviours can be brought into this paradigm: under certain conditions, certain behaviours are produced. From this, one could abstract a computational model, which would react just like a real person, while lacking all the rich inner life that ultimately makes us human. Moreover, one could continue refining this model: add some more fine-grained descriptions of the person's biology, chemistry etc. -- all of which are equally well just simple chains of certain conditions evoking certain responses, and are thus not going to trigger some sort of spontaneous 'awakening' to consciousness.
Indeed, the same continues to hold at ever more fine-grained levels of simulation, up to cellular level, or even beyond. The zombie's neurons, after all, are just this: outputs, firings, evoked by certain conditions. This is essentially the same model we described in the previous thought experiment -- however, arrived at in this 'top-down' manner, the conclusion appears reversed! It seems that there is no way that this simple collection of rules for generating certain responses could have anything at all like what we call 'consciousness'.
We're faced with quite the dilemma: with seemingly equally good reasons, we have arrived at contradictory viewpoints. How can this be reconciled?

Zombots
First, we'll backtrack to the point where both thought experiments still agree -- which is, that it is possible to create a being, indistinguishable from a human in action and reaction, using computational means. Using this agreed-upon starting point, we will show that this is actually all that we need, thereby eliminating the apparent paradox.
For definiteness, let's imagine a machine running a 'human-simulation' in the form of a chat-window one can type into; call such a device a zombot for whimsy. The possibility of zombots will be taken as a given from now on.
As stipulated, such a zombot would pass the Turing test: i.e. to any human it converses with, it would seem indistinguishable from another human; likewise, it would pass any other test for 'consciousness' that can be administered in this way. But it would, of course, not be actually conscious.
More than that, though, it could also administer Turing tests, and as well as any human can -- i.e. it would be 'convinced' of its testee's consciousness whenever a human would be, too (though of course, not being conscious, it would only be convinced in the sense that it might print out the words 'Well, I'm convinced', or something similar -- it would not actually feel convinced, or be in some mental state of convincedness, neither having feelings nor mental states).
So, when a zombot is presented with another zombot to test, it would be just as 'convinced' of the zombot's consciousness as a human would be. We need not limit the zombots to text-based interaction, for this experiment; they may exchange data in whatever form suits a zombot best -- though of course, all the information they might exchange in other ways can, as we have already learned, be recast in the form of a question-answering process.
But now, what happens if we pull a trick on the zombot -- if we just wire its output to its input, directing its interrogation 'inwards', onto itself?
Well, by the reasoning above -- it would pronounce itself conscious!
Of course, that doesn't mean much -- actually, it doesn't mean anything, at least not to the zombot. All that's going to happen is that maybe a little lamp lights up somewhere to indicate 'test subject is conscious', or the zombot prints out words to that effect.
Nevertheless, in any interaction with the zombot, it would claim itself conscious -- and to the best of its ability to tell, that's nothing but the truth. Moreover, in any interactions the zombot might have with itself, it will insist on being conscious.

Evening Matinee in the Cartesian Theater
In order to properly gauge the significance (or lack thereof) of the conclusion we just arrived at, I'd like to step back for a moment to consider a different issue, which is how perception works -- or more precisely, how it doesn't work.
The intuitive picture most people have of their own perceptual process is that of somehow being presented with percepts, i.e. that objects of perception, or perceptual attention, are represented in the mind, for the self to behold. It's as if there were an inner stage (which Dennett calls the Cartesian theater), on which a play, 'inspired by actual events' -- those in the real world that are being perceived -- takes place.
This idea has a certain obviousness to it: the senses yield data, which is in some way prepared by the computational apparatus of our brain -- there's always some editing, details judged unimportant and cast to the cutting room floor, some embellishment of salient scenes, various other steps of post production (my brain, for instance, seems to like adding a soundtrack), and often quite a shocking amount of artistic license --, to then be perceived by the self.
But wait a minute -- this picture was supposed to explain perception (of the outside world by us), but now it turns out to crucially depend on perception (of the representation of the outside world by our selves). This is rather blatantly circular -- how is this next step of perception to work? Again by invoking some process of representation and perception, on a yet higher level? Or, if this second-level perception does not depend on such a scheme, why was it necessary in the first place -- if our selves can perceive the representation brought before them without recourse to another level of representation/perception, then why can't we, using the same process, perceive the outside world without creating a representation for our selves' benefit?
And indeed, the regress is vicious: as our perceptual act, in order to be completed, depends on the completion of our selves' perceptual act, so does the selves' perceptual act's completion depend on the completion of the analogous perception on the level above them, and so on. We have run headlong into the homunculus fallacy (where 'homunculus', i.e. little man, denotes the entity perceiving the representations in the mind's eye).
This, despite its obviousness, can actually be a very hard problem to spot, and an even harder one to get rid of. Most theories in which mental content is generated as a representation of the outside world, or a representation of some mental state, suffer from it in some form.
In order to rid ourselves of it, we will first consider the related, but simpler question of how vision works -- I distinguish vision here from perception, in the sense that the former entails no conscious awareness: vision is perception in the sense a TV camera, or a zombot, might possess it, capable of producing mechanical reactions to the object of perception, but not capable of giving rise to mental states having as their content said object.
The simplest idea of how vision might work is a sort of bottom-up process: out of the data provided by the retina, an image is built up, and each object in it is identified through some sort of pattern-matching (identification, again here of a totally non-conscious sort, being necessary to produce the proper reaction to the image's contents) -- perhaps a list of its visual properties is generated, and anything that fits this list is retrieved from some sort of memory.
This is a possible, but rather cumbersome process. 'Building up' an image in this way is going to be very demanding, computationally, and reaction to visual stimuli correspondingly will likely be rather slow. So, unless resources are of no importance -- and they pretty much always are, in nature as much as in science --, it seems unlikely that this process is what underlies vision.
In fact, nature chose a rather more elegant -- and surprisingly scientific -- scheme in order to bestow vision upon her creatures.
How does science come to know the world? Through formulating hypotheses, and subjecting them to empirical testing. Those found wanting are discarded; those continually in accord with the data are kept, on a provisional basis. This process ensures that knowledge can only ever grow, and ultimately converges onto a faithful picture of reality.
Vision works similarly, at least to a first approximation. An agent enters a scene with certain expectations of what he will see; these expectations are met with actual visual data, and either fulfilled, or renounced. Concretely, one may imagine a question-answering process, where the questions are formulated in such a way as to lead to quick dismissal of expectations. So, rather than going from the general to the specific, as one would normally do in a game of '20 questions', for a certain limited set of critical hypotheses, the specific will be preferred.
This has the advantage that one needs less information to decide a limited number of important hypotheses than one would ordinarily need if one followed the 'bottom-up' strategy. Going from the general to the specific, encircling the object being viewed, leads to results in a uniform way -- for all possible objects that can be identified by asking a roughly equal number of questions, i.e. that can be described with a similar amount of information, it takes roughly the same amount of time to identify them.
However, in reality, certain objects that might be in the field of view are far more important than others, and hence, their presence (or absence) needs to be recognized quickly -- one would want to know whether there's a tiger lurking somewhere in the scene as quickly as possible.
So, the strategy that is being taken is that the questions that are asked of the visual data are geared towards falsifying the hypothesis that there is a tiger somewhere in the field of view -- perhaps looking for a characteristic orange-black-orange pattern, which can be done relatively easily -- and if that hypothesis can't be dismissed easily, there's at least a chance of tiger, so it is probably a good strategy to flee.
This leads to false positives -- sometimes, we see things that aren't really there. But, evolutionarily speaking, that's not a bad thing -- better to flee from a non-existing tiger, than to miss one that actually is there!
Nowadays, however, this tendency towards false positives can be quite distracting -- it causes us to see faces, which is something you'd want to recognize exceptionally quickly (chances are, if you can see a face, the face's owner can see you -- always a potentially dangerous situation), nearly everywhere. Just take this little guy: :-). Objectively, that does not look very much like a face at all -- yet we have no trouble parsing it as such. The name for this effect is pareidolia, Greek for something like 'wrong image'.
We should take away from this excursion that seeing can be described as a question-answering process, in which the questions that are asked are in part determined by the expected answers.
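To illustrate the difference between the two questioning strategies, here is a toy Python sketch. Everything in it is a hypothetical assumption made for illustration: the catalogue of objects, the feature names, and the order of the questions. It is not a model of any real visual system; it merely shows how a single targeted question can decide the critical 'tiger' hypothesis faster than a general-to-specific identification, at the price of false positives.

```python
# Hypothetical catalogue: each object is described by the set of features it has.
CATALOGUE = {
    "tiger": {"alive", "large", "orange_black_stripes"},
    "deer": {"alive", "large"},
    "rabbit": {"alive"},
    "rock": {"large"},
    "striped_towel": {"orange_black_stripes"},
}

# Assumed order of questions, from the general to the specific.
QUESTIONS = ["alive", "large", "orange_black_stripes"]


class Scene:
    """Visual data that can only be probed by yes/no questions.

    Keeps a count of how many questions have been asked of it.
    """

    def __init__(self, features):
        self._features = set(features)
        self.questions_asked = 0

    def ask(self, feature):
        self.questions_asked += 1
        return feature in self._features


def identify(scene, catalogue=CATALOGUE, questions=QUESTIONS):
    """General-to-specific strategy: narrow the candidates question by question."""
    candidates = set(catalogue)
    for feature in questions:
        if len(candidates) <= 1:
            break
        answer = scene.ask(feature)
        candidates = {
            name for name in candidates
            if (feature in catalogue[name]) == answer
        }
    return candidates


def tiger_alarm(scene):
    """Targeted strategy: try to dismiss 'there is a tiger' with one cheap question.

    Returns True when the hypothesis cannot be dismissed, i.e. when it is
    probably a good idea to flee.
    """
    return scene.ask("orange_black_stripes")


if __name__ == "__main__":
    for name in ("tiger", "striped_towel", "deer"):
        slow, fast = Scene(CATALOGUE[name]), Scene(CATALOGUE[name])
        print(name, identify(slow), slow.questions_asked,
              tiger_alarm(fast), fast.questions_asked)
```

In this toy setting the striped towel raises the alarm just as the tiger does: the quick test trades accuracy for speed, which is exactly the tendency towards false positives discussed above.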

A Blind Spot to Help Us See
Now take the case in which your vision is occulted in part or all of the visual field, perhaps by a cataract, or some external obstruction. What you'll see is that you don't see something: there's a part of the visual field that's noticeably obscured, that doesn't produce data even though it should. There's something very noticeably missing. The reason for this is, essentially, that the questions asked of this part of our visual field go unanswered, or are answered uniformly with 'darkness'.
But, in everybody's eye, there exists a spot -- the aptly named blind spot -- in which there are no light sensitive cells, because at that point, the optic nerve punches through the retina, which is necessary because the human eye, unlike, say, the cephalopod version, is wired backasswards.
Typically, we don't notice this defect in our visual field, unless we go to some lengths, as in the test provided in the Wikipedia link above (if you've never done it, go ahead! It's quite striking.).
The question is, why don't we notice this blind spot? Why is there no sense of something noticeably missing at that point?
The usual answer, and perhaps the most intuitive one, is that the brain somehow 'fills in' the missing information, 'papers over' the hole in the visual field, so as not to disrupt the enjoyment of the audience in the Cartesian theater with any ugly blemish. In our picture, this would mean that the questions asked of that particular area will receive special made-up answers. But this is actually completely unnecessary.
Moreover, it can't be the whole story: in the test above, why does the brain, doing the filling in, forget about the O, supplying just random background -- yet, confronted with a highly complex picture, such as a fractal, no part seems 'blank' or different from the rest in any way? Why does the brain fail at the seemingly much easier task?
Well, the reason is simple -- it's not that the questions asked of the blind spot are met with made-up answers; it's simply that there are no questions asked, at all. There is nothing noticeably missing because there's nobody looking for anything there. That's why our field of vision seems perfectly continuous -- no alarm gets raised by the lack of data from a certain area, because there are just no detectors that could raise this alarm. The blind spot constitutes an absence of representation, not a representation of absence as in the case of a physical obscuration of a part of the visual field -- and those are two very different things.
This effect is not limited to the blind spot -- there is a more general phenomenon known as a scotoma, which can be caused by various forms of damage to the retina or optic nerve, that exhibits similar phenomenology. Going out on a limb, one might even speculate that the condition known as blindness denial or more prosaically as Anton-Babinski syndrome, in which a person may be blind, while being unaware of the fact, has a similar cause: the neurological damage incurred may inhibit the asking of questions; since thus no signals of missing answers arrive, the patient judges himself, wrongly, sighted.
For a different metaphor, consider the often crazy and jumbled logic of dreams, with changing plotlines, locations, actors and circumstances, which nevertheless often seems perfectly sound to the dreamer: it's not that there are elaborate measures in place to hide the logical gaps, rather, it may just be that those parts of the brain that would ordinarily expose the flaws and point to something being amiss are asleep, or otherwise not acting according to their normal function.
In any case, it evidently may be the case that something seems to us to be a certain way, because there is nothing there to expose it being 'truly' different -- it seems to us as if our field of vision were continuous, because no mechanisms exist to tell us otherwise. We are blind to our blind spots.

Vorstellung
Let's now turn our gaze inward, to where we imagine the things that we imagine are. The German word 'vorstellen' literally means 'putting before'; this captures the intuitive idea we have of how our imagination works: when we imagine something, or visualize it, what we imagine we imagine is the picture of this something, drawn in the mind, for 'us' to look at. But of course, this is just the Cartesian theater again, and with it, the threat of vicious regress rears its ugly head.
In fact, it is easy to see that this idea of creating an actual visualization 'in the mind's eye' holds no water: whatever we could learn from the visualization, we already must know in order to create it. Think of a computer producing a drawing on its screen. It does so for the benefit of the user. But in the case of the mind, our selves, and the mind's eye, the computer is the user. It would make no sense to equip it with a camera and have it behold the picture it itself drew -- all the data that can be gained from the picture is already present, stored in the computer's memory. It must be -- else, it could not have drawn the object!
So, why would the mind go through all this trouble to tell itself things it already knows? Why create any visualization at all?
Well, in a manner of speaking, it doesn't -- it just makes it seem as if it did. Recall how vision can be viewed as a question-answering process. So, too, can inner vision: when you visualize something, you ask yourself questions about that something's appearance, which are met with the appropriate answers -- as a result, something seems to be visualized. The apparent visualization is merely the actualization of latent visual knowledge, prompted by question asking -- which in these circumstances perhaps should be called introspection. The object seems visualized in the same way the blind spot seems filled or dream logic seems consistent -- because there is nobody there to ask questions -- to actually go inside your mind and look -- and say otherwise. Knowing how it would seem if you actually visualized something is no different from actually visualizing something, at least as far as you are concerned.
Moreover, this inner viewing comes implicitly with an inner viewpoint -- there is no observation without an observer. View and viewpoint, observation and observer, imply each other, thus, through making it seem as if there were an object visualized inside the mind, the process of actualizing knowledge via introspection, via question-answering, has the side effect of making it seem as if there were something or someone beholding the visualization, or perhaps just holding it in his mind's eye.
The remarks made here about vision and visualization can be extended to other modes of perception and representation; in general, mental content is generated by answering questions about mental content, and is itself represented in the answers to these questions -- this reciprocity produces both the appearance of observing this mental content, and the observer doing this observing.

I, Zombot
We have come quite a way towards answering the question: how can subjective, experiential states emerge from non-subjective processes?
Let us review where we started: a zombot, a machine capable of emulating the behaviour of a conscious human being perfectly, was given the task of determining its own consciousness. How did it do so?
Well, whatever the process may be in detail, ultimately, if it is limited to the exchange of information, it can be modelled as a question-answering process. The zombot, thus introspecting, asking questions of itself, proclaimed itself conscious, as it had to. This, as we surmised, did not provide sufficient grounds for believing such a bold assertion.
But in the end, when you determine your own consciousness, what do you do? You introspect -- you ask questions of yourself -- and those are answered as if you were conscious. It seems to you that you are conscious, and so you claim -- and believe -- yourself to be. And similarly, it seems to be thus to the zombot -- who, on equal grounds as yours, then believes himself conscious.
But wait, have I not just tried to sneak one past you? Certainly, before something can seem a certain way to the zombot, before he can believe anything, he must be conscious -- thus, it seems he must be conscious in order to be conscious, and it seems we haven't made it past the regress after all!
And it is true that there is an element of self-referentiality here, but the circularity is not vicious. Remember how the zombot can come to know ('knowing' used here in the sense of 'having data stored in his memory') how it would be to visualize some object, through a process of introspection; this, itself, isn't consciousness. It's just an ability to answer certain questions.
But, the zombot then can repeat his introspection, with the object to be 'visualized' this time being his knowledge of how it would be to visualize a certain object -- somewhat more abstract, certainly, but it can be just as well represented in the form of answers to certain questions. Thanks to this iterated introspection, the zombot knows not only how an object looks (his 'visualization'), but he knows that he knows it (which is equivalent to actually visualizing an object, or at least, believing to do so), and possibly even knows that he knows that he knows, and so on, up to a certain point; this iterative process can be ended whenever we wish -- we can climb the ladder of self-reference upwards, rather than having to descend downwards, from infinity, as in the case of the homunculus watching the play in the Cartesian theater. This certainly mirrors our own experience -- ordinarily, we may think about something, then may have cause to think about our thoughts, and occasionally perhaps even think about our thinking about our thoughts; but rarely do things go further. The explanation here is that there simply are no more levels, rather than that they somehow must get lost in the mist as the tower of regress climbs up to infinity.
The zombot thus can gain subjective states through gaining knowledge about how it would be to have subjective states -- which he then can again gain knowledge about.
Yet still, one might think that there is some difference between the real consciousness of a human being, and the fake consciousness the zombot fake-experiences. That, while the zombot can't tell himself from an actually conscious entity, an actually conscious entity can nevertheless point to some subtle difference in mental content that differentiates both cases.
And it is entirely possible that such a fundamentally subjective, irreducible quality to consciousness exists. But, even if that is the case, how could you ever tell whether or not you possess it? Anything you could point to, a zombot would equally well point to -- 'deceived' into believing he possessed it. But if the zombot can be thus deceived, how do you know you aren't, as well? Any attempt to find out loops back onto itself; at every point, you might as well be the zombot. Your knowledge of your own consciousness is just data you have access to; but the zombot has access to equivalent data, generated through his introspection.

Being Real and Seeming Real
This may seem to be a deeply unsatisfactory explanation to some; indeed, it seems as if there is no 'real' consciousness left anymore, that everything is just a clever trick with mirrors.
And in a way, that's true. Whenever one sets out to explain magic, the explanation can contain no magic in the end, or it is not an explanation at all -- but that does lead to the paradoxical consequence that the phenomenon that one has set out to explain now seems to have vanished altogether, that it has been explained away rather than explained.
But even if some phenomenon is reduced to its fundamental constituents, this does not make the phenomenon any less real. On the level of cells, it makes no sense to talk about my arm -- yet clearly, this does not mean that my arm 'doesn't exist'. Emergent properties are no less real than 'fundamental' ones, they are just answers to a different set of questions, that it would make no sense to ask on the lower level.
So, too, is it the case with consciousness -- but in the mind, things get an extra twist. The reason for this is that there is no objective fact to what's real about subjective experiences -- so whatever seems real about them, is, or at least, can be taken to be. Consider a migraine: is there a difference between the case where you have a migraine, and the case where it merely seems to you as if you had a migraine? The two are identical: in both cases, you have a splitting headache, no less real in the second than in the first. Thus, there is no real meaning to 'seeming conscious': if to you, you seem conscious, then you are conscious -- both yield an identical phenomenology. Thus, by making it seem as if there were consciousness, consciousness can be created.
In the end, we're all zombots -- just suffering from unconsciousness denial. However, unlike the case of seeing, where the appearance or the belief of possessing sight still leaves you blind, the appearance or the belief of possessing consciousness suffices to establish consciousness.

The Universal Universe, Part III: An Answer to Wigner

2011-07-17T02:37:00.000-07:00

Eugene Wigner was a Hungarian American physicist and mathematician, who played a pivotal role in recognizing and cementing the role of symmetries in quantum physics. However, this is not the role in which we meet him today.
Rather, I want to talk about an essay he wrote, probably his most well-known and influential work outside of more technical physics publications. The essay bears the title The Unreasonable Effectiveness of Mathematics in the Natural Sciences [1], and it is a brilliant musing on the way we come to understand, model, and even predict the behaviour of physical systems using the language of mathematics, and the fundamental mystery that lies in the (apparently) singular appropriateness of that language.
Wigner's wonder is two-pronged: one, mathematics developed in one particular context often turns out to have applications in conceptually far removed areas -- he provides the example of π (or τ if you're hip), the ratio of a circle's circumference to its diameter, popping up in unexpected places, such as a statistical analysis of population trends, which seems to have little to do with circles; two, given that there is this odd 'popping up' of concepts originally alien to a certain context in the theory supposedly explaining that very context, how can we know that there is not some other, equally valid (i.e. equally powerful as an explanation) theory, making use of completely different concepts?
In a way, we find ourselves wondering, again, why our language, our mathematics, should be any more suitable for yielding an explanation of the world around us in terms of physical theories than a dog's bark, very much analogous to the question of why our minds should be any more capable of understanding the universe than a dog's mind is of grasping advanced calculus. The answer, I believe, will turn out to be analogous as well; but first, we need to think a bit about what, exactly, we mean by things like explanation, mathematics, or physical theory. We'll start out by considering:

The Perfectly Reasonable Effectiveness of Computers in Simulations
Nobody, to the best of my knowledge, has ever marvelled at the singular appropriateness of computers for simulations of physical systems. So let me be the first: why is it that, no matter what system you consider, it seems so eminently possible to construct a virtual likeness of it on an appropriately instructed computer?
From galaxy formation to the weird world of quantum mechanics, few if any areas seem in principle unamenable to computer simulation. And given the criteria in Wigner's article, the wonder here should be even greater! The toolbox of mathematics is vast, comprising hosts of diverse fields, all with their own concepts, symbols, rules, etc. Compared to this, computers are incredibly limited, being ultimately forced to do all they can do by manipulating 1s and 0s according to very few rules.
Nevertheless, it does not seem surprising that a computer is capable of closely matching observed reality through simulation -- and it should not. To anybody who has read the last two posts in this series, the answer is probably clear: computational universality is what bestows upon a computer its magical power to effectively 'behave like' any other system (provided that system is not more than computationally universal itself). So if our universe is computable, then of course computers should be capable of simulating every part of it.
Now, what does this mean with respect to Wigner's essay? Well, if mathematics were similarly computationally universal, then at least part of the problem -- why mathematics should be capable of describing the natural world -- would be solved. But now, remember that Turing machines were thought up to emulate mathematics -- to automate the processes by which mathematicians do whatever mathematicians do (which entails a convenient definition of mathematics as 'what mathematicians do').
The converse is similarly possible: you can embed the functioning of a Turing machine within mathematics. To do so, one makes use of a trick known as Gödel numbering: recall that a Turing machine has a finite set of symbols, its alphabet. Now, one simply associates a number to each of those symbols -- easy enough to do. Any string of symbols can then be represented as a number formed from the concatenation of all the numbers for each symbol (using codes of equal length, say, so that the result can be decoded unambiguously); in particular, the string of symbols on the tape of a Turing machine when it starts operating -- its program -- can be thus represented.
The functioning of the Turing machine now consists of manipulating the symbols it reads in a certain way, according to its rules. These rules can be translated into algebraic ones: each rule takes a string of symbols, and returns a different one; each algebraic rule then takes a number, standing for that particular string of symbols, and calculates a new one, standing under the same correspondence, the same code, for the string of symbols the Turing machine returns. Thus, the operation of a Turing machine on a string of symbols can be mapped to algebraic manipulations of numbers -- we can embed, or emulate, any given Turing machine within number theory. This also proves an assertion I made in the previous post, that there are only as many Turing machines as there are natural numbers: a Turing machine's alphabet can be encoded in a natural number, and its rules can be, too -- thus, there exists a natural number associated to every Turing machine. Conversely, since there are infinitely many distinct Turing machines, they can be listed one after the other, so that every natural number is associated to a Turing machine.
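To make the encoding a little more tangible, here is a minimal sketch in Python. The three-symbol alphabet, the single-digit codes, and the 'flip every bit' rule are all assumptions made up for illustration; they are not taken from any particular Turing machine. The point is only that a rule acting on strings has a counterpart acting purely on numbers, and that the two agree under the code.

```python
# Hypothetical alphabet: each symbol gets a one-digit code (no code is 0,
# so concatenated codes never lose a leading digit).
ALPHABET = {"0": 1, "1": 2, "_": 3}
DECODE = {code: symbol for symbol, code in ALPHABET.items()}

def encode(string):
    """Concatenate the symbol codes into a single natural number."""
    number = 0
    for symbol in string:
        number = number * 10 + ALPHABET[symbol]
    return number

def decode(number):
    """Recover the string of symbols from its number."""
    symbols = []
    while number > 0:
        number, code = divmod(number, 10)
        symbols.append(DECODE[code])
    return "".join(reversed(symbols))

def flip_on_strings(string):
    """A toy 'machine rule': swap 0 and 1, leave the blank alone."""
    return "".join({"0": "1", "1": "0"}.get(s, s) for s in string)

def flip_on_numbers(number):
    """The same rule as pure arithmetic: digit 1 <-> 2, digit 3 unchanged."""
    result, place = 0, 1
    while number > 0:
        number, code = divmod(number, 10)
        result += (3 - code if code in (1, 2) else code) * place
        place *= 10
    return result

tape = "01_1"
assert decode(flip_on_numbers(encode(tape))) == flip_on_strings(tape)
```

A real machine has more rules and a head position to keep track of, but each of those can be folded into the number in the same way.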
Thus, as we should have expected, mathematics is (computationally) universal -- that it can be used to emulate any physical system then is not so surprising after all. This answers Wigner's first problem: mathematics developed in some context (say, number theory) can, due to its universality, emulate completely different systems (say, a Turing machine simulating galaxy formation).
In attempting to answer Wigner's second problem, though, we are going to hit a snag: if we consider the working of some computationally universal system, say, a Turing machine or a computer program, as an explanation for the behaviour of some physical system, then we are forced to conclude that this explanation is not unique: by universality, there exist many inequivalent Turing machines or computer programs yielding the same phenomenology. This is nothing else but the statement that one and the same problem can be solved by many different computer programs -- i.e. there is not just one unique program yielding the animations linked above.
Yet, Wigner asserts that one of the principal mysteries of mathematics' applicability to natural science is that it apparently yields unique explanations! In fact, we are faced with an even worse problem: if we accept a 'computational ontology', then what form should it take? What fundamentals 'really' exist? What language is the program that computes the universe written in?

The Myth of Ontic Definiteness
Picture a scientist trapped in the Matrix, i.e. confined to a simulated universe. His task, being dedicated to uncovering ultimate capital-T Truths about the world he finds himself in, is to uncover the fundamental laws that describe his universe.
Unfortunately for him, in this task he is doomed to fail, as we now know. There is no set of laws that could be pointed to as 'the' fundamental ones, at least none that he could discover. The reason for this is simply that there are many inequivalent programs, many inequivalent computational systems, that yield the same output -- his world, and himself in it. There is no way for him to find out which is the one that is 'actually' being run, or what the supporting hardware looks like and how it works. These levels are simply closed off to him, and any supposed fundamental entity he comes up with whose behaviour he postulates explains the appearance of his world is likely to be fiction -- even though it may be in perfect accord with all the observations he makes. There is always a host of different entities yielding the same observations, and thus, no way to choose between them.
It's computational universality that does him in -- since every universal system can emulate the behaviour of every other (at most) universal system, there is simply no way for him to tell which system actually comprises the foundations of his world. It could be some Turing machine, but it could also be a cellular automaton emulating that Turing machine, or a gadget going through number-theoretical derivations, or something else entirely. To him, there is no fundamental ontology.
It is for this reason that David Deutsch rejects the possibility of the world being 'merely a program running on a gigantic computer' [2], because, according to him,

It entails giving up on explanation in science. It is in the very nature of
computational universality that if we and our world were composed of software, we
should have no means of understanding the real physics – the physics underlying
the hardware of the Great Simulator itself.

This leaves us in a bit of a pickle! As I have argued in the previous post, an explanation in terms of 'more than' computable means amounts to no explanation at all, as it could never be checked; now, apparently, computable explanations suffer the same fate (and although Deutsch doesn't acknowledge it explicitly, it is more than just the literal picture of the universe as a giant computer that runs into these troubles -- a computer is just a particular kind of universal system, but the reasoning applies generically).
So, what are we left with?

The Benefit of Multiple Explanations
Perhaps the problem lies not with the possibility of finding explanations, but rather, with our expectation of what explanations to find. Typically, when we are faced with some phenomenon, we expect that there is one unique and true explanation in terms of 'what really happens', which enables us to gain an understanding of that phenomenon: the Sun rises because the Earth rotates. This is right, everything else is wrong.
But... is that actually ever what we get?
To answer this, we must first consider what, exactly, we mean by an explanation. Typically, when we are being related an explanation for the behaviour of a certain system, at some point, we experience a moment of understanding -- suddenly, we know how it works. Well... how does that work?
One possible explanation would be that we assemble in our mind a structure whose behaviour matches the behaviour of the system in question -- think of it as a model: we are being told that the Sun rises (and sets) because of the Earth's rotation, so we imagine a rotating globe relative to a fixed source of illumination -- and all becomes clear, we know how it works. In general, however, that model may be much more abstract; the salient feature is that we generate an internal representation from which the behaviour of a system can be abstracted, such that, for instance, we could reliably predict the system's behaviour even in situations in which we have no direct experiential knowledge of its behaviour.
But now consider the following situation: I am teaching you chess. At some point, I'll come to the knight's movement. I could say something like: "It moves two squares vertically and one square horizontally, or two squares horizontally and one square vertically." This would surely suffice as an explanation of how the knight moves. However, I could equally well say: "It moves one square diagonally and one square straight, either horizontally or vertically." Or: "It moves in the shape of an 'L'." Or, somewhat more extravagantly: "It moves three squares to the right, and then either one or two squares diagonally, either upwards or downwards to the left; or, it moves three squares to the left, and then either one or two squares diagonally, either upwards or downwards to the right."
All would get the idea across equally well -- in all cases, the knight ends up on the same possible squares; however, they differ with regards to what the knight actually does.
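The equivalence of these rules can be checked mechanically. The following Python sketch writes three of the verbal rules as small procedures and compares the sets of squares they reach, relative to the knight's starting square. The function names are made up for illustration, and for the 'diagonal, then straight' rule I assume the straight step continues away from the starting square rather than doubling back.

```python
from itertools import product

def two_plus_one():
    """Two squares along one axis and one square along the other."""
    moves = set()
    for sx, sy in product((1, -1), repeat=2):
        moves.add((2 * sx, sy))
        moves.add((sx, 2 * sy))
    return moves

def diagonal_then_straight():
    """One square diagonally, then one square straight, moving outward (assumed)."""
    moves = set()
    for dx, dy in product((1, -1), repeat=2):
        moves.add((dx + dx, dy))  # straight step taken horizontally
        moves.add((dx, dy + dy))  # straight step taken vertically
    return moves

def three_then_back():
    """Three squares sideways, then one or two squares diagonally back."""
    moves = set()
    for side in (1, -1):          # right or left
        for vertical in (1, -1):  # upwards or downwards
            for k in (1, 2):      # one or two diagonal squares
                moves.add((3 * side - k * side, k * vertical))
    return moves

RULES = [two_plus_one(), diagonal_then_straight(), three_then_back()]
ALL_AGREE = all(rule == RULES[0] for rule in RULES)
```

Each procedure takes a different 'path', yet `ALL_AGREE` comes out true: the three rules pick out the same eight target squares.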
Or let's say I believe in a more hands-on learning approach, and just move the knight -- perhaps while you have your back turned. In your mind, you could now construct a set of distinct, yet equivalent models in order to explain the knight's behaviour, including, but not limited to, the ones I provided above. You could also come to completely different explanations, in entirely different terms; for instance, you could simply recognize and memorize the following pattern:

Now, which of these explanations is 'the right one'? How would one single any out? I don't think it's possible, or indeed useful, to do so. They're all right; some might sit better with you than others, but ultimately, regarding the salient features -- the positions the knight may validly end up in -- they're all equivalent. There is simply no fact of the matter regarding what way the knight actually takes.
(A word of caution, though: some people might be tempted to invoke Occam's razor here to arrive at a unique explanation that is 'the simplest one' in a particular way -- certainly, the more long-winded rules have some disadvantages compared to the short and crisp ones, but it's ambiguous at best whether or not the picture is 'simpler' than the rules written down. However, strictly speaking, no additional 'explanatory entities' in the razor's sense are postulated in any of the rules. Occam's razor has its valid application in ensuring the predictivity of hypotheses: whenever an explanation is proposed that 'adds' overhead machinery to the simplest explanation necessary, then that explanation is to be discarded, as otherwise there is no unique choice of theory that could be falsified. So, for instance, if I taught you chess on a Thursday, many inequivalent theories would be equally well in accord with your observations: for instance that the knight moves like an L, or that the knight moves like an L on Thursdays, but just diagonally on Fridays. The latter theory is what the razor is made to shave off.)
Thus, we see that explanations need not necessarily be unique in order to be valid -- all of the presented, distinct 'theories' allow us to construct an equally good model of the knight's movements.
Indeed, viewed from another angle, the notion of a unique explanation begins to look downright suspect: if there is an explanation for some system's behaviour in terms of 'more fundamental' entities, then either those fundamental entities demand an explanation themselves, or the chain of explanations terminates, leaving some basic layer unexplained. Either one never gets a 'final' explanation, or an explanation is ultimately left open. Faced with the equations that describe the final layer of our explanatory cake, we are left with the question, as Hawking famously mused: "What breathes fire into the equations?"
If one posits some entity as ontologically fundamental, one immediately may ask the question: why that particular entity? Why not any other?
Attempts have been made to answer, or at least ameliorate, these questions: it has been proposed that consistency is what selects the true fundamental theory -- but if one theory is consistent, there exists another, just as consistent, able to give rise to the same phenomenology, while built on totally different fundamentals. Or, it has been proposed that all such structures exist, and that the selection of which one we exist in is anthropic: we live in this universe, because it is capable of supporting our existence. But this, too, does not suffice to argue for a unique fundamental structure: a (computationally universal) cellular automaton could give rise to the same universe as a Turing machine, or any other computational paradigm.
Conversely, in a computational universe, there is no more an ontologically fundamental entity than there is a way the knight actually moves.
A little excursion is in order here. The notion of multiple explanations is nothing alien to physics. In some theories, it manifests itself under the guise of gauge invariance: there exists a field, known as the gauge field, which yields the same physics -- the same observations -- in different forms; i.e. two completely inequivalent gauge fields may lead to the same phenomenology. Electromagnetism is the prototype theory of this kind: different choices of electromagnetic potentials may yield the same magnetic and electric fields; and since it is those latter fields that are actually observed, these manifestly different potentials have the same physical content.
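For the electromagnetic case, this can be written out in a few lines. The notation is the standard textbook one and is my addition here: \phi is the scalar potential, \mathbf{A} the vector potential, and \chi an arbitrary smooth function of space and time.

```latex
% Fields in terms of potentials
\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t},
\qquad
\mathbf{B} = \nabla \times \mathbf{A}

% A gauge transformation with an arbitrary function \chi
\phi' = \phi - \frac{\partial \chi}{\partial t},
\qquad
\mathbf{A}' = \mathbf{A} + \nabla\chi

% The observable fields are unchanged
\mathbf{B}' = \nabla \times \mathbf{A} + \nabla \times \nabla\chi = \mathbf{B},
\qquad
\mathbf{E}' = -\nabla\phi + \nabla\frac{\partial \chi}{\partial t}
              - \frac{\partial \mathbf{A}}{\partial t}
              - \frac{\partial}{\partial t}\nabla\chi = \mathbf{E}
```

The curl of a gradient vanishes, and the two extra terms in \mathbf{E}' cancel because the derivatives commute, so every choice of \chi gives a different pair of potentials with the same physics.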
In another way, explanatory heterogeneity, to coin a term, is manifest in general relativity in the form of background independence: different choices of reference frames, as long as they are connected by smooth coordinate transformations, yield the same physics. This is viewed as a good thing -- after all, why should there be a preferred point of view singling out the 'proper physics'?
But then, why should there be any preferred mathematical structure? And if there can be such a preferred mathematical structure, then why not a preferred frame of reference -- which, after all, is just a mathematical structure --, as well?

The Right Way to be Wrong
So, multiple explanations may not be a bad thing; indeed, they may even turn out to be fortunate. But how does that help the scientist trapped in the Matrix? He does not have merely a couple of explanations to choose from; rather, any possible universal system in principle suffices as an explanation. A true embarrassment of riches!
The key to helping him out is, I believe, to answer the question of how, given all of the above, there can be wrong explanations. Of course, one possibility is simply for them to be analogous to 'buggy' programs: within the chosen framework, they just don't do what they're supposed to do; this is, in a sense, the trivial version of 'wrongness', and perhaps most wrong theories are of this kind.
But consider the following: a computer runs a simulation of the solar system, which is used to predict the occurrence of a lunar eclipse. Now, the computer does not know anything about gravity -- it is merely a universal system, emulating the behaviour of another system. At its bottom level, it is nothing but a succession of states, mapped to the succession of states of the solar system at certain points in time. But this set of states is encoded in electromagnetic field configurations, and everything that happens inside the computer as it goes through its computation is dictated by the laws of electrodynamics. There thus exists a mapping between the states of the electromagnetic field that make up the computer, and the (logical) states of the computation itself.
But then, there also exists a mapping between the states of the electromagnetic field, and the states of the solar system -- which means that I could have just as well solved Maxwell's equations (the equations describing the evolution of electromagnetic fields) and arrived at a prediction for the lunar eclipse! (A shorter way to say this would have been: Maxwellian electrodynamics is universal.)
Nevertheless, as a theory of our solar system, beautiful thing though it might be, Maxwell's electrodynamics is simply wrong. The artifice that would have to be heaped onto the theory in order to make it resemble the evolution of our solar system is just too staggering -- nobody would ever consider using it in this way; nobody could ever use it this way, as the calculations would just be too difficult and gigantic to ever carry out in practice. That does not take anything away from the fact that in principle, it is possible to accomplish this feat; but it is not in principle, but in practice, that we create and use our theories.
Thus, the 'wrongness' and 'rightness' of a theory is derived from its applicability; a theory that is not in practice applicable as an explanation of some system's behaviour is the wrong explanation.
An immediate objection to this picture might be that there manifestly are computations with different results, programs with different outputs, systems with different behaviour. And that's true, but an explanation, in terms other than those of the system that is to be explained, is always a mapping, an analogy from the behaviour of another system to the behaviour of the to-be-explained system; and between universal systems, such a mapping always exists -- so those different results, outputs, and behaviours can be mapped onto one another, provided all the systems in question are universal.
This, then, finally allows us to answer Wigner's second problem: there is not in general a unique theory, a unique mathematical edifice that allows us to formulate an explanation of some phenomenon; but, the somewhat narrow scope of practical applicability serves to single out one, or at best, some small set of closely related ones (which are then often thought of as 'dual' formulations of one another).
There is thus an anthropic selection after all: but it operates on the end user level, rather than on the fundamentals. Universal systems can emulate other universal systems with varying efficiency; systems, i.e. theories, that are 'close' to our way of thinking will thus be more readily considered than very distinct ones, even though those we can in principle 'understand' just as well. Other intelligences, able to emulate different systems with greater ease, might consider completely different theories reasonable, and find ours nigh incomprehensible.

Heteroontology
There is an interesting consequence regarding the question of ontology, i.e. the question of what kinds of things can ultimately be said to exist, and what bearing physical theory has on this question. Naively, one might expect that whatever entities our most successful theories require to work ought to be regarded as 'really existing'. This is a version of W. V. O. Quine's so-called indispensability argument, though this refers generally to the existence of indispensable mathematical entities, and constitutes in this form an argument for mathematical realism.
But, given the arguments provided here, we should expect the following situation to arise: for a given system to be explained by physical theory, two or more theories exist of equal explanatory power, yet making reference to partially or completely distinct entities. Indeed, such a situation exists in the so-called AdS/CFT correspondence, a realization of the holographic principle in which a gravitational theory is dual to, i.e. yields the same physics as, a quantum field theory in a space of lesser dimensionality. Even though both theories have the same explanatory power, they disagree on something as fundamental as the number of spacetime dimensions! So which one is 'right'?
The conservative argument here would be perhaps to consider the superfluous dimensions of the gravitational theory as 'dispensable', and thus, regard them as 'not real'. This works in cases where the disagreement is only limited; but in principle at least, it is possible to find theories -- though they may be as contrived as the electromagnetic theory of the solar system above -- promulgating a completely distinct ontology. Which one would be right, there?
I don't think this question has a definite answer -- and moreover, I think that's a good thing. For every ontology we might settle on as the 'true' one immediately throws up the question: whence this ontology?
Only in becoming independent from this question -- in becoming truly background-independent -- can any answer that by a reasonable measure may be considered final be reached.

References:
[1] Wigner, E. P. (1960). "The Unreasonable Effectiveness of Mathematics in the Natural Sciences. Richard Courant Lecture in Mathematical Sciences delivered at New York University, May 11, 1959". Communications on Pure and Applied Mathematics 13: 1–14. doi:10.1002/cpa.3160130102 (online text)
[2] Deutsch, D. (2004). "It from Qubit", in Science and Ultimate Reality, ed. Barrow, J. D., Davies, P. C. W., Harper, C. J., 90–102 (pdf link)

The Universal Universe, Part II: ...but is it?

2011-07-12T10:39:00.000-07:00

I have ended the previous post with the encouraging observation that if the universe is computable, then it should be in principle possible for human minds to understand it -- the reasoning essentially being that each universal system can emulate any other. But the question now presents itself: is the universe actually computable?
At first sight, there does not seem to be any necessity for it to be -- after all, computation and computational universality may be nothing but human-derived concepts, without importance for the universe as it 'really is'. However, we know that it must be at least computationally universal, as universal systems can indeed be built (you're sitting in front of one right now) -- the universe can 'emulate' universal systems, and thus, must be universal itself (here, I am using the term universal in the somewhat loose sense of 'able to perform every calculation that can be performed by a universal Turing machine, if given access to unlimited resources'). Thus, the only possibility would be that the universe might be more than universal, i.e. that the notion of computation does not suffice to exhaust its phenomenology.
And indeed, it is probably the more widespread notion at present that the universe contains entities that do not fall within the realm of the computable. The discussion is sometimes framed (a little naively, in my opinion), as the FQXi did recently in its annual essay contest, in the form of the question: "Is reality digital or analog?"

What's most commonly at stake here is the nature of spacetime -- an 'analog' continuum versus a 'digital' discretum, like a point set or a lattice.
What does this have to do with the question of computability? Well, the problem is essentially that, although there are infinitely many, there are not enough Turing machines to compute every point of a continuum -- such that if a particle moves through such a continuum, almost none of its positions can be given a numerical value using computable means.
This has to do with a result we will get to again later, known as the diagonal argument. The diagonal argument is a mathematical result due to Georg Cantor, which can be used to show that there are 'more' real numbers than there are natural numbers, even though there are infinitely many of the latter. It's a simple enough proof to give here; all one really has to know is how to count, and even that I'm going to review briefly.
If we have two sets of things, say apples and pears, we know that there are equally many of each if we can pair them with one another without having any left, one apple to one pear. If in the end, we have unpaired apples left, there are more apples; if we have surplus pears, those are more numerous.
For finite sets, this is well in line with our ordinary notion of sizes or amounts; however, when it comes to the infinite, certain apparent oddities turn up. One is that one can use this technique to show that there are as many even numbers as there are natural numbers in total, even though the latter comprise all the even numbers, and the odd ones! To see this, we only need to pair each natural number with an even number, which we can do like so: (1, 2), (2, 4), (3, 6), (4, 8),...(n, 2n),...
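A computer can only ever check a finite stretch of this pairing, of course, but a short Python sketch shows what makes it work: the rule n -> 2n can be undone, so no even number is missed and none is used twice. The function names and the cutoff N are arbitrary choices for illustration.

```python
def to_even(n):
    """Partner of the natural number n in the pairing (n, 2n)."""
    return 2 * n

def from_even(m):
    """Inverse direction: the natural number paired with the even number m."""
    if m <= 0 or m % 2 != 0:
        raise ValueError("expected a positive even number")
    return m // 2

N = 1000  # arbitrary cutoff; the pairing itself has no end
pairs = [(n, to_even(n)) for n in range(1, N + 1)]

# Going there and back always returns the number we started from.
round_trip_ok = all(from_even(to_even(n)) == n for n in range(1, N + 1))
```

Since every natural number has exactly one partner and every even number is the partner of exactly one natural number, nothing is left over on either side.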
Every natural number gets an even-number partner. The two sets are said to be of equal cardinality. One might say, 'well, that's actually fine: the set of even numbers is infinite, and since infinity is the biggest thing there is, adding to it can't make it bigger; thus, adding the odd numbers must yield another infinite set'. And in a sense, that's right; however, thinking about infinity always comes with a snag, so in another sense, it's not really right at all.
That snag shows itself when contemplating the real numbers, or reals for short. If infinity is the biggest thing there is, and there are infinitely many natural numbers, then we should be able to find a natural number to pair with every real, right?
Well, let's try -- and let's even simplify things a little by confining ourselves to reals between 0 and 1. The ordering does not matter much, so we can just randomly associate numbers: (1, 0.634522345...), (2, 0.323424129...), (3, 0.973247638...), and everything seems to go smoothly. However, now think of the following number, which we'll call d for diagonal -- it's constructed the following way: take the first digit of the first number, and lower it by one, and make it the first digit of d ('roll over' to 9 if you're already at 0); then take the second digit of the second number, and lower it by one, and make it the second digit of d; then take the third digit of the third number...
Now, the number thus constructed is obviously different from the first number, since it is different in the first digit; it is obviously different from the second number, since it is different in the second digit; it is different from the third number, since it is different in the third digit -- generally, it is different from the kth number, since it differs in the kth digit. But that means that it is different from all the numbers we have yet paired with a natural number -- it is 'left over'!
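The construction is mechanical enough to write down as a few lines of Python. This is only a sketch on a finite list, using the three example expansions from above as digit strings; the function name is made up for illustration.

```python
def diagonal(listing):
    """Build the digits of d from a list of decimal expansions.

    Each entry holds the digits after '0.' as a string; the k-th digit of d
    is the k-th digit of the k-th entry, lowered by one (0 rolls over to 9).
    """
    digits = []
    for k, expansion in enumerate(listing):
        digit = int(expansion[k])
        digits.append(str((digit - 1) % 10))
    return "".join(digits)

listing = ["634522345", "323424129", "973247638"]
d = diagonal(listing)

# d differs from the k-th entry in the k-th digit, for every k.
differs_everywhere = all(d[k] != listing[k][k] for k in range(len(listing)))
```

For the three numbers above, this gives d = 0.512..., which disagrees with the first number in its first digit, with the second in its second, and with the third in its third.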
Of course, there is an easy fix for this: pair it with a natural number. There's infinitely many of 'em, so it's not like we're gonna run out! However... then, we can just play the game again, constructing a new number d' that has no natural partner. And in fact, there's no end to this -- we always can construct a new diagonal, yielding a real number without partner in the naturals.
The inescapable conclusion is then that even though there are infinitely many natural numbers, there are more real numbers. One often says the natural numbers (or any set of equal cardinality) are countable (or countably infinite), whereas the reals are uncountable (or uncountably infinite).
Now let's get back to computability. It turns out that you can associate a natural number to each Turing machine that identifies it uniquely -- i.e. there are countably many Turing machines. But this means that there are only countably many numbers that can be computed -- meaning that some real numbers are non-computable. Thus, if spacetime is continuous, which just means that there are as many points as there are real numbers, there may be matters of fact that no computer could ever tell us -- for instance, if two particles are separated in space by a distance whose magnitude, given a certain, fixed unit of length, is equal to a non-computable real, no computer could ever produce this result.
This seems like a dire result for computability: most theories of physics, including General Relativity and Quantum Mechanics/Quantum Field Theory, are built on the continuum, and, if evidence for a physical theory is considered evidence for the existence of the mathematical entities it presupposes (which is a strong, but not unanimously held position -- Hartry Field in his book Science Without Numbers has advanced the view known as (mathematical) fictionalism, that mathematical entities, in order to yield valid physical theories, may be regarded as useful fictions rather than 'real things', which can in principle be done away with), their strong experimental support would seem to crush the hopes of the computationalist.
Be that as it may, we know that these theories can't be the full picture -- in certain regimes, unfortunately experimentally inaccessible at the present moment, they become nonsensical, or contradict one another. It is thus in those areas that one might hope for a departure from continuous structures -- and indeed, promising theoretical evidence from several research programmes exists, the strongest perhaps coming from the holographic principle (discussed briefly in this previous post): it suggests that there is an upper limit to the entropy within a given part of spacetime, proportional to the area of the surface of its volume. But the usual interpretation of entropy is as a measure of microstates of a system; if it is finite, so must the number of microstates be, and the finite is always computable.
Nevertheless, it is important to spend a moment or two discussing what, exactly, a non-computable universe might mean with regards to the prospect of ever gaining a working understanding of it.
At first, it might seem that we need not be too bothered by the prospect of non-computability; indeed, there might be cause to cheer for it, for the ability to harness this non-computability in principle would enable us to build so-called hypercomputers.
Hypercomputers differ from Turing machines and other universal systems in their capacity for computation: problems unsolvable by Turing machines may be solved with the aid of a hypercomputer. Equivalently, a hypercomputer is a system whose behaviour can't be emulated by any universal system.
This looks like a very welcome thing at first blush: problems unsolvable by Turing equivalent computation suddenly become manageable. One particular such problem is known as the halting problem. Turing machines, depending on their input, either halt, and produce an output -- or never halt, but continue running forever (they may get 'stuck in an infinite loop'). The halting problem then is the question of whether, given a certain input, a certain Turing machine will ever halt and produce an output. No ordinary Turing machine can solve this problem (I plan to talk about the reason for this in a future post -- it is actually quite similar to Cantor's proof of the uncountability of the reals above).
However, a hypercomputer can! And while this may seem somewhat academic, there are actually a lot of nice things you can do once you can solve the halting problem -- for instance, you might wonder whether there exists a number with a certain property, but can't think of any nice way to prove that there is or isn't one. Now, you might set up a computer to successively test numbers for this property -- and sure enough, if there is one, it will eventually halt, and output that number. The problem is: how long do you wait? For any given waiting time, the computer will only have explored a finite number of possibilities, meaning that there might yet be an answer ahead; or, it will continue happily chugging along forever.
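To make the waiting problem concrete, here is a small Python sketch of such a search with a fixed budget of candidates. The function search and the property ends_in_444 (does the square of n end in the digits 444?) are hypothetical stand-ins chosen purely for illustration.

```python
def search(has_property, max_candidates):
    """Test 0, 1, 2, ... for the property, giving up after max_candidates.

    Returns the first number found, or None if the budget ran out.
    None does not mean that no such number exists, only that we
    stopped looking.
    """
    for n in range(max_candidates):
        if has_property(n):
            return n
    return None


def ends_in_444(n):
    # Hypothetical property, used only as an example.
    return (n * n) % 1000 == 444


print(search(ends_in_444, 10))   # prints None: budget too small
print(search(ends_in_444, 100))  # prints 38, since 38 * 38 = 1444
```

With the smaller budget, the answer None is simply uninformative; for a property with no known witness, no finite budget ever lets us conclude that there is none.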
But, with the power of hypercomputation, you can just set up the program, ask the hypercomputer to determine whether or not it will halt, and presto -- if it does, there is a number having the property in question; if it doesn't, there is none. This last answer is impossible to get without the hypercomputer.
However, a new question arises: how do you know the hypercomputer is right?
For an ordinary computer, it's easy: if need be, you can always break out pen and paper, and check for yourself, at least in principle. However, all that you can accomplish this way is equivalent to what a Turing machine can accomplish -- thus, you can't check the hypercomputer's result. Of course, there are other ways of indirectly convincing yourself of the hypercomputer's accuracy -- consistency of results, possibly with other hypercomputers, knowledge of its design and operating principles, etc. Yet, these can't absolutely cement your faith in the hypercomputer.
Worse yet, there does not even seem to be a way to convince yourself that some device even is a hypercomputer -- if I hand you a black box, as long as it has more computing power -- more computational resources -- than you have access to, it can fool you into thinking it was a hypercomputer. For instance, it can fake being able to solve the halting problem, and you won't be able to catch it out: if it can run any program for significantly more steps than you can, if the program halts, and you are able to eventually discover that, then it will be able to do so, as well; if the program either doesn't halt, or halts after a time greater than you can run the program for, it can pronounce the program to be non-halting (or halting), and you won't be able to prove otherwise.
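The bluff is easy to sketch. In the Python toy below, a 'program' is modelled as a generator function in which every yield counts as one step; this modelling, the name fake_oracle and the particular step budgets are all assumptions made for the sake of the example, not a description of any real device.

```python
def runs_to_completion(program, budget):
    """Run a generator-based program for at most `budget` steps.

    Returns True if it finished within the budget, False otherwise.
    """
    steps = program()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration:
            return True
    return False


def fake_oracle(program, budget=100_000):
    """Claim 'halts' or 'never halts' on the basis of a finite run only."""
    return "halts" if runs_to_completion(program, budget) else "never halts"


def short_program():
    for _ in range(1_000):
        yield


def loops_forever():
    while True:
        yield


def slow_program():
    # Halts, but only after more steps than the box is willing to run.
    for _ in range(200_000):
        yield


for prog in (short_program, loops_forever, slow_program):
    print(prog.__name__, "->", fake_oracle(prog))
```

The verdict on slow_program is wrong, but an observer who can only afford, say, 10,000 steps has no means of telling it apart from the (correct) verdict on loops_forever.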
These problems extend to describing physical reality in hypercomputational (or non-computational) terms. Everything we actually think about, or at any rate, everything we ever write down, is certainly computable, as is evidenced by our ability to write it down. Everything we measure, we measure to finite accuracy -- and the finite is always computable. So, even though our current theories make reference to non-computable entities, i.e. the continuum of real numbers in particular, everything we do with them is actually perfectly computable -- as is evidenced by the success of computer models in modern theoretical physics. There is thus no need for the existence of non-computable entities that we could ever become aware of this way -- if there were, then we would be faced with a theory whose consequences we could not compute; the computation might, for instance, depend on being able to solve the halting problem. Of course, such a theory, lacking predictivity, could never be checked.
So, then, what does this mean -- is reality digital, or analog? Well, as I hinted at earlier, I do not necessarily think this is a good way to pose the question. It recalls older questions, such as 'is light made of particles or waves?', and 'is matter discrete, or continuous?', both of which have in the light of modern theories been shown to be false dichotomies.
And indeed, information-theoretic considerations seem to point a way towards unmasking the digital/analog distinction as similarly only apparent: a bandlimited signal -- a signal which contains no frequencies above a certain threshold -- can be exactly reconstructed from a sampling at finite intervals (where the sampling interval is inversely proportional to the maximum frequency present in the spectrum), i.e. it suffices to know the signal at a discrete set of points to interpolate the signal over the full continuum. This is the essential content of the Nyquist-Shannon sampling theorem. Both the continuous and the discretized, sampled signal thus contain the same information (which is a nice factoid to know if one is confronted with an audiophile harping on about the supposed superiority of analog signals over digital ones).
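For the curious, the reconstruction can be tried out numerically. The sketch below uses the Whittaker-Shannon interpolation formula, in which the signal at time t is the sum over all samples x(n/f) weighted by sinc(t·f - n), with f the sampling rate. The test signal (highest frequency 3 Hz), the sampling rate of 10 Hz and the truncation of the infinite sum to finitely many terms are my own assumptions for illustration, so the result is close to, but not exactly, the true value.

```python
import math


def signal(t):
    """A bandlimited test signal: the highest frequency present is 3 Hz."""
    return math.sin(2 * math.pi * 1.0 * t) + 0.5 * math.cos(2 * math.pi * 3.0 * t)


def sinc(x):
    """Normalised sinc function, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(t, sample_rate, n_terms):
    """Interpolate the signal at time t from samples taken at n / sample_rate.

    The exact formula sums over all integers n; here the sum is
    truncated to |n| <= n_terms, so the result is only approximate.
    """
    return sum(
        signal(n / sample_rate) * sinc(t * sample_rate - n)
        for n in range(-n_terms, n_terms + 1)
    )


t = 0.123  # a time that falls between two samples
approx = reconstruct(t, sample_rate=10.0, n_terms=2000)
print(signal(t), approx)
```

The sampling rate has to exceed twice the highest frequency in the signal (here, 10 Hz against 2 × 3 Hz); below that, the samples no longer pin down the signal uniquely.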
This has a bit of a catch in the need for bandlimitation -- physically, this equates to a so-called ultraviolet (UV) cutoff: basically, a highest possible energy, or frequency. Whether or not there is such a thing is unknown -- though basic old General Relativity seems to strongly hint at it: concentrate too much energy in too small a space, and you create a microscopic black hole, making it impossible to probe spacetime beyond a certain scale.
So, in this sense, whether the universe is digital or analog is not the question; more pertinently, we should think about whether it is computable or not -- and here, I think, most lines of evidence point towards a computable universe.

The Universal Universe, Part I: Turing machines

2011-07-08T09:53:00.000-07:00

In the mid-1930s, English mathematician Alan Turing concerned himself with the question: Can mathematical reasoning be subsumed by a mechanical process? In other words, is it possible to build a device which, if any human mathematician can carry out some computation, can carry out that computation as well?
To this end, he proposed the concept of the automatic machine, or a-machine, now known more widely as the Turing machine. A Turing machine is an abstract device consisting of an infinite tape partitioned into distinct cells, and a read/write head. On the tape, certain symbols may be stored, which the head may read, erase, or write. The last symbol read is called the scanned symbol; it determines (at least partially) the machine's behaviour. The tape can be moved back and forth through the machine.
It is at first not obvious that any interesting mathematics at all can be carried out by such a simplistic device. However, it can be shown that at least all mathematics that can be carried out using symbol manipulation can be carried out by a Turing machine. Here, by 'symbol manipulation' I mean roughly the following: a mathematical problem is presented as some string of symbols, like (a + b)². Now, one can invoke certain rules to act on these symbols, transform the string into a new one; it is important to realize that the meaning of the symbols does not play any role at all.
One rule, invoked by the superscript 2, might be that anything that is 'under' it can be rewritten as follows: x² = x·x, where the symbol '=' just means 'can be replaced by'; another rule says that anything within brackets is to be regarded as a single entity. This allows us to rewrite the original string in the form (a + b)·(a + b), proving the identity (a + b)² = (a + b)·(a + b).
You can see where this goes: a new rule pertaining to products of brackets -- or, on the symbol level, to things written within '(' and ')', separated by '·' -- comes into effect, allowing a re-write to a·a + b·a + a·b + b·b; then rules saying that 'x·y = y·x' and 'x + x = 2·x', together with the first rule (x² = x·x) applied in reverse, allow us to rewrite this to a² + 2·a·b + b², proving finally the identity (a + b)² = a² + 2·a·b + b², known far and wide as the first binomial formula.

This provides a glimpse of how 'mathematical reasoning' can be carried out by a Turing machine: it reads the symbols on its tape; those symbols determine the rules to apply; those rules prompt the rewriting of the string of symbols into a new one; these new symbols again determine manipulations to be carried out on them; and so on, up to the point where either some predetermined target is hit, or no further manipulations can be carried out.
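The read-rule-rewrite loop just described fits into a few lines of Python. The following is only a sketch under my own conventions: the rule table format, the state name 'halt', the blank symbol '_' and the example machine (which adds one to a binary number) are all assumptions for illustration, and the step limit is there merely to keep the sketch from running forever.

```python
def run(rules, tape, state, head, blank="_", max_steps=10_000):
    """Run a one-tape Turing machine until it reaches the state 'halt'.

    rules maps (state, scanned symbol) to (symbol to write, move, next state),
    where move is -1 (left), +1 (right) or 0 (stay put).
    """
    cells = dict(enumerate(tape))  # sparse tape; unwritten cells are blank
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        scanned = cells.get(head, blank)
        write, move, state = rules[(state, scanned)]
        cells[head] = write
        head += move
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Example machine: add one to a binary number, starting at its last digit.
INCREMENT = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 plus carry: write 0, carry on leftwards
    ("carry", "0"): ("1", 0, "halt"),    # 0 plus carry: write 1, done
    ("carry", "_"): ("1", 0, "halt"),    # ran off the left end: new leading 1
}


def increment(bits):
    return run(INCREMENT, bits, state="carry", head=len(bits) - 1)


print(increment("1011"))  # prints 1100
print(increment("111"))   # prints 1000
```

Note that the machine never 'knows' it is doing arithmetic: it only looks up the scanned symbol in its table and acts accordingly.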
The next important realisation is that every calculation or derivation you can write down uniquely relies on such manipulations; otherwise, the preceding line would not imply the next one unambiguously, and you would have to make a choice of which 'path' to take, which leads to multiple possible results, each as justified as any other. This would be rather troublesome in mathematics -- typically, one would like one result to be the 'right' one, rather than being presented with a 'choose your own derivation' playbook...
At this point, I should perhaps fix some notation, in order to minimize possible ambiguities. I will abbreviate a string of symbols with the symbol x; in the example above, x = (a + b)².
The action of a Turing machine on some string of symbols is written as f(x), so in the above case, f(x) = f((a + b)²) = (a + b)·(a + b), for instance. If this is confusing to you, because you're used to f(x) standing for a function taking a number to another number (such as 'f(x) = x²' takes 2 to 4, 3 to 9 etc.), just consider that we can set up a code taking each string of symbols to a string of 0s and 1s -- since again, the question answering game can be used to uniquely determine the symbols --, which is effectively nothing but a binary number, i.e. a number expressed in base 2 rather than the usual ('decimal') expression in base 10; so, on this view, f(x) does exactly what you're accustomed to.
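One possible such code, sketched in Python: take the bytes of the string in the standard UTF-8 encoding and read them as one big binary number. The function names to_number and from_number are mine, and the scheme assumes the string does not begin with a NUL character (which would be lost as a leading zero).

```python
def to_number(text):
    """Encode a string of symbols as a single natural number."""
    return int.from_bytes(text.encode("utf-8"), "big")


def from_number(number):
    """Decode the number back into the original string of symbols."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold the number
    return number.to_bytes(length, "big").decode("utf-8")


x = "(a + b)²"
code = to_number(x)
print(bin(code))          # the string as a number written in base 2
print(from_number(code))  # prints (a + b)²
```

Any other reversible code would do just as well; all that matters is that string and number determine each other uniquely.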
Repeated action of a Turing machine on a string of symbols is again a Turing machine action: f(f(x)) = f(f((a + b)²)) = f((a + b)·(a + b)) = a² + 2·a·b + b² = g(x), which effectively just means that you can 'skip steps', and invent a single rewrite rule equivalent to the consecutive action of multiple rewrite rules (for instance, having shown using the application of multiple rewrite rules that (a + b)² = a² + 2·a·b + b², we could now include this as a rewrite rule directly).
Though far from a formal proof, I hope this at least makes plausible the notion that a Turing machine with a sufficiently rich set of symbols it accepts, and of rewrites it can perform, is indeed capable of carrying out any computation that can be carried out at all. This notion is also known under the name Church-Turing thesis.
Now imagine you had such a 'universal' Turing machine. It has an alphabet of symbols it accepts, and a set of rules governing its behaviour. But, as we have already seen, the set of rules can be modified: certain rules may be 'chunked' together to give new rules; other rules might be broken apart further into sub-rules; it might even be the case that certain rules can be entirely replaced by different ones. Also, of course, the choice of alphabet is arbitrary: setting up a proper code, each string of symbols can be mapped to a different representation -- perhaps a binary one using 0s and 1s, or a decimal one using numbers, or anything else.
Thus, we can build another Turing machine, which can compute everything that the first one can compute, and hence, can also compute everything that is computable at all.
But if both can compute everything computable, that means they must always agree in their computations -- for every computation the first machine ('T₁') carries out, there must be a corresponding one on the second ('T₂'); so if T₁ maps a string x to another string y = T₁(x), there must be a corresponding string w that T₂ maps to an again corresponding string z = T₂(w). Here, corresponding means that w and x, and z and y are related as the two sides of a code: knowing one and performing some operations on it yields the other.
But these operations are again only of the symbol-manipulating kind, and symbol manipulation is what Turing machines do! So, there exists a Turing machine that takes w to x, and z to y, and vice versa: t(x) = w, t(y) = z. The output and input of T₁ and T₂ can be translated into one another -- which only makes sense, as there must be some way to check whether or not both actually do compute all the same things.
Putting it all together, we then get: T₁(x) = y, T₂(w) = T₂(t(x)) = z = t(y). Viewed through the 'lens' of the translator t, T₂ thus acts on x the same way T₁ does; it is thus able to fully emulate the action of T₁. This also applies the other way around, and more generally: any universal Turing machine is able to emulate any other. There is no Turing machine that is able to do anything any generic universal one can't do, which is only sensible, since all any Turing machine can do is compute, and a universal Turing machine can compute anything computable.
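A toy instance may help to see the relation T₂(t(x)) = t(T₁(x)) at work. In the Python sketch below, T1 doubles a number written in unary (as a row of strokes), T2 doubles a number written in binary, and t translates unary into binary. These three functions are of course far from universal machines; they are assumptions chosen only to illustrate how the translator relates the two.

```python
def T1(x):
    """Machine 1: doubles a number written in unary, e.g. '|||' -> '||||||'."""
    return x + x


def T2(w):
    """Machine 2: doubles a number written in binary, e.g. '11' -> '110'."""
    return w + "0"


def t(x):
    """Translator: a non-empty unary string to the matching binary numeral."""
    return bin(len(x))[2:]


x = "|||"
y = T1(x)  # '||||||', i.e. six
w = t(x)   # '11', i.e. three
z = T2(w)  # '110', i.e. six

print(z, t(y))  # prints 110 110
```

Whether one first computes and then translates, or first translates and then computes, the result is the same: seen through t, T2 does to x what T1 does.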
But, just how 'married' are we to the construction of 'infinite tape and read/write head' characteristic for Turing machines? There are certainly other ways of embodying the concept of symbol-manipulating computations -- one is for instance 'you, a pencil, and a stack of paper', which after all is the system Turing set out to imitate.
Indeed, people have come up with all sorts of clever schemes -- notably, Church's λ-calculus, register machines, μ-recursive functions, etc. And indeed, all of these can be shown to be equivalent with regards to their computational capacity: i.e. all of them can compute exactly the same things. In turn, they can all be emulated on Turing machines, and are able to emulate Turing machines themselves.
This is embodied in the notion of computational universality: every computationally universal system is capable of emulating every other (at most) computationally universal system.
The parenthetical 'at most' here references two things: first, there are systems 'less' than computationally universal, which trivially can be emulated by computationally universal systems as well; and second, it is at least conceivable that there may be systems that are 'more' than computationally universal. For instance, there exist problems that no computationally universal system can solve (about which, much more later), and one may postulate the existence of a system able to find a solution; such a system is typically termed a hypercomputer.
In this sense, there is no real difference between different kinds of realisations of computational universality, so I will simply use the term universal system as a catch-all in the future, or occasionally speak of Turing machines as a kind of pars pro toto for the whole class of universal systems.
A caveat ought to be attached here: in the definition of Turing machines, it was stated that the length of the tape had to be infinite; in general, any given computation may be arbitrarily complex, and thus, need arbitrarily great resources. Of course, in the real world, all resources we can muster are necessarily finite. This means that no true Turing machine can ever be built; at best, they exist as a limiting concept: whenever memory threatens to become scarce, more may be added, using (eventually) all of the resources available in the universe; the question of the finiteness of resources then becomes the question of the finiteness of the universe.
Nevertheless, automata that 'could be' Turing complete (which is a shorter way of saying 'equivalent to a universal Turing machine') if given sufficient memory can be built -- you're sitting at one now, though its architecture (based on work by the Hungarian-American mathematician John von Neumann) is rather different from that of a Turing machine.
Computational universality is the reason that it doesn't matter whether you read this on an Apple or Windows device, or on something else entirely: all of them, being able to emulate one another, yield essentially the same results (or at least, are capable of doing so -- the implementation can in practice be a bit crude occasionally).
Universality also accounts for the possibility of creating virtual machines, simulations of one computer on another. The most common example of this is given by the Java environment. Basically, one might imagine a 'Java machine', a Turing machine that takes inputs and produces outputs according to rules set by the Java programming language, and a 'translator', which tells the machine you use how to interpret inputs in such a way as the Java machine would do, and produce the appropriate outputs. Essentially, this creates a hierarchy-like structure: at bottom, your Windows-, OS X-, Linux- or whatever-system is happily chugging along doing its own idiosyncratic thing, while the translator accounts for the simulation of the same Java machine on a higher level, uniformly on all underlying platforms. (In general, though, the hierarchy is far more complicated than that: at the bottom lies the machine code, which is, essentially, your computer's 'native language', consisting mostly of maps between strings of 0s and 1s; 'above' this, compilers and interpreters translate between different programming languages; programs, machines in themselves, are executed on yet higher levels, in parallel or serially; some may implement virtual machines of varying sophistication, etc. -- but this is unimportant for the main point.)
This has the advantage that, given a proper translator, a programmer does not have to take into account possible differences in end-user systems, as the Java machine works the same way everywhere.
A final thought: earlier on, I had mentioned that 'you, a pencil, and a stack of paper' are just as much a computationally universal system as a Turing machine is -- actually, the pencil and the paper are just auxiliaries, they're not necessary functionally. But then, this means that you can emulate any other universal system -- and especially, do any computation a Turing machine can do, so there's really no reason to be afraid of maths. But even more than that -- if the universe itself is 'nothing but' a universal system, i.e. if it or its salient features can be simulated on a computer, then, at least in principle, everything that goes on within it can be emulated by the human brain -- meaning that the universe is, at least in principle, understandable. It is sometimes said that there is no reason to expect humans to be able to understand the universe any more than there is reason to expect dogs to understand calculus; here, then, there is such a reason.

A Difference to Make a Difference, Part II: Information and Physics

2011-07-04T15:05:00.000-07:00

The way I have introduced it, information is carried by distinguishing properties, i.e. properties that enable you to tell one thing from another. Thus, whenever you have two things you can tell apart by one characteristic, you can use this difference to represent one bit of information. Consequently, objects different in more than one way can be used to represent correspondingly more information. Think spheres that can be red, blue, green, big, small, smooth, coarse, heavy, light, and so on. One can in this way define a set of properties for any given object, the complete list of which determines the object uniquely. And similar to how messages can be viewed as a question-answering game (see the previous post), this list of properties, and hence, an object's identity, can be, too. Again, think of the game 'twenty questions'.
Consider drawing up a list of possible properties an object can have, and marking each with 1 or 0 -- yes or no -- depending on whether or not the object actually has it. This defines the two sides of a code -- on one side, a set of properties, the characterisation of an object; on the other side, a bit string representing information this object contains. (I should point out, however, that in principle a bit string is not any more related to the abstract notion of information than the list of properties is; in other words, it's wrong to think of something like '11001001' as 'being' information -- rather, it represents information, and since one side of a code represents the other, so does the list of properties, or any entry on it.)
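As a sketch of such a code in Python: fix a list of properties, and write 1 or 0 at each position depending on whether the object has the property. The particular list PROPERTIES below is a made-up example, as are the function names.

```python
# Hypothetical list of properties; its order fixes the meaning of each bit.
PROPERTIES = ["red", "big", "smooth", "heavy"]


def encode(object_properties):
    """Translate a set of properties into a bit string."""
    return "".join("1" if p in object_properties else "0" for p in PROPERTIES)


def decode(bits):
    """Translate a bit string back into the set of properties."""
    return {p for p, bit in zip(PROPERTIES, bits) if bit == "1"}


sphere = {"red", "heavy"}
print(encode(sphere))  # prints 1001
print(decode("1001") == sphere)  # prints True
```

Neither side of this code is privileged: the bit string represents the property list exactly as well as the property list represents the bit string.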
The usual point of view consists of considering the properties to be more fundamental, to be 'physically real'; after all, objects actually are red, green, big, small, heavy or light, etc. But one can just as well take the point of view that it's not the difference in properties that differentiates objects from one another, providing the possibility of storing information; but rather, that the differentiation is due to the difference in information content, that the properties an object has are just a representation of information, just like a string of bits is.
This viewpoint does not actually carry any ontological commitments yet -- it's fully dual to the 'real physical properties' one, and leads to the same descriptions, just with a different emphasis.
There is a subtlety here, though, and one that is easily overlooked. Consider the case in which the two indistinguishable spheres are not spheres, but rather identical cars, of the same make, model, color, and so on. And I do mean identical here, down to the last molecule. Now, if we again color one car, we can again store one bit; if we add more distinguishing properties, we can store correspondingly more information. But there are considerably more properties that one would already associate with the cars themselves than there were in the case of the spheres: cars have tires, doors, exhausts, windshield wipers, and much more. But all of those common properties are, in a certain sense, 'hidden': they are not usable to store information; in fact, the universe consisting of identical cars plus a few distinguishing properties is identical, with respect to an informational description, to the universe consisting of identical spheres plus those same distinguishing properties.
Properties thus are relational entities: only if objects are distinguishable from other objects by having them -- alternatively, if there are objects that don't have them -- do they enter a description such as the one proposed. This foreshadows an important ontological issue that I plan to revisit in a future post.
I have up to now treated entropy as a purely information-theoretical notion, but it has a very physical meaning: roughly, it quantifies the energy available to do useful work in thermodynamic processes. How does this relate to redundant or random strings of symbols?
To understand this, we first need to realise that, when we look at a physical system, we don't look at it at the 'symbol-level', but rather, at a highly compressed, coarse-grained one -- where the compression in this case is of a lossy kind, which is a compression such that in general, knowing only the compressed string, the original string can't be fully reconstructed, whereas all the compression schemes we have considered so far are lossless, i.e. knowing the compressed version and the method of compression (the code), it is always possible to perfectly reconstruct the original. Lossy compression is used if the details don't matter, in some sense -- for instance, the popular image format .jpg uses a lossy compression scheme, because we don't notice differences at the pixel level, so it's not necessary to store the precise color value of every single pixel. In a manner of speaking, lossy compression throws some information that is deemed inessential away.
Lossy compressions comprise what is called a 'many-to-one' mapping, while lossless schemes are called 'one-to-one'. The reason is simple: in a lossy scheme, there are many possibilities for the precise form of the original string, many original strings get mapped to one and the same compressed string; while in a lossless one, the original is uniquely determined -- each original gets mapped to only one compressed version. So, for instance, describing a string of n instances of the symbol a as 'na' is lossless, while describing a string of n random symbols as 'a string of n random symbols' is lossy: there are many possible original strings that fit this description.
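The difference between the two kinds of mapping can be made tangible with a small Python sketch. Run-length encoding stands in for the lossless 'na' description, and a summary that keeps only the length stands in for the lossy one; both functions are illustrative choices of mine, not the only way to realise either kind of scheme.

```python
from itertools import groupby


def rle_encode(text):
    """Lossless: describe each run of equal symbols as (count, symbol)."""
    return [(len(list(group)), symbol) for symbol, group in groupby(text)]


def rle_decode(pairs):
    """Recover the original string exactly from its run-length description."""
    return "".join(symbol * count for count, symbol in pairs)


def lossy_summary(text):
    """Lossy: keep only the length, so many strings share one summary."""
    return f"a string of {len(text)} symbols"


print(rle_encode("aaaaaaaa"))  # prints [(8, 'a')]
print(rle_decode(rle_encode("aaabccdd")) == "aaabccdd")  # prints True

# Two different originals, one and the same lossy description
print(lossy_summary("xqzt") == lossy_summary("abba"))  # prints True
```

From the run-length description the original can always be recovered; from the lossy summary, all one learns is which (large) class of strings the original belonged to.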
In physics, the relevant notions are macrostates and microstates. The microstate is analogous to the original symbol string, only that the symbols here are, for example, atoms and their configuration. The macrostate is the coarse-grained, lossy-compressed physical system we see. Roughly, the microstate corresponds to the complete description of a given system or object, and the macrostate is comprised of those properties we can perceive, and interact with; those that have an actual influence on us. The microstate determines the system's macroscopic properties, and thus, the informational description we ascribe to it. So for a system like a volume of gas, its microstate would include, for example, the complete specification of the position and velocity of all of its constituent atoms -- a huge amount of information --, while the macrostate consists of those variables that 'matter' to us, like temperature, volume, and pressure.
This means that the reason for the lossiness is simply our obliviousness towards the microscopic details: exchanging an atom here with an atom there does not make any difference to us; rather, the macroscopic variables we see are related to aggregate, as opposed to individual, properties of the microscopic constituents.
A system which has few microstates corresponding to a given macrostate, i.e. a system (or a particular state of a system) which is easily compressed to a human-manageable description, has a low entropy; a system that has many different microstates yielding one and the same macrostate, i.e. a system in which there is much loss in the compression, has high entropy.
There are many more high-entropy states for a given system than there are low-entropy ones. Consider a model system consisting of a sequence of 100 coin throws. The microstate is the detailed description of the sequence: heads, tails, tails, heads, tails... The macrostate is merely the total number of heads and tails. For the extreme cases of 100 heads, or 100 tails, there exists exactly one microstate -- knowledge of the macrostate characterises the microstate uniquely, entropy is minimal (think of the aaaaaaaa... string). For the case in which there are 50 heads and 50 tails, in any ordering, there are '100 choose 50', or about 10²⁹ possible microstates. It is thus for any given system much more likely to be in a high entropy state than it is to be in a low entropy one, simply by virtue of there being many more high entropy states, and thus, any evolution the system might undergo is (much) more likely to take it to states of ever higher entropy -- this is nothing else than the famous second law of thermodynamics.
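The counting is easily checked. The Python sketch below (the function name microstates is mine; math.comb requires Python 3.8 or later) computes the number of sequences for a few macrostates, and the share of all 2¹⁰⁰ possible sequences that have exactly 50 heads.

```python
import math


def microstates(heads, throws=100):
    """Number of distinct throw sequences with the given number of heads."""
    return math.comb(throws, heads)


for heads in (0, 10, 50, 100):
    print(heads, "heads:", microstates(heads), "microstates")

# Share of all 2**100 sequences that have exactly 50 heads
print(microstates(50) / 2**100)
```

Summing the counts over all possible numbers of heads gives back 2¹⁰⁰, the total number of sequences, as it must.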
It's important not to underestimate the probabilities at work here: already in the comparatively small example of 100 coins, there was an enormous difference between the number of high- and low-entropy states (and consequently, the number of evolutions going from high to low entropy versus the number of evolutions going from low to high entropy). Now imagine the magnitude of these numbers in systems of, say, one mole of gas -- which, for air, corresponds to a volume of roughly 23 liters, hardly a cosmic amount: it contains not 100, but rather, 6.022*10²³ constituents! The time scale on which even a minuscule reduction of entropy, brought about by pure chance alone, can reasonably be expected, exceeds the lifetime of the universe by many orders of magnitude.
As a side note, it's important to realise that this is not an empirical law as much as it is a law of probability, of logic -- it simply states, effectively, that more likely states occur more often. This simple, yet powerful statement is what inventors of purported perpetual motion machines are up against.
A system thus evolves from low entropy to high entropy states -- this evolution 'goes by itself', so to speak. Using this tendency, one can thus prepare the system in such a way that its evolution drives the evolution of some other system -- say, if the system is a volume of gas under pressure, its expansion may drive a piston in a combustion engine. But once the system has reached its maximum entropy, it stays there (at least, with overwhelming probability), and all evolution is limited to small thermal fluctuations. No energy can be extracted from it any more; one first would have to expend some to drive the system back to a lower entropy state, in order to be able to extract energy from it again.
The lower a system's entropy, thus, the more useful energy can be extracted from it.
So now you know -- whether or not you can extract energy from a system is related to whether or not it can be compressed losslessly; in other words, Diesel engines work because of fuel compression.

A First Glimpse of Holography
As we have seen, entropy (within a closed system), with an overwhelming likelihood, can only ever increase, or at best, stay constant. It measures the complexity of a system's microstate, i.e. the amount of information needed to uniquely specify it. Think back to the coin example: 100 bits -- i.e. the complete description of the sequence of throws, writing, say, 1 for heads and 0 for tails -- are needed to specify the highest entropy states; while just a few bits, the equivalent of, say, 'all 1' or 'all 0' (which could be, depending on the coding scheme, realized with just a single bit, 1 or 0) suffice to specify the states of lowest entropy. In general, a system which can be in any of k (micro-)states is said to have n = log₂(k) bits of entropy. So the set of states, i.e. coin throw sequences, described by 'as many heads as tails', of which we surmised there were roughly 10²⁹, has an entropy equal to log₂(10²⁹), which roughly works out to 96.3 -- close enough (to 100) for government work, as they say.
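To make the n = log₂(k) rule concrete, here is a small Python sketch that applies it to the coin example. The helper name `entropy_bits` is made up for illustration; it uses the exact binomial count rather than the rounded figure of 10²⁹, so the result comes out marginally different from the number quoted above.

```python
from math import comb, log2

def entropy_bits(n_microstates):
    """Entropy, in bits, of a macrostate realised by n_microstates microstates."""
    return log2(n_microstates)

if __name__ == "__main__":
    print("all heads:        ", entropy_bits(1))              # a single microstate
    print("50 heads, 50 tails:", entropy_bits(comb(100, 50)))
    print("any sequence:     ", entropy_bits(2**100))         # no constraint at all
```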
It's natural to check how this understanding holds up in extreme circumstances. Take a very big and very hot system -- a system with a very high entropy, in other words -- such as a star. Stars, though often used as a poetic metaphor for eternity and immutability, are nevertheless finite things -- at some point, they'll run out of steam, so to speak. Knowing what we know now about entropy and the second law of thermodynamics, this should not come as a shock: at some point, everything runs out of steam.
However, stars are essentially huge nuclear explosions in a precarious equilibrium: the power of their inner nuclear furnace, pushing outwards, counterbalances the gravitation of their own mass, which wants to concentrate itself as highly as possible. Thus, once the nuclear fire dies, gravitation wins out -- the star collapses.
The collapse itself is a complicated and fascinating process, and it is what gives rise to the phenomena of novae and supernovae, but for present purposes, all that matters is that 1) the star shrinks, and 2) entropy goes up (as it, of course, must).
Now, if the star is massive enough (roughly twenty times as massive as our own sun), it eventually shrinks down to a point where the gravity at its surface is so strong that the escape velocity -- the speed you need to impart to anything in order to have it escape from a body into deep space, as opposed to 'falling back down' -- exceeds that of light: thus, since the speed of light is the fastest anything can go, no radiation, no signals, nothing ever reaches the outside universe from beyond that point -- a black hole is born. The invisible boundary in spacetime that marks this 'point of no return' is called the event horizon. (This concept -- if not fleshed out to present sophistication, obviously -- is much older than is generally thought; already in the late 18th century, Reverend John Michell and French physicist and mathematician Pierre-Simon Laplace talked about hypothetical 'dark stars'.)
Now, it is a characteristic of black holes, treated with the machinery of Einstein's General Theory of Relativity, that they can be described using just a few numbers -- namely their mass, charge, and angular momentum; this is known as the no hair theorem [1].
We thus have quite a short description that applies to the black hole; compare this to the very messy, very complicated microstate of a collapsing star. It is clear that some very lossy compression has taken place; thus, we should expect for black holes to be very high entropy objects.
However, it is actually the case that in General Relativity, mass, charge and angular momentum characterise the complete microstate of the black hole -- thus, its entropy, as defined by the logarithm of the number of microstates, must be quite low!
What gives?
Well, here, a theorem by Stephen Hawking comes into play. He discovered in the 1970s [2] that in all the processes a black hole can undergo, its total surface area -- i.e. the area of its event horizon -- can only ever increase, or at best stay constant. Sounds familiar?
Indeed, it shows the same behaviour as entropy (of a closed system). One may thus postulate a relationship between horizon area and entropy, and indeed, it turns out that the most simple relationship -- a straightforward proportionality -- does the job [3]. Thus, whenever a system with some entropy is 'thrown into' a black hole, the black hole's horizon area increases by an amount proportional to that entropy -- the constant of proportionality (in Planck units) simply being equal to 1/4.
This value, known as the Bekenstein bound, places an upper limit on the amount of entropy within a given volume of spacetime, which is only reached by black holes (they 'saturate' this bound); this has important consequences, suggesting, among other things, that the amount of information in a finite volume of space must be finite, which is at odds with the assumption of spacetime itself being a continuous quantity, an issue which I will return to in a future post. (For those wanting to read ahead a little, an interested-layman level discussion by Bekenstein can be found in [4].)
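To get a feeling for the numbers involved, here is a rough Python sketch of the 'entropy equals a quarter of the horizon area in Planck units' rule. Several things in it are my own assumptions rather than anything derived in this post: it treats a non-rotating, uncharged black hole, takes the horizon radius to be the Schwarzschild radius 2GM/c², uses rounded values for the physical constants, and converts from natural units to bits by dividing by ln 2. The function names are purely illustrative.

```python
from math import log, pi

# Rounded physical constants in SI units (assumed values, good to ~3-4 digits).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
HBAR = 1.055e-34   # reduced Planck constant, J s
M_SUN = 1.989e30   # mass of the sun, kg

def horizon_area(mass_kg):
    """Horizon area of a non-rotating, uncharged black hole, in m^2."""
    radius = 2 * G * mass_kg / C**2   # Schwarzschild radius
    return 4 * pi * radius**2

def planck_area():
    """Planck length squared, in m^2."""
    return HBAR * G / C**3

def entropy_bits(mass_kg):
    """Horizon area in Planck units, divided by 4, converted to bits."""
    entropy_nats = horizon_area(mass_kg) / (4 * planck_area())
    return entropy_nats / log(2)

if __name__ == "__main__":
    print(f"one solar mass: {entropy_bits(M_SUN):.2e} bits")
    print(f"two solar masses: {entropy_bits(2 * M_SUN):.2e} bits")
```

Since the horizon radius grows linearly with the mass, doubling the mass quadruples the area and hence the entropy.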
There is something puzzling about this area-dependence: starting with a volume of gas, and adding more gas molecules to it, since entropy depends on the number of total states, which is in turn related to the volume the gas occupies (since the positions of the 'new' atoms can be arranged everywhere in that volume, and they can move freely within it), one should expect entropy to scale with the volume; but evidently, at some point, this expectation must break down.
This peculiar feature has led to the conjecture known as the holographic principle: the complete, three dimensional information within a given 'part' of space can be thought of as being encoded on the two-dimensional surface bounding it. The precise way in which this encoding works, though, is still subject to some debate.
Another question is left open: where does this entropy come from? Does a black hole have certain 'microstates' that account for it? If so, of what kind are they?
This is currently an active topic of research, which I plan to return to. For the moment, we can rest assured in the knowledge that the second law continues to hold, albeit in slightly modified form: the sum total of (thermodynamic) entropy and black hole horizon area can only ever increase, or at best stay constant.

References:
[1] Ruffini R. and Wheeler J. A.: Physics Today, 24, no. 12, 30 (1971)
[2] Hawking S.W.: Physical Review Letters, 26, 1344 (1971)
[3] Bekenstein J.D.: Lettere al Nuovo Cimento, 4, 737 (1972)
[4] Bekenstein J.D.: Information in the holographic universe, Scientific American, 289, no. 2, 58-65 (2003)

A Difference to Make a Difference, Part I: Introducing Information

2011-07-01T05:57:00.000-07:00

Picture a world in which there are only two things, and they're both identical -- let's say two uniform spheres of the same size and color, with no other distinguishable properties.
Now, ask yourself: How do you know there are two of them? (Apart from me telling you there are, that is.)
Most people will probably answer that they can just count the spheres, or perhaps that there's one 'over there', while the other's 'right here' -- but that already depends on the introduction of extra structure, something that allows you to say: "This is sphere number 1, while that is sphere number 2". Spatial separation, or the notion of position, is such extra structure: each sphere, in addition to being of some size and color, now also has a definite position -- a new property. But we said previously that the spheres don't have any properties in addition to size and color. So, obeying this, can you tell how many spheres there are?
The answer is, somewhat surprisingly, that you can't. In fact, you can't even distinguish between universes in which there is only one sphere, two identical ones, three identical ones etc. There is no fact of the matter differentiating between the cases where there are one, two, three, etc. spheres -- all identical spheres are thus essentially one and the same sphere.
This is what Leibniz (him again!) calls the identity of indiscernibles: whenever two objects hold all the same properties, they are in fact the same object.
Now consider the same two-sphere universe, but one sphere has been painted black. Suddenly, the task of determining how many spheres there are becomes trivial! There's two: the one that's been painted black, and the one that hasn't. But how has this simple trick upgraded the solution of this problem from impossible to child's play?
The answer is, of course, that now the two spheres are discernible. Now there exists a property that one sphere has, but the other sphere lacks: being black.
Now, what is this good for? Well, the interesting thing is that now, once you have distinguishable spheres (or distinguishable anythings, really), you can do stuff with them -- for instance, you can answer questions: the black sphere for no, the non-black sphere for yes. You can signal choices. Make decisions. If you work out a clever scheme, you can even communicate: a sequence of five spheres could correspond to a letter, or another orthographical sign -- writing b for black and n for not black, nnnnb could stand for 'a', nnnbn for 'b', nnnbb for 'c' and so on. For instance, nbnnn, nnbnb, nbbnn, nbbnn, nbbbb would mean 'hello' (I've just added the commas to improve legibility; they're not actually necessary, you could just as well count the number of ns and bs). Though cumbersome, anything you could wish to communicate can be communicated this way.
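The five-sphere code is simple enough to write down in a few lines of Python. This is only a sketch of the scheme described above: the function names `encode` and `decode` are illustrative, and it assumes lowercase letters a to z only, with a = 1, b = 2, and so on, written as a five-digit binary number where n stands for 0 and b for 1.

```python
def encode(text):
    """Turn each letter into a group of five spheres (n = not black, b = black)."""
    groups = []
    for ch in text.lower():
        index = ord(ch) - ord("a") + 1          # a -> 1, b -> 2, ..., z -> 26
        if not 1 <= index <= 26:
            raise ValueError(f"only the letters a-z are supported, got {ch!r}")
        bits = format(index, "05b")             # e.g. 8 -> '01000'
        groups.append(bits.replace("0", "n").replace("1", "b"))
    return groups

def decode(groups):
    """Invert encode(): read each group of five spheres as a binary number."""
    letters = []
    for group in groups:
        index = int(group.replace("n", "0").replace("b", "1"), 2)
        letters.append(chr(index + ord("a") - 1))
    return "".join(letters)

if __name__ == "__main__":
    spheres = encode("hello")
    print(", ".join(spheres))
    print(decode(spheres))
```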
And that's not the end of it -- utilising clever enough rules, you can use the spheres to make logical deductions, carry out computations, etc.; in fact, it's easy to see that one could use them to simulate one of the cellular automata I mentioned in the previous post, particularly a computationally universal one, which means that any computation that can be carried out at all can be carried out with these spheres.
All this -- communication, computation, decision making, etc. -- becomes possible only because the spheres have been made distinguishable from one another. Really, a difference that makes a world of difference!
Of course, most have probably spotted that the story so far has really been about bits, and hence, about information. In fact, this is where the title derives from -- Gregory Bateson, a noted pioneer of systems theory and cybernetics, characterized information as "a difference that makes a difference".
At first, this has little to do with how we define 'information' in our everyday lives. When we usually talk about 'getting information' about something, we mean acquainting ourselves with relevant facts. I'll later get back to what this has to do with 0s and 1s, or black and non-black spheres.
First, we must consider the question that stood at the origin of modern information theory: given a certain signal, how can we tell how much information is contained within it? Not knowing the code, it is not at all obvious that nbnnn, nnbnb, nbbnn, nbbnn, nbbbb and 'hello' contain the same information -- one is five times as long as the other, for starters.
To this end, two notions we will need are compressibility and predictability. It's clearest to just give examples: a highly uniform string (of symbols), such as aaaaaaaaaaaaaaaaaaaa, is highly compressible -- you could just as well give me the string '20 times a' or '20a' as a description of it, and I would get all the information that was contained in the original. On the other hand, a random, highly diverse string, such as wtokngmnoiakahklaijtg, is not really compressible in any obvious way, even though it is of the same length as the previous one.
The reason for this difference in compressibility lies in the difference in predictability between the two strings: for the first, predicting the next symbol on the basis of the previous ones becomes easier the more of the string you read; for the second, prediction is impossible (this, in fact, can be taken as a definition of randomness: a sequence of symbols is random if it is impossible to predict the next symbol based on only knowing the previous ones). The first string thus can be given a short description easily, while for the second, every description that contains the same information will on average be at least as long as the string itself. (Note the resemblance here to the story of Leibniz and the inkblots.)
Strings of symbols that can't be compressed very much, i.e. that are very unpredictable, are also said to have high entropy; conversely, highly redundant, compressible strings have low entropy.
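One hands-on way to see the difference in compressibility is to hand both kinds of string to an off-the-shelf compressor. The sketch below uses Python's `zlib` module as a stand-in for 'the shortest description'; that is an assumption of convenience, since a general-purpose compressor only gives an upper bound on how short a description can be. It also uses much longer strings than the twenty-odd characters in the text, because for very short inputs the compressor's fixed overhead swamps the effect. The helper names are illustrative.

```python
import random
import string
import zlib

def compressed_size(text):
    """Size in bytes of the zlib-compressed text (maximum compression level)."""
    return len(zlib.compress(text.encode("ascii"), 9))

def uniform_string(length):
    """A maximally redundant string: the letter a, repeated."""
    return "a" * length

def random_string(length, seed=0):
    """A pseudo-random string of lowercase letters (seeded for repeatability)."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

if __name__ == "__main__":
    n = 10_000
    print("uniform:", compressed_size(uniform_string(n)), "bytes")
    print("random: ", compressed_size(random_string(n)), "bytes")
```

The random string still shrinks somewhat, but only because each letter is stored in a full byte while one of 26 letters needs less than five bits; beyond that, there is nothing for the compressor to exploit.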
Another notion we will need is that of a code: basically, a code is a map between certain sets of symbols. We have already encountered one instance of a code, which mapped letters to patterns of black and non-black spheres, or alternatively, bits. Generally, the symbols on one 'side' of the code are assumed to be understood, to have a certain meaning, so that using the code, a meaningless string of symbols can be translated into a meaningful one (though how one string of symbols can be meaningful at all is a very difficult question, which we shall ignore for the time being). That way, the meaningless bs and ns of the previous example get mapped to the meaningful 'hello'. This is how 0s and 1s, strings of symbols, etc. connect to the everyday notion of information: applying a suitable code, such entities can be converted into something that has meaning to us -- such as sentences in English, for example. Thus, the notion of information we've been using so far is really related to the potential of meaningful information that a string can carry -- the amount of meaningful information that can be extracted using a suitable code.
Codes are generally not unique -- I could invent another mapping that takes the same meaningless string to a different meaningful one, so the meaningless string on its own does not suffice to determine both the code and the meaningful string it is intended to map to; this is the reason ancient, extinct languages are, without any point of reference such as, for instance, the Rosetta Stone, impossible to translate.
The key realisation now is this: for most codes, strings that have a high entropy carry more information than strings of the same size that have a lower entropy.
The reason lies in the notion of predictability. First, let's go back to the spheres. One of the first things we discovered distinguishable spheres let us do was answering questions. How did that work again? Well, to answer 'no', one could just show the black sphere; to answer 'yes', show the non-black one instead. The distinguishability of the spheres thus lets us decide alternatives: yes or no, non-black or black, 1 or 0. Now, any kind of message can in principle be rephrased in terms of a set of (yes/no) questions to answer. In fact, that's basically what our simple code does -- the first spot represents the answer to the question: "Is the letter found in the first fifteen letters of the alphabet?" (n for yes, b for no), the second spot answers "Of the remaining letters, is the sought one in the first seven?", and so on, until eventually, a single letter is picked out.
It's like the game 'twenty questions': you successively eliminate possibilities until at the end, only one remains. The question of how much information is contained in a message thus becomes the question of how many (yes/no) questions I can answer using it.
It's easy from here to arrive at a precise quantification of information content: Optimally, every answered question can cut the amount of possibilities in half. So, with one answered question, you can decide between two alternatives; with two answered questions, between four; three answers allow you to uniquely pick out one of eight objects, and so on -- generally, n answers -- n bits -- allow you to distinguish between 2ⁿ objects. Thus, in the game of 'twenty questions', the twenty answers allow you to distinguish between 2²⁰ possibilities -- which works out to a staggering number of 1,048,576!
Thus, any string that can decide between 1,048,576 alternatives is said to have an information content of 20 bits, or, more generally, if it can decide between k alternatives, its information content is log₂(k) bits (where log₂ is the base-2 logarithm, i.e. the inverse of the exponentiation operation 2ⁿ).
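The halving strategy can be spelled out as a short Python sketch. It assumes the alternatives are simply numbered 0, 1, 2, ... and that every question has the form 'is it in the lower half of what remains?'; the names `questions_needed` and `identify` are made up for the occasion.

```python
from math import ceil, log2

def questions_needed(n_alternatives):
    """Yes/no questions that suffice to single out one of n_alternatives."""
    return ceil(log2(n_alternatives))

def identify(secret, n_alternatives):
    """Find secret among 0 .. n_alternatives-1 by repeated halving.

    Returns the identified number and the number of questions asked.
    """
    low, high = 0, n_alternatives       # remaining candidates: low .. high-1
    asked = 0
    while high - low > 1:
        middle = (low + high) // 2
        asked += 1
        if secret < middle:             # "Is it in the lower half?" -- yes
            high = middle
        else:                           # -- no
            low = middle
    return low, asked

if __name__ == "__main__":
    print(questions_needed(2**20))
    print(identify(123456, 2**20))
    print(questions_needed(26))         # questions needed for one letter
```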
With every answered question, you learn something new about the sought letter, person, or object, and thus, about the content of the message.
This way, the communication of a message can be recast as a question-answering process; the more questions that are answered, the more detailed the message can be.
Now, for the string consisting just of twenty repetitions of the letter a, you learn nothing new with each symbol you read, no further questions are answered, since you could have predicted it with certainty in advance -- but with the random string, since every new symbol comes unexpected, you gain some knowledge with each one, and get new answers. A redundant string can't answer many questions, while a random one can -- in fact, it can answer the maximum amount of questions for a string of its length (written with a particular set of symbols); its entropy, and hence its information content, is maximal.
However, there's a caveat attached to this: it's only true on average. In principle, nothing keeps you from creating a code that maps the letter a to the complete works of Shakespeare, so that just one letter, despite not being able to answer many questions at all, is able to transmit quite a long and detailed message. However, you always end up paying for this 'illegal' gain in efficiency at some point, in having to use a highly incompressible, long and complex string in order to encode some comparatively short and simple message.
To see this, consider Borges' Library of Babel: it contains every book (and by book, I mean every possible combination of characters) of a certain length, say 130,000 characters for definiteness. You're faced with the task of cataloguing this awesome collection. So, you start simple: you give the book that is identical to Shakespeare's Hamlet (which contains about 130,000 letters) the number one, the book that differs from Hamlet in that the last character is an a (if it isn't in the original) the number two, the book whose last letter is a b the number three, and so on.
At first, you might think you've hit upon a highly efficient scheme -- after all, strings of 130,000 characters length get mapped to very short ones. But things will get bad, and go from bad to worse to terrible, rather quickly: for the books only differing in the last letter, you will need the numbers one to 26, for the books differing in the last two letters, the first 26*26 numbers, and finally, to catalogue all the books, you will need the numbers up to 26^130,000, which is a number with about 184,000 digits -- and thus, far longer than the books themselves are!
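The size of the largest catalogue number can be estimated without ever writing the number out, by using logarithms. A minimal Python sketch, assuming as in the text an alphabet of 26 letters and books of 130,000 characters (the function name `digits_needed` is illustrative):

```python
from math import floor, log10

def digits_needed(alphabet_size, book_length):
    """Number of decimal digits of alphabet_size ** book_length.

    Uses the fact that a positive integer N has floor(log10(N)) + 1 digits,
    and that log10(a ** b) = b * log10(a).
    """
    return floor(book_length * log10(alphabet_size)) + 1

if __name__ == "__main__":
    print(digits_needed(26, 1))        # 26 itself
    print(digits_needed(26, 2))        # 26 * 26 = 676
    print(digits_needed(26, 130_000))  # the full catalogue
```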
Your previously efficient coding scheme has come back to bite you: where at first you were able to save significant amounts of space, now you end up expending correspondingly more to make up for it. On average, there exists no magical space saver like that -- and thus, on average, it remains true that strings with a high entropy carry more information than low entropy ones.

Leibniz' Dream

2011-06-26T05:25:00.000-07:00

The Edge of Chaos
Gottfried Wilhelm Leibniz, who holds the distinction of being the only philosopher a biscuit was named after, once expressed the hope that every philosophical dispute may be settled by calculation:

"The only way to rectify our reasonings is to make them as tangible as those of the Mathematicians, so that we can find our error at a glance, and when there are disputes among persons, we can simply say: Let us calculate [calculemus], without further ado, to see who is right." [1]

To this end, he proposed two concepts, which were to be employed in this calculation: the characteristica universalis, which was to be a kind of conceptual language, able to symbolically represent concepts of mathematics, science, and metaphysics alike, consisting of what he called real characters, capable of directly corresponding to an idea, embodying it in the way a number embodies the idea of quantity, as opposed to merely referring to it, as words do; and the calculus ratiocinator, a system or device used to perform logical deductions within the framework set by the characteristica.

It is not precisely clear whether Leibniz intended for the ratiocinator to be an actual machine -- after all, Leibniz was one of the pioneers of mechanical calculation machines with the construction of the Stepped Reckoner --, or merely an abstract calculus, a forerunner to modern symbolic logic -- whether it was software or hardware, so to speak.

For present purposes, however, the distinction is somewhat immaterial, and we can consider the ratiocinator in the way Hartley Rogers did, as

"an algorithm which, when applied to the symbols of any formula of the characteristica universalis, would determine whether or not that formula were true as a statement of science." [2]

In modern terms, one would perhaps regard the characteristica as a kind of formal system, and the ratiocinator, understood as above, as a decision procedure for said system.
But rather than dwell on precisely what this means, I'll just tell you the punchline: thanks to pioneering work in logic and the theory of computation undertaken mainly in the first half of the 20th century, and most notably through the works of Kurt Gödel and Alan Turing, we now know that this can't work. In general, for any sufficiently powerful formal system, there exists no algorithmic procedure such that it can tell true from false statements in a finite amount of time.
It is this failure of Leibniz' dream that this blog will be most concerned with, though often in a hidden and somewhat roundabout way. This failure, far from being a crushing defeat to the search for reason and comprehensibility in the universe, actually opened up the door to some of the most fascinating developments and ideas of the past 100 years -- and perhaps, of the entirety of human history.
I am not merely talking here about things like the celebrated incompleteness theorem, due to Gödel, though that is certainly a big part of it, but rather of the more general cluster of phenomena centered roughly around notions of self-reference, such as computational universality, self-organizing systems, and criticality -- phenomena that occur along the so-called 'edge of chaos', the fine line that separates systems that are boring because their behaviour is simple, repetitive and predictable, from systems that are boring because their behaviour is random and patternless. It's along this line that most interesting things happen.
A very simple example, and by far not the most interesting one, is furnished by the behaviour of so-called cellular automata. A (one dimensional) cellular automaton is a kind of game -- a zero-player game that 'plays itself' -- that consists of a row of cells, each of which can be either on (usually denoted by painting the cell black) or off (leaving the cell white). Based on the pattern of on and off cells of the row and a simple rule characteristic of the automaton, the next line is calculated, then the next one from that, and so on. An example for a possible kind of rule would be that a cell is black if in the previous step either of its immediate neighbours was black; otherwise, it is left white.
There are 2³ = 8 possible patterns a cell and its immediate neighbours can produce (writing 1 for black and 0 for white, these are 000, 001, 010, 011, 100, 101, 110, 111), and since each rule assigns to each pattern either a black or a white cell in the following line, there are 2⁸ = 256 different elementary cellular automata. Thus, a number between 0 and 255 uniquely identifies each such automaton. This number is called its Wolfram code, after physicist and computer scientist Stephen Wolfram; much (much...) more about cellular automata and similar systems can be found in [3].
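For those who would rather experiment in code, here is a minimal Python sketch of an elementary cellular automaton driven by its Wolfram code. Two things in it are my own assumptions: cells beyond either end of the row are treated as permanently white, and the example rule from the text ('black if either neighbour was black') is read as an inclusive or. All function names are illustrative.

```python
def rule_table(code):
    """Map each (left, centre, right) pattern to the next state of the centre cell.

    Bit number p of the Wolfram code is the outcome for the pattern whose
    three cells, read as a binary number, equal p.
    """
    if not 0 <= code <= 255:
        raise ValueError("a Wolfram code lies between 0 and 255")
    table = {}
    for pattern in range(8):
        cells = ((pattern >> 2) & 1, (pattern >> 1) & 1, pattern & 1)
        table[cells] = (code >> pattern) & 1
    return table

def wolfram_code(rule):
    """Compute the Wolfram code of a rule given as a function of (left, centre, right)."""
    code = 0
    for pattern in range(8):
        left, centre, right = (pattern >> 2) & 1, (pattern >> 1) & 1, pattern & 1
        if rule(left, centre, right):
            code |= 1 << pattern
    return code

def step(cells, code):
    """Compute the next row; cells outside the row count as white (0)."""
    table = rule_table(code)
    padded = [0] + list(cells) + [0]
    return [table[(padded[i - 1], padded[i], padded[i + 1])]
            for i in range(1, len(padded) - 1)]

def render(cells):
    """Draw a row using # for black cells and . for white ones."""
    return "".join("#" if cell else "." for cell in cells)

if __name__ == "__main__":
    print("either neighbour black:", wolfram_code(lambda l, c, r: l or r))
    row = [0] * 30 + [1]               # a single black cell at the right edge
    for _ in range(15):
        print(render(row))
        row = step(row, 110)
```

Changing the 110 in the last line to any other number between 0 and 255 lets you browse the whole family of elementary automata.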
A typical example of a cellular automaton's evolution is this (whose Wolfram code is 50) (all CA pictures were generated with Wolfram|Alpha, by the way -- just type in 'rule' followed by a number if you fancy some experimentation):

Starting from random initial conditions, i.e. a random assignment of black and white cells, very quickly a boring pattern emerges: there is no point in tracing its evolution any further, nothing 'new' is going to happen.
At the other end of the spectrum, there are automata like this one (rule 45):

No apparent patterns emerge; the evolution is totally chaotic.
But between these extremes lies the edge of chaos, with automata like rule 110, which at first looks rather chaotic, like rule 45 above:

But, if we follow it for a little longer, it begins to exhibit unexpected features:

As can clearly be seen, certain localized structures emerge and persist, interacting occasionally. This automaton will neither repeat, nor decay into complete chaos -- rather, it will stay in that magical in-between zone, where interesting things happen. And in fact, in this case, there are more interesting things about rule 110 than meets the eye -- surprisingly, it is computationally universal, which means that one can use it to carry out any computation that can be carried out at all. Quite a feat for a simple coloring game!
Thus, the failure of Leibniz' dream is nothing to despair about -- in fact, had his dream proven realizable, we would live in a rather boring universe: everything would be predictable through rote calculation, through mindless manipulation of symbols. This way, at every corner, genuine novelty awaits; there is always potential for surprise and amazement in every area of study, or of life.
However, one must be careful not to rashly conclude that the universe is lawless, or that its fundamental nature (whatever that may mean) is unknowable. Indeed, the cellular automaton example blows such speculation out of the water: it is certainly a completely lawful, even deterministic, system, and the law it follows is simple enough. Yet nevertheless, certain questions about its evolution can't be answered using algorithmic methods of reasoning.
In the quest to explain the phenomena of the natural world, one often encounters two seemingly contrary stances: those that have a certain yearning for order, for analysis, getting to the bottom of things and solving the mysteries of nature; and on the other hand those that feel a loss in tearing away the veil and exposing the magician's tricks, that revel in mystery, to whom the label 'inexplicable' is not a sign of defeat, but rather the reassuring glimpse of something beyond human reason. To the latter, the former often seem cold, sterile, and in discarding anything not measurable, they feel that everything that makes life worth living is thrown out; while the former often think of the latter as being somewhat naive, even delusional, and see a weakness in the need to 'sugarcoat' objective (and sometimes, perhaps, harsh) realities with mystery and magic.
The failure of Leibniz' dream does not mean victory for the latter group of people. Rather, it means something far more interesting: namely, that the perceived dichotomy between lawfulness and mystery simply doesn't exist! A system's fundamental dynamic may be completely known, and even boring, but nevertheless, new phenomena emerge everywhere, bringing with them all the surprise, amazement and wonder one could hope for (without being overly greedy). Granted, it's not obvious through looking at cellular automata -- it takes a special sort of person to be continually amazed by little black and white squares. But bear with me.

Dramatis Personae (Introducing the Main Characters)
You've probably noticed the (perhaps somewhat overwrought) artwork in the header of this blog. This isn't just something that I thought looks pretty, rather, it is composed of certain parts that have some connection with the things I plan to write about. They're characters in Leibniz' sense, if you will, forming our own mini-characteristica. Every new blog post will be prefaced by one or more of them, indicating what it is going to be about -- a sort of visual tag. First, for easy reference, here's the picture again, reduced to the functional parts:

We'll start on the left. The spherical object you see there is actually composed of several spheres on different levels, plus some more decoration. In no particular order:

Quantum Mechanics: This is a representation of the state space of a quantum bit, or qubit for short, known as the Bloch sphere. A qubit, as the name indicates, is the quantum version of the fundamental unit of information, the bit (short for binary digit). Unlike its classical counterpart, it is not restricted to being strictly either 0 or 1 -- rather, it can enter into superpositions of both possibilities. Every point on the sphere's surface represents such a superposition. This capability is essentially what enables a quantum computer to accomplish certain tasks much faster than any classical one ever could. In a sense, the qubit can also be thought to stand for the most basic quantum mechanical system, a so-called 'two level'-system. No matter its actual physical realization, its state space will always be isomorphic to the picture. I will use this symbol to denote posts concerned chiefly with quantum mechanics.

Holographic Principle: On the next 'level', the sphere painted with (classical) bits represents the Bekenstein bound, or more generally, the holographic principle. Faced with the puzzle of where the information -- or more accurately, the entropy -- of matter falling into a black hole goes, Jacob Bekenstein made the quite surprising discovery that there is a maximum amount of entropy that can be 'stored' within a given part of space, and that black holes saturate this bound (they were previously thought to be rather low-entropy objects). Interestingly, that maximum amount turns out to be proportional to the area of the surface of the part of space, rather than to its volume. Thus, it was conjectured that the information about the matter that has fallen into a black hole is stored on the black hole's surface, its event horizon, much the same way three dimensional pictures are encoded on two dimensional surfaces using holography -- this is, essentially, the so-called holographic principle. I will use this icon whenever posts deal chiefly with holography, entropy, or things like the so-called AdS/CFT correspondence (an explicit realization of holography in string theory).

Information Theory: Moving on, the strange patterns on the sphere's surface, which already look somewhat familiar from the earlier pictures in this post, are indeed related to cellular automata, or, more specifically, a particular automaton known as Conway's Game of Life, or just Life among the cognoscenti. Unlike previous examples, Life is a two dimensional automaton; like rule 110 above, it is also computationally universal. The pictures are examples of a particular structure known as a glider: a pattern that, after a few steps of evolution, returns to its original configuration, having moved itself a few steps across the grid in the process. Here's an animation of a glider in motion. A Java implementation of Life can be found here. This icon will mainly preface posts dealing with information theory and related subjects.

Particle Physics: This little graph, painted across the gliders, is (part of) a so-called Feynman diagram. Feynman diagrams are a pictorial way of representing elementary particle interactions -- the somewhat tongue-in-cheek implication here being that the gliders on the sphere could be seen as some kind of 'elementary particles' in a cellular automaton world. More specifically, this is the so-called QED vertex -- an electron comes in, emits a photon, and moves on. This is the basic interaction of the theory of quantum electrodynamics, which is the quantum theory accounting for all electromagnetic phenomena -- which includes almost all the phenomena we experience in our everyday lives (the phenomena related to gravity would be the other big category). This icon will stand more generally for quantum field theory and particle physics.

General Relativity/Spacetime: This picture is probably familiar to most -- a spherical mass, together with a schematic representation of its gravitational field. Thanks to Einstein and his general theory of relativity, we know that what we call the gravitational force really is an effect of the geometry of spacetime. As John Wheeler put it, "mass tells space-time how to curve, and space-time tells mass how to move". This icon will stand for general relativity in particular, but also for theories and considerations on the nature of spacetime in general.

Emergence: This picture, looking somewhat like a depiction of magnetic field lines, is actually a schematic representation of the momentum space topology of a condensed matter system known as a Fermi liquid (or to be more exact, a certain universality class thereof). Let's take this slowly: Fermions are particles whose spin (roughly, a quantum mechanical analogue of angular momentum) can take only half-integral values, i.e. 1/2, 3/2, etc. They obey the so-called Pauli exclusion principle, which means that no two of them can be in the same quantum state at the same time. Thus, if you pack a bunch of them together, the available energy levels get filled from the bottom up, such that below a certain energy, all levels are filled, and above it, none are. This boundary -- called the Fermi surface -- between occupied and non-occupied states is what the sphere in the picture represents.
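The filling procedure can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions: the fermions are spinless and do not interact, the allowed momenta sit on a small two-dimensional integer grid, and the energy of a state is simply kx^2 + ky^2 in arbitrary units. None of this is specific to the system in the picture, and the function names are hypothetical. In two dimensions the 'Fermi surface' is a circle rather than a sphere, but the principle is the same.

```python
from itertools import product

def energy(k):
    """Toy free-particle energy of the momentum state k = (kx, ky)."""
    kx, ky = k
    return kx**2 + ky**2

def fill_fermi_sea(n_fermions, k_max=5):
    """Put one fermion into each of the n_fermions lowest-energy states.

    Returns (occupied, empty) lists of momentum states. The grid must hold
    at least n_fermions states, i.e. (2 * k_max + 1)**2 >= n_fermions.
    """
    states = list(product(range(-k_max, k_max + 1), repeat=2))
    states.sort(key=energy)  # Pauli exclusion: one fermion per state, lowest first
    return states[:n_fermions], states[n_fermions:]

occupied, empty = fill_fermi_sea(21)
fermi_energy = max(energy(k) for k in occupied)
lowest_empty = min(energy(k) for k in empty)
print(fermi_energy, lowest_empty)
```

With 21 fermions, every state up to energy 5 is occupied and the lowest empty state has energy 8; the Fermi surface is the boundary between the two.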
Typically, the fermions in such a system will be coupled to each other, as in this system of mass points and springs. Each disturbance of the system will thus propagate, like ripples on a pond. These ripples again will have a certain energy, and it is their energy the 'field lines' symbolize. In some systems, the ripples will have an energy at least some fixed amount 'above' the Fermi energy boundary -- these are called fully gapped systems. However, more interesting things happen in systems like the one depicted, which exhibit so-called Fermi points, where the energy of the excitations vanishes. The story is a bit longer and more complicated, but as it turns out, the excitations in the vicinity of these Fermi points very much resemble the elementary particles of our universe! I will use this icon to stand for condensed matter physics specifically, or the notion of emergence in general.

Philosophy: Moving on to the other side, the first thing that leaps to attention is a silhouetted head with a blank where the brain ought to be. However, this should not be taken to suggest mindlessness! Quite to the contrary, the white space is more analogous to an empty thought bubble, the form of a thought, waiting to be filled with content. It is the state of mind before thought: ready and anticipating. This icon will stand for philosophy, and especially for the philosophy of mind.

Computation: This complicated apparatus is a part of Charles Babbage's analytical engine, the planned, but never completed, successor to his difference engine, an early automated calculating machine. Broader in scope and capacity than the difference engine, the analytical engine was the first machine to be computationally universal -- or rather, it would have been, had it ever been built. It was fully programmable -- in fact, a program to calculate Bernoulli numbers using the analytical engine, written by Lady Ada Lovelace, daughter of Lord Byron (yes, the first programmer was a woman), still survives. Since it depicts the first computer, this icon will stand for computation, in theory or in practice.

Updates/Internal Matters: Finally, the inkblot represents -- well, an inkblot. Ink spilled from a carelessly handled quill. Nothing more to it.
Well, that's not quite true, perhaps. In fact, there's an interesting story about inkblots and Leibniz that I would be remiss not to relate here. In his Discours de métaphysique [4], Leibniz asks the reader to imagine a page that has been spattered with ink. He then notes that there is always a curve that passes through these points -- even though their distribution is completely random. It seems thus that merely having a mathematical description of something does not imply that it actually follows some rule. But then, how should one tell 'lawful' from 'lawless' systems?
Leibniz' resolution is roughly the following: if there is nothing that can be gained from describing the set of randomly distributed points through a mathematical law -- i.e. if the purported law itself is as complicated as the spatter-pattern -- then there is no merit in the mathematical description; the pattern is effectively random. This foreshadows important concepts that we will meet again on this blog, such as algorithmic information theory and Kolmogorov complexity.
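Both halves of this argument can be tried out in a small Python sketch. The first part uses Lagrange interpolation to build a polynomial through a handful of made-up 'ink spots'; note that the resulting 'law' needs as many coefficients as there are points, so nothing is gained. The second part uses zlib compression as a crude stand-in for the length of the shortest description. This is an assumption on my part: Kolmogorov complexity itself is uncomputable, and a compressor only gives an upper bound. All names and data here are invented for illustration.

```python
import random
import zlib
from fractions import Fraction

def lagrange_interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through all points.

    Assumes the x-coordinates of the points are pairwise distinct.
    """
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total

def compressed_size(data):
    """Length in bytes of the zlib-compressed data (a rough complexity proxy)."""
    return len(zlib.compress(data, 9))

# Part 1: there is always a curve through the spatter.
spatter = [(0, 3), (1, -1), (2, 4), (5, 1), (7, 6)]
hits_every_blot = all(lagrange_interpolate(spatter, px) == py for px, py in spatter)
print(hits_every_blot)

# Part 2: a 'lawful' sequence has a short description, a 'lawless' one does not.
rng = random.Random(0)  # fixed seed so the run is repeatable
lawless = bytes(rng.getrandbits(8) for _ in range(10_000))
lawful = bytes(i % 7 for i in range(10_000))
print(compressed_size(lawful), compressed_size(lawless))
```

The repeating sequence shrinks to a tiny fraction of its length, while the pseudo-random one does not shrink at all. (Strictly speaking, the pseudo-random bytes do have a short description, namely the generator and its seed, which zlib simply fails to find; this is exactly why compression is only an upper bound.)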
For the moment, however, this particular inkblot is just an inkblot; I will use it simply to denote posts that feature updates, personal notes, or any other flotsam and jetsam that does not fit with the things I am planning to post, but that I nevertheless want to write about.

Dénouement
If all of this seemed a bit much, don't worry -- consider this post more as an extended table of contents; it's meant just to name, not to fully explain, the topics and themes this blog will revolve around. I hope what little explanation I gave was sufficient to convey at least a taste of things to come, so to speak.
Also, if this post appealed to you at all, then there's a good chance forthcoming ones may, too; I will naturally go into more depth on the subjects in blog posts dealing with them explicitly, but on the whole, I plan neither to get overly technical nor to distort the subject in an attempt to spoon-feed you appropriate soundbites. This is a fine line to walk -- it's a bit like the edge of chaos, again -- and I'm very open to any and all feedback, or further questions.
The main aim of this blog is to collect and present interesting ideas, some of which are just now developing at the vanguard of science (like the holographic principle), some of which have already been around for a while, sometimes longer than is generally believed (like computation), all with an eye towards exploring the common currents underlying them.
However, everything you read here should be regarded the same way you should regard stuff you read on the internet in general -- skepticism is always healthy, and especially warranted where my own views diverge from the mainstream, or I discuss my own ideas, which I will always strive to point out.
I'll be somewhat more 'formal' than most blogs in that I'm going to source things -- i.e. for certain claims or quotes, point out where they are taken from, as in the 'References' section below. This isn't done out of academic stiffness -- the reason is merely that if you consider something I write interesting, you won't have to dig around looking for resources to read more about it. Whenever possible, I'll include a link to texts available online.
I think that's about all I wanted to say in this introduction -- well, actually, it's rather more than I originally planned to say. I had not set out to write about cellular automata at all, for instance. This is characteristic of systems on the edge of chaos: they tend to develop their own momentum. Let's hope it will lead to interesting things.

References:
[1] Leibniz, Gottfried Wilhelm, The Art of Discovery 1685, Wiener 51
[2] Hartley Rogers, Jr. 1963, An Example in Mathematical Logic, The American Mathematical Monthly, Vol. 70, No. 9, pp. 929–945
[3] Wolfram, Stephen, A New Kind of Science. Wolfram Media, Inc., May 14, 2002. ISBN 1-57955-008-8
[4] Leibniz, Gottfried Wilhelm. Discourse on Metaphysics and the Monadology, 1686, parts V and VI; online text