Monday, 4 July 2011

A Difference to Make a Difference, Part II: Information and Physics

The way I have introduced it, information is carried by distinguishing properties, i.e. properties that enable you to tell one thing from another. Thus, whenever you have two things you can tell apart by one characteristic, you can use this difference to represent one bit of information. Consequently, objects different in more than one way can be used to represent correspondingly more information. Think spheres that can be red, blue, green, big, small, smooth, coarse, heavy, light, and so on. One can in this way define a set of properties for any given object, the complete list of which determines the object uniquely. And similar to how messages can be viewed as a question-answering game (see the previous post), this list of properties, and hence, an object's identity, can be, too. Again, think of the game 'twenty questions'.
Consider drawing up a list of possible properties an object can have, and marking each with 1 or 0 -- yes or no -- depending on whether or not the object actually has it. This defines the two sides of a code -- on one side, a set of properties, the characterisation of an object; on the other side, a bit string representing information this object contains. (I should point out, however, that in principle a bit string is not any more related to the abstract notion of information than the list of properties is; in other words, it's wrong to think of something like '11001001' as 'being' information -- rather, it represents information, and since one side of a code represents the other, so does the list of properties, or any entry on it.)
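The two sides of such a code can be sketched in a few lines of Python. This is only an illustration: the property list and the names `PROPERTIES`, `encode` and `decode` are made up for the example, and any other fixed ordering of properties would do just as well.

```python
# Hypothetical, fixed list of properties; the order defines the code.
PROPERTIES = ["red", "big", "smooth", "heavy"]

def encode(obj_properties):
    """Map a set of properties to a bit string, one bit per list entry."""
    return "".join("1" if p in obj_properties else "0" for p in PROPERTIES)

def decode(bits):
    """Map a bit string back to the set of properties it stands for."""
    return {p for p, b in zip(PROPERTIES, bits) if b == "1"}

sphere = {"red", "heavy"}
code = encode(sphere)
print(code)                    # 1001
print(decode(code) == sphere)  # True
```

Neither side is privileged here: the bit string and the set of properties determine each other completely, which is all that 'representing the same information' amounts to.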
The usual point of view consists of considering the properties to be more fundamental, to be 'physically real'; after all, objects actually are red, green, big, small, heavy or light, etc. But one can just as well take the point of view that it's not the difference in properties that differentiates objects from one another, providing the possibility of storing information; but rather, that the differentiation is due to the difference in information content, that the properties an object has are just a representation of information, just like a string of bits is.
This viewpoint does not actually carry any ontological commitments yet -- it's fully dual to the 'real physical properties' one, and leads to the same descriptions, just with a different emphasis.
There is a subtlety here, though, and one that is easily overlooked. Suppose the two indistinguishable spheres were not spheres, but rather identical cars, of the same make, model, color, and so on. And I do mean identical here, down to the last molecule. Now, if we again color one car, we can again store one bit; if we add more distinguishing properties, we can store correspondingly more information. But there are considerably more properties that one would already associate with the cars themselves than there were in the case of the spheres: cars have tires, doors, exhausts, windshield wipers, and much more. All of those common properties, however, are in a certain sense 'hidden': they are not usable to store information. In fact, the universe consisting of identical cars plus a few distinguishing properties is identical, with respect to an informational description, to the universe consisting of identical spheres plus those same distinguishing properties.
Properties thus are relational entities: only if objects are distinguishable from other objects by having them -- alternatively, if there are objects that don't have them -- do they enter a description such as the one proposed. This foreshadows an important ontological issue that I plan to revisit in a future post.
I have up to now treated entropy as a purely information-theoretical notion, but it has a very physical meaning: roughly, it quantifies how much of a system's energy is unavailable for doing useful work in thermodynamic processes. How does this relate to redundant or random strings of symbols?
To understand this, we first need to realise that, when we look at a physical system, we don't look at it at the 'symbol-level', but rather, at a highly compressed, coarse-grained one. The compression in this case is of a lossy kind: in general, knowing only the compressed string, the original string can't be fully reconstructed. All the compression schemes we have considered so far, by contrast, are lossless, i.e. knowing the compressed version and the method of compression (the code), it is always possible to perfectly reconstruct the original. Lossy compression is used if the details don't matter, in some sense -- for instance, the popular image format .jpg uses a lossy compression scheme, because we don't notice small differences at the pixel level, so it's not necessary to store the precise color value of every single pixel. In a manner of speaking, lossy compression throws away some information that is deemed inessential.
Lossy compressions comprise what is called a 'many-to-one' mapping, while lossless schemes are called 'one-to-one'. The reason is simple: in a lossy scheme, there are many possibilities for the precise form of the original string, since many original strings get mapped to one and the same compressed string; while in a lossless one, the original is uniquely determined -- each compressed version corresponds to exactly one original. So, for instance, describing a string of n instances of the symbol a as 'na' is lossless, while describing a string of n random symbols as 'a string of n random symbols' is lossy: there are many possible original strings that fit this description.
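To make the distinction concrete, here is a minimal sketch in Python. Run-length encoding stands in for the 'na' description, and 'keep only the length' stands in for 'a string of n random symbols'; the function names are my own choice for the example.

```python
from itertools import groupby

def rle_encode(s):
    """Lossless: store each run of equal symbols as (symbol, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    """Invert rle_encode exactly."""
    return "".join(ch * n for ch, n in pairs)

def lossy_describe(s):
    """Lossy: keep only the length and forget which symbols occurred."""
    return len(s)

original = "aaaaaaaa"
print(rle_encode(original))                          # [('a', 8)]
print(rle_decode(rle_encode(original)) == original)  # True: one-to-one

# Many-to-one: different originals share the same lossy description.
print(lossy_describe("abba") == lossy_describe("baab"))  # True
```

From the output of `lossy_describe` alone there is no way to tell 'abba' from 'baab', whereas `rle_decode` always gets the original back.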
In physics, the relevant notions are macrostates and microstates. The microstate is analogous to the original symbol string, only that the symbols here are, for example, atoms and their configuration. The macrostate is the coarse-grained, lossy-compressed physical system we see. Roughly, the microstate corresponds to the complete description of a given system or object, and the macrostate is comprised of those properties we can perceive, and interact with; those that have an actual influence on us. The microstate determines the system's macroscopic properties, and thus, the informational description we ascribe to it. So for a system like a volume of gas, its microstate would include, for example, the complete specification of the position and velocity of all of its constituent atoms -- a huge amount of information --, while the macrostate consists of those variables that 'matter' to us, like temperature, volume, and pressure.
This means that the reason for the lossiness is simply our obliviousness towards the microscopic details: exchanging an atom here with an atom there does not make any difference to us; rather, the macroscopic variables we see are related to aggregate, as opposed to individual, properties of the microscopic constituents.
A system which has few microstates corresponding to a given macrostate, i.e. a system (or a particular state of a system) which is easily compressed to a human-manageable description, has a low entropy; a system that has many different microstates yielding one and the same macrostate, i.e. a system in which there is much loss in the compression, has high entropy.
There are many more high-entropy states for a given system than there are low-entropy ones. Consider a model system consisting of a sequence of 100 coin throws. The microstate is the detailed description of the sequence: heads, tails, tails, heads, tails... The macrostate is merely the total number of heads and tails. For the extreme cases of 100 heads, or 100 tails, there exists exactly one microstate -- knowledge of the macrostate characterises the microstate uniquely, entropy is minimal (think of the aaaaaaaa... string). For the case in which there are 50 heads and 50 tails, of any ordering, there are '100 choose 50', or about 10^29 possible microstates. It is thus for any given system much more likely to be in a high entropy state than it is to be in a low entropy one, simply by virtue of there being many more high entropy states, and thus, any evolution the system might undergo is (much) more likely to take it to states of ever higher entropy -- this is nothing else than the famous second law of thermodynamics.
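The counting in the coin example is easy to check directly. The following sketch uses Python's `math.comb`; the helper name `microstates` is just a label chosen for this example.

```python
from math import comb

N = 100  # number of coin throws in the model system

def microstates(heads, n=N):
    """Number of distinct sequences of n throws with this many heads."""
    return comb(n, heads)

print(microstates(0))            # 1: only 'all tails'
print(f"{microstates(50):.3e}")  # roughly 1e29

# Share of all 2**N sequences whose number of heads lies between 40 and 60.
near_even = sum(microstates(h) for h in range(40, 61)) / 2**N
print(f"{near_even:.3f}")
```

The last number shows that the overwhelming majority of all possible sequences sit close to the 50/50 macrostate, which is the whole content of the argument above.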
It's important not to underestimate the probabilities at work here: already in the comparatively small example of 100 coins, there was an enormous difference between the number of high- and low-entropy states (and consequently, the number of evolutions going from high to low entropy versus the number of evolutions going from low to high entropy). Now imagine the magnitude of these numbers in systems of, say, one mole of gas -- which, for air, corresponds to a volume of roughly 23 liters, hardly a cosmic amount: it contains not 100, but rather, 6.022 × 10^23 constituents! The time scale on which even a minuscule reduction of entropy, brought about by pure chance alone, can reasonably be expected, exceeds the lifetime of the universe by many orders of magnitude.
As a side note, it's important to realise that this is not an empirical law as much as it is a law of probability, of logic -- it simply states, effectively, that more likely states occur more often. This simple, yet powerful statement is what inventors of purported perpetual motion machines are up against.
A system thus evolves from low entropy to high entropy states -- this evolution 'goes by itself', so to speak. Using this tendency, one can thus prepare the system in such a way that its evolution drives the evolution of some other system -- say, if the system is a volume of gas under pressure, its expansion may drive a piston in a combustion engine. But once the system has reached its maximum entropy, it stays there (at least, with overwhelming probability), and all evolution is limited to small thermal fluctuations. No energy can be extracted from it any more; one first would have to expend some to drive the system back to a lower entropy state, in order to be able to extract energy from it again.
The lower a system's entropy, thus, the more useful energy can be extracted from it.
So now you know -- whether or not you can extract energy from a system is related to whether or not it can be compressed losslessly; in other words, Diesel engines work because of fuel compression.

A First Glimpse of Holography
As we have seen, entropy (within a closed system), with an overwhelming likelihood, can only ever increase, or at best, stay constant. It measures the complexity of a system's microstate, i.e. the amount of information needed to uniquely specify it. Think back to the coin example: 100 bits -- i.e. the complete description of the sequence of throws, writing, say, 1 for heads and 0 for tails -- are needed to specify the highest entropy states; while just a few bits, the equivalent to, say 'all 1', or 'all 0' (which could be, depending on the coding scheme, realized with just a single bit, 1 or 0) suffice to specify the states of lowest entropy. In general, a system which can be in any of k (micro-)states is said to have n = log2(k) bits of entropy. So the set of states, i.e. coin throw sequences, described by 'as many heads as tails', of which we surmised there were roughly 10^29, has an entropy equal to log2(10^29), which roughly works out to 96.3 -- close enough (to 100) for government work, as they say.
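The rule n = log2(k) can be checked for the coin example in a few lines of Python. The function name `entropy_bits` is my own; it simply applies the definition to a count of microstates.

```python
from math import comb, log2

def entropy_bits(k):
    """Entropy, in bits, of a macrostate comprising k microstates."""
    return log2(k)

print(entropy_bits(1))                         # 0.0: 'all heads'
print(round(entropy_bits(comb(100, 50)), 1))   # 96.3: 'as many heads as tails'
print(entropy_bits(2**100))                    # 100.0: any sequence whatsoever
```

The middle value is the figure quoted above; the last one is the entropy if nothing at all is known about the sequence, which is why 100 bits is the ceiling for this system.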
It's natural to check how this understanding holds up in extreme circumstances. Take a very big and very hot system -- a system with a very high entropy, in other words -- such as a star. Stars, though often used as a poetic metaphor for eternity and immutability, are nevertheless finite things -- at some point, they'll run out of steam, so to speak. Knowing what we know now about entropy and the second law of thermodynamics, this should not come as a shock: at some point, everything runs out of steam.
However, stars are essentially huge nuclear explosions in a precarious equilibrium: the power of their inner nuclear furnace, pushing outwards, counterbalances the gravitation of their own mass, which wants to concentrate itself as highly as possible. Thus, once the nuclear fire dies, gravitation wins out -- the star collapses.
The collapse itself is a complicated and fascinating process, and it is what gives rise to the phenomena of novae and supernovae, but for present purposes, all that matters is that 1) the star shrinks, and 2) entropy goes up (as it, of course, must).
Now, if the star is massive enough (roughly twenty times as massive as our own sun), it eventually shrinks down to a point where the gravity at its surface is so strong that the escape velocity -- the speed you need to impart to anything in order to have it escape from a body into deep space, as opposed to 'falling back down' -- exceeds that of light. Thus, since the speed of light is the fastest anything can go, no radiation, no signals, nothing ever reaches the outside universe from beyond that point -- a black hole is born. The invisible boundary in spacetime that marks this 'point of no return' is called the event horizon. (This concept -- if not fleshed out to present sophistication, obviously -- is much older than is generally thought; already in the late eighteenth century, Reverend John Michell and French physicist and mathematician Pierre-Simon Laplace talked about hypothetical 'dark stars'.)
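To get a feeling for the sizes involved, one can redo the old 'dark star' estimate: set the Newtonian escape velocity, sqrt(2GM/r), equal to the speed of light and solve for r. The sketch below does this in Python. The constants are rounded textbook values that I am supplying for the example, and the sun's mass is used only as a convenient unit (the sun itself is too light to collapse this way). The Newtonian argument is only a heuristic, but the radius it gives, r = 2GM/c^2, happens to coincide with the horizon radius of a non-rotating black hole in General Relativity.

```python
from math import sqrt

# Rounded physical constants in SI units (assumed values for this sketch).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # mass of the sun, kg

def escape_velocity(mass, radius):
    """Newtonian escape velocity from the surface of a body."""
    return sqrt(2 * G * mass / radius)

def critical_radius(mass):
    """Radius at which the Newtonian escape velocity equals C."""
    return 2 * G * mass / C**2

r = critical_radius(M_SUN)
print(f"{r / 1000:.2f} km")           # just under 3 km for one solar mass
print(escape_velocity(M_SUN, r) / C)  # 1, up to rounding
```

The radius grows in proportion to the mass, so an object of ten solar masses would have to be squeezed to within about 30 km.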
Now, it is a characteristic of black holes, treated with the machinery of Einstein's General Theory of Relativity, that they can be described using just a few numbers -- namely their mass, charge, and angular momentum; this is known as the no-hair theorem [1].
We thus have quite a short description that applies to the black hole; compare this to the very messy, very complicated microstate of a collapsing star. It is clear that some very lossy compression has taken place; thus, we should expect for black holes to be very high entropy objects.
However, it is actually the case that in General Relativity, mass, charge and angular momentum characterise the complete microstate of the black hole -- thus, its entropy, as defined by the logarithm of the number of microstates, must be quite low!
What gives?
Well, here, a theorem by Stephen Hawking comes into play. He discovered in the 1970s [2] that in all the processes a black hole can undergo, its total surface area -- i.e. the area of its event horizon -- can only ever increase, or at best stay constant. Sounds familiar?
Indeed, it shows the same behaviour as entropy (of a closed system). One may thus postulate a relationship between horizon area and entropy, and indeed, it turns out that the simplest relationship -- a straightforward proportionality -- does the job [3]. A black hole's entropy is proportional to its horizon area, the constant of proportionality (in Planck units) simply being equal to 1/4. Thus, whenever a system with some entropy is 'thrown into' a black hole, the horizon area grows by at least enough to make up for the entropy that has vanished behind the horizon.
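To see what 'one quarter of the area in Planck units' means in practice, here is a rough numerical sketch in Python. It assumes a non-rotating, uncharged black hole, whose horizon is a sphere of radius 2GM/c^2, and uses rounded constants that I am supplying for the example. The area law gives the entropy in natural units (nats); dividing by ln 2 converts it to bits.

```python
from math import pi, log

# Rounded physical constants in SI units (assumed values for this sketch).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
HBAR = 1.055e-34   # reduced Planck constant, J s
M_SUN = 1.989e30   # mass of the sun, kg

def horizon_area(mass):
    """Horizon area of a non-rotating, uncharged black hole, in m^2."""
    radius = 2 * G * mass / C**2
    return 4 * pi * radius**2

def planck_area():
    """Planck length squared, in m^2."""
    return HBAR * G / C**3

def black_hole_entropy(mass):
    """Entropy in nats: horizon area in Planck units, divided by 4."""
    return horizon_area(mass) / (4 * planck_area())

s = black_hole_entropy(M_SUN)
print(f"{s:.2e} nats, {s / log(2):.2e} bits")  # of order 1e77
```

Since the area grows with the square of the mass, doubling the mass quadruples the entropy.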
This relationship implies an upper limit, known as the Bekenstein bound, on the amount of entropy within a given volume of spacetime, which is only reached by black holes (they 'saturate' this bound). This has important consequences: it suggests, among other things, that the amount of information in a finite volume of space must be finite, which is at odds with the assumption of spacetime itself being a continuous quantity, an issue which I will return to in a future post. (For those wanting to read ahead a little, an interested-layman level discussion by Bekenstein can be found in [4].)
There is something puzzling about this area-dependence. Start with a volume of gas and add more gas molecules to it: entropy depends on the total number of states, which is in turn related to the volume the gas occupies (since the 'new' atoms can be placed anywhere in that volume, and can move freely within it). One should therefore expect entropy to scale with the volume; but evidently, at some point, this expectation must break down.
This peculiar feature has led to the conjecture known as the holographic principle: the complete, three dimensional information within a given 'part' of space can be thought of as being encoded on the two-dimensional surface bounding it. The precise way in which this encoding works, though, is still subject to some debate.
Another question is left open: where does this entropy come from? Does a black hole have certain 'microstates' that account for it? If so, of what kind are they?
This is currently an active topic of research, which I plan to return to. For the moment, we can rest assured in the knowledge that the second law continues to hold, albeit in slightly modified form: the sum total of (thermodynamic) entropy and black hole horizon area can only ever increase, or at best stay constant.

[1] Ruffini R. and Wheeler J.A.: Physics Today, 24, no. 12, 30 (1971)
[2] Hawking S.W.: Physical Review Letters, 26, 1344 (1971)
[3] Bekenstein J.D.: Lettere al Nuovo Cimento, 4, 737 (1972)
[4] Bekenstein J.D.: Information in the holographic universe, Scientific American, 289, no. 2, 58-65 (2003)