Complexity and Information Theory

David_Bowman@georgetowncollege.edu
Tue, 30 Nov 1999 18:38:11 -0500

Concerning the "Complexity and Information Theory" essay (by a
physicist-friend of Susan) first posted by Susan on 12-Oct-99 and later
reposted by Chris on 29-Nov-99:

I find much to agree with in this essay--especially the criticisms of
certain creationist abuses of the second law regarding biological
evolution and the conclusions and overall upshot of the piece. However,
I think that the author has both overstated the differences between
thermodynamic entropy and information-theoretic entropy and said some
things that are liable to confuse an otherwise uninformed reader by
using certain key words, such as 'information' and 'entropy', in a
sloppy and confused way. This latter problem is, unfortunately, common
in the
field. Since the article is so long, and since I agree with much of it,
I will only discuss the parts of it that I think are at least misleading,
and possibly just wrong. Thus, this criticism is not meant to be a
thorough trashing of the entire piece even though it might seem that way
from just my comments below. For the most part I agree with the parts I
haven't criticised.

>But what about this idea of information then? If the Second Law of
>Thermodynamics says nothing about information, obviously information theory
>has a lot to say about it. Does evolution, the idea that organisms can
>become more complex over succeeding generations, violate information theory?
>The short answer is "no." For the long answer, keep reading.

It is not true that "the Second Law of Thermodynamics says nothing about
information". In fact, thermodynamic entropy (with which the 2nd law is
very concerned) is an example of an information-theoretic quantity.
To be precise, the thermodynamic entropy possessed by a macroscopic
system *is* the average minimal amount of further information necessary
to determine, with certainty, which microscopic state the system is
actually in, given that the system is defined by only its macroscopic
state. The thermodynamic entropy of a system is a measure of how much
ignorance about the system's microscopic state attends a merely
macroscopic description of the system. *If* one were actually given
all the information necessary to determine the system's microscopic
state (the entropy measures the *amount* of such information), and if
one knew what to do with this information by way of calculation, then
this information, together with that provided by the previously known
and defined macroscopic state of the system, would constitute, in
principle, a perfect determination of everything there is to know
about the system at hand.
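
To make this "missing information" reading concrete, here is a small
illustrative sketch in Python (mine, not the essay's). For a macrostate
compatible with W equally likely microstates, the missing information
is log2(W) bits, and the thermodynamic entropy is just that same
quantity rescaled by Boltzmann's constant (times ln 2):

    import math

    k_B = 1.38e-23  # Boltzmann's constant (approx.), in J/K

    def missing_info_bits(W):
        """Bits needed, on average, to pin down which of W equally
        likely microstates the system is actually in."""
        return math.log2(W)

    def boltzmann_entropy(W):
        """S = k_B*ln(W) for W equally likely microstates -- the same
        missing information, expressed in thermodynamic units."""
        return k_B * math.log(W)

    W = 2**80   # a toy microstate count, chosen only for illustration
    print(missing_info_bits(W))   # 80.0 bits of missing information
    print(boltzmann_entropy(W))   # the same ignorance, now in J/K

The two numbers differ only by the conversion factor of k_B*ln(2) per
bit; they measure the very same ignorance about the microstate.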

>One of the reasons I kept using the perhaps annoying phrase "thermodynamic
>entropy" in the above discussion of the Second Law of Thermodynamics is that
>"entropy" shows up in information theory as well. However, "informational
>entropy" and "thermodynamic entropy" are not analogs.

Again, this is either not true, or at best, it is quite misleading. The
thermodynamic entropy is a *special case* of informational entropy. It
is the informational (i.e. Gibbs/Shannon/Weaver) entropy of Bayesian
probability theory applied to the statistical ensemble whose probability
space is the set of all allowed microscopic states of the system as
constrained by the system's macroscopic description.

>Temperature is an
>important aspect of thermodynamics, but there is nothing comparable to
>temperature in information theory (Yockey, 1992).

This is true, but it does not mean that thermodynamic and information-
theoretic entropy are necessarily unrelated. Rather, it is a consequence
of the fact that information theory, when applied to probability spaces
relevant to other areas (such as communication theory), has no analog
of the concept of internal energy. Since the *meaning* of
thermodynamic temperature is that it is the rate at which the system's
internal energy changes (i.e. partial derivative) when the entropy is
increased quasistatically by an infinitesimal amount (under conditions
which forbid any macroscopic work from being done during the change),
we see that we cannot define a temperature analog until we first define
an analog of the internal energy. In the view of statistical mechanics
the concepts of *both* internal energy *and* entropy are logically prior
concepts to the concept of temperature.
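
In symbols (my notation, not from the essay), the definition of
temperature being appealed to here is

    T = \left( \frac{\partial U}{\partial S} \right)_{V,N}

where the held-fixed subscripts (volume, particle number, etc.) encode
the condition that no macroscopic work is done during the change.
Until an analog of the internal energy U is defined on the probability
space at hand, the right-hand side, and hence any temperature analog,
cannot even be written down.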

>Thus one should keep the
>concept of entropy in thermodynamics separate from the quantity known as
>"entropy" in information theory.

In statistical thermodynamics *and* in information theory *and* in
Bayesian probability theory in general the entropy is:
SUM{i, p_i*log(1/p_i)}. Here p_i is the probability that the i-th
outcome occurs when a sample is drawn from a particular probability
distribution characterized by the set {i} of possible outcomes. The
only difference between the various kinds of entropy is that the
various entropies are defined on different probability spaces. But
this difference is enough to make thermodynamic entropy considerations
irrelevant to the probability calculations that have to be done for
biological evolution.
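
For concreteness, here is that functional written out as a short Python
sketch (mine, not from the essay). The base of the logarithm only fixes
the units: base 2 gives bits, and natural logs multiplied by k_B give
the thermodynamic entropy in J/K.

    import math

    def entropy(p, base=2.0):
        """The SUM{i, p_i*log(1/p_i)} functional of a probability
        distribution supplied as a list of p_i values."""
        return sum(p_i * math.log(1.0 / p_i, base) for p_i in p if p_i > 0)

    print(entropy([0.5, 0.5]))    # 1.0 bit  (fair coin)
    print(entropy([0.25] * 4))    # 2.0 bits (four equally likely outcomes)
    print(entropy([1.0]))         # 0.0 bits (no uncertainty at all)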

>However, one thing "thermodynamic entropy"
>and "informational entropy" have in common is that they both have a tendency
>to increase. (Although, as we have discussed, thermodynamic entropy can
>decrease locally under the right conditions.)

This is not necessarily the case for generic applications of the
entropy concept. It just so happens that the 2nd law implies a tendency
for the thermodynamic entropy to increase, and that in the context of a
communication system prone to errors the Shannon entropy of the message
ensemble also tends to increase. It is not necessary for any generic
application of the entropy concept to have an increasing tendency with
time. It depends on just what the context of the specific application
is.
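
A small, hedged illustration of the communication-system case (my
example, assuming a binary symmetric channel with crossover probability
eps): the noise pushes each bit's distribution toward 50/50, so the
per-bit entropy at the receiver cannot be smaller than at the sender.

    import math

    def binary_entropy(p):
        """Entropy, in bits, of a single binary outcome with P(1) = p."""
        if p in (0.0, 1.0):
            return 0.0
        return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))

    p, eps = 0.1, 0.05                    # source bias and channel error rate
    q = p * (1 - eps) + (1 - p) * eps     # P(received bit = 1)

    print(binary_entropy(p))   # ~0.47 bits per transmitted bit
    print(binary_entropy(q))   # ~0.58 bits per received bit: larger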

>.... However, a reasonable definition of information content can be
>described as "a measure of the informational entropy of a message." For the
>purpose of this article I will define informational entropy as the
>complexity of the message. Therefore, for the purposes of this article,
>information content of a message can be described as the minimum number of
>instructions needed to describe the message with certainty. This definition
>approximates the idea of the Kolmogorov-Chaitin algorithmic entropy of a
>message (see Yockey, 1992). The consequence of these definitions is that a
>complex message has high informational entropy and high information content.
>Conversely, a highly ordered, repetitive message will have a low
>informational entropy and a low information content.

The above quote is a rather unfortunate conflation of the concepts of
entropy and complexity. Unfortunately, the author here is not alone in
making this confusing identification. IMO one should *not* use the term
'entropy' in the context of the Kolmogorov-Chaitin concept of
algorithmic complexity. These concepts are *different* and each such
concept should
be labeled with its own word. We should only use 'entropy' to refer to
the SUM{i, p_i*log(1/p_i)} functional of a probability distribution
{p_i}, and we should only use the phrase 'algorithmic complexity' to
refer to the minimal (bitwise) length of a complete description of some
entity.

>Let's look as some examples.
>
>The string of bits
>
>01010101010101010101
>
>has low informational entropy. While you could say the string is 20 bits
>long and say that is its information content, the definition I am using says
>that the information content is low for the sequence because the sequence
>can be described as a simple instruction such as
>
>1. Repeat "01" 10 times

Again this is bad terminology. The above string does not even possess
an entropy (in the Gibbs/Shannon sense). It is much better to say that
the above string has a low *algorithmic complexity* (or just low
complexity for short). The above string is a single realization or
sample, and as such, it does not define an appropriate probability
distribution {p_i} which is needed to evaluate the entropy for that
distribution. The entropy of a statistical ensemble is a statistical
property of the whole ensemble/probability distribution; it is not a
property defined on the individual outcomes of the distribution. OTOH,
the complexity *is* defined on such individual strings. The entropy
of a probability distribution is the average minimal amount of further
information necessary to determine, with certainty, which outcome
occurs (from an enumerated list of such outcomes) when a sample is
drawn from that distribution, given that the only prior information
about the process is contained in the values of the probability
distribution itself. OTOH the complexity of an individual
state/occurrence/string/etc. *is* defined, and is just the
shortest-length complete set of instructions needed to exactly
reproduce that individual state/occurrence/string/etc.
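
A rough way to see the distinction in code (my illustration; note that
the true Kolmogorov-Chaitin complexity is uncomputable, so a
general-purpose compressor only gives a crude upper bound on a string's
description length):

    import math
    import random
    import zlib

    # The entropy belongs to the source/distribution: a fair-coin
    # source has 1 bit of entropy per symbol no matter which string it
    # happens to emit.
    p = [0.5, 0.5]
    entropy_per_symbol = sum(q * math.log2(1.0 / q) for q in p)

    # A (crude) complexity proxy belongs to the individual string: the
    # compressed length bounds its shortest complete description.
    repetitive = b"01" * 500
    random.seed(0)
    scrambled = bytes(random.choice(b"01") for _ in range(1000))

    print(entropy_per_symbol)               # 1.0, a property of the distribution
    print(len(zlib.compress(repetitive)))   # small: the string is simple
    print(len(zlib.compress(scrambled)))    # much larger: the string is complex

Both strings could have been emitted by the very same fair-coin source
(with the same entropy per symbol), yet their individual complexities
differ greatly.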

>In other words, this highly ordered sequence is simple rather than complex.

This is true, and that is why we should say the string has a low
'complexity' rather than confuse things by saying it has a 'low
informational entropy'.

>On the other hand, if we look at 20 random bits
>
>01011010101000111001
>
>the complexity of the message is greater. The information content, as
>determined by the Kolmogorov-Chaitin algorithmic entropy is also higher.
>This is because it takes a longer instruction set to describe the bit
>sequence.

Yes, the complexity of this string is greater than the previous string
for the reason stated. But we should not call this complexity the
"Kolmogorov-Chaitin algorithmic entropy". If anything we should call it
the complexity (or, at most, the 'Kolmogorov-Chaitin algorithmic
complexity').

Just because the entropy and the complexity are both information
measures, that does not mean that they are both the *same* information
measure, and they should not be confused with each other, because they
*are* different.

Except for the previously noted problems with the terminology of the
article, I tend to agree with the gist of the rest of that article.

David Bowman
David_Bowman@georgetowncollege.edu