Algorithmic Complexity [1/2]

Brian D. Harper (bharper@magnus.acs.ohio-state.edu)
Tue, 23 Jan 1996 15:06:36 -0500

Sorry about the delay, I intended to finish this up and post it yesterday but
got side-tracked by several other things.

Due to length, I'll post this in two parts. As a reminder, this is a
rough draft of one section of an article I'm writing so I would
appreciate it if it doesn't go beyond the reflector at this point.

Also, this section deals primarily with the algorithmic definition
of such things as order, complexity, and randomness and doesn't
address directly some of the interesting implications. These will
be in later sections. Just to give you a taste for the article as
a whole, here are some of the other sections that will be included:

** Intuitive complexity: various objections to algorithmic complexity

** Confusion over words [and how AC can clear up those confusions]

** Assessment (using AC) of various "paradigms" in complexity

a) "emergence at the edge of order and chaos"
[if chaos ==> deterministic chaos (as it usually does)
then this is nonsensical since deterministic chaos *is*
order, there's no edge ;-)]

b) "self-organized criticality" the new paradigm should read
"self-ordered criticality"

** the point of the above section is not to be critical for the sake
of being critical but rather to point out that certain directions
are most likely dead-ends. I'll also try to point out the positive
aspects of the "sciences of complexity".

Here's the first part of my post, still a little rough but I figured I
better go ahead and post it before everyone loses interest :-).

============================================================================

*Algorithmic Complexity*

Suppose that you are an experimentalist observing and recording
data for some phenomenon that can be in only one of two possible states.
For example, you may be observing widgets moving down a conveyor belt.
The widgets are either red or blue, and you record 0 if the widget is
red and 1 if it is blue. Alternatively, you may be watching a simple pendulum
swing back and forth, recording 0 if it swings clockwise at a given
instant and 1 if it swings counterclockwise. Or, you may be flipping
a coin and recording 0 for heads and 1 for tails. Suppose that after
twenty observations your data looks as follows:

01010101010101010101

At this point you would like to construct a "theory" from your
experimental observations. You would like this theory to do two things:
(1) account for the data presently available and (2) make predictions
about future observations. To begin with, we will write our theories
as we have written the data, i.e. as binary strings. Upon reflection,
it should be obvious that there are many theories that we could write
down for our data. Here are a few of the possibilities:

(A) 01010101010101010101000000000000000000000000000000

(B) 01010101010101010101001001001001001001001001001001

(C) 01010101010101010101010101010101010101010101010101

(D) 01010101010101010101000111000111000111000111000111

(E) 01010101010101010101001011101110010110100011011100

Note that each theory incorporates the original data. The numbers following
the original data are predictions about how the system will behave in the
future. The present case is very simple: given that we have observed ten
01's in a row, the best prediction is that the system will continue to
produce 01's _ad infinitum_. Thus, the best "theory" (theory C) is
intuitively obvious in the present case. Nevertheless, we would still like
some rational, objective way of deciding among the various possible theories
that does not rely too heavily upon intuition. This may prove very useful
in situations where the data are not quite so "simple" as above.

In 1964, Ray Solomonoff (Solomonoff, 1964) proposed, in analogy with
Occam's razor, that the "best" theory is the "simplest" theory, where
"simple" takes on a very precise, objective meaning: the simplest theory is
the one with the shortest descriptive length. By descriptive length one
normally means the length of a computer program required to print out the
theory (i.e. the binary strings above). All kinds of questions naturally
arise at this point as to whether this is really an objective measure; for
example, it may seem that the descriptive length would depend upon which
specific computer and programming language is used. These details are
beyond the scope of this article; suffice it to say that these problems
have been worked out (Cover and Thomas, 1991) and that it is actually sufficient
for us to discuss descriptive length in terms of what we might call
"word programs". Following are five such programs that will print out
the five "theories" above:

(A) PRINT 01 10 times, PRINT 0 30 times

(B) PRINT 01 10 times, PRINT 001 10 times

(C) PRINT 01 25 times

(D) PRINT 01 10 times, PRINT 000111 5 times

(E) PRINT 01 10 times, PRINT 001011101110010110100011011100

First note how the original data can be compressed to "PRINT 01 10 times",
which is considerably shorter than "PRINT 01010101010101010101". More
importantly, the form of this compression allows for "long" predictions with
only minor increases in the length of the theory. The theory in this case
could be considered a "law": "PRINT 01 X times". The length of the
prediction [2*(X-10) symbols] grows much more rapidly than the length of the
compressed theory. For example, even if X were 10 trillion, the theory
still fits comfortably on a single line:

"PRINT 01 10,000,000,000,000 times"

Two years after Solomonoff published his seminal work, Gregory Chaitin,
a bright undergraduate at City University of New York, arrived at
essentially the same idea (Chaitin, 1966). Chaitin was interested
in finding an objective way to define the "complexity" of a finite
binary string. Does it make any sense to say that the first 1000 digits
of the binary expansion of the number PI are more or less complex than
the first 1000 digits of e, or than a string produced by flipping a
coin 1000 times and recording 1 for heads and 0 for tails?

There are several ways to discuss the significance of Chaitin's result.
First, we consider the situation of a scientist trying to make sense
of a set of experimental data. While this is very similar to Solomonoff's
problem, there is an important distinction. In this case the scientist
is not interested in making predictions about future observations but
rather in discovering whether the data itself contains any patterns or
regularities, i.e. whether there is a simple theory that explains the
data. Chaitin defined the complexity of a finite binary string as the
size of the smallest program which calculates it. If the string can be
compressed into a very short program then one would conclude that the
string has a pattern, that it follows a law, that it is simple.
Conversely, if the string cannot be compressed at all, then the sequence
is said to be maximally complex, or random. The idea can also be
generalized to include "things" other than numbers; in Chaitin's words, "the
complexity of something is the size of the smallest program which computes
it or a complete description of it" (Chaitin, 1970).
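
A caution and a sketch may help here. The true algorithmic complexity of a
string is uncomputable: no program can take an arbitrary string and return
the size of the smallest program that generates it. What one *can* do is
put an upper bound on it by handing the string to an ordinary compressor.
The following sketch (my own, using Python's standard zlib module, not
anything of Chaitin's) compresses a lawful string and a coin-flip string of
the same length:

# Minimal sketch: compressed size as a computable upper bound on
# algorithmic complexity.  Failure to compress never *proves* randomness,
# but a large compression immediately exposes a simple underlying law.
import random
import zlib

def compressed_size(bits):
    """Size in bytes of the zlib-compressed string of 0s and 1s."""
    return len(zlib.compress(bits.encode("ascii"), 9))

periodic = "01" * 500                            # obeys "PRINT 01 500 times"
coin_flips = "".join(random.choice("01") for _ in range(1000))   # no law

print("periodic  :", compressed_size(periodic), "bytes")
print("coin flips:", compressed_size(coin_flips), "bytes")

The periodic string collapses to a few dozen bytes, while the coin-flip
string compresses to little more than the 1000 bits (about 125 bytes) of
information it actually contains. Note, though, that a general-purpose
compressor would not find the law hidden in the binary digits of PI; it
only ever gives an upper bound on the true complexity.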

All this talk of computers, programs and finite binary strings may,
at first, be somewhat intimidating and may also suggest that this
algorithmic complexity business is of interest only in rather specialized
areas of computer science. Actually, nothing could be further from the
truth and it is hoped that a few examples may dispel such notions. First
of all, algorithmic information theory provides a precise definition of
complexity, allowing one, in principle at least, to measure the complexity
of an object, yielding a number with units (usually) expressed in bits.
True, in many practical situations actually carrying out this measurement
may be infeasible (what is the complexity of an elephant in bits?).
Nevertheless, the ideas presented here are very powerful even in cases
where it is infeasible to actually carry out the computations. For example,
it should be clear from the above that the complexity of a bicycle is much
less than that of a car, not because of some subjective notion we have
about cars being more "complex" than bicycles, but because the set of
instructions required to build a car would obviously be much longer than
the set required to build a bicycle. An even better illustration can
be found in John Casti's book <Searching for Certainty>. Casti replaces
the Turing machine (what one generally means by "computer" in algorithmic
information theory) by a virtual Chocolate Cake Machine (CCM) and the
computer program by the recipe for making a specific chocolate cake:

To operate the CCM, we shovel eggs, milk, flour, chocolate,
and all the other raw materials that might be needed for a
cake into one end, along with a recipe for making, say, the
famous _Sachertorte_. The CCM would then proceed to process
the ingredients in accordance with the instructions given by
the recipe, eventually serving up the called-for _Sachertorte_
at the machine's output slot.
-- John Casti (Casti, 1990, p. 324).

Casti treats this example in some detail in order to illustrate a
number of key concepts in algorithmic information theory. Here we
emphasize the point that algorithmic information theory would rank
the complexity of chocolate cakes in an intuitively appealing way
according to the length of the set of instructions required to make
the cake. I imagine that most pastry chefs would arrive at the same
notion of cake complexity without taking a single computer science
course.

At this point, the skeptic is probably thinking that they jolly well
don't need to be writing and compressing algorithms to figure out that
a bicycle is less complex than a car or that a cupcake is less complex
than a _Sachertorte_ [1]. In establishing the usefulness of a measure
of complexity it is essential to consider simple cases first in order
to guarantee that the measure deals correctly with situations that are
intuitively obvious. This provides some confidence in applying the
measure in more difficult situations. Some such applications will
be considered later in this article.

==== end part 1 =================

========================
Brian Harper |
Associate Professor | "It is not certain that all is uncertain,
Applied Mechanics | to the glory of skepticism" -- Pascal
Ohio State University |
========================