Algorithmic Complexity [2/2]

Brian D. Harper (bharper@magnus.acs.ohio-state.edu)
Tue, 23 Jan 1996 15:43:44 -0500

Now, let's apply our intuition to two more binary strings

(C) 0110101000001001111001100110011111110011101111001100100100001000

(D) 1111110110111101110010111001111111111111011110101111110110011111

At first sight, sequence (C) appears to be very much like sequence (B)
above. There is no obvious pattern and, in fact, the sequence passes
most tests of randomness. Is this result "typical" for a stochastic
process such as flipping a fair coin? In fact, the sequence is very
atypical for such a process since it happens to be the binary expansion
of SQRT(2) - 1 and is thus compressible, simple, ordered. From this we
learn that there is never any guarantee that we will be able to discover
the order in a particular set of data (binary string). In fact, Chaitin
has shown that it is impossible to prove that any specific result is
random. This turns out to be one of the most surprising things to
result from algorithmic information theory. From the results of the
counting argument presented above we know that practically all strings
of a given length are random, nevertheless we cannot prove that any
specific one actually is random. This may seem at first to be a
serious problem with the algorithmic definition of randomness, however,
Chaitin has shown that this undecidability is closely related to Godel's
incompleteness theorem. Thus, if this really is a problem it is a
problem associated with mathematics itself and not with algorithmic
information theory. Its also important to note that this uncertainty
is associated with randomness and not order. One can easily prove that
a string is compressible (ordered) by compressing it. Thus, there is no
doubt that the string "01010101...." is ordered. There is, however, some
uncertainty as to whether sequence (B) above is really random even though
I went to the trouble of flipping a coin 64 times to produce the string.
The results of the counting argument above should convince us that this
sequence is most likely random, but we never know for sure. There may be
some hidden order in the sequence that is beyond our abilities to detect.
This is very unlikely but still possible.

Now let's consider sequence (D) above. This sequence would have to be
considered atypical for the stochastic process of flipping a fair coin
due to the preponderance of heads (1's). Since each flip of the coin has
an equal probability of producing a head or a tail, we generally expect
a long sequence to have about the same number of heads as tails. In fact,
it can be shown that a certain amount of compression is guaranteed based
solely on the ratio of the number of heads to the sequence length. In the
above case this ratio is 50/64 = 0.78 and from the results in Chaitin (**),
this ratio guarantees that the sequence is *at least* 24% compressible.