RE: What does ID mean?

Brian D Harper (bharper@postbox.acs.ohio-state.edu)
Sat, 18 Apr 1998 23:13:28 -0400

At 07:30 PM 4/16/98 -0500, Glenn wrote:

>
>When a mathematician says 'fundamentally undecidable", it means that it is
>impossible to determine whether a sequence is random or designed. This means
>that we can't tell whether the genetic code was randomly produced or
>intelligently designed.
>
>I have never heard an ID person address this crucial issue.
>

This topic is very interesting to me, so I thought I'd
say a few words about it. Hopefully I will not
incite the wrath of Glenn ;-).

Let me start out in the classical but seemingly
roundabout way. Suppose we toss a fair coin 64 times
recording 0 for tails and 1 for heads. Would either of
the following two sequences be more surprising?

(A) 0101010101010101010101010101010101010101010101010101010101010101

(B) 1101111001110101101101101001101110101101111000101110010100011011

Actually, I asked this question on talk.origins several years
ago and many people said something to the effect that while
they probably would be more surprised to get (A), they shouldn't
be, because both sequences have exactly the same probability of
occurring (1/2^64). Though this answer is unsettling, it is correct from
the point of view of classical probability theory. Consider
the following quote from a recent book as an illustration of
the extent to which people will allow theory to overcome
common sense:

======begin quote====================================
I have just tossed a coin 7 times, and I ask you, who
have not seen the result, to guess which of the three
sequences below represents the sequence of my results.
I guarantee that one of the sequences is genuine. If
you don't get it right, you lose 10 dollars; if you
win, you get 30. H stands for heads, and T for Tails.

1. HHHHTTT

2. THHTHTT

3. TTTTTTT

On which would you bet? Let's think for a moment before
going on. <<If you think too much you'll lose ;-) --BH>>

Experiments with a great many subjects have shown that
the bets will be placed in the following order: 2, 1, 3.
The preference for the second sequence is very strong.
But probability theory tells us that in seven tosses of
a coin the probabilities are totally even, and we rationally
should be quite indifferent to which of the three sequences
we choose. The person who chooses 2 is prey to one of the
most common cognitive illusions; she mistakes the most
<typical> for the most <probable>.
-- Piattelli-Palmarini, <Inevitable Illusions: How Mistakes of
Reason Rule Our Minds>, John Wiley & Sons, 1994, pp. 49-50.
=========================================================

Wow, if only I could get him to put his money where
his mouth is and actually make this wager with me :).
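The classical claim is easy to check exactly. A minimal Python sketch (the function names are hypothetical, chosen for illustration): every specific sequence of n fair tosses has probability (1/2)^n, while grouping sequences by their number of heads, a much cruder grouping than AIT's grouping by compressibility and used here only as an example, already separates the all-tails sequence from the other two.

```python
from fractions import Fraction
from math import comb


def sequence_probability(seq: str) -> Fraction:
    """Probability of one specific sequence of fair-coin tosses: (1/2)**len(seq)."""
    return Fraction(1, 2) ** len(seq)


def head_count_class_probability(seq: str, head: str = "H") -> Fraction:
    """Probability of landing *somewhere* in the class of all sequences that
    have the same length and the same number of heads as `seq`."""
    n, k = len(seq), seq.count(head)
    return Fraction(comb(n, k), 2 ** n)


for s in ("HHHHTTT", "THHTHTT", "TTTTTTT"):
    print(s, sequence_probability(s), head_count_class_probability(s))

# The same holds for the two 64-toss sequences (A) and (B): 1/2**64 each.
print(sequence_probability("01" * 32) == Fraction(1, 2 ** 64))
```

Seven tosses give 128 equally likely sequences; 35 of them contain exactly three heads, but only one is all tails.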

Regarding the conclusion that each sequence has the same
probability and thus one should not be more surprising
than the other, Gregory Chaitin (one of the co-discoverers
of algorithmic information theory, AIT) writes: "The conclusion
is singularly unhelpful in distinguishing the random from
the orderly. Clearly a more sensible definition of
randomness is required, one that does not contradict the
intuitive concept of a 'patternless' number."

As might be expected from the above, the "more sensible
definition of randomness" comes from algorithmic information
theory. AIT succeeds in this by putting many sequences into
groups, according to their compressibility, so that one no
longer has to deal with individual sequences that all have
the same probability of occurring. An interesting result
that comes from this is that practically every sequence
of a given length is random, i.e., incompressible. From
this we conclude that *any* ordered sequence (not just the
specific one in (A) but any) is unlikely to occur by tossing
a fair coin. And so our intuition is rescued and
Piattelli-Palmarini will lose a lot of money ;-).
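The "practically every sequence is random" result rests on a simple counting argument, sketched here in Python (the function name is hypothetical): there are too few short descriptions to go around, so fewer than a fraction 2^-k of all n-bit strings can be compressed by more than k bits, whatever fixed decoding scheme is used. Exact fractions are used because floating point would round the bound up to exactly 2^-k.

```python
from fractions import Fraction


def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that have a description
    shorter than n - k bits, i.e. that can be compressed by more than k bits.

    There are 2**0 + 2**1 + ... + 2**(n-k-1) = 2**(n-k) - 1 binary strings
    shorter than n - k bits, and each can serve as the description of at most
    one string, so at most 2**(n-k) - 1 of the 2**n strings qualify.
    """
    return Fraction(2 ** (n - k) - 1, 2 ** n)


for k in (1, 8, 20):
    print(k, float(max_fraction_compressible(64, k)))
```

For 64-bit sequences, fewer than 1 in 256 can be shortened by more than 8 bits, and fewer than one in a million by more than 20.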

But we have to be extremely careful here. At first sight
this might seem to contradict what Glenn wrote. Also, I
imagine that this may be what Paul has in mind by a
low-probability event.

In the above scenario we have some additional "inside
information": we know the process is stochastic and
we know the probability distribution. AIT, by contrast, gives
an intrinsic measure of the information content of a sequence
and has nothing to do with the process (stochastic or deterministic) which
generated it. Nevertheless, we can combine AIT with
probability theory to reach some significant conclusions,
as was done above.
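Because the AIT measure is intrinsic to the sequence, it can at least be estimated from the sequence alone. A rough Python sketch, assuming zlib as a stand-in for an ideal compressor (a swapped-in practical tool, not part of AIT itself): zlib only gives an upper bound on description length, and the sequences are lengthened to 1024 symbols because at 64 symbols zlib's fixed overhead can swamp the difference. The seed and the names are arbitrary.

```python
import random
import zlib


def compressed_size(bits: str) -> int:
    """Bytes zlib needs for a string of '0'/'1' characters.

    Only an upper bound on (and a crude stand-in for) algorithmic information
    content; the AIT quantity itself cannot be computed by any program.
    """
    return len(zlib.compress(bits.encode("ascii"), 9))


n = 1024
periodic = "01" * (n // 2)   # longer analogue of sequence (A)
rng = random.Random(1998)    # arbitrary seed, illustration only
coin_flips = "".join(rng.choice("01") for _ in range(n))  # stand-in for (B)

print(compressed_size(periodic), compressed_size(coin_flips))
```

The periodic string shrinks to a handful of bytes, while the simulated coin flips still need at least one bit per toss.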

While I'm at it, let me give what seems to me the most
common statement of the undecidability issue. A surprising
result is that even though practically all sequences are
random, one cannot prove that any specific sequence is
random, at least not once its length exceeds a bound fixed
by the formal system one is working in. Whether any specific
sequence is random is fundamentally undecidable.
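A practical face of this result is that no test can certify randomness. In this hedged Python illustration (the names and seed are hypothetical), a string from Python's pseudo-random generator defeats zlib, yet it has a very short description: the generator plus the seed. Failing to find a short description never proves that none exists. This only illustrates the idea; it is not a proof of Chaitin's theorem.

```python
import random
import zlib


def bits_from_seed(seed: int, n: int) -> str:
    """n pseudo-random '0'/'1' characters, completely determined by `seed`."""
    rng = random.Random(seed)
    return "".join(str(rng.getrandbits(1)) for _ in range(n))


n = 4096
s = bits_from_seed(42, n)

# zlib finds essentially no structure: it needs at least one bit per symbol...
zlib_bits = 8 * len(zlib.compress(s.encode("ascii"), 9))
# ...yet the one-line recipe bits_from_seed(42, 4096) regenerates s exactly.
print(zlib_bits, n, bits_from_seed(42, n) == s)
```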

I don't particularly like the way Glenn wrote this:

>When a mathematician says "fundamentally undecidable",
>it means that it is impossible to determine whether a
>sequence is random or designed. This means that we can't
>tell whether the genetic code was randomly produced or
>intelligently designed.

IMHO, this reinforces a false dichotomy wherein something
is either random or intelligently designed. This leaves
out an important alternative, namely physical laws ;-).
Also, the meaning of the word "random" is different
in AIT than in normal usage. It simply means incompressible
and is reasonably synonymous with irreducibly complex. Something
can be both random and intelligently designed.

OK, I have one last example which I designed a while back
to make an important point.

Consider the following sequence:

(C) 1111110110111101110010111001111111111111011110101111110110011111
1101111111111101111111111111011111111001111111101101001111111110
1111101110110111111101111011111011101111000101111110111111111111
0111101111011111101101111001111001110110110111011101111001110101

This sequence would have to be considered atypical for the
stochastic process of flipping a fair coin due to the
preponderance of heads (1's). Since each flip of the coin
has an equal probability of producing a head or a tail, we
generally expect a long sequence to have about the same
number of heads as tails. In fact, it can be shown that a
certain amount of compression is guaranteed based solely on
the ratio of the number of heads to the sequence length. In
the above case this ratio is 202/256 = 0.789, and from Chaitin's
results this ratio guarantees that the sequence is *at least*
26% compressible. Skipping the math, one can further calculate
that there is only one chance in about 10^20 that a
sequence of length n=256 will be at least 26% compressible. Again,
this is *any* sequence, not the specific one above.
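Both figures can be reproduced approximately. A back-of-envelope Python sketch, assuming (this is a reconstruction, not necessarily the calculation skipped above) that a sequence with a fraction p of 1's can be coded in about n*H(p) bits, where H is the binary Shannon entropy, and then applying the counting bound that fewer than a fraction 2^-k of all sequences save k bits:

```python
from math import log2, log10


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits per symbol of a 0/1 source with P(1) = p, 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


n, ones = 256, 202
p = ones / n                          # 0.789...
savings = 1 - binary_entropy(p)       # approximate fraction compressible
k = savings * n                       # approximate number of bits saved
log10_chance = -k * log10(2)          # log10 of the counting bound 2**-k

print(f"p = {p:.3f}, compressible ~ {savings:.1%}, chance ~ 10^{log10_chance:.1f}")
```

This lands at roughly 26% compressibility and a chance of order 10^-20.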

Now for the punch line. Should these calculations convince us
that it is very unlikely that this sequence is the result of
a stochastic process? As a matter of fact, this sequence *was*
produced by a stochastic process wherein a 10-faced die was
cast 256 times (see how patient I am :). A 0 was recorded if
the die landed with a 1 or 2 showing face up; otherwise a 1
was recorded (this would be equivalent to tossing an unfair
coin that has an 80% probability of heads). Note that with this
additional information about the stochastic process, sequence
(C) is no longer surprising. Since the chance of getting a 1 on
each roll is 80% we expect a long sequence to have about 80% 1's.
Sequence (C) has 79% 1's.
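A short Python sketch of the punch line (function names and seed are hypothetical): it mimics the die procedure described above and computes the binomial probability of seeing 202 or more 1's in 256 trials, once under the fair-coin assumption and once under the 80% model.

```python
import random
from math import comb


def roll_biased_sequence(n: int, seed: int) -> list[int]:
    """Mimic the procedure described: roll a 10-faced die n times and record
    0 for a 1 or 2, otherwise 1 (so each entry is 1 with probability 0.8)."""
    rng = random.Random(seed)
    return [0 if rng.randint(1, 10) <= 2 else 1 for _ in range(n)]


def upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


n, ones = 256, 202
fair = upper_tail(n, 0.5, ones)     # tiny: 202+ heads is absurd for a fair coin
biased = upper_tail(n, 0.8, ones)   # ordinary under the 80% model
seq = roll_biased_sequence(n, seed=7)
print(fair, biased, sum(seq) / n)
```

Under the fair-coin assumption the tail probability is astronomically small; under the 80% model it is above one half, since 202 is close to the expected 204.8.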

The main point of this example is that probability calculations
are meaningless unless one knows the probability distribution.
This is particularly significant since in the absence of such
knowledge it is common to assume that all possibilities occur
with equal probability.
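To make the point concrete, this Python sketch (hypothetical names) scores one and the same sequence, 202 ones and 54 zeros, under two probability models. Under the fair-coin model every 256-toss sequence gets exactly 2^-256; under the 80% model this sequence is about 2^65.6 times more likely. Read as code lengths (-log2 P bits), the 80% model needs only about 190 bits, in line with the roughly 26% compressibility quoted earlier.

```python
from math import log2


def log2_probability(ones: int, zeros: int, p_one: float) -> float:
    """log2 of the probability of one specific 0/1 sequence with the given
    counts, assuming independent trials with P(1) = p_one."""
    return ones * log2(p_one) + zeros * log2(1 - p_one)


ones, zeros = 202, 54
under_fair = log2_probability(ones, zeros, 0.5)   # -256 bits: all sequences alike
under_die = log2_probability(ones, zeros, 0.8)    # far less surprising
print(under_fair, under_die, under_die - under_fair)
```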

Brian Harper
Associate Professor
Applied Mechanics
The Ohio State University

"It is not certain that all is uncertain,
to the glory of skepticism." -- Pascal