Re: design: purposeful or random?

Brian D. Harper (harper.10@osu.edu)
Mon, 10 Feb 1997 21:42:27 -0500

At 10:33 PM 2/9/97 +0800, Steve Jones wrote:

>
>BH>I don't mean this to be negative, my suggestion is that you just
>>let slide stuff more than a week or two old and get caught up.
>>You'll be much more effective this way.
>

SJ:==
>OK. I might do that.
>

To help you out, I've decided not to reply to any of your most recent
group of posts except for this one. I chose to reply to this one for
four reasons: (1) information and complexity is one of my favorite
topics; (2) unlike most of our other "discussions", this one may
be of some interest to everyone else; (3) this topic is, IMHO, extremely
important to the debate on origins; and (4) Bill Dembski's NTSE
conference paper "Intelligent Design as a Theory of Information"
could prove very useful in getting over the hurdle of defining what
specified complexity is.

Burgy provided us with the URL for this:

http://www.dla.utexas.edu/depts/philosophy/faculty/koons/ntse/ntse.html

>>BH>There were a couple of reasons for my challenge above. One was to
>>see if you had any understanding of the quotes you were giving
>>out. The other was a genuine curiosity about the answer to the
>>question I posed. As you are no doubt already aware, I'm not
>>particularly a fan of neo-Darwinism and if there is an information
>>theoretic argument against it then I'm certainly interested in
>>knowing about it. But hand waving and word games such as those
>>provided by W-S won't do.
>
>>SJ>I was responding to Brian's specific request that I define
>>"information" in terms of "information theory":
>
>BH>You misunderstood my request. You are free to define
>>information any way you wish [except, of course, something
>>like "that quantity which does not increase due to a random
>>mutation" ]. I merely mentioned that the only objective definitions
>>I know about come from information theory (classical or algorithmic).
>
>[...]
>
>SJ>My point was not that I cannot define "information" but that I
>>cannot define it "in `information theory'...terms". I understand
>>what "information" is as described by scientific writers in books
>>addressed to laymen like myself, ie. as "specified complexity":
>
>BH>One problem is that "information" can mean all sorts of different
>>things in books written for laymen. It's very confusing sometimes
>>figuring out just what is meant by a particular author. But I
>>think "specified complexity" corresponds fairly well to the
>>meaning of "information" in algorithmic information theory.
>
>> http://www.research.ibm.com/people/c/chaitin
>> http://www.research.ibm.com/people/c/chaitin/inv.html
>
>Thanks to Brian for the above. I will look them up some time. But I
>think I will stick to laymen's definitions like "specified
>complexity".
>
>>SJ>"Information in this context means the precise determination, or
>>specification, of a sequence of letters. We said above that a
>>message represents `specified complexity.' We are now able to
>>understand what specified means. The more highly specified a thing
>>is, the fewer choices there are about fulfilling each instruction.
>>In a random situation, options are unlimited and each option is
>>equally probable." (Bradley W.L. & Thaxton C.B., in Moreland J.P.
>>ed., "The Creation Hypothesis", 1994, p207)
>
>BH>Oops.
>

SJ:===
>What is the "Oops" about?
>

B&T's statement: "In a random situation, options are unlimited and
each option is equally probable."

I followed up on this oops later in the thread in an attempt to
clarify; you probably haven't seen it yet.

[...]

>BH>Briefly, the algorithmic info content (or Kolmogorov complexity)
>>can be thought of roughly in terms of "descriptive length". The
>>longer the description of an object, the greater its complexity.
>>Of course, one is talking here of the length of the shortest
>>description so that B&T's "I love you" book above could be described
>>"I love you" repeat 4000 times. The descriptive length is small so
>>the complexity of this message is small.
>
SJ:===
>Agreed.
>
>BH>The reason I thought at first that "specified complexity"
>>corresponded roughly to Kolmogorov complexity is that I was thinking
>>in terms of how long it takes to specify an object.
>>
>>Now, let me illustrate why the descriptive complexity (algorithmic
>>information content) is generally expected to increase due to a
>>random mutation. First we consider the following message written
>>in our alphabet with 26 letters:
>>
>>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ...............
>>
>>Now we introduce a random mutation anywhere, say:
>>
>>AAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAA................
>>
>>The first sequence has a small descriptive length:
>>
>>AA repeat
>>
>>the second has a much longer descriptive length:
>>
>>AAAAAAAAAAAAAXAA repeat A
>

SJ:===
>This is *not* an example of increasing the information of an already
>specified complex string. The string of AAAAs has zero information
>content, so anything would be an improvement! But this has *no*
>analogy with a living system. There may indeed have been an increase
>in "algorithmic information content" by a "random mutation" but I
>cannot see that it has "created" any "new information", in the sense
>that I am using it, ie. on the analogy of an English sentence like
>"John Loves Mary". Thaxton, Bradley & Olsen illustrate:
>
>"Three sets of letter arrangements show nicely the difference between
>order and complexity in relation to information:
>
>1. An ordered (periodic) and therefore specified arrangement:
>
>THE END THE END THE END THE END*
>
>Example: Nylon, or a crystal.
>
>2. A complex (aperiodic) unspecified arrangement:
>
>AGDCBFE GBCAFED ACEDFBG
>
>3. A complex (aperiodic) specified arrangement:
>
>THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE
>
>Example: DNA, protein.
>
>Yockey and Wickens develop the same distinction, explaining
>that "order" is a statistical concept referring to regularity such as
>might characterize a series of digits in a number, or the ions of an
>inorganic crystal. On the other hand, "organization" refers to physical
>systems and the specific set of spatio-temporal and functional
>relationships among their parts. Yockey and Wickens note that
>informational macromolecules have a low degree of order but a high
>degree of specified complexity. In short, the redundant order of
>crystals cannot give rise to specified complexity of the kind or
>magnitude found in biological organization; attempts to relate the two
>have little future." (Thaxton C.B., Bradley W.L. & Olsen R.L., "The
>Mystery of Life's Origin" 1992, p130)
>
>Maybe Brian can do the above with a *real* English sentence, like
>THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE?
>

I will do just that in a moment. It is important to emphasize at this
point, though, that dealing with English sentences is just an analogy,
and all analogies break down eventually. For example, in English there
is a lot of intersymbol influence, whereas in proteins there is no
intersymbol influence (I'm not an expert in molecular biology either :)
so someone please correct me if I'm wrong).

But the real problem with the English analogy is the temptation to
draw meaning from the words themselves. This is a confusion that
plagues so many discussions of information on both sides of the
fence. Dawkins falls into this trap in his "methinks it is like a
weasel" word game. More subtly, Manfred Eigen also commits the
error with his "value parameter" in his hypercycle scenario for
the origin of information.

The pioneers of information theory warned of this trap from the
beginning:

================================
The fundamental problem of communication is that of reproducing
at one point either exactly or approximately a message selected
at another point. Frequently the messages have MEANING; that is
they refer to or are correlated according to some system with
certain physical or conceptual entities. These semantic aspects
of communication are irrelevant to the engineering problem.
--Shannon, <Bell System Technical Journal> v27 p379 (1948).
================================

To tie this in with biology we can observe that the genetic information
processing system can process the information for forming a
non-functional protein as easily as it can for a functional protein.

Another information pioneer said something very similar:

===================================================
What I hope to accomplish in this direction is to set up a
quantitative measure whereby the capacities of various systems
to transmit information may be compared.
--Hartley, <Bell System Technical Journal> v7 p535-563 (1928).
===================================================

I note that your quote of TB&O above refers to Hubert Yockey
and Jeffrey Wicken. There is good reason to mention these
two, since both are recognized experts in biological applications
of information theory. It's been a while since I've read Wicken,
so I won't try to rely on my memory to discuss his views. Yockey
I'm much more familiar with. Everything I've written here follows
directly from Yockey's work. Let me give a few quotes.

First, the one Glenn likes to quote so often ;-)

====================================
Thus both random sequences and highly organized sequences are
_complex_ because a long algorithm is needed to describe each
one. Information theory shows that it is _fundamentally_
_undecidable_ whether a given sequence has been generated by
a stochastic process or by a highly organized process.
-- H.P. Yockey, _Information Theory and Molecular Biology_,
Cambridge University Press, 1992, p. 82.
=====================================

Note that *both* random sequences and highly organized sequences are
complex (contain a lot of information).
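
Just to make this concrete, here is a little sketch in Python (my
own illustration, not anything from Yockey: the gzip module is
standing in as a crude proxy for descriptive length, and the strings
and the helper compressed_size() are invented for the example)
comparing an ordered sequence, a meaningful English sequence, and a
random sequence of the same length:

import gzip
import random
import string

def compressed_size(s):
    # bytes after gzip compression; a crude, computable proxy for
    # algorithmic information content ("descriptive length")
    return len(gzip.compress(s.encode("ascii")))

n = 120
ordered = ("THE END " * 20)[:n]        # periodic, like a crystal
organized = ("THE FUNDAMENTAL PROBLEM OF COMMUNICATION IS THAT OF "
             "REPRODUCING AT ONE POINT EITHER EXACTLY OR APPROXIMATELY "
             "A MESSAGE SELECTED AT ANOTHER POINT")[:n]
randomized = "".join(random.choice(string.ascii_uppercase + " ")
                     for _ in range(n))

for label, s in [("ordered", ordered),
                 ("organized", organized),
                 ("random", randomized)]:
    print(label, len(s), compressed_size(s))

The ordered string should compress way down, while the organized
(English) string and the random string should both stay close to
their original size; that is just the point being made above, both
are complex in the algorithmic sense.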

====================================================
The entropy that is applicable to the case of the evolution
of the genetic message is, as I believe the reader should now be
convinced, the Shannon entropy of information theory or the
Kolmogorov-Chaitin algorithmic entropy. ...

The Kolmogorov-Chaitin genetic algorithmic entropy is increased
in evolution due to the duplications that occurred in DNA. [...]
Thus the genetic algorithmic entropy increases with time just
as the Maxwell-Boltzmann-Gibbs entropy does. Therefore creationists,
who are fond of citing evolution as being in violation of the
second law of thermodynamics (Wilder-Smith, 1981; Gish, 1989),
are hoist by their own petard: evolution is not based on increasing
_order_, it is based on increasing _complexity_. In fact, evolution
requires an increase in the Kolmogorov-Chaitin algorithmic entropy
of the genome in order to generate the complexity necessary for the
higher organisms. Let us recall from section 2.4.3 that _highly
organized_ sequences, by the same token, have a large Shannon
entropy and are embedded in the portion of the Shannon entropy
scale also occupied by _random sequences_. Evolution is not in
violation of the second law of thermodynamics. This is what any
reasonable scientist believes; nevertheless, it is important to
avoid word-traps and to reach the correct conclusion for the correct
reasons.
-- Hubert Yockey, _Information Theory and Molecular Biology_,
Cambridge University Press, 1992, pp. 310-313.
===========================================

>[...]
>
>SJ>But if Glenn or Brian has an example in the scientific literature
>>of a "random mutation" that has "created new information", they
>>could post a reference to it.
>
>BH>You have an example above. You can find another example in the
>>pure chance thread.
>

SJ:===
>The above is not an "example" at all. And I find it strange that I am
>referred to a web site. I do not regard web sites as "the scientific
>literature".

Well, first of all, the web site you were referred to contains almost
all of the papers (published in the "best" journals) of one of the
founders of algorithmic information theory. A real time saver, especially
since many are not going to have ready access to most of the journals.
In any event, what you were referred to is definitely the literature.

More importantly, this was not the example I was talking about. Note
that I referred you to the "pure chance" thread, where this example
was discussed extensively. Since you have been skipping over messages
that don't have your name in them, you probably missed the example:

J.S. Rao and C.P. Geevan, "Significance of the Information
Content of DNA in Mutations and Evolution," <J. Theor.
Biology> 96:571-577, 1982.

Now, my other example with the string of A's is a very good example
because it illustrates so well the idea of Algorithmic Information
Content (AIC) and also shows how one can get an intuitive grasp
of these ideas in terms of the "descriptive length". Granted, it is
a simple example, but simple examples are best for getting down
the basic ideas. Once we have these in hand, we can go on.

Another way of illustrating (as opposed to "descriptive length") is
to actually compress the strings using some compression algorithm.
Each of the following strings was saved as an ASCII file and then
compressed using gzip. The compressed size in bytes is given
in [brackets] following each string. The basic idea is to introduce
one "mutation" after another, sequentially. Here are the results:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA................[35]

AAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAA................[40]

AAAAAAAAAAAAAXAAAAAAAAAAAAAXAAAAAAAAAAAA................[41]

AAXAAAAAAAAAAXAAAAAAAAAAAAAXAAAAAAAAAAAA................[44]

AAXAAAAAAAAAAXAAAAAAAXAAAAAXAAAAAAAAAAAA................[48]
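
For anyone who wants to try this themselves, here is a minimal
sketch in Python (its gzip module is just standing in for the gzip
program I ran on the saved ASCII files, and the string length and
the helper name compressed_size() are my own, so the byte counts
won't match the [brackets] above exactly, but the trend should):

import gzip

def compressed_size(s):
    # size in bytes after gzip compression; a rough stand-in for the
    # algorithmic information content ("descriptive length")
    return len(gzip.compress(s.encode("ascii")))

base = "A" * 60   # a long run of A's; the exact length doesn't matter
strings = [
    base,                                             # no mutations
    base[:13] + "X" + base[14:],                      # one X
    base[:13] + "X" + base[14:27] + "X" + base[28:],  # two X's
]
for s in strings:
    print(len(s), compressed_size(s), s)

The uncompressed length stays the same each time; only the
compressed size grows as the X's are introduced.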

Now, first of all, you said elsewhere that my string of A's contained
no information. I would beg to differ, since I just measured it in
bytes. You may say it doesn't mean anything to you. I say so what,
it means a great deal to me :). You may counter that it is
not useful information, to which I respond that it has been
very useful to me. [I'm not playing games here but rather
trying to illustrate why attaching value and meaning to the
concept of information just ties you in knots. You can't do
anything with it.]

Note that with each "mutation" there is a measured increase
in information. But we have to look at the compressed size to
tell this. All five ASCII files had the same uncompressed size.

I actually made a little slip-up in the above which turns out to
be fortuitous, since it illustrates another point and also shows
the usefulness of AIC even in such a simple example. My slip-up
was that I had intended to insert the X's at random places, but
I wasn't really being that careful about it. When I looked at
the compressed sizes I noticed something I thought was odd.
As each new X was introduced there was an increase in size
of between 3 and 5 bytes, with one exception: going from the 2nd
to the 3rd sequence resulted in an increase of only one byte. This
aroused my suspicions, and on looking at the sequences more carefully
I noticed I had inadvertently introduced a pattern in the third
sequence: thirteen A's followed by an X appears twice. The gzip
algorithm took advantage of this pattern, resulting in a smaller
increase in size than would normally be expected.

We also see from this example a difference in how the word
"random" can be used. As far as I was aware, my actions were
random in introducing the X's. I definitely gave it no conscious
thought. I was in a hurry, so I just plugged in an X, saved it,
plugged in another X, etc. The definition of "random" in AIT
(algorithmic information theory) is precise and objective: a
sequence is random if it cannot be compressed, i.e. if it has no
description shorter than the sequence itself. The gzip program has
no idea how the sequence was generated or whether the pattern was
placed intentionally or not (whether it "meant" anything to me).
It found the pattern because it was there, irrespective of
whether it "means" anything.

Now, let's do the same thing with your suggested message;
again, the compressed size in bytes is given after each
sequence.

THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE [73]

THIS SEQUENCE OF LXTTERS CONTAINS A MESSAGE [74]

THIS SEQUENCE OF LXTTERS CONTAINS A MEXSAGE [75]

THIS SEQUXNCE OF LXTTERS CONTAINS A MEXSAGE [76]

THXS SEQUXNCE OF LXTTERS CONTAINS A MEXSAGE [77]

THXS SEQUXNCE OF LXTTERS COXTAINS A MEXSAGE [78]
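
Once again, for anyone who wants to reproduce the flavor of this,
here is a sketch of how the "mutate a random letter and re-measure"
step could be automated (same caveats as before: Python's gzip
module rather than the gzip command, so the byte counts won't
match the [brackets] exactly):

import gzip
import random

def compressed_size(s):
    # bytes after gzip compression, our rough measure of information
    return len(gzip.compress(s.encode("ascii")))

msg = list("THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE")
print(compressed_size("".join(msg)), "".join(msg))
for _ in range(5):
    # pick a letter that hasn't already been hit and "mutate" it to X
    candidates = [i for i, c in enumerate(msg) if c not in (" ", "X")]
    msg[random.choice(candidates)] = "X"
    print(compressed_size("".join(msg)), "".join(msg))

Run it a few times: the particular letters hit will change, but the
compressed size generally creeps upward with each mutation (with the
occasional surprise, as my slip-up above shows, when a mutation
happens to create a pattern the compressor can exploit).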

Now, I don't know how anyone could maintain that the last
sentence doesn't contain more information than the first.
At this very moment it is taking up 78 bytes of space on my
hard drive whereas the first is taking up only 73 bytes.

Yes, I know this idea of information takes some getting used to.
Look at it as a measure of the amount of information irrespective
of its meaning. Although leaving out meaning seems a great
concession, it is actually a great strength. Look at the impact
Shannon's results had on communication. Also, based on this
definition of information Yockey is able to discredit practically
every scenario for the origin of life.

So, the seeming weakness is a great strength because one obtains
an objective measure. With an objective measure one can actually
begin to evaluate claims made by both evolutionists and creationists.

A final example. Am I going to complain about my thermometer
because all it can do is give me the temperature in my oven, and
it can't tell me what that temperature means (i.e. whether I
burnt the cookies)? Likewise, information theory can measure
the amount of information; it can't tell you what it means (thank
goodness!).

[...]

>
>SJ>All I did was deny that "random mutation" can "create new
>>information".
>
>BH>And all I did was ask you to provide some justification for
>>your denial. BTW, you did more than just deny this, you
>>also quoted W-S thinking that that supported your denial.
>
SJ:==
>No. Brian asked me to "provide some justification" *in terms of
>information theory*. I cannot do this, and in any event I made no
>claim about information theory. My original request was in terms of
>biology:
>

Sorry, Steve, but this simply is not true. I never asked you to justify
your statement in terms of information theory. Here is my original
question:

>How would one define "information" in such a way that a random
>process would not result in an increase in information? The only
>objective definitions of information that I know of are those found
>in information theory. These information measures are maximal
>for random processes.

I'm merely asking for your definition, and since turnabout is fair
play I'm giving you my definition. And don't say that you gave
your definition as specified complexity. You didn't bring this up
till later. Specified complexity still hasn't been defined in any
objective way; perhaps the paper by Dembski will help. Hopefully
what I wrote in this post will clarify why I want an objective
definition. There is no way to evaluate claims about whether
specified complexity increases, decreases, remains constant, etc.
unless we can measure it.

Anyway, back to my request. You made a similar complaint
earlier that I was asking for a definition of information based
on information theory. I tried to clarify that this was not my
intention by writing:

>You misunderstood my request. You are free to define
>information any way you wish [except, of course, something
>like "that quantity which does not increase due to a random
>mutation" ]. I merely mentioned that the only objective definitions
>I know about come from information theory (classical or algorithmic).

Tell me Steve, how can I make it any clearer?

[I've responded to roughly half of Steve's post. This is enough, I
think, as I'll probably just start repeating myself if I go on.]

Brian Harper
Associate Professor
Applied Mechanics
Ohio State University
"Aw, Wilbur" -- Mr. Ed