Re: design: purposeful or random? 2/2

Stephen Jones (sejones@ibm.net)
Mon, 17 Feb 97 06:03:27 +0800

Group

[continued]

BH>Now, my other example with the string of A's is a very good
>example because it illustrates so well the ideas of Algorithmic
>Information Content (AIC) and also shows how one can get an
>intuitive grasp of these ideas in terms of the "descriptive
>length". Granted, it is a simple example, but simple examples are
>best for getting down the basic ideas. Once we have these in hand,
>we can go on.

Sorry, but I do not agree that "string of A's is a very good
example". I think it is the exact opposite!

BH>Another way of illustrating (as opposed to "descriptive length")
>is to actually compress the strings using some compression
>algorithm. Each of the following strings was saved as an ascii
>file and then compressed using gzip. The compressed size in bytes
>is given in [brackets] following each string. The basic idea is to
>introduce sequentially one "mutation" after another. Here are the
>results:
>
>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA................[35]
>
>AAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAA................[40]
>
>AAAAAAAAAAAAAXAAAAAAAAAAAAAXAAAAAAAAAAAA................[41]
>
>AAXAAAAAAAAAAXAAAAAAAAAAAAAXAAAAAAAAAAAA................[44]
>
>AAXAAAAAAAAAAXAAAAAAAXAAAAAXAAAAAAAAAAAA................[48]
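
For reference, Brian's experiment is easy to reproduce. Below is a
minimal sketch in Python using zlib, which implements the same
DEFLATE compression as gzip (the byte counts will differ from the
gzip figures above because of header overhead, and the 60-character
length is an assumption, since the strings above are truncated):

import zlib

# Insert the X's at the same positions as in Brian's strings
# (13, 27, 2, 21); the total length of 60 is assumed.
seq = list("A" * 60)
variants = ["".join(seq)]
for pos in (13, 27, 2, 21):
    seq[pos] = "X"
    variants.append("".join(seq))

for s in variants:
    print(s[:30] + "...", len(zlib.compress(s.encode("ascii"))))

# The uncompressed size is constant (60 bytes), but each mutation
# breaks up the run of A's, so the compressed size creeps upward.
# The third string compresses unusually well because "thirteen A's
# then an X" occurs twice, a repeat DEFLATE's matching exploits.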

While "descriptive length" and compressibility no doubt has a lot of
relevance to "information theory", I cannot see its relevance to
specified complexity.

BH>Now, first of all, you said elsewhere that my string of A's
>contained no information. I would beg to differ, since I just
>measured it in bytes. You may say it doesn't mean anything to you.
>I say so what, it means a great deal to me :). You may counter
>that it is not useful information, to which I respond that it has
>been very useful to me. [I'm not playing games here but rather
>trying to illustrate why attaching value and meaning to the
>concept of information just ties you in knots. You can't do
>anything with it.]

That just goes to show that "information theory" is dealing with a
different aspect of "information", namely the transmission of
information down a communication channel. It has nothing to say
about "attaching
value and meaning" which is precisely the thing I was talking about
when I said: "Sorry, but one thing that `random mutation' cannot do
is to `create new information'". By "create new information" I mean
add "meaning".

BH>Note that with each "mutation" there is a measured increase in
>information. But we have to look at the compressed size to tell
>this. All five ascii files had the same uncompressed size.

No doubt it is an "increase in information" in the sense used by
information theory. But it is not an "increase in information" in
the sense of "meaning".

BH>I actually made a little slip-up in the above which turns out to
>be fortuitous since it will illustrate another point and also show
>the usefulness of AIC even in such a simple example. My slip-up
>was that I had intended to insert the X's at random places but I
>wasn't really being that careful about it. When I looked at the
>compressed sizes I noticed something I thought was odd. As each
>new X was introduced there was an increase in size of between 3 and
>5 bytes with one exception, going from the 2nd to the 3rd resulted
>in an increase of only one byte. This aroused my suspicions, and
>on looking at the sequences more carefully I noticed I had
>inadvertently introduced a pattern in the third sequence, thirteen
>A's followed by an X appears twice. The gzip algorithm took
>advantage of this pattern, resulting in a smaller increase in size
>than would normally be expected.

OK.

BH>We also see from this example a difference in how the word
>random can be used. As far as I was aware, my actions were random
>in introducing the X's. I definitely gave it no conscious thought.
>I was in a hurry, so I just plugged an X, saved it, plugged another
>X etc. The definition of "random" in AIT is precise and objective.
>The gzip program has no idea how the sequence was generated or
>whether the pattern was placed intentionally or not (whether it
>"meant" anything to me). It found the pattern because it was
>there, irrespective of whether it "means" anything.

OK. The "patterns" that I am interested in are those that do "mean"
something:

"Linguists in the 1950's, most notably Noam Chomsky and George
Miller, asked dramatically how many grammatical English sentences
could be constructed with 100 letters. Approximately 10 to the 25th
power (10^25), they answered. This is a very large number. But a
sentence is one thing; a sequence, another. A sentence obeys the
laws of English grammar; a sequence is lawless and comprises any
concatenation of those 100 letters. If there are roughly (10^25)
sentences at hand, the number of sequences 100 letters in length is,
by way of contrast, 26 to the 100th power (26^100). This is an
inconceivably greater number. The space of possibilities has blown
up, the explosive process being one of combinatorial inflation. Now,
the vast majority of sequences drawn on a finite alphabet fail to
make a statement: they consist of letters arranged to no point or
purpose. It is the contrast between sentences and sequences that
carries the full critical weight of memory and intuition. Organized
as a writhing ball, the sequences resemble a planet-sized object, one
as large as pale Pluto. Landing almost anywhere on that planet,
linguists see nothing but nonsense. Meaning resides with the
grammatical sequences, but they, those sentences, occupy an area no
larger than a dime. How on earth could the sentences be discovered
by chance amid such an infernal and hyperborean immensity of
gibberish? They cannot be discovered by chance, and, of course,
chance plays no role in their discovery. The linguist or the native
English-speaker moves around the place or planet with a perfectly
secure sense of where he should go, and what he is apt to see. The
eerie and unexpected presence of an alphabet in every living creature
might suggest the possibility of a similar argument in biology. It
is DNA, of course, that acts as life's primordial text, the code
itself organized in nucleic triplets, like messages in Morse code.
Each triplet is matched to a particular chemical object, an amino
acid. There are twenty such acids in all. They correspond to
letters in an alphabet. As the code is read somewhere in life's
hidden housing, the linear order of the nucleic acids induces a
corresponding linear order in the amino acids. The biological finger
writes, and what the cell reads is an ordered presentation of such
amino acids - a protein. Like the nucleic acids, proteins are
alphabetic objects, composed of discrete constituents. On average,
proteins are roughly 250 amino acid residues in length, so a given
protein may be imagined as a long biochemical word, one of many. The
aspects of an analogy are now in place. What is needed is a relevant
contrast, something comparable to sentences and sequences in
language. Of course nothing completely comparable is at hand: there
are no sentences in molecular biology. Nonetheless, there is this
fact, helpfully recounted by Richard Dawkins: "The actual animals
that have ever lived on earth are a tiny subset of the theoretical
animals that could exist." It follows that over the course of four
billion years, life has expressed itself by means of a particular
stock of proteins, a certain set of life-like words. A combinatorial
count is now possible. The MIT physicist Murray Eden, to whom I owe
this argument, estimates the number of the viable proteins at 10 to
the 50th power (10^50). Within this set is the raw material of
everything that has ever lived: the flowering plants and the alien
insects and the seagoing turtles and shambling dinosaurs, the great
evolutionary successes and the great evolutionary failures as well.
These creatures are, quite literally, composed of the proteins
that over the course of time have performed some useful function,
with "usefulness" now standing for the sense of sentencehood in
linguistics. As in the case of language, what has once lived
occupies some corner in the space of a larger array of possibilities,
the actual residues in the shadow of the possible. The space of all
possible proteins of a fixed length (250 residues, recall) is
computed by multiplying 20 by itself 250 times (20^250). It is idle
to carry out the calculation. The number is larger by far than
seconds in the history of the world since the Big Bang or grains of
sand on the shores of every sounding sea. Another planet now looms
in the night sky, Pluto-sized or bigger, a conceptual companion to
the planet containing every sequence composed by endlessly arranging
the 26 English letters into sequences 100 letters in length. This
planetary doppelganger is the planet of all possible proteins of
fixed length, the planet, in a certain sense, of every conceivable
form of carbon-based life. And there the two planets lie, spinning
on their soundless axes. The contrast between sentences and
sequences on Pluto reappears on Pluto's double as the contrast
between useful protein forms and all the rest; and it reappears in
terms of the same dramatic difference in numbers, the enormous
(20^250) overawing the merely big (10^50), the contrast between the
two being quite literally between an immense and swollen planet and a
dime's worth of area. That dime-sized corner, which on Pluto
contains the English sentences, on Pluto's double contains the living
creatures; and there the biologist may be seen tramping, the warm
puddle of wet life achingly distinct amid the planet's snow and stray
proteins. It is here that living creatures whatever their ultimate
fate, breathed and moaned and carried on, life evidently having
discovered the small quiet corner of the space of possibilities in
which things work." (Berlinski D., "The Deniable Darwin",
Commentary, June 1996, p24)
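
The combinatorial arithmetic in the quote can be checked directly.
A quick sketch in Python (exact integer arithmetic; the 10^25 and
10^50 figures are the estimates Berlinski cites, not computed
values):

sentences = 10**25   # grammatical 100-letter sentences (Chomsky/Miller)
sequences = 26**100  # all 100-letter sequences
print(len(str(sequences)) - 1)               # 141: 26^100 ~ 10^141
print(len(str(sequences // sentences)) - 1)  # 116: ~1 part in 10^116

viable = 10**50      # viable proteins (Eden's estimate)
possible = 20**250   # all 250-residue proteins
print(len(str(possible)) - 1)                # 325: 20^250 ~ 10^325
print(len(str(possible // viable)) - 1)      # 275: ~1 part in 10^275

So the "dime-sized corner" on each "planet" is roughly one part in
10^116 (sentences) and one part in 10^275 (proteins) of the whole
space, which is the disparity the quote is describing.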

BH>Now, let's do the same thing with your suggested message; again
>the compressed size in bytes is given after each sequence.
>
>THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE [73]
>
>THIS SEQUENCE OF LXTTERS CONTAINS A MESSAGE [74]
>
>THIS SEQUENCE OF LXTTERS CONTAINS A MEXSAGE [75]
>
>THIS SEQUXNCE OF LXTTERS CONTAINS A MEXSAGE [76]
>
>THXS SEQUXNCE OF LXTTERS CONTAINS A MEXSAGE [77]
>
>THXS SEQUXNCE OF LXTTERS COXTAINS A MEXSAGE [78]
>
>Now, I don't know how anyone could maintain that the last sentence
>doesn't contain more information than the first. At this very
>moment it is taking up 78 bytes of space on my hard drive whereas
>the first is taking up only 73 bytes.
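
Again, for reference, this is easy to reproduce. A minimal sketch
in Python using zlib (the byte counts will differ from Brian's
gzip figures because of header overhead; the random seed, and hence
the positions mutated, are arbitrary):

import random
import zlib

random.seed(1)  # arbitrary, for repeatability
msg = list("THIS SEQUENCE OF LETTERS CONTAINS A MESSAGE")
for step in range(6):
    s = "".join(msg)
    print(s, len(zlib.compress(s.encode("ascii"))))
    msg[random.randrange(len(msg))] = "X"  # one random "mutation"

# The measured size tends upward with each substitution, even as
# the English meaning is progressively destroyed.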

It is now quite clear that Brian is using a different meaning of
the word "information" than I am. The above random mutations are
degrading the original message in terms of its meaningfulness. If
this were analogous to random mutations in genetic text, it would
be harmful.

BH>Yes, I know this idea of information takes some getting used to.
>Look at it as a measure of the amount of information irrespective
>of its meaning. Although leaving out meaning seems a great
>concession, it is actually a great strength. Look at the impact
>Shannon's results had on communication. Also, based on this
>definition of information Yockey is able to discredit practically
>every scenario for the origin of life.

I have no problem "getting used to" the above. I am sure it has been
very useful in "communication", ie. from an enginering perspective. I
note with interest that even using that definition, "Yockey is able
to discredit practically every scenario for the origin of life". But
it is just a different definition of the word "information" that I
was using.

BH>So, the seeming weakness is a great strength because one obtains
>an objective measure. With an objective measure one can actually
>begin to evaluate claims made by both evolutionists and
>creationists.

Not unless the "objective measure" can handle "meaning".

BH>A final example. Am I going to complain about my thermometer
>because all it can do is give me the temperature in my oven and it
>can't tell me what that temperature means (i.e. whether I burnt
>the cookies)? Likewise, information theory can measure the amount
>of information, it can't tell you what it means (thank goodness
>!!).

If "information theory can measure the amount of information", yet
"it can't tell you what it means" then it is inadequate to deal with
the "information" I am talking about.

[...]

>SJ>No. Brian asked me to "provide some justification" *in terms of
>information theory*. I cannot do this, and in any event I made no
>claim about information theory. My original request was in terms
>of biology:

BH>Sorry, Steve, but this simply is not true. I never asked you to
>justify your statement in terms of information theory. Here is my
>original question:
>
>How would one define "information" in such a way that a random
>process would not result in an increase in information? The only
>objective definitions of information that I know of are those found
>in information theory. These information measures are maximal for
>random processes.
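
Brian's last claim - that classical information measures are
maximal for random processes - is easy to illustrate. A minimal
sketch in Python computing the empirical Shannon entropy of a
string, which peaks at log2 of the alphabet size when every symbol
is equally likely:

import math
import random
import string
from collections import Counter

def entropy_bits_per_char(s):
    # Empirical Shannon entropy: H = -sum(p_i * log2(p_i))
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

print(entropy_bits_per_char("A" * 40))  # 0.0: a run of A's is predictable
uniform = "".join(random.choice(string.ascii_uppercase)
                  for _ in range(100000))
print(entropy_bits_per_char(uniform))   # near log2(26) = 4.70 bits/letter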

Well, if Brian says that "The only objective definitions of
information that I know of are those found in information theory",
then unless I "justify" my "statement in terms of information
theory", Brian can say that I have not "justified" it at all!

BH>I'm merely asking for your definition and since turn about is fair
>play I'm giving you my definition. And don't say that you gave
>your definition as specified complexity. You didn't bring this up
>till later.

That's right. All along I understood "information" as "specified
complexity", so when Brian tried to define it as per "information
theory", I clarified what I meant by it.

BH>Specified complexity still hasn't been defined in any
>objective way, perhaps the paper by Dembski will help. Hopefully
>what I wrote in this post will clarify why I want an objective
>definition. There is no way to evaluate claims about whether
>specified complexity increases, decreases, remains constant etc.
>unless we can measure it.

I do not know that "Specified complexity still hasn't been defined in
any objective way", but I await with interest the reception of "the
paper by Dembski".

BH>Anyway, back to my request. You made a similar complaint
>earlier that I was asking for a definition of information based on
>information theory. I tried to clarify that this was not my
>intention by writing:
>
>You misunderstood my request. You are free to define information
>any way you wish [except, of course, something like "that quantity
>which does not increase due to a random mutation" ]. I merely
>mentioned that the only objective definitions I know about come
>from information theory (classical or algorithmic).

See above. Brian says that I am "free to define information any
way" I "wish", but he quickly adds that "the only objective
definitions I know about come from information theory". This
effectively means that I cannot "define information any way" I
"wish"!

BH>Tell me Steve, how can I make it any clearer?

It's perfectly "clear" that Brian is setting up the rules so that he
can win the game. I do not accept his rules and I am not going to
play his game. It may be that "information" in the sense of
"meaning" cannot be made "objective" in the sense that Brian demands.
But that does not make it any the less real. It just shows the
limitations of "information theory" which is based on scientific
materialism. The same "objective" problem applies to consciousness
which is more real to each individual than any amount of "objective"
science.

BH>[I've responded to roughly half of Steve's post. This is enough I
>think as I'll probably just start repeating myself if I go on]

In a sense Brian has been "repeating" himself all along. This is
because he has taken little notice of what I mean by "information"
(specified complexity), and has instead used his own definition of
"information" (descriptive length), which I reject as irrelevant. I
will expand and clarify my original statement:

"While random mutation can create new `information' in the
information theory sense of increased descriptive length, it cannot
consistently create new information in the sense of specified
complexity."

God bless.

Steve

-------------------------------------------------------------------
| Stephen E (Steve) Jones ,--_|\ sejones@ibm.net |
| 3 Hawker Avenue / Oz \ Steve.Jones@health.wa.gov.au |
| Warwick 6024 ->*_,--\_/ Phone +61 9 448 7439 (These are |
| Perth, West Australia v my opinions, not my employer's) |
-------------------------------------------------------------------