Re: pure chance

Brian D. Harper (harper.10@osu.edu)
Fri, 17 Jan 1997 15:58:44 -0500

At 09:53 PM 1/15/97 -0800, Greg wrote:

[...]

BH:===
>> But this process if it were to occur would seem highly destructive.
>> Perhaps I'm goofing things up, but I had an intuitive feeling that
>> the extra storage capacity of DNA might be a mandatory requirement
>> for evolution to occur.
>
Greg:===
>How so? My intuition isn't telling me anything here...
>

Intuition is always tough to describe. Roughly, I had this idea
that the extra storage capacity might allow, or make easier,
the emergence of new things. If every protein were a "selfish"
protein, for example, it doesn't seem that there would be
much chance of discovering something new.

[...]

>
>I see what you mean, and I think I've been attempting to say something
>like that, although not so well. I think Shannon's assumptions
>ARE met by DNA codon sequences, but that that may just not MEAN very
>much in the actual world. i.e. it is certainly true that DNA chains
>hold information, but just because one can maximize information content
>on streams or write down expressions for mutual information doesn't
>mean that these concepts refer to anything meaningful in biology.
>

This takes us back to a question [i.e. "this is nice, but what good
is it?"] that I have a hard time addressing since I know so little
about molecular biology. I'll try anyway, of course ;-). Actually,
I think this would be a great question to ask on bionet.info-theory,
tactfully of course. Something like listing the top five contributions
of information theory to biology.

Here are a few stabs of mine, not necessarily five and not necessarily
profoundly significant.

1). The first thing that comes to mind is the Shannon-McMillan
theorem that I mentioned recently in another post. Concluding
a section on this theorem in his book, Yockey writes:

===========
Probability theory contains many theorems that are
contrary to most people's intuition. Authors in
molecular biology, almost without exception, are
unaware of the Shannon-McMillan theorem, and have
been led to false conclusions. They do not realize
that, by the same token as the sequences in the
dice throws, all DNA, mRNA and protein sequences
are in the high probability group and are a very
tiny fraction of the total possible number of such
sequences.
-- Yockey, <Information Theory and Molecular Biology>,
Cambridge University Press, p. 76.
=============

What a teaser; I wish he would elaborate more on the false conclusions
that were reached.
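
To get a feel for what the theorem is saying, here's a little
back-of-the-envelope sketch in Python (my own illustration, not
Yockey's; the base frequencies are made up). For a skewed base
composition the "high probability group" has roughly 2^(n*H) members,
a vanishingly small fraction of the 4^n possible sequences of
length n:

import math

# Hypothetical (made-up) base composition, for illustration only.
p = {'A': 0.4, 'T': 0.4, 'G': 0.1, 'C': 0.1}
H = -sum(q * math.log2(q) for q in p.values())   # entropy per base, bits

n = 100                   # sequence length
typical = 2.0 ** (n * H)  # rough size of the "high probability group"
total = 4.0 ** n          # all possible sequences of length n

print(f"entropy per base : {H:.3f} bits (max is 2.000)")
print(f"typical set size : {typical:.2e}")
print(f"total sequences  : {total:.2e}")
print(f"fraction typical : {typical / total:.2e}")

Even at n = 100 the typical sequences make up only a few parts in a
billion of all possible sequences, and the fraction shrinks
exponentially as n grows.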

2). Another item that keeps coming up is the Central Dogma.
Let's see if we can take a trip back in time to the days when
the structure of DNA was being unraveled and the genetic code
was being deciphered. Could information and
coding theory have played a role at this point? We're just imagining,
so never mind that this was also roughly the time period when
Shannon was making his monumental contributions to the
subject. Well, suspecting that information was being transferred
from DNA to protein and knowing there are 61 codons and 20
amino acids one could have predicted that there should be a
central dogma. I think a prediction like this, before the fact,
would have been monumental. Yockey mentions that Crick did
predict, from the 61-to-20 mapping, that the code was redundant before this
was actually determined experimentally.
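
Just to spell out the arithmetic behind that kind of prediction (a
sketch of my own in Python, not anything from Yockey or Crick): a map
from 61 sense codons onto only 20 amino acids has to be many-to-one,
so the code must be redundant, and each codon-to-amino-acid step
throws away information that can't be uniquely recovered going the
other way:

import math

codons = 61        # sense codons (ignoring the 3 stop codons)
amino_acids = 20   # standard amino acids

bits_per_codon = math.log2(codons)        # ~5.93 bits
bits_per_aa = math.log2(amino_acids)      # ~4.32 bits

print(f"bits per codon      : {bits_per_codon:.2f}")
print(f"bits per amino acid : {bits_per_aa:.2f}")
print(f"bits lost per codon : {bits_per_codon - bits_per_aa:.2f}")

# Pigeonhole: with 61 codons and only 20 amino acids, some amino
# acids must be coded by more than one codon, i.e. the code is
# redundant (degenerate) and the map can't be inverted uniquely.
print("redundancy required :", codons > amino_acids)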

3). Your mention of mutual information reminded me of something
Yockey harped on a lot in his book, namely that he wants to get
rid of the vague and meaningless measure "similarity" [as in chimps
are 99.6% similar to humans] and replace it with mutual entropy.
Here's a quote from his book to illustrate:

=======================
A distinguished group of molecular biologists (Jukes & Bhushan,
1986; Reeck _et al_., 1987; Lewin, 1987) has called attention to
sloppy terminology in the misuse of 'homology' and 'similarity'.
Nevertheless, editors still permit authors to qualify 'homology'
and to confuse that word with similarity. _Mutual entropy_ is the
correct and robust concept and measure of similarity so that the
sooner _per cent identity_ disappears from usage the better. Mutual
entropy is a mathematical idea that reflects the intuitive feeling
that there is a quantity which we may call information content
in homologous protein sequences. Clearly, the shortest message
which describes at least one member of the family of sequences is
what one would properly call the information content.
-- Hubert Yockey, _Information Theory and Molecular Biology_,
Cambridge University Press, 1992, p. 337.
=========================
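
To make the contrast concrete, here's a purely illustrative Python
sketch (my own, not Yockey's) comparing per cent identity with the
mutual information computed from the joint distribution of aligned
positions. The two toy sequences are made up; real use would of
course start from a proper alignment:

from collections import Counter
from math import log2

# Two aligned toy sequences of equal length (made up for illustration).
seq1 = "ACCTGAGCTAGGATCCAGTT"
seq2 = "ACGTGAGCTACGATCGAGTA"
n = len(seq1)

# Per cent identity: just the fraction of matching positions.
identity = sum(a == b for a, b in zip(seq1, seq2)) / n

# Mutual information: built from the joint distribution of aligned pairs.
joint = Counter(zip(seq1, seq2))
p1 = Counter(seq1)
p2 = Counter(seq2)

mi = 0.0
for (a, b), count in joint.items():
    pab = count / n
    mi += pab * log2(pab / ((p1[a] / n) * (p2[b] / n)))

print(f"per cent identity  : {100 * identity:.1f}%")
print(f"mutual information : {mi:.3f} bits per position")

The point is that the second number comes out of a well-defined
probabilistic framework, while the first is just a count.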

4). In his book Yockey uses coding theory to show how the genetic
code could have evolved by a stochastic process (random walk).
Part of this was the transition from a doublet to a triplet code.
I don't want to go into many details (since I don't understand
them :). One of his concluding remarks about this in the
_Epilogue_ is, I think, important wrt the origins debate, so
I'll quote that:

==============
"...The argument led to support of the endosymbiotic theory
without being so contrived. The theory shows that the
number of triplet codes that merit consideration is limited
to the number of codes in the last two steps in the evolution
of the doublet codes. The modern triplet genetic codes emerged
from the bottleneck between the first extension doublet codes
and the second extension triplet codes. It was by this means
that the several modern triplet codes evolved without the
necessity of trying the vast number of possible triplet codes."
Yockey, ibid. p. 338.
===============

This reminded me of a basic tenet in many creationist probability
arguments: that random searches just aren't effective (in finite
time) because there are just too many possibilities to search.
Yockey concluded that one can make the transition from doublet
codes to the modern triplet code without searching all the
possibilities.
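
Just to get a feel for the size of the space involved (a rough count
of my own, not Yockey's): even the crude upper bound of letting each
of the 61 sense codons independently take any of the 20 amino acid
meanings gives an astronomically large number of conceivable codes,
which is why a route that avoids searching them all matters:

import math

codons = 61        # sense codons
amino_acids = 20   # possible meanings for each codon

# Crude upper bound: every sense codon independently assigned one of
# the 20 meanings.  Real constraints shrink this, but nowhere near
# enough to make a blind search feasible.
possible_codes = amino_acids ** codons

print(f"possible assignments ~ 10^{math.log10(possible_codes):.0f}")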

Brian Harper
Associate Professor
Applied Mechanics
Ohio State University