Re: pure chance

billgr@cco.caltech.edu
Thu, 19 Dec 1996 22:30:10 -0800 (PST)

Brian Harper:

[...]

> Now I would like to return to another question, whether the
> information content (as defined by Shannon) increases during
> evolution or more specifically due to a mutation. Based primarily
> on what little I know about info-theory and my intuition I had
> indicated in a "conversation" with Steve Jones that a random
> mutation would increase the info-content. This was in response
> to a bold assertion by Steve that information content would
> never increase due to random mutation. There was also a

I just attended a series of talks about how information theory
might help figure out how neurons in the brain work. Mixed in
was a more general discussion of how Info Theory might apply to
biology. The problem is that the Shannon theorem is extremely
broad--it is a theorem that given three or four (depending on
how you look at it) assumptions, the Shannon entropy measure IS
the correct measure for information. No one wants to abandon
any of the assumptions. Trouble is, how does one apply that
to biology? Do critters try to maximize the information in
their genome? Doesn't seem obvious--genomes are ordered, and a
good thing, too!

Does the information in the genome increase by mutation? On
average, yes, but there can certainly be cases where mutations
would decrease the information in a genome. (Think of it as
a sort of 'annealing.') Those would be less likely, though, so
on average, mutation would be expected to increase the entropy
in the genome (which corresponds to our expectations about
diversity and suchlike).
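
To make the "on average" claim concrete, here is a minimal sketch under
loudly stated assumptions: the codon names and counts are made up, a
mutation hits a codon with probability proportional to its count, and the
codon it turns into is picked uniformly from the others (real point
mutations only reach neighbouring codons, so this is a toy model, not
anyone's published method). It adds up the probability that one such
mutation raises the Shannon entropy of the codon frequencies.

```python
from math import log2

def entropy_bits(counts):
    """Shannon entropy (bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values() if n > 0)

def prob_entropy_increases(counts):
    """Probability that one random codon replacement raises the entropy.

    Toy assumptions: the source codon is hit in proportion to its count,
    and the target codon is uniform over the other codon types.
    """
    total = sum(counts.values())
    h0 = entropy_bits(counts)
    p_up = 0.0
    for src, n_src in counts.items():
        if n_src == 0:
            continue
        targets = [c for c in counts if c != src]
        for dst in targets:
            mutated = dict(counts)
            mutated[src] -= 1
            mutated[dst] += 1
            # small tolerance so float noise is not counted as an increase
            if entropy_bits(mutated) > h0 + 1e-12:
                p_up += (n_src / total) / len(targets)
    return p_up

# Hypothetical, skewed codon counts (illustration only).
toy_counts = {"AAA": 6, "AAC": 3, "AAG": 1}
p_up = prob_entropy_increases(toy_counts)
print(f"P(entropy increases) = {p_up:.2f}")
```

With a skewed mix like this one, most of the probability sits on mutations
away from the common codon, and those raise the entropy. If all the counts
start out equal, the same function returns 0, because the frequencies are
already at maximum entropy and any single change lowers it.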

> challenge to show such a case. Rummaging through my collection
> of papers I managed to find some concrete evidence that mutations
> do increase the Shannon IC. The reference is:
>
> J. S. Rao, C.P. Geevan and G.S. Rao (1982). "Significance of the
> Information Content of DNA in Mutations and Evolution,"
> <J. Theor. Biology> 96:571-577.
>
> Here the authors consider one point mutations and show that the
> only requirement for the Shannon IC to increase is that the
> frequency of the codon which mutates must be larger than the
> frequency of the codon to which it is mutated.

This doesn't seem convincing. Surely there must be other ingredients?

e.g. 100100100100100100100100110100100100
-->  100100100100100100100100100100100100
                              ^

Although this is looking at something more similar to Kolmogorov
measures than to Shannon measures...
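
For what it is worth, here is a rough sketch of what a frequency-based
Shannon measure gives for those two strings. The assumptions are mine:
the string is read as non-overlapping three-symbol "codons", and the
entropy is taken over the empirical codon frequencies within the string
itself. I do not know that this is how the paper computes its measure.

```python
from collections import Counter
from math import log2

def shannon_entropy(seq, word=3):
    """Entropy in bits per word of the empirical word frequencies in seq."""
    words = [seq[i:i + word] for i in range(0, len(seq) - word + 1, word)]
    counts = Counter(words)
    total = len(words)
    return sum((n / total) * log2(total / n) for n in counts.values())

before = "100" * 8 + "110" + "100" * 3   # the string with the odd '110'
after = "100" * 12                       # the same string after the mutation

h_before = shannon_entropy(before)
h_after = shannon_entropy(after)
print(f"before: {h_before:.3f} bits/codon, after: {h_after:.3f} bits/codon")
```

On this reading, the mutation turns the single rare codon into the common
one and the frequency entropy falls to zero, while the reverse mutation
would raise it. The measure sees only the codon frequencies and nothing of
the repeating pattern, which is the Kolmogorov-flavoured part.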

> "Assuming that the spontaneous mutations occur randomly
> at the DNA level, the more frequently occurring codons would
> tend to mutate more frequently. This in turn leads to an
> increase in the information content of the DNA."

OK...assuming there is a more-or-less random mix of codons...

> The above statement appears in the discussion section and is
> offered as a way of understanding the empirical results presented.
> The authors analyzed a bunch of data for one point mutations in
> the human haemoglobin gene and found that out of a total of
> 204 one point mutations, 139 resulted in an increase in IC,
> 2 resulted in no change in IC, 54 in a decrease with 9 being
> uncertain. So, one point mutations resulted in an increase in
> information content about 70% of the time.

How did they measure the information content of the genome?
The other problem with using Shannon measures is that you have
to have a complete prior distribution. Assuming something as
important as this seems fraught with peril to me. Perhaps the
biologists know something I don't, though... Can anyone enlighten
me? It would seem that to measure the information content of
the genome in question, you would have to figure out how much
information is conveyed to you when you are told which genome
actually is present. To do this, you need a prior distribution
over all genomes. I have a hard time believing the biologists
know this distribution.
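
To spell out why the prior matters, here is a small sketch. The
information conveyed by learning which genome is present is
log2(1/p(genome)) bits, so the answer depends entirely on the p you
assume. The two priors below are invented for illustration and have
nothing to do with real genomes.

```python
from math import log2

def surprisal_bits(p):
    """Information (bits) conveyed by an outcome that had prior probability p."""
    return log2(1 / p)

def uniform_prior_bits(length, alphabet_size=4):
    """Bits conveyed by any one sequence under a uniform prior over all
    sequences of the given length (4 letters assumed for DNA bases)."""
    return length * log2(alphabet_size)

# The same hypothetical 3-base sequence under two different made-up priors.
bits_uniform = uniform_prior_bits(3)   # every 3-base sequence equally likely
bits_skewed = surprisal_bits(0.5)      # a prior that gives this one sequence p = 1/2
print(bits_uniform, bits_skewed)
```

The same sequence "contains" 6 bits under one prior and 1 bit under the
other, so without agreeing on the distribution there is no single number
to report.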

-Greg