Re: pure chance

billgr@cco.caltech.edu
Mon, 30 Dec 1996 10:16:24 -0800 (PST)

Brian Harper:

[...]


> Now to your questions, which raise some very valid points.
> From the mathematics presented by the authors it seems
> unnecessary to actually know what the information content
> is in order to know whether or not it increases. This is determined
> solely by the frequencies P1 and P2. Which raises then another
> question, how accurately are P1 and P2 known? It is obvious
> that these are not known precisely; nevertheless, in the authors'
> defense, they don't have to be known precisely. Are they known
> well enough to be confident in saying P1>P2? Still, though,
> regardless of this the mathematics shows that a random mutation
> will likely increase the information content.

Thanks for the explanation of the authors' derivation of increasing
information with mutation. I'm still concerned, though, that they
are taking an incorrect view as to how broad their window into the
genome needs to be to get correct readings about information increase
or decrease. For example, take the mutilated English word *uestion.
How much information do you get by knowing that the * stands for 'q'?
Practically none at all. And this is despite the fact that the
frequency of 'q' is very low in English! You would be silly to
suggest 'e' for the *, and the reason is that you are considering
the surrounding text (the word fragment, in this case). Sure, if
you think DNA is like marbles stuck together, and don't consider
long-range correlations in your prior distribution, the authors'
derivation you refer to is quite correct. I remain unconvinced,
though, that it actually *means* anything useful.
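
To make the *uestion point concrete, here is a minimal sketch in
Python. The word list is a made-up toy sample (an assumption, not real
English statistics), and the two function names are my own. It
compares the surprisal of 'q' under a context-free letter-frequency
model with its surprisal once the surrounding word fragment is taken
into account.

```python
import math
from collections import Counter

# Toy word list: an assumption for illustration only, not real English data.
WORDS = ["question", "quest", "queen", "equal", "nation", "station",
         "motion", "section", "tension", "entire", "senate", "estate"]

def unigram_surprisal(letter, words):
    """Bits of surprise for `letter` when only overall letter
    frequencies are used (the 'marbles stuck together' view)."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    if counts[letter] == 0:
        return math.inf
    return math.log2(total / counts[letter])

def context_surprisal(letter, suffix, words):
    """Bits of surprise for `letter` as the first character of a word,
    given that the rest of the word is known to be `suffix`."""
    # Collect the first letters of all words that match "?" + suffix.
    firsts = Counter(w[0] for w in words
                     if len(w) == len(suffix) + 1 and w[1:] == suffix)
    total = sum(firsts.values())
    if firsts[letter] == 0:
        return math.inf
    return math.log2(total / firsts[letter])

if __name__ == "__main__":
    print("context-free bits for 'q':", unigram_surprisal("q", WORDS))
    print("bits for 'q' given *uestion:",
          context_surprisal("q", "uestion", WORDS))
```

In this toy sample 'q' is rare, so the context-free number is several
bits, while the number given the fragment is zero because only one
word fits. A real test would need a proper corpus, and for DNA a
model of whatever long-range dependencies actually exist.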

To take another example, suppose we decide to mutate a dictionary
by changing all the 'q's to letters of higher frequency (at random).
The entropy in the dictionary will increase, because *long-range*
parts of the distribution are being done away with, even though
the short-range parts ARE, in fact, being lessened. If anything is dependent
on long-range parts to the distribution, that thing is DNA, so
that is what is making me nervous. Is this a better explanation
(but still wrong :-))?

-Greg