Re: Information: Brad's reply (was Information: a very technical definition (was Dawkins' video))

Stephen Jones (sejones@ibm.net)
Tue, 23 Jun 1998 22:17:39 +0800

Glenn

On Mon, 15 Jun 1998 20:17:35 -0500, Glenn R. Morton wrote:

[...]

>SJ>I have a son, Brad, who is studying Information Technology
>Engineering at university and he is currently doing Information
>Theory. I showed him your message and he said what you are
>saying here is "wrong" (his word). He gave me a very detailed
>explanation why (which I did not fully understand), and even if I did
>it sounded too complex for me to repeat!

GM>Brad better wait til he gets his degree. The equation I presented
>above shows why there is no information in a sequence like
>AAAAAAAAAA. There are no choices of characters so the
>probability of getting A is 100% or 1.
>Thus when you put that into the above equation
>
>H=-K (1) log(1).
>
>Well the log of 1 is ZERO, 0. This means that -K times 1 times zero is
>ZERO. NO information.
>
>In the case of AAAAATAAAA the probabilities of the characters are:
>
>P(A)=.9
>P(T)=.1
>
>Thus
>
>H=-K (.9log(.9)+.1log(.1))= -K(-.041-.1)=.14K.
>
>As long as K is not zero, which it isn't, then the mutation Brad says
>doesn't increase information does exactly that. There is more information
>in the latter case than in the former.
>
>This should be enough for now.

Here is my son Brad's reply:

------------------------------------------------------------
Glenn,

I think there are two major problems with your reasoning.

1. You are assuming that the information content of DNA is best
modeled as an information source. As I will show below, this is not the
case, and modeling it that way will not produce any informative results.

2. Your analysis of DNA as an information source is also incorrect
and your results are misleading.

The main mistake that you made was to assume that DNA is a "zero
memory" source. A zero memory source outputs symbols that do not
depend on the previous symbols; this is not the case with DNA.

A DNA sequence of AAAAATAAAA will output exactly this each and every
time, e.g.: AAAAATAAAA AAAAATAAAA AAAAATAAAA AAAAATAAAA

A zero memory source with the probabilities given would produce
something like: AATAAAAAAA AAAAATAAAA AAAAAAATAA ATAAATAAAA
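
The contrast between the two kinds of source can be sketched in a few lines of Python. This is only an illustration: the function names, the fixed seed and the use of `random.choices` are assumptions made for the example, not anything taken from the discussion itself.

```python
import random

STORED = "AAAAATAAAA"  # the fixed sequence under discussion


def replay_stored(copies):
    """A stored sequence: every read-out is identical to the last."""
    return [STORED for _ in range(copies)]


def zero_memory(copies, length=10, p_t=0.1, seed=0):
    """A zero memory source: each symbol is drawn independently,
    with P(T) = p_t and P(A) = 1 - p_t, ignoring all previous symbols."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    return [
        "".join(rng.choices("AT", weights=[1 - p_t, p_t], k=length))
        for _ in range(copies)
    ]


print(" ".join(replay_stored(4)))
print(" ".join(zero_memory(4)))
```

The first line printed is the same block four times; the second varies from block to block, and the position and number of T's is not fixed.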

If you would like this mathematically then the analysis is as follows:

*note that log to base 2 is used to give the result in bits*

sequence: AAAAATAAAA AAAAATAAAA AAAAATAAAA AAAAATAAAA
zero memory:
P(A) = 0.9, P(T) = 0.1
H(S) = sum( p(i)*log(1/p(i)) )
     = 0.47 bits

2nd extension:
using groups of two: AA AA AT AA AA ...
P(AA) = 4/5, P(AT) = 1/5
H(S^2) = 0.72 bits per pair
H(S) = H(S^2)/2 = 0.36 bits per symbol

The 2nd extension gives a lower information content per symbol than
the zero memory model; this indicates that the source is not a zero
memory source.

You can extend this by taking the 3rd, 4th, etc. extensions, and the
information content will continue to drop. Eventually the information
rate will drop to zero, as the 10th extension shows:

AAAAATAAAA AAAAATAAAA AAAAATAAAA AAAAATAAAA
P(AAAAATAAAA) = 1
H(S) = 0

So the source is not a zero memory source.
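
The figures above can be checked with a short Python sketch. It assumes the sequence is cut into non-overlapping blocks and that block probabilities are estimated by simple counting; the function name and the choice of block sizes (1, 2, 5 and 10, which all divide the sequence length evenly) are mine, picked for illustration.

```python
from collections import Counter
from math import log2


def block_entropy_per_symbol(text, n):
    """Entropy in bits per symbol when text is split into
    non-overlapping blocks of n symbols (the n-th extension)."""
    blocks = [text[i:i + n] for i in range(0, len(text) - n + 1, n)]
    counts = Counter(blocks)
    total = len(blocks)
    # H(S^n) = sum over blocks of p * log2(1/p), with p = count / total
    h_block = sum((c / total) * log2(total / c) for c in counts.values())
    return h_block / n  # divide by n to get bits per symbol


source = "AAAAATAAAA" * 4
for n in (1, 2, 5, 10):
    print(n, round(block_entropy_per_symbol(source, n), 2))
# prints:
# 1 0.47
# 2 0.36
# 5 0.2
# 10 0.0
```

With a block size of 10 there is only one possible block, so its probability is 1 and the entropy is zero.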

The general rule is that any information source that repeats a
sequence will have zero information. In plain English this is:

"If we already know what is being sent then there is no information
being sent"

Btw, this is the basic principle that all compression software relies on.
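
A rough way to see this is to hand a general-purpose compressor a repeating sequence and a random one of the same length. The sketch below uses Python's standard `zlib` module as a stand-in for "compression software"; the sequence lengths, the seed and the random four-letter comparison sequence are assumptions made for the example.

```python
import random
import zlib


def compressed_size(text):
    """Number of bytes zlib needs to store the given ASCII text."""
    return len(zlib.compress(text.encode("ascii")))


# 10,000 symbols made by repeating the same 10-symbol sequence
repeated = "AAAAATAAAA" * 1000

# 10,000 symbols drawn independently and uniformly from A, C, G, T
rng = random.Random(0)  # seeded only to make the sketch repeatable
scrambled = "".join(rng.choices("ACGT", k=len(repeated)))

print(len(repeated), compressed_size(repeated), compressed_size(scrambled))
```

The repeated sequence shrinks to a tiny fraction of its original size, while the random one cannot be squeezed much below about two bits per symbol.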

So you can see that modeling DNA as an information source tells us
nothing at all.

The method I would use to model DNA is as an information channel.
This is because DNA is equivalent to a storage device for
information (exactly like a CD etc.). It can be played as many times as
you like, but it is not _creating_ any information each time it is played.

The mutations of DNA seem analogous to the errors encountered
when copying a CD, which are quite easily modeled by a correct
application of information theory.

By doing it this way it is possible to model the random mutations and
the effect they have on the information, i.e. the difference they make to
the information content, as opposed to the actual information content.
The measure of this is called the mutual information of a channel.
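
As a minimal sketch of what such a calculation looks like, the Python below computes the mutual information I(X;Y) of a discrete channel from an input distribution and a table of transition probabilities. The particular channel is purely hypothetical: it assumes four symbols, a uniform input, and a copying error that changes a symbol into each of the other three with equal probability. The function names and the error rates are illustrative assumptions and are not measured mutation rates.

```python
from math import log2


def mutual_information(p_x, channel):
    """I(X;Y) in bits, where p_x[x] = P(x) and channel[x][y] = P(y | x)."""
    n_out = len(channel[0])
    # Output distribution: P(y) = sum over x of P(x) * P(y | x)
    p_y = [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    info = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            joint = px * channel[x][y]
            if joint > 0:  # terms with zero probability contribute nothing
                info += joint * log2(channel[x][y] / p_y[y])
    return info


def copy_channel(error_rate, symbols=4):
    """Hypothetical symmetric copying channel: a symbol is copied correctly
    with probability 1 - error_rate, otherwise it becomes one of the other
    symbols, each equally likely."""
    wrong = error_rate / (symbols - 1)
    return [[1 - error_rate if x == y else wrong for y in range(symbols)]
            for x in range(symbols)]


uniform = [0.25] * 4
for rate in (0.0, 0.01, 0.75):
    print(rate, mutual_information(uniform, copy_channel(rate)))
```

With no copying errors the full 2 bits per symbol get through; at an error rate of 0.75 the output is independent of the input and the mutual information is zero.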

I hope this clears it up somewhat. It is quite difficult to explain this in
easy terms, and I would recommend finding a good textbook if you
really want to pursue this.

Brad Jones
------------------------------------------------------------

Steve

"Evolution is the greatest engine of atheism ever invented."
--- Dr. William Provine, Professor of History and Biology, Cornell University.
http://fp.bio.utk.edu/darwin/1998/slides_view/Slide_7.html

--------------------------------------------------------------------
Stephen E (Steve) Jones ,--_|\ sejones@ibm.net
3 Hawker Avenue / Oz \ Steve.Jones@health.wa.gov.au
Warwick 6024 ->*_,--\_/ Phone +61 8 9448 7439
Perth, West Australia v "Test everything." (1Thess 5:21)
--------------------------------------------------------------------