Re: Information: Brad's reply (was Information: a very

Greg Billock (billgr@cco.caltech.edu)
Mon, 29 Jun 1998 08:47:05 -0700 (PDT)

Brad Jones:

[...]

> I can accept what you are saying here but have some questions:
>
> *Biology question*
> 1. If the mutation does not occur in DNA replication then all it would do
> is create a different protein every now and then. How can this lead to
> cumulative mutations that in turn lead to macro evolution?

It can't. In order to be evolutionarily persistent, the mutation
has to occur in replication (except that, as we've been discussing, there
are some good reasons to think that evolution isn't just DNA determinism).

> For evolution to happen the DNA itself must mutate, not an occasional glitch
> in the creation of proteins.
>
> This is what I assumed you were talking about as it is what could lead to
> change in the organism.
>
> I would like your opinion on this as I am certainly no expert on biology.

You're right; the DNA has to be mutated in reproduction. As to how
mutations accumulate: each one becomes fixed in the population. Many
mutations can be undergoing fixation at once, each at a different stage.
This fixation can happen 'for no reason at all' (genetic drift) or due
to selection pressures, or due to non-drift accidental factors. Various
folks are variously interested in all these sorts of mechanisms.

> *Information Theory*
> A glitch in putting out a symbol on a random basis is exactly what I was
> talking about in my analogy to a CD copy. This is an "information channel"
> and as such any random variation on an information channel ALWAYS reduces
> the channel's information-carrying capacity.

Here you consider the DNA as a channel. It is true that any errors
introduced by a channel degrade the information transmitted through the
channel. If DNA is the channel, however, what is the information source?
Usually the DNA is taken as the source, and the protein coding, or
reproductive process, is taken as the channel.

> > You are using the generational axis and we were using the sequence axis in
> > our calculation of information. That is why your analogy is flawed.
> >
>
> CD's are still like DNA. You can consider a CD in both sequential and
> generational modes, that is why a CD is so good an analogy. You actually
> strengthen the analogy by bringing up another axis in which the CD is
> similar to DNA.

If you are considering DNA to be the channel, then DNA is quite *unlike*
CDs. In a CD player, the CD is the source of information, and the
consumer is whatever process is reading the CD. Your analogy is
fairly straightforward, but I think you're misapplying it.

[...]

> As I said above. The better we know the source the closer we can model the
> information content. Example given was if we know what language a text is we
> can achieve better compression on it.

But in your analogy, the text is the channel, not the source. Again, I
think you're mixing source and channel identifications in your analysis,
which is causing some information loss. :-)

> Therefore if we have no knowledge of a source then yes, we cannot tell them
> apart. HOWEVER this is NOT the TRUE information content, this is just the
> model we are using.
>
> The better the model the closer it is to the TRUE information content. And
> if we know that one source is gibberish we can just turn it off and ignore
> it.

You just said that information theory makes no distinction between
gibberish and meaningful messages, but here you seem again to forget it.
There is information in gibberish; probably more than in plain text.
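
A rough way to see this, assuming a zeroth-order model (single-character
frequencies only) and a made-up English sample: the sketch below estimates
bits per character for the sample and for uniformly random characters over
the same 27-symbol alphabet. The sample text, the seed, and the name
entropy_per_symbol are illustrative choices, not anything from this thread.

```python
import math
import random
from collections import Counter

def entropy_per_symbol(text):
    """Empirical Shannon entropy in bits per character (zeroth-order model)."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical English sample, repeated to get a reasonable length.
english = ("the quick brown fox jumps over the lazy dog and then "
           "the dog chases the fox around the old barn again ") * 20

# "Gibberish": characters drawn uniformly from the same alphabet.
alphabet = "abcdefghijklmnopqrstuvwxyz "
rng = random.Random(0)  # fixed seed so the run is repeatable
gibberish = "".join(rng.choice(alphabet) for _ in range(len(english)))

h_english = entropy_per_symbol(english)
h_gibberish = entropy_per_symbol(gibberish)

print(f"English sample: {h_english:.3f} bits/char")
print(f"Gibberish:      {h_gibberish:.3f} bits/char")
```

Under this model the random string comes out close to log2(27) bits per
character and the English sample comes out lower. The number says nothing
about which of the two strings means anything.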

If you want a theory of meaning, you'll have to develop it yourself;
information theory simply won't serve.

> Therefore it follows that if we know that mutations are caused by RANDOM
> mutations then we can confidently say they do not add information.

This depends on which incarnation of the analogy we're dealing with.
Since there is some confusion there, let me list the ones I've seen
and the answers:

1. DNA is a channel
   Random mutations will *decrease* the mutual information between
   source and sink.
2. How much can you compress a DNA sequence?
   Random mutations will either *increase* or *decrease* the length
   of the compressed string. (On balance, there will be an increase.)
3. DNA as source
   Random mutations will *increase* the information content of the source.
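
Cases 1 and 3 can be sketched numerically under a toy model that is an
assumption here, not something taken from the discussion: each base is
copied correctly with probability 1 - error_rate, and otherwise replaced
by one of the other three bases with equal probability. The base
frequencies and all function names below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def joint_distribution(source_probs, error_rate):
    """joint[i][j] = P(source symbol i, received symbol j), symmetric channel."""
    k = len(source_probs)
    return [[px * ((1 - error_rate) if i == j else error_rate / (k - 1))
             for j in range(k)]
            for i, px in enumerate(source_probs)]

def output_distribution(source_probs, error_rate):
    """Symbol distribution after random substitutions have been applied."""
    joint = joint_distribution(source_probs, error_rate)
    k = len(source_probs)
    return [sum(joint[i][j] for i in range(k)) for j in range(k)]

def mutual_information(source_probs, error_rate):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = joint_distribution(source_probs, error_rate)
    flat = [p for row in joint for p in row]
    return (entropy(source_probs)
            + entropy(output_distribution(source_probs, error_rate))
            - entropy(flat))

uniform = [0.25, 0.25, 0.25, 0.25]  # case 1: sequence sent through a noisy copy step
skewed = [0.7, 0.1, 0.1, 0.1]       # case 3: a source with biased base usage

mi_clean = mutual_information(uniform, 0.0)
mi_noisy = mutual_information(uniform, 0.01)
h_before = entropy(skewed)
h_after = entropy(output_distribution(skewed, 0.01))

print(f"case 1: I(X;Y) {mi_clean:.4f} -> {mi_noisy:.4f} bits")
print(f"case 3: H      {h_before:.4f} -> {h_after:.4f} bits")
```

In this toy model the mutual information drops as soon as error_rate is
above zero, while the entropy of the biased source rises, because random
substitutions push the base frequencies toward uniform. A source that is
already uniform has nowhere to go, so the increase in case 3 depends on
the source not being at maximum entropy to begin with.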

In a straightforward (and oversimplistic) biology example, suppose we
have two parents (AA) and (aA). How much information is there in knowing
the DNA of their offspring?

Well, it could be (AA) or (aA). The probabilities are 0.5 and 0.5,
and the information resolved in learning the actual sequence is just
1 bit.

Now suppose there is a small probability that (a->b). Now we have
(AA), (aA), and (bA). The information is slightly higher now, because
the logarithm is sublinear: splitting the 0.5 probability of (aA) between
(aA) and (bA) can only raise the entropy. That average figure is the same
whichever of the three is actually observed.
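
The arithmetic for this example can be checked with a short sketch. It
assumes that the varying allele comes from the (aA) parent with
probability 0.5 each way, and that an inherited a mutates to b with
probability mutation_rate; that parameter and both function names are
made up for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def offspring_distribution(mutation_rate):
    """Genotype probabilities for offspring of (AA) x (aA) parents."""
    return {
        "AA": 0.5,
        "aA": 0.5 * (1 - mutation_rate),
        "bA": 0.5 * mutation_rate,
    }

h_no_mutation = entropy_bits(offspring_distribution(0.0).values())
h_with_mutation = entropy_bits(offspring_distribution(0.01).values())

print(f"no mutation:   {h_no_mutation:.4f} bits")
print(f"1% a->b rate:  {h_with_mutation:.4f} bits")
```

With mutation_rate = 0 this gives exactly 1 bit. With a 1% rate the
entropy comes out at about 1.04 bits, slightly above the no-mutation
figure.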

One of the reasons it is oversimplistic, of course, is that we are usually
concerned with populations, and having a new (bA) in the population
can make it much more interesting.

DNA-as-channel is interesting if organisms care explicitly about the
sequence of their DNA. Since we've only known what it is for a few
decades, it is usually taken for granted that this is biologically
unrealistic. That is, organisms care more about their offspring's
well-being, and not at all about their DNA. So the 'meaningfulness'
that guides the choice of a particular model tends to make DNA-as-channel
less appropriate. (Certainly for somatic protein-manufacture it is
inappropriate.)

> _____________________________________________________________
> Do you agree or disagree with the following statements?
>
> "Information theory is pure nonsense! Noise is usually modelled as a random
> source and a random source contains the most information since all symbols
> are equiprobable. Thus the most informative information source is noise."
> _____________________________________________________________

Looks like a trick question. The source conveys lots of information,
but it's not very "informative," since that's something information theory
lays no claim to be about.

-Greg