RE: Information: Brad's reply (was Information: a very

Brad Jones (bjones@tartarus.uwa.edu.au)
Tue, 30 Jun 1998 13:20:03 +0800

>
> Brad Jones:
>
> [...]
>
> > I can accept what you are saying here but have some questions:
> >
> > *Biology question*
> > 1. If the mutation does not occur in DNA replication then all it would
> > do is create a different protein every now and then. How can this lead
> > to cumulative mutations that in turn lead to macro evolution?
>
> It can't. In order to be evolutionarily persistent, the mutation
> has to occur in replication (except that, as we've been discussing, there
> are some good reasons to think that evolution isn't just DNA determinism).
>
> > For evolution to happen the DNA itself must mutate, not an occasional
> > glitch in the creation of proteins.
> >
> > This is what I assumed you were talking about as it is what could lead
> > to change in the organism.
> >
> > I would like your opinion on this as I am certainly no expert on
> > biology.
>
> You're right; the DNA has to be mutated in reproduction. As to how
> mutations accumulate: a mutation becomes fixed in the population. Many
> mutations can be fixed simultaneously, and be at different stages of
> being fixed.
> This fixation can happen 'for no reason at all' (genetic drift) or due
> to selection pressures, or due to non-drift accidental factors. Various
> folks are variously interested in all these sorts of mechanisms.
>
> > *Information Theory*
> > A glitch in putting out a symbol on a random basis is exactly what I was
> > talking about in my analogy to a CD copy. This is an "information
> > channel" and as such any random variation on an information channel
> > ALWAYS reduces the channel's information carrying capacity.
>
> Here you consider the DNA as a channel. It is true that any errors
> introduced by a channel degrade the information transmitted through the
> channel. If DNA is the channel, however, what is the information source?
> Usually the DNA is taken as the source, and the protein coding, or
> reproductive process, is taken as the channel.
>
> > > You are using the generational axis and we were using the sequence
> > > axis in our calculation of information. That is why your analogy is
> > > flawed.
> > >
> >
> > CDs are still like DNA. You can consider a CD in both sequential and
> > generational modes; that is why a CD is so good an analogy. You actually
> > strengthen the analogy by bringing up another axis in which the CD is
> > similar to DNA.
>
> If you are considering DNA to be the channel, then DNA is quite *unlike*
> CDs. In a CD player, the CD is the source of information, and the
> consumer is whatever process is reading the CD. Your analogy is
> fairly straightforward, but I think you're misapplying it.
>
> [...]
>
> > As I said above, the better we know the source, the more closely we can
> > model the information content. The example given was that if we know
> > what language a text is in, we can achieve better compression on it.
>
> But in your analogy, the text is the channel, not the source. Again, I
> think you're mixing source and channel identifications in your analysis,
> which is causing some information loss. :-)
>
> > Therefore if we have no knowledge of a source then yes, we cannot tell
> > them apart. HOWEVER this is NOT the TRUE information content; this is
> > just the model we are using.
> >
> > The better the model, the closer it is to the TRUE information content.
> > And if we know that one source is gibberish we can just turn it off and
> > ignore it.
>
> You just said that information theory makes no distinction between
> gibberish and meaningful messages, but here you seem again to forget it.
> There is information in gibberish; probably more than in plain text.
>
> If you want a theory of meaning, you'll have to develop it yourself;
> information theory simply won't serve.
>
> > Therefore it follows that if we know that mutations are RANDOM then we
> > can confidently say they do not add information.
>
> This depends on which incarnation of the analogy we're dealing with.
> Since there is some confusion there, let me list the ones I've seen
> and the answers:
>
> 1. DNA is a channel
> Random mutations will *decrease* the mutual information between
> source and sink.

Anything that stores information is a channel: books, CDs, etc. As such, DNA
seems to fit into this category quite well. A channel can output
information, but this does not make it a source of information. That is an
easy mistake to make.
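As a quick sketch of what "decrease the mutual information" means in the
channel picture (my own illustration, not from the thread): for a uniform
binary source sent over a binary symmetric channel with error probability e,
the mutual information is 1 - H(e), which falls as the random errors grow.

```python
# Sketch: mutual information across a binary symmetric channel (BSC).
# For a uniform binary source, I(X;Y) = 1 - H(e); random errors in the
# channel strictly reduce it, as claimed in point 1 above.
import math

def h2(p):
    """Binary entropy H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(e):
    """I(X;Y) for a uniform source over a BSC with error probability e."""
    return 1.0 - h2(e)

for e in (0.0, 0.01, 0.1, 0.5):
    print(f"error prob {e:4.2f} -> I(X;Y) = {bsc_mutual_information(e):.3f} bits")
```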

> 2. How much can you compress a DNA sequence
> Random mutations will either *increase* or *decrease* the length
> of the compressed string. (On balance, there will be an increase.)

Compression is a useful measure of information content; unfortunately,
generic compression algorithms do not give the optimum compression. The true
compression can only be obtained by using a good model of the source, and
this is obviously not done in generic compression programs.
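Both effects are easy to demonstrate with a generic compressor standing in
for the optimum one (zlib, a made-up "sequence", and an arbitrary mutation
count; only the direction of the result matters):

```python
# Sketch: a generic compressor's view of random "mutations".
# zlib only approximates the true information content (as argued above),
# but the direction of the effect still shows up.
import random
import zlib

random.seed(0)
original = b"ACGT" * 256                 # a highly regular "sequence"
mutated = bytearray(original)
for _ in range(32):                      # overwrite 32 random positions with
    i = random.randrange(len(mutated))   # random bases (some may coincide
    mutated[i] = random.choice(b"ACGT")  # with the original symbol)

print(len(zlib.compress(original)))        # short: the regularity is captured
print(len(zlib.compress(bytes(mutated))))  # typically longer after mutation
```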

> 3. DNA as source
> Random mutations will *increase* the information content of the source.

This is just the wrong way of looking at it. My corrections to Glenn's use
of information theory on this topic should show that. Since nobody tried to
correct my maths, may I assume that Glenn now sees that his use of the
simplistic formula was incorrect?

>
> In a straightforward (and oversimplistic) biology example, suppose we
> have two parents (AA) and (aA). How much information is there in knowing
> the DNA of their offspring?
>
> Well, it could be (AA) or (aA). Probabilities are .5 and .5,
> and the information resolved in learning the actual sequence is just
> 1 bit.
>
> Now suppose there is a small probability that (a->b). Now we have
> (AA), (aA), and (bA). The information is slightly higher now, due to
> the fact that the logarithm is sublinear. The information resolved
> in observing *any of the three* is identical.

This is true if you are transmitting this data, but it does not follow that
this is the true information of the source.
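To put numbers on the quoted example using the zero-memory formula
H = -sum(p * log2(p)) (the 0.45/0.05 split is my own assumption, since the
original only says the (a->b) probability is "small"):

```python
# Sketch: entropy of the offspring distribution, before and after the
# (a->b) mutation becomes possible. The mutated probabilities are
# assumed figures for illustration.
import math

def entropy(probs):
    """Zero-memory source entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # 1.000 bit: (AA) vs (aA)
print(entropy([0.5, 0.45, 0.05]))   # ~1.234 bits once (bA) is possible
```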

To find the source information you need the BEST model possible; the generic
equation for zero-memory sources is almost never the best (or even close).

The better method of modeling this is to take the random mutation (a->b) as
noise on a channel going from parent to offspring. In fact, any randomness
in a source is modeled as a source plus a noisy channel.
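A sketch of that decomposition (the 10% corruption probability is again an
assumed figure): the three-outcome offspring distribution falls out of a
clean two-symbol source followed by a noisy channel.

```python
# Sketch: source randomness modeled as a deterministic-alphabet source
# plus a noisy channel that sometimes corrupts 'a' into 'b'.
import random

random.seed(1)

def clean_source():
    return random.choice(["AA", "aA"])   # the source alphabet has no 'b'

def noisy_channel(genotype, eps=0.1):    # eps is an assumed error rate
    return genotype.replace("a", "b") if random.random() < eps else genotype

samples = [noisy_channel(clean_source()) for _ in range(10_000)]
for g in ("AA", "aA", "bA"):
    print(g, samples.count(g) / len(samples))   # roughly .5 / .45 / .05
```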

I want to repeat that sources are NOT random. They are deliberate,
meaningful information. This seems to be a major misunderstanding here.

Information theory does not ascribe meaning to a source, but it definitely
ASSUMES that it is meaningful. The purpose of information theory is
therefore to transmit the MEANING of the data in as efficient a manner as
possible.

Of course, random noise has no meaning, and so the most efficient manner of
transmitting it is just not to....

>
> One of the reasons it is oversimplistic, of course, is that we are usually
> concerned with populations, and having a new (bA) in the population
> can make it much more interesting.
>
> DNA-as-channel is interesting if organisms care explicitly about the
> sequence of their DNA. Since we've only known what it is for a few
> decades, it is usually taken for granted that this is biologically
> unrealistic. That is, organisms care more about their offsprings'
> well-being, and not at all about their DNA. So the 'meaningfulness'
> that guides us to a particular model tends to make DNA-as-channel less
> appropriate. (Certainly for somatic protein-manufacture it is
> inappropriate.)
>

ANY information storage system can be modeled as a channel.

> > _____________________________________________________________
> > Do you agree or disagree with the following statements?
> >
> > "Information theory is pure nonsense! Noise is usually modelled as a
> > random source and a random source contains the most information since
> > all symbols are equiprobable. Thus the most informative information
> > source is noise."
> > _____________________________________________________________
>
> Looks like a trick question. The source conveys lots of information,
> but it's not very "informative," since that's something information theory
> lays no claim to be about.
>

Not a trick question. Why don't you have a go at answering it? This question
is used in exams as a general-understanding question.

The question is: does noise have more information than any other signal?

Glenn (and others) previously stated that it had maximal information. That
means they agree with the above statement (that the most informative source
is noise).
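For reference, the formal content of "maximal information" (a quick sketch,
not the Lecturer's answer): over a fixed alphabet of N symbols, the
zero-memory entropy peaks at log2(N) exactly when all symbols are
equiprobable, i.e. when the source statistics look like noise.

```python
# Sketch: entropy is maximised by the equiprobable (noise-like) source.
import math

def entropy(probs):
    """Zero-memory source entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25] * 4))              # 2.000 bits = log2(4), the maximum
print(entropy([0.7, 0.1, 0.1, 0.1]))    # ~1.357 bits for a skewed source
```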

Do you want to see the Lecturer's response to this question? (Dr Roberto
Togneri, http://www.ee.uwa.edu.au/staff/togneri.r.html/)

--------------------------------------------
Brad Jones
3rd Year BE(IT)
Electrical & Electronic Engineering
University of Western Australia