Re: Information: Brad's reply (was Information: a very

Greg Billock (billgr@cco.caltech.edu)
Tue, 30 Jun 1998 10:07:59 -0700 (PDT)

Brad,

> >
> > 1. DNA is a channel
> > Random mutations will *decrease* the mutual information between
> > source and sink.
>
> Anything that stores information is a channel. Books, CDs, etc. As such DNA
> seems to fit into this category quite well. A channel can output
> information, but this does not make it a source of information. Easy mistake
> to make.

The question is whether it is meaningful to consider DNA a channel. This
is a question to be answered theoretically, not by checking definitions.
(BTW, most channels considered in information theory are memoryless, as
Glenn has been saying. And channels don't 'output' information; information
comes *through* them.)
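
For concreteness, here is a minimal sketch (my own, not anything from the
thread) of what noise does to a *channel*: for a binary symmetric channel
with a uniform input, the mutual information is I(X;Y) = 1 - H(p), and it
falls toward zero as the flip probability p rises.

    import math

    def h2(p):
        """Binary entropy function H(p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # Mutual information of a binary symmetric channel with a uniform
    # input: I(X;Y) = 1 - H(p), where p is the per-bit flip probability.
    for p in (0.0, 0.01, 0.1, 0.25, 0.5):
        print(f"flip prob {p:4.2f}: I(X;Y) = {1 - h2(p):.3f} bits/symbol")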

> > 3. DNA as source
> > Random mutations will *increase* the information content of the source.
>
> This is just the wrong way of looking at it. My corrections to Glenn's use
> of information theory on this topic should show that. Since nobody tried to
> correct my maths, I assume that Glenn now sees that his use of the
> simplistic formula was incorrect?

The problem is that you are considering DNA a channel. If you aren't
willing to reconsider that (as well as firming up your language a bit),
we'll probably have to declare the discussion over.

[more disagreements about same]

The application of what you are talking about *does* exist. That is,
if we want to know what dino DNA looked like, then the dino DNA is
the source, and modern descendants of dinosaurs (as well as their
relatives) are the channel through which we have to 'receive' the
information about the original dino DNA. The errors in transmission
decrease the mutual information between what we see in some strands
now and the dino DNA, which is why it is probably hopeless to do
a reconstruction a la Jurassic Park (among other equally or more
important reasons).
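
As a toy illustration (my own sketch; the mutation rate is invented), a
Jukes-Cantor-style per-site substitution model shows how quickly the
mutual information with the ancestral sequence decays toward zero:

    import math

    def per_site_info(q):
        """Mutual information (bits/site) between ancestor and descendant
        when a site still matches with probability q and is otherwise
        uniform over the three other bases."""
        if q >= 1.0:
            return 2.0
        return 2.0 + q * math.log2(q) + (1 - q) * math.log2((1 - q) / 3)

    mu = 1e-3  # substitutions per site per generation (illustrative only)
    for t in (0, 10**3, 10**4, 10**5, 10**6):
        q = 0.25 + 0.75 * (1 - 4 * mu / 3) ** t  # Jukes-Cantor match prob
        print(f"{t:>8} generations: match prob {q:.3f}, "
              f"I = {per_site_info(q):.3f} bits/site")

Once the match probability falls to the background 1/4, the descendant
tells you nothing at all about the ancestral sequence.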

Everyone already knows this, and it should be clear that this is a
special case. It is appropriate to consider intervening DNA a channel
because we're interested in recovering an ancient DNA sequence. That
situation basically never arises "in the wild," which is why info theory
is never applied in that way there.

> I want to repeat here that sources are NOT random. They are deliberate
> meaningful information. This seems to be a major misunderstanding here.
>
> Information theory does not ascribe meaning to a source, but it definitely
> ASSUMES that it is meaningful. The purpose of information theory is
> therefore to transmit the MEANING of the data in as efficient a manner as
> possible.
>
> Of course random noise has no meaning and so the most efficient manner of
> transmitting it is just not to....

Sorry, but this is just absolutely wrong. I don't know of a better way
to say it.

> the question is: Does noise have more information than any other signal?

A noise source will generate maximal information, yes.
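
A quick way to see it (my sketch, not the lecturer's): Shannon entropy
H = -sum p*log2(p) is maximized by the uniform distribution, i.e. by the
source that looks most like noise.

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum p * log2(p)."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Four-symbol alphabet (think A, C, G, T).  The uniform "noise"
    # source attains the maximum log2(4) = 2 bits/symbol; any bias
    # lowers H.  These distributions are made up for illustration.
    sources = {
        "uniform (noise)": [0.25, 0.25, 0.25, 0.25],
        "mildly biased":   [0.40, 0.30, 0.20, 0.10],
        "nearly constant": [0.97, 0.01, 0.01, 0.01],
    }
    for name, p in sources.items():
        print(f"{name:15s}: H = {entropy(p):.3f} bits/symbol")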

> Glenn (and others) previously stated that it had maximal information. That
> means that they agree with the above statement (The most informative source
> is noise).
>
> Do you want to hear the Lecturer's response to this question? (Dr Roberto
> Togneri, http://www.ee.uwa.edu.au/staff/togneri.r.html/)

Sure, why not.

Do you want to hear what an old guy named Shannon had to say on the topic?

The fundamental problem of communication is that of reproducing at
one point either exactly or approximately a message selected at another
point. Frequently the messages have meaning; that is they refer
to or are correlated according to some system with certain physical
or conceptual entities. These semantic aspects of communication are
irrelevant to the engineering problem. The significant aspect is that
the actual message is one selected from a set of possible messages.
The system must be designed to operate for each possible selection, not
just the one which will actually be chosen since this is unknown at the
time of design.

...and from page 5...

We can think of a discrete source as generating the message, symbol
by symbol. It will choose successive symbols according to certain
probabilities depending, in general, on preceding choices as well as
the particular symbols in question. A physical system, or a
mathematical model of a system which produces such a sequence of symbols
governed by a set of probabilities, is known as a stochastic process.

So, it would appear that the source materials (math hasn't changed in the
last 50 years, BTW) indicate that information theory is to be applied
*without worrying about the meaning of the information*, and that
information sources are to be considered *stochastic processes*
('stochastic' means 'random'; look it up).
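
In modern terms, Shannon's discrete source is easy to write down. Here is
a toy first-order Markov source (transition probabilities invented purely
for illustration) that chooses each symbol with probabilities depending on
the preceding choice, exactly as the quoted passage describes:

    import random

    # Next-symbol distributions over {A, C, G, T}; each row sums to 1.
    TRANSITIONS = {
        "A": [("A", 0.4), ("C", 0.2), ("G", 0.2), ("T", 0.2)],
        "C": [("A", 0.1), ("C", 0.5), ("G", 0.3), ("T", 0.1)],
        "G": [("A", 0.3), ("C", 0.1), ("G", 0.4), ("T", 0.2)],
        "T": [("A", 0.2), ("C", 0.2), ("G", 0.1), ("T", 0.5)],
    }

    def generate(length, state="A"):
        """Emit symbols one at a time, a la Shannon's discrete source."""
        out = []
        for _ in range(length):
            symbols, weights = zip(*TRANSITIONS[state])
            state = random.choices(symbols, weights=weights)[0]
            out.append(state)
        return "".join(out)

    print(generate(60))

No meaning anywhere in sight: just a set of probabilities.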

I'm still interested in what your professor has to say on the subject,
though, since it may give us some understanding of where you're coming
from.

-Greg