Re: Information: Brad's reply (was Information: a very

Greg Billock (billgr@cco.caltech.edu)
Thu, 2 Jul 1998 09:20:50 -0700 (PDT)

Brian,

> >3. Guessing at DNA of ancient species. The original DNA is the
> >source, the intervening generations' DNA is the channel, and we're
> >the consumers.
> >
>
> Do you mean "we" because we happen to be biological organisms, or
> "we" as dispassionate observers trying to understand the process?

Observers. (Except when we're literally consuming the DNA of
hopefully-not-so-ancient organisms.) :-) :-) :-)

> Your 5 cases have helped a lot. I guess I have tended to be
> thinking along the lines of case 1.

Right. I think that has caused a bit of the misunderstanding regarding
the memory in the code, etc.

> It really bugs me thinking of us as ever being the receivers.
> Unless one is very careful this could lead to some enormous
> blunders. It seems to me that any certainty or lack thereof
> that we might have has absolutely nothing to do with the
> uncertainty in the actual physical process. To say otherwise
> would seem to me to make info theory like quantum mechanics
> wherein the observer can affect the outcome.
>
> Or let's take this example. Suppose through strenuous efforts
> we are able to determine that only 10^6 of the possible 10^120
> paths available in info-space lead to a viable (meaningful) result.
> So we conclude that evolution must be highly constrained. But
> is it really? Many of these nonviable paths may become nonviable
> only after they have been followed for a while. The process
> of evolution will not share our foresight, and so many, many
> more than the 10^6 paths may actually be available. I guess
> what I'm getting at is that the "we are the receivers" model seems
> fundamentally teleological. Why are these particular paths
> followed and not others? Because of some desirable
> final cause.

I can see your point, and I readily admit I've painted an overly simplistic
picture. In actuality, the existence of groups of organisms modifies
the "permissible" paths through genome space that evolution can take,
so there is a whole ball o' wax that isn't so easily chopped into
"environment" and "organisms." Now, we usually treat that split as an
approximation, because organisms can typically shift within their genetic
neighborhood to keep pace with environmental variation (or adapt to it
non-genetically). So it's not an altogether bad approximation, but you're
right that getting all philosophical about it was pushing the envelope.

Now about the observer/process thing: it's not as intrusive a deal as
you may be thinking. Suppose we try to figure out the DNA of
dinosaurs and (say) early birds (which would be a very informative
thing to do, BTW). We're not operating in the line of descent, so using
a particular mathematical model in that case (specifically, solving
backward Kolmogorov equations or their ilk to get maximum-likelihood
estimates of dinosaur DNA) has no quantum-like effect on the DNA of the
species we're studying. If you want a quantum metaphor, it's a change of
Hilbert-space representation rather than an observation that has a real
effect on the system being studied. Or to put it another way, we will
model the process of DNA inheritance differently depending on what we're
looking at, but the process remains the same. When we're looking for
ancient DNA, it is "bad" that mutations have happened in the channel
leading from them to us, because it means we can't be as sure as we'd
like about the original message. When we're looking at the DNA as a
source of information for constructing proteins, we've switched models
because we're interested in different things, but *we* get to decide
which model we want. The universe doesn't seem to care which way we
do the math. :-)
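To make the "channel" picture concrete, here is a minimal Python sketch of the maximum-likelihood idea for a single site. It rests on assumptions the discussion above doesn't specify: the Jukes-Cantor substitution model (its transition probabilities are the closed-form solution of the Kolmogorov equations for a four-state chain with equal rates), a uniform prior over ancestral bases, and independent descendant lineages of equal length (a "star" tree). The names `jc_prob` and `ml_ancestor` are hypothetical; real ancestral reconstruction works over a full phylogeny.

```python
import math

BASES = "ACGT"

def jc_prob(same: bool, mu_t: float) -> float:
    """Jukes-Cantor probability that a site ends as the same base (same=True)
    or as one particular different base (same=False) after mu_t expected
    substitutions. Assumed model, chosen for illustration."""
    decay = math.exp(-4.0 * mu_t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay

def ml_ancestor(descendants: list[str], mu_t: float) -> tuple[str, dict[str, float]]:
    """Hypothetical helper: maximum-likelihood ancestral base at one site,
    assuming a uniform prior and independent lineages of equal length."""
    likelihoods = {}
    for anc in BASES:
        lik = 1.0
        for d in descendants:
            # Each lineage is an independent noisy channel from the ancestor.
            lik *= jc_prob(anc == d, mu_t)
        likelihoods[anc] = lik
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods

best, liks = ml_ancestor(["A", "A", "G"], mu_t=0.5)
print(best, liks)
```

As `mu_t` grows, all four likelihoods converge toward (1/4)^n: the channel has become so noisy that the descendants tell us essentially nothing about the original message. Note that nothing in the calculation touches the lineages themselves; only our model of them changes.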

For what we're most interested in--which parts of the genome
space are constrained, and how, and that whole business--it may be
inappropriate to use information theory at all, or at least its use
may have to await more understanding of some of those details.

> My personal favorite would be algorithmic complexity since it
> doesn't suffer from some drawbacks of classical info theory.
> For example, one can deal with individual sequences rather
> than ensembles and one doesn't have to know the underlying
> probability distribution, or even whether there is one, i.e., it's
> an objective measure dealing only with the structure of the
> sequence itself.

It has lots of advantages. As I think you mentioned, though, one of
the drawbacks is that you can't, in general, prove that a string
"really" has the algorithmic complexity that any particular algorithm
says it has. This is often a relatively painless sacrifice,
as you know....
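The "you only ever get an upper bound" point shows up with any ordinary compressor. A minimal Python sketch, assuming zlib as a stand-in for the unknowable shortest program: the compressed length bounds the algorithmic complexity from above (up to the constant size of the decompressor), but nothing the compressor reports certifies that no shorter description exists. The helper name `complexity_upper_bound_bits` is hypothetical.

```python
import random
import zlib

def complexity_upper_bound_bits(seq: str) -> int:
    """Bits in a zlib encoding of seq.

    This bounds the algorithmic complexity K(seq) from above, up to an
    additive constant for the decompressor; it is NOT K(seq) itself.
    """
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))

# A highly regular "sequence" and a pseudorandom one over the same alphabet.
regular = "ACGT" * 250
rng = random.Random(0)  # fixed seed so the example is repeatable
irregular = "".join(rng.choice("ACGT") for _ in range(1000))

for name, s in [("regular", regular), ("irregular", irregular)]:
    print(name, 8 * len(s), "raw bits ->", complexity_upper_bound_bits(s), "bits")
```

Note the irony: `irregular` was produced by a short program (a seeded pseudorandom generator), so its true complexity is small, yet zlib can do no better than roughly 2 bits per base. A cleverer compressor could tighten the bound, but no algorithm can tell us we've hit the floor.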

> >BH:===
> >> seem objectionable wrt what Yockey did. The proteins do
> >> "know" about functionally equivalent amino acids since
> >> "know" is just a convenient way of talking about the chemistry
> >> involved in the functional equivalence. I think. But I tend
> >> to get more confused the more I think about this.
> >
>
> Greg:==
> >I'm not quite sure I am understanding here. Do you mean that it
> >depends on the location we look? That is, if we just look at the
> >output of decoding--primary structure of a protein--the code is
> >straightforward, but if we look farther--the 'tasks' of the
> >protein in full tertiary structure, then it is clear that the DNA
> >can't just make any old primary structure and live, and so we
> >have a right to expect constraints on which sorts of primary structures
> >will get produced in the first place.
> >
>
> I'm not quite sure how to answer this. I looked back in Yockey's
> book and one interpretation he gives to the result he calculated
> is as follows:
>
> "... the genetic message must contain between 233 and 374 bits to
> record the instructions to construct one of the molecules of iso-
> 1-cytochrome c in the high probability set." -- Yockey

Right. OK, I think I get what Yockey is saying, and I think I see
where you were coming from. Let me repeat it to make sure...in the
protein coding process, there is functional redundancy (that is,
very different codon streams can produce functionally-near-identical
cyt-c proteins), as well as just plain old DNA code redundancy. I
don't recall how long the cyt-c gene actually is. How does
it compare to 233-374 bits? That would give us a concrete case where
we could compare the approximate size of the functional space with
the total genetic space.
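Once the length is known, the comparison is simple arithmetic. Here's a minimal Python sketch; the residue count is a parameter, and the value 100 below is a placeholder, NOT the actual length of the cyt-c gene. It assumes 2 bits of raw capacity per nucleotide and three nucleotides per codon (stop codon and introns ignored), plus a uniform log2(20) bits per residue for amino-acid sequence space, and sets both against Yockey's 233-374 bit range. The function name `capacity_comparison` is hypothetical.

```python
import math

YOCKEY_BITS = (233, 374)  # range quoted from Yockey for iso-1-cytochrome c

def capacity_comparison(n_residues: int) -> dict[str, float]:
    """Compare raw sequence capacities with Yockey's range (assumed simple model)."""
    dna_bits = 2.0 * 3 * n_residues            # 4 bases -> 2 bits each, 3 bases per codon
    protein_bits = n_residues * math.log2(20)  # 20 amino acids, assumed equally likely
    lo, hi = YOCKEY_BITS
    return {
        "dna_bits": dna_bits,
        "protein_bits": protein_bits,
        "yockey_fraction_of_dna_lo": lo / dna_bits,
        "yockey_fraction_of_dna_hi": hi / dna_bits,
    }

# Placeholder length for illustration only; substitute the real residue count.
print(capacity_comparison(100))
```

Roughly speaking, the gap between `dna_bits` and `protein_bits` reflects plain code redundancy, and the gap between `protein_bits` and Yockey's range reflects the functional redundancy, though Yockey's own calculation uses observed amino-acid frequencies rather than a uniform 20.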

[...]

> appeared, graduate students wrote them down. Thus a collection
> of fragments of meaningful sentences was accumulated to be
> collated later into works of philosophy, poetry, theology,
> law and other learned matters of concern to the savants of
> the Grand Academy. -- Yockey
> =====

:-) :-)

> OK, so I guess this isn't what you had in mind :-). This is
> typical Yockey. Fun to read even if you disagree with
> everything he says. BTW, the savants at the Grand Academy
> are Manfred Eigen, Maynard Smith, Richard Dawkins and
> Sidney Fox ;-).

Uh-oh, I guess we all know where Yockey stands, then. :-)

-Greg