Re: pure chance

billgr@cco.caltech.edu
Mon, 13 Jan 1997 10:53:34 -0800 (PST)

Brian Harper:

[dice example for DNA/protein]

> This is a simple analogy but has a lot of features in common
> with DNA-->Protein. Here we can easily see how the information
> was lost. We have no idea which of the six ordered pairs was
> actually thrown when we receive the letter 7. We also see why
> we cannot reverse the process, knowing only that the sum is
> 7 we have no way of deciding which of the 6 ordered pairs to
> assign.

I like this example, and I think it is in many ways analogous to
the DNA->protein case.
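
To put rough numbers on the dice analogy, here is a minimal sketch. It
assumes two fair dice, so all 36 ordered pairs are equally likely; the
helper name entropy() is mine, not something from the thread.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


# All 36 ordered pairs from two fair dice, assumed equally likely.
pairs = list(product(range(1, 7), repeat=2))

# How many ordered pairs map onto each sum (2 through 12).
sum_counts = Counter(a + b for a, b in pairs)

h_pairs = entropy([1 / 36] * 36)                       # equals log2(36)
h_sums = entropy(c / 36 for c in sum_counts.values())  # entropy of the sums
h_lost = h_pairs - h_sums                              # H(pair | sum)

print(f"ordered pairs giving 7: {sum_counts[7]}")
print(f"H(pairs) = {h_pairs:.3f} bits")
print(f"H(sums)  = {h_sums:.3f} bits")
print(f"lost     = {h_lost:.3f} bits")
```

Under those assumptions the pairs carry about 5.17 bits, the sums about
3.27 bits, so roughly 1.90 bits per throw are lost in the mapping.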

[...]

> My suspicion is that what you are giving is a physical reason
> why reversibility would not occur even if it were "possible"
> according to info-theory.

I think the disagreement we are having is about what the reverse
transcription is. There is no doubt that the protein information
can't be 'put back into' the DNA, because information is lost in
the decoding. This is uncontroversial, but I do *not* think that
this means that reverse transcription is *impossible*. (Or even
that it doesn't happen.) What the information loss tells us is that
as a matter of fact reverse transcription isn't at all frequent (and
we can use DNA information to say how infrequent it is). Sadly for
Lamarck, this turns out to be really, really infrequent. :-) So
to go back to the dice example, of course the ordered pairs can't be
reconstructed from the sums, but this is where DNA->protein is different:
we have a sequence of DNA codons and the proteins they code for. We
don't have a random process generating DNA codons. If it turned out that
only 20 or so codons were found in DNA, then even though the alphabet has
64 letters, we would suspect that reverse transcription was going on.
Similarly, if we observed that in a list of dice values, only 11 of the
36 ordered pairs occurred (one per sum), we might suspect that reverse
translation was going on. That
is, even though other letters of the alphabet *are* represented in the
code, some rule in the reverse transcription eliminates them, thus
driving down the information content of the original code string (once
it has been decoded and reverse recoded). So the higher information
content of DNA is proof that reverse transcription doesn't happen (much),
not proof that there is no possible way to take proteins and reconstruct
an adequate DNA code for them (which is all that reverse transcription
demands).

So in some sense, the inability to reconstruct the exact DNA coding
sequence is irrelevant. If you are a protein that wants to get reverse
encoded, you don't give a rip if you produce one of the umpteen possible
DNA codes which would produce you. All you care is that you produce an
*acceptable* code. (Actually, this brings up an objection to the above:
if reverse transcription is accompanied by a random process which selects
codons within the space of possible sequences to code for a protein, then
all the information in the original DNA sequence can appear in the
reverse translated one.)
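
The two cases can be compared on the dice example with a small sketch.
Both reverse-mapping rules below are hypothetical illustrations (a fixed
choice per sum versus a uniformly random choice among the pairs that
give that sum), and fair dice are assumed. The distributions are
computed exactly rather than sampled.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import log2


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(float(p) * log2(float(p)) for p in probs if p > 0)


# Group the 36 ordered pairs by the sum they produce.
preimages = defaultdict(list)
for pair in product(range(1, 7), repeat=2):
    preimages[sum(pair)].append(pair)

# Probability of each sum for two fair dice.
p_sum = {s: Fraction(len(v), 36) for s, v in preimages.items()}

# Rule 1 (hypothetical): always pick the first listed pair for a sum.
fixed = {v[0]: p_sum[s] for s, v in preimages.items()}

# Rule 2 (hypothetical): pick uniformly among the pairs for a sum.
randomized = {
    pair: p_sum[s] / len(v)
    for s, v in preimages.items()
    for pair in v
}

print(f"fixed rule:  {len(fixed)} pairs, {entropy(fixed.values()):.3f} bits")
print(f"random rule: {len(randomized)} pairs, "
      f"{entropy(randomized.values()):.3f} bits")
```

With the fixed rule only 11 pairs ever appear and the entropy drops to
that of the sums. With the random rule all 36 pairs come back at 1/36
each, so the statistics of the original throws are restored even though
no individual throw can be recovered.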

> Greg:==
> >Uh, oh, now I'm partially mad. :-) :-) I thought I had
> >persuaded you that the assumption of equal probabilities
> >for all the codons was a poor estimate of the ensemble (or
> >at least *could* be a poor estimate, and so needed some
> >justification). Is this what you are going back on?
> >
>
> Wow, I guess I must have really mis-communicated some
> how. When I computed the maximum entropy for DNA
> assuming equal probabilities this was done just as a
> rough estimate purely for convenience since I didn't
> want to enter in the 61 probabilities given in the paper.
> What the authors did was try to estimate the probabilities
> based on the information available at that time, but they
> definitely were not equal.

Oh, OK. I think I jumped to conclusions based on my reading
of your examples. Sorry. :-)

> Greg:==
> >It is also a kind of ECC--that is, degeneracy like this
> >is exactly what ECCs rely on to decrease probability of
> >error. You make the codewords in clusters so that their
> >Hamming distance is larger than if you used densely packed
> >codewords. I think it would be interesting to see if the
> >genetic code were optimal in some sense in this way. It
> >should be fairly easy to figure out--do you know if anyone
> >has done so?
> >
>
> Actually Yockey does state that the genetic code was shown
> to be optimal in a paper published in 1985.

Cool!
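
As a crude first look at the degeneracy point (this is not the
optimality analysis Yockey cites, just a count), the sketch below
assumes the standard genetic code and asks how many single-base changes
to a sense codon leave the amino acid unchanged. The names BASES, AMINO,
CODE and single_base_neighbors() are mine.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_base_neighbors(codon):
    """Yield the 9 codons that differ from `codon` at exactly one position."""
    for pos, base in product(range(3), BASES):
        if base != codon[pos]:
            yield codon[:pos] + base + codon[pos + 1:]


# Only the 61 sense codons are used as starting points.
sense = [c for c, aa in CODE.items() if aa != "*"]

total = synonymous = 0
for codon in sense:
    for neighbor in single_base_neighbors(codon):
        total += 1
        if CODE[neighbor] == CODE[codon]:
            synonymous += 1

print(f"{synonymous} of {total} single-base changes keep the amino acid")
```

Whether that fraction makes the code "optimal" depends on what it is
compared against, which this count does not address.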

[my example of acceptable reverse translations]

> But you are really not translating back and forth between two
> alphabets with different entropies. When you say "Obviously, I
> can't recover the original information in the sentence" you concede
> that the process is irreversible. Whether or not you care is
> irrelevant ;-). When you reach your final state you are translating
> reversibly back and forth, but between two alphabets with the
> same number of characters (1,2) (a,b).

Right. Hopefully the above discussion helps out here. One point
I raised in parentheses is that the reverse translation can actually
yield a string with *higher* information content than the original
coding string--if the coding sequences have unequal probabilities,
for instance, but the reverse coded ones have equal probabilities.
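
A sketch of that point, assuming the standard genetic code and a
made-up, skewed codon usage (the weights below are hypothetical, not
real data). The reverse coding is assumed to pick uniformly among the
synonymous codons for each amino acid; reverse_translate_uniform() is
my name for it.

```python
from collections import defaultdict
from itertools import product
from math import log2

# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = {c: aa for c, aa in CODE.items() if aa != "*"}  # 61 sense codons


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def reverse_translate_uniform(codon_probs):
    """Codon distribution after translating to amino acids and then
    choosing uniformly among the synonymous codons for each one."""
    aa_prob = defaultdict(float)
    synonyms = defaultdict(list)
    for codon, aa in SENSE.items():
        aa_prob[aa] += codon_probs.get(codon, 0.0)
        synonyms[aa].append(codon)
    return {
        c: aa_prob[aa] / len(cs)
        for aa, cs in synonyms.items()
        for c in cs
    }


# Hypothetical skewed usage: weight 1..4 set by the third base. NOT real data.
weights = {c: 1 + BASES.index(c[2]) for c in SENSE}
total = sum(weights.values())
original = {c: w / total for c, w in weights.items()}
recoded = reverse_translate_uniform(original)

print(f"H(original) = {entropy(original.values()):.3f} bits per codon")
print(f"H(recoded)  = {entropy(recoded.values()):.3f} bits per codon")
print(f"upper bound = {log2(61):.3f} bits (61 equiprobable codons)")
```

The recoded entropy can never be lower than the original here, because
spreading each amino acid's probability evenly over its synonymous
codons maximizes the entropy within each group. It still stays at or
below log2(61), the equal-probability estimate.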

> I really appreciate the above comments. One thing I'm sorely lacking in
> is an understanding of molecular biology. I'm glad there's some one
> around who knows stuff like this and is willing to discuss it. Yockey
> does go into some of the stuff you mention above, for example molecular
> clocks and molecular phylogenies. It would be nice if you could get a
> hold of a copy of his book so that I don't have to spend so much time
> translating. Or if you can't find that, maybe you have access to Journal
> of Theoretical Biology?

I'll have to correct the implication that I'm someone who knows a lot
of molecular biology. :-) My understandings are rudimentary at best,
although I do have some textbooks on the subject, so I have something
to fall back on... :-)

I have access to the journal. I'll try to get a copy of his book so
I will be able to stop feeling lazy. :-)

-Greg