RE: Dembski and Caesar cyphers

From: Dawsonzhu@aol.com
Date: Wed Nov 20 2002 - 11:57:25 EST


    Glenn wrote:

    > There are limits to the sounds, (but they are very very broad limits. But
    > there really was no reason a t should be pronounced as an american
    > pronounces it. Indeed, cockneys use a guttural stop to pronounce the t they
    > read. It doesn't sound like our t at all. it sounds like bah le for bottle.

    I completely agree, and quite likely the whole "landscape" of possible
    sounds has not been even remotely covered, even by a cumulative list
    of all the languages ever spoken (known and unknown).

    But that is not what I am saying. Language is generally something
    that we _want_ to be understood (presumably). On the other hand
    code is something we don't want understood except by certain "trusted"
    readers. So for most languages, "intelligibility" is an essential constraint.
    Admittedly, some people are unintelligible whether or not they write
    in code, but that is not the issue here.

    For example, a sentence like

    "xg5bob^nx k8?5x b^b5km5l5mblmg8"

    looks like nonsense, but if you used a translation table
    x=t; g=o; 5=' '; b = e; o=v; ^=r;
    n=y; ' '=h; k=i; 8=n; ?=g; m=s; and l=a,
    you eventually would get:

    "to everything there is a season"

    Because of constraints on the spoken language, several
    repeated patterns occur too frequently: too many 5's,
    too many x's, too many m's, for example. For
    a much longer fragment, it becomes increasingly easy
    to see that it is probably a language. I'm sure we
    could also write nonsense that looks like language by
    weighting certain characters to appear more frequently
    than others, but that is a different issue.
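    This kind of repetition is exactly what a simple character count
    exposes, which is why monoalphabetic substitution falls to frequency
    analysis. A minimal sketch:

```python
from collections import Counter

ciphertext = "xg5bob^nx k8?5x b^b5km5l5mblmg8"
counts = Counter(ciphertext)

# The most common symbols stand out immediately; in English text the
# high-frequency letters (and the space) betray themselves first.
for symbol, n in counts.most_common(3):
    print(repr(symbol), n)
# '5' and 'b' each appear 5 times; 'x' appears 3 times.
```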

    Proteins use "standardized" translation machinery (with
    some exceptions of course) essentially without encryption.
    That's why pathogens can make a nuisance of themselves. On
    the other hand, the immune system is there to adaptively recognize
    "foreign words" and "delete them" from the database. So
    recognition, and with it readability, is critical in biology.
    This significantly reduces the number of possibilities,
    although I agree that it is probably still enormous.

    > ...there are,
    > according to Yockey's calculation 10^94 different proteins
    > which will perform the same functionality as cytochrome c. see Hubert
    > Yockey, Information Theory and Molecular Biology, (Cambridge: Cambridge
    > University Press, 1992), p. 59.

    Not having easy and immediate access to Yockey's work here in Japan
    I cannot comment, but let me take a stab at it anyway. First I assume
    that cytochrome c is roughly half coil and half alpha helix. Mutations in
    the coil regions are less damaging than those in secondary-structure
    regions, so I estimate the average variability in the coil portion of
    the amino acid sequence at about <13> residues per position (if you go
    through the several hundred cytochrome c sequences reported for
    different species). For the three alpha helices I would expect less
    variability, so about <5> residues per position. So for a protein of
    104 amino acids that obeys the pairing behavior of random protein
    sequences, I would arrive at

    exp(52 ln(5) + 52 ln(13)) = 5^52 x 13^52 --> ~10^94.

    So I conclude that Yockey probably went through the database and looked
    at the per-position variability of the sequence, giving a more precise
    estimate than my seat-of-the-pants shot can.

    Here's my objection. This still treats the sequence as though
    it were just dumb letters on a page. The properties of proteins depend
    on their nearest-neighbor interactions AT LEAST. That means one should
    look at tri-peptide patterns, and this drastically reduces the degrees
    of freedom. In the above calculation, I estimate it cuts the exponent
    roughly in half:

      (1/2)(52)(ln(5) + ln(13)) --> ~10^47

    And now, on top of that, you have folding dynamics that make this structure
    the MINIMUM free energy. Forget that nonsense about suboptimal yada yada.
    It's mostly hogwash. Out of the set of sequences that can lead you toward
    a global minimum, I would estimate you need to halve the exponent again.
    So finally we have about 10^24 at the upper end, and possibly 10^10 at
    the lower end.
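    A quick back-of-the-envelope check of this arithmetic (a sketch only:
    the per-position variabilities of 5 and 13, and the two halvings of
    the exponent, are the rough estimates above, not measured values):

```python
import math

n = 104                    # residues in cytochrome c
coil = helix = n // 2      # assume roughly half coil, half alpha helix
v_coil, v_helix = 13, 5    # rough per-position variabilities from the text

# Independent-position estimate: 13^52 * 5^52
log10_full = coil * math.log10(v_coil) + helix * math.log10(v_helix)

# Tri-peptide (nearest-neighbor) constraint: roughly halves the exponent
log10_tri = log10_full / 2

# Requiring the fold to reach the global free-energy minimum: halve again
log10_fold = log10_tri / 2

print(round(log10_full), round(log10_tri), round(log10_fold))
# -> 94 47 24
```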

    That is still enormous, and far greater than the number of species, but
    it is small compared to 10^94, or 10^135 if you allow all degrees of
    freedom.

    At any rate, please note that I said:

                   What function
                   a given protein "serves", _might_ be somewhat
                   arbitrary, but thermodynamics rules (as always)
                   and that will set limits on what structures can be
                   "meaningful" in that "some context".

    I'm not interested in taking sides on the main issue of this post,
    although I am inclined to think that Dembski's approach is not
    likely to make much progress especially since nature _wants_ to
    be intelligible, and he is approaching the matter from an "encryption"
    standpoint which is just the opposite of how the system seems
    to behave. Nevertheless, I think these probabilities deserve more
    thought, because the sequences are not nearly as unconstrained as I
    often hear implied.

    by Grace alone we proceed,
    Wayne



    This archive was generated by hypermail 2.1.4 : Wed Nov 20 2002 - 21:27:28 EST