RE: Dembski and Caesar cyphers

From: Dawsonzhu@aol.com
Date: Wed Nov 20 2002 - 11:57:25 EST


    Glenn wrote:

    > There are limits to the sounds, (but they are very very broad limits. But
    > there really was no reason a t should be pronounced as an american
    > pronounces it. Indeed, cockneys use a guttural stop to pronounce the t they
    > read. It doesn't sound like our t at all. it sounds like bah le for bottle.

    I completely agree, and quite likely the whole "landscape" of possible
    sounds has not been even remotely covered, even by a cumulative list
    of all the languages ever spoken (known and unknown).

    But that is not what I am saying. Language is generally something
    that we _want_ to be understood (presumably). On the other hand
    code is something we don't want understood except by certain "trusted"
    readers. So for most languages, "intelligibility" is an essential constraint.
    Admittedly, some people are unintelligible whether or not they write
    in code, but that is not the issue here.

    For example, a sentence like

    "xg5bob^nx k8?5x b^b5km5l5mblmg8"

    looks like nonsense, but if you used a translation table
    x=t; g=o; 5=' '; b = e; o=v; ^=r;
    n=y; ' '=h; k=i; 8=n; ?=g; m=s; and l=a,
    you eventually would get:

    "to everything there is a season"

    Because of constraints on the spoken language, several
    repeated patterns occur too frequently: too many 5's,
    too many x's, too many m's, for example. For
    a much longer fragment, it becomes increasingly easy
    to see that it is probably a language. I'm sure we
    could also write nonsense that looks like language by
    weighting certain characters to appear more frequently
    than others, but that is a different issue.
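    This kind of repetition is exactly what a simple character count
    exposes, which is why monoalphabetic substitution falls to frequency
    analysis. A minimal sketch:

```python
from collections import Counter

ciphertext = "xg5bob^nx k8?5x b^b5km5l5mblmg8"
counts = Counter(ciphertext)

# The most common symbols stand out immediately; in English text the
# high-frequency letters (and the space) betray themselves first.
for symbol, n in counts.most_common(3):
    print(repr(symbol), n)
# '5' and 'b' each appear 5 times; 'x' appears 3 times.
```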

    Proteins use "standardized" translation machinery (with
    some exceptions of course) essentially without encryption.
    That's why pathogens can make a nuisance of themselves. On
    the other hand, the immune system is there to adaptively recognize
    "foreign words" and "delete them" from the database. So
    recognition, and with it readability, is critical in biology.
    This significantly reduces the number of possibilities,
    although I agree that it is probably still enormous.

    > ...there are,
    > according to Yockey's calculation 10^94 different proteins
    > which will perform the same functionality as cytochrome c. see Hubert
    > Yockey, Information Theory and Molecular Biology, (Cambridge: Cambridge
    > University Press, 1992), p. 59.

    Not having easy and immediate access to Yockey's work here in Japan
    I cannot comment, but let me take a stab at it anyway. First I assume
    that cytochrome c is roughly half coil and half alpha helix. Mutations in
    the coil regions are less damaging than those in secondary-structure
    regions, so I estimate the average variability in the coil portion of
    the amino acid sequence at about <13> residues per position (if you go
    through the several hundred cytochrome c sequences reported for
    different species). For the three alpha helices I would expect less
    variability, so about <5> residues per position. So for a protein of
    104 amino acids that obeys the pairing behavior of random protein
    sequences, I would arrive at

    exp(52 ln(5) + 52 ln(13)) = 5^52 x 13^52 --> ~10^94.

    So I conclude that Yockey probably went through the database and looked
    at the per-position variability of the sequence, giving a more precise
    estimate than my seat-of-the-pants shot can.

    Here's my objection. This still treats the sequence as though
    it were just dumb letters on a page. The properties of proteins depend
    on their nearest-neighbor interactions AT LEAST. That means one should
    look at tri-peptide patterns, and this drastically reduces the degrees
    of freedom. In the above calculation, I estimate it cuts the exponent
    roughly in half:

      (1/2)(52)(ln(5) + ln(13)) --> ~10^47

    And now, on top of that, you have folding dynamics that make this structure
    the MINIMUM free energy. Forget that nonsense about suboptimal yada yada.
    It's mostly hogwash. Out of the set of sequences that can lead you toward
    a global minimum, I would estimate you need to halve the exponent again.
    So finally we have about 10^24 at the upper end, and possibly 10^10 at
    the lower end.
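    A quick back-of-the-envelope check of this arithmetic (a sketch only:
    the per-position variabilities of 5 and 13, and the two halvings of
    the exponent, are the rough estimates above, not measured values):

```python
import math

n = 104                    # residues in cytochrome c
coil = helix = n // 2      # assume roughly half coil, half alpha helix
v_coil, v_helix = 13, 5    # rough per-position variabilities from the text

# Independent-position estimate: 13^52 * 5^52
log10_full = coil * math.log10(v_coil) + helix * math.log10(v_helix)

# Tri-peptide (nearest-neighbor) constraint: roughly halves the exponent
log10_tri = log10_full / 2

# Requiring the fold to reach the global free-energy minimum: halve again
log10_fold = log10_tri / 2

print(round(log10_full), round(log10_tri), round(log10_fold))
# -> 94 47 24
```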

    That is still enormous, and far greater than the number of species, but
    it is small compared to 10^94, or 10^135 if you allow all degrees of
    freedom.

    At any rate, please note that I said:

                   What function
                   a given protein "serves", _might_ be somewhat
                   arbitrary, but thermodynamics rules (as always)
                   and that will set limits on what structures can be
                   "meaningful" in that "some context".

    I'm not interested in taking sides on the main issue of this post,
    although I am inclined to think that Dembski's approach is not
    likely to make much progress especially since nature _wants_ to
    be intelligible, and he is approaching the matter from an "encryption"
    standpoint which is just the opposite of how the system seems
    to behave. Nevertheless, I think these probabilities deserve more
    thought, because the sequences are not nearly as unconstrained as I
    often hear implied.

    by Grace alone we proceed,
    Wayne



    This archive was generated by hypermail 2.1.4 : Wed Nov 20 2002 - 21:27:28 EST