Random origin of biological information

From: pruest@pop.dplanet.ch
Date: Mon Oct 02 2000 - 14:18:36 EDT

    Glenn:

    As many of our differences of opinion may be colored by our respective
    theological and philosophical outlooks, I move this topic to the
    beginning:

    mortongr@flash.net wrote:
    > I will tell you my experience. I believed Christian apologists when I was in
    > college. I came out as a YEC and went into the oil business. What I saw there
    > horrified me. Eventually, after 20 years of struggling, I finally had to admit
    > that everything the YECs had told me about geology was absolutely false. And
    > they made arguments just like you do. They said that the other guy couldn't
    > prove this or that but I would go look at the data and find out that the other
    > guy had shown what had just been claimed to be impossible. This created a
    > tremendous scepticism on my part towards what apologists say. They usually are
    > behind the times (like you were with Yockey's work). They ignore modern data
    > and explain it away rather than incorporate it into a coherent world view (as
    > you are doing with the frequency data). In other words, data doesn't behave for
    > apologists like it does for scientists. Data become something to be discounted
    > rather than incorporated into a world view. I should start a discounting
    > factor measurement for apologists. But that would then be criticized for 'not
    > building bridges'. The problem I have is that I want no bridges to falsehood.

    Let me tell you my experience. I became a Christian at age 21, after
    having been taught evolution. No problem: so God used evolution in
    creation. 6 years later, I came into contact with Christian
    anti-evolutionists (I can't remember coming across any dating criticisms
    at that time). So I started studying evolutionist primary literature,
    always with the idea "God did it, but how?" in the back of my mind. The
    conclusion grew that there is NO incontrovertible evidence for evolution
    (apart from microevolution), as perfectly valid alternative explanations
    can be found - unless there is NO intelligent designer: similarities may
    reflect similar requirements, similar cladistics may reflect similar
    interdependencies, punctuations between equilibria are necessarily
    virtually devoid of fossils, macroevolution by random-walk emergence of
    information in a DNA-based organism is unbelievable, and, of course,
    there is no plausible theory about the origin of life. Later, I started
    studying more closely what the Bible says about creation. First, I came
    to the conclusion that the Bible is NOT opposed to evolution, even for
    humankind. Then, slowly, the conviction grew that a close attention to
    the Hebrew of the creation story, as well as the personality and freedom
    God has given humans, require evolution. That the creation story
    requires an old earth became clear to me at that time. I had always
    accepted that science proves an old earth, but after studying a popular
    German YEC book and following up on all its references, as well as some
    relevant scientific material, I was fully convinced that YEC is false. I
    believe God is as active in "natural" events as he is in special
    "miracles", and he has plenty of scientifically undetectable means of
    guiding the outcome of "natural" random-walk processes, thus providing
    all information needed. This is not a philosophy of a god-of-the-gaps,
    but a theology of a God who is both transcendent (creating) and immanent
    (upholding) at the same time and continuously. Thus, my skepticism about
    the feasibility of process (e) (see below) occurring spontaneously has
    nothing to do with any of my theological beliefs.

    As a basis for discussion, I repeat the definition of the 5 different
    cases:
    > > (a) search for a meaningful letter sequence among random ones,
    > > (b) artificial selection of a functional ribozyme from a collection of
    > > random RNA sequences,
    > > (c) evolution of a functional ribozyme in RNA world organisms,
    > > (d) evolution of a protein by mutation of the DNA and natural selection
    > > of the protein,
    > > (e) a random DNA mutational walk finding a minimally active protein.

    The problem we keep running into is that you assume that (a) and (b) are
    representative of (d) and (e), which I contest. I group the points
    discussed under different headings, A **** etc.:

    A **** Is it necessary to distinguish (a) and (b) from (d) and (e)?

    > I raised that only as a response to your contention that proteins wouldn't
    > behave as does an RNA. I think the evidence says that they do.

    They don't: a nucleotide is worth 2 bits, an amino acid about 4.3 bits
    which can only be selected as a whole. This may not amount to much
    difference if each mutational step is selected individually, but
    whenever you have intermediates without functional improvement, the
    probability factors are multiplied at each step. RNA can be made by
    "organisms" consisting of 1 RNA molecule each, in a soup containing RNA
    polymerase and 4 nucleotide triphosphates, whereas a selection system
    doing translation of DNA (on which mutation works) across RNA into
    protein (on which selection works) requires a bacterium. You may
    mutagenize RNA at rates of 10^(-4), perhaps also at 10^(-3) per
    nucleotide and generation, but a bacterium will hardly survive such
    treatments (the usual, i.e. naturally optimized, mutation rate is
    10^(-8)). This rate also enters as a further multiplying factor each time a
    step leads to an unselected intermediate.
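
    The figures in this paragraph can be checked with a short Python sketch.
    The function names are illustrative, and treating each unselected step as
    an independent event occurring at the per-nucleotide mutation rate is a
    simplifying assumption made here, not a result taken from any experiment:

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Shannon information of one symbol drawn uniformly from an alphabet."""
    return math.log2(alphabet_size)


def log10_path_probability(unselected_steps: int, rate_per_site: float) -> float:
    """log10 of the chance of a run of specific mutations with no selection between them.

    Assumption (illustrative only): every step is an independent event with
    probability `rate_per_site`, so the factors simply multiply.
    """
    return unselected_steps * math.log10(rate_per_site)


print(f"nucleotide: {bits_per_symbol(4):.2f} bits")
print(f"amino acid: {bits_per_symbol(20):.2f} bits")
for rate in (1e-3, 1e-8):  # mutagenized RNA vs. usual bacterial rate
    for steps in (1, 2, 3):
        print(f"rate {rate:g}, {steps} unselected step(s): "
              f"10^{log10_path_probability(steps, rate):.0f}")
```

    Under these assumptions, three unselected steps cost a factor of 10^(-9)
    at a rate of 10^(-3), but 10^(-24) at a rate of 10^(-8).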

    > Now, here is how I view that the probability argument will
    > eventually be defeated. A protein that engages in Function X has this broad
    > structure--
    > variable amino acids-invariant amino acids-variable amino acids.
    > (and no I don't believe that they have to be segregated but use that as a
    > diagram). Now. I think eventually for function X we will find
    > variable amino acids-invariant amino acids A-variable amino acids
    > variable amino acids-invariant amino acids B-variable amino acids
    > etc.
    > This is exactly as the case of the two sentences:
    > Picking noses begets warts
    > fingers in the nares creates hypertrophy of the corium
    > All these sentences convey the same idea without using any of the same
    > invariant sequences. They consist of a separate family of solutions for
    > conveying this idea. One family has the invariant word warts. Another has the
    > invariant word corium, a third has the invariant boogers. I can create hundreds
    > of thousands of sequences for EACH of these families. I think eventually we
    > will find the same thing in proteins, and we have found it in RNAs. The
    > solution that life uses, which seems so limiting, is merely the solution that
    > life chose early in its evolution.

    That different sequences of the same protein family (having recognizable
    sequence similarities) often have the same function (but in different
    organisms or environments!) is clear. The experimental evidence for
    different folds having the same function, however, is very meager, if
    such cases occur at all (I don't know of any example, although it might be
    feasible occasionally).

    > > This is what Yockey did. To find a lower limit, we may estimate how much
    > > semantic (specified) information can be generated in a random walk and
    > > how much time this would take. And that's exactly what I tried to
    > > present for discussion in my first post. But you dismissed my
    > > (tentative) conclusion out of hand, without discussing it, by referring
    > > to cases (a) and (b), which cannot be compared with it at all.
    >
    > It ignores the possibility I discuss above about different families of
    > solutions. With the RNA experiments, we have already seen the same experiment
    > run twice yielding totally different sequences that perform the same function
    > exactly as I illustrated in the sentences above.

    RNAs aren't proteins, although both can be specified by DNA. And
    sentences can be compared even less with proteins. They are analogous
    because sentences, RNA, and proteins all may contain coded information,
    but an analogy may not be used to transfer ALL details. Christ being a
    vine doesn't mean he is literally rooted in the ground.

    > > But, most
    > > importantly, how about the origin of new functionalities by process (e)?
    >
    > New functionalities are found exactly as the experiments are showing new
    > functionalities to be found with Ribozymes. They appear at a frequency of
    > 10^-13. You seem to continue to ignore this frequency that appears over and over
    > in biopolymer experiments.

    No! You keep getting back to ribozymes, case (b), which is incomparable.

    > > This last factor might easily transcend any estimate for process (d) by
    > > a transastronomical magnitude.
    >
    > Experimental data would say it doesn't.

    Case (b) data are NOT evidence for (e)!

    > > What I meant with "unknown bias" is this: the starting pool of RNAs was
    > > certainly about random (within the limits of biochemical precision), but
    > > this was only a minute fraction of all possible sequences.
    >
    > This is precisely the point that amazes me. If it is such a small fraction of
    > all possible sequences, and yet they still find active sequences, the ONLY
    > POSSIBLE CONCLUSION THEREFORE IS THAT THE CLASSICAL CLAIM (THAT IT IS
    > IMPOSSIBLE TO FIND A FUNCTIONAL SEQUENCE BECAUSE THEY ARE TOO RARE) IS FLAT
    > OUT WRONG, WRONG WRONG! If the functional sequences were so rare as to make it
    > impossible for life to evolve, as you have contended throughout this thread,
    > then why on earth were they able to find one in a couple of months of work?
    > We can't claim that functional sequences are rare when we can find them in a
    > couple of months. If Szostak and Joyce can find them that quickly, then so
    > can nature!!!!!!

    You keep mixing up cases (b) and (d), and NEVER consider case (e)!

    B **** What is the frequency of active RNA's in ribozyme selection (b)?

    > The question is how efficient is nature at finding solutions.
    > The experiments with biopolymers that I have cited clearly show that
    > functionality occurs at a rate of 10^-13 or so. In the case of one of Joyce's
    > RNAs the classical probability argument would say that he had something like a
    > 1 chance in 10^236 of finding a useful sequence. But Joyce has been showing
    > that he can find functionality in a vat of 10^13 ribozymes. Surely that must
    > cause the anti-evolutionist pause because at that rate, there are 10^223 or so
    > different sequences that will perform a given function. I really fail to see
    > how someone can not see the implication of this except for theological
    > reasons.

    To which paper are you referring? We would have to look at the details.
    Exactly the opposite conclusion was drawn in C.Wilson, J.W.Szostak,
    Nature 374 (1995), 777: "A pool of 5 x 10^14 different random sequence
    RNAs was generated... On average, any given 28-nucleotide sequence has a
    50% probability of being represented... Remarkably, a single sequence
    accounted for more than 90% of the selected pool... This result
    indicates that there are relatively few solutions to the problem of
    binding biotin." The probability of accidentally hitting on a functional
    combination composed of L nucleotides is 4^(-L), no matter how large N, the
    length of the randomized sequence, is. Your conclusion that with N=392
    (10^236 different sequences), finding one active sequence among 10^13
    (L=22) implies that there are 10^236/10^13 = 10^223 active sequences of
    length 392 is formally correct but completely irrelevant, as the
    392-22=370 other nucleotide positions add nothing at all to the
    functionality. If L=370, instead, a completely different overall
    probability results. Your insistence on the 10^13 to 10^14 figure is
    entirely arbitrary. That this same figure keeps popping up in different
    experiments may just mean that this amount of RNA is practical to work
    with. Even in RNA selection, probabilities depend very much on the
    length of the RNA sequence selected, WHICH function is being selected,
    as well as other details. So you cannot generalize. And especially, you
    cannot draw conclusions regarding natural selection in a DNA-to-protein
    organism from results of artificial RNA selection.
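
    The sequence-space arithmetic in this section can be reproduced with a
    few lines of Python. This is a sketch only: the function name is
    illustrative, and N=392 and L=22 are simply the values discussed here.

```python
import math


def log10_sequences(length: int, alphabet_size: int = 4) -> float:
    """log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


N, L = 392, 22  # randomized length and assumed functional motif length

print(f"4^{L} = {4 ** L:.2e}")
print(f"4^{N} = 10^{log10_sequences(N):.1f}")
# The quotient 4^N / 4^L is exactly 4^(N-L): the number of ways to fill the
# N-L positions that are assumed not to contribute to function.
print(f"4^{N - L} = 10^{log10_sequences(N - L):.1f}")
# Chance that one random draw matches one specific L-nucleotide motif.
print(f"4^(-{L}) = 10^{-log10_sequences(L):.1f}")
```

    On this reading, the figure of about 10^223 is just the number of ways
    of filling the 370 positions taken not to matter, so it says nothing
    about how rare the functional motif itself is.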

    C **** In what sense is meaning compatible with randomness?

    > > I fully agree with you that both (a) and (b) are relatively easy, and
    > > certainly successfully doable (although you may be overestimating the
    > > fraction of letter sequences representing a recognizable meaning - but I
    > > don't know). These are the only two types you have been dealing with up
    > > to now. As we don't know anything about the feasibility of an RNA
    > > world, it is too uncertain to speculate about the chances for success of
    > > (c).
    >
    > As I have said at least twice before, I am not discussing the RNA world. I am
    > merely pointing out that the classical anti-evolutionary position which claims
    > (erroneously) that randomness is incompatible with meaning or specificity is
    > clearly false.

    Randomness, entropy, Shannon information deal with statistical
    properties of sequences. From the sequence alone, it is impossible to
    say whether it has meaning, specificity, biological functionality. This
    must be tested in a replicating system or organism. Randomness does NOT
    generate meaning, we need selection to recognize meaning. If we have a
    mutational path consisting of one or more steps, AND none of the
    intermediate mutants (for paths of >1 steps) represents an improvement
    on the wild type (starting sequence), the increase in meaning or
    functional information corresponds to the improvement observed in the
    final mutant of the path with respect to the wild type. Where does this
    information increment come from? From the information contained in the
    environment? Did it emerge accidentally? From God's guidance? It's
    impossible to be sure as far as science is concerned. All we can do is
    calculate the probability of the random walk mutational path; if it is
    something like 10^(-13) or larger, we hardly care. If it's 10^(-130),
    would you like to say there is no problem about randomness generating
    meaning?!

    > First off, there is no enzymatic activity if one doesn't allow selection and
    > comparison. Isolated proteins, created by random mutation won't do anything to
    > anybody unless one allows them to be tested against another protein for a
    > given function.

    I fully agree.

    > > All this is just Shannon information. For a string of length L and 4
    > > nucleotides, the maximum amount of information corresponds to 4^L
    > > possibilities. This may be called information potential. But none of
    > > this tells us anything about usable or semantic information or meaning in
    > > the sense of specification of biological function. Mutations add nothing
    > > to the semantic information until you test them by the environment.
    >
    > We agree here.
     
    D **** Is darwinian evolution (d) faithfully modelled by ribozyme
    selection (b)?

    > > In the evolutionary process, the only possible natural source of
    > > information is the environment. But the extraction of this information
    > > is extremely slow, probably only a fraction of a bit per generation -
    > > when any useful mutants are available at all. And if they are, they must
    > > penetrate the entire population before being fixed. For small selective
    > > advantages and large populations, the mutation still risks being lost by
    > > random drift.
    >
    > Having looked at informational flow calculations for the genome, like those
    > Spetner published in Nature in 1964, I am not at all impressed with his
    > calculations. There is most assuredly more than 1 bit of information generated
    > per generation. This is especially true in long sequences in which many
    > mutations occur during a generation.

    How do you know? Each intermediate organism must be viable in order to
    contribute to the evolution of its genome. In bacterial evolution
    experiments you sometimes find single-step mutants being selected, but
    double-step mutants through a non-selected intermediate have not been
    documented, to my knowledge. With RNA, viability in a non-selected state
    is not an issue. Multiple mutations in the same RNA molecule between
    selections (in vitro) are easily possible, but whether they are also possible
    in the DNA coding for a bacterium has not been demonstrated. It is just
    assumed.

    > > Furthermore, it's no use having all these bits randomly
    > > distributed in 10 million bags (species), or even further spread out
    > > among the individuals of a species. Biology only works if the right
    > > information is in the right place at the right time. Each individual
    > > must have all the information it requires. That will slow down the
    > > process tremendously. For each bit of information, you must consider
    > > that it can be input into the biosphere almost anywhere on earth. One
    > > bit improves cytochrome c in a fish on an Australian shelf, the next one
    > > improves a kinase in a worm in Canadian soil, the next one improves an
    > > ATPase in a heterotrophic bacterium 1 km below the surface in a Siberian
    > > rock, etc. This may help if each of the functionalities needed is
    > > already in place in each organism and is just made a little bit better.
    > > To make use of the improvements, the other organisms of the same species
    > > would have to trade their genes among themselves, which is not a matter
    > > of seconds, nor even of a few years. And if other species should profit,
    > > the trade between species or even higher taxa is much slower.
    >
    > First off, bacteria have sex with other bacteria of different species all the
    > time. There is a blizzard of genetic material that flows through the
    > biological world, trading genomes and genes. (see La Ronde, Scientific
    > American June 1994 P. 28-29)

    This reference is incorrect: I couldn't find it. I am not disputing that
    genes are traded rapidly among bacteria. What I emphasized is that a NEW
    mutant gene representing an improvement, which first is present as only
    ONE molecule in the biosphere, has to spread to all individuals and to
    all species which are to profit from it. We are talking of thousands of
    positive mutations required to build up each of thousands of efficient
    proteins, the set of which is basically the same today in virtually all
    species. Your simple calculation is not realistic, because you assume
    that the moment a helpful mutation is available anywhere on earth it can
    be used immediately as a basis for further improvements anywhere else on
    earth.
     
    > > A question which remains, of course, is the amount of semantic
    > > information at the transition point between (e) and (d). If this is just
    > > a few bits, my problem doesn't exist. What we can do is to try to define
    > > an upper and a lower limit for this transition point. Presumably, the
    > > two limits are very far from each other, but this is the best we can do
    > > for the moment. For the upper limit we may look at the amount of
    > > semantic information required for a modern (i.e. a known) enzyme.
    >
    > Oxytocin has only 8 amino acids. Several others have that also. An enzyme
    > does not a priori have to have a long sequence.

    Oxytocin is a biologically active peptide, not an enzyme. There are lots
    of small, but biologically active things, down to ions like Ca++. Active
    peptides usually aren't even translated from an mRNA (I'm not sure about
    oxytocin), but synthesized by rather large enzyme complexes. Enzymes and
    other biologically active proteins have sizes of usually a few hundred,
    and up to a few thousand amino acids. They often are composed of domains
    with their own tertiary structure, where domains are usually around 100
    amino acids. As an enzyme has to fold into a more or less fixed steric
    structure, in order to very specifically hold one or more substrates and
    catalyze a very specific reaction, it cannot be too short.

    > So tell me what exactly is your definition of 'primitive' enzymes? How would
    > you recognize one? What objective criteria would you use? Is Oxytocin
    > primitive because it is short? Or are the enzymes of cyanobacteria primitive
    > because cyanobacteria are so old?

    A "primitive" enzyme (or enzyme of "minimal activity") would be just
    above the transition from process (e) to (d). Such transitions would
    happen anytime during the history of life, whenever a basically novel
    activity was emerging, from the origin of life to the origin of humans.
    If we had such an enzyme, we would detect that it has a small activity,
    but we still would not know if a precursor was already active (apart
    from a probably impracticable exhaustive mutant search). To find out by
    what mutational random-walk it originated would probably be hard.
     
    E **** Some misunderstandings in the scientific realm:

    > > > "Extrapolating to the rest of the protein indicates that there should be
    > > > about 10^57 different allowed sequences for the entire 92-residue domain.
    > >
    > > This fits in very nicely with Yockey's cytochrome c estimate. Now, using
    > > his "effective number of amino acids" 17.621, we get 17.621^92 = 4.3 x
    > > 10^114 possible sequences, and the probability of finding any one of the
    > > 10^57 [lambda] repressor sequences is 0.23 x 10^(-57), rather low!
    >
    > And once again, it ignores the data found by Szostak and colleagues that a
    > repeat of the same selection experiment yields vastly different sequences to
    > solve the same biological problem.

    You yourself brought in this example (Reidhaar-Olson & Sauer, 1990), in
    order to refute Yockey's result. Szostak's ribozyme results are a
    different case.
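
    For reference, the arithmetic in the quoted estimate can be rechecked in
    Python. This is only a sketch: the names are illustrative, and the inputs
    (the effective alphabet of 17.621, the 92-residue domain, and the 10^57
    allowed sequences) are taken from the quoted text as given.

```python
import math

EFFECTIVE_AMINO_ACIDS = 17.621  # Yockey's effective alphabet size, as quoted
DOMAIN_LENGTH = 92              # residues in the repressor domain, as quoted
LOG10_ALLOWED = 57              # 10^57 allowed sequences, as quoted


def split_log10(log10_value):
    """Return (mantissa, exponent) with 1 <= mantissa < 10."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent


# Work in log10 throughout; the raw numbers are unwieldy to read.
log10_total = DOMAIN_LENGTH * math.log10(EFFECTIVE_AMINO_ACIDS)
log10_probability = LOG10_ALLOWED - log10_total

m, e = split_log10(log10_total)
print(f"possible sequences: {m:.1f} x 10^{e}")
m, e = split_log10(log10_probability)
print(f"probability of hitting an allowed sequence: {m:.1f} x 10^{e}")
```

    The second value comes out in normalized form; about 2.3 x 10^(-58) is
    the same number as the 0.23 x 10^(-57) quoted above.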

    > > These artificial mutations were targeted intelligently to specific small
    > > sequence regions to be tested, which makes it practical to recover
    > > biologically active mutants. Thus, this is not an experimental
    > > simulation of darwinian evolution. If you want to use these results for
    > > probability estimates, you have to factor this in.
    >
    > I would rather see them start with totally randomly generated strings rather
    > than try to substitute one at a time. I think they could be surprised as
    > others have been at that approach.

    See above under B!

    > > Whatever is
    > > contained therein has a greater chance of being selected than sequences
    > > not in the starting pool, which just might, but need not, be formed by
    > > later mutagenesis. And Lorsch & Szostak (Nature 371 (1994), 31), for
    > > instance, indicate that their starting pool already contained the ATP
    > > binding site required, "which greatly increased the odds of finding
    > > catalytically active sequences". Furthermore, they suggest it would be
    > > better to mix, match and modify small functional domains.
    >
    > The ATP is irrelevant as far as the frequency of the functionality is
    > concerned.

    You are contradicting Lorsch & Szostak concerning their own work!

    > > The don't-knows are Orgel's! (you clipped out his very relevant comments
    > > I quoted.) You don't want to claim he hasn't done anything worth while,
    > > during several decades of work, to solve these questions, do you? It's
    > > not just one "guy's failure", but the failure of a whole field of
    > > research, in ALL research groups having had a try at it. Orgel is one of
    > > the leaders in the field.
    >
    > So we base our position upon other people's failure. Most scientific theories
    > are based upon positive experimental support, not other people's failure. This
    > is the wrong approach for Christians to take. If we depend upon failure, what
    > happens when they finally succeed?

    If the ribozyme selection results constituted any positive
    experimental support for the early evolution of life, do you think Orgel
    would not see it?

    > You did cite outdated 1979 Yockey material which was at least 13 years behind
    > what I believe to be the last statement on the topic in 1992.

    I explained that in my post of 24 Sep 2000 (you seem not to have read
    it).

    > > > Degeneracy equals lots and lots of different proteins to perform the same
    > > > task. And before you say that there is an invariant region that must be as
    > > > it is in order to assure protein function, have you ruled out that other
    > > > sequences in other protein folded structures can't perform the same thing?
    > >
    > > The sequences of the same fold are already taken into consideration in
    > > the 10^57 sequences. Whether there are sequences of different folds with
    > > the same activity is not known. If I remember correctly, cases of
    > > different folds having the same activity are extremely rare, if they
    > > exist at all.
    >
    > You misunderstand. Not the same sequences having different folds--different
    > sequences having different folds!

    You misunderstand. I said "different folds having the same activity"!
    A "fold" in this sense is a set of protein families without recognizable
    sequence similarities between them, but folding into the same tertiary
    structure.

    F **** Some misunderstandings in the theological/philosophical realm:

    > > Your calculation omits some very crucial details about how an organism
    > > functions and how the biosphere communicates. Before you apply natural
    > > selection, you have no semantic or functional information whatever. Your
    > > string of a huge amount of Shannon information (which equals amount of
    > > randomness or entropy) is nothing but raw material for selection, bit by
    > > bit. First you need a functioning organism coded by the string (how do
    > > you get that?), then you can start testing each of the other bits
    > > against the environment in which this organism lives - a rather slow
    > > process.
    >
    > I think you keep trying to mix the problem here. I started this thread merely
    > by pointing out that randomness isn't incompatible with semantical meaning. I
    > think I proved this. Now you want to change it to the origin of life where you
    > think you have a better defense for your case. First off, we don't need a
    > functioning organism to have selection. We merely need reproduction. Now I
    > will freely admit I don't know how the raw molecules would reproduce and right
    > now no one else does either. However, to claim that our lack of knowledge is
    > equivalent to a law of nature seems to rest your case on our continued
    > ignorance. History has shown over and over again that that is a weak place to
    > rest one's case.

    No, I want to focus on case (e), the initial, random-walk search for a
    minimal enzymatic activity in a fully functional DNA-RNA-protein
    organism in which darwinian evolution works. I just have to constantly
    fend off all your linguistic (a) and in vitro ribozyme (b)
    probabilities. Not because I don't like them, but because there really
    are crucial differences between the cases (a) to (e), see at the
    beginning of this post. I never contested that (Shannon) randomness is
    compatible with semantical meaning (phenotypically tested). We need a
    functioning organism for cases (d) and (e), just reproduction for (b).

    > No, unsolved problems are not forbidden, but reliance on unsolved problems to
    > support one's position is a god-of-the-gaps (small g) type of approach and is
    > philosophically, historically and scientifically poor. It postulates that the
    > lack of solution for this problem supports my theory. No lack of any solution
    > supports any theory.

    It seems that you are relying on lack of experimental/observational
    results in natural evolution, case (d). As for my philosophical stance,
    see at the beginning of this post!

    > So, if you are an evolutionist as you seemed to imply earlier in this letter,
    > why are you taking this position? It seems to me that one who accepts
    > something is very unlikely to then turn around and claim persecution for
    > questioning it. Don't get me wrong, I think you have a right to question it.
    > What I am questioning is the correctness of your assertion that you believe in
    > evolution.

    Please see at the beginning of this post for my theological and
    philosophical views! I believe EVERYTHING in the history of life is
    creation. AND, for THEOLOGICAL reasons, I believe EVERYTHING in the
    history of life is evolution. But, for SCIENTIFIC reasons, I don't
    believe atheistic evolution is capable of accounting for reality. So, do
    you want to call me a creationist or an evolutionist? Categorizing
    people sometimes is tricky.

    > And by using this overriding belief that we have interpreted the Bible
    > correctly, we then twist the science, and discount the data as I discussed
    > above. The YECs make the same kind of epistemological claim. They KNOW that
    > God created the world 6000 years ago and they then discount any data that
    > violates that belief, just as you are discounting any data that violates your
    > theological system.

    See at the beginning of this post: I never was anything like a YEC, nor
    guilty of what you suspect.

    > > Although I agree with most of your criticism of the YEC position and
    > > some of that of the ID folk, there is certainly some value in their
    > > sincere commitment to biblical doctrines like divine inspiration of the
    > > Bible, and the idea of the ID "wedge" fighting the nihilistic
    > > degeneration of our society is worthy of approval, although some of the
    > > intellectual tools used may be questionable.
    >
    > First off, it is the ID group, Phil Johnson in particular who has basically
    > excluded the TE's as being 'compromisers' or some such.

    I see positive ideas as well as deficiencies on both sides. We
    definitely should find some common ground, at least between ID and TE,
    if people on both sides would seriously consider the issues involved.


