(non-flame post) evolution of IC in computer model

From: DNAunion@aol.com
Date: Mon Oct 09 2000 - 12:30:17 EDT


    DNAunion: I thought I would try leaving old topics behind and try to start over again (this time, still using my same name!). Perhaps that will be enough to break the cycle of negative exchanges between me and others here. Here is my first attempt at a return to civility: it addresses a computer program that modeled the evolution of biological information and is an excerpt from a post I made at another site.

    ***********
    [anti-IDist claims that model used "better binding" as a selection factor.]
    ***********

    DNAunion: In the model, it is not the recognizer's ability to bind better (i.e., more tightly) to sites that drives selection, but the recognizer's combined ability to bind more correct sites and fewer incorrect sites.

    If the recognizer's score for a site exceeds a fixed threshold (-58 in the run mentioned in the article), then it recognizes that site; if not, it doesn't. This is a binary decision - either true or false - with no gradations.
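    Here is a minimal sketch of that all-or-nothing recognition check, written in C since a C version of the program is discussed below. The weight-matrix layout and all the names are my own illustration, not Schneider's code; only the logic (sum the per-base values for the 6 bases of a site and compare the sum to a fixed threshold) comes from the article.

        /* Sketch of the binary recognition decision described above.
           The weights[position][base] layout is a hypothetical stand-in;
           only the sum-versus-threshold logic is taken from the article. */
        #include <stdbool.h>

        #define SITE_WIDTH 6
        #define THRESHOLD  (-58)   /* value used in the run mentioned in the article */

        /* site[] holds the 6 bases of a candidate site, encoded 0..3 for A, C, G, T */
        bool recognizes(const int weights[SITE_WIDTH][4], const int site[SITE_WIDTH])
        {
            int sum = 0;
            for (int i = 0; i < SITE_WIDTH; i++)
                sum += weights[i][site[i]];
            return sum > THRESHOLD;   /* true or false - no gradations */
        }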

    ************
    [anti-IDist claims that the model showed CSI forming by RM & NS]
    ************

    DNAunion: Bits of information are additive. According to the article, each 6-base binding site contained 4 bits of information, and there were 16 such sites. The total amount of information evolved, in bits, is thus 4 bits x 16 sites = 64 bits of information. This falls way short of Dembski's universal probability bound of 500 bits of information - the amount he equates with CSI - so this is not a simulation of CSI evolving.

    And this total information measure is confirmed in the article itself.

    [quote]"...[t]he probability of finding 16 sites averaging 4 bits each in random sequences is [2^(-4x16) = ~5x10^(-20)] yet the sites evolved from random sequences in only ~10^3 generations, at an average rate of 1 bit per 11 generations." [/quote]

    Note the "2^(-4 x 16)" which is equal to 2^-64, or 64 bits of information.
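    As a quick back-of-the-envelope check of that arithmetic (my own calculation, not anything taken from the article or the program):

        /* Confirms the figures quoted above: 16 sites x 4 bits = 64 bits,
           and 2^-64 is roughly 5 x 10^-20, far below the 500-bit bound. */
        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double total_bits = 4.0 * 16.0;   /* bits per site x number of sites */
            printf("total information: %.0f bits\n", total_bits);
            printf("chance probability: %g\n", pow(2.0, -total_bits));   /* ~5.4e-20 */
            printf("bits short of the 500-bit bound: %.0f\n", 500.0 - total_bits);
            return 0;
        }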

    As far as how quickly the non-CSI information evolved, that is open to discussion and scrutiny.

    We need to take a close look at the selection mechanism in Schneider's model to determine whether it is a plausible natural method, or if it is also directed in some way (I haven't studied that part enough yet to comment - just raising a caution flag at this point).

    ***********
    [anti-IDist points out that the authors “refute” Behe giving the following quote]

    [quote]"The ev model can also be used to succinctly address two other creationist arguments. First, the recognizer gene and its binding sites co-evolve, so they become dependent on each other and destructive mutations in either immediately lead to elimination of the organism. This situation fits Behe's [34] definition of `irreducible complexity' exactly (``a single system composed of several well-matched, interacting parts that contribute to the basic function, wherein the removal of any one of the parts causes the system to effectively cease functioning'', page 39), yet the molecular evolution of this `Roman arch' is straightforward and rapid, in direct contradiction to his thesis." [/quote]
    ************

    DNAunion: I missed the "ICness" of the system in my readings. Not that it isn't in there, but I didn't notice it. Perhaps someone could help point it out to me.

    Let's look at the organism as a whole.

    There was a single recognizer protein and it ended up recognizing 16 different binding sites. I count only 2 "components" - the recognizer and the binding sites - and an IC system requires several parts (among other things). Why do I count only 2 "components" when there are obviously 17?

    The organism does not need any of the binding sites to be recognized at first - these are all new splice sites (which would result in new protein functions). So, throughout the vast majority of the experiment, an organism can be viable if the recognizer (1) does not recognize binding site #1, or (2) does not recognize binding site #2, or ... (16) does not recognize binding site #16, or (17) does not recognize binding sites #1 and #2, or (18) does not recognize binding sites #1 and #3, or (19) does not recognize binding sites #1 and #4, or ...

    If we can pluck out any single "component" (let alone half a dozen of them) and the system still functions, then the system as a whole is not IC. (Note that the reverse does not hold: even if plucking out a single component causes the system to cease functioning, that does not necessarily mean the system as a whole is IC.) Consequently, the system as a whole during its evolution is not IC.

    Now the final state, where a recognizer "must" bind all 16 sites (and not bind to any incorrect sites), is a little bit different. Is that stage IC? I don't think so.

    By the very definition of the selection mechanism, if a single organism mutates so that it can now recognize only 15 of the 16 sites, then the algorithm automatically kills that organism (it would now rank in the bottom half). This very strict selective pressure mandates that all of the originally optional new proteins suddenly become absolutely essential to the survival of the organism. When did this occur in the model? It didn't. We must imagine that these 16 new protein functions were at first optional, but that the organism - for some unspecified reason - became absolutely dependent on each and every one of them.

    Also, note that even when all 16 binding sites must be recognized, the system still does not appear to me to be IC. Why? Because there are still only 2 parts interacting. There is a single recognizer protein and 16 individual, isolated binding sites, and the vast majority of the "components" of the system do not interact with one another: the 16 binding sites have no interconnections. Each is a separate, isolated island that interacts only with a common protein, not with the other sites. Well, unless we again imagine that all 16 of the resulting new proteins just happen to belong to, say, a single new complex biochemical reaction cascade that requires all 16 of them (but this would be astronomically unlikely).
     
    Also, I tentatively count only two parts that are well-matched at any one time - the recognizer protein and each binding site it recognizes, taken individually. Binding site #1 does not need to be well-matched to binding site #2, or to binding site #3, or to any of the others. True, they must all be recognized by the same protein, but this is (apparently) very easy and, in my opinion, does not require truly well-matched binding sites. For example, values between -512 and +511 can be used for the recognition threshold, and the value used in the run described in the article was -58: roughly in the middle of that range, giving roughly equal likelihood that a site will or will not be recognized. That is, when you add the calculated values for each of the 6 individual nucleotides of a given binding site, the site is recognized if the sum is greater than -58 - otherwise it is not - and -58 is just about halfway between -512 and +511. There seems to be a lot of tolerance for sequences: many dissimilar sequences will still fall out of the algorithm as "recognized" using the model's method - in fact, about 50% of all possible sequences will be recognized. I fail to see how such vast tolerance in sequence recognition could qualify the binding sites as being well-matched to each other.
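    Incidentally, since a site is only 6 bases long there are just 4^6 = 4,096 possible sites, so the recognized fraction for any given recognizer can be checked exactly by brute force. The weight values in the sketch below are made up purely for illustration (ev's evolved weights would give a different exact number); the point is only how one would measure that tolerance.

        /* Enumerate all 4^6 = 4096 possible 6-base sites and count how many a
           given recognizer accepts at threshold -58.  The weight matrix is an
           arbitrary stand-in, not taken from ev. */
        #include <stdio.h>

        #define SITE_WIDTH 6
        #define THRESHOLD  (-58)

        int main(void)
        {
            /* one row per position, one column per base (A, C, G, T) - made up */
            int weights[SITE_WIDTH][4] = {
                { -40,  10,  25,  -5 },
                {  15, -60,  30,   0 },
                { -20,   5,  45, -35 },
                {  50, -15, -25,  10 },
                { -30,  20,   0,  35 },
                {   5, -45,  40, -10 },
            };

            int recognized = 0;
            for (int code = 0; code < 4096; code++) {   /* every possible 6-mer */
                int sum = 0, c = code;
                for (int i = 0; i < SITE_WIDTH; i++) {
                    sum += weights[i][c & 3];           /* low two bits select the base */
                    c >>= 2;
                }
                if (sum > THRESHOLD)
                    recognized++;
            }
            printf("%d of 4096 possible sites recognized (%.1f%%)\n",
                   recognized, 100.0 * recognized / 4096);
            return 0;
        }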

    Now, going back to an issue raised earlier: the authors’ claims of the rapid rate of evolution displayed by their model. I found some interesting code documentation in the C version of the ev program dealing with the selection mechanism.

    But first, note that Schneider emphasizes (brags about) just how quickly the information evolved.

    [quote]”Second, the probability of finding 16 sites averaging 4 bits each in random sequences is 2^(-4 x 16) [or about] 5 x 10^-20 yet the sites evolved from random sequences in only ~10^3 generations, at an average rate of ~1 bit per 11 generations.” [/quote]

    And using that as a springboard, Schneider went on to discuss how this high evolutionary rate indicates that the entire human genome could evolve from scratch in a mere 1 billion years, among other similar claims. And earlier in the article he stated:

    [quote]”Remarkably, the cyclic mutation and selection process leads to an organism that makes no mistakes in only 704 generations.” [/quote]

    That tells us how fast his modeled selection mechanism produced the information, but just how well does that algorithm model natural selection?

    (1) Equivalence of Mistakes
    The article mentions that both kinds of mistakes – recognition of a wrong site and non-recognition of a correct site – are scored the same.

    [quote]”For simplicity these mistakes are counted as equivalent, since other schemes should give similar final results.” [/quote]

    Would the rate of evolution change if the two kinds of mistakes were weighted more realistically? I found the answer in the C code’s internal documentation.

    [quote]”These are weighted equally. (If they were weighted differently it should affect the rate but not the final product of the simulation.)” [/quote]

    In his model, did Schneider use the correct rate or the fastest rate? (Before answering, consider the material that follows).
     
    Are the two kinds of mistakes really equivalent? Isn’t missing a binding site (and therefore being short a protein in the organism) worse than binding incorrectly (in which case any resulting errant polypeptide would probably just be degraded anyway)? It seems to me that, in general, “missing” a site is the more deleterious mistake, and if so, then it should carry the greater selective pressure in the model.

    So how does scoring the two equivalently affect the evolutionary rate? I believe it accelerates it. In the 256-nucleotide genome, there were 251 potential binding sites [number of sites = length of genome - length of each site + 1; and 256 - 6 + 1 = 251]. Of those 251 total sites, exactly 16 were correct binding sites, leaving the remaining 235 as incorrect sites. Considering chance events, it is more likely for a site to be incorrectly recognized than it is for a correct site to be overlooked - but the more likely event is also the one with the milder effect on the organism.

    By “falsely” elevating the selective pressure on erroneous hits – the much more likely chance occurrence – to equivalence with the more deleterious missing of a site – the less likely chance event – Schneider has biased the selection algorithm toward faster evolution: each round of selection more quickly eliminates the class of errors that makes up the greater share of all mistakes.

    It seems to me that Schneider should have stated something along the lines of:

    [quote]”For simplicity these mistakes are counted as equivalent, since other schemes should give similar final results [, though they would require more generations to reach similar final results].” [/quote]
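    To make the weighting point concrete, here is a sketch of a mistake-scoring routine with an adjustable penalty for missed sites. The miss_penalty parameter is my own addition for illustration; according to its documentation, ev simply weights both kinds of mistakes equally.

        /* Sketch of mistake counting over the 251 potential sites of the
           256-base genome.  Setting miss_penalty to 1 reproduces ev's equal
           weighting; a larger value shows where a scheme that treats missed
           sites as more deleterious would differ. */
        #define GENOME_LEN 256
        #define SITE_WIDTH 6
        #define NUM_SITES  (GENOME_LEN - SITE_WIDTH + 1)   /* 251 potential sites */

        int count_mistakes(const int recognized[NUM_SITES],  /* 1 if the recognizer binds site s */
                           const int is_correct[NUM_SITES],  /* 1 for the 16 real binding sites  */
                           int miss_penalty)
        {
            int mistakes = 0;
            for (int s = 0; s < NUM_SITES; s++) {
                if (is_correct[s] && !recognized[s])
                    mistakes += miss_penalty;   /* missed a real site */
                else if (!is_correct[s] && recognized[s])
                    mistakes += 1;              /* spurious recognition of a wrong site */
            }
            return mistakes;
        }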

    (2) Extinction Impossible
    I found the following back-to-back sentences very humorous.

    [quote]”Given these conditions, the simulation will match the biology at every point. Because half of the population always survives each selection round in the evolutionary simulation presented here, the population cannot die out and there is no lethal level of incompetence.” [/quote]

    Hmm, matches the biology at every point????? Sorry, but this does not sound like natural selection to me. Millions upon millions of species - each of which was probably, at one time or another, composed of very many populations - have died out over the more than 3.5 billion years since life appeared on Earth. And since the disappearance of populations hinders evolution, and this model does not allow for that real possibility, it is stacking the deck in favor of evolution.

    Elsewhere, Schneider explains one reason for his algorithm’s NOT “following biology at every point”:

    [quote]”The fact that the population cannot become extinct could be dispensed with, for example by assigning a probability of death, but it would be inconvenient to lose an entire population after many generations.” [/quote]

    Does nature really look towards the future and adjust its selection rules to avoid inconveniences?
     
    (3) Maintenance of Diversity
    Anyway, Schneider later explains how his model prevents the entire population from dying out.

    [quote]”First, the number of mistakes made by each organism in the population is determined. Then the half of the population making the least mistakes is allowed to replicate by having their genomes replace (“kill”) the ones making more errors. (To preserve diversity, no replacement takes place if they are equal).” [/quote]

    Being a skeptic, I wondered about that last innocent parenthetical remark. I found more relating to this “maintenance of diversity” rule in the C code’s internal documentation (note that “bugs” in the following refers to “organisms”, not to errors in the computer code).

    [quote]”SPECIAL RULE: if the bugs have the same number of mistakes, reproduction (by replacement) does not take place. This ensures that the quicksort algorithm does not affect who takes over the population, and it also preserves the diversity of the population. [1988 October 26] Without this, the population quickly is taken over and evolution is extremely slow!” [/quote]

    Another indicator that his selection mechanism was tuned for speed (manipulated such that fewer generations are needed to produce the desired results).
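    For concreteness, here is a sketch of how such a selection round, including the no-replacement-on-ties rule, might look. This is my own paraphrase of the quoted documentation, not ev's actual source; all the names and the population layout are mine.

        /* One selection round as described above: sort the population by number
           of mistakes, then let each organism in the better half overwrite one in
           the worse half, skipping any pair with equal scores. */
        #include <stdlib.h>
        #include <string.h>

        #define POP_SIZE   64
        #define GENOME_LEN 256

        struct bug {                    /* "bug" = organism, as in ev's documentation */
            char genome[GENOME_LEN];
            int  mistakes;
        };

        static int by_mistakes(const void *a, const void *b)
        {
            return ((const struct bug *)a)->mistakes - ((const struct bug *)b)->mistakes;
        }

        void selection_round(struct bug pop[POP_SIZE])
        {
            qsort(pop, POP_SIZE, sizeof pop[0], by_mistakes);

            for (int i = 0; i < POP_SIZE / 2; i++) {
                struct bug *winner = &pop[i];
                struct bug *loser  = &pop[POP_SIZE / 2 + i];

                /* the SPECIAL RULE quoted above: no replacement on a tie, so
                   equally fit genomes are not overwritten and diversity is kept */
                if (winner->mistakes == loser->mistakes)
                    continue;

                memcpy(loser->genome, winner->genome, GENOME_LEN);
            }
        }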

    (4) Threshold Value
    One key component of the model’s selection algorithm is the recognition threshold (if the sum of the weights for 6 consecutive bases is greater than this threshold, the site is recognized). I did not see it mentioned in the article how the threshold value was determined – just that it was stored in the genome along with the other elements (recognizer, binding sites, etc.). I found a relevant block of documentation in the C code.

    [quote]”Unfortunately, experience [1988Oct17] shows that with such a large range, the model can fail to evolve. This is because the threshold is compared to the SUM of the weights [of the individual nucleotides that might make up a site], which therefore form a Gaussian distribution [i.e., a standard bell-curve: a curved line plotted on a graph where the greatest number of elements lie in the middle, a great many other elements lie just off to one side or the other of the middle, and very few elements lie at either extreme]. This distribution can be tight, so that it becomes unlikely that any of the sites can cross the threshold to allow selection to take place. Therefore the range of [the variable] bpthreshold should be kept small enough to ‘oil’ the evolution, but not too large” [/quote]

    More fine tuning? More need for proper values to be purposefully selected – not just to accelerate evolution, but to allow it to occur at all? Do they have to (know the Gaussian distribution beforehand and then) select a threshold value that is near the middle (near the “top of the hill”)? Had they used –100 or +100 instead of –58, how many more generations would it have required to evolve those 4 bits per site? What law or regularity of nature says that the selection threshold is -58?
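    As a rough illustration of the documentation's point, one can ask how the chance that a site's weight sum crosses the threshold falls off as the threshold moves away from the center of the (approximately Gaussian) distribution of sums. The mean and standard deviation below are assumptions chosen purely for illustration, not values taken from ev.

        /* Under an assumed normal approximation, show how sharply the probability
           of crossing the threshold depends on where the threshold sits relative
           to the distribution of weight sums.  Mean and sd are made-up values. */
        #include <math.h>
        #include <stdio.h>

        /* P(sum > t) for sum ~ Normal(mean, sd) */
        static double p_cross(double t, double mean, double sd)
        {
            return 0.5 * erfc((t - mean) / (sd * sqrt(2.0)));
        }

        int main(void)
        {
            const double mean = 0.0, sd = 60.0;   /* assumed for illustration only */
            const int thresholds[] = { -100, -58, 0, 100, 200 };

            for (int i = 0; i < 5; i++)
                printf("threshold %4d: P(a site's sum crosses it) = %.4f\n",
                       thresholds[i], p_cross(thresholds[i], mean, sd));
            return 0;
        }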

    Closing remarks
    Even if I am wrong (I eagerly await comments), it still seems to me that any “bragging” about how fast the information evolved is meaningless unless Schneider can demonstrate that his model’s selection mechanism produces results in line with those of selection in nature (of course, another issue that would need to be addressed is the mutation rate used).


