Probability and apologetics

GRMorton@aol.com
Wed, 30 Aug 1995 23:29:49 -0400

ABSTRACT: The probability argument against the random finding of
a given sequence is one of the mainstays of the anti-evolutionary
position. I have noted before that I view that argument as a weak
one for a variety of reasons. In this note I will show that the
finding of a functional sequence by a random search is quite
likely on normal evolutionary time scales. Because of this, and
other weaknesses in the traditional apologetic, Christianity
needs to move to a more defensible apologetic.

Duane Gish once wrote:

"The highly specific biological activity of each protein is due
to the precise way the amino acids are arranged, just as the
information conveyed by this sentence is determined by the
precise sequence of 190 letters found in it."~Duane Gish, "The
Origin of Life," Proc. First Int. Conf. on Creationism,
Pittsburgh: Creation Science Fellowship, 1986, p. 62

There is a major problem with that sentence: it is not the
only way to state what Gish wanted to state. For instance, he
could have written "Biological activity is due to very specific
orderings of amino acids as this sentence's meaning is due to the
123 letter order."

This is only a hint of how much variability there is in sequence
space for conveying the same message. There is an amazing
flexibility in the language to perform the same task. I once
calculated that there are over 330,000 ways to convey the
information, "if you pick your nose, you get warts." These vary
from relatively pidgin-like phrases like "pick nose get wart" to
more complex statements: "If you put your digits into your nares,
you will contract a hypertrophy of the corium." There are various
orders of this statement. It can be reversed: "To contract a
hypertrophy of the corium, place your digits into your nares."
You can also substitute nasal openings, nostrils, or nasal
passages for nares. You can get more gross and talk about what
you pick and extract. :-) All of these sequences were less than
80 characters in length, and I only quit calculating because my
imagination gave out and I was getting bored.

So the question is, if I wish to convey a certain message, how
likely is it that I can find a sequence to perform a given
function? There is a way to randomly produce a useful sequence
which is not all that improbable.

Let's use a less gross example than the nose picking one above.
Let's find a functional sequence to answer the question your wife
asked you when you were first married: "What do you want for
breakfast?" (And you thought I was going to say something else.
Tsk tsk.) There are lots of ways to answer this question. What
we will do is choose a 70 unit long sequence of 20 letters,
ruling out the use of z,q,x,k,v and j. Thus, we have in this 70
unit long sequence 1.18 x 10^91 different possible combinations.
Normally the anti-evolutionists say, like Gish, that finding
just the correct sequence is too unlikely to occur. This is
usually based upon the idea that one and only one sequence will
perform the task. This is untrue, as we have seen.
Even finding 330,000 ways to say "I want eggs" does not solve the
problem. 330,000 ways to say "I want eggs" out of 1.18 x 10^91 is
still too improbable for one to consider realistically.
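
The arithmetic behind these figures can be checked with a short
Python sketch. The function names are my own labels for the
illustration, and the 330,000 figure is simply the nose-picking
count reused as a stand-in for the number of ways to ask for
eggs.

```python
from math import log10

ALPHABET = 20   # letters left after dropping z, q, x, k, v and j
LENGTH = 70     # units in the sequence


def sequence_space(alphabet: int, length: int) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def single_draw_probability(targets: int, alphabet: int, length: int) -> float:
    """Chance that one random sequence is one of `targets` acceptable ones."""
    return targets / sequence_space(alphabet, length)


if __name__ == "__main__":
    total = sequence_space(ALPHABET, LENGTH)
    print(f"20^70 is about 10^{log10(total):.2f}")
    p = single_draw_probability(330_000, ALPHABET, LENGTH)
    print(f"330,000 acceptable sequences: p = {p:.2e} per draw")
```

Even with hundreds of thousands of acceptable full-length
sentences, the chance per draw stays vanishingly small, which is
why another factor is needed.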


In order to solve the problem we need one other factor. What is
the shortest sequence which performs the function? The shortest
I can think of is simply "eggs". But this is not a full sentence
and would be too brusque for your bride. So let's say the
shortest sentence is "I eat eggs". Without the spaces this is an
8 letter sequence.

What I noticed was that with a 2 unit long sequence, i.e., in a
2-d phase space, the sequence ab occurs at only one point out of
the 26 x 26 points in a 26 character set. That is 1/676=.0015. If
you embed this 2d space into a third (e.g. using a 3 unit long
sequence), there are then 26 sequences *ab and 26 sequences ab*
(here * is any character), for a total of 52 sequences in the
phase space. Thus the odds of finding a sequence with ab are
52/17576=.0030, a considerable improvement in the odds of finding
ab. Embedding the 2d in a 4d space requires that **ab, *ab* or
ab** be the sequences desired. There are 3 x 26^2 = 2028 of these
in the 4d space and thus the odds are .0044 of finding an ab.
Each subsequent embedding raises the odds of finding a particular
short sequence.
It would appear that the equation ought to look something like:

prob = (N-n+1)(L^(N-n))/(L^N)

where N is the number of dimensions in the larger phase space, n
is the number of dimensions in the smaller phase space and L is
the number of characters which can be selected. This equation
ignores those sequences which have multiple copies of the desired
embedded sequence, but they are a small quantity by comparison
and can be safely ignored.
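
To see how safely the multiple-copy sequences can be ignored,
here is a rough sketch that compares the estimate with a
brute-force count for the small "ab" example. The function names
are hypothetical, and the brute-force count is only practical
for small N.

```python
from itertools import product
from string import ascii_lowercase


def approx_probability(N: int, n: int, L: int) -> float:
    """The estimate from the text: (N - n + 1) * L^(N - n) / L^N."""
    return (N - n + 1) * L ** (N - n) / L ** N


def exact_probability(N: int, target: str, alphabet: str) -> float:
    """Exact fraction of length-N strings that contain `target`,
    found by enumerating every string over `alphabet`."""
    hits = sum(
        target in "".join(chars) for chars in product(alphabet, repeat=N)
    )
    return hits / len(alphabet) ** N


if __name__ == "__main__":
    for N in (2, 3, 4):
        est = approx_probability(N, 2, 26)
        exact = exact_probability(N, "ab", ascii_lowercase)
        print(f"N={N}: estimate {est:.6f}, exact {exact:.6f}")
```

For N = 4 the brute-force count is 2027 against the estimate's
2028; the single difference is abab, which the estimate counts
twice.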

Thus the search of a 70-d space for an 8-unit sequence ("I eat
eggs") should yield

prob = (70-8+1)(20^62)/(20^70) = 2.46 x 10^-9

This is the probability that you will randomly make a 70 unit
long sequence which contains the string "ieateggs" somewhere in
it. But one can object that this embedding of the wanted string
in another one makes it unlikely to be useful. After all, the
string

"fieuoindhgeosyhdbflgdsyfgshsdfgdfosuieateggsqcrpflacyebfmcpdusmw
gcnmle"

does not seem to convey much information. But, as is often noted
in discussions of the origin of protein or DNA sequences, once
formed the sequence is likely to be cut randomly. So what are
the odds that a sequence with "ieateggs" will be cut twice, at
just the correct location? If we consider that a sequence that
is not cut is equivalent to cutting it past the terminal
character of the sequence, there are 71 places you can cut the
sequence. Thus for the above sequence, randomly cut, there is a
1/(71*71)= 1/5041 chance of cutting it in such a fashion that the
"ieateggs" statement is extracted. Thus the total probability
of finding a useful sequence in the 70 unit long sequence is
about 4.88 x 10^-13.
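
The cutting step can be checked by listing every pair of cut
places, under the assumption made above that the two cuts are
independent and that each cut place is equally likely. This is a
sketch only; the function names are mine, and the first cut is
taken as the left end of the extracted piece and the second as
the right end.

```python
from fractions import Fraction

# The example string from the text, with the line break removed.
SEQUENCE = (
    "fieuoindhgeosyhdbflgdsyfgshsdfgdfosuieateggsqcrpflacyebfmcpdusmw"
    "gcnmle"
)
TARGET = "ieateggs"


def cut_pairs_extracting(sequence: str, target: str) -> tuple[int, int]:
    """Return (pairs that extract exactly `target`, all ordered cut pairs).

    A string of length N has N + 1 cut places, counting the place past
    the terminal character.
    """
    places = len(sequence) + 1
    hits = sum(
        sequence[a:b] == target
        for a in range(places)
        for b in range(places)
    )
    return hits, places * places


def embed_probability(N: int, n: int, L: int) -> Fraction:
    """(N - n + 1) * L^(N - n) / L^N, simplified to (N - n + 1) / L^n."""
    return Fraction(N - n + 1, L ** n)


if __name__ == "__main__":
    hits, pairs = cut_pairs_extracting(SEQUENCE, TARGET)
    p_cut = Fraction(hits, pairs)
    p_embed = embed_probability(len(SEQUENCE), len(TARGET), 20)
    print(f"cut pairs that work: {hits} of {pairs}")
    print(f"embedding probability: {float(p_embed):.3e}")
    print(f"total probability:     {float(p_embed * p_cut):.3e}")
```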

How likely are we to find this useful sequence? If we were to
assign amino acids to the letters, write this sequence as a
protein, and then create a vat with 10^14 70-amino acid proteins
(which is not at all impossible, nor would this occupy a huge
space), you would most likely find about 50 copies of the
"ieateggs" sequence in the first vat.

This is not all. The next shortest useful sequence to answer
your bride's question is "I want eggs". This is a nine character
sequence. The odds of finding this sequence in a 70-unit long
sequence, after it is cut twice, are 2.40 x 10^-14. In your first
vat of proteins there is a high probability that one "iwanteggs"
will be found. But there is also the phrase "I like eggs", which
is also 9 characters long and has the same probability of
2.40 x 10^-14 per sequence after each sequence is cut twice.
There are also "I need eggs", "I wish eggs" and "I have eggs".

If we look for 10-character solutions, we have "I covet eggs",
"I crave eggs", "I fancy eggs" and "I favor eggs". Each of these
has a probability of approximately 10^-15. You would be likely to
find one of these in the first 10 vats.

In addition to these, if we go to an 11-length solution, we have
phrases like "I ingest eggs", "I devour eggs" and "I gobble
eggs". These have a likelihood of about 6 x 10^-17.
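
The vat arithmetic for the different phrase lengths can be
tabulated as follows. This is a sketch under the same assumptions
as the text (a 20 letter alphabet, 70 unit sequences, two
independent random cuts, 10^14 sequences per vat); the names
VAT_SIZE, letters and useful_probability are labels invented for
the illustration.

```python
VAT_SIZE = 10 ** 14   # sequences per vat, as assumed in the text


def letters(phrase: str) -> int:
    """Length of the phrase with spaces and punctuation removed."""
    return sum(ch.isalpha() for ch in phrase)


def useful_probability(n: int, N: int = 70, L: int = 20) -> float:
    """Chance that one random N-unit sequence contains a given n-unit
    phrase and that two random cuts extract exactly that phrase."""
    embed = (N - n + 1) / L ** n      # (N-n+1) * L^(N-n) / L^N
    cut = 1 / (N + 1) ** 2            # two cuts among N + 1 places
    return embed * cut


if __name__ == "__main__":
    for phrase in ("I eat eggs", "I want eggs", "I covet eggs", "I gobble eggs"):
        n = letters(phrase)
        p = useful_probability(n)
        print(f"{phrase!r:18} n={n:2d}  p={p:.2e}  per vat={p * VAT_SIZE:.3g}")
```

Each extra letter costs a factor of roughly 20, so the expected
count per vat drops quickly with phrase length, but the number of
distinct phrases that do the job grows at the same time.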

This can go on and on. Within the 70-d space there are hundreds
of thousands of ways of saying that you want eggs for breakfast.

One question which can be addressed here is how a short usable
sequence can become longer. Well, if you come down to breakfast
and say brusquely to your bride, "I eat eggs", she might cook
them for a few days, but eventually she will demand a politer
response, like "Dear, I eat eggs". Small additions from one
usable form to another, due to selection pressure caused by your
hunger pangs when your bride doesn't fix your breakfast, can
eventually lead you to say, "My beautiful wife, I am most
desirous of eating two eggs this morning." Obviously this
sequence has a greater functionality than simply "I eat eggs".

Do proteins act in the same fashion as the language above? Yes.
Gerald Joyce is one of the leaders in the field of directed
evolution. I would point you to Discover, May 1994, "Speeding
Through Evolution," and to Gerald E. Joyce, "Directed
Evolution," Scientific American, Dec. 1992, somewhere around
pp. 94-95, or Beaudry and Joyce, Science, 257:637-638, 1992.

Sean Eddy of the Washington University School of Medicine
recently wrote on Talk Origins (message
<EDDY.95Aug17084136@wol.wustl.edu>) that RNA sequence space is
teeming with interesting functionalities, all based upon Joyce's
work.

Thus, the weakness in the traditional creationist probability
argument is twofold. First, it assumes that one and only one
sequence can perform a given function. Second, it assumes that
the most complex forms must be made first. This ignores the
potential of short sequences performing the same function.

When one adds this weakness to the other weaknesses mentioned
over the past few weeks, the weakness of our apologetical approach
becomes obvious. The problems are: 1) The amount of genetic
variability in humans, which requires an ancient humanity in order
to fit the Biblical data. 2) The inability of young-earth
creationists to account within their time frame for how the caves
in which fossil man lived could have been formed. 3) The fact that
fossil man apparently built religious altars of various forms,
which is unaccounted for by those defending a recent origin of
Adam. 4) The inability of old earth creationists to point to a
place and a set of rocks to explain how the flood occurred and
how it matches the Biblical account (how could Noah float for a
year and land anywhere near mountains?). 5) Whether one accepts
the fossils we discussed in June and July as truly transitional
or not is less important to the apologetical case than what
those fossils look like. If they have the appearance of being
transitional forms, all our pleading that these are really NOT
transitional forms will fall on deaf ears.

The young earth creationists position Christianity in opposition
to almost every piece of observational data science collects,
from astronomy, biology, geology, paleontology, physics and
anthropology. The PC and TE positions, with a recent creation of
man, are much better, but they place Christianity in opposition
to certain biomolecular data (MHC and other allelic diversity) and
anthropological data (the nature of fossil man) as noted above.

It is very obvious that the positions we are defending
apologetically are not very secure.

The question those interested in Christian apologetics and the
relation between science and the early chapters of Genesis should
ask themselves is whether the purpose of the Christian apologist
is to explain the observational data in a Biblical framework or
to explain the data away. These are two very different
approaches. But if the probability argument against evolution is
as weak as I showed above, Christianity had best find a better
way to handle the area of science and the Bible.

glenn
16075 Longvista Dr.
Dallas, Texas 75248