Re: Probability and apologetics

Brian D. Harper (bharper@postbox.acs.ohio-state.edu)
Mon, 11 Sep 1995 09:42:21 -0400

Glenn wrote:

>Brian wrote:
>
>>>Now for another interesting and perhaps somewhat surprising result
>that has some bearing on Glenn's proposal [I'm still thinking about
>the best way to answer Glenn --- later!]. <<
>
>May I suggest a possible answer? How about "Glenn, I think you might be
>correct"? ;-)
>

Ah, very good :). This is, of course, possible; however, another
answer might be "Glenn, your idea is clever but not clever
enough" ;-).

I haven't gotten into the math much yet, but I did want to
clear up one point. On September 6, Jim Gibson asked the
following:

Jim:======================================================
Glenn
In your calculation of the number of amino acid strings of
length 110 in the phase space of strings of length 112,
you used the calculation:

**[110] + *[110]* + [110]** => 3 x 2^20 = 3145728

Why did you use 2^20 rather than 20^2 in this and the
succeeding calculations?
===========================================================

Did you reply to this? The reason I express uncertainty is that I
have been seeing several posts replying to other posts that I
never saw, leading me to suspect I'm not getting everything. I
think it was also about this time that another curious thing
happened: I received an identical set of 6 posts five times
in a row, roughly every five minutes. I don't think
this was a problem with the reflector since at least one of the
6 came from another source. Apparently I'm having problems with
my e-mail server. I also tried looking at the archives for a
reply but have been unable to connect for some reason.

In any event, it seems to me that Jim is right: p^n would be the
number of sequences of length n with p equally probable outcomes
at each position, i.e. 2^20 is the number of binary strings of
length 20 whereas 20^2 is the number of amino acid sequences of
length 2. This makes a big difference, of course: 2^20 is about
a million whereas 20^2 = 400.
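
To make the size of the discrepancy concrete, here is a small Python
sketch of the counting rule above. The function name and the constant
PLACEMENTS are my own labels for illustration, not anything from
Glenn's post; the factor of 3 is the three placements in Jim's quoted
line (**[110], *[110]*, [110]**).

```python
def count_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences with `length` positions,
    each position drawn from `alphabet_size` equally likely symbols."""
    return alphabet_size ** length


# 2^20: binary strings of length 20
binary_strings_len_20 = count_sequences(2, 20)
# 20^2: amino acid sequences of length 2 (the two free positions)
amino_acid_pairs = count_sequences(20, 2)

# Three ways to place a 110-mer inside a 112-mer (assumed label).
PLACEMENTS = 3
as_posted = PLACEMENTS * binary_strings_len_20   # 3 x 2^20
as_jim_suggests = PLACEMENTS * amino_acid_pairs  # 3 x 20^2

print(binary_strings_len_20, amino_acid_pairs)  # 1048576 400
print(as_posted, as_jim_suggests)               # 3145728 1200
```

So the two readings differ by a factor of about 2600.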

Also, your calculations seem to be missing something. Let's
generalize the notation and suppose that among the sequences of
length N there are 10^x functional sequences and 10^y nonfunctional
ones, where the evidence indicates that 10^y >> 10^x. Now the
probability of finding a functional sequence by a random search
through sequences of length N is

        10^x
p = -------------
     10^x + 10^y

Now we try to increase these odds by looking for functional sequences
of length N embedded in sequences of length M, where M > N.
It is clear enough that this will increase the number of functional
sequences; however, there is some disagreement (see above) regarding
just how much. So, let's just say the number of functional sequences
increases by some factor F so that there are now F*10^x functional
sequences of length N buried in the sequences of length M. Here
we note that the number of nonfunctional sequences also increases,
and by the same factor, i.e. whatever math you do to compute the
number of new functional sequences also applies to the nonfunctional
ones. So, we also have F*10^y nonfunctional sequences of length N
buried in the sequences of length M. The probability that any of
the sequences of length M will contain one of the functional
sequences of length N is thus:

         F*10^x             10^x
p = ----------------- = -------------
     F*10^x + F*10^y     10^x + 10^y

as before. So it doesn't help.
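
The cancellation can be checked with exact arithmetic. The following
is only a sketch: the values of X, Y and F are made up for
illustration (they are not estimates of anything), and the function
name is my own. It assumes, as the argument above does, that the same
factor F multiplies both the functional and the nonfunctional counts.

```python
from fractions import Fraction


def p_functional(x: int, y: int, factor: int = 1) -> Fraction:
    """Exact probability of drawing a functional sequence when there are
    factor*10^x functional and factor*10^y nonfunctional sequences."""
    functional = factor * 10**x
    nonfunctional = factor * 10**y
    return Fraction(functional, functional + nonfunctional)


# Illustrative values only (assumptions, not data).
X, Y, F = 3, 9, 1200

baseline = p_functional(X, Y)    # 10^x / (10^x + 10^y)
scaled = p_functional(X, Y, F)   # F*10^x / (F*10^x + F*10^y)

print(baseline == scaled)  # True
```

The equality holds only because both counts are scaled by the same
factor, which is exactly the assumption stated above.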

Now let's get away from the math and look at some general problems
with your scenario:

1) It seems to require a primeval soup. There is no evidence that
a primeval soup ever existed and many strong reasons for
arguing that it did not. Thus I would have to be very skeptical
of any scenario requiring a soup.

2) It seems you need a fairly good probability for a chance
encounter between cutter and cuttee which would imply that
not only do you need a soup, you need a rather concentrated
soup.

3) You also have to consider the probability of the chance formation
of cutters. This may be considerably more probable than the chance
formation of something like cytochrome c, however, the problem
is that you would seem to require lots of these buggers to ensure
that whenever your sequences of length M form, they also get cut
twice.

4) Your scenario seems to require that it be fairly probable
that a sequence of length M, once formed, gets cut twice to
give your sequence of length N. But if cutting is so
probable, why twice and not thrice?

==

Brian Harper:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=
"I believe there are 15,747,724,136,275,002,577,605,653,961,181,555,468,
044,717,914,527,116,709,366,231,425,076,185,631,031,296 protons in the
Universe and the same number of electrons." Arthur Stanley Eddington
:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=:=