Re: design: purposeful or random?

Bill Hamilton (hamilton@predator.cs.gmr.com)
Fri, 17 Jan 1997 13:32:19 -0500

Brian Harper wrote

>Ah, I think I just figured out why this error [random = uniform distribution]
>is so common.
>Most probability calculations have to do with constructing
>protein sequences by chance. Related to this would be
>the probability of executing a long sequence of events in
>some specified order.
>
>To illustrate, suppose we want to construct a sequence
>M units long from an alphabet containing S characters.
>What is the total number of possible sequences (N)
>of length M? This one's easy: N = S^M. Now, what's
>the probability of selecting any one of these sequences
>at random? If the probability of selecting any one of them
>is the same then this probability is 1/N.
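[For readers following along, the counting in the quoted paragraph is easy to sketch; the alphabet size and sequence length below are arbitrary example values, not taken from the original post.]

```python
# Count sequences of length M over an alphabet of S characters,
# and the probability of any one sequence under the uniform assumption.
S = 4    # alphabet size (e.g. the four DNA bases A, C, T, G)
M = 10   # sequence length (arbitrary example value)

N = S ** M          # total number of possible sequences: S^M
p_uniform = 1 / N   # probability of any one sequence if all are equally likely

print(N)            # 1048576
print(p_uniform)    # about 9.5e-07
```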

However, each sequence represents a different combination of chemical
building blocks. In simple language, each resulting chain is formed by a
different chemical reaction. Assuming all combinations are equally
probable assumes all the reaction rates are the same. Now perhaps all the
A, C, T, G building blocks combine with one another with the same reaction
rates, but that seems unlikely to me. Is there a biochemist in the house
who can either confirm this or tell me to shut up before I mislead someone?

>
>I think this is the point at which one is typically thinking
>about equal probability. If these sequences supposedly
>form by chance then why give preference to any one of
>them?

If I'm correct, the sequences whose components have the highest (reaction
rate)*concentration products should get preference.

>And if some are more probable than others,
>how could we know which ones and how could we know
>whether the specific one we're interested in is one of
>the more probable ones or one of the less probable.
>Given this uncertainty, isn't the fairest thing just to
>assume they're all equally probable?

But we should be concerned with correct modeling, not "fairness".
>
>So, the confusion comes from talking about two different
>random processes. The random process wherein individual
>letters are selected from S letters according to some
>probability distribution p and then the random process
>of selecting one of the N possible sequences of length M.
>When saying that the equi-probable assumption is bad
>I've generally been talking about the assumption that the
>individual letters appear with equal probability. So, this
>raises an interesting question. It doesn't seem that the
>probability distribution for the occurrence of the letters
>has anything to do with the total number of sequences
>that are possible.

Agreed.

>Isn't this independent of those
>probabilities, and if so wherein lies the error?

Huh? It seems to me that there ought to be a reaction rate for the
formation of any given sequence from any given set of precursors, and that
each reaction rate is likely to differ from the others, so products of
concentrations and reaction rates ought to figure in the estimate of the
probability of producing each possible product.
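[A rough sketch of the weighting I have in mind. The per-letter rate and
concentration numbers below are invented purely for illustration, not
measured values: if each letter's chance of incorporation is proportional
to its (reaction rate)*(concentration) product, the per-sequence
probabilities are no longer uniform.]

```python
from itertools import product

# Hypothetical (reaction rate) * (concentration) product for each letter.
# These numbers are invented for illustration only.
weight = {'A': 1.0, 'C': 0.5, 'T': 2.0, 'G': 0.8}
total = sum(weight.values())
p = {letter: w / total for letter, w in weight.items()}  # per-letter probability

M = 3  # keep sequences short so all S^M of them can be enumerated
seq_prob = {}
for letters in product(weight, repeat=M):
    seq = ''.join(letters)
    prob = 1.0
    for letter in seq:
        prob *= p[letter]       # sequence probability = product of letter probabilities
    seq_prob[seq] = prob

# The probabilities still sum to 1, but they are far from uniform:
# the uniform assumption would give every sequence 1/64.
print(sum(seq_prob.values()))             # 1.0 (up to rounding)
print(seq_prob['TTT'] > 1 / 64)           # True: favored letters dominate
print(seq_prob['CCC'] < 1 / 64)           # True: disfavored letters are rare
```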

Bill Hamilton
--------------------------------------------------------------------------
William E. Hamilton, Jr, Ph.D. | Staff Research Engineer
Chassis and Vehicle Systems | General Motors R&D Center | Warren, MI
810 986 1474 (voice) | 810 986 3003 (FAX) | whamilto@mich.com (home email)