Re: order, complexity, entropy and evolution

Don N Page (don@phys.ualberta.ca)
Fri, 19 Dec 97 15:40:58 -0700

In view of Glenn's Wed, 17 Dec 1997 15:50:09 -0600 and Wed, 17 Dec 1997
19:36:18 -0600 messages pointing out the difference between ordered (low
algorithmic information) and organized (high algorithmic information), I
wondered whether life might be characterized as being both organized and
ordered, in different ways. An individual DNA molecule has a lot of
information (though much less than the brain stores from its experiences, as
Glenn has emphasized) and so is highly organized rather than highly ordered
(though the fact that it is given by a sequence of the four base pairs means
that it is much more ordered than a random molecule of the same size).
However, the fact that the same DNA molecule occurs millions of times within an
organism (Can someone tell me how many times for a human?) means that the
collection of DNA molecules for an organism is also highly ordered.

In bed last night I began to have a half-baked idea that I wanted to
try out for quantifying the combination of organization and order. The basic
idea is that the information in an individual DNA molecule is much higher than
the total information in all the DNA molecules of an organism divided by the
number of such molecules (since the total information for a large number of
identical molecules is just the information in one, plus the information of how
many there are), whereas for a random collection of similarly complex
molecules, the information in each one would be roughly the total information
divided by the number of molecules. (I'm ignoring the information of how all
the molecules are arranged, which would presumably be most strongly influenced
by environmental rather than genetic factors and which, as Glenn or someone
else pointed out several weeks ago for the synaptic connections in the brain,
would be far more information than that in any single DNA molecule.)

I'm not too clear how to quantify this measure for a large collection
of non-identical molecules, such as those making up an organism when one also
counts the large number of non-DNA molecules (or the DNA molecules that have
mutated), but the following is my first stab at it:

Take all the M molecules in an entity (e.g., in an organism, a species,
or in the entire biosphere) and arrange them in decreasing order of complexity.
Let 0 < m < M+1 be a label for each molecule, with m = 1 labeling the most
complex and m = M labeling the simplest. If i_m is the information in each
molecule separately (with _m being the TeX notation for subscript m), then the
order is such that i_m >= i_n (i_m greater than or equal to i_n) if m < n.
Many molecules will be identical and so have identical i_m, but m is supposed
to label each individual molecule and not just each type. (I.e., if there are
n molecules of one type, those n molecules get n separate m labels, and not
just one m label.)

Now let I_m be the total information needed to specify the first m
molecules. I_1 = i_1 will be the information in the most complex molecule, and
I_M will be the total information in the collection of molecules making up the
entire entity (without considering the information of how the molecules are
arranged in the entity). If there are n >> 1 molecules of the most complex
type, I_n will be approximately I_1 + const. log n, since it takes of order log
n bits to specify how large n is. On the other hand, if all the molecules were
totally uncorrelated, then I_m would be roughly J_m, which I define to be the
sum of all the individual i_j's for 0 < j <= m, i.e., the sum of the individual
information in all the molecules up to and including molecule m, ignoring the
correlations that reduce the actual information I_m below this sum.
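
As a rough illustration of I_m and J_m, here is a minimal Python sketch. It
rests on toy assumptions that are not part of the proposal above: molecules of
different types are treated as completely uncorrelated, each type costs its own
information once plus log2(number of copies) bits, and the coefficient of the
log term is set to 1. The function name cumulative_information and the
(type_label, info_bits) input format are hypothetical.

```python
import math
from collections import Counter

def cumulative_information(molecules):
    """molecules: list of (type_label, info_bits), one entry per molecule.

    Returns two lists (I, J) with I[m-1] ~ I_m and J[m-1] = J_m.
    """
    # Arrange in decreasing order of complexity, so m = 1 is the most complex.
    ordered = sorted(molecules, key=lambda pair: pair[1], reverse=True)
    counts = Counter()      # copies of each type among the first m molecules
    distinct_bits = 0.0     # information of the distinct types seen so far
    j_running = 0.0         # J_m: plain sum of the individual i_j's
    I, J = [], []
    for label, bits in ordered:
        if counts[label] == 0:
            distinct_bits += bits   # a new type costs its full information
        counts[label] += 1
        j_running += bits
        # Toy assumption: log2(n) bits to say how many copies of each type.
        copy_bits = sum(math.log2(n) for n in counts.values())
        I.append(distinct_bits + copy_bits)
        J.append(j_running)
    return I, J

# Four identical complex molecules and two identical simple ones.
I, J = cumulative_information([("dna", 1000.0)] * 4 + [("water", 2.0)] * 2)
print(I)
print(J)
```

In this toy model I_m grows only logarithmically while identical copies are
being added, whereas J_m grows linearly, which is the gap the text describes.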

Now let K_m = (I_M/J_M)J_m, a function of m that grows with m and is
normalized to give K_M = I_M for the total collection of molecules, and let L_m
= I_m - K_m. If the molecules were all uncorrelated (no order), then J_m = I_m
and hence K_m = J_m = I_m and L_m = 0. But if the molecules are correlated,
then K_m would generally be less than I_m, so L_m would be positive. On the
other hand, if the i_j's are small for j <= m (little organization), then J_m,
and hence I_m <= J_m, would also be small, and so L_m = I_m - K_m would be
small. To get L_m large,
one needs both that some i_j's be large (large organization) for j <= m, and
also that the molecules be correlated (large order). Hence the L_m's seem to
be at least some measure of a combination of both the organization and the
order of the molecular content of the entity and would apparently be small if
either the organization or the order were small.
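
The definitions of K_m and L_m can be sketched in a few lines of Python. This
is only an illustration: the function name l_profile is hypothetical, the
inputs are simply lists holding I_1..I_M and J_1..J_M, and the example numbers
assume a log coefficient of 1 and base-2 logarithms.

```python
import math

def l_profile(I, J):
    """Given I[m-1] = I_m and J[m-1] = J_m, return the list of L_m values."""
    if not I or len(I) != len(J):
        raise ValueError("I and J must be non-empty lists of equal length")
    scale = I[-1] / J[-1]               # the normalization I_M / J_M
    # K_m = scale * J_m, and L_m = I_m - K_m.
    return [i_m - scale * j_m for i_m, j_m in zip(I, J)]

# Uncorrelated molecules: I_m = J_m, so every L_m is zero.
uncorrelated = l_profile([5.0, 9.0, 12.0], [5.0, 9.0, 12.0])

# Three identical 100-bit molecules: I_m = 100 + log2(m), J_m = 100 m.
copies = l_profile([100.0, 101.0, 100.0 + math.log2(3)],
                   [100.0, 200.0, 300.0])

print(uncorrelated)
print(copies)
```

For the identical copies, L_m is positive for m < M and vanishes (up to
rounding) at m = M, since K_M = I_M by construction.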

What I am by no means certain of is the best way to convert the L_m's
to a total measure of this combination of organization and order. A simple way
would be simply to sum the L_m's over all m's, and then perhaps divide by M,
the total number of molecules, to get what I might call L = (1/M)(sum of L_m's)
(without any subscript on the L). (Dividing by M is designed to avoid making L
larger simply by virtue of having a large number of molecules in the entity.)
But one might alternatively choose L to be any of various other functions of
the i_m's, the I_m's, and M.

To give an example of how to calculate L in my original definition
above, suppose one had a collection of M >> 1 molecules that all have the same
complexity, i_m = c >> log M. Then I_m ~ c + log m, dropping whatever the
correct coefficient is of the log term that specifies the information of the
size of m, whereas J_m = c m. K_m = (I_M/J_M)J_m ~ [(c + log M)/(c M)] c m ~
cm/M, so L_m = I_m - K_m ~ c + log m - cm/M ~ c(M-m)/M, and hence, averaging
over m from 1 to M, L ~ c(M-1)/(2M) ~ c/2.
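
A quick numerical check of this example, as a hedged sketch: it assumes the
dropped coefficient of the log term is 1 and uses base-2 logarithms, and the
function name average_L_identical is hypothetical. It evaluates
L = (1/M)(sum of L_m's) directly rather than using the approximations.

```python
import math

def average_L_identical(c, M):
    """L for M identical molecules of c bits each.

    Assumes I_m = c + log2(m) (log coefficient set to 1) and J_m = c * m.
    """
    I_M = c + math.log2(M)
    J_M = c * M
    total = 0.0
    for m in range(1, M + 1):
        I_m = c + math.log2(m)
        K_m = (I_M / J_M) * (c * m)     # K_m = (I_M / J_M) J_m
        total += I_m - K_m              # L_m = I_m - K_m
    return total / M

c = 1.0e6                               # c >> log M for every M used here
for M in (1, 10, 1000, 100000):
    print(M, average_L_identical(c, M) / c)
```

The printed ratio L/c is 0 for M = 1 and approaches 1/2 as M grows, so under
these assumptions L depends mostly on c and only weakly on M.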

One might conjecture that for a suitable choice of the function for L,
the biosphere would be the entity with the largest L. (The choice above does
not look quite ideal, since it seems to depend almost entirely on the
organizational complexity c and not much on the number of copies M that
represents the correlations between the complex molecules, though if M = 1 one
would indeed get L = 0.) If so, one could say that one has chosen a
combination of organization and order so that life is picked out as the entity
that maximizes this combination. But I am not sure how to avoid making the
choice rather ad hoc.

Also, I'm not sure what these considerations imply about evolution,
Intelligent Design, etc. Since the common biological descent aspect of the
evolutionary scenario seems to be a good explanation of the similarity between
species, and hence of correlations between molecules of different species (just
as the "according to its kind" in Gen. 1:11-12, 21, and 24-25 describes
correlations within species), I would suspect that most reasonable definitions
of L would give larger results within an evolutionary scenario than within most
scenarios in which God created the same number of similarly complex species
separately. As to whether large L for life is evidence for ID, I doubt it,
since within the evolutionary scenario mutations can account for the complexity
of the DNA molecules (large organization), and the genetic copying mechanism
can account for the large correlations between them (large order), both of
which would be necessary to give a large L if it were suitably defined.

I would be curious whether any of you have heard of attempts to define any
measures, similar to the L that I am proposing, that would be large only
for entities that have both large complexity (organization) and large
correlations (order), as life does.

Don Page