Re: Entropy

From: Richard Wein (rwein@lineone.net)
Date: Wed Oct 25 2000 - 09:02:05 EDT


    From: David_Bowman@georgetowncollege.edu

    David, I'm still digesting your last reply to me about entropy. There was a
    lot to consider there. But, as the discussion has moved on to complexity and
    randomness, I'd like to butt in again.

    DNAUnion:
    >>I have heard both complexity and randomness defined in terms of a measure
    >>of the degree of algorithmic compressibility.
    >
    >Like many of the other key terms involved in information-theoretic
    >considerations, the notions of 'complexity' and 'randomness' have been
    >given multiple incompatible definitions by different authors. The lack
    >of a single nomenclature for the field helps make the field more arcane
    >than it needs to be.
    >
    >But I think usually the notions of 'complexity' and 'randomness',
    >although related, are not synonymous. Usually the notion of 'complexity'
    >is taken to mean (a la Kolmogorov & Chaitin) some measure of the length
    >of the shortest possible algorithm required to reproduce or reconstruct
    >the thing whose 'complexity' is being discussed. The ideas of
    >'compressibility' and 'complexity' (using the above definition) *tend* to
    >be typically *antithetical*. Suppose that the things being described are
    >sequences of symbols. A highly compressible sequence is one whose
    >shortest algorithm for reproducing it (i.e. its maximally zipped version)
    >is much shorter than the original sequence. Such a sequence is not very
    >complex (relatively speaking) because the shortest reproduction
    >algorithm is relatively short. Conversely, a highly incompressible
    >sequence is one whose complexity (length of shortest reproduction
    >algorithm) is nearly as long as the sequence itself. It is possible,
    >though, for a sequence to be *both* highly compressible and yet still be
    >complex. An example of this is a sequence so very long that, even with
    >a high compression ratio that makes its shortest reproduction algorithm
    >much shorter than its original length, that shortest algorithm is
    >*still* very long on some relevant scale of measurement. Also it is
    >possible for a sequence to be both
    >incompressible and not complex. Any sequence that is relatively
    >unpatterned and originally very short will be both incompressible and
    >non-complex.
    >
    >The term 'randomness' has a wider range of meanings. Often the property
    >of 'randomness' is taken to be characteristic of a parent ensemble or
    >distribution that is devoid of mutual dependences, correlations and
    >nonuniformities in probabilities among the possible outcomes.
    >For distributions with high randomness it is very hard to predict the
    >outcomes (with any accuracy above that of wild guessing) prior to their
    >being realized. Such distributions tend to have a
    >high entropy, in that the background prior information about the
    >distribution is woefully inadequate to determine the outcomes--
    >necessitating much further information (on the average) to be supplied
    >to reliably identify any given outcome.
    >
    >Another usage of the term 'randomness' applies to sequences
    >or time series generated by some stochastic process. Time series
    >drawn from stochastic processes having a high randomness tend to pass
    >various statistical tests designed to detect it. But it is the nature
    >of the beast that any battery of tests for randomness can easily be
    >fooled by nonrandom sequences that are sufficiently enciphered, and
    >false positives are the bane of any test for randomness. Sequences
    >with the property of randomness are 'white noise' in the sense that
    >the mean power spectrum of such sequences is flat in frequency
    >space, and there are no correlations among the individual
    >time-dependent members of the sequences in the 'time domain'. Each
    >member of a sequence is statistically independent of all the other
    >members. This latter usage of the term 'randomness' has a connection
    >with the notion of complexity in that a stochastic process that
    >generates random sequences, i.e. a fully random process, has the
    >property that the sequence so generated is maximally complex, i.e. the
    >shortest algorithm for reproducing the sequence is just a listed copy of
    >the sequence itself. Random sequences (i.e. sequences generated by fully
    >random processes) are both incompressible and maximally complex.

    Your last sentence should strictly have said that sequences formed by random
    processes *tend* to be incompressible and maximally complex, but are not
    necessarily so. They may, by chance, turn out to have a regular pattern, and
    so be compressible. (Sorry to be pedantic, especially as you've effectively
    made this point below, but I didn't want anyone to be misled by this
    sentence.)
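
    For anyone who wants to see the tendency concretely, here's a rough
    sketch (my own illustration, assuming Python with its standard random
    and zlib modules; a zlib-compressed length is only a crude upper bound
    on Kolmogorov complexity, not the true minimal algorithm length):

        import random
        import zlib

        n = 10000
        # A sequence from a (pseudo)random process, and a patterned one.
        random_seq = ''.join(random.choice('HT') for _ in range(n))
        patterned_seq = 'HHTT' * (n // 4)

        for name, seq in [('random', random_seq), ('patterned', patterned_seq)]:
            compressed_len = len(zlib.compress(seq.encode()))
            print(name, len(seq), 'symbols ->', compressed_len, 'bytes')

        # Typically the random sequence barely compresses, while the
        # patterned one collapses to a few dozen bytes. But a fair-coin
        # process *can* emit the patterned sequence too -- just with
        # probability 2**-10000.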

    >>HHTTHHTTHHTTHHTTHHTTHHTTHHTT
    >>
    >>HTTHHHTHHTTHTHTHHHHTHHTTTHTT
    >>
    >>The first coin flip example can be described by "repeat HHTT 7 times". Even
    >>if the sequence were extended out to a billion symbols, the description
    >>would become hardly longer, something like "repeat HHTT 250,000,000 times".
    >>The sequence has a very short minimal algorithm that fully describes it, so
    >>is not random (nor complex?).
    >
    >The first sequence is neither complex nor random (with near certainty).
    >It's not complex because its minimal length reproduction algorithm is
    >short; it's not random (with overwhelming probability) because it
    >easily fails many tests for randomness. But there *is* a *very* tiny
    >probability (1/2^28) that the first sequence could have been generated
    >by a random process, if that process just happened to turn up the HHTT
    >pattern 7 times in a row just when the sequence was being sampled.
    >
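
    (For concreteness, the arithmetic behind that figure, as a quick Python
    check:)

        >>> 1 / 2**28
        3.725290298461914e-09    # about 1 in 268,435,456
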
    >>But the second coin flip example has no such shortcut. The shortest
    >>algorithm that fully describes the sequence is (for all we can tell), the
    >>sequence itself. Therefore, it is random (and/or complex?).
    >
    >It *seems to be* maximally complex. But even this degree of
    >complexity is only at most 28 bits. In the overall scheme of things
    >28 bits is not very much complexity. Again, I haven't applied any
    >test for randomness to it, but I suspect that the probability that it
    >could have been generated by a fully random process is much greater
    >than the example of the first sequence. But it, presumably, could
    >also easily have been generated by a decidedly non-random process that
    >only looks random (for at least these 28 bits worth) w.r.t. whatever
    >tests for randomness that it passes.
    >
    >>PS: I also understand that a more detailed look for randomness would involve
    >>taking the symbols 1 at a time, then 2 at a time, then 3 at a time, ... and
    >>seeing how well the observed distribution of possible symbol groupings
    >>matches the expected distribution.
    >
    >Yes, there are tests for randomness that check these things.
    >
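
    To make that concrete, here is a minimal sketch of such a test in
    Python (my own illustration, not a rigorous implementation: overlapping
    blocks aren't independent, so a proper serial test adjusts for that):

        import itertools
        from collections import Counter

        def block_chi_square(seq, k):
            # Compare frequencies of overlapping k-symbol blocks against
            # the uniform expectation (two-symbol H/T alphabet assumed).
            blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            counts = Counter(blocks)
            expected = len(blocks) / 2**k
            return sum((counts[''.join(b)] - expected)**2 / expected
                       for b in itertools.product('HT', repeat=k))

        seq1 = 'HHTTHHTTHHTTHHTTHHTTHHTTHHTT'
        seq2 = 'HTTHHHTHHTTHTHTHHHHTHHTTTHTT'
        for k in (1, 2, 3):
            print(k, block_chi_square(seq1, k), block_chi_square(seq2, k))

        # Large values (relative to a chi-square distribution with
        # 2**k - 1 degrees of freedom) point toward non-randomness; the
        # patterned seq1 should stand out by k = 3.
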
    >>Does this also apply to complexity?
    >
    >If the sequence is at all compressible, it is possible that any
    >algorithm that generated it could have made use of any patterns,
    >correlations and nonuniformities in the frequencies of various multi-
    >symbol subsequences to save space in its maximally compressed form.
    >
    >>Basically, in terms of symbol sequences, what is the difference between
    >>randomness and complexity?
    >
    >The complexity is effectively the length of its maximally compressed form
    >(length of minimal length reproduction algorithm). Randomness is
    >typically not a property of a given sequence. Rather it may be a
    >property of the parent process that generated it. If a very long
    >sequence is incompressible in that its complexity is nearly its original
    >length then that sequence is a viable candidate for having been generated
    >by a fully random process. Recall that a sequence generated by a good
    >(pseudo)random number generator will tend not to have any obvious
    >patterns that would cause it to fail a test for randomness. Such a
    >pseudo-random sequence *looks* random and may pass a battery of tests for
    >randomness, but nevertheless such a sequence is highly compressible and
    >is not very complex because the algorithm that generates it is much
    >shorter than the sequence generated. Also, using the other meaning
    >mentioned above for the term 'randomness' it is possible to consider a
    >numerical measure for the randomness of a probability distribution that
    >governs the realizations of an ensemble of possible sequences of a given
    >length to be just the entropy of that generating distribution.
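
    To illustrate that last point about pseudo-random generators, here's a
    sketch (my own toy example, a linear congruential generator using the
    well-known Numerical Recipes constants):

        def lcg_sequence(seed, n):
            # Minimal linear congruential generator; emits n H/T symbols.
            state = seed
            out = []
            for _ in range(n):
                state = (1664525 * state + 1013904223) % 2**32
                out.append('H' if state >> 31 else 'T')
            return ''.join(out)

        seq = lcg_sequence(seed=42, n=1000000)
        # This million-symbol sequence may pass simple statistical tests,
        # yet its shortest known description is essentially the function
        # above plus the seed -- a few hundred bytes. So it is highly
        # compressible and has low complexity, despite *looking* random.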

    Note also that Kolmogorov complexity is sometimes referred to as
    "algorithmic randomness" by Chaitin and others, and this sometimes gets
    reduced to "randomness" for short. In this special sense, one can say that
    the second sequence above is "more random" than the first sequence. But it's
    important to note that this is a non-probabilistic sense of the word
    "random", and I feel that using the word in this sense causes a lot of
    confusion, as in my recent discussion with Brian.

    Algorithmic randomness is a property of a sequence or state, while
    statistical (or probabilistic) randomness is a property of a process. In
    statistical terms, we cannot say that one sequence is more random than
    another. A sequence either comes from a random process or it doesn't.
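
    A quick way to see this: under the fair-coin hypothesis, every
    particular 28-flip sequence has exactly the same probability, however
    patterned it looks:

        p_patterned = 0.5**28    # HHTTHHTTHHTTHHTTHHTTHHTTHHTT
        p_irregular = 0.5**28    # HTTHHHTHHTTHTHTHHHHTHHTTTHTT
        print(p_patterned == p_irregular)    # True: about 3.7e-09 each
        # Probability theory assigns them equal weight; only the
        # algorithmic (compressibility) sense tells them apart.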

    It's also interesting to note that Kolmogorov complexity can give a result
    opposite to that of Dembski's "specified complexity". The first sequence of
    coin tosses above has lower Kolmogorov complexity than the second, but
    higher Dembski specified complexity (with respect to the chance hypothesis
    that the coins were tossed fairly).

    Richard Wein (Tich)
    --------------------------------
    "Do the calculation. Take the numbers seriously. See if the underlying
    probabilities really are small enough to yield design."
      -- W. A. Dembski, who has never presented any calculation to back up his
    claim to have detected Intelligent Design in life.


