Re: Probability and apologetics

Bill Hamilton (hamilton@predator.cs.gmr.com)
Thu, 7 Sep 1995 11:04:00 -0500

Brian quoted me quoting an article from New Scientist:

>"Our uneasy attitude toward randomness is probably to do with the human
>penchant for spotting patterns. The brain has an architecture ideal for
>picking out a person in a crowd, or linking together disparate events --
>abilities that have obvious evolutionary advantages [I'm not quoting this
>because I agree -- only to establish context - weh]. 'Humans want
>order--and they will impose order even when it is not there,' says
>Professor Norman Ginsburg, emeritus professor of psychology at McMaster
>University, Ontario, who has made a study of how well humans simulate
>randomness.
>end bill:================================================================

Brian:
>I am very intrigued by this article and plan to go over to the library
>and get it as soon as I get the chance. Random means so many different
>things to different people that it becomes very difficult to even use
>the word without confusion. I generally use the information theoretic
>definition for a random sequence [more on this later] in which case
>Ginsburg's comment seemed complete nonsense to me at first reading.

Ginsburg uses some terminology that may be misleading. Where he says
"impose" I would say "conclude". That is, we look for patterns and when
we see them -- even in a sequence which passes any battery of tests for
randomness you want to prescribe -- we tend to think "aha, there's a
pattern here."

>In this sense humans cannot impose order when it is not there no matter
>how hard they might try. The rest of the quote shows that this is not
>what Ginsburg actually meant [what he means is that humans might imagine
>that they see order that isn't really there]. This is no doubt true.
>For example, Democrats might imagine that there is order in the White
>House. Oooh, I really shouldn't have said that ;-).

Exactly. Republicans might imagine there's order in the White House too --
and might be considered paranoid for the underlying purpose they see :-).
Now we're in this political aside together :-).
>
>Perhaps a case in point here is my own sequence (B) above. I don't know
>if I actually mentioned that this sequence was generated by tossing a
>coin but I'm sure Bill would have concluded that it was based on what
>he learned from this article.

I'm willing to conclude that it could have been so generated, and I
certainly agree that it looks more like a credible sequence of fair coin
tosses than sequence A.

>Note that this sequence contains 11
>consecutive 0's at one point. I would *never* have done this had I been
>deliberately trying to construct a random sequence!
>
[interesting story of Shannon's outguessing machine deleted. I read
Schroeder several years ago but don't remember that story. Thanks]
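
On the run of 11 consecutive 0's: that's actually less startling than
intuition suggests. In 64 fair tosses the longest run of either symbol is
typically around six, and runs of 11 or more still show up a few percent
of the time. A minimal simulation (Python; the trial count is an arbitrary
choice of mine):

    import random

    def longest_run(bits):
        # length of the longest block of identical consecutive symbols
        best = run = 1
        for prev, cur in zip(bits, bits[1:]):
            run = run + 1 if cur == prev else 1
            if run > best:
                best = run
        return best

    trials = 100000
    hits = sum(longest_run([random.randint(0, 1) for _ in range(64)]) >= 11
               for _ in range(trials))
    print("P(longest run >= 11 in 64 tosses) ~", hits / trials)

Which is part of why hand-constructed "random" sequences tend to give
themselves away.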

>bill:========
> "The point is that no sequence is inherently random or nonrandom.
> The sequence Brian included could have come from tosses of a fair
> coin."
>end:=========
>
>Now I'm really intrigued. Is this your interpretation of what the author
>wrote or did he actually say something like this? In any event I must
>strongly disagree with the first sentence: practically all sequences are
>random.

I succumbed to one of my favorite vices: making bald generalizations. The
point I was trying to make is that it's difficult to determine if a finite
sequence -- especially a relatively short one -- is random. I agree that
your sequence B looks much more like a random sequence than sequence A. I
think you discuss how practically all sequences are random below. At this
point I will merely note that I suspect we will have to agree on some
definitions before I will concede _that_.

>The second sentence I agree with, sequence (A) *could* have
>been generated by tossing a fair coin, although this possibility is
>extremely unlikely. This is not due to the improbability of obtaining
>that *specific* sequence [the creationist's fallacy] but rather due to
>the improbability of getting *any* ordered sequence from a random
>process [failing to recognize this is what I referred to earlier
>as the evolutionist's fallacy].

Perhaps you need to define "ordered". Remember, you just said practically
all sequences are random. Can a random sequence be ordered? You have also
used the term "organized" in other posts to mean something different from
"ordered". I've been meaning to ask for a definition of "organized" also,
and maybe I should just do that now.
>
>I think the confusion here is due to my using random in two different
>ways in the same post. I do this out of habit since this terminology
>is common in the literature. Anyway, the random in random sequence means
>something different from the random in random process. I think it's
>clear that this must be the case since it is in fact possible for
>a random process to produce a non-random sequence, an apparent
>contradiction. Random sequence takes on the definition of random
>used in information theory. Whether a sequence is random or not
>depends solely on its structure and is completely independent of
>the process used to produce the sequence. Thus, one can discuss
>the orderliness or randomness of a sequence knowing absolutely
>nothing about how it was generated. To avoid this confusion, many
>authors have started substituting stochastic process for random
>process, reserving the word random for its information theoretic
>definition.

Unfortunately, the courses I took on probability, random numbers and
stochastic processes in grad school 30 years ago were quite vague about
getting definitions nailed down. Maybe it's time to read some more modern
references. Any suggestions? To me, the idea of calling a substring of
the output from a true random number generator nonrandom seems
foolhardy. Saying it's nonrandom implies something about the
underlying process that generated the string. If it doesn't, what
does it buy you? Suppose the string 11110000 occurs in a long sequence of
outputs from a true random number generator -- flips of a fair coin. I can
certainly claim that string is nonrandom, and I can easily make a finite
state machine that can generate it. But what's the point? None of that
tells me anything about when I can expect to see such a string again. It
seems to me that what we're doing when we try to understand nature is
looking at data and searching for underlying regularity. Frequently that
involves filtering operations to remove noise. If there's no underlying
regularity, then any pattern you find in a string tells you nothing about
the process that generated it (except, perhaps, that it's random or
chaotic :-)). And understanding that process is generally the objective in
studying random sequences.
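
For what it's worth, in output from a fair coin the "ordered-looking"
string 11110000 turns up just as often as any other particular 8-bit
pattern, about once every 256 positions, so spotting it tells you nothing
about the generator. A minimal check (Python; the stream length is an
arbitrary choice of mine):

    import random

    bits = "".join(random.choice("01") for _ in range(1000000))
    hits = sum(bits.startswith("11110000", i) for i in range(len(bits) - 7))
    print(hits, "occurrences; expected about", (len(bits) - 7) // 256)
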
>
>I think Murray Gell-Mann explains this word confusion quite
>well in the following:
>
> As we have been using the word, applied for instance to a single
> string of a thousand bits, random means that the string is
> incompressible. In other words, it is so irregular that no way
> can be found to express it in shorter form. A second meaning,
> however, is that it has been generated by a random process,
> that is, by a chance process such as a coin toss, where each head
> gives 1 and each tail 0. Now those two meanings are not exactly
> the same. A sequence of a thousand coin tosses _could_ produce a
> string of a thousand heads, represented as a bit string of a
> thousand 1s, which is as far from being a random bit string as it
> is possible to get. Of course, a sequence of all heads is not at
> all probable. In fact, its chance of turning up is only one in a
> very large number of about three hundred digits. Since most long
> strings of bits are incompressible (random) or nearly so, many sets
> of a thousand tosses will lead to random bit strings, but not all.
> One way to avoid confusion would be to refer to chance processes as
> "stochastic" rather than random, reserving the latter term mainly
> for incompressible strings.
> -- Gell-Mann, M. (1994). <The Quark and the Jaguar>. New York:
> W. H. Freeman and Company.

Okay. Fair enough. However, even the incompressibility definition is
potentially problematic. For example, Michael Barnsley has shown that
fractals can be used to compress scenes from nature. That makes sense
because fairly natural-looking pictures of mountains, clouds and islands
can be generated using fractals. In other words, there may be a certain
nonlinear recurrence that, if found (and that's the rub with Barnsley's
approach), can achieve compression. As I've said before, the beautiful
patterns based on the Mandelbrot set can be expressed in terms of a simple
recursion: x(n+1) = x(n)^2 + c and a coloring rule based on escape times.
But if you tried to compress one of these patterns without knowing the
generation rule, you'd be stuck.
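
For concreteness, here's a minimal escape-time sketch of that recursion in
Python (the grid size, iteration cap and ASCII shading are arbitrary
choices of mine, not Barnsley's method):

    def escape_time(c, max_iter=30):
        # iterate x -> x^2 + c from 0; count steps until |x| exceeds 2
        x = 0j
        for n in range(max_iter):
            x = x * x + c
            if abs(x) > 2:
                return n
        return max_iter

    for im in range(12, -13, -2):      # imaginary axis, top to bottom
        row = ""
        for re in range(-40, 21):      # real axis, -2.0 to 1.0
            t = escape_time(complex(re / 20.0, im / 10.0))
            row += " .:-=+*#%@"[min(t, 9)]
        print(row)

A dozen lines of generator for an arbitrarily detailed picture: that's the
compression the rule buys you, but only if you already know the rule.
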
>
>bill:==================================================================
>The article points out that these apparent patterns are typically "fleeting"
>and that the lack of order characteristic of randomness shows up only in long
>strings of numbers. But how long is long? 100? 1000? 10^19? Depending
>on what tests you use and how persnickety you are, your mileage will vary.
>==========================================================================
>
>Does the author elaborate on this point? How long is long is the key both
>here and in my discussion with Glenn.

As I remember, he doesn't. I've played around with random number
generators in the past, some of which I've authored, others of which
people have recommended. I've performed various tests for randomness and
sometimes found sequences of 1000 outputs that look rather nonrandom.
It's a slippery issue.
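
By way of illustration, here are two of the simpler checks one can run on
1000 outputs: the count of 1s and the count of runs. A minimal sketch in
Python, not a serious test suite; the expected values are rules of thumb
for a fair two-symbol source:

    import random

    bits = [random.randint(0, 1) for _ in range(1000)]
    ones = sum(bits)
    runs = 1 + sum(a != b for a, b in zip(bits, bits[1:]))
    print("count of 1s: ", ones, "(expect ~500, sd ~16)")
    print("count of runs:", runs, "(expect ~500, sd ~16)")

Even a good generator will occasionally land outside such bounds, which is
part of what makes the issue slippery.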

>The longer the sequence, the stronger
>my argument becomes. This point is more critical with a binary string
>than it is with a protein since there are only two possibilities at
>each location of a binary string, i.e. it is much more probable to
>get 10 consecutive heads than it is to get 10 consecutive alanines.

Agreed. The two-symbol alphabet does give some rather "structured-looking"
strings simply because there are only two choices.
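
To put rough numbers on that (Python; I'm assuming 20 equiprobable amino
acids, which is of course an idealization):

    p_heads = (1 / 2) ** 10   # 10 consecutive heads
    p_ala = (1 / 20) ** 10    # 10 consecutive alanines
    print(p_heads)            # ~9.8e-04, about 1 in a thousand
    print(p_ala)              # ~9.8e-14, about 1 in 10^13

Ten orders of magnitude, purely from the size of the alphabet.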

>
>Gell-Mann uses a length of 1000 in the example above, does it really
>need to be this long? I don't think so. My string of length 64 has
>exactly the features we have been discussing. There are substrings of
>11 consecutive 0's and 6 consecutive 1's yet there is no obvious
>pattern to be found. Also, the total number of 1's is nearly equal to
>the total number of 0's.
>
>I have more to say on "how long is long", but first I want to discuss an
>interesting side-light that has some importance to the discussion.
>It is well known [look in practically any book on information theory]
>that practically every sequence is random (incompressible) or nearly
>so.

So I'd be wasting my money if I bought a modem with compression? :-).
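
More seriously: no, because the data we actually transmit live in the
tiny, highly regular corner of the space of possible strings. A quick
illustration using Python's zlib (the sample strings are my own):

    import random
    import zlib

    text = b"the quick brown fox jumps over the lazy dog " * 25
    noise = bytes(random.getrandbits(8) for _ in range(len(text)))
    print(len(text), "->", len(zlib.compress(text)))    # regular text: compresses well
    print(len(noise), "->", len(zlib.compress(noise)))  # random bytes: little or no gain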

>Even so, one cannot actually prove that any *specific* sequence is
>random. An interesting paradox: almost all sequences are random yet
>no specific sequence can be proven to be random. Gregory Chaitin
>showed that this little oddity is closely related to Godel's theorem and
>Turing's halting problem.

Which leads me to believe that the issue of how the sequence was generated
(or how it can be characterized as the output of some recursion) is
important. After all, any practical compression algorithm must be
invertible. In other words, the inverse of the compression algorithm is
_an_ algorithm for generating the subsequence that was compressed.
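
The round trip is easy to exhibit with zlib again (the sample string is
mine); the decompressor really is a program whose output is the original
sequence:

    import zlib

    original = b"01" * 32                       # an "ordered" 64-byte sequence
    packed = zlib.compress(original)
    assert zlib.decompress(packed) == original  # the inverse regenerates it
    print(len(original), "->", len(packed))     # 64 -> considerably fewer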

>This seems to throw a real monkey wrench into
>the works. Given this, how is it possible to say *anything* about a
>given sequence? Well, it's not as bad as it seems. It is impossible to
>prove any given sequence is incompressible; however, it *is* possible
>to prove a sequence is compressible merely by compressing it. For example,
>the nose-picking phrases I gave in my last post are compressible to
>"if pick nose get warts". Since most sequences are incompressible,
>the ability to compress these phrases is itself sufficient reason to
>say that it is very unlikely that these phrases were generated by a
>stochastic process.
>
>This brings us back to "how long is long". We know that practically
>all sequences are incompressible or nearly so. This doesn't really
>get at your question directly though. Given our distinction between
>random sequence and random process, hereafter stochastic process for
>sake of clarity, I think the question of interest is "how long of a
>sequence do you need to determine (with reasonable certainty) whether a
>sequence was generated by a stochastic process?"
>
>Some time ago someone on bionet.info-theory suggested a way of looking
>at this that I have found very useful.

[interesting stuff about c-incompressibility snipped. I want to read it
and respond later]

>Bill again:=============================================================
>Still, Brian makes an excellent point in reminding us that quite a
>number of scientific discoveries have begun with someone noting an
>anomaly in some data and _not_ attributing it to randomness. A good
>example might be the discovery of Neptune. Anomalies were noted
>(I believe) in the orbit of Uranus, and one possible explanation was an
>undiscovered planet beyond Uranus. My understanding is that by doing
>some perturbation analysis, astronomers were able to determine
>where to look for Neptune.
>========================================================================
>
>I guess I shouldn't complain about making "an excellent point"; however,
>I believe the point I was making is much, much stronger than this.
>It seems to me that a fundamental aspect of practically all of science
>is the attempt to compress the regularities observed in nature. This
>is in fact what a natural law is, a highly compressed description
>of regularities.

Agreed.
>
>Let's consider another example. Evolutionists make a big deal about
>the patterns seen in the fossil record, claiming that these
>patterns give strong evidence for common descent. Were I to use
>the evolutionist's fallacy I could simply say "so what". The pattern
>seen is no less likely to have resulted from a random placement
>of fossils here and there.

But what the evolutionists claim is that they see patterns that are
strongly suggestive of nonrandomness. Still, your point is well-taken: it's
very easy to attribute conclusions we disagree with to "noise". To take
your example a bit further, a theist who doesn't accept the young-earth
creationist scenario could look at the fossil record and conclude that some
sort of coherent developmental plan seems to be documented -- which is
consistent with his belief in God as creator and sovereign. And he'd have
both evolutionists and young-earth creationists mad at him.

Bill Hamilton | Vehicle Systems Research
GM R&D Center | Warren, MI 48090-9055
810 986 1474 (voice) | 810 986 3003 (FAX)