RE: ABCDEFGHIJKMNOPQRSTUVWXYZ

Glenn Morton (grmorton@waymark.net)
Wed, 31 Dec 1997 10:45:28 -0600

At 12:07 AM 12/31/97 -0500, Brian D Harper wrote:

>Now consider the infamous typing monkey. Everyone assumes,
>of course, that the monkey will strike each key with
>equal probability. But this only occurs for thought experiment
>monkeys. Real monkeys typing on real typewriters would
>not strike each key with equal probability due to spatial
>arrangement of the keys, physiological constraints etc.
>etc. What this means is that the output from a real typing
>monkey will be compressible. Let's suppose it's only compressible
>by 10% or so. Given this and a sequence that's more than
>one or two hundred characters in length I could prove
>mathematically that it is virtually impossible for the
>monkey to have typed what it just typed. Note that this
>has nothing to do with the usual tactic of computing the
>probability of that *specific* sequence. It would be virtually
>impossible for the monkey to have typed *any* sequence that
>happens to be 10% compressible.
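
Your compressibility claim is easy to check numerically. Here's a rough
sketch (assuming Python, with zlib standing in for the compressor and
made-up home-row key weights for the "real" monkey):

    import random
    import zlib

    keys = "abcdefghijklmnopqrstuvwxyz "          # a 27-key typewriter
    n = 100000

    # Thought-experiment monkey: every key equally likely.
    fair = "".join(random.choices(keys, k=n))

    # Real monkey: home-row keys hit far more often (made-up weights).
    weights = [10 if c in "asdfghjkl " else 1 for c in keys]
    real = "".join(random.choices(keys, weights=weights, k=n))

    for name, text in [("fair", fair), ("real", real)]:
        ratio = len(zlib.compress(text.encode())) / len(text)
        print(f"{name} monkey: compressed to {ratio:.0%} of original")

The biased monkey's output compresses noticeably further than the fair
monkey's, just as you describe.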

I think that this is the problem of using probability AFTER the fact rather
than BEFORE the fact. Consider a deck of cards. There are 52! (roughly
8 x 10^67) different orders the cards can occupy. You ask: what are the
odds of shuffling the deck and getting any given order? About 1 in 10^68.
So you shuffle the deck. The cards are in a particular order. Is it
therefore impossible for you to have achieved this card order? No. The
cards had to be in some order, and it just happened to be this particular
one. So to say that a monkey couldn't have typed a given sequence is like
saying that the cards should be in NO order because each order is too
improbable. Probabilities only work in this sense toward the future, not
toward the past.
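
A quick check of those numbers (assuming a standard 52-card deck, so the
number of orders is 52!):

    import math

    orders = math.factorial(52)
    print(f"52! = {orders:.3e} possible orderings")   # about 8.066e+67
    # Each specific order has probability 1/52!, yet every shuffle is
    # certain to produce some order; improbable in advance is not the
    # same as impossible in retrospect.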

>Typing in the above refreshed my memory a bit. I'm afraid I
>botched what I wrote previously. The primary statistical
>features of a language (such as the frequency at which characters
>occur) would not have to be "hardwired" in, as the compression
>algorithm would discover these on its own. One of the first
>things it will do is look at the frequency at which the various
>characters appear. If they occur with anything other than equal frequency,
>then it can be compressed. Please don't ask me how it works!

OK, I won't. :-) I couldn't tell you either. It still seems to me that
putting information into the compression algorithm in the form of
expectations would constitute a transfer of information and thus would not
end up giving a good value for complexity. However, I do believe that you
are correct that the compression algorithm really doesn't assume any given
frequency but discovers it. If the characters occur with equal frequency,
then the compression does no good.
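
A rough sketch of that discovery step (assuming Python, and using the
empirical Shannon entropy as the bits-per-symbol limit that a
frequency-based code such as Huffman's can approach):

    from collections import Counter
    import math

    def bits_per_symbol(text):
        # Empirical Shannon entropy: the bits/symbol that a
        # frequency-based code (e.g. Huffman) can approach.
        n = len(text)
        return -sum(c/n * math.log2(c/n) for c in Counter(text).values())

    print(bits_per_symbol("aaaaaaab"))   # skewed frequencies: about 0.54 bits
    print(bits_per_symbol("abcdefgh"))   # equal frequencies: 3.0 bits, no savings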

>
>>One other thing that was pointed out to me privately. Technically when we
>>speak of information in a sequence we are speaking of information density
>>not the quantity of information. I will continue in my bad habit of using
>>the term 'information' because, unfortunately, all the players in this field
>>tend to use this terminology.
>>
>
>Hmm... I tend to always think in terms of algorithmic information
>since I am convinced that this is a really great measure of
>complexity. I'm guessing here, but I suspect that information
>density is a concept from Shannon info theory, bits/symbol.
>Did I guess right?

That would be my understanding.
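
In those terms (assuming Shannon's measure), the density is bits per
symbol, and the quantity is the density times the sequence length. A toy
example:

    import math

    density = math.log2(27)        # bits/symbol for 27 equally likely keys
    length = 200                   # characters in the sequence
    print(f"{density:.2f} bits/symbol; {density * length:.0f} bits in total")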

glenn

Adam, Apes, and Anthropology: Finding the Soul of Fossil Man

and

Foundation, Fall and Flood
http://www.isource.net/~grmorton/dmd.htm