Re: pure chance

billgr@cco.caltech.edu
Fri, 3 Jan 1997 09:47:33 -0800 (PST)

Glenn Morton:

> > In your example *uestion, you
> > realize q is appropriate because you have English words in mind.
>
>
> The difference between the word "information" which has the connotation
> "meaning" and "information" defined by information theory can be illustrated
> by the following. What meaning is there in the sequence:
>
> "Yingwei ni zhi dao yingwen zi shou yi ni li jie '*uestion' xu yao q "?
>
> Doesn't convey much meaning to you but that is a loose translation of what
> Brian said written in the pin yin form of Chinese.
>
> The meaning is the same (loosely, I am not a great Chinese translator) but the
> information content may not be the same. There are 93 characters in the
> English version of this sentence and 68 in the Chinese version. A true
> measure of information requires a lot of work but I bet that there is more
> information in the English version even though the meanings are approximately
> the same. The longer the sequence of characters usually the less compressible
> it is and thus the more information it has.

I'm not sure compressibility is a good measure for Shannon information. Taking
this translation as an example, the information conveyed by your writing the
above string, vs. some other, is given by the information in the ensemble of
probabilities of Chinese translations. There are probably thousands of
possible translations, each with its own probability of being chosen by
you. The fact that you chose *that* one then conveys exactly the amount of
information contained in the ensemble to begin with. If you had said "My
translation begins with 'Yingwei'" it would have conveyed (probably) less.
The point I've been trying to get at is that to 'do' Shannon information,
one needs to know what the ensemble is that one is talking about. In my
*uestion example, I was assuming that the ensemble was English words. Telling
you that I am picking an English word, and then telling you that it is 8
letters long and ends with 'uestion' leaves only one possibility. If instead
random strings of letters are allowed, chosen according to some distribution,
then the information given by picking one letter to put in front is equal to
the information in that distribution.
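
To make the dependence on the ensemble concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the uniform
distribution over 26 letters, the four placeholder translations and their
probabilities, and the names entropy_bits, english_first_letter,
uniform_first_letter, translations and prefix_yingwei. It computes the average
information (Shannon entropy, in bits) of each ensemble.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = sum(p * log2(1/p)) of a discrete distribution, in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Ensemble 1: English words, 8 letters long, ending in "uestion".
# Assumed to contain only "question", so the first letter is certain.
english_first_letter = {"q": 1.0}

# Ensemble 2: any letter may go in front; all 26 are ASSUMED equally likely.
uniform_first_letter = {c: 1 / 26 for c in "abcdefghijklmnopqrstuvwxyz"}

h_english = entropy_bits(english_first_letter.values())
h_uniform = entropy_bits(uniform_first_letter.values())

# Hypothetical ensemble of four translations with made-up probabilities.
# Translations "A" and "B" are assumed to begin with 'Yingwei'.
translations = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
prefix_yingwei = {
    True: translations["A"] + translations["B"],
    False: translations["C"] + translations["D"],
}

h_full = entropy_bits(translations.values())      # naming the whole translation
h_prefix = entropy_bits(prefix_yingwei.values())  # naming only how it begins

print(h_english, h_uniform, h_full, h_prefix)
```

Under these assumptions the English-word ensemble gives 0 bits for the first
letter, the uniform-letter ensemble gives log2(26), about 4.7 bits, and
stating only how the translation begins carries less than stating the whole
translation. The same string '*uestion' yields different numbers purely
because the ensemble changed.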

-Greg