RE: Information: Brad's reply (was Information: a very

Brian D Harper (bharper@postbox.acs.ohio-state.edu)
Tue, 30 Jun 1998 16:48:42 -0400

At 01:57 PM 6/30/98 +0800, Brad wrote:

[...]

>
>Brian,
>
>I'd like to add something about how information theory deals with meaning.
>

First of all, welcome to the group and thanks for your contributions.

Before getting specifically to your points here, I would like to
take a look at the quiz question you posed in another post:
_____________________________________________________________
Do you agree or disagree with the following statements?

"Information theory is pure nonsense! Noise is usually modelled as a random
source and a random source contains the most information since all symbols
are equiprobable. Thus the most informative information source is noise."
_____________________________________________________________

This is a great example of what has been, IMHO, a plague in information
theory, namely a play on words. In this question the word information
is being used with two different meanings. If one were to insist on
using only one definition for information in this question and
were to take that as the technical definition, then the answer
is clear. The most informative source would be the one containing
the most information in the technical sense. Thus the most informative
source would be a stochastic one with equal probabilities for
all outcomes. This is very clear and becomes nonsense only when
one plays a word game and switches the meaning of informative
to its everyday meaning.
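
Just to put a number on that, here is a small Python sketch (my own
illustration, nothing from Brad's post) computing the Shannon entropy
of a fair coin versus a loaded coin. The equiprobable source really
does carry the most information per outcome in the technical sense:

  import math

  def entropy(probs):
      # Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes
      return -sum(p * math.log2(p) for p in probs if p > 0)

  print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit per toss, the maximum for two outcomes
  print(entropy([0.9, 0.1]))   # loaded coin: about 0.47 bits per toss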

Now, random is another word which leads to great confusion. Not
only is the technical meaning different from that in everyday
conversation, it can also vary from one technical field to
another. To illustrate, let me construct my own
quiz question motivated by part of your quiz above, namely

"... a random source contains the most information since all symbols
are equiprobable" -- Brad's quiz

Suppose you have a pair of fair dice. You roll these and then
compute the sum of the two results. This sum is the output of our
process, the possible outcomes ranging between 2 and 12.
Now for the quiz:

a) is this a random process?

b) are all outcomes equally probable?

Now let me use this quiz to illustrate a point. This is easiest
if we associate a letter to each of the possible outcomes above,
2=A, 3=B, 4=C, 5=D ... 12=K. Now suppose we actually generate
a "message" by the method suggested above. We keep rolling the
dice and recording A-K according to the sum we get. Now, if we
were to assume each letter occurs with equal probability then
we would have about 3.46 bits per symbol as the information
content of a typical message. If we were clever enough to
figure out the real probability distribution for this process,
then we would be able to compress the information content to
about 3.27 bits per symbol. All this has nothing to do with
whether or not our message is "meaningful".
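
In case anyone wants to check my arithmetic, here is a short Python
sketch (again my own, just for illustration) that builds the true
probability distribution for the sum of two dice and computes both
figures:

  import math
  from collections import Counter

  # Tally all 36 equally likely (die1, die2) pairs by their sum (2 through 12)
  counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
  probs = [n / 36 for n in counts.values()]

  print(math.log2(11))                          # ~3.46 bits/symbol if A-K were equiprobable
  print(-sum(p * math.log2(p) for p in probs))  # ~3.27 bits/symbol with the true distribution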

Now, what I really had in mind with this example is the following
statement you made in another post:

===============begin quote of Brad========
Information theory does not ascribe meaning to information. It does however
ascribe NO MEANING to any randomness or noise. Do you understand this?

Did you know it is possible to achieve better compression on a text if you
know what language it is? This shows that better models lead to more
accurate analysis of the information content.

An example of this is as follows:

If we compress text we can find a general information content of 4.75 bits
per symbol.

BUT if we know that text is going to be english we can refine this to 3.32
bits per symbol.
=================end quote===============

Now I'm sure that you, or someone else more knowledgeable in info
theory than myself :), will correct me if I'm wrong, but I was
under the impression that the compression you are referring to
above is accomplished by knowing the statistical features of the
English language and that it has nothing whatsoever to do with
any "meaning" that those messages might have. This being the
case this compression would be directly analogous to my
dice example above. It is not the meaning in a message that
allows it to be compressed, it is the fact that not all
letters and words occur with equal probability.
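
To make that concrete, here is one more little Python sketch (mine,
with a made-up sample string) that estimates bits per symbol from
observed letter frequencies. A compressor exploits exactly this kind
of statistical non-uniformity and nothing more:

  import math
  from collections import Counter

  def bits_per_symbol(text):
      # Empirical entropy estimate from the symbol frequencies observed in text
      counts = Counter(text)
      total = len(text)
      return -sum((n / total) * math.log2(n / total) for n in counts.values())

  sample = "it is not the meaning in a message that allows it to be compressed"
  print(math.log2(27))            # ~4.75 bits/symbol if 26 letters plus space were equiprobable
  print(bits_per_symbol(sample))  # noticeably less, since the letters are far from equiprobable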

OK, one last comment. Above you wrote:

"Information theory does not ascribe meaning to information.
It does however ascribe NO MEANING to any randomness or noise.
Do you understand this?" -- Brad

A fundamental result from algorithmic information theory (AIT)
is that it is impossible to prove that any particular sequence
is random. This is very interesting in view of the fact that
the vast majority of all the possible sequences *are* random:
almost every sequence has the property, yet demonstrating it
for any particular one is impossible.
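
For what it's worth, the "vast majority" claim (where random, in the
algorithmic sense, means roughly incompressible) comes from a simple
counting argument: there just aren't enough short descriptions to go
around. A toy Python calculation (my own illustration) of the largest
possible fraction of n-bit strings that could be compressed by at
least k bits:

  # There are 2**n binary strings of length n, but fewer than 2**(n-k+1)
  # descriptions shorter than n-k+1 bits, so at most that many strings
  # can be compressed by k or more bits.
  def fraction_compressible(n, k):
      return (2 ** (n - k + 1) - 1) / 2 ** n   # always less than 2**(1 - k)

  print(fraction_compressible(1000, 10))   # under 0.2% of all 1000-bit strings
  print(fraction_compressible(1000, 20))   # roughly one in half a million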

I was going to quote Shannon at this point, but fortunately
Greg beat me to it ;-).

Brian Harper
Associate Professor
Applied Mechanics
The Ohio State University

"It appears to me that this author is asking
much less than what you are refusing to answer"
-- Galileo (as Simplicio in _The Dialogue_)