From: Iain Strachan (iain.strachan@eudoramail.com)
Date: Thu Nov 23 2000 - 18:37:44 EST
I promised on the "Dembski and Caesar Ciphers" thread that I would
not contribute further to the discussion, because I risked repeating
myself and appearing tedious. However, I have found the discussion
most useful in developing my thinking about Dembski's "Intelligent
Design" methodology, and it has served to convince me that the
methodology is not flawed. I would like to thank Glenn for provoking
me into thinking this out, so that I now have a better understanding
of the ideas.
So I thought it would be a good idea to start a new thread
summarising what I see as Glenn's position, and my own take on it.
Here are the two positions:
Glenn's position:
Dembski's method is no good because it can never eliminate the
possibility of design. If I send him a text encoded with a Vigenère
cipher whose key is as long as the text itself, it will appear
random, and he will say it is undesigned until I tell him that it is
designed. The method is therefore totally useless, because it fails
to discriminate between designed and undesigned.
My position:
Dembski's method only seeks to verify design in cases where it can be
verified by observing something of low probability. If the
methodology fails to detect design, all it says is that we cannot
make a design inference. Saying "we cannot make a design inference"
is not the same as saying "we infer that it is not designed".
Furthermore, a filter that detects design, but on occasion fails to
detect it, is still useful. Let's take the example of empirical data
modelling.
Suppose you send me 10 pairs of points (x(i), y(i)) for i = 1 to 10.
You don't tell me anything about whether the data are designed or
not.

I claim that I can tell you whether there is a correlation between
the variables (for "correlation" read "design") simply by using the
Minimum Description Length (MDL) methodology. If I find such a
correlation, it is useful to me, because I can interpolate between
the specified x(i) points to make new predictions from my model.
Here's how I do it.
First, I attempt to fit the data by a least-squares technique. I
choose a parametrized mathematical model for my data, and adjust the
parameters so as to minimize the sum of squares of the differences
between the model predictions for the y(i) and the actual values
y(i).

For simplicity, let's say I choose a polynomial, and vary the degree
of the polynomial I fit.
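(Purely as an illustration of this step, and not anything from the
actual discussion, the least-squares fit might be sketched in Python
roughly as follows; the name fit_polynomial and the arrays x and y
are placeholders of mine.)

import numpy as np

def fit_polynomial(x, y, degree):
    """Fit a polynomial of the given degree to (x, y) by least squares,
    returning the coefficients and the residuals y - prediction."""
    coeffs = np.polyfit(x, y, degree)       # minimises the sum of squared errors
    residuals = y - np.polyval(coeffs, x)   # differences between actual and predicted y(i)
    return coeffs, residuals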
After the fitting procedure, I then calculate the "description
length" for the data. This assumes that I wish to send a complete
specification of the y(i), given the x(i), to someone else in the
minimum message length. I perform my fit, and get an inexact
prediction of the y(i). So what I do is to send the other person the
values of the polynomial coefficients produced in the fitting
process, and also the values of the "residuals", i.e. the differences
between the predicted and actual values of the data. If the fit is
very good, the magnitude of the residuals will be much smaller than
the magnitude of the original y(i), and I can therefore transmit them
using fewer bits of information than the originals. The length in
bits of the model parameters, plus the length in bits required to
store the residual errors, is the "description length".
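(Again purely as a sketch, one crude way of estimating such a
description length is below. The fixed precision and the bits_for
coding scheme are simplifying assumptions of mine, not a coding
scheme anyone in this discussion has specified.)

import numpy as np

PRECISION = 1e-3   # assumed quantisation step for every transmitted number

def bits_for(values, precision=PRECISION):
    """Rough count of bits needed to send each value at the given precision."""
    values = np.asarray(values, dtype=float)
    # log2 of how many quantisation steps each magnitude spans (at least 1 bit per value)
    return float(np.sum(np.maximum(1.0, np.log2(np.abs(values) / precision + 1.0))))

def description_length(coeffs, residuals):
    """Model cost (coefficients) plus error cost (residuals), in bits."""
    return bits_for(coeffs) + bits_for(residuals)

Under this scheme small residuals cost fewer bits than the raw y(i),
which is the whole point of the exercise.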
Suppose I do this for all polynomial degrees from 1 (which has two
coefficients: a(0) + a(1)x) up to 9 (which has 10 coefficients:
a(0) + a(1)x + a(2)x^2 + ... + a(9)x^9). There is no point in
carrying it further than this, because the message length for the
degree-9 polynomial fit will be equal to that of the original data:
there are 10 coefficients, the fit will be exact, and so the
residuals are all zero. Any higher-degree polynomial (and there will
be an infinite number of different ones even for a 10th-degree
polynomial, because the problem is underdetermined) will result in a
longer message length than the original data, because more than 10
coefficients will have to be transmitted.
Now, somewhere in the middle, there will be a polynomial that results
in the minimum message length. Suppose this is satisfied by a 4th
degree polynomial. Then that is what I shall choose as my "best
model" for fitting the data. It is a clear consequence of Occam's
razor that I should choose the model that has the minimum description
length; it is simply the most parsimonious description of the data.
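(Putting the two sketches above together, the sweep over degrees and
the choice of the minimum-message-length model could look like this;
it reuses the hypothetical fit_polynomial and description_length
helpers and is only an illustration.)

def best_model(x, y, max_degree=9):
    """Fit polynomials of degree 1..max_degree and keep the one whose
    total description length (coefficients + residuals) is smallest."""
    results = []
    for degree in range(1, max_degree + 1):
        coeffs, residuals = fit_polynomial(x, y, degree)
        results.append((description_length(coeffs, residuals), degree, coeffs))
    return min(results, key=lambda r: r[0])   # smallest message length wins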
Now the question is this. Once I have generated the model with the
minimum message length, how do I decide whether there is a real
correlation between x and y, or whether it is just random? I do it
by exactly the same counting argument that I used in an earlier post
about coin tossing.

So suppose my message length is N bits, and the length of the message
needed to transmit the raw y(i)'s is M bits. Only a small fraction
of all possible M-bit data sets can be described in N bits or fewer,
so the probability that I could compress my data this well by chance
is at most about 2^(N-M). If N and M differ by, say, 20 bits, then
the probability comes to roughly 1 in a million, and I'm pretty
certain that it is a real correlation, and that whoever gave me the
data used a mathematical function (design) to generate the points,
rather than choosing them randomly.
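(A quick arithmetic check of that figure, as a sketch only:)

saving = 20                        # N and M differ by 20 bits
probability_bound = 2.0 ** (-saving)
print(probability_bound)           # about 9.5e-07, i.e. roughly 1 in a million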
But what happens if I find that none of the polynomial fits manage to
reduce the message length of the data by a significant amount? Shall
I say that the data is "undesigned"? I shall not, because whoever
sent me the data could well come back and say that they used a 20th
degree polynomial to generate the data, that it therefore was
"designed", and that my conclusion of it being "undesigned" is false.
So I shall say that I have insufficient data to decide whether it was
designed or not.
According to Glenn's objections to Dembski, my methodology is totally
useless, because all the data sequences it examines could be
candidates for design, and it fails to detect design when the data
were generated by a polynomial of degree 10 or more, until he tells
me the values of the polynomial coefficients he used to generate the
data.
My reply to this is that of course it's still useful. If I get a set
of data to which I can get a good fit using a third-degree
polynomial, I can then go on and use it to make useful predictions.
This is exactly what any scientist does when s/he builds a parametric
model from empirically obtained data and tunes the coefficients to
give the best fit. Is this technique no use at all because it
occasionally fails to pick up a genuine correlation (design) when
there is insufficient data?
---
Iain.Strachan@eudoramail.com is a free email account I use for posting to public forums. To contact me personally, please write to: iain.g.d.strachan AT ntlworld DOT com