Design detection and minimum description length

From: Iain Strachan (iain.strachan@eudoramail.com)
Date: Thu Nov 23 2000 - 18:37:44 EST

    I promised on the "Dembski and Caesar Ciphers" thread that I would
    not contribute further to the discussion, because I risked repeating
    myself and appearing tedious. However, I have found the discussion
    most useful in developing my thinking about Dembski's "Intelligent
    Design" methodology, and it has served to convince me that the
    methodology is not flawed. I would like to thank Glenn for provoking
    me into thinking this out, so that I have a better understanding of
    the ideas.

    So I thought it would be a good idea to start off a new thread
    summarising what I see as Glenn's position, and giving my take on it.
    Here are the two positions:

    Glenn's position:

    Dembski's method is no good because it can never eliminate the
    possibility of design. If I send him a text encoded with a
    Vigenère cipher whose key is the same length as the text, it will
    appear random, and he will say it is undesigned, until I tell him
    that it is designed. It is therefore totally useless, because it
    fails to discriminate between designed and undesigned.

    My position:

    Dembski's method only seeks to verify design that can be verified by
    observing something that has low probability. If the methodology
    fails to detect design, all it will say is that we can't make a
    design inference. Saying "we cannot make a design inference" is not
    the same as saying "we infer that it is not designed".

    Furthermore, a filter that detects design, but occasionally fails
    to detect it, is still useful. Let's take the example of empirical
    data modelling.

    Suppose you send me 10 pairs of points (x(i), y(i)) for i=1 to 10.
    You don't tell me anything about whether it's designed or not
    designed.

    I claim that I can tell you if there is a correlation between the
    variables (for "correlation" read "design") simply via the
    methodology of using Minimum Description Length. If I find such a
    correlation, it is useful to me because I can interpolate between the
    specified x(i) points to make new predictions from my model. Here's
    how I do it.

    First, I attempt to fit the data by a least squares technique. I
    choose a parametrized mathematical model for my data, and adjust
    the parameters so as to minimize the sum of squares of the
    differences between the model's predictions and the actual values
    y(i).

    For simplicity, let's say I choose a polynomial, and vary the degree
    of polynomial I fit.
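    The fitting step can be sketched in Python for the simplest case,
    a degree-1 polynomial a(0) + a(1)x, using the closed-form least
    squares solution; the data points below are hypothetical, made up
    purely for illustration.

```python
# Minimal least-squares sketch for a degree-1 polynomial y = a0 + a1*x.
# Higher degrees work the same way but need a linear-system solver.

def fit_line(xs, ys):
    """Closed-form least-squares fit minimising sum of squared residuals."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a0 = (sy - a1 * sx) / n                         # intercept
    return a0, a1

# Ten hypothetical (x, y) pairs, here generated from an exact line,
# so the residuals (model minus data) come out zero.
xs = [float(i) for i in range(10)]
ys = [2.0 + 0.5 * x for x in xs]
a0, a1 = fit_line(xs, ys)
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
```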

    After the fitting procedure, I then calculate the "description
    length" for the data. This assumes that I wish to send a complete
    specification of the y(i) given the x(i) to someone else in the
    minimum message length. I perform my fit, and get an inexact
    prediction of the y(i). So what I do is to send the other person the
    values of the polynomial coefficients produced in the fitting
    process, and also the values of the "residuals", i.e. the differences
    between predicted and actual values for the data. If the fit is very
    good, then the magnitude of the residuals will be much smaller than
    the magnitude of the original y(i), and I can therefore transmit them
    using fewer bits of information than the originals. The length in
    bits of the model parameters, plus the length in bits required to
    store the residual errors, is the "description length".
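    The description-length calculation can be sketched as follows,
    under the simplifying assumption that every value is quantised to
    a fixed precision, so the bit cost of a value grows with the log
    of its magnitude and small residuals are cheap to transmit. The
    precision chosen here is an arbitrary illustrative value.

```python
import math

PRECISION = 1e-3  # hypothetical quantisation step for transmitted values

def bits_for(value):
    """Approximate bits needed to encode one value at the chosen precision."""
    levels = max(1, round(abs(value) / PRECISION))
    return math.log2(2 * levels + 1)  # sign plus magnitude

def description_length(coeffs, residuals):
    """Total message length: model parameters plus residual errors."""
    return (sum(bits_for(c) for c in coeffs)
            + sum(bits_for(r) for r in residuals))
```

    On this scheme a good fit pays a small fixed cost for its
    coefficients and very little for its near-zero residuals, which is
    exactly the trade-off the description length is meant to capture.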

    Suppose I do this for all polynomial degrees from 1 (which has two
    coefficients: a(0) + a(1)x) up to 9 (which has 10 coefficients:
    a(0) + a(1)x + a(2)x^2 + ... + a(9)x^9). There is no point in
    carrying it on further than this, because the message length for the
    degree 9 polynomial fit will be equal to that of the original data:
    there are 10 coefficients, and the fit will be exact, so the
    residuals are all zero. Any higher degree polynomial (and there
    will be an infinite number of different exact fits even for a 10th
    degree polynomial, because the problem is underdetermined) will
    result in a longer message length than the original data, because
    more than 10 coefficients will have to be transmitted.

    Now, somewhere in the middle, there will be a polynomial that results
    in the minimum message length. Suppose this is satisfied by a 4th
    degree polynomial. Then that is what I shall choose as my "best
    model" for fitting the data. It is a clear consequence of Occam's
    razor that I should choose the model that has the minimum description
    length; it is simply the most parsimonious description of the data.
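    The model-selection loop just described can be sketched in Python.
    The fit_poly and description_length arguments are hypothetical
    helpers standing in for the fitting and coding steps above; toy
    stand-ins are supplied here only so the loop can be exercised.

```python
def best_model(xs, ys, fit_poly, description_length):
    """Try polynomial degrees 1..9; keep the minimum-description-length fit."""
    best = None
    for degree in range(1, 10):
        coeffs, residuals = fit_poly(xs, ys, degree)
        length = description_length(coeffs, residuals)
        if best is None or length < best[0]:
            best = (length, degree, coeffs)
    return best

# Toy stand-ins (not real fitting): cost is minimised at 5 coefficients,
# i.e. degree 4, mimicking the 4th-degree example in the text.
def toy_fit(xs, ys, degree):
    return [0.0] * (degree + 1), []

def toy_length(coeffs, residuals):
    return (len(coeffs) - 5) ** 2 + 10

length, degree, coeffs = best_model([], [], toy_fit, toy_length)
```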

    Now the question is this. Once I have generated the model which has
    the minimum message length, how do I decide whether there is a real
    correlation between x and y, or whether the data are just random? I
    do it by exactly the same counting argument that I used in an
    earlier post about coin tossing.

    So suppose my message length is N bits, and the length of the message
    needed to transmit the raw y(i)'s is M bits. Then the probability
    that random data could be described in N bits or less is at most
    2^(N-M). If N and M differ by, say, 20 bits, then the probability
    comes to roughly one in a million, and I'm pretty certain that it is
    a real correlation, and that whoever gave me the data used a
    mathematical function (design) to generate the points, rather than
    choosing them randomly.
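    The arithmetic of this counting bound is a one-liner; the message
    lengths below are hypothetical, chosen to reproduce the 20-bit
    saving mentioned above.

```python
# Counting bound: at most 2^(N-M) of all M-bit messages can be
# described in N bits or fewer, so a large saving is strong evidence
# of real structure in the data.
N, M = 80, 100              # hypothetical compressed and raw lengths
p_bound = 2.0 ** (N - M)    # upper bound on the probability for random data
# a 20-bit saving gives 2^-20, roughly one in a million
```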

    But what happens if I find that none of the polynomial fits manage to
    reduce the message length of the data by a significant amount? Shall
    I say that the data is "undesigned"? I shall not, because whoever
    sent me the data could well come back and say that they used a 20th
    degree polynomial to generate the data, that it therefore was
    "designed", and that my conclusion of it being "undesigned" is false.
    So I shall say that I have insufficient data to decide whether it was
    designed or not.

    According to Glenn's objections to Dembski, my methodology is totally
    useless, because all the data sequences it examines could be
    candidates for design, and my method fails to detect design for the
    case of the data being generated by a polynomial of 10th order or
    more, till he tells me the values of the polynomial coefficients he
    used to generate the data.

    My reply to this is that of course it's still useful. If I get a set
    of data that I can fit well using a third degree polynomial, I can
    then go on and use it to make useful predictions. This is exactly
    what any scientist does when s/he makes a parametric model based on
    the empirically obtained data, and tunes the coefficients to give
    the best fit. Is this technique no use at all because it
    occasionally fails to pick up a genuine correlation (design) when
    there is insufficient data?

    ---
    Iain.Strachan@eudoramail.com is a free email
    account I use for posting to public forums.
    To contact me personally, please write to:
    

    iain.g.d.strachan AT ntlworld DOT com




    This archive was generated by hypermail 2.1.4 : Sat Nov 23 2002 - 13:49:17 EST