Re: Evidence and proof; was More on Gosse's OMPHALOS

From: Iain Strachan (iain@istrachan.clara.co.uk)
Date: Thu Feb 15 2001 - 13:23:32 EST

  • Next message: John W Burgeson: "Re: More on Gosse's OMPHALOS"

    Hi, George,

    Perhaps I can try to answer some of these questions:

    <My comments about the curse of dimensionality being a problem to GA's
    snipped>
    >
    > I do not understand why the curse of dimensionality presents a problem to
    evolution.
    > Isn't it only a problem for
    > linear or parametric models?

    As I understand it, GA's were invented to solve the "global optimization"
    problem; you have a parametric model, coded for by a "genetic string", which
    is usually a string of binary digits. The problem with conventional
    optimization algorithms is that the hill-climbing procedures they adopt mean
    that, depending on the original guess at a solution, the nearest local
    optimum is reached, whether or not it happens to be the global one. The
    promise of the GA is that by maintaining an ensemble of solutions, and
    having randomising operations such as mutation and crossover, you can
    get out of local minima and not get stuck. However, the way it seems to
    work in practice is that the randomisation gives rise to a form of
    hill-climbing, mediated by the power of natural selection. This is exactly
    how it works in Richard Dawkins's famous "Methinks it is like a weasel"
    example from "The Blind Watchmaker". This is really a very simple problem
    to solve, as there are no local optima. Suppose, instead, that there are
    other local optima, corresponding to other strings. In order to get from an
    inferior optimum to the correct one, there are two possibilities:

    (1) Gradual change. This is virtually impossible because one has to
    descend from the local optimum to the saddle point on the surface before
    climbing the correct one. This involves many successive fitness-decreasing
    steps before things start to improve. Eventually, the pressure of natural
    selection would pull you back up to the wrong local optimum that you were
    stuck in.

    (2) A sudden "lucky" mutation, or random change, that pulled you into the
    basin of attraction of the correct solution. But this is where we fall foul
    of the curse of dimensionality. Suppose you have to get within a radius R of
    the correct solution, and you happen to be 2R away. The volume of a
    hypersphere in N dimensions is proportional to R^N, and hence the chance of
    a random step landing at the right point is (R/2R)^N = 2^-N; so if N = 200,
    it isn't going to happen on any timescale comparable to the age of the
    universe.
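    The 2^-N volume ratio is easy to check numerically. This is just a small
    Monte Carlo sketch of my own (the function name and sampling scheme are
    illustrative, and rejection sampling is only feasible in low dimension),
    estimating the chance that a point uniform in the ball of radius 2R lands
    within R of the centre:

```python
import random

def hit_probability(n, trials=100_000, seed=0):
    # Monte Carlo check of the volume ratio: draw points uniformly in
    # the ball of radius 2 (rejection sampling from the enclosing cube,
    # feasible only for small n) and count those within radius 1.
    rng = random.Random(seed)
    hits = total = 0
    while total < trials:
        p = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        r2 = sum(x * x for x in p)
        if r2 <= 4.0:            # inside the ball of radius 2R = 2
            total += 1
            hits += r2 <= 1.0    # inside the basin of radius R = 1
    return hits / total

for n in (1, 2, 3):
    print(n, hit_probability(n), 2.0 ** -n)
```

    The estimates track 2^-n closely even at these tiny dimensions; by N = 200
    the ratio is far beyond anything a random jump could hit.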

    > initial-condition dependent ensembles? Hence, GA's actually benefit
    from the so
    > called "blessing of dimensionality" in that there are lots of solutions to
    choose from.
    >

    I believe the sort of thing they have been applied to is the "Travelling
    Salesman Problem", for which, in general, there aren't going to be many
    solutions at all. I don't know how successfully they have been applied to
    this problem, however.

    What I can report on is some work I did some years back in attempting to get
    a GA to train a neural network. In this case, there are in fact many
    equivalent (redundant) parametric solutions to choose from, by the simple
    procedure of re-ordering the so-called "hidden units" in the neural network.
    We thought that this would actually cause a problem for GA's, because
    performing a crossover between two neural nets that were swapped with
    respect to their hidden units would produce a nonsense (trivially, like
    combining two dogs with nose and ears, into one that had no nose and two
    sets of ears). So we invented a way of re-ordering the genes so that the
    search took place in a non-redundant part of solution space, expecting to
    see a vast improvement in performance. In fact it made no difference at
    all; the population
    statistics showed that even without the ordering, all the members of the
    population converged to a fixed order rapidly. It had nothing to do with
    crossover, just to do with population statistics. We had hoped to publish
    the results, but found to our dismay that someone else had already published
    and come to exactly the same conclusion.
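    The hidden-unit redundancy is easy to see in a toy net. In this sketch of
    my own (the weight layout and numbers are purely illustrative, not our
    actual system), two genomes that differ only by swapping the hidden units
    compute exactly the same function, yet a one-point crossover between them
    duplicates one unit and loses the other:

```python
import math

def net(x, w):
    # A tiny 1-input, 2-hidden-unit, 1-output net.  w packs the weights
    # as [w_h1, b_h1, w_h2, b_h2, v1, v2] (hypothetical layout).
    h1 = math.tanh(w[0] * x + w[1])
    h2 = math.tanh(w[2] * x + w[3])
    return w[4] * h1 + w[5] * h2

w_a = [1.0, 0.5, -2.0, 0.3, 0.7, -1.1]
# Swap the two hidden units: the function computed is unchanged.
w_b = [-2.0, 0.3, 1.0, 0.5, -1.1, 0.7]

for x in (-1.0, 0.0, 2.0):
    assert abs(net(x, w_a) - net(x, w_b)) < 1e-12

# One-point crossover between the two equivalent genomes yields a child
# with hidden unit 1 twice and hidden unit 2 not at all -- the "two
# sets of ears, no nose" dog:
child = w_a[:2] + w_b[2:]
print(child)
```

    The point of the published result (and ours) was that this hazard turns
    out not to matter in practice, because the population converges to one
    ordering before crossover between mismatched orderings becomes common.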

    It was not long after that that we started to go off GA's as having much
    potential. A number of papers started to appear that demonstrated by
    empirical experimentation that the mathematical "justifications" for GA's
    working, which had been derived from biological considerations to do with
    natural evolution, were in fact not true. This was achieved by posing the
    problem in a way that violated the conditions the maths required, and
    finding that the algorithm behaved in exactly the same way.

    An example of this sort of thing can be found at:

    http://citeseer.nj.nec.com/baluja95removing.html

    where the role of the "Crossover" operator in the standard genetic algorithm
    is seriously called into question, because a much simpler population based
    algorithm without crossover outperforms a GA on a problem that had been
    tailor made to demonstrate the benefits of the crossover operator.
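    The simpler algorithm in that paper is population-based incremental
    learning (PBIL). This is only a minimal sketch of the idea as I understand
    it, on a toy "one-max" problem of my own choosing (all parameter values
    are illustrative): there is no crossover at all, just a probability vector
    nudged towards each generation's best sample:

```python
import random

def pbil_onemax(n_bits=30, pop=50, lr=0.1, generations=200, seed=1):
    # PBIL replaces the population-plus-crossover machinery with a single
    # probability vector; each generation it samples candidate bitstrings
    # from the vector, then moves the vector towards the best sample.
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best = None
    for _ in range(generations):
        samples = [[int(rng.random() < pi) for pi in p] for _ in range(pop)]
        winner = max(samples, key=sum)       # one-max: fitness = bit count
        if best is None or sum(winner) > sum(best):
            best = winner
        p = [pi + lr * (b - pi) for pi, b in zip(p, winner)]
    return best

best = pbil_onemax()
print(sum(best))
```

    Despite having no recombination operator, this kind of scheme matched or
    beat the standard GA in the cited experiments.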

    I should point out here that I'm not saying that there is no use for GA's;
    I have used them with quite a bit of success on moderate sized problems
    (choosing the best combination of inputs for a neural network, given around
    24 possible choices). The GA found the same solution as an exhaustive search
    over the 2^24 choices in around half an hour (the exhaustive search taking
    48 hours). But whether it would work as well for selecting from 100
    choices, I do not know.
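    The encoding for that kind of input-selection problem is just a bitmask,
    one bit per candidate input. The sketch below is a hypothetical stand-in,
    not the system I used: the fitness function simply scores agreement with
    an arbitrary "best" mask (a real run would score a trained network), but
    it shows the shape of the search and how few evaluations it needs compared
    with the 2^24 of an exhaustive sweep:

```python
import random

def fitness(mask, good=0b101100111000101011001101):
    # Hypothetical stand-in score: bits agreeing with a fixed "best"
    # subset.  A real application would train and score a network here.
    return 24 - bin(mask ^ good).count("1")

def ga_select(pop=40, generations=100, seed=0):
    rng = random.Random(seed)
    population = [rng.getrandbits(24) for _ in range(pop)]
    evals = 0
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        evals += pop
        parents = population[: pop // 2]     # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 24)       # one-point crossover
            child = (a & ((1 << cut) - 1)) | (b & ~((1 << cut) - 1))
            child ^= 1 << rng.randrange(24)  # one-bit mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness), evals

best, evals = ga_select()
print(fitness(best), evals)
```

    Here the GA spends 4,000 fitness evaluations against the 16,777,216 an
    exhaustive search would need; the open question in the text is whether
    that advantage survives at 100 inputs.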

    The GA is a kind of "black box" algorithm that works reasonably well when
    you don't know what else to do. I've even been to a talk at a neural
    networks conference where the speaker apologised for using a GA, because he
    realised he was just invoking a black box without really knowing how it
    worked.

    Hence, although many people set great store by Dawkins's "Methinks it is
    like a weasel" example, I do not think that either it, or GA's done
    properly, is a justification for evolution.
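    For what it's worth, the weasel example itself takes only a few lines to
    reproduce. This sketch uses my own parameter choices, not Dawkins's; the
    point is that cumulative selection succeeds quickly here precisely because
    every character can improve independently, so there are no local optima to
    escape:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s):
    # Number of positions matching the target phrase.
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.05):
    # Each character has an independent small chance of being replaced.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

random.seed(42)
parent = "".join(random.choice(ALPHABET) for _ in TARGET)
for generation in range(10_000):
    if parent == TARGET:
        break
    # Cumulative selection: keep the fittest of the mutated offspring,
    # never accepting a fitness-decreasing step.
    best = max((mutate(parent) for _ in range(100)), key=fitness)
    if fitness(best) >= fitness(parent):
        parent = best
print(generation, parent)
```

    A single-basin hill climb like this says nothing about crossing the
    fitness valleys discussed earlier.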

    > In terms of training neural nets, since we train each other, perhaps it is
    a matter
    > of equality of complexity that is the
    > problem you allude to; i.e. "it takes one to train one." Isn't a GA less
    complex
    > than a many parameter neural net--since it is an algorithm?

    Of course we do "train each other" in terms of education, but clearly the
    human brain is already wired up at birth with many sophisticated processing
    capabilities: being able to breathe, cry, kick, coordinate muscles, and
    recognise a human face; the kind of tasks that make artificial neural
    networks look hopelessly trivial. This is how it comes pre-wired at birth,
    so that level of "training" would have to have arisen through evolution if
    you don't accept the "Intelligent Design" hypothesis.

    As to whether the GA is less complex than the neural net: I don't think
    that is the case; a neural net with around 50 parameters is also expressible
    as a simple algorithm. And in any case, the Levenberg-Marquardt algorithm
    for non-linear least squares fitting is also "just an algorithm", yet it is
    extremely effective at training a neural network.

    Hope that clarifies some things.
    Iain



    This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 13:28:38 EST