Re: Evidence and proof; was More on Gosse's OMPHALOS

From: Iain Strachan (iain@istrachan.clara.co.uk)
Date: Thu Feb 15 2001 - 13:23:32 EST

  • Next message: John W Burgeson: "Re: More on Gosse's OMPHALOS"

    Hi, George,

    Perhaps I can try to answer some of these questions:

    <My comments about the curse of dimensionality being a problem to GA's
    snipped>
    >
    > I do not understand why the curse of dimensionality presents a problem to
    evolution.
    > Isn't it only a problem for
    > linear or parametric models?

    As I understand it, GA's were invented to solve the "global optimization"
    problem; you have a parametric model, coded for by a "genetic string", which
    is usually a string of binary digits. The problem with conventional
    optimization algorithms is that the hill-climbing procedures they adopt mean
    that, depending on the original guess at a solution, the nearest local
    optimum is reached, whether or not it happens to be the global one. The
    promise of the GA is that by maintaining an ensemble of solutions, and
    having randomising operations such as mutation and crossover, you can
    get out of local minima and not get stuck. However, the way it seems to
    work in practice is that the randomisation gives rise to a form of
    hill-climbing, mediated by the power of natural selection. This is exactly
    how it works in Richard Dawkins's famous "Methinks it is like a weasel"
    example from "The Blind Watchmaker". This is really a very simple problem
    to solve, as there are no local optima. Suppose, instead, that there are
    other local optima, corresponding to other strings. In order to get from an
    inferior optimum to the correct one, there are two possibilities:

    (1) Gradual change. This is virtually impossible because one has to
    descend from the local optimum to the saddle point on the surface before
    climbing the correct one. This involves many successive fitness-decreasing
    steps before things start to improve. Eventually, the pressure of natural
    selection would pull you back up to the wrong local optimum that you were
    stuck in.

    (2) A sudden "lucky" mutation, or random change, that pulled you into the
    basin of attraction of the correct solution. But this is where we fall foul
    of the curse of dimensionality. Suppose you have to get within a radius R of
    the correct solution, and you happen to be 2R away. The volume of a
    hypersphere in N dimensions is proportional to R^N, and hence the chance of
    a random step landing at the right point is (R/2R)^N = 2^-N; so if N = 200,
    it isn't going to happen on any timescale comparable to the age of the
    universe.
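    The 2^-N volume ratio is easy to check numerically. This is just a small
    Monte Carlo sketch of my own (the function name and sampling scheme are
    illustrative, and rejection sampling is only feasible in low dimension),
    estimating the chance that a point uniform in the ball of radius 2R lands
    within R of the centre:

```python
import random

def hit_probability(n, trials=100_000, seed=0):
    # Monte Carlo check of the volume ratio: draw points uniformly in
    # the ball of radius 2 (rejection sampling from the enclosing cube,
    # feasible only for small n) and count those within radius 1.
    rng = random.Random(seed)
    hits = total = 0
    while total < trials:
        p = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        r2 = sum(x * x for x in p)
        if r2 <= 4.0:            # inside the ball of radius 2R = 2
            total += 1
            hits += r2 <= 1.0    # inside the basin of radius R = 1
    return hits / total

for n in (1, 2, 3):
    print(n, hit_probability(n), 2.0 ** -n)
```

    The estimates track 2^-n closely even at these tiny dimensions; by N = 200
    the ratio is far beyond anything a random jump could hit.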

    > initial-condition dependent ensembles? Hence, GA's actually benefit
    from the so
    > called "blessing of dimensionality" in that there are lots of solutions to
    choose from.
    >

    I believe the sort of thing they have been applied to is the "Travelling
    Salesman Problem", for which, in general, there aren't going to be many
    solutions at all. I don't know how successfully they have been applied to
    this problem, however.

    What I can report on is some work I did some years back in attempting to get
    a GA to train a neural network. In this case, there are in fact many
    equivalent (redundant) parametric solutions to choose from, by the simple
    procedure of re-ordering the so-called "hidden units" in the neural network.
    We thought that this would actually cause a problem for GA's, because
    performing a crossover between two neural nets that were swapped with
    respect to their hidden units would produce a nonsense (trivially, like
    combining two dogs with nose and ears, into one that had no nose and two
    sets of ears). So we invented a way of re-ordering the genes so that the
    search took place in a non-redundant part of solution space, expecting to
    see a vast improvement in performance. In fact it made no difference at
    all; the population
    statistics showed that even without the ordering, all the members of the
    population converged to a fixed order rapidly. It had nothing to do with
    crossover, just to do with population statistics. We had hoped to publish
    the results, but found to our dismay that someone else had already published
    and come to exactly the same conclusion.
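    The hidden-unit redundancy is easy to see in a toy net. In this sketch of
    my own (the weight layout and numbers are purely illustrative, not our
    actual system), two genomes that differ only by swapping the hidden units
    compute exactly the same function, yet a one-point crossover between them
    duplicates one unit and loses the other:

```python
import math

def net(x, w):
    # A tiny 1-input, 2-hidden-unit, 1-output net.  w packs the weights
    # as [w_h1, b_h1, w_h2, b_h2, v1, v2] (hypothetical layout).
    h1 = math.tanh(w[0] * x + w[1])
    h2 = math.tanh(w[2] * x + w[3])
    return w[4] * h1 + w[5] * h2

w_a = [1.0, 0.5, -2.0, 0.3, 0.7, -1.1]
# Swap the two hidden units: the function computed is unchanged.
w_b = [-2.0, 0.3, 1.0, 0.5, -1.1, 0.7]

for x in (-1.0, 0.0, 2.0):
    assert abs(net(x, w_a) - net(x, w_b)) < 1e-12

# One-point crossover between the two equivalent genomes yields a child
# with hidden unit 1 twice and hidden unit 2 not at all -- the "two
# sets of ears, no nose" dog:
child = w_a[:2] + w_b[2:]
print(child)
```

    The point of the published result (and ours) was that this hazard turns
    out not to matter in practice, because the population converges to one
    ordering before crossover between mismatched orderings becomes common.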

    It was not long after that that we started to go off GA's as having much
    potential. A number of papers started to appear that demonstrated by
    empirical experimentation that the mathematical "justifications" for GA's
    working, which had been derived from biological considerations to do with
    natural evolution, were in fact not true. This was achieved by posing the
    problem in a way that violated the conditions the maths required, and
    finding that the algorithm behaved in exactly the same way.

    An example of this sort of thing can be found at:

    http://citeseer.nj.nec.com/baluja95removing.html

    where the role of the "Crossover" operator in the standard genetic algorithm
    is seriously called into question, because a much simpler population based
    algorithm without crossover outperforms a GA on a problem that had been
    tailor made to demonstrate the benefits of the crossover operator.
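    The simpler algorithm in that paper is population-based incremental
    learning (PBIL). This is only a minimal sketch of the idea as I understand
    it, on a toy "one-max" problem of my own choosing (all parameter values
    are illustrative): there is no crossover at all, just a probability vector
    nudged towards each generation's best sample:

```python
import random

def pbil_onemax(n_bits=30, pop=50, lr=0.1, generations=200, seed=1):
    # PBIL replaces the population-plus-crossover machinery with a single
    # probability vector; each generation it samples candidate bitstrings
    # from the vector, then moves the vector towards the best sample.
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best = None
    for _ in range(generations):
        samples = [[int(rng.random() < pi) for pi in p] for _ in range(pop)]
        winner = max(samples, key=sum)       # one-max: fitness = bit count
        if best is None or sum(winner) > sum(best):
            best = winner
        p = [pi + lr * (b - pi) for pi, b in zip(p, winner)]
    return best

best = pbil_onemax()
print(sum(best))
```

    Despite having no recombination operator, this kind of scheme matched or
    beat the standard GA in the cited experiments.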

    I should point out here that I'm not saying that there is no use for GA's;
    I have used them with quite a bit of success on moderate sized problems
    (choosing the best combination of inputs for a neural network, given around
    24 possible choices). The GA found the same solution as an exhaustive search
    over the 2^24 choices in around half an hour (the exhaustive search taking
    48 hours). But whether it would work as well for selecting from 100
    choices, I do not know.
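    The encoding for that kind of input-selection problem is just a bitmask,
    one bit per candidate input. The sketch below is a hypothetical stand-in,
    not the system I used: the fitness function simply scores agreement with
    an arbitrary "best" mask (a real run would score a trained network), but
    it shows the shape of the search and how few evaluations it needs compared
    with the 2^24 of an exhaustive sweep:

```python
import random

def fitness(mask, good=0b101100111000101011001101):
    # Hypothetical stand-in score: bits agreeing with a fixed "best"
    # subset.  A real application would train and score a network here.
    return 24 - bin(mask ^ good).count("1")

def ga_select(pop=40, generations=100, seed=0):
    rng = random.Random(seed)
    population = [rng.getrandbits(24) for _ in range(pop)]
    evals = 0
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        evals += pop
        parents = population[: pop // 2]     # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 24)       # one-point crossover
            child = (a & ((1 << cut) - 1)) | (b & ~((1 << cut) - 1))
            child ^= 1 << rng.randrange(24)  # one-bit mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness), evals

best, evals = ga_select()
print(fitness(best), evals)
```

    Here the GA spends 4,000 fitness evaluations against the 16,777,216 an
    exhaustive search would need; the open question in the text is whether
    that advantage survives at 100 inputs.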

    The GA is a kind of "black box" algorithm that works reasonably well when
    you don't know what else to do. I've even been to a talk at a neural
    networks conference where the speaker apologised for using a GA, because he
    realised he was just invoking a black box without really knowing how it
    worked.

    Hence, although many people set great store by Dawkins's "Methinks it is
    like a weasel" example, I do not think that either it, or GA's done
    properly, is a justification for evolution.
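    For what it's worth, the weasel example itself takes only a few lines to
    reproduce. This sketch uses my own parameter choices, not Dawkins's; the
    point is that cumulative selection succeeds quickly here precisely because
    every character can improve independently, so there are no local optima to
    escape:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s):
    # Number of positions matching the target phrase.
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.05):
    # Each character has an independent small chance of being replaced.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

random.seed(42)
parent = "".join(random.choice(ALPHABET) for _ in TARGET)
for generation in range(10_000):
    if parent == TARGET:
        break
    # Cumulative selection: keep the fittest of the mutated offspring,
    # never accepting a fitness-decreasing step.
    best = max((mutate(parent) for _ in range(100)), key=fitness)
    if fitness(best) >= fitness(parent):
        parent = best
print(generation, parent)
```

    A single-basin hill climb like this says nothing about crossing the
    fitness valleys discussed earlier.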

    > In terms of training neural nets, since we train each other, perhaps it is
    a matter
    > of equality of complexity that is the
    > problem you allude to; i.e. "it takes one to train one." Isn't a GA less
    complex
    > than a many parameter neural net--since it is an algorithm?

    Of course we do "train each other" in terms of education, but clearly the
    human brain is already wired up at birth with many sophisticated processing
    capabilities: being able to breathe, cry, kick, coordinate muscles, and
    recognise a human face; the kind of tasks that make artificial neural
    networks look hopelessly trivial. This is how it comes pre-wired at birth,
    so that level of "training" would have to have arisen through evolution if
    you don't accept the "Intelligent Design" hypothesis.

    As to whether the GA is less complex than the neural net: I don't think
    that is the case; a neural net with around 50 parameters is also expressible
    as a simple algorithm. And in any case, the Levenberg-Marquardt algorithm
    for non-linear least squares fitting is also "just an algorithm", yet it is
    extremely effective at training a neural network.

    Hope that clarifies some things.
    Iain



    This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 13:28:38 EST