Re: "Weasel" Cumulative Selection Demo

Wesley R. Elsberry (welsberr@orca.tamu.edu)
Thu, 26 Jun 97 05:32:53 CDT

I have a Perl script which performs the "weasel" simulation.

Features:
- Interactive or batch single run
- Parameters passed on command line for batch
- Can specify files to pick up "pool" and "target" strings from
- Produces basic statistics on the run
- Sends comma-delimited output to a log file
- For the target, can specify a target string length if collecting
the target from a file
- Follows the Dawkins method of generating an initial seed string
from which generation 2 is derived

Differences from Delphi Weasel
- A mismatch evaluation operator is provided, but not selectable
yet
- No "curses" type support, strictly a tty-style program

The Weasel Perl script is intended to demonstrate the relative
advantages of cumulative selection over random search for finding
targets in large search spaces. As a simulation of "evolutionary
process", this implementation, like Dawkins' original, is "misleading
in important ways". Please consult "The Blind Watchmaker" for the
complete discussion of what Weasel is intended -- and not intended --
to demonstrate.

Using Weasel

The files associated with Weasel are

weasel.pl -- The Weasel Perl script
mkbat.pl -- A script to produce a script for running large batches
pool.wsl -- A "pool" string file. Chop is used on the
first line of this file, so it had better have a
newline at the end.
target.wsl -- A "target" string file. Chop is used on the
first line of this file, so it had better have a
newline at the end.
weasel.log -- The Weasel log file, where comma-delimited information
about each run is appended. The comma-delimited
format is suitable for import into spreadsheets for
further analysis.

You will need a working Perl interpreter somewhere in your path.
Weasel was developed using bigperl on an MS-DOS machine, which
naturally led to the use of files rather than simple parameter
passing for long target strings.

The usage report

Typing "weasel.pl help" at the command line displays this message:

WEASEL. Copyright 1997 by W.R. Elsberry
Usage: WEASEL [popsize mut_rate {pool|file.wsl} {target|[length:]file.wsl}]

By running Weasel with no command-line arguments, Weasel interactively
asks for parameters for the run.

# weasel.pl
Population size? (100):
Mutation rate? (expressed as %) (6):
Target string? : METHINKS
Pool? : _ABCDEFGHIJKLMNOPQRSTUVWXYZ
Generation 1: VJSKLTCJ 44
Generation 2: MJSKLTCJ 35
Generation 3: MJSKLTKJ 27
Generation 4: MJSKLTKQ 20
Generation 5: MJSILKKQ 15
Generation 6: MJSILKKS 13
Generation 7: MFSILKKS 9
Generation 8: MFSILMKS 7
[...]

At each generation, the closest match to the target in the population
is displayed, along with its distance from the target.

There are two major ways (and several combinations) of using
command-line arguments with Weasel.

- Passing all values directly
- Passing filename information for "pool" and "target"

"Passing all values directly" sample

# weasel.pl 100 5.5 _ABCDEFGHIJKLMNOPQRSTUVWXYZ METHINKS

The above command line sets up a population size of 100, a
mutation rate of 5.5%, a pool containing all alphabetic
characters plus an underline pressed into duty as a space
to avoid problems with brain-dead OS argument handling,
and one eight-character target string.

"Passing filename information" examples

# weasel.pl 100 5.5 pool.wsl target.wsl

The above command line sets up a population size of 100, and a
mutation rate of 5.5%. The pool string is the first line of
text in the file "pool.wsl". The ".wsl" extension is how
filenames are distinguished from regular string values. The
target string is, unsurprisingly, taken as the first line of
text in the file "target.wsl".

$ weasel.pl 100 5.5 pool.wsl 8:target.wsl

The above command line sets up a population size of 100, and a
mutation rate of 5.5%. The pool string is the first line of text in
the file "pool.wsl". The target string is the first eight characters
from the first line of text in the file "target.wsl".

Because of the flexibility afforded by this scheme, you can
create batch files to collect data from repeated runs with
the same parameters or explore the solution space by varying
the parameters. A "mkbat.pl" script is provided as an
example of batch mode scripting.

The log file format explained

Here is an example line from the weasel.log file:

"LOG",19683,101,9,0.00045724737082761772585,194.88118811881187753,10,2,"THE","_ABCDEFGHIJKLMNOPQRSTUVWXYZ"

Field 1: The initial string was originally intended as a means of
extracting the comma-delimited strings from the stream of program
output.
Field 2: The searchspace size. searchspace = poolsize ** targetlength
Field 3: The total number of candidate strings considered.
Field 4: The number of unique candidate strings considered. This
statistic was frighteningly easy to collect using a Perl associative
array, and gives a straightforward quantification of the amount of
the searchspace which was explored.
Field 5: Proportion of searchspace explored by weasel search.
propsearchspace = uniquecandidates / searchspace.
Field 6: Relative advantage over random search.
reladvantage = searchspace / totalcandidates.
Field 7: Population size.
Field 8: Mutation rate, in percent.
Field 9: Target string, quoted.
Field 10: Pool string, quoted.

Weasel.pl ============================================================
#!/usr/bin/perl
# Modify the above line to fit your configuration

# Weasel in Perl
# Copyright 1997 by Wesley R. Elsberry
# Tool for exploring the issues of cumulative selection as an
# improvement over random search

# Collect parameters
#
# Target - The string to match
# Pool - The from which characters are selected
# Mutation rate - The probability that a character will be changed
# Population size - How many strings in the population

# Default values
$defpopsize = 100;
$defmutrate = 6;

# Global initialize
srand;
open(OUTFIL,">>weasel.log");

# Snag parameters, if any
while ($ARGV = shift) {
push(@args,$ARGV);
}

# Interactive single run
if (-1 >= $#args) {
print "Population size? ($defpopsize): ";
chop($popsize = <STDIN>);
if (0 >= $popsize) { $popsize = $defpopsize; }
print "Mutation rate? (expressed as %) ($defmutrate): ";
chop($mutrate = <STDIN>);
if (0 >= $mutrate) { $mutrate = $defmutrate; }
print "Target string? : ";
chop($target = <STDIN>);
print "Pool? : ";
chop($pool = <STDIN>);
$candcnt = &weasel_run($target,$pool,$mutrate,$popsize);
print "Number of candidates considered = $candcnt\n";
} elsif (3 == $#args) { # Assume the user knows what the user is doing
# Grab parameters
$popsize = $args[0];
$mutrate = $args[1];
$pool = $args[2];
$target = $args[3];

# Check for picking up the pool from a file
$_ = $pool;
if (/\.wsl$/) {
print "Found a weasel in the pool.\n";
open(INFIL,"<$_");
$_ = <INFIL>;
close(INFIL);
chop;
$pool = $_;
print "The pool is $pool\n";
}

# Check for picking up the target from a file
$_ = $target;
if (/\.wsl$/) {
print "Found a weasel at the target.\n";
# Check for a target length
if (/:/) {
($tlen,$tfile) = split(':',$_);
} else {
$tfile = $target;
}
open(INFIL,"<$tfile");
$_ = <INFIL>;
close(INFIL);
chop;
if (0 < $tlen) {
$target = substr($_,0,$tlen);
} else {
$target = $_;
}
print "The target is $target\n";
}
$candcnt = &weasel_run($target,$pool,$mutrate,$popsize);
print "Number of candidates considered = $candcnt\n";
} else {
print "WEASEL. Copyright 1997 by W.R. Elsberry\n";
print "Usage: WEASEL [popsize mut_rate {pool|file.wsl} {target|[length:]file.wsl}]\n";
}
close(OUTFIL);

# End of main routine

# Subprograms

# weasel_prob
# Routine to report on the probability that random search would
# find the target string
# Arguments: Target (string), Pool (string), Totalcand (scalar),
# Uniqcand (scalar)
sub weasel_prob {
local($target,$pool,$totalcand,$uniqcand) = @_;
local($totalrandprob,$searchspace,$weaselprob,$weaselrand);

# Find search space size
# Assume: each character in pool is unique
# each character in pool is equiprobable
$searchspace = length($pool) ** length($target);
$weaselprob = $uniqcand / $searchspace;
$weaselrand = $searchspace / $totalcand;
print "Search space size for\n Target=$target and\n Pool=$pool\n $searchspace\n";
print "Weasel search explored this much (total=1):\n $weaselprob\n";
print "Weasel search improvement over random search:\n $weaselrand\n";
print OUTFIL "\"LOG\",$searchspace,$totalcand,$uniqcand,$weaselprob,$weaselrand,";
}

# weasel_run
# Routine to do one run of the genetic string matching
# Arguments: Target (string), Pool (string), Mutation rate (scalar),
# Populatiion size (scalar)
sub weasel_run {
local($target,$pool,$mutrate,$popsize) = @_;
local($gencnt,$best,$bdiff,$cdiff,$ii,$candcnt,@pop);
local(%cand,$uniqcand);

$gencnt = 1;
$best = &replicate($target,$pool,100); # Random seed string
$bdiff = &distance($best,$target);
print "Generation $gencnt: $best $bdiff\n";

do {
# Generate a population
for ($ii=0; $ii < $popsize; $ii++) {
$pop[$ii] = &replicate($best,$pool,$mutrate);
# print "G$gencnt:$ii:$pop[$ii]\n";
$cand{$pop[$ii]}++;
}
# Find the new best string
$best = &pick_best_d($target,$popsize,@pop);
$bdiff = &distance($best,$target);
$gencnt++;
print "Generation $gencnt: $best $bdiff\n";
} until (0 == $bdiff);
$candcnt = ($gencnt-1)*$popsize + 1;
@keys = (keys %cand);
$uniqcand = $#keys + 1; # Index of last key plus 1 (base zero)
&weasel_prob($target,$pool,$candcnt,$uniqcand);
print OUTFIL "$popsize,$mutrate,\"$target\",\"$pool\"\n";
return($candcnt);
}

# replicate
# Routine to create a new string from a template string using a pool
# and a mutation rate
# Arguments: Subject (string), Pool (string), Mutrate (scalar)
sub replicate {
local($subject,$pool,$mutrate) = @_;
local($newsub,$ii,$jj,$mutate);

$newsub = "";
for ($ii=0; $ii < length($subject); $ii++) {
$mutate = rand() * 100.0;
if ($mutate < $mutrate) { # Pull character from pool
$jj = int(rand(length($pool)));
$newsub = $newsub . substr($pool,$jj,1);
} else { # Just copy from the subject
$newsub = $newsub . substr($subject,$ii,1);
}
}
return($newsub);
}

# pick_best_d
# Routine to find the best candidate string from a population using
# distance measure
# Arguments: Target (string), Popsize (scalar), Pop (array)
sub pick_best_d {
local($target,$popsize,@pop) = @_;
local($best,$bdiff,$cdiff,$ii);

$best = $pop[0]; # seed the best slot with a candidate

$bdiff = &distance($best,$target); # find the distance for the seed
# print("PBD: $best $bdiff $pop[0]\n");

for ($ii=0; $ii < $popsize; $ii++) {
$_ = $pop[$ii];
$cdiff = &distance($_,$target); # find the distance for the current string
if ($cdiff < $bdiff) {
$best = $_;
$bdiff = $cdiff;
}
# print("PBD: $best $bdiff $pop[$ii]\n");
}
return($best);
}

# distance
# Routine to return a distance measure of how dissimilar some
# string is from a target
# Arguments: Subject (string), Target (string)
sub distance {
local ($subject, $target) = @_; # Split parameters into local copies
local ($cnt) = 0;
local ($ii, $diff);

for ($ii = 0; $ii < length($subject); $ii++) {
$diff = ord(substr($target,$ii,1)) - ord(substr($subject,$ii,1));
if (0 < $diff) {
$cnt = $cnt + $diff;
} else {
$cnt = $cnt - $diff;
}
}
return($cnt);
}

# mismatch
# Routine to return the number of positions in which the subject
# string differs from the target string
sub mismatch {
local ($subject, $target) = @_; # Split parameters into local copies
local ($cnt) = 0;
local ($ii, $diff);

for ($ii = 0; $ii < length($subject); $ii++) {
$diff = ord(substr($target,$ii,1)) - ord(substr($subject,$ii,1));
if (0 != $diff) {
$cnt = $cnt + 1;
}
}
return($cnt);
}

End Weasel.pl =======================================================

Mkbat.pl ============================================================
#!/usr/bin/perl
#Modify above line for your configuration

for ($ii=10; $ii < 1200; $ii = $ii + 50) { # popsize
for ($jj=1; $jj <=10; $jj++) { # mutrate
for ($kk=3; $kk < 105; $kk = $kk + 5) { # target string length
print "weasel.pl $ii $jj pool.wsl $kk:longtarg.wsl > junk\n";
}
}
}
End mkbat.pl ========================================================

Have fun.

I would like to amass a database of the log file output. So if you
run this script and collect statistics, please send your log file to
me with "WEASEL LOG" in the subject line, or upload via anonymous ftp
to inia.tamug.tamu.edu. (Rename the logfile, please, so that none get
overwritten.)

Wesley