Re: Modelling of abiogenesis

Brian D. Harper (bharper@magnus.acs.ohio-state.edu)
Mon, 29 Apr 1996 15:06:12 -0400

At 02:16 PM 4/29/96 GMT, David Tyler wrote:
>Abstract: This post draws attention to a computer model of
>chemical evolution.
>
>Brian Harper's various posts on the Chemical Evolution issue and
>on Yockey's contributions have been much appreciated by myself.=20
>I noticed an item in New Scientist, 13th April 1996, page 16:
>"'Life' crawls out of the digital soup" by Paul Guinnessy.

I have taken just a brief look at the Physica D article written
by Pargellis but haven't looked at the New Scientist write-up
yet. I also found a electronic version of an article in SIGNAL=20
Magazine describing this work which I will post below.

I noticed in the Physica D article that the author will supply
the computer code (in basic) upon request. I've been searching
(so far in vain) for Pargellis e-mail address but haven't been
able to find it. Anyone know an efficient way of doing this?
Also, if I'm able to get the code, is there anyone interested
in playing around with it? This seems the best way of getting
some appreciation of what the simulation does and doesn't do.

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D

downloaded from:

http://www.us.net/signal/CurrentIssue/April/Digital-apr.html

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
April 1996

=A9SIGNAL Magazine 1996

Digital Organisms Emerge From Random Code Soup

Laboratories blend computer science, mathematics and biology to=20
breed software with a genealogy.

By Andrew C. Braunberg

Telecommunications researchers are nurturing self-replicating=20
sequences of computer code that rise spontaneously out of random=20
collections of processing operations. These pieces of code, called=20
digital organisms, compete for sustenance--memory space and=20
computer processor unit time--in a battle to propagate their=20
binary genetic code to offspring.=20

Early research has created a rich environment of self-contained=20
self-replicators and viruses, which are defined as sequences of=20
code that require outside resources to replicate. Both types of=20
software organism evolve first by shedding unnecessary code and=20
then by acquiring subroutines that accelerate replication. This=20
ability to optimize software may have broad and important=20
application in the telecommunications industry, although commercial=20
systems could require a decade to develop, according to researchers.=20

The goal of the research, according to the peer- review journal=20
Physica-D, is to determine the factors that optimize the probability=20
of generating self-replicators and to document their evolution.=20
Dr. Andrew N. Pargellis, a scientist at Bell Laboratories, Murray=20
Hill, New Jersey, initiated the research.=20

The self-replication of digital organisms, or cells, involves four=20
main phases. A parent cell allocates memory for a child cell into=20
which the parent can copy its code. The parent's code is copied,=20
one operation at a time, into the new memory location. Next, the=20
child must separate from the parent. This digital push from the
nest involves initiating the child's apparatus for interacting=20
with its environment. Default values for registers--memory=20
locations for storing information about the cell--are set, and=20
execution code is installed. Each cell uses an instruction pointer=20
to execute operations and to track the cell's location in computer=20
memory. The final step for the parent is to reset its own pointer=20
and registers for a new replication cycle.=20

Cells vary from one to 30 operations in length. Sixteen operations,=20
which constitute the universe of possible functions, are available=20
to the cells. Most of the operations perform general computer=20
commands, such as resetting registers, jumping forward or backward=20
in the code sequence and comparing arithmetic values. The three most
important commands control replication by allocating new memory, by=20
copying operations from the parent to the child and by setting=20
the default values of the child's registers. More than one copy=20
statement could exist in a cell. An organism with four copy=20
statements could copy itself four times faster than a cell with=20
one copy command.=20

Pargellis starts each experiment--more than 100 have been completed
--by generating 40 random sequences of code, or cells. Initially,=20
the cells have a uniform size distribution of one to 25 operations.=20
The stringing together of building blocks mimics biological systems,=20
Pargellis states. For example, proteins form by sequencing a set of=20
20 amino acids. The system allows a maximum of 1,000 cells with no=20
more than 30 operations per cell. The 1,000 cells are ordered=20
numerically in a memory stack.=20

The chance of creating a randomly generated cell capable of self-
replication is one in 10,000. If none of the existing cells is=20
capable of reproduction, then the program will add an additional=20
40 cells to the environment after an established time period. A
typical cell initially may require 50 instructions to copy itself,=20
Pargellis says.=20

Each cell executing its operations simultaneously would mimic=20
most closely a real biological system, Pargellis notes. This is=20
impossible on a desktop work station, however. He adds that the=20
program runs on a Pentium-equipped personal computer. The program=20
instead performs five operations on the first cell and then
five on the next until the whole population has been visited.=20
After one cycle, the next five operations of each cell are=20
performed in sequence.=20

These operation cycles continue until all 1,000 memory slots are=20
filled. Then a new generation is initiated with the execution=20
of a "reaper" routine, which kills off approximately one third=20
of the cells. The reaper routine begins each generation. Remaining=20
cells then are shifted down in the memory stack. This results in=20
an older average age for the cells at the bottom of the stack.=20
The reaper program targets these older cells first. Approximately=20
50 of the newly emptied memory locations are filled with randomly=20
generated sequences of operations. The remaining sites eventually=20
are filled with code offspring. When all the sites are filled,=20
the reaper returns, and a new generation begins.=20

According to Pargellis, the parameters for maximizing the spontaneous=20
generation of viable cells are few and well understood. The number=20
of unique cells that can be created from a collection of operations=20
is a function of the number of unique operations available and the=20
number of operations allowed in each cell. Increasing the diversity=20
of operations or the length of allowable sequences increases the
complexity of the environment. By carefully selecting the operations=20
available to cells and by limiting cell length, researchers maximize=20
the percentage of self-replicating cells. Cells with as few as five=20
operations can evolve to reproduce without outside resources.=20

Additionally, address registers must be minimized, and default values=20
for all registers and pointers must be set. If cells have the ability=20
to capture pointers from other cells, they can dominate the environment=20
more quickly. Time penalties are imposed on cells that improperly=20
execute instructions. This creates a fitness barrier that weeds out=20
poor replicators. The density of randomly generated cells in the=20
environment also is limited, because in the early stages of cell=20
evolution, cells often trade resources with their nearest neighbors=20
in a non-parasitic manner. Therefore some population stability must=20
exist for this method to succeed.=20

Each cell is allocated a slice of computer processor unit time=20
during each generation. The time allotted for each cell is=20
proportional to its size. By reducing the time slightly, cells=20
can be driven to smaller size over time, Pargellis claims.=20
Evolution proceeds primarily through the introduction of random=20
mutations in offspring. Approximately 10 percent of the offspring=20
in each new generation will have at least one operational mutation.=20
Sixty percent of these mutations are substitutional, and 40 percent=20
are split evenly between insertions and deletions of operations.=20
Ninety-seven percent of these mutations result in code that cannot
replicate successfully. A gene bank records the genotype of any=20
cell capable of generating a grandchild--in other words, a cell=20
that is viable for at least two generations.=20

Evolution also is driven by withholding computer processor unit time=20
from cells that execute operations improperly. A cell might lose=20
some or all of its computer processor unit time depending on the=20
infraction. For example, a cell would lose all of its computer=20
processor unit time for attempting to initialize an offspring's
registers before the parent copied its operations to the offspring.=20
These time penalties create a fitness landscape.=20

In this environment, fitness is a measure of reproduction efficiency=20
(SIGNAL, December 1994, page 27). It measures the time required to=20
generate offspring. Cells increase fitness, either by eliminating=20
useless code or by mutating useless code into useful code. Examples=20
of increasing efficiency are the avoidance of time penalties or the=20
incorporation of additional copy operations. As more efficient cells
are created, randomly generated cells find it more difficult to gain=20
a foothold in the environment because of their initial reproductive=20
inefficiencies.=20

Punctuated equilibrium has been observed in the system. In this type=20
of evolution, the average size of the organism drops quickly as the=20
cell sheds useless code. A block of unexecuted or useless code is=20
called an intron.=20

Introns in the biological world have lasted at least a billion years.=20
A protein in humans, called beta-globulin, has the same introns as=20
the beta-globulin in corn plants. The corn plant and vertebrates=20
diverged from each other about a billion years ago, Pargellis states.=20
The DNA in both corn and humans has mutated greatly over this time,=20
but not the introns.=20

In the computer experiment, most of the introns are useless for=20
the lifetime of the cell. But in some cases, introns--not executed=20
because of a jump-forward command--have mutated over time and have=20
became useful. For example, an intron could mutate into a copy=20
instruction. If in the future the jump operation is eliminated,=20
the cell would have an extra copy instruction, thus increasing its
reproductive efficiency.=20

A different tack for quicker reproduction is for a cell to hijack=20
a neighbor cell's computer processor unit time or to use a neighbor's=20
memory for additional offspring. One way of grabbing more computer=20
processor unit time is by trapping an adjacent cell's pointer.=20

A virus requires resources or instructions from at least one other=20
program. Most viruses that emerge are small, with only two or three=20
instructions. As the environment matures and organisms evolve,=20
colonies of self-replicators often are beset with clouds of swarming=20
parasites.=20

A true self-replicator is a program that does not need any other=20
program's resources to copy itself. The minimum size for these=20
self-contained units is five operations.=20

This interaction between self replicators and parasites is displayed=20
with an interaction grid on the researcher's computer screen. The grid=20
is 40 lines x 24 lines and contains at least one cell at each grid=20
point. More than one cell can occupy the same position if they each=20
share resources.=20

A cell manages its metabolism with the help of numerical and address=20
registers. Each cell has four registers--two numerical and two=20
address. The numerical register A defaults to the value 1, and the=20
numerical register B defaults to the number of operations in the cell.=20
When the cell encounters a copy statement, it copies the operation=20
corresponding to the number in the A register. For example, it=20
initially will copy operation 1. Then the A register value is=20
incremented by one. On the next copy command, the second operation=20
will be copied to the memory location allocated for the child.=20
The A register acts as a copy counter.=20

The address registers C and D are used for jump statements. C is used=20
as the jump location. If a jump command is encountered, the program=20
will move to the operation at the site listed in register C. Pargellis=20
notes that the D register is not used often and in hindsight probably=20
is not needed. He adds that the use of site-specific jump statements=20
mimics enzymes that search for specific sites on DNA strings.=20

Pargellis states that evolutionary algorithms are very powerful and=20
that, given enough time, they ultimately can provide a best solution=20
to any given problem. The difficulty is that they currently take a=20
great deal of time to do simple tasks. Initial applications may be=20
in streamlining existing computer code.=20

Questions remain to be answered, Pargellis emphasizes, such as what=20
sort of interactions between digital organisms might enable researchers=20
to optimize a particular problem. He adds that this knowledge could=20
enable multiple programs to interact and change large pieces of code=20
faster than existing methods. Running as a background application,=20
digital organisms continually may upgrade and improve existing software.=20

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
Brian Harper | "I can't take my guesses back
Associate Professor | That I based on almost facts
Applied Mechanics | That ain't necessarily so"
Ohio State University | -- Willie Nelson
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D