[ ERCB Home |
New |
Feature |
Brief |
DDJ |
Letters |
Links
]
Genetic Programming
Review by Peter D. Varhol
Copyright (C) Dr. Dobb's Journal, January, 1994
We can all generally agree upon what a good computer program looks like.
It tends to be short, uses computer time efficiently, and certainly leads
to the one right answer under all circumstances. The algorithms it uses
are easy to understand, not convoluted, and expressed in the most concise
way possible. We can look at it, nod our heads, and say, "Yeah, this
makes sense." Well, throw away those preconceived notions of what a
proper computer program looks like when you open John Koza's Genetic
Programming: On the Programming of Computers by Means of Natural Selection.
This comprehensive book examines what genetic programming is all about,
how its concepts derive from the roots of human genetics, and how it can
be used to solve a wide variety of problems in system control, planning,
and decision support.
Genetic programming begins with program induction--the idea that, under
many circumstances, problem solving can be thought of as the discovery of
a computer program that produces the proper outputs when given the appropriate
inputs. The program itself is a black box--feed it the inputs, and it produces
the correct outputs for that input set. This obviously isn't true for all
computer programs, but this analogy fits well to a large number of programs
that perform computations or data manipulations. The process starts off
with the random selection of a number of possible programs. These are tested
against the problem, and the best ones are retained, while the worst are
discarded. The remaining ones are used to generate other programs, all of
which are then once again tested against the problem. If this is done correctly,
the average "fit" of the population increases, although the genetic
algorithms themselves are not concerned with average fitness, but rather,
the fitness of individual programs.
The interesting thing about genetic programming is that this program may
be neither the shortest, the most efficient, nor even the best. It is, however,
the most "fit" of those generated, in that it is the closest solution
to the problem, or the shortest, or the best in whatever measure of fitness
you care to use.
It is easy to see the analogy with classical genetics. The fittest programs
survive in each succeeding generation, until a threshold criterion is reached.
The threshold criterion may specify an average error rate, or average time
to reach a solution, or some other quantitative standard that fitness can
be measured against. Of course, unlike genetics in biology, genetic programming
must stop at some point in time, since the goal is to formulate a solution
to a problem.
There is even an analogy to mutation. At each generation of the program,
it is possible to introduce a random change to a random part of one or more
of the programs. In succeeding generations, a mutation may result in programs
that would never have been derived from the original algorithm.
Is genetic programming, then, guaranteed to give the optimum result? No,
according to Koza. It might, after a while, produce an answer that meets
the solution parameters of the problem, but there is no way of telling whether
or not there is a best answer, let alone whether we have found it.
So what good is this technique, if it is not guaranteed to give the optimal
or most-efficient answer? There are many problems, especially those with
nonlinear behavior, that do not easily lend themselves to any analytical
answer using conventional techniques. Chaotic processes such as the weather
or the structural behavior of materials may lend themselves well to genetic
techniques.
How do these genetic techniques work? Koza describes the genetic algorithm
in three steps:
- Randomly generate an initial population of computer programs capable
of solving the problem.
- Iteratively perform the following operations:
- Execute each program in the population and assign it a fitness value
according to how well it solves the problem.
- Create a new population of computer programs by applying the following
operations:
- Copy existing programs to the new population.
- Create new programs by genetically recombining randomly chosen
parts of two existing programs.
- Evaluate the resulting population of programs. The best program that
appears in any generation may be an acceptable solution to the problem.
One concern you might have about the genetic approach is that it may lead
into a blind alley. That is, a search path may prove to be promising at
first, but after succeeding generations, it may become clear that the algorithm
is not converging on an optimum range of solutions. No particular approach
is guaranteed to converge on a solution. One solution is the aforementioned
mutation. By randomly mutating one or more characteristics of the program,
it's possible to produce potential solutions that could not have been derived
from the genetic algorithm itself. If the mutations do not improve the overall
fitness of the programs in succeeding generations, they will gradually be
weeded out.
Representing the problem to be modeled genetically is also a concern. Koza
favors a binary string in his initial examples, which have some nice characteristics
from a genetics standpoint. However, he acknowledges that this is overly
simplistic, and does not well represent the characteristics of computer
programs. In more complex cases, he uses equations to represent behavior,
and subsequent generations of programs continually modify the equations
according to genetic principles. Since we're already used to representing
the behavior of a system with equations, this seems to be one appropriate
form of problem representation.
Once the problem representation is selected, the programmer has to develop
a measure of fitness. Fitness can be measured in a number of different ways
and is highly dependent upon the specification of the problem. If the desired
output from the program is known, fitness can be simply the difference between
the desired and actual outputs from one of the programs. If the goal is
to maximize the result, then fitness is measured by the magnitude of the
output. These steps are not automatic, and the programmer's selection of
the representation and the fitness measure can greatly affect the outcome.
Koza provides plenty of examples and explains them well, but setting up
a problem seems just as much an art as a science. An intuitive feel for
the problem may be just as important as good genetic technique. For example,
I applied Koza's explanation of symbolic regression to determine the appropriate
model form for data that predicts the amount of time needed to manufacture
a given item, given the number of items that have already been manufactured.
The goal of symbolic regression is to find the functional form, rather than
the functional coefficients, for a given set of finite data.
I'd expect this data to take a negative log form, since the problem effectively
describes the manufacturing learning curve. However, I defined several functional
forms as candidates: F={+, --, *, SIN, COS, EXP, LOG}. My measure of fitness
is the average error between the predicted times and the actual times. For
each form, I designed (like Koza, using Lisp) a set of four equations for
which I substituted one or more of these transformations. For example, one
of those generated was the simple linear-regression form (+ A (* B X)).
(A and B are the y-intercept and slope, respectively, calculated as a part
of each form.) The result after each generation was 24 different functional
forms. I threw out the 12 worst and combined aspects of the most-successful
12 in different ways. My best effort, after three generations, was (+ A
(* COS (B) (LOG (* X (-- 0 1))))), which is reasonably close to the negative
log answer I'd expect.
The genetic programming approach may be one of the best proposed so far
for the slippery concept of machine learning. The computer produces a range
of possible outputs, examines the algorithms that produced those outputs,
keeps the best ones for another trial, and produces more through the genetic-recombination
process. The system is clearly learning to produce a reasonable output.
Upon reflection, genetic programming seems comparable to supervised-learning
neural-network techniques. What's the difference between genetic programming
and this class of neural networks? Both have the capability of learning,
both deal well with nonlinear systems, and both can be viewed as a black
box. At one level, the two concepts are clearly related. Through successive
trials, each attempts to converge upon an acceptable solution. There are
also analogies to fitness, in that possible solutions are discarded if they
do not conform well to the expected output.
What is different is the learning algorithm. A neural network adjusts coefficient
values as the error is propagated back through the network. It has no "memory"
and can return to previous states based on succeeding inputs. The genetic
algorithm, on the other hand, tries to improve the overall population fitness
in each succeeding generation. Koza gives an example of a genetic technique
to choose the appropriate neural-network structure to solve a problem. For
anyone who has ever attempted to build a neural net before, this is an attractive
alternative to the usual trial-and-error approach. Lisp is Koza's language
of choice for experimentation in genetic programming. There are advantages
to using Lisp, not the least of which are the easy manipulation of symbols
and the ability to rapidly prototype different genetic structures.
Genetic programming won't replace traditional, procedural programming anytime
soon. Most of the programming problems we deal with have exact and readily
available solutions. Genetic programming is similar to the expert-system
paradigm in that it seeks acceptable, though not necessarily the best, solutions
to problems that aren't precisely defined or are nonlinear in nature.
Nonetheless, Genetic Programming is well worth the space on your
bookshelf. More and more problems we deal with have these characteristics,
and, though still in its conceptual infancy, genetic programming is a potentially
powerful approach. Koza's treatment is so comprehensive that, while this
may not be the last word on genetic programming, it may be the only word
you'll need.
Genetic Programming
John R. Koza
MIT Press, 1992, 819 pp, $55.00
ISBN 0-262-11170-5
Electronic Review of Computer Books
Created 5/1/96 / Last modified 6/4/96 / webmaster@ercb.com