Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee

Genetic Programming

Genetic Programming, GP.

Taxonomy

The Genetic Programming algorithm is an example of an Evolutionary Algorithm and belongs to the field of Evolutionary Computation and more broadly Computational Intelligence and Biologically Inspired Computation. The Genetic Programming algorithm is a sibling to other Evolutionary Algorithms such as the Genetic Algorithm, Evolution Strategies, Evolutionary Programming, and Learning Classifier Systems. Technically, the Genetic Programming algorithm is an extension of the Genetic Algorithm. The Genetic Algorithm is a parent to a host of variations and extensions.

Inspiration

The Genetic Programming algorithm is inspired by population genetics (including heredity and gene frequencies), and evolution at the population level, as well as the Mendelian understanding of the structure (such as chromosomes, genes, alleles) and mechanisms (such as recombination and mutation). This is the so-called new or modern synthesis of evolutionary biology.

Metaphor

Individuals of a population contribute their genetic material (called the genotype) proportional to their suitability of their expressed genome (called their phenotype) to their environment. The next generation is created through a process of mating that involves genetic operators such as recombination of two individuals genomes in the population and the introduction of random copying errors (called mutation). This iterative process may result in an improved adaptive-fit between the phenotypes of individuals in a population and the environment.

Programs may be evolved and used in a secondary adaptive process, where an assessment of candidates at the end of that secondary adaptive process is used for differential reproductive success in the first evolutionary process. This system may be understood as the inter-dependencies experienced in evolutionary development where evolution operates upon an embryo that in turn develops into an individual in an environment that eventually may reproduce.

Strategy

The objective of the Genetic Programming algorithm is to use induction to devise a computer program. This is achieved by using evolutionary operators on candidate programs with a tree structure to improve the adaptive fit between the population of candidate programs and an objective function. An assessment of a candidate solution involves its execution.

Procedure

Algorithm (below) provides a pseudocode listing of the Genetic Programming algorithm for minimizing a cost function, based on Koza and Poli's tutorial [Koza2005].

The Genetic Program uses LISP-like symbolic expressions called S-expressions that represent the graph of a program with function nodes and terminal nodes. While the algorithm is running, the programs are treated like data, and when they are evaluated they are executed. The traversal of a program graph is always depth first, and functions must always return a value.

Input: $Population_{size}$, $nodes_{func}$, $nodes_{term}$, $P_{crossover}$, $P_{mutation}$, $P_{reproduction}$, $P_{alteration}$
Output: $S_{best}$
Population $\leftarrow$ InitializePopulation{$Population_{size}$, $nodes_{func}$, $nodes_{term}$}
EvaluatePopulation{Population}
$S_{best}$ $\leftarrow$ GetBestSolution{Population}
While ($\neg$StopCondition())
    Children $\leftarrow \emptyset$
    While (Size{Children} $<$ $Population_{size}$)
        Operator $\leftarrow$ SelectGeneticOperator{$P_{crossover}$, $P_{mutation}$, $P_{reproduction}$, $P_{alteration}$}
        If (Operator $\equiv$ CrossoverOperator)
            $Parent_{1}$, $Parent_{2}$ $\leftarrow$ SelectParents{Population, $Population_{size}$}
            $Child_{1}$, $Child_{2}$ $\leftarrow$ Crossover{$Parent_{1}$, $Parent_{2}$}
            Children $\leftarrow$ $Child_{1}$
            Children $\leftarrow$ $Child_{2}$
        ElseIf (Operator $\equiv$ MutationOperator)
            $Parent_{1}$ $\leftarrow$ SelectParents{Population, $Population_{size}$}
            $Child_{1}$ $\leftarrow$ Mutate{$Parent_{1}$}
            Children $\leftarrow$ $Child_{1}$
        ElseIf (Operator $\equiv$ ReproductionOperator)
            $Parent_{1}$ $\leftarrow$ SelectParents{Population, $Population_{size}$}
            $Child_{1}$ $\leftarrow$ Reproduce{$Parent_{1}$}
            Children $\leftarrow$ $Child_{1}$
        ElseIf (Operator $\equiv$ AlterationOperator)
            $Parent_{1}$ $\leftarrow$ SelectParents{Population, $Population_{size}$}
            $Child_{1}$ $\leftarrow$ AlterArchitecture{$Parent_{1}$}
            Children $\leftarrow$ $Child_{1}$
        End
    End
    EvaluatePopulation{Children}
    $S_{best}$ $\leftarrow$ GetBestSolution{Children, $S_{best}$}
    Population $\leftarrow$ Children
End
Return ($S_{best}$)

Pseudocode for Genetic Programming.

Heuristics

The Genetic Programming algorithm was designed for inductive automatic programming and is well suited to symbolic regression, controller design, and machine learning tasks under the broader name of function approximation.
Traditionally Lisp symbolic expressions are evolved and evaluated in a virtual machine, although the approach has been applied with compiled programming languages.
The evaluation (fitness assignment) of a candidate solution typically takes the structure of the program into account, rewarding parsimony.
The selection process should be balanced between random selection and greedy selection to bias the search towards fitter candidate solutions (exploitation), whilst promoting useful diversity into the population (exploration).
A program may respond to zero or more input values and may produce one or more outputs.
All functions used in the function node set must return a usable result. For example, the division function must return a sensible value (such as zero or one) when a division by zero occurs.
All genetic operations ensure (or should ensure) that syntactically valid and executable programs are produced as a result of their application.
The Genetic Programming algorithm is commonly configured with a high-probability of crossover ($\geq 90%$) and a low-probability of mutation ($\leq 1%$). Other operators such as reproduction and architecture alterations are used with moderate-level probabilities and fill in the probabilistic gap.
Architecture altering operations are not limited to the duplication and deletion of sub-structures of a given program.
The crossover genetic operator in the algorithm is commonly configured to select a function as a the cross-point with a high-probability ($\geq 90%$) and low-probability of selecting a terminal as a cross-point ($\leq 10%$).
The function set may also include control structures such as conditional statements and loop constructs.
The Genetic Programing algorithm can be realized as a stack-based virtual machine as opposed to a call graph [Perkis1994].
The Genetic Programming algorithm can make use of Automatically Defined Functions (ADFs) that are sub-graphs and are promoted to the status of functions for reuse and are co-evolved with the programs.
The genetic operators employed during reproduction in the algorithm may be considered transformation programs for candidate solutions and may themselves be co-evolved in the algorithm [Angeline1996].

Code Listing

Listing (below) provides an example of the Genetic Programming algorithm implemented in the Ruby Programming Language based on Koza and Poli's tutorial [Koza2005].

The demonstration problem is an instance of a symbolic regression, where a function must be devised to match a set of observations. In this case the target function is a quadratic polynomial $x^2+x+1$ where $x \in [-1,1]$. The observations are generated directly from the target function without noise for the purposes of this example. In practical problems, if one knew and had access to the target function then the genetic program would not be required.

The algorithm is configured to search for a program with the function set ${ +, -, \times, \div }$ and the terminal set ${ X, R }$, where $X$ is the input value, and $R$ is a static random variable generated for a program $X \in [-5,5]$. A division by zero returns a value of one. The fitness of a candidate solution is calculated by evaluating the program on range of random input values and calculating the Root Mean Squared Error (RMSE). The algorithm is configured with a 90% probability of crossover, 8% probability of reproduction (copying), and a 2% probability of mutation. For brevity, the algorithm does not implement the architecture altering genetic operation and does not bias crossover points towards functions over terminals.

def rand_in_bounds(min, max)
  return min + (max-min)*rand()
end

def print_program(node)
  return node if !node.kind_of?(Array)
  return "(#{node[0]} #{print_program(node[1])} #{print_program(node[2])})"
end

def eval_program(node, map)
  if !node.kind_of?(Array)
    return map[node].to_f if !map[node].nil?
    return node.to_f
  end
  arg1, arg2 = eval_program(node[1], map), eval_program(node[2], map)
  return 0 if node[0] === :/ and arg2 == 0.0
  return arg1.__send__(node[0], arg2)
end

def generate_random_program(max, funcs, terms, depth=0)
  if depth==max-1 or (depth>1 and rand()<0.1)
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  arg1 = generate_random_program(max, funcs, terms, depth)
  arg2 = generate_random_program(max, funcs, terms, depth)
  return [funcs[rand(funcs.size)], arg1, arg2]
end

def count_nodes(node)
  return 1 if !node.kind_of?(Array)
  a1 = count_nodes(node[1])
  a2 = count_nodes(node[2])
  return a1+a2+1
end

def target_function(input)
  return input**2 + input + 1
end

def fitness(program, num_trials=20)
  sum_error = 0.0
  num_trials.times do |i|
    input = rand_in_bounds(-1.0, 1.0)
    error = eval_program(program, {'X'=>input}) - target_function(input)
    sum_error += error.abs
  end
  return sum_error / num_trials.to_f
end

def tournament_selection(pop, bouts)
  selected = Array.new(bouts){pop[rand(pop.size)]}
  selected.sort!{|x,y| x[:fitness]<=>y[:fitness]}
  return selected.first
end

def replace_node(node, replacement, node_num, cur_node=0)
  return [replacement,(cur_node+1)] if cur_node == node_num
  cur_node += 1
  return [node,cur_node] if !node.kind_of?(Array)
  a1, cur_node = replace_node(node[1], replacement, node_num, cur_node)
  a2, cur_node = replace_node(node[2], replacement, node_num, cur_node)
  return [[node[0], a1, a2], cur_node]
end

def copy_program(node)
  return node if !node.kind_of?(Array)
  return [node[0], copy_program(node[1]), copy_program(node[2])]
end

def get_node(node, node_num, current_node=0)
  return node,(current_node+1) if current_node == node_num
  current_node += 1
  return nil,current_node if !node.kind_of?(Array)
  a1, current_node = get_node(node[1], node_num, current_node)
  return a1,current_node if !a1.nil?
  a2, current_node = get_node(node[2], node_num, current_node)
  return a2,current_node if !a2.nil?
  return nil,current_node
end

def prune(node, max_depth, terms, depth=0)
  if depth == max_depth-1
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  return node if !node.kind_of?(Array)
  a1 = prune(node[1], max_depth, terms, depth)
  a2 = prune(node[2], max_depth, terms, depth)
  return [node[0], a1, a2]
end

def crossover(parent1, parent2, max_depth, terms)
  pt1, pt2 = rand(count_nodes(parent1)-2)+1, rand(count_nodes(parent2)-2)+1
  tree1, c1 = get_node(parent1, pt1)
  tree2, c2 = get_node(parent2, pt2)
  child1, c1 = replace_node(parent1, copy_program(tree2), pt1)
  child1 = prune(child1, max_depth, terms)
  child2, c2 = replace_node(parent2, copy_program(tree1), pt2)
  child2 = prune(child2, max_depth, terms)
  return [child1, child2]
end

def mutation(parent, max_depth, functs, terms)
  random_tree = generate_random_program(max_depth/2, functs, terms)
  point = rand(count_nodes(parent))
  child, count = replace_node(parent, random_tree, point)
  child = prune(child, max_depth, terms)
  return child
end

def search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut, functs, terms)
  population = Array.new(pop_size) do |i|
    {:prog=>generate_random_program(max_depth, functs, terms)}
  end
  population.each{|c| c[:fitness] = fitness(c[:prog])}
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = []
    while children.size < pop_size
      operation = rand()
      p1 = tournament_selection(population, bouts)
      c1 = {}
      if operation < p_repro
        c1[:prog] = copy_program(p1[:prog])
      elsif operation < p_repro+p_cross
        p2 = tournament_selection(population, bouts)
        c2 = {}
        c1[:prog],c2[:prog] = crossover(p1[:prog], p2[:prog], max_depth, terms)
        children << c2
      elsif operation < p_repro+p_cross+p_mut
        c1[:prog] = mutation(p1[:prog], max_depth, functs, terms)
      end
      children << c1 if children.size < pop_size
    end
    children.each{|c| c[:fitness] = fitness(c[:prog])}
    population = children
    population.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = population.first if population.first[:fitness] <= best[:fitness]
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
    break if best[:fitness] == 0
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  terms = ['X', 'R']
  functs = [:+, :-, :*, :/]
  # algorithm configuration
  max_gens = 100
  max_depth = 7
  pop_size = 100
  bouts = 5
  p_repro = 0.08
  p_cross = 0.90
  p_mut = 0.02
  # execute the algorithm
  best = search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut, functs, terms)
  puts "done! Solution: f=#{best[:fitness]}, #{print_program(best[:prog])}"
end

Genetic Programming in Ruby

Download: genetic_programming.rb.

References

Primary Sources

An early work by Cramer involved the study of a Genetic Algorithm using an expression tree structure for representing computer programs for primitive mathematical operations [Cramer1985]. Koza is credited with the development of the field of Genetic Programming. An early paper by Koza referred to his hierarchical genetic algorithms as an extension to the simple genetic algorithm that use symbolic expressions (S-expressions) as a representation and were applied to a range of induction-style problems [Koza1989]. The seminal reference for the field is Koza's 1992 book on Genetic Programming [Koza1992].

Learn More

The field of Genetic Programming is vast, including many books, dedicated conferences and thousands of publications. Koza is generally credited with the development and popularizing of the field, publishing a large number of books and papers himself. Koza provides a practical introduction to the field as a tutorial and provides recent overview of the broader field and usage of the technique [Koza2005].

In addition his the seminal 1992 book, Koza has released three more volumes in the series including volume II on Automatically Defined Functions (ADFs) [Koza1994], volume III that considered the Genetic Programming Problem Solver (GPPS) for automatically defining the function set and program structure for a given problem [Koza1999], and volume IV that focuses on the human competitive results the technique is able to achieve in a routine manner [Koza2003]. All books are rich with targeted and practical demonstration problem instances.

Some additional excellent books include a text by Banzhaf et al. that provides an introduction to the field [Banzhaf1998], Langdon and Poli's detailed look at the technique [Langdon2002], and Poli, Langdon, and McPhee's contemporary and practical field guide to Genetic Programming [Poli2008].

Bibliography

[Angeline1996]	P. J. Angeline, "Two Self-Adaptive Crossover Operators for Genetic Programming", in Advances in Genetic Programming 2, 1996.
[Banzhaf1998]	W. Banzhaf and P. Nordin and R. E. Keller and F. D. Francone, "Genetic Programming – An Introduction; On the Automatic Evolution\n\tof Computer Programs and its Applications", Morgan Kaufmann, 1998.
[Cramer1985]	N. L. Cramer, "A Representation for the Adaptive Generation of Simple Sequential\n\tPrograms", in Proceedings of the 1st International Conference on Genetic Algorithms, 1985.
[Koza1989]	J. R. Koza, "Hierarchical genetic algorithms operating on populations of computer\n\tprograms", in Proceedings of the Eleventh International Joint Conference on Artificial\n\tIntelligence IJCAI-89, 1989.
[Koza1992]	J. R. Koza, "Genetic programming: On the programming of computers by means of\n\tnatural selection", MIT Press, 1992.
[Koza1994]	J. R. Koza, "Genetic programming II: Automatic discovery of reusable programs", MIT Press, 1994.
[Koza1999]	J. R. Koza and F. H. Bennett III and D. Andre and M. A. Keane, "Genetic programming III: Darwinian invention and problem solving", Morgan Kaufmann, 1999.
[Koza2003]	J. R. Koza and M. A. Keane and M. J. Streeter and W. Mydlowec and\n\tJ. Yu and G. Lanza, "Genetic Programming IV: Routine Human-Competitive Machine Intelligence", Springer, 2003.
[Koza2005]	J. R. Koza and R. Poli, "5: Genetic Programming", in Search methodologies: Introductory tutorials in optimization and\n\tdecision support techniques, pages 127–164, Springer, 2005.
[Langdon2002]	W. B. Langdon and R. Poli, "Foundations of Genetic Programming", Springer-Verlag, 2002.
[Perkis1994]	T. Perkis, "Stack-Based Genetic Programming", in Proc IEEE Congress on Computational Intelligence, 1994.
[Poli2008]	R. Poli and W. B. Langdon and N. F. McPhee, "A Field Programmers Guide to Genetic Programming", Lulu Enterprises, 2008.

Please Note: This content was automatically generated from the book content and may contain minor differences.