Clever Algorithms: Nature-Inspired Programming Recipes

By Jason Brownlee PhD


Genetic Programming

Genetic Programming, GP.


The Genetic Programming algorithm is an example of an Evolutionary Algorithm and belongs to the field of Evolutionary Computation and more broadly Computational Intelligence and Biologically Inspired Computation. The Genetic Programming algorithm is a sibling to other Evolutionary Algorithms such as the Genetic Algorithm, Evolution Strategies, Evolutionary Programming, and Learning Classifier Systems. Technically, the Genetic Programming algorithm is an extension of the Genetic Algorithm. The Genetic Algorithm is a parent to a host of variations and extensions.


The Genetic Programming algorithm is inspired by population genetics (including heredity and gene frequencies) and evolution at the population level, as well as the Mendelian understanding of the structure of genetic material (such as chromosomes, genes, and alleles) and its mechanisms (such as recombination and mutation). This is the so-called new or modern synthesis of evolutionary biology.


Individuals of a population contribute their genetic material (called the genotype) in proportion to the suitability of their expressed genome (called their phenotype) to their environment. The next generation is created through a process of mating that involves genetic operators such as the recombination of two individuals' genomes from the population and the introduction of random copying errors (called mutation). This iterative process may result in an improved adaptive fit between the phenotypes of individuals in a population and the environment.

Programs may be evolved and used in a secondary adaptive process, where an assessment of candidates at the end of that secondary process is used for differential reproductive success in the primary evolutionary process. This system may be understood through the inter-dependencies seen in evolutionary development, where evolution operates upon an embryo that in turn develops into an individual in an environment that eventually may reproduce.


The objective of the Genetic Programming algorithm is to use induction to devise a computer program. This is achieved by using evolutionary operators on candidate programs with a tree structure to improve the adaptive fit between the population of candidate programs and an objective function. An assessment of a candidate solution involves its execution.


Algorithm (below) provides a pseudocode listing of the Genetic Programming algorithm for minimizing a cost function, based on Koza and Poli's tutorial [Koza2005].

Genetic Programming uses LISP-like symbolic expressions called S-expressions that represent the graph of a program with function nodes and terminal nodes. While the algorithm is running, the programs are treated like data; when they are evaluated, they are executed. The traversal of a program graph is always depth-first, and functions must always return a value.

Input: $Population_{size}$, $nodes_{func}$, $nodes_{term}$, $P_{crossover}$, $P_{mutation}$, $P_{reproduction}$, $P_{alteration}$
Output: $S_{best}$
Population $\leftarrow$ InitializePopulation($Population_{size}$, $nodes_{func}$, $nodes_{term}$)
$S_{best}$ $\leftarrow$ GetBestSolution(Population)
While ($\neg$StopCondition())
    Children $\leftarrow \emptyset$
    While (Size(Children) < $Population_{size}$)
        Operator $\leftarrow$ SelectGeneticOperator($P_{crossover}$, $P_{mutation}$, $P_{reproduction}$, $P_{alteration}$)
        If (Operator $\equiv$ CrossoverOperator)
            $Parent_{1}$, $Parent_{2}$ $\leftarrow$ SelectParents(Population, $Population_{size}$)
            $Child_{1}$, $Child_{2}$ $\leftarrow$ Crossover($Parent_{1}$, $Parent_{2}$)
            Children $\leftarrow$ $Child_{1}$
            Children $\leftarrow$ $Child_{2}$
        ElseIf (Operator $\equiv$ MutationOperator)
            $Parent_{1}$ $\leftarrow$ SelectParents(Population, $Population_{size}$)
            $Child_{1}$ $\leftarrow$ Mutate($Parent_{1}$)
            Children $\leftarrow$ $Child_{1}$
        ElseIf (Operator $\equiv$ ReproductionOperator)
            $Parent_{1}$ $\leftarrow$ SelectParents(Population, $Population_{size}$)
            $Child_{1}$ $\leftarrow$ Reproduce($Parent_{1}$)
            Children $\leftarrow$ $Child_{1}$
        ElseIf (Operator $\equiv$ AlterationOperator)
            $Parent_{1}$ $\leftarrow$ SelectParents(Population, $Population_{size}$)
            $Child_{1}$ $\leftarrow$ AlterArchitecture($Parent_{1}$)
            Children $\leftarrow$ $Child_{1}$
    $S_{best}$ $\leftarrow$ GetBestSolution(Children, $S_{best}$)
    Population $\leftarrow$ Children
Return ($S_{best}$)
Pseudocode for Genetic Programming.
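The SelectGeneticOperator step in the pseudocode can be sketched as a roulette draw over the operator probabilities (a hypothetical helper, assuming the probabilities sum to 1.0; it is not part of the chapter's listing):

```ruby
# Sketch of SelectGeneticOperator: pick an operator by roulette over
# its configured probability (probabilities assumed to sum to 1.0).
def select_genetic_operator(probabilities)
  r, cumulative = rand(), 0.0
  probabilities.each do |operator, p|
    cumulative += p
    return operator if r < cumulative
  end
  probabilities.keys.last  # guard against floating-point rounding
end

ops = { :crossover => 0.90, :reproduction => 0.08, :mutation => 0.02 }
puts select_genetic_operator(ops)  # one of :crossover, :reproduction, :mutation
```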


  • The Genetic Programming algorithm was designed for inductive automatic programming and is well suited to symbolic regression, controller design, and machine learning tasks under the broader name of function approximation.
  • Traditionally, LISP symbolic expressions are evolved and evaluated in a virtual machine, although the approach has also been applied to compiled programming languages.
  • The evaluation (fitness assignment) of a candidate solution typically takes the structure of the program into account, rewarding parsimony.
  • The selection process should be balanced between random selection and greedy selection to bias the search towards fitter candidate solutions (exploitation), while maintaining useful diversity in the population (exploration).
  • A program may respond to zero or more input values and may produce one or more outputs.
  • All functions used in the function node set must return a usable result. For example, the division function must return a sensible value (such as zero or one) when a division by zero occurs.
  • All genetic operations ensure (or should ensure) that syntactically valid and executable programs are produced as a result of their application.
  • The Genetic Programming algorithm is commonly configured with a high-probability of crossover ($\geq 90\%$) and a low-probability of mutation ($\leq 1\%$). Other operators such as reproduction and architecture alterations are used with moderate-level probabilities and fill in the probabilistic gap.
  • Architecture altering operations are not limited to the duplication and deletion of sub-structures of a given program.
  • The crossover genetic operator in the algorithm is commonly configured to select a function node as the crossover point with a high probability ($\geq 90\%$) and a terminal node with a low probability ($\leq 10\%$).
  • The function set may also include control structures such as conditional statements and loop constructs.
  • The Genetic Programming algorithm can be realized with a stack-based virtual machine as opposed to a call graph [Perkis1994].
  • The Genetic Programming algorithm can make use of Automatically Defined Functions (ADFs) that are sub-graphs and are promoted to the status of functions for reuse and are co-evolved with the programs.
  • The genetic operators employed during reproduction in the algorithm may be considered transformation programs for candidate solutions and may themselves be co-evolved in the algorithm [Angeline1996].
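The 90/10 crossover-point bias noted above can be sketched as follows (hypothetical helpers, not part of the chapter's listing); nodes are numbered in pre-order, matching the depth-first traversal the listing's node-indexing functions use:

```ruby
# Hypothetical sketch of biasing crossover points towards function nodes.
# Collect pre-order indices of function and terminal nodes separately.
def node_indices(node, index = 0, funcs = [], terms = [])
  if node.kind_of?(Array)
    funcs << index
    index += 1
    node[1..2].each { |child| index = node_indices(child, index, funcs, terms) }
  else
    terms << index
    index += 1
  end
  index
end

# Pick a function node with probability p_func, otherwise a terminal.
def biased_point(tree, p_func = 0.9)
  funcs, terms = [], []
  node_indices(tree, 0, funcs, terms)
  pool = (rand() < p_func || terms.empty?) ? funcs : terms
  pool = terms if pool.empty?  # a lone terminal has no function nodes
  pool[rand(pool.size)]
end

tree = [:+, 'X', [:*, 'X', 'X']]
puts biased_point(tree)  # usually 0 or 2 (the two function nodes)
```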

Code Listing

Listing (below) provides an example of the Genetic Programming algorithm implemented in the Ruby Programming Language based on Koza and Poli's tutorial [Koza2005].

The demonstration problem is an instance of a symbolic regression problem, where a function must be devised to match a set of observations. In this case the target function is the quadratic polynomial $x^2+x+1$ where $x \in [-1,1]$. The observations are generated directly from the target function without noise for the purposes of this example. In practical problems, if one knew and had access to the target function then a genetic program would not be required.

The algorithm is configured to search for a program with the function set $\{ +, -, \times, \div \}$ and the terminal set $\{ X, R \}$, where $X$ is the input value and $R$ is a static random constant drawn once per terminal from $R \in [-5,5]$. A division by zero returns a value of zero. The fitness of a candidate solution is calculated by evaluating the program on a number of random input values and calculating the mean absolute error against the target function. The algorithm is configured with a 90% probability of crossover, an 8% probability of reproduction (copying), and a 2% probability of mutation. For brevity, the algorithm does not implement the architecture-altering genetic operation and does not bias crossover points towards functions over terminals.

def rand_in_bounds(min, max)
  return min + (max-min)*rand()
end

def print_program(node)
  return node if !node.kind_of?(Array)
  return "(#{node[0]} #{print_program(node[1])} #{print_program(node[2])})"
end

def eval_program(node, map)
  if !node.kind_of?(Array)
    return map[node].to_f if !map[node].nil?
    return node.to_f
  end
  arg1, arg2 = eval_program(node[1], map), eval_program(node[2], map)
  # protected division: return zero rather than raise on divide-by-zero
  return 0 if node[0] === :/ and arg2 == 0.0
  return arg1.__send__(node[0], arg2)
end

def generate_random_program(max, funcs, terms, depth=0)
  if depth==max-1 or (depth>1 and rand()<0.1)
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  arg1 = generate_random_program(max, funcs, terms, depth)
  arg2 = generate_random_program(max, funcs, terms, depth)
  return [funcs[rand(funcs.size)], arg1, arg2]
end

def count_nodes(node)
  return 1 if !node.kind_of?(Array)
  a1 = count_nodes(node[1])
  a2 = count_nodes(node[2])
  return a1+a2+1
end

def target_function(input)
  return input**2 + input + 1
end

def fitness(program, num_trials=20)
  # mean absolute error against the target function over random inputs
  sum_error = 0.0
  num_trials.times do |i|
    input = rand_in_bounds(-1.0, 1.0)
    error = eval_program(program, {'X'=>input}) - target_function(input)
    sum_error += error.abs
  end
  return sum_error / num_trials.to_f
end

def tournament_selection(pop, bouts)
  selected = Array.new(bouts){pop[rand(pop.size)]}
  selected.sort!{|x,y| x[:fitness]<=>y[:fitness]}
  return selected.first
end

def replace_node(node, replacement, node_num, cur_node=0)
  return [replacement,(cur_node+1)] if cur_node == node_num
  cur_node += 1
  return [node,cur_node] if !node.kind_of?(Array)
  a1, cur_node = replace_node(node[1], replacement, node_num, cur_node)
  a2, cur_node = replace_node(node[2], replacement, node_num, cur_node)
  return [[node[0], a1, a2], cur_node]
end

def copy_program(node)
  return node if !node.kind_of?(Array)
  return [node[0], copy_program(node[1]), copy_program(node[2])]
end

def get_node(node, node_num, current_node=0)
  return node,(current_node+1) if current_node == node_num
  current_node += 1
  return nil,current_node if !node.kind_of?(Array)
  a1, current_node = get_node(node[1], node_num, current_node)
  return a1,current_node if !a1.nil?
  a2, current_node = get_node(node[2], node_num, current_node)
  return a2,current_node if !a2.nil?
  return nil,current_node
end

def prune(node, max_depth, terms, depth=0)
  # truncate over-deep subtrees by replacing them with a terminal
  if depth == max_depth-1
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  return node if !node.kind_of?(Array)
  a1 = prune(node[1], max_depth, terms, depth)
  a2 = prune(node[2], max_depth, terms, depth)
  return [node[0], a1, a2]
end

def crossover(parent1, parent2, max_depth, terms)
  pt1, pt2 = rand(count_nodes(parent1)-2)+1, rand(count_nodes(parent2)-2)+1
  tree1, c1 = get_node(parent1, pt1)
  tree2, c2 = get_node(parent2, pt2)
  child1, c1 = replace_node(parent1, copy_program(tree2), pt1)
  child1 = prune(child1, max_depth, terms)
  child2, c2 = replace_node(parent2, copy_program(tree1), pt2)
  child2 = prune(child2, max_depth, terms)
  return [child1, child2]
end

def mutation(parent, max_depth, functs, terms)
  # replace a random node with a newly generated random subtree
  random_tree = generate_random_program(max_depth/2, functs, terms)
  point = rand(count_nodes(parent))
  child, count = replace_node(parent, random_tree, point)
  child = prune(child, max_depth, terms)
  return child
end

def search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut, functs, terms)
  population = Array.new(pop_size) do |i|
    {:prog=>generate_random_program(max_depth, functs, terms)}
  end
  population.each{|c| c[:fitness] = fitness(c[:prog])}
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = []
    while children.size < pop_size
      operation = rand()
      p1 = tournament_selection(population, bouts)
      c1 = {}
      if operation < p_repro
        c1[:prog] = copy_program(p1[:prog])
      elsif operation < p_repro+p_cross
        p2 = tournament_selection(population, bouts)
        c2 = {}
        c1[:prog],c2[:prog] = crossover(p1[:prog], p2[:prog], max_depth, terms)
        children << c2
      elsif operation < p_repro+p_cross+p_mut
        c1[:prog] = mutation(p1[:prog], max_depth, functs, terms)
      end
      children << c1 if children.size < pop_size
    end
    children.each{|c| c[:fitness] = fitness(c[:prog])}
    population = children
    population.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = population.first if population.first[:fitness] <= best[:fitness]
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
    break if best[:fitness] == 0
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  terms = ['X', 'R']
  functs = [:+, :-, :*, :/]
  # algorithm configuration
  max_gens = 100
  max_depth = 7
  pop_size = 100
  bouts = 5
  p_repro = 0.08
  p_cross = 0.90
  p_mut = 0.02
  # execute the algorithm
  best = search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut, functs, terms)
  puts "done! Solution: f=#{best[:fitness]}, #{print_program(best[:prog])}"
end
Genetic Programming in Ruby
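The subtree-swap mechanics at the heart of the listing's crossover and mutation can be illustrated deterministically. The snippet below restates the listing's replace_node so it stands alone, then replaces the node at a chosen pre-order index with a replacement subtree:

```ruby
# Replace the node at pre-order index node_num with a replacement subtree
# (restated from the chapter's listing so this snippet is self-contained).
def replace_node(node, replacement, node_num, cur_node=0)
  return [replacement, (cur_node+1)] if cur_node == node_num
  cur_node += 1
  return [node, cur_node] if !node.kind_of?(Array)
  a1, cur_node = replace_node(node[1], replacement, node_num, cur_node)
  a2, cur_node = replace_node(node[2], replacement, node_num, cur_node)
  [[node[0], a1, a2], cur_node]
end

parent = [:+, 'X', [:*, 'X', 'X']]       # pre-order: 0 is :+, 1 is 'X', 2 is :*
child, _ = replace_node(parent, 1.0, 2)  # swap the (* X X) subtree for 1.0
p child  # => [:+, "X", 1.0]
```

Crossover applies this twice, grafting a subtree cut from one parent into the other at a randomly chosen point, then pruning the result back to the maximum depth.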


Primary Sources

An early work by Cramer involved the study of a Genetic Algorithm that used an expression tree structure to represent computer programs for primitive mathematical operations [Cramer1985]. Koza is credited with the development of the field of Genetic Programming. An early paper by Koza referred to his hierarchical genetic algorithms as an extension to the simple genetic algorithm; they used symbolic expressions (S-expressions) as the representation and were applied to a range of induction-style problems [Koza1989]. The seminal reference for the field is Koza's 1992 book on Genetic Programming [Koza1992].

Learn More

The field of Genetic Programming is vast, including many books, dedicated conferences, and thousands of publications. Koza is generally credited with developing and popularizing the field, having published a large number of books and papers himself. Koza provides a practical introduction to the field as a tutorial, as well as a recent overview of the broader field and usage of the technique [Koza2005].

In addition to his seminal 1992 book, Koza has released three more volumes in the series: volume II on Automatically Defined Functions (ADFs) [Koza1994], volume III on the Genetic Programming Problem Solver (GPPS) for automatically defining the function set and program structure for a given problem [Koza1999], and volume IV on the human-competitive results the technique is able to achieve in a routine manner [Koza2003]. All four books are rich with targeted and practical demonstration problem instances.

Some additional excellent books include a text by Banzhaf et al. that provides an introduction to the field [Banzhaf1998], Langdon and Poli's detailed look at the technique [Langdon2002], and Poli, Langdon, and McPhee's contemporary and practical field guide to Genetic Programming [Poli2008].


[Angeline1996] P. J. Angeline, "Two Self-Adaptive Crossover Operators for Genetic Programming", in Advances in Genetic Programming 2, 1996.
[Banzhaf1998] W. Banzhaf and P. Nordin and R. E. Keller and F. D. Francone, "Genetic Programming – An Introduction; On the Automatic Evolution of Computer Programs and its Applications", Morgan Kaufmann, 1998.
[Cramer1985] N. L. Cramer, "A Representation for the Adaptive Generation of Simple Sequential Programs", in Proceedings of the 1st International Conference on Genetic Algorithms, 1985.
[Koza1989] J. R. Koza, "Hierarchical genetic algorithms operating on populations of computer programs", in Proceedings of the Eleventh International Joint Conference on Artificial Intelligence IJCAI-89, 1989.
[Koza1992] J. R. Koza, "Genetic programming: On the programming of computers by means of natural selection", MIT Press, 1992.
[Koza1994] J. R. Koza, "Genetic programming II: Automatic discovery of reusable programs", MIT Press, 1994.
[Koza1999] J. R. Koza and F. H. Bennett III and D. Andre and M. A. Keane, "Genetic programming III: Darwinian invention and problem solving", Morgan Kaufmann, 1999.
[Koza2003] J. R. Koza and M. A. Keane and M. J. Streeter and W. Mydlowec and J. Yu and G. Lanza, "Genetic Programming IV: Routine Human-Competitive Machine Intelligence", Springer, 2003.
[Koza2005] J. R. Koza and R. Poli, "5: Genetic Programming", in Search methodologies: Introductory tutorials in optimization and decision support techniques, pages 127–164, Springer, 2005.
[Langdon2002] W. B. Langdon and R. Poli, "Foundations of Genetic Programming", Springer-Verlag, 2002.
[Perkis1994] T. Perkis, "Stack-Based Genetic Programming", in Proc IEEE Congress on Computational Intelligence, 1994.
[Poli2008] R. Poli and W. B. Langdon and N. F. McPhee, "A Field Guide to Genetic Programming", Lulu Enterprises, 2008.