Gene Expression ProgrammingGene Expression Programming, GEP. TaxonomyGene Expression Programming is a Global Optimization algorithm and an Automatic Programming technique, and it is an instance of an Evolutionary Algorithm from the field of Evolutionary Computation. It is a sibling of other Evolutionary Algorithms such as a the Genetic Algorithm as well as other Evolutionary Automatic Programming techniques such as Genetic Programming and Grammatical Evolution. InspirationGene Expression Programming is inspired by the replication and expression of the DNA molecule, specifically at the gene level. The expression of a gene involves the transcription of its DNA to RNA which in turn forms amino acids that make up proteins in the phenotype of an organism. The DNA building blocks are subjected to mechanisms of variation (mutations such as coping errors) as well as recombination during sexual reproduction. MetaphorGene Expression Programming uses a linear genome as the basis for genetic operators such as mutation, recombination, inversion, and transposition. The genome is comprised of chromosomes and each chromosome is comprised of genes that are translated into an expression tree to solve a given problem. The robust gene definition means that genetic operators can be applied to the subsymbolic representation without concern for the structure of the resultant gene expression, providing separation of genotype and phenotype. StrategyThe objective of the Gene Expression Programming algorithm is to improve the adaptive fit of an expressed program in the context of a problem specific cost function. This is achieved through the use of an evolutionary process that operates on a subsymbolic representation of candidate solutions using surrogates for the processes (descent with modification) and mechanisms (genetic recombination, mutation, inversion, transposition, and gene expression) of evolution. ProcedureA candidate solution is represented as a linear string of symbols called Karva notation or a Kexpression, where each symbol maps to a function or terminal node. The linear representation is mapped to an expression tree in a breadthfirst manner. A Kexpression has fixed length and is comprised of one or more subexpressions (genes), which are also defined with a fixed length. A gene is comprised of two sections, a head which may contain any function or terminal symbols, and a tail section that may only contain terminal symbols. Each gene will always translate to a syntactically correct expression tree, where the tail portion of the gene provides a genetic buffer which ensures closure of the expression. Algorithm (below) provides a pseudocode listing of the Gene Expression Programming algorithm for minimizing a cost function. Input :
Grammar , $Population_{size}$, $Head_{length}$, $Tail_{length}$, $P_{crossover}$, $P_{mutation}$
Output :
$S_{best}$
Population $\leftarrow$ InitializePopulation ($Population_{size}$, Grammar , $Head_{length}$, $Tail_{length}$)For ($S_{i}$ $\in$ Population )$Si_{program}$ $\leftarrow$ DecodeBreadthFirst ($Si_{genome}$, Grammar )$Si_{cost}$ $\leftarrow$ Execute ($Si_{program}$)End $S_{best}$ $\leftarrow$ GetBestSolution (Population )While ($\neg$StopCondition ())Parents $\leftarrow$ SelectParents (Population , $Population_{size}$)Children $\leftarrow \emptyset$For ($Parent_{1}$, $Parent_{2}$ $\in$ Parents )$Si_{genome}$ $\leftarrow$ Crossover ($Parent_{1}$, $Parent_{2}$, $P_{crossover}$)$Si_{genome}$ $\leftarrow$ Mutate ($Si_{genome}$, $P_{mutation}$)Children $\leftarrow$ $S_{i}$End For ($S_{i}$ $\in$ Children )$Si_{program}$ $\leftarrow$ DecodeBreadthFirst ($Si_{genome}$, Grammar )$Si_{cost}$ $\leftarrow$ Execute ($Si_{program}$)End Population $\leftarrow$ Replace (Population , Children )$S_{best}$ $\leftarrow$ GetBestSolution (Children )End Return ($S_{best}$)Heuristics
Code ListingListing (below) provides an example of the Gene Expression Programming algorithm implemented in the Ruby Programming Language based on the seminal version proposed by Ferreira [Ferreira2001]. The demonstration problem is an instance of symbolic regression $f(x)=x^4+x^3+x^2+x$, where $x\in[1,10]$. The grammar used in this problem is: Functions: $F=\{+,,\div,\times,\}$ and Terminals: $T=\{x\}$. The algorithm uses binary tournament selection, uniform crossover and point mutations. The Kexpression is decoded to an expression tree in a breadthfirst manner, which is then parsed depth first as a Ruby expression string for display and direct evaluation. Solutions are evaluated by generating a number of random samples from the domain and calculating the mean error of the program to the expected outcome. Programs that contain a single term or those that return an invalid (NaN) or infinite result are penalized with an enormous error value. def binary_tournament(pop) i, j = rand(pop.size), rand(pop.size) return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j] end def point_mutation(grammar, genome, head_length, rate=1.0/genome.size.to_f) child ="" genome.size.times do i bit = genome[i].chr if rand() < rate if i < head_length selection = (rand() < 0.5) ? grammar["FUNC"]: grammar["TERM"] bit = selection[rand(selection.size)] else bit = grammar["TERM"][rand(grammar["TERM"].size)] end end child << bit end return child end def crossover(parent1, parent2, rate) return ""+parent1 if rand()>=rate child = "" parent1.size.times do i child << ((rand()<0.5) ? parent1[i] : parent2[i]) end return child end def reproduce(grammar, selected, pop_size, p_crossover, head_length) children = [] selected.each_with_index do p1, i p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i1] p2 = selected[0] if i == selected.size1 child = {} child[:genome] = crossover(p1[:genome], p2[:genome], p_crossover) child[:genome] = point_mutation(grammar, child[:genome], head_length) children << child end return children end def random_genome(grammar, head_length, tail_length) s = "" head_length.times do selection = (rand() < 0.5) ? grammar["FUNC"]: grammar["TERM"] s << selection[rand(selection.size)] end tail_length.times { s << grammar["TERM"][rand(grammar["TERM"].size)]} return s end def target_function(x) return x**4.0 + x**3.0 + x**2.0 + x end def sample_from_bounds(bounds) return bounds[0] + ((bounds[1]  bounds[0]) * rand()) end def cost(program, bounds, num_trials=30) errors = 0.0 num_trials.times do x = sample_from_bounds(bounds) expression, score = program.gsub("x", x.to_s), 0.0 begin score = eval(expression) rescue score = 0.0/0.0 end return 9999999 if score.nan? or score.infinite? errors += (score  target_function(x)).abs end return errors / num_trials.to_f end def mapping(genome, grammar) off, queue = 0, [] root = {} root[:node] = genome[off].chr; off+=1 queue.push(root) while !queue.empty? do current = queue.shift if grammar["FUNC"].include?(current[:node]) current[:left] = {} current[:left][:node] = genome[off].chr; off+=1 queue.push(current[:left]) current[:right] = {} current[:right][:node] = genome[off].chr; off+=1 queue.push(current[:right]) end end return root end def tree_to_string(exp) return exp[:node] if (exp[:left].nil? or exp[:right].nil?) left = tree_to_string(exp[:left]) right = tree_to_string(exp[:right]) return "(#{left} #{exp[:node]} #{right})" end def evaluate(candidate, grammar, bounds) candidate[:expression] = mapping(candidate[:genome], grammar) candidate[:program] = tree_to_string(candidate[:expression]) candidate[:fitness] = cost(candidate[:program], bounds) end def search(grammar, bounds, h_length, t_length, max_gens, pop_size, p_cross) pop = Array.new(pop_size) do {:genome=>random_genome(grammar, h_length, t_length)} end pop.each{c evaluate(c, grammar, bounds)} best = pop.sort{x,y x[:fitness] <=> y[:fitness]}.first max_gens.times do gen selected = Array.new(pop){i binary_tournament(pop)} children = reproduce(grammar, selected, pop_size, p_cross, h_length) children.each{c evaluate(c, grammar, bounds)} children.sort!{x,y x[:fitness] <=> y[:fitness]} best = children.first if children.first[:fitness] <= best[:fitness] pop = (children+pop).first(pop_size) puts " > gen=#{gen}, f=#{best[:fitness]}, g=#{best[:genome]}" end return best end if __FILE__ == $0 # problem configuration grammar = {"FUNC"=>["+","","*","/"], "TERM"=>["x"]} bounds = [1.0, 10.0] # algorithm configuration h_length = 20 t_length = h_length * (21) + 1 max_gens = 150 pop_size = 80 p_cross = 0.85 # execute the algorithm best = search(grammar, bounds, h_length, t_length, max_gens, pop_size, p_cross) puts "done! Solution: f=#{best[:fitness]}, program=#{best[:program]}" end Download: gene_expression_programming.rb.
ReferencesPrimary SourcesThe Gene Expression Programming algorithm was proposed by Ferreira in a paper that detailed the approach, provided a careful walkthrough of the process and operators, and demonstrated the the algorithm on a number of benchmark problem instances including symbolic regression [Ferreira2001]. Learn MoreFerreira provided an early and detailed introduction and overview of the approach as book chapter, providing a stepbystep walkthrough of the procedure and sample applications [Ferreira2002]. A more contemporary and detailed introduction is provided in a later book chapter [Ferreira2005]. Ferreira published a book on the approach in 2002 covering background, the algorithm, and demonstration applications which is now in its second edition [Ferreira2006]. Bibliography

Free CourseGet one algorithm per week...
Own A CopyThis 438page ebook has...


Please Note: This content was automatically generated from the book content and may contain minor differences. 

Do you like Clever Algorithms? Buy the book now. 