Bayesian Optimization AlgorithmBayesian Optimization Algorithm, BOA. TaxonomyThe Bayesian Optimization Algorithm belongs to the field of Estimation of Distribution Algorithms, also referred to as Population ModelBuilding Genetic Algorithms (PMBGA) an extension to the field of Evolutionary Computation. More broadly, BOA belongs to the field of Computational Intelligence. The Bayesian Optimization Algorithm is related to other Estimation of Distribution Algorithms such as the Population Incremental Learning Algorithm, and the Univariate Marginal Distribution Algorithm. It is also the basis for extensions such as the Hierarchal Bayesian Optimization Algorithm (hBOA) and the Incremental Bayesian Optimization Algorithm (iBOA). InspirationBayesian Optimization Algorithm is a technique without an inspiration. It is related to the Genetic Algorithm and other Evolutionary Algorithms that are inspired by the biological theory of evolution by means of natural selection. StrategyThe information processing objective of the technique is to construct a probabilistic model that describes the relationships between the components of fit solutions in the problem space. This is achieved by repeating the process of creating and sampling from a Bayesian network that contains the conditional dependancies, independencies, and conditional probabilities between the components of a solution. The network is constructed from the relative frequencies of the components within a population of high fitness candidate solutions. Once the network is constructed, the candidate solutions are discarded and a new population of candidate solutions are generated from the model. The process is repeated until the model converges on a fit prototype solution. ProcedureAlgorithm (below) provides a pseudocode listing of the Bayesian Optimization Algorithm for minimizing a cost function. The Bayesian network is constructed each iteration using a greedy algorithm. The network is assessed based on its fit of the information in the population of candidate solutions using either a Bayesian Dirichlet Metric (BD) [Pelikan1999a], or a Bayesian Information Criterion (BIC). Refer to Chapter 3 of Pelikan's book for a more detailed presentation of the pseudocode for BOA [Pelikan2005]. Input :
$Bits_{num}$, $Population_{size}$, $Selection_{size}$
Output :
$S_{best}$
Population $\leftarrow$ InitializePopulation ($Bits_{num}$, $Population_{size}$)EvaluatePopulation (Population )$S_{best}$ $\leftarrow$ GetBestSolution (Population )While ($\neg$StopCondition ())Selected $\leftarrow$ SelectFitSolutions (Population , $Selection_{size}$)Model $\leftarrow$ ConstructBayesianNetwork (Selected )Offspring $\leftarrow$ $\emptyset$For ($i$ To $Population_{size}$)Offspring $\leftarrow$ ProbabilisticallyConstructSolution (Model )End EvaluatePopulation (Offspring )$S_{best}$ $\leftarrow$ GetBestSolution (Offspring )Population $\leftarrow$ Combine (Population , Offspring )End Return ($S_{best}$)Heuristics
Code ListingListing (below) provides an example of the Bayesian Optimization Algorithm implemented in the Ruby Programming Language. The demonstration problem is a maximizing binary optimization problem called OneMax that seeks a binary string of unity (all '1' bits). The objective function provides only an indication of the number of correct bits in a candidate string, not the positions of the correct bits. The Bayesian Optimization Algorithm can be tricky to implement given the use of of a Bayesian Network at the core of the technique. The implementation of BOA provided is based on the the C++ implementation provided by Pelikan, version 1.0 [Pelikan1999b]. Specifically, the implementation uses the K2 metric to construct a Bayesian network from a population of candidate solutions [Cooper1992]. Essentially, this metric is a greedy algorithm that starts with an empty graph and adds the arc with the most gain each iteration until a maximum number of edges have been added or no further edges can be added. The result is a directed acyclic graph. The process that constructs the graph imposes limits, such as the maximum number of edges and the maximum number of inbound connections per node. New solutions are sampled from the graph by first topologically ordering the graph (so that bits can be generated based on their dependencies), then probabilistically sampling the bits based on the conditional probabilities encoded in the graph. The algorithm used for sampling the conditional probabilities from the network is Probabilistic Logic Sampling [Henrion1988]. The stopping condition is either the best solution for the problem is found or the system converges to a single bit pattern. Given that the implementation was written for clarity, it is slow to execute and provides an great opportunity for improvements and efficiencies. def onemax(vector) return vector.inject(0){sum, value sum + value} end def random_bitstring(size) return Array.new(size){ ((rand()<0.5) ? 1 : 0) } end def path_exists?(i, j, graph) visited, stack = [], [i] while !stack.empty? return true if stack.include?(j) k = stack.shift next if visited.include?(k) visited << k graph[k][:out].each {m stack.unshift(m) if !visited.include?(m)} end return false end def can_add_edge?(i, j, graph) return !graph[i][:out].include?(j) && !path_exists?(j, i, graph) end def get_viable_parents(node, graph) viable = [] graph.size.times do i if node!=i and can_add_edge?(node, i, graph) viable << i end end return viable end def compute_count_for_edges(pop, indexes) counts = Array.new(2**(indexes.size)){0} pop.each do p index = 0 indexes.reverse.each_with_index do v,i index += ((p[:bitstring][v] == 1) ? 1 : 0) * (2**i) end counts[index] += 1 end return counts end def fact(v) return v <= 1 ? 1 : v*fact(v1) end def k2equation(node, candidates, pop) counts = compute_count_for_edges(pop, [node]+candidates) total = nil (counts.size/2).times do i a1, a2 = counts[i*2], counts[(i*2)+1] rs = (1.0/fact((a1+a2)+1).to_f) * fact(a1).to_f * fact(a2).to_f total = (total.nil? ? rs : total*rs) end return total end def compute_gains(node, graph, pop, max=2) viable = get_viable_parents(node[:num], graph) gains = Array.new(graph.size) {1.0} gains.each_index do i if graph[i][:in].size < max and viable.include?(i) gains[i] = k2equation(node[:num], node[:in]+[i], pop) end end return gains end def construct_network(pop, prob_size, max_edges=3*pop.size) graph = Array.new(prob_size) {i {:out=>[], :in=>[], :num=>i} } gains = Array.new(prob_size) max_edges.times do max, from, to = 1, nil, nil graph.each_with_index do node, i gains[i] = compute_gains(node, graph, pop) gains[i].each_with_index {v,j from,to,max = i,j,v if v>max} end break if max <= 0.0 graph[from][:out] << to graph[to][:in] << from end return graph end def topological_ordering(graph) graph.each {n n[:count] = n[:in].size} ordered,stack = [], graph.select {n n[:count]==0} while ordered.size < graph.size current = stack.shift current[:out].each do edge node = graph.find {n n[:num]==edge} node[:count] = 1 stack << node if node[:count] <= 0 end ordered << current end return ordered end def marginal_probability(i, pop) return pop.inject(0.0){s,x s + x[:bitstring][i]} / pop.size.to_f end def calculate_probability(node, bitstring, graph, pop) return marginal_probability(node[:num], pop) if node[:in].empty? counts = compute_count_for_edges(pop, [node[:num]]+node[:in]) index = 0 node[:in].reverse.each_with_index do v,i index += ((bitstring[v] == 1) ? 1 : 0) * (2**i) end i1 = index + (1*2**(node[:in].size)) i2 = index + (0*2**(node[:in].size)) a1, a2 = counts[i1].to_f, counts[i2].to_f return a1/(a1+a2) end def probabilistic_logic_sample(graph, pop) bitstring = Array.new(graph.size) graph.each do node prob = calculate_probability(node, bitstring, graph, pop) bitstring[node[:num]] = ((rand() < prob) ? 1 : 0) end return {:bitstring=>bitstring} end def sample_from_network(pop, graph, num_samples) ordered = topological_ordering(graph) samples = Array.new(num_samples) do probabilistic_logic_sample(ordered, pop) end return samples end def search(num_bits, max_iter, pop_size, select_size, num_children) pop = Array.new(pop_size) { {:bitstring=>random_bitstring(num_bits)} } pop.each{c c[:cost] = onemax(c[:bitstring])} best = pop.sort!{x,y y[:cost] <=> x[:cost]}.first max_iter.times do it selected = pop.first(select_size) network = construct_network(selected, num_bits) arcs = network.inject(0){s,x s+x[:out].size} children = sample_from_network(selected, network, num_children) children.each{c c[:cost] = onemax(c[:bitstring])} children.each {c puts " >>sample, f=#{c[:cost]} #{c[:bitstring]}"} pop = pop[0...(pop_sizeselect_size)] + children pop.sort! {x,y y[:cost] <=> x[:cost]} best = pop.first if pop.first[:cost] >= best[:cost] puts " >it=#{it}, arcs=#{arcs}, f=#{best[:cost]}, [#{best[:bitstring]}]" converged = pop.select {x x[:bitstring]!=pop.first[:bitstring]}.empty? break if converged or best[:cost]==num_bits end return best end if __FILE__ == $0 # problem configuration num_bits = 20 # algorithm configuration max_iter = 100 pop_size = 50 select_size = 15 num_children = 25 # execute the algorithm best = search(num_bits, max_iter, pop_size, select_size, num_children) puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}" end Download: boa.rb.>
ReferencesPrimary SourcesThe Bayesian Optimization Algorithm was proposed by Pelikan, Goldberg, and CantúPaz in the technical report [Pelikan1998a], that was later published [Pelikan2002]. The technique was proposed as an extension to the state of Estimation of Distribution algorithms (such as the Univariate Marginal Distribution Algorithm and the Bivariate Marginal Distribution Algorithm) that used a Bayesian Network to model the relationships and conditional probabilities for the components expressed in a population of fit candidate solutions. Pelikan, Goldberg, and CantúPaz also described the approach applied to deceptive binary optimization problems (trap functions) in a paper that was published before the seminal journal article [Pelikan1999a]. Learn MorePelikan and Goldberg described an extension to the approach called the Hierarchical Bayesian Optimization Algorithm (hBOA) [Pelikan2000] [Pelikan2001b]. The differences in the hBOA algorithm are that it replaces the decision tables (used to store the probabilities) with decision graphs and used a niching method called Restricted Tournament Replacement to maintain diversity in the selected set of candidate solutions used to construct the network models. Pelikan's work on BOA culminated in his PhD thesis that provides a detailed treatment of the approach, its configuration and application [Pelikan2002a]. Pelikan, Sastry, and Goldberg proposed the Incremental Bayesian Optimization Algorithm (iBOA) extension of the approach that removes the population and adds incremental updates to the Bayesian network [Pelikan2008]. Pelikan published a book that focused on the technique, walking through the development of probabilistic algorithms inspired by evolutionary computation, a detailed look at the Bayesian Optimization Algorithm (Chapter 3), the hierarchic extension to Hierarchical Bayesian Optimization Algorithm and demonstration studies of the approach on test problems [Pelikan2005]. Bibliography

Free CourseGet one algorithm per week...
Own A CopyThis 438page ebook has...


Please Note: This content was automatically generated from the book content and may contain minor differences. 

Do you like Clever Algorithms? Buy the book now. 