Clever Algorithms: Nature-Inspired Programming Recipes

By Jason Brownlee PhD

Home | Read Online

This is the ad-supported version of the book. Buy it now if you like it.

Bayesian Optimization Algorithm

Bayesian Optimization Algorithm, BOA.


The Bayesian Optimization Algorithm belongs to the field of Estimation of Distribution Algorithms, also referred to as Population Model-Building Genetic Algorithms (PMBGA) an extension to the field of Evolutionary Computation. More broadly, BOA belongs to the field of Computational Intelligence. The Bayesian Optimization Algorithm is related to other Estimation of Distribution Algorithms such as the Population Incremental Learning Algorithm, and the Univariate Marginal Distribution Algorithm. It is also the basis for extensions such as the Hierarchal Bayesian Optimization Algorithm (hBOA) and the Incremental Bayesian Optimization Algorithm (iBOA).


Bayesian Optimization Algorithm is a technique without an inspiration. It is related to the Genetic Algorithm and other Evolutionary Algorithms that are inspired by the biological theory of evolution by means of natural selection.


The information processing objective of the technique is to construct a probabilistic model that describes the relationships between the components of fit solutions in the problem space. This is achieved by repeating the process of creating and sampling from a Bayesian network that contains the conditional dependancies, independencies, and conditional probabilities between the components of a solution. The network is constructed from the relative frequencies of the components within a population of high fitness candidate solutions. Once the network is constructed, the candidate solutions are discarded and a new population of candidate solutions are generated from the model. The process is repeated until the model converges on a fit prototype solution.


Algorithm (below) provides a pseudocode listing of the Bayesian Optimization Algorithm for minimizing a cost function. The Bayesian network is constructed each iteration using a greedy algorithm. The network is assessed based on its fit of the information in the population of candidate solutions using either a Bayesian Dirichlet Metric (BD) [Pelikan1999a], or a Bayesian Information Criterion (BIC). Refer to Chapter 3 of Pelikan's book for a more detailed presentation of the pseudocode for BOA [Pelikan2005].

Input: $Bits_{num}$, $Population_{size}$, $Selection_{size}$
Output: $S_{best}$
Population $\leftarrow$ InitializePopulation($Bits_{num}$, $Population_{size}$)
$S_{best}$ $\leftarrow$ GetBestSolution(Population)
While ($\neg$StopCondition())
    Selected $\leftarrow$ SelectFitSolutions(Population, $Selection_{size}$)
    Model $\leftarrow$ ConstructBayesianNetwork(Selected)
    Offspring $\leftarrow$ $\emptyset$
    For ($i$ To $Population_{size}$)
        Offspring $\leftarrow$ ProbabilisticallyConstructSolution(Model)
    $S_{best}$ $\leftarrow$ GetBestSolution(Offspring)
    Population $\leftarrow$ Combine(Population, Offspring)
Return ($S_{best}$)
Pseudocode for BOA.


  • The Bayesian Optimization Algorithm was designed and investigated on binary string-base problems, most commonly representing binary function optimization problems.
  • Bayesian networks are typically constructed (grown) from scratch each iteration using an iterative process of adding, removing, and reversing links. Additionally, past networks may be used as the basis for the subsequent generation.
  • A greedy hill-climbing algorithm is used each algorithm iteration to optimize a Bayesian network to represent a population of candidate solutions.
  • The fitness of constructed Bayesian networks may be assessed using the Bayesian Dirichlet Metric (BD) or a Minimum Description length method called the Bayesian Information Criterion (BIC).

Code Listing

Listing (below) provides an example of the Bayesian Optimization Algorithm implemented in the Ruby Programming Language. The demonstration problem is a maximizing binary optimization problem called OneMax that seeks a binary string of unity (all '1' bits). The objective function provides only an indication of the number of correct bits in a candidate string, not the positions of the correct bits.

The Bayesian Optimization Algorithm can be tricky to implement given the use of of a Bayesian Network at the core of the technique. The implementation of BOA provided is based on the the C++ implementation provided by Pelikan, version 1.0 [Pelikan1999b]. Specifically, the implementation uses the K2 metric to construct a Bayesian network from a population of candidate solutions [Cooper1992]. Essentially, this metric is a greedy algorithm that starts with an empty graph and adds the arc with the most gain each iteration until a maximum number of edges have been added or no further edges can be added. The result is a directed acyclic graph. The process that constructs the graph imposes limits, such as the maximum number of edges and the maximum number of in-bound connections per node.

New solutions are sampled from the graph by first topologically ordering the graph (so that bits can be generated based on their dependencies), then probabilistically sampling the bits based on the conditional probabilities encoded in the graph. The algorithm used for sampling the conditional probabilities from the network is Probabilistic Logic Sampling [Henrion1988]. The stopping condition is either the best solution for the problem is found or the system converges to a single bit pattern.

Given that the implementation was written for clarity, it is slow to execute and provides an great opportunity for improvements and efficiencies.

def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}

def random_bitstring(size)
  return{ ((rand()<0.5) ? 1 : 0) }

def path_exists?(i, j, graph)
  visited, stack = [], [i]
  while !stack.empty?
    return true if stack.include?(j)
    k = stack.shift
    next if visited.include?(k)
    visited << k
    graph[k][:out].each {|m| stack.unshift(m) if !visited.include?(m)}
  return false

def can_add_edge?(i, j, graph)
  return !graph[i][:out].include?(j) && !path_exists?(j, i, graph)

def get_viable_parents(node, graph)
  viable = []
  graph.size.times do |i|
    if node!=i and can_add_edge?(node, i, graph)
      viable << i
  return viable

def compute_count_for_edges(pop, indexes)
  counts =**(indexes.size)){0}
  pop.each do |p|
    index = 0
    indexes.reverse.each_with_index do |v,i|
      index += ((p[:bitstring][v] == 1) ? 1 : 0) * (2**i)
    counts[index] += 1
 return counts

def fact(v)
  return v <= 1 ? 1 : v*fact(v-1)

def k2equation(node, candidates, pop)
  counts = compute_count_for_edges(pop, [node]+candidates)
  total = nil
  (counts.size/2).times do |i|
    a1, a2 = counts[i*2], counts[(i*2)+1]
    rs = (1.0/fact((a1+a2)+1).to_f) * fact(a1).to_f * fact(a2).to_f
    total = (total.nil? ? rs : total*rs)
  return total

def compute_gains(node, graph, pop, max=2)
  viable = get_viable_parents(node[:num], graph)
  gains = {-1.0}
  gains.each_index do |i|
    if graph[i][:in].size < max and viable.include?(i)
      gains[i] = k2equation(node[:num], node[:in]+[i], pop)
  return gains

def construct_network(pop, prob_size, max_edges=3*pop.size)
  graph = {|i| {:out=>[], :in=>[], :num=>i} }
  gains =
  max_edges.times do
    max, from, to = -1, nil, nil
    graph.each_with_index do |node, i|
      gains[i] = compute_gains(node, graph, pop)
      gains[i].each_with_index {|v,j| from,to,max = i,j,v if v>max}
    break if max <= 0.0
    graph[from][:out] << to
    graph[to][:in] << from
  return graph

def topological_ordering(graph)
  graph.each {|n| n[:count] = n[:in].size}
  ordered,stack = [], {|n| n[:count]==0}
  while ordered.size < graph.size
    current = stack.shift
    current[:out].each do |edge|
      node = graph.find {|n| n[:num]==edge}
      node[:count] -= 1
      stack << node if node[:count] <= 0
    ordered << current
  return ordered

def marginal_probability(i, pop)
  return pop.inject(0.0){|s,x| s + x[:bitstring][i]} / pop.size.to_f

def calculate_probability(node, bitstring, graph, pop)
  return marginal_probability(node[:num], pop) if node[:in].empty?
  counts = compute_count_for_edges(pop, [node[:num]]+node[:in])
  index = 0
  node[:in].reverse.each_with_index do |v,i|
    index += ((bitstring[v] == 1) ? 1 : 0) * (2**i)
  i1 = index + (1*2**(node[:in].size))
  i2 = index + (0*2**(node[:in].size))
  a1, a2 = counts[i1].to_f, counts[i2].to_f
  return a1/(a1+a2)

def probabilistic_logic_sample(graph, pop)
  bitstring =
  graph.each do |node|
    prob = calculate_probability(node, bitstring, graph, pop)
    bitstring[node[:num]] = ((rand() < prob) ? 1 : 0)
  return {:bitstring=>bitstring}

def sample_from_network(pop, graph, num_samples)
  ordered = topological_ordering(graph)
  samples = do
    probabilistic_logic_sample(ordered, pop)
  return samples

def search(num_bits, max_iter, pop_size, select_size, num_children)
  pop = { {:bitstring=>random_bitstring(num_bits)} }
  pop.each{|c| c[:cost] = onemax(c[:bitstring])}
  best = pop.sort!{|x,y| y[:cost] <=> x[:cost]}.first
  max_iter.times do |it|
    selected = pop.first(select_size)
    network = construct_network(selected, num_bits)
    arcs = network.inject(0){|s,x| s+x[:out].size}
    children = sample_from_network(selected, network, num_children)
    children.each{|c| c[:cost] = onemax(c[:bitstring])}
    children.each {|c| puts " >>sample, f=#{c[:cost]} #{c[:bitstring]}"}
    pop = pop[0...(pop_size-select_size)] + children
    pop.sort! {|x,y| y[:cost] <=> x[:cost]}
    best = pop.first if pop.first[:cost] >= best[:cost]
    puts " >it=#{it}, arcs=#{arcs}, f=#{best[:cost]}, [#{best[:bitstring]}]"
    converged = {|x| x[:bitstring]!=pop.first[:bitstring]}.empty?
    break if converged or best[:cost]==num_bits
  return best

if __FILE__ == $0
  # problem configuration
  num_bits = 20
  # algorithm configuration
  max_iter = 100
  pop_size = 50
  select_size = 15
  num_children = 25
  # execute the algorithm
  best = search(num_bits, max_iter, pop_size, select_size, num_children)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
Bayesian Optimization Algorithm in Ruby
Download: boa.rb.>


Primary Sources

The Bayesian Optimization Algorithm was proposed by Pelikan, Goldberg, and Cantú-Paz in the technical report [Pelikan1998a], that was later published [Pelikan2002]. The technique was proposed as an extension to the state of Estimation of Distribution algorithms (such as the Univariate Marginal Distribution Algorithm and the Bivariate Marginal Distribution Algorithm) that used a Bayesian Network to model the relationships and conditional probabilities for the components expressed in a population of fit candidate solutions. Pelikan, Goldberg, and Cantú-Paz also described the approach applied to deceptive binary optimization problems (trap functions) in a paper that was published before the seminal journal article [Pelikan1999a].

Learn More

Pelikan and Goldberg described an extension to the approach called the Hierarchical Bayesian Optimization Algorithm (hBOA) [Pelikan2000] [Pelikan2001b]. The differences in the hBOA algorithm are that it replaces the decision tables (used to store the probabilities) with decision graphs and used a niching method called Restricted Tournament Replacement to maintain diversity in the selected set of candidate solutions used to construct the network models. Pelikan's work on BOA culminated in his PhD thesis that provides a detailed treatment of the approach, its configuration and application [Pelikan2002a]. Pelikan, Sastry, and Goldberg proposed the Incremental Bayesian Optimization Algorithm (iBOA) extension of the approach that removes the population and adds incremental updates to the Bayesian network [Pelikan2008].

Pelikan published a book that focused on the technique, walking through the development of probabilistic algorithms inspired by evolutionary computation, a detailed look at the Bayesian Optimization Algorithm (Chapter 3), the hierarchic extension to Hierarchical Bayesian Optimization Algorithm and demonstration studies of the approach on test problems [Pelikan2005].


[Cooper1992] G. F. Cooper and E. Herskovits, "A Bayesian Method for the Induction of Probabilistic Networks from Data", Machine Learning, 1992.
[Henrion1988] M. Henrion, "Propagation of Uncertainty by Probabilistic Logic Sampling in Bayes Networks", in Uncertainty in Artificial Intelligence 2, pages 149–163, Elsevier Science Publishing Company, Inc., 1988.
[Pelikan1998a] M. Pelikan and D. E. Goldberg and E. Cantú–Paz, "Linkage Problem, Distribution Estimation, and Bayesian Networks", Technical Report IlliGAL Report No. 98013, llinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, 1998.
[Pelikan1999a] M. Pelikan and D. E. Goldberg and E. Cantú–Paz, "BOA: The Bayesian optimization algorithm.", in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), 1999.
[Pelikan1999b] M. Pelikan, "A Simple Implementation of the Bayesian Optimization Algorithm (BOA) in C++ (version 1.0)", Technical Report IlliGAL Report No. 99011, University of Illinois at Urbana-Champaign, Illinois Genetic Algorithms Laboratory, Urbana, IL, 1999.
[Pelikan2000] M. Pelikan and D. E. Goldberg, "Hierarchical Problem Solving and the Bayesian Optimization Algorithms", in Genetic and Evolutionary Computation Conference 2000 (GECCO-2000), 2000.
[Pelikan2001b] M. Pelikan and D. E. Goldberg, "Escaping hierarchical traps with competent genetic algorithms", in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), 2001.
[Pelikan2002] M. Pelikan and D. E. Goldberg and E. Cantú–Paz, "Linkage Problem, Distribution Estimation, and Bayesian Networks", Evolutionary Computation, 2002.
[Pelikan2002a] M. Pelikan, "Bayesian optimization algorithm: From single level to hierarchy", [PhD Thesis] University of Illinois at Urbana-Champaign, 2002.
[Pelikan2005] M. Pelikan, "Hierarchical Bayesian Optimization Algorithm: Toward a New Generation of Evolutionary Algorithms", Springer, 2005.
[Pelikan2008] M. Pelikan and K. Sastry and D. E. Goldberg, "iBOA: The Incremental Bayesian Optimization Algorithms", in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2008), 2008.
Clever Algorithms: Nature-Inspired Programming Recipes

Free Course

Get one algorithm per week...
  • ...delivered to your inbox
  • ...described in detail
  • read at your own pace
Sign-up Now

Own A Copy

This 438-page ebook has...
  • ...45 algorithm descriptions
  • practice usage
  • ...pseudo code
  • ...Ruby code
  • ...primary sources
Buy Now

Please Note: This content was automatically generated from the book content and may contain minor differences.

Clever Algorithms: Nature-Inspired Programming Recipes

Do you like Clever Algorithms?
Buy the book now.

© Copyright 2015. All Rights Reserved. | About | Contact | Privacy