Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee

Introduction

Welcome to Clever Algorithms! This is a handbook of recipes for computational problem-solving techniques from the fields of Computational Intelligence, Biologically Inspired Computation, and Metaheuristics. Clever Algorithms are interesting, practical, and fun to learn about and implement. Research scientists may be interested in browsing algorithm inspirations in search of interesting system or process analogs to investigate. Developers and software engineers may compare various problem-solving algorithms and technique-specific guidelines. Practitioners, students, and interested amateurs may implement state-of-the-art algorithms to address business or scientific needs, or simply play with the fascinating systems they represent.

This introductory chapter provides relevant background information on Artificial Intelligence and Algorithms. The core of the book provides a large corpus of algorithms presented in a complete and consistent manner. The final chapter covers some advanced topics to consider once a number of algorithms have been mastered. This book has been designed as a reference text, where specific techniques are looked up, or where the algorithms across whole fields of study can be browsed, rather than being read cover-to-cover. This book is an algorithm handbook and a technique guidebook, and I hope you find something useful.

What is AI

Artificial Intelligence

The field of classical Artificial Intelligence (AI) coalesced in the 1950s, drawing on an understanding of the brain from neuroscience, the new mathematics of information theory, control theory (referred to as cybernetics), and the dawn of the digital computer. AI is a cross-disciplinary field of research that is generally concerned with developing and investigating systems that operate or act intelligently. It is considered a discipline in the field of computer science given its strong focus on computation.

Russell and Norvig provide a perspective that defines Artificial Intelligence in four categories: 1) systems that think like humans, 2) systems that act like humans, 3) systems that think rationally, and 4) systems that act rationally [Russell2009]. In their definition, acting like a human suggests that a system can do some specific things humans can do; this includes fields such as the Turing test, natural language processing, automated reasoning, knowledge representation, machine learning, computer vision, and robotics. Thinking like a human suggests systems that model the cognitive information processing properties of humans, for example a general problem solver and systems that build internal models of their world. Thinking rationally suggests laws of rationalism and structured thought, such as syllogisms and formal logic. Finally, acting rationally suggests systems that do rational things, such as expected utility maximization and rational agents.

Luger and Stubblefield suggest that AI is a sub-field of computer science concerned with the automation of intelligence, and like other sub-fields of computer science has both theoretical concerns (how and why do the systems work?) and application concerns (where and when can the systems be used?) [Luger1993]. They suggest a strong empirical focus to research because, although there may be a strong desire for mathematical analysis, the systems themselves defy analysis given their complexity. The machines and software investigated in AI are not black boxes; rather, analysis proceeds by observing the systems' interactions with their environments, followed by an internal assessment of the system to relate its structure back to its behavior.

Artificial Intelligence is therefore concerned with investigating mechanisms that underlie intelligence and intelligent behavior. The traditional approach toward designing and investigating AI (the so-called 'good old fashioned' AI) has been to employ a symbolic basis for these mechanisms. A newer approach, historically referred to as scruffy artificial intelligence or soft computing, does not necessarily use a symbolic basis, instead patterning these mechanisms after biological or natural processes. This represents a modern paradigm shift in interest from symbolic knowledge representations to inference strategies for adaptation and learning, and has been referred to as neat versus scruffy approaches to AI. The neat philosophy is concerned with formal symbolic models of intelligence that can explain why they work, whereas the scruffy philosophy is concerned with intelligent strategies that explain how they work [Sloman1990].

Neat AI

The traditional stream of AI concerns a top-down perspective of problem solving, generally involving symbolic representations and logic processes that, most importantly, can explain why the systems work. The successes of this prescriptive stream include a multitude of specialist approaches such as rule-based expert systems, automatic theorem provers, and operations research techniques that underlie modern planning and scheduling software. Although traditional approaches have resulted in significant success, they have their limits, most notably scalability. Increases in problem size result in an unmanageable increase in the complexity of such problems, meaning that although traditional techniques can guarantee an optimal, precise, or true solution, the computational execution time or computing memory required can be intractable.

Scruffy AI

There have been a number of thrusts in the field of AI toward less crisp techniques that are able to locate approximate, imprecise, or partially-true solutions to problems with a reasonable cost of resources. Such approaches are typically descriptive rather than prescriptive, describing a process for achieving a solution (how) but not explaining why it works (as the neater approaches do).

Scruffy AI approaches are defined as relatively simple procedures that result in complex emergent and self-organizing behavior that can defy traditional reductionist analyses; these effects can be exploited for quickly locating approximate solutions to intractable problems. A common characteristic of such techniques is the incorporation of randomness in their processes, resulting in robust probabilistic and stochastic decision making in contrast to the sometimes more fragile determinism of the crisp approaches. Another important common attribute is the adoption of an inductive rather than deductive approach to problem solving, generalizing solutions or decisions from sets of specific observations made by the system.

Natural Computation

An important perspective on scruffy Artificial Intelligence is the motivation and inspiration for the core information processing strategy of a given technique. Computers can only do what they are instructed, so one consideration is to distill information processing strategies from other fields of study, such as the physical world and biology. The study of biologically motivated computation is called Biologically Inspired Computing [Castro2005a], and is one of three related fields of Natural Computing [Forbes2000] [Forbes2005] [Paton1994]. Natural Computing is an interdisciplinary field concerned with the relationship between computation and biology, which, in addition to Biologically Inspired Computing, also comprises Computationally Motivated Biology and Computing with Biology [Paun2005] [Marrow2000].

Biologically Inspired Computation

Biologically Inspired Computation is computation inspired by biological metaphor, also referred to as Biomimicry, and Biomimetics in other engineering disciplines [Castro2005] [Benyus1998]. The intent of this field is to devise mathematical and engineering tools to generate solutions to computational problems. The field involves using procedures for finding solutions abstracted from the natural world for addressing computationally phrased problems.

Computationally Motivated Biology

Computationally Motivated Biology involves investigating biology using computers. The intent of this area is to use information sciences and simulation to model biological systems in digital computers with the aim to replicate and better understand behaviors in biological systems. The field facilitates the ability to better understand life-as-it-is and investigate life-as-it-could-be. Typically, work in this sub-field is not concerned with the construction of mathematical and engineering tools; rather, it is focused on simulating natural phenomena. Common examples include Artificial Life, Fractal Geometry (L-systems, Iterated Function Systems, Particle Systems, Brownian motion), and Cellular Automata. A related field is that of Computational Biology, generally concerned with modeling biological systems and the application of statistical methods, such as in the sub-field of Bioinformatics.

Computation with Biology

Computation with Biology is the investigation of substrates other than silicon in which to implement computation [Aaronson2005]. Common examples include molecular or DNA Computing and Quantum Computing.

Computational Intelligence

Computational Intelligence is a modern name for the sub-field of AI concerned with sub-symbolic (also called messy, scruffy, and soft) techniques. Computational Intelligence describes techniques that focus on strategy and outcome. The field broadly covers sub-disciplines that focus on adaptive and intelligent systems, including but not limited to: Evolutionary Computation, Swarm Intelligence (Particle Swarm and Ant Colony Optimization), Fuzzy Systems, Artificial Immune Systems, and Artificial Neural Networks [Engelbrecht2007] [Pedrycz1997]. This section provides a brief summary of each of the five primary areas of study.

Evolutionary Computation

A paradigm that is concerned with the investigation of systems inspired by the neo-Darwinian theory of evolution by means of natural selection (natural selection theory and an understanding of genetics). Popular evolutionary algorithms include the Genetic Algorithm, Evolution Strategy, Genetic and Evolutionary Programming, and Differential Evolution [Baeck2000] [Baeck2000a]. The evolutionary process is considered an adaptive strategy and is typically applied to search and optimization domains [Goldberg1989] [Holland1975].
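As a rough illustration of the evolutionary generate, test, and select loop, the following is a minimal genetic algorithm sketch. The OneMax problem (maximize the number of 1 bits), tournament selection, one-point crossover, bit-flip mutation, and all parameter values are illustrative assumptions rather than a prescribed configuration.

```python
import random

def onemax(bits):
    """Cost function to maximize: the number of 1s in the bitstring."""
    return sum(bits)

def tournament(pop, rng, k=2):
    """Select the fitter of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=onemax)

def crossover(a, b, rng):
    """One-point crossover producing a single child."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def genetic_algorithm(n_bits=32, pop_size=50, generations=100, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=onemax)
    for _ in range(generations):
        # Generate: breed a new population from selected parents.
        pop = [mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng),
                      rng, 1.0 / n_bits)
               for _ in range(pop_size)]
        # Test and select: remember the best candidate seen so far.
        best = max(pop + [best], key=onemax)
    return best

best = genetic_algorithm()
print(onemax(best), "of", len(best), "bits set")
```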

Swarm Intelligence

A paradigm that considers collective intelligence as a behavior that emerges through the interaction and cooperation of large numbers of less intelligent agents. The paradigm consists of two dominant sub-fields: 1) Ant Colony Optimization, which investigates probabilistic algorithms inspired by the foraging behavior of ants [Bonabeau1999] [Dorigo2004], and 2) Particle Swarm Optimization, which investigates probabilistic algorithms inspired by the flocking and foraging behavior of birds and fish [Kennedy2001]. Like evolutionary computation, swarm intelligence-based techniques are considered adaptive strategies and are typically applied to search and optimization domains.
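The particle swarm idea can be sketched in a few lines: each particle is pulled toward its own best position and toward the swarm's best position. This is a minimal sketch assuming the common inertia-weight velocity update; the sphere test function, the coefficients `w`, `c1`, `c2`, and the bounds are illustrative choices.

```python
import random

def sphere(x):
    """Minimization test function with its optimum of 0 at the origin."""
    return sum(v * v for v in x)

def pso(cost, dim=2, n_particles=20, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]          # each particle's best position so far
    gbest = min(pbest, key=cost)[:]      # the swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso(sphere)
print(best, sphere(best))
```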

Artificial Neural Networks

Neural Networks are a paradigm concerned with the investigation of architectures and learning strategies inspired by the modeling of neurons in the brain [Bishop1995]. Learning strategies are typically divided into supervised and unsupervised, which manage environmental feedback in different ways. Neural network learning processes are considered adaptive learning and are typically applied to function approximation and pattern recognition domains.
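To make supervised learning concrete, here is a sketch of a single perceptron trained with the classic perceptron learning rule, where the difference between target and output is the corrective feedback. The logical AND data set, the learning rate, and the epoch count are illustrative assumptions.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights and a bias with the perceptron rule on (inputs, target) pairs."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            output = 1 if activation > 0 else 0
            error = target - output          # supervised feedback signal
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    """Threshold unit: fire (1) when the weighted sum exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # → [0, 0, 0, 1]
```

AND is linearly separable, so the perceptron rule is guaranteed to converge; a non-separable target such as XOR would need a multi-layer network.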

Fuzzy Intelligence

Fuzzy Intelligence is a paradigm that is concerned with the investigation of fuzzy logic, which is a form of logic that is not constrained to true and false determinations like propositional logic, but rather uses functions that define approximate truth, or degrees of truth [Zadeh1996]. Fuzzy logic and fuzzy systems are used as a reasoning strategy and are typically applied to expert system and control system domains.
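Degrees of truth are usually expressed with membership functions that map a value to [0, 1]. The sketch below assumes triangular membership functions and uses min and max as the fuzzy AND and OR operators (Zadeh's standard choice); the temperature sets `warm` and `hot` and their breakpoints are hypothetical.

```python
def triangular(x, a, b, c):
    """Degree of membership in [0, 1] for a triangular fuzzy set peaking at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets for room temperature in degrees Celsius.
def warm(t):
    return triangular(t, 15.0, 22.0, 29.0)

def hot(t):
    return triangular(t, 25.0, 35.0, 45.0)

t = 27.0
print(warm(t), hot(t))                               # partly warm and partly hot
print(min(warm(t), hot(t)), max(warm(t), hot(t)))    # fuzzy AND, fuzzy OR
```

A crisp rule would force 27 degrees to be either warm or hot; here it is warm to degree 2/7 and hot to degree 0.2 at the same time.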

Artificial Immune Systems

A collection of approaches inspired by the structure and function of the acquired immune system of vertebrates. Popular approaches include clonal selection, negative selection, the dendritic cell algorithm, and immune network algorithms. The immune-inspired adaptive processes vary in strategy and show similarities to the fields of Evolutionary Computation and Artificial Neural Networks, and are typically applied to optimization and pattern recognition domains [Castro2002].
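A clonal selection strategy can be sketched as: select the highest-affinity cells, clone them, mutate the clones more strongly the worse their parent's affinity, and keep the best. This is a simplified sketch in the spirit of CLONALG, assuming a fixed clone count per selected cell (the canonical algorithm varies it with affinity); the affinity function and all parameters are hypothetical.

```python
import math
import random

def affinity(x):
    """Hypothetical function to maximize on [0, 1]; peak value 1.0 at x = 0.5."""
    return math.exp(-((x - 0.5) ** 2) / 0.01)

def clonal_selection(pop_size=20, n_select=5, clones_per=5, gens=50, seed=1):
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=affinity, reverse=True)
        clones = []
        for x in pop[:n_select]:
            # Hypermutation: lower affinity gives a larger mutation step.
            step = 0.1 * (1.0 - affinity(x)) + 0.001
            for _ in range(clones_per):
                clones.append(min(1.0, max(0.0, x + rng.gauss(0.0, step))))
        # Keep the best cells, then replace the worst with random newcomers.
        pop = sorted(pop + clones, key=affinity, reverse=True)[:pop_size - 2]
        pop += [rng.random() for _ in range(2)]
    return max(pop, key=affinity)

best = clonal_selection()
print(best, affinity(best))
```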

Metaheuristics

Another popular name for the strategy-outcome perspective of scruffy AI is metaheuristics. In this context, a heuristic is an algorithm that locates 'good enough' solutions to a problem without concern for whether the solution can be proven to be correct or optimal [Michalewicz2004]. Heuristic methods trade off concerns such as precision, quality, and accuracy in favor of computational effort (space and time efficiency). The greedy search procedure that only takes cost-improving steps is an example of a heuristic method.
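The greedy procedure just described can be sketched as a descent that moves to the best neighbor only while that strictly lowers the cost. The one-dimensional integer landscape and its +/-1 neighborhood below are hypothetical, chosen so that the heuristic visibly settles for a 'good enough' local minimum.

```python
def greedy_descent(cost, start, neighbors):
    """Take the best neighboring step only while it strictly lowers the cost."""
    current = start
    while True:
        best = min(neighbors(current), key=cost)
        if cost(best) >= cost(current):
            return current  # no improving step remains: a local optimum
        current = best

# Hypothetical 1-D landscape: a local minimum at x = 2, the global minimum at x = 8.
COSTS = [5, 3, 1, 4, 6, 5, 3, 2, 0, 2, 4]

def cost(x):
    return COSTS[x]

def neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n < len(COSTS)]

print(greedy_descent(cost, 0, neighbors))  # → 2 (stuck in the local minimum)
print(greedy_descent(cost, 5, neighbors))  # → 8
```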

Like heuristics, metaheuristics may be considered a general algorithmic framework that can be applied to different optimization problems with relatively few modifications to adapt them to a specific problem [Glover2003] [Talbi2009]. The difference is that metaheuristics are intended to extend the capabilities of heuristics by combining one or more heuristic methods (referred to as procedures) using a higher-level strategy (hence 'meta'). A procedure in a metaheuristic is considered black-box in that little (if any) prior knowledge is known about it by the metaheuristic, and as such it may be replaced with a different procedure. Procedures may be as simple as the manipulation of a representation, or as complex as another complete metaheuristic. Some examples of metaheuristics include iterated local search, tabu search, the genetic algorithm, ant colony optimization, and simulated annealing.
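Iterated local search is perhaps the simplest illustration of a higher-level strategy wrapped around a black-box procedure: perturb the incumbent, re-run the embedded local search, and accept the result if it is no worse. In this sketch the landscape, the perturbation size `kick`, and the acceptance rule are illustrative assumptions; `local_search` could be swapped for any other procedure without changing the outer loop.

```python
import random

COSTS = [5, 3, 1, 4, 6, 5, 3, 2, 0, 2, 4]  # hypothetical landscape; global minimum at x = 8

def cost(x):
    return COSTS[x]

def local_search(x):
    """Embedded heuristic procedure: greedy descent over the +/-1 neighborhood."""
    while True:
        nbrs = [n for n in (x - 1, x + 1) if 0 <= n < len(COSTS)]
        best = min(nbrs, key=cost)
        if cost(best) >= cost(x):
            return x
        x = best

def iterated_local_search(start, iterations=100, kick=4, seed=1):
    """Higher-level strategy: perturb, re-run local search, keep if no worse."""
    rng = random.Random(seed)
    best = local_search(start)
    for _ in range(iterations):
        perturbed = min(len(COSTS) - 1, max(0, best + rng.randint(-kick, kick)))
        candidate = local_search(perturbed)
        if cost(candidate) <= cost(best):
            best = candidate
    return best

print(local_search(0), iterated_local_search(0))
```

Started from the same point, the embedded local search alone stops in the valley at x = 2, while the perturbations let the outer strategy escape to the global minimum.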

Blum and Roli outline nine properties of metaheuristics [Blum2003], as follows:

Metaheuristics are strategies that "guide" the search process.
The goal is to efficiently explore the search space in order to find (near-)optimal solutions.
Techniques which constitute metaheuristic algorithms range from simple local search procedures to complex learning processes.
Metaheuristic algorithms are approximate and usually non-deterministic.
They may incorporate mechanisms to avoid getting trapped in confined areas of the search space.
The basic concepts of metaheuristics permit an abstract level description.
Metaheuristics are not problem-specific.
Metaheuristics may make use of domain-specific knowledge in the form of heuristics that are controlled by the upper level strategy.
Today's more advanced metaheuristics use search experience (embodied in some form of memory) to guide the search.

Hyperheuristics are yet another extension that focuses on heuristics that modify their parameters (online or offline) to improve the efficacy of the solution or the efficiency of the computation. Hyperheuristics provide high-level strategies that may employ machine learning and adapt their search behavior by modifying the application of the sub-procedures or even which procedures are used (operating on the space of heuristics, which in turn operate within the problem domain) [Burke2003a] [Burke2003].
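The idea of operating on the space of heuristics can be sketched as a simple selection hyperheuristic that learns online which low-level heuristic to apply. Everything here is an illustrative assumption: the OneMax problem, the two low-level heuristics, the credit-update rule, and the accept-if-no-worse criterion. Because `reverse_segment` only reorders bits, it never changes the OneMax cost and so earns no credit, while `flip_one` is rewarded whenever it improves the solution.

```python
import random

def onemax(bits):
    return sum(bits)

# Hypothetical low-level heuristics: each returns a modified copy of the solution.
def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

def reverse_segment(bits, rng):
    i, j = sorted(rng.sample(range(len(bits)), 2))
    return bits[:i] + bits[i:j][::-1] + bits[j:]

def selection_hyperheuristic(n_bits=40, iterations=2000, seed=1):
    rng = random.Random(seed)
    heuristics = [flip_one, reverse_segment]
    scores = [1.0] * len(heuristics)  # learned credit for each heuristic
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(iterations):
        # Choose a heuristic (not a solution) in proportion to its credit.
        k = rng.choices(range(len(heuristics)), weights=scores)[0]
        candidate = heuristics[k](current, rng)
        gain = onemax(candidate) - onemax(current)
        scores[k] = max(0.1, 0.9 * scores[k] + gain)  # online credit update
        if gain >= 0:
            current = candidate
    return current, scores

current, scores = selection_hyperheuristic()
print(onemax(current), scores)
```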

Clever Algorithms

This book is concerned with 'clever algorithms', which are algorithms drawn from many sub-fields of artificial intelligence, including but not limited to the scruffy fields of biologically inspired computation, computational intelligence, and metaheuristics. The term 'clever algorithms' is intended to unify a collection of interesting and useful computational tools under a consistent and accessible banner. An alternative name (Inspired Algorithms) was considered, although ultimately rejected given that not all of the algorithms to be described in the project have an inspiration (specifically a biological or physical inspiration) for their computational strategy. The set of algorithms described in this book may generally be referred to as 'unconventional optimization algorithms' (for example, see [Corne1999]), as optimization is the main form of computation provided by the listed approaches. A technically more appropriate name for these approaches is stochastic global optimization (for example, see [Weise2007] and [Luke2009]).

Algorithms were selected in order to provide a rich and interesting coverage of the fields of Biologically Inspired Computation, Metaheuristics and Computational Intelligence. Rather than a coverage of just the state-of-the-art and popular methods, the algorithms presented also include historic and newly described methods. The final selection was designed to provoke curiosity and encourage exploration and a wider view of the field.

Problem Domains

Algorithms from the fields of Computational Intelligence, Biologically Inspired Computing, and Metaheuristics are applied to difficult problems, to which more traditional approaches may not be suited. Michalewicz and Fogel propose five reasons why problems may be difficult [Michalewicz2004] (page 11):

The number of possible solutions in the search space is so large as to forbid an exhaustive search for the best answer.
The problem is so complicated, that just to facilitate any answer at all, we have to use such simplified models of the problem that any result is essentially useless.
The evaluation function that describes the quality of any proposed solution is noisy or varies with time, thereby requiring not just a single solution but an entire series of solutions.
The possible solutions are so heavily constrained that constructing even one feasible answer is difficult, let alone searching for an optimal solution.
The person solving the problem is inadequately prepared or imagines some psychological barrier that prevents them from discovering a solution.
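The first of these reasons is easy to quantify. As an illustration (the Traveling Salesman Problem is used here only as a familiar example, and the rate of one billion evaluations per second is a hypothetical assumption), a symmetric TSP with $n$ cities has $(n-1)!/2$ distinct tours, which quickly puts exhaustive enumeration out of reach:

```python
import math

def tsp_tours(n):
    """Distinct tours of a symmetric TSP with n cities: (n - 1)! / 2."""
    return math.factorial(n - 1) // 2

EVALS_PER_SECOND = 1e9                 # hypothetical evaluation rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

for n in (5, 10, 20, 30):
    tours = tsp_tours(n)
    years = tours / EVALS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{n:>3} cities: {tours} tours, about {years:.3g} years to enumerate")
```

At 20 cities the count is already about $6 \times 10^{16}$ tours, or roughly two years of enumeration at the assumed rate, and each additional city multiplies the total again.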

This section introduces two problem formalisms that embody many of the most difficult problems faced by Artificial and Computational Intelligence. They are: Function Optimization and Function Approximation. Each class of problem is described in terms of its general properties, a formalism, and a set of specialized sub-problems. These problem classes provide a tangible framing of the algorithmic techniques described throughout the work.

Function Optimization

Real-world optimization problems and generalizations thereof can be drawn from most fields of science, engineering, and information technology (for a sample, see [Ali1997] [Toern1999]). Importantly, function optimization problems have had a long tradition in the field of Artificial Intelligence in motivating basic research into new problem solving techniques, and in investigating and verifying systemic behavior against benchmark problem instances.

Problem Description

Mathematically, optimization is defined as the search for a combination of parameters, commonly referred to as decision variables ($x = \left\{x_1, x_2, x_3, \ldots, x_n\right\}$), which minimize or maximize some ordinal quantity ($c$) (typically a scalar called a score or cost) assigned by an objective function or cost function ($f$), under a set of constraints ($g = \left\{g_1, g_2, g_3, \ldots, g_m\right\}$). For example, a general minimization case seeks a solution $x^*$ such that $f(x^*) \leq f(x)$ for all feasible $x$. Constraints may provide boundaries on decision variables (for example, in a real-valued hypercube $\Re^n$), or may more generally define regions of feasibility and infeasibility in the decision variable space. In applied mathematics, the field may be referred to as Mathematical Programming. More generally, the field may be referred to as Global or Function Optimization given the focus on the objective function. For more general information on optimization, refer to Horst et al. [Horst2000].
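Written out in the conventional constrained form, and assuming (as one common convention, not the only one) that each constraint is expressed as $g_j(x) \le 0$, the minimization case reads as follows, together with a small worked instance chosen purely for illustration:

```latex
\begin{aligned}
&\underset{x \in \Re^n}{\text{minimize}} && f(x) \\
&\text{subject to} && g_j(x) \le 0, \quad j = 1, \ldots, m \\[6pt]
&\text{Example:} && f(x) = x_1^2 + x_2^2, \quad g_1(x) = 1 - x_1 - x_2 \le 0 \\
&\Rightarrow && x^* = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \quad f(x^*) = \tfrac{1}{2}
\end{aligned}
```

In the example, the unconstrained minimum of $f$ is the origin, which violates $g_1$; the best feasible point therefore lies on the boundary $x_1 + x_2 = 1$, where symmetry places it at $x_1 = x_2$.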

Sub-Fields of Study

The study of optimization comprises many specialized sub-fields, based on an overlapping taxonomy that focuses on the principal concerns in the general formalism. For example, with regard to the decision variables, one may consider univariate and multivariate optimization problems. The type of decision variables promotes specialities for continuous, discrete, and permutations of variables. Dependencies between decision variables under a cost function define the fields of Linear Programming, Quadratic Programming, and Nonlinear Programming. A large class of optimization problems can be reduced to discrete sets and are considered in the field of Combinatorial Optimization, for which many theoretical properties are known, most importantly that many interesting and relevant problems are NP-hard, meaning that no algorithm with polynomial time complexity is known to solve them (for example, see Papadimitriou and Steiglitz [Papadimitriou1998]).

The evaluation of variables against a cost function may collectively be considered a response surface. The shape of such a response surface may be convex, which is a class of functions for which many important theoretical findings have been made, most notably that locating a locally optimal configuration also means the globally optimal configuration of decision variables has been located [Boyd2004]. Many interesting real-world optimization problems produce cost surfaces that are non-convex or so-called multi-modal (rather than unimodal), suggesting that there are multiple peaks and valleys. (The term is taken from statistics, where it refers to the centers of mass in distributions; in optimization it refers to 'regions of interest' in the search space, in particular valleys in minimization and peaks in maximization cost surfaces.) Further, many real-world optimization problems with continuous decision variables cannot be differentiated given their complexity or limited information availability, meaning that derivative-based gradient descent methods (which are well understood) are not applicable, necessitating the use of so-called 'direct search' (sample or pattern-based) methods [Lewis2000]. Real-world objective function evaluation may be noisy, discontinuous, and/or dynamic, and the constraints of real-world problem solving may require an approximate solution in limited time or resources, motivating the need for heuristic approaches.
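The practical difference between convex and multi-modal surfaces shows up as soon as a local method such as gradient descent is run from different starting points. In this sketch the two one-dimensional test functions, the step size `lr`, and the number of steps are illustrative assumptions.

```python
def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Repeatedly step downhill along the negative derivative."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def convex_f(x):
    return (x - 1.0) ** 2            # one valley: minimum at x = 1

def convex_grad(x):
    return 2.0 * (x - 1.0)

def multi_f(x):
    return x**4 - 3.0 * x**2 + x     # two valleys, near x = -1.30 and x = 1.13

def multi_grad(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

# On the convex surface every start reaches the same (global) minimum.
print(gradient_descent(convex_grad, -5.0), gradient_descent(convex_grad, 5.0))

# On the multi-modal surface the result depends on the start point.
left = gradient_descent(multi_grad, -2.0)
right = gradient_descent(multi_grad, 2.0)
print(left, multi_f(left))    # global minimum
print(right, multi_f(right))  # local minimum with a higher cost
```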

Function Approximation

Real-world Function Approximation problems are among the most computationally difficult considered in the broader field of Artificial Intelligence for reasons including: incomplete information, high-dimensionality, noise in the sample observations, and non-linearities in the target function. This section considers the Function Approximation formalism and related specializations as a general motivating problem to contrast and compare with Function Optimization.

Problem Description

Function Approximation is the problem of finding a function ($f$) that approximates a target function ($g$), where typically the approximated function is selected based on a sample of observations ($x$, also referred to as the training set) taken from the unknown target function. In machine learning, the function approximation formalism is used to describe general problem types commonly referred to as pattern recognition, such as classification, clustering, and curve fitting, where the approximating function is called a decision or discrimination function. Such general problem types are described in terms of approximating an unknown Probability Density Function (PDF), which underlies the relationships in the problem space and is represented in the sample data. This perspective of such problems is commonly referred to as statistical machine learning and/or density estimation [Fukunaga1990] [Bishop1995].

Sub-Fields of Study

The function approximation formalism can be used to phrase some of the hardest problems faced by Computer Science, and Artificial Intelligence in particular, such as natural language processing and computer vision. The general process focuses on 1) the collection and preparation of the observations from the target function, 2) the selection and/or preparation of a model of the target function, and 3) the application and ongoing refinement of the prepared model. Some important problem-based sub-fields include:

Feature Selection where a feature is considered an aggregation of one-or-more attributes, where only those features that have meaning in the context of the target function are necessary to the modeling function [Kudo2000] [Guyon2003].
Classification where observations are inherently organized into labelled groups (classes) and a supervised process models an underlying discrimination function to classify unobserved samples.
Clustering where observations may be organized into groups based on underlying common features, although the groups are unlabeled requiring a process to model an underlying discrimination function without corrective feedback.
Curve or Surface Fitting where a model is prepared that provides a 'best-fit' (called a regression) for a set of observations that may be used for interpolation over known observations and extrapolation for observations outside what has been modeled.
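As one concrete instance of classification, the sketch below uses a nearest-neighbor rule as the discrimination function: an unobserved sample receives the label of the closest labeled observation. The nearest-neighbor choice, the Euclidean distance, and the tiny labeled data set are illustrative assumptions.

```python
import math

def nearest_neighbor(train, query):
    """Label a query point with the class of its closest training observation."""
    _, label = min(train, key=lambda obs: math.dist(obs[0], query))
    return label

# Hypothetical labeled observations: ((feature_1, feature_2), class_label).
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
print(nearest_neighbor(train, (1.2, 1.4)))  # → A
print(nearest_neighbor(train, (5.5, 4.8)))  # → B
```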

The field of Function Optimization is related to Function Approximation, as many sub-problems of Function Approximation may be defined as optimization problems. Many of the technique paradigms used for function approximation are differentiated based on the representation and the optimization process used to minimize error or maximize effectiveness on a given approximation problem. The difficulty of Function Approximation problems centers around 1) the nature of the unknown relationships between attributes and features, 2) the number (dimensionality) of attributes and features, and 3) general concerns of noise in such relationships and the dynamic availability of samples from the target function. Additional difficulties include the incorporation of prior knowledge (such as imbalance in samples, incomplete information, and the variable reliability of data), and problems of invariant features (such as transformation, translation, rotation, scaling, and skewing of features).
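Curve fitting illustrates the link directly: choosing a model (here a straight line, an assumption) turns approximation into minimizing an error function over the model parameters. This sketch minimizes mean squared error by gradient descent; the sample data, learning rate, and epoch count are hypothetical.

```python
def fit_line(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ a*x + b by gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((a*x + b - y)^2).
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # hypothetical samples of y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)
```

The same optimization view carries over to richer representations; only the model and its error gradient change.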

Unconventional Optimization

Not all algorithms described in this book are for optimization, although those that are may be referred to as 'unconventional' to differentiate them from the more traditional approaches. Examples of traditional approaches include (but are not limited to) mathematical optimization algorithms (such as Newton's method and Gradient Descent, which use derivatives to locate a local minimum) and direct search methods (such as the Simplex method and the Nelder-Mead method, which use a search pattern to locate optima). Unconventional optimization algorithms are designed for the more difficult problem instances, the attributes of which were introduced in the Problem Domains section. This section introduces some common attributes of this class of algorithm.
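For contrast with the unconventional methods, here is a sketch of a traditional derivative-based method: Newton's method for one-dimensional minimization, which steps to the root of the first derivative using the second derivative. The test function $f(x) = e^x - 2x$ (minimum at $x = \ln 2$), the tolerance, and the starting point are illustrative assumptions. It needs both derivatives in closed form, which is exactly the information many difficult problems do not provide.

```python
import math

def newton_minimize(d1, d2, x0, tol=1e-12, max_iter=50):
    """Newton's method: x <- x - f'(x) / f''(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = d1(x) / d2(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = e^x - 2x, so f'(x) = e^x - 2 and f''(x) = e^x; the minimum is at ln 2.
x_min = newton_minimize(lambda x: math.exp(x) - 2.0, math.exp, 0.0)
print(x_min, math.log(2))
```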

Black Box Algorithms

Black Box optimization algorithms are those that exploit little, if any, information from a problem domain in order to devise a solution. They are generalized problem solving procedures that may be applied to a range of problems with very little modification [Droste2006]. Domain specific knowledge refers to known relationships between solution representations and the objective cost function. Generally speaking, the less domain specific information incorporated into a technique, the more flexible the technique, although the less efficient it will be for a given problem. For example, 'random search' is the most general black box approach and is also the most flexible, requiring only the generation of random solutions for a given problem. Random search allows resampling of the domain, which gives it a worst-case behavior that is worse than enumerating the entire search domain. In practice, the more prior knowledge available about a problem, the more information that can be exploited by a technique in order to efficiently locate a solution for the problem, heuristically or otherwise. Therefore, black box methods are suited to problems where little information from the problem domain is available to be used by a problem solving approach.
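Random search needs nothing but the ability to generate and evaluate candidates, which is what makes it the most general black box method. This minimal sketch assumes a box-bounded continuous domain with uniform sampling; the sphere cost function, bounds, and evaluation budget are illustrative.

```python
import random

def random_search(cost, bounds, max_evals=1000, seed=1):
    """Sample candidates uniformly at random and keep the best; no domain knowledge used."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(max_evals):
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost

best, best_cost = random_search(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_cost)
```

Nothing in `random_search` refers to the cost function's structure, so it applies unchanged to any problem with the same representation, and equally it gains nothing from that structure.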

No-Free-Lunch

The No-Free-Lunch Theorem of search and optimization by Wolpert and Macready proposes that all black box optimization algorithms perform equally well at searching for the extremum of a cost function when averaged over all possible functions [Wolpert1997] [Wolpert1995]. The theorem has caused a lot of pessimism and misunderstanding, particularly in relation to the evaluation and comparison of Metaheuristic and Computational Intelligence algorithms.

The implication of the theorem is that searching for the 'best' general-purpose black box optimization algorithm is irresponsible, as no such procedure is theoretically possible. No-Free-Lunch applies to stochastic and deterministic optimization algorithms as well as to algorithms that learn and adjust their search strategy over time. It is independent of the performance measure used and the representation selected. Wolpert and Macready's original paper was produced at a time when grandiose generalizations were being made as to algorithm, representation, or configuration superiority. The practical impact of the theorem is to encourage practitioners to bound claims of applicability for search and optimization algorithms. Wolpert and Macready encouraged that effort be put into devising practical problem classes and into the matching of suitable algorithms to problem classes. Further, they compelled practitioners to exploit domain knowledge in optimization algorithm application, which is now an axiom in the field.
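The averaging claim can be checked exhaustively on a toy scale (an illustration, not a proof). The sketch enumerates every function from a hypothetical 4-point domain to 3 cost values and compares a fixed-order search with a hypothetical adaptive search that chooses its next point based on the costs it has seen. Both never revisit a point, and their average best cost after m evaluations comes out identical.

```python
from itertools import product

X = [0, 1, 2, 3]  # tiny hypothetical search space
Y = [0, 1, 2]     # possible cost values

def fixed_order(f, m):
    """Non-adaptive search: evaluate points in a fixed order."""
    return [f[x] for x in X[:m]]

def adaptive(f, m):
    """Adaptive search: the next point depends on the costs seen so far (no revisits)."""
    seen, visited = [], []
    x = 3
    for _ in range(m):
        visited.append(x)
        seen.append(f[x])
        unvisited = [p for p in X if p not in visited]
        if not unvisited:
            break
        # Hypothetical rule: after a zero cost take the lowest unvisited index, else the highest.
        x = unvisited[0] if f[x] == 0 else unvisited[-1]
    return seen

def average_best(search, m):
    """Mean best (lowest) cost found in m evaluations, over ALL functions X -> Y."""
    functions = list(product(Y, repeat=len(X)))  # tuple f maps x to f[x]
    return sum(min(search(f, m)) for f in functions) / len(functions)

for m in (1, 2, 3):
    print(m, average_best(fixed_order, m), average_best(adaptive, m))
```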

Stochastic Optimization

Stochastic optimization algorithms are those that use randomness to elicit non-deterministic behaviors, in contrast to purely deterministic procedures. Most algorithms from the fields of Computational Intelligence, Biologically Inspired Computation, and Metaheuristics may be considered to belong to the field of Stochastic Optimization. Algorithms that exploit randomness are not random in behavior; rather, they sample a problem space in a biased manner, focusing on areas of interest and neglecting less interesting areas [Spall2003]. A class of techniques that focus on the stochastic sampling of a domain, called Markov Chain Monte Carlo (MCMC) algorithms, provide good average performance and generally offer a low chance of worst-case performance. Such approaches are suited to problems with many coupled degrees of freedom, for example large, high-dimensional spaces. MCMC approaches involve stochastically sampling from a target distribution function, similar to Monte Carlo simulation methods, using a process that resembles a biased Markov chain.

Monte Carlo methods are used for selecting a statistical sample to approximate a given target probability density function and are traditionally used in statistical physics. Samples are drawn sequentially and the process may include criteria for rejecting samples and biasing the sampling locations within high-dimensional spaces.
Markov Chain processes provide a probabilistic model for state transitions or moves within a discrete domain called a walk or a chain of steps. A Markov system is only dependent on the current position in the domain in order to probabilistically determine the next step in the walk.

MCMC techniques combine these two approaches to solve integration and optimization problems in high-dimensional spaces by generating samples while exploring the space using a Markov chain process, rather than sequentially or independently [Andrieu2003]. The step generation is configured to bias sampling in more important regions of the domain. Three examples of MCMC techniques are the Metropolis-Hastings algorithm, Simulated Annealing for global optimization, and the Gibbs sampler, which are commonly employed in the fields of physics, chemistry, statistics, and economics.
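A minimal sketch of the combination: random-walk Metropolis, the symmetric-proposal special case of Metropolis-Hastings. Each proposal depends only on the current state (the Markov property), and the accept/reject rule biases where the chain spends its time so that, in the long run, samples follow the target distribution. The standard normal target, the Gaussian proposal width `step_size`, the starting point, and the chain length are illustrative assumptions.

```python
import math
import random

def metropolis(log_target, x0, steps=20000, step_size=1.0, seed=1):
    """Random-walk Metropolis: a Markov chain whose stationary distribution is the target."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)  # depends only on the current state
        # Accept with probability min(1, target(proposal) / target(x)).
        u = 1.0 - rng.random()                    # uniform on (0, 1], safe for log
        if math.log(u) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Unnormalized standard normal target: log p(x) = -x^2 / 2 (up to a constant).
samples = metropolis(lambda x: -0.5 * x * x, x0=3.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

Only ratios of the target density are needed, which is why MCMC works with unnormalized densities.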

Inductive Learning

Many unconventional optimization algorithms employ a process that includes the iterative improvement of candidate solutions against an objective cost function. This process of adaptation is generally a method by which the process obtains characteristics that improve the relative performance of the system (candidate solution) in an environment (cost function). This adaptive behavior is commonly achieved through a 'selectionist process' of repetition of the steps: generation, test, and selection. The use of non-deterministic processes means that the sampling of the domain (the generation step) is typically non-parametric, although guided by past experience.

The method of acquiring information is called inductive learning or learning from example, where the approach uses the implicit assumption that specific examples are representative of the broader information content of the environment, specifically with regard to anticipated need. Many unconventional optimization approaches maintain a single candidate solution, a population of samples, or a compression thereof that provides both an instantaneous representation of all of the information acquired by the process and the basis for generating new samples and making future decisions.

This method of simultaneously acquiring and improving information from the domain and optimizing decision making (where to direct future effort) is called the $k$-armed bandit (two-armed and multi-armed bandit) problem from the field of statistical decision making known as game theory [Robbins1952] [Bergemann2006]. This formalism considers the capability of a strategy to allocate available resources in proportion to the future payoff the strategy is expected to receive. The classic example is the two-armed bandit problem used by Goldberg to describe the behavior of the genetic algorithm [Goldberg1989]. The example involves an agent that learns which of two slot machines provides the greater return by pulling the handle of each (sampling the domain) and biasing future handle pulls in proportion to the expected utility, based on its probabilistic experience of the past distribution of payoffs. The formalism may also be used to understand the properties of inductive learning demonstrated by the adaptive behavior of most unconventional optimization algorithms.
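A minimal sketch of the two-armed bandit follows. The payoff probabilities, the smoothed payoff estimate, and the rule of choosing each arm with probability proportional to its estimated payoff are assumptions made for illustration; this is not Goldberg's analysis, and the names `PAYOFF_PROBABILITIES`, `pull`, `estimate`, `choose_arm`, and `play` are hypothetical.

```ruby
# Hypothetical payoff probabilities for the two slot machines (arms).
PAYOFF_PROBABILITIES = [0.3, 0.7]

# Pull an arm: returns 1 with that arm's payoff probability, otherwise 0.
def pull(arm)
  rand < PAYOFF_PROBABILITIES[arm] ? 1 : 0
end

# Smoothed estimate of an arm's expected payoff from past experience;
# the +1/+2 (Laplace) smoothing avoids a zero estimate before any wins.
def estimate(wins, pulls)
  (wins + 1.0) / (pulls + 2.0)
end

# Choose an arm with probability proportional to its estimated payoff.
def choose_arm(wins, pulls)
  estimates = wins.each_index.map { |i| estimate(wins[i], pulls[i]) }
  r = rand * estimates.inject(:+)
  estimates.each_with_index do |e, arm|
    return arm if r < e
    r -= e
  end
  estimates.size - 1 # guard against floating-point round-off
end

# Play a fixed number of trials, recording wins and pulls for each arm.
def play(trials)
  wins = [0, 0]
  pulls = [0, 0]
  trials.times do
    arm = choose_arm(wins, pulls)
    wins[arm] += pull(arm)
    pulls[arm] += 1
  end
  [wins, pulls]
end

srand(3)
wins, pulls = play(1_000)
puts "pulls per arm: #{pulls.inspect}, total payoff: #{wins.inject(:+)}"
```

As experience accumulates, the estimates approach the true payoff rates and most pulls go to the better arm, while the weaker arm still receives occasional samples.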

The stochastic iterative process of generate and test can be computationally wasteful, potentially re-searching areas of the problem space already searched, and requiring many trials or samples to achieve a 'good enough' solution. The limited use of prior knowledge from the domain (black box), coupled with the stochastic sampling process, means that the adapted solutions are created without top-down insight or instruction, and such solutions can sometimes be interesting, innovative, and even competitive with decades of human expertise [Koza2003].

Book Organization

The remainder of this book is organized into two parts: Algorithms, which describes a large number of techniques in a complete and consistent manner presented in rough algorithm groups; and Extensions, which reviews more advanced topics suitable for when a number of algorithms have been mastered.

Algorithms

Algorithms are presented in seven groups or kingdoms distilled from the broader fields of study, each in its own chapter, as follows:

Stochastic Algorithms that focus on the introduction of randomness into heuristic methods (Stochastic Algorithms Chapter).
Evolutionary Algorithms inspired by evolution by means of natural selection (Evolutionary Algorithms Chapter).
Physical Algorithms inspired by physical and social systems (Physical Algorithms Chapter).
Probabilistic Algorithms that focus on methods that build models and estimate distributions in search domains (Probabilistic Algorithms Chapter).
Swarm Algorithms that focus on methods that exploit the properties of collective intelligence (Swarm Algorithms Chapter).
Immune Algorithms inspired by the adaptive immune system of vertebrates (Immune Algorithms Chapter).
Neural Algorithms inspired by the plasticity and learning qualities of the human nervous system (Neural Algorithms Chapter).

A given algorithm is more than just a procedure or code listing; each approach is an island of research. The meta-information that defines the context of a technique is just as important to understanding and application as abstract recipes and concrete implementations. A standardized algorithm description is adopted to provide a consistent presentation of algorithms with a mixture of softer narrative descriptions, programmatic descriptions both abstract and concrete, and, most importantly, useful sources for finding out more information about the technique.

The standardized algorithm description template covers the following subjects:

Name: The algorithm name defines the canonical name used to refer to the technique, in addition to common aliases, abbreviations, and acronyms. The name is used as the heading of an algorithm description.
Taxonomy: The algorithm taxonomy defines where a technique fits into the field, both the specific sub-fields of Computational Intelligence and Biologically Inspired Computation as well as the broader field of Artificial Intelligence. The taxonomy also provides a context for determining the relationships between algorithms.
Inspiration: (where appropriate) The inspiration describes the specific system or process that provoked the inception of the algorithm. The inspiring system may non-exclusively be natural, biological, physical, or social. The description of the inspiring system may include relevant domain specific theory, observation, nomenclature, and those salient attributes of the system that are somehow abstractly or conceptually manifest in the technique.
Metaphor: (where appropriate) The metaphor is a description of the technique in the context of the inspiring system or a different suitable system. The features of the technique are made apparent through an analogous description of the features of the inspiring system. The explanation through analogy is not expected to be literal; rather, the method is used as an allegorical communication tool. The inspiring system is not explicitly described; this is the role of the 'inspiration' topic, which represents a loose dependency for this topic.
Strategy: The strategy is an abstract description of the computational model. The strategy describes the information processing actions a technique shall take in order to achieve an objective, providing a logical separation between a computational realization (procedure) and an analogous system (metaphor). A given problem solving strategy may be realized as one of a number of specific algorithms or problem solving systems.
Procedure: The algorithmic procedure summarizes the specifics of realizing a strategy as a systemized and parameterized computation. It outlines how the algorithm is organized in terms of the computation, data structures, and representations.
Heuristics: The heuristics section describes the commonsense, best practice, and demonstrated rules for applying and configuring a parameterized algorithm. The heuristics relate to the technical details of the technique's procedure and data structures for general classes of application (neither specific implementations nor specific problem instances).
Code Listing: The code listing description provides a minimal but functional version of the technique implemented with a programming language. The code description can be typed into a computer and provides a working execution of the technique. The technique implementation also includes a minimal problem instance to which it is applied, and both the problem and algorithm implementations are complete enough to demonstrate the technique's procedure. The description is presented as a programming source code listing with a terse introductory summary.
References: The references section includes a listing of both primary sources of information about the technique as well as useful introductory sources for novices to gain a deeper understanding of the theory and application of the technique. The description consists of hand-selected reference material including books, peer reviewed conference papers, and journal articles.

Source code examples are included in the algorithm descriptions, and the Ruby Programming Language was selected for use throughout the book. Ruby was selected because it supports the procedural programming paradigm, which was adopted to ensure that examples can be easily ported to object-oriented and other paradigms. Additionally, Ruby is an interpreted language, meaning the code can be executed directly without a separate compilation step, and it is free to download and use from the Internet (http://www.ruby-lang.org). Ruby is concise, expressive, and supports meta-programming features that improve the readability of code examples.

The sample code provides a working version of a given technique for demonstration purposes. Having a tinker with a technique can really bring it to life and provide valuable insight into a method. The sample code is a minimal implementation, providing plenty of opportunity to explore, extend, and optimize. All of the source code for the algorithms presented in this book is available from the companion website, online at http://www.CleverAlgorithms.com. All algorithm implementations were tested with Ruby 1.8.6, 1.8.7 and 1.9.

Extensions

There are some advanced topics that cannot be meaningfully considered until one has a firm grasp of a number of algorithms, and these are discussed at the back of the book. The Advanced Topics chapter addresses topics such as: the use of alternative programming paradigms when implementing clever algorithms, methodologies used when devising entirely new approaches, strategies to consider when testing clever algorithms, visualizing the behavior and results of algorithms, and comparing algorithms based on the results they produce using statistical methods. Like the background information provided in this chapter, the extensions provide a gentle introduction and starting point into some advanced topics, and references for seeking a deeper understanding.

How to Read this Book

This book is a reference text that provides a large compendium of algorithm descriptions. It is a trusted handbook of practical computational recipes to be consulted when one is confronted with difficult function optimization and approximation problems. It is also an encompassing guidebook of modern heuristic methods that may be browsed for inspiration, exploration, and general interest.

The audience for this work may be interested in the fields of Computational Intelligence, Biologically Inspired Computation, and Metaheuristics and may count themselves as belonging to one of the following broader groups:

Scientists: Research scientists concerned with theoretically or empirically investigating algorithms, addressing questions such as: What is the motivating system and strategy for a given technique? What are some algorithms that may be used in a comparison within a given subfield or across subfields?
Engineers: Programmers and developers concerned with implementing, applying, or maintaining algorithms, addressing questions such as: What is the procedure for a given technique? What are the best practice heuristics for employing a given technique?
Students: Undergraduate and graduate students interested in learning about techniques, addressing questions such as: What are some interesting algorithms to study? How is a given approach implemented?
Amateurs: Practitioners interested in knowing more about algorithms, addressing questions such as: What classes of techniques exist and what algorithms do they provide? How can the computation of a technique be conceptualized?

Bibliography

[Aaronson2005]	S. Aaronson, "NP-complete problems and physical reality", ACM SIGACT News (COLUMN: Complexity theory), 2005.
[Ali1997]	M. M. Ali and C. Storey and A. Törn, "Application of Stochastic Global Optimization Algorithms to Practical Problems", Journal of Optimization Theory and Applications, 1997.
[Andrieu2003]	C. Andrieu and N. de Freitas and A. Doucet and M. I. Jordan, "An Introduction to MCMC for Machine Learning", Machine Learning, 2003.
[Baeck2000]	T. Bäck and D. B. Fogel and Z. Michalewicz (editors), "Evolutionary Computation 1: Basic Algorithms and Operators", IoP, 2000.
[Baeck2000a]	T. Bäck and D. B. Fogel and Z. Michalewicz (editors), "Evolutionary Computation 2: Advanced Algorithms and Operations", IoP, 2000.
[Benyus1998]	J. M. Benyus, "Biomimicry: Innovation Inspired by Nature", Quill, 1998.
[Bergemann2006]	D. Bergemann and J. Valimaki, "Bandit Problems", Technical Report 1551, Cowles Foundation, Yale University, 2006.
[Bishop1995]	C. M. Bishop, "Neural Networks for Pattern Recognition", Oxford University Press, 1995.
[Blum2003]	C. Blum and A. Roli, "Metaheuristics in combinatorial optimization: Overview and conceptual comparison", ACM Computing Surveys (CSUR), 2003.
[Bonabeau1999]	E. Bonabeau and M. Dorigo and G. Theraulaz, "Swarm Intelligence: From Natural to Artificial Systems", Oxford University Press US, 1999.
[Boyd2004]	S. Boyd and L. Vandenberghe, "Convex Optimization", Cambridge University Press, 2004.
[Burke2003]	E. K. Burke and G. Kendall and E. Soubeiga, "A Tabu-Search Hyper-Heuristic for Timetabling and Rostering", Journal of Heuristics, 2003.
[Burke2003a]	E. K. Burke and E. Hart and G. Kendall and J. Newall and P. Ross and S. Schulenburg, "Hyper-heuristics: An emerging direction in modern search technology", in Handbook of Metaheuristics, pages 457–474, Kluwer, 2003.
[Castro2002]	L. N. de Castro and J. Timmis, "Artificial Immune Systems: A New Computational Intelligence Approach", Springer, 2002.
[Castro2005]	L. N. de Castro and F. J. Von Zuben, "Recent developments in biologically inspired computing", Idea Group Inc, 2005.
[Castro2005a]	L. N. de Castro and F. J. Von Zuben, "From biologically inspired computing to natural computing", in Recent developments in biologically inspired computing, Idea Group, 2005.
[Corne1999]	D. Corne and M. Dorigo and F. Glover, "New Ideas in Optimization", McGraw-Hill, 1999.
[Dorigo2004]	M. Dorigo and T. Stützle, "Ant Colony Optimization", MIT Press, 2004.
[Droste2006]	S. Droste and T. Jansen and I. Wegener, "Upper and Lower Bounds for Randomized Search Heuristics in Black-Box Optimization", Theory of Computing Systems, 2006.
[Engelbrecht2007]	A. P. Engelbrecht, "Computational Intelligence: An Introduction", John Wiley and Sons, 2007.
[Flanagan2008]	D. Flanagan and Y. Matsumoto, "The Ruby Programming Language", O'Reilly Media, 2008.
[Forbes2000]	N. Forbes, "Biologically inspired computing", Computing in Science and Engineering, 2000.
[Forbes2005]	N. Forbes, "Imitation of Life: How Biology Is Inspiring Computing", The MIT Press, 2005.
[Fukunaga1990]	K. Fukunaga, "Introduction to Statistical Pattern Recognition", Academic Press, 1990.
[Glover2003]	F. Glover and G. A. Kochenberger, "Handbook of Metaheuristics", Springer, 2003.
[Goldberg1989]	D. E. Goldberg, "Genetic Algorithms in Search, Optimization, and Machine Learning", Addison-Wesley, 1989.
[Guyon2003]	I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection", Journal of Machine Learning Research, 2003.
[Holland1975]	J. H. Holland, "Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence", University of Michigan Press, 1975.
[Horst2000]	R. Horst and P. M. Pardalos and N. V. Thoai, "Introduction to Global Optimization", Kluwer Academic Publishers, 2000.
[Kennedy2001]	J. Kennedy and R. C. Eberhart and Y. Shi, "Swarm Intelligence", Morgan Kaufmann, 2001.
[Koza2003]	J. R. Koza and M. A. Keane and M. J. Streeter and W. Mydlowec and J. Yu and G. Lanza, "Genetic Programming IV: Routine Human-Competitive Machine Intelligence", Springer, 2003.
[Kudo2000]	M. Kudo and J. Sklansky, "Comparison of algorithms that select features for pattern classifiers", Pattern Recognition, 2000.
[Lewis2000]	R. M. Lewis and V. Torczon and M. W. Trosset, "Direct search methods: then and now", Journal of Computational and Applied Mathematics, 2000.
[Luger1993]	G. F. Luger and W. A. Stubblefield, "Artificial Intelligence: Structures and Strategies for Complex Problem Solving", Benjamin/Cummings Pub. Co., 1993.
[Luke2009]	S. Luke, "Essentials of Metaheuristics", Lulu, 2010.
[Marrow2000]	P. Marrow, "Nature-inspired computing technology and applications", BT Technology Journal, 2000.
[Michalewicz2004]	Z. Michalewicz and D. B. Fogel, "How to Solve It: Modern Heuristics", Springer, 2004.
[Papadimitriou1998]	C. H. Papadimitriou and K. Steiglitz, "Combinatorial Optimization: Algorithms and Complexity", Courier Dover Publications, 1998.
[Paton1994]	R. Paton, "Introduction to computing with biological metaphors", in Computing With Biological Metaphors, pages 1–8, Chapman & Hall, 1994.
[Paun2005]	G. Păun, "Bio-inspired computing paradigms (natural computing)", Unconventional Programming Paradigms, 2005.
[Pedrycz1997]	W. Pedrycz, "Computational Intelligence: An Introduction", CRC Press, 1997.
[Robbins1952]	H. Robbins, "Some aspects of the sequential design of experiments", Bull. Amer. Math. Soc., 1952.
[Russell2009]	S. Russell and P. Norvig, "Artificial Intelligence: A Modern Approach", Prentice Hall, 2009.
[Sloman1990]	A. Sloman, "Must intelligent systems be scruffy?", in Evolving Knowledge in Natural Science and Artificial Intelligence, Pitman, 1990.
[Spall2003]	J. C. Spall, "Introduction to stochastic search and optimization: estimation, simulation, and control", John Wiley and Sons, 2003.
[Talbi2009]	E. G. Talbi, "Metaheuristics: From Design to Implementation", John Wiley and Sons, 2009.
[Thomas2004]	D. Thomas and C. Fowler and A. Hunt, "Programming Ruby: The Pragmatic Programmers' Guide", Pragmatic Bookshelf, 2004.
[Toern1999]	A. Törn and M. M. Ali and S. Viitanen, "Stochastic Global Optimization: Problem Classes and Solution Techniques", Journal of Global Optimization, 1999.
[Weise2007]	T. Weise, "Global Optimization Algorithms - Theory and Application", (Self Published), 2007.
[Wolpert1995]	D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Search", Santa Fe Institute, 1995.
[Wolpert1997]	D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation, 1997.
[Zadeh1996]	L. A. Zadeh and G. J. Klir and B. Yuan, "Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers", World Scientific, 1996.
