Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee

Problem Solving Strategies

The field of Data Mining has clear methodologies that guide a practitioner to solve problems, such as Knowledge Discovery in Databases (KDD) [Fayyad1996]. Metaheuristics and Computational Intelligence algorithms have no such methodology. (Some methods can be used for classification and regression and as such may fit into methodologies such as KDD.)

This section describes some of the considerations when applying algorithms from the fields of Metaheuristics, Computational Intelligence, and Biologically Inspired Computation to practical problem domains. This discussion includes:

The suitability of application of a given technique to a given problem and the transferability of algorithm and problem features
The distinction between strong and weak methods which use more or less problem specific information respectively, and the continuum between these extremes.
A summary of problem solving strategies that suggest different ways of applying a given technique to the function optimization and approximation fields.

Suitability of Application

From a problem-solving perspective, the tools that emerge from the field of Computational Intelligence are generally assessed with regard to their utility as efficiently or effectively solving problems. An important lesson from the No-Free-Lunch Theorem was to bound claims of applicability (see the Introduction), that is to consider the suitability of a given strategy with regard to the feature overlap with the attributes of a given problem domain. From a Computational Intelligence perspective, one may consider the architecture, processes, and constraints of a given strategy as the features of an approach.

The suitability of the application of a particular approach to a problem takes into considerations concerns such as the appropriateness (can the approach address the problem), the feasibility (available resources and related efficiency concerns), and the flexibility (ability to address unexpected or unintended effects). This section summarizes a general methodology toward addressing the problem of suitability in the context of Computational Intelligence tools. This methodology involves 1) the systematic elicitation of system and problem features, and 2) the consideration of the overlap of problem-problem, algorithm-algorithm, and problem-algorithm overlap of feature sets.

Systematic Feature Elicitation

A feature of a system (tool, strategy, model) or a problem is a distinctive element or property that may be used to differentiate it from similar and/or related cases. Examples may include functional concerns such as: processes, data structures, architectures, and constraints, as well as emergent concerns that may have a more subjective quality such as general behaviors, organizations, and higher-order structures. The process of the elicitation of features may be taken from a system or problem perspective:

System Perspective: This requires a strong focus on the lower level functional elements and investigations that work toward correlating specific controlled procedures towards predictable emergent behaviors.
Problem Perspective: May require both a generalization of the specific case to the general problem case, as well as a functional or logical decomposition into constituent parts.

Problem generalization and functional decomposition are important and commonly used patterns for problem solving in the broader fields of Artificial Intelligence and Machine Learning. The promotion of simplification and modularity can reduce the cost and complexity of achieving solutions [Russell2009] [Brooks1986].

Feature Overlap

Overlap in elicited features may be considered from three important perspectives: between systems, between problems, and between a system and a problem. Further, such overlap may be considered at different levels of detail with regard to generalized problem solving strategies and problem definitions. These overlap cases are considered as follows:

System Overlap defines the suitability of comparing one system to another, referred to as comparability. For example, systems may be considered for the same general problems and compared in terms of theoretical or empirical capability, the results of which may only be meaningful if the systems are significantly similar to each other as assessed in terms of feature overlap.
Problem Overlap defines the suitability of comparing one problem to another, referred to as transferability. From a systems focus, transferability refers to the capability of a technique on a given problem to be successfully applied to another problem, the result of which is only meaningful if there is a strong overlap between the problems under consideration.
System-Problem Overlap defines the suitability of a system on a given problem, referred to as applicability. For example, a system is considered suitable for a given problem if it has a significant overlap in capabilities with the requirements of the problem definition.

Such mappings are imprecise given the subjective assessment and complexity required in both the elicitation and consideration overlap of the of features, the hardest of which is expected to be the mapping between systems and problems. The mapping of salient features of algorithms and problems was proposed as an important reconciliation of the No-Free-Lunch Theorem by Wolpert and Macready [Wolpert1997], although the important difference of this approach is that the system and algorithm are given prior to the assessment. In their first work on the theorem, Wolpert and Macready specifically propose the elicitation of the features from a problem-first perspective, for which specialized algorithms can be defined [Wolpert1995]. Therefore, this methodology of suitability may be considered a generalization of this reconciliation suitable for the altered Computational Intelligence (strategy first) perspective on Artificial Intelligence.

Strong and Weak Methods

Generally, the methods from the fields of Metaheuristics, Computational Intelligence, and Biologically Inspired Computation may be considered weak methods. They are general purpose and are typically considered black-box solvers for a range of problem domains. The stronger the method, the more that must be known about the problem domain. Rather than discriminating techniques into weak and strong it is more useful to consider a continuum of methods from pure block box techniques that have few assumptions about the problem domain, to strong methods that exploit most or all of the problem specific information available.

For example, the Traveling Salesman Problem is an example of a combinatorial optimization problem. A naive (such a Random Search) black box method may simply explore permutations of the cities. Slightly stronger methods may initialize the search with a heuristic-generated technique (such as nearest neighbor) and explore the search space using a variation method that also exploits heuristic information about the domain (such as a 2-opt variation). Continuing along this theme, a stochastic method may explore the search space using a combination of probabilistic and heuristic information (such as Ant Colony Optimization algorithms). At the other end of the scale the stochastic elements are decreased or removed until one is left with pure heuristic methods such as the Lin-Kernighan heuristic [Lin1973] and exact algorithms from linear and dynamic programming that focus on the structure and nature of the problem [Woeginger2003].

Approaching a problem is not as simple as selecting the strongest method available and solving it. The following describes two potential strategies:

Start Strong: Select the strongest technique available and apply it to the problem. Difficult problems can be resistant to traditional methods for many intrinsic and extrinsic reasons. Use products from a strong technique (best solution found, heuristics) to seed the next weaker method in line.
Start Weak: Strong methods do not exist for all problems, and if they do exist, the computation, skill, and/or time resources may not be available to exploit them. Start with a weak technique and use it to learn about the problem domain. Use this information to make better decisions about subsequent techniques to try that can exploit what has been learned.

In a real-world engineering or business scenario, the objective is to solve a problem or achieve the best possible solution to the problem within the operating constraints. Concerns of algorithm and technique purity become less important than they may be in their respective fields of research. Both of the above strategies suggest an iterative methodology, where the product or knowledge gained from one technique may be used to prime a subsequent stronger or weaker technique.

Domain-Specific Strategies

An algorithm may be considered a strategy for problem solving. There are a wide range of ways in which a given algorithm can be used to solve a problem. Function Optimization and Function Approximation were presented as two general classes of problems to which the algorithms from the fields of Metaheuristics, Computational Intelligence, and Biologically Inspired Computation are applied. This section reviews general problem problem solving strategies that may be adopted for a given technique in each of these general problem domains.

Function Optimization

This section reviews a select set of strategies for addressing optimization problems from the field of Metaheuristics and Computational Intelligence to provide general insight into the state of the interaction between stochastic algorithms and the field of optimization. This section draws heavily from the field of Evolutionary Computation, Swarm Intelligence, and related Computational Intelligence sub-fields.

Global and Local Optimization Global Optimization refers to seeking a globally optimal structure or approximation thereof in a given problem domain. Global is differentiated from Local Optimization in that the latter focuses on locating an optimal structure within a constrained region of the decision variable search space, such as a single peak or valley (basin of attraction). In the literature, global optimization problems refers to the class of optimization problems that generally cannot be addressed through more conventional approaches such as gradient descent methods (that require mathematical derivatives) and pattern search (that can get 'stuck' in local optima and never converge) [Price1977] [Toern1999].

A global search strategy provides the benefit of making few if any assumptions about where promising areas of the search space may be, potentially highlighting unintuitive combinations of parameters. A local search strategy provides the benefit of focus and refinement of an existing candidate solution. It is common to apply a local search method to the solutions located by a global search procedure as a refinement strategy (such as using a Hill Climber after a Genetic Algorithm), and some methods have both techniques built in (such as GRASP in ).

Parallel Optimization A natural step toward addressing difficult (large and rugged cost landscapes) is to exploit parallel and distributed hardware, to get an improved result in the same amount of time, the same result in less time, or both [Crainic2005]. Towards unifying the myriad of approaches and hardware configurations, a general consensus and taxonomy has been defined by the Parallel Evolutionary Algorithms (PEA) and Parallel Metaheuristics fields that considers the ratio of communication to computation called granularity [Cantu-Paz2000] [Alba2005a].

This taxonomy is presented concisely by Alba and Tomassini as a plot or trade-off of three concerns: 1) the number of sub-populations (models or parallel strategies working on the problem), 2) the coupling between the sub-populations (frequency and amplitude of communication), and 3) the size of the sub-populations (size or extent of the sub-models) [Alba2002].

Two important and relevant findings from the narrower field of Parallel Evolutionary Algorithms include 1) that tight coupling (frequent inter-system migration of candidate solutions) between coarse-grained models typically results in worse performance than a non-distributed approach [Alba2000], and 2) that loose coupling (infrequent migration) between coarse-grained models has been consistently shown to provide a super-linear increase in performance [Alba2002a] [Belding1995] [Cantu-Paz2000].

Cooperative Search This is a more general approach that considers the use of multiple models that work together to address a difficult optimization problems. Durfee et al. consider so-called Cooperative Distributed Problem Solving (CDPS) in which a network of loosely coupled solvers are employed to address complex distributed problems. In such systems, it is desirable to match the processing capabilities of the solver to the attributes of the problem. For example, a given problem may have spatially distributed, functionally distributed, or temporally distributed sub-problems to which a centralized and monolithic system may not be suitable.

Lesser [Lesser1990] considers CDPS and proposes such models perform distributed search on dependent or independent and potentially overlapping sub-problems as a motivating perspective for conducting research into Distributed Artificial Intelligence (DAI) (This perspective provided the basis for what became the field of Multi-Agent Systems (MAS).). Lesser points out that in real world applications, it is hard to get a optimal mapping between the allocated resources and the needs or availability of information for a given problem, suggesting that such problems may be caused by a mismatch in processing times and/or number of sub-problems, interdependencies between sub-problems, and local experts whose expertise cannot be effectively communicated. For a more detail on the relationships between parallel and cooperative search, El-Abd and Kamel provide a rigorous taxonomy [El-Abd2005].

Hybrid Search Hybrid Search is a perspective on optimization that focuses on the use of multiple and likely different approaches either sequentially (as in the canonical global and local search case), or in parallel (such as in Cooperative Search). For example in this latter case, it is common in the field of PEA to encourage different levels of exploration and exploitation across island populations by varying the operators or operator configurations used [Tanese1989] [Adamidis1996].

Talbi proposed a detailed 4-level taxonomy of Hybrid Metaheuristics that concerns parallel and cooperating approaches [Talbi2001]. The taxonomy encompasses parallel and cooperative considerations for optimization and focuses on the discriminating features in the lowest level such as heterogeneity, and specialization of approaches.

Functional Decomposition Three examples of a functional decomposition of optimization include 1) multiple objectives, 2) multiple constraints, and 3) partitions of the decision variable search space.

Multi-Objective Optimization (MOO) is a sub-field that is concerned with the optimization of two or more objective functions. A solution to a MOO conventionally involves locating and returning a set of candidate solutions called the non-dominated set [Deb2001]. The Pareto optimal set, is the set of optimal non-dominated solutions. For a given problem no feasible solution exists that dominates a Pareto optimal solution. All solutions that are Pareto optimal belong to the Pareto set, and the points that these solutions map to in the objective space is called the Pareto front. The complexity with MOO problems is in the typically unknown dependencies between decision variables across objectives, that in the case of conflicts, must be traded off (Purshouse and Fleming provide a taxonomy of such complexity [Purshouse2003]).

Constraint Satisfaction Problem's (CSP) involve the optimization of decision variables under a set of constraints. The principle complexity in such problems is in locating structures that are feasible or violate the least number of constraints, optimizing such feasibility [Tsang1993] [Kumar1992].

Search Space Partitioning involves partitioning of the decision variable search space (for example see Multispace Search by Gu et al. [Du1997] [Gu1997] [Gu1994]). This is a critical consideration given that for equal-sized dimensional bounds on parameters, an increase in decision variables results in an exponential increase in the volume of the space to search.

Availability Decomposition Optimization problems may be partitioned by the concerns of temporal and spatial distribution of 1) information availability, and 2) computation availability. An interesting area of research regarding variable information availability for optimization problems is called Interactive Evolutionary Computation, in which one or a collection of human operators dynamically interact with an optimization process [Takagi2001]. Example problem domains include but are not limited to computer graphics, industrial design, image processing, and drug design.

There is an increasing demand to exploit clusters of heterogeneous workstations to complete large-scale distributed computation tasks like optimization, typically in an opportunistic manner such as when individual machines are underutilized. The effect is that optimization strategies such as random partitioning of the search space (independent non-interacting processing) are required to take advantage of such environments for optimization problems [Schnekenburger1993] [Liu2000].

Meta Optimization One may optimize at a level above that considered in previous sections. Specifically, 1) the iterative generation of an inductive model called multiple restart optimization, and 2) the optimization of the parameters of the process that generates an inductive model of an optimization problem. Multiple or iterative restarts involves multiple independent algorithm executions from different (random) starting conditions. It is generally considered as a method for achieving an improved result in difficult optimization problems where a given strategy is deceived by local or false optima [Muselli1997] [Hu1994], typically requiring a restart schedule [Fukunaga1998].

A second and well studied form of meta optimization involves the optimization of the search process itself. Classical examples include the self-adaptation of mutation parameters (step sizes) in the Evolutionary Strategies (ES) and Evolutionary Programming (EP) approaches. Smith and Fogarty provided a review of genetic algorithms with adaptive strategies including a taxonomy in which the meta-adaptations are applied at one of three levels: 1) the population (adapting the overall sampling strategy), 2) the individual (adapting the creation of new samples in the decision variable space), and 3) components (modifying component contributions and/or individual step sizes as in ES and EP) [Smith1997b].

Function Approximation

This section reviews a select set of strategies for addressing Function Approximation problems from the fields of Artificial Intelligence and Computational Intelligence to provide general insight into the state of the interaction between stochastic algorithms and the field. The review draws heavily from the fields of Artificial Neural Networks, specifically Competitive Learning, as well as related inductive Machine Learning fields such as Instance Based Learning.

Vector Quantization Vector Quantization (VQ) refers to a method of approximating a target function using a set of exemplar (prototype or codebook) vectors. The exemplars represent a discrete subset of the problem, generally restricted to the features of interest using the natural representation of the observations in the problem space, typically an an unconstrained $n$-dimensional real valued space. The VQ method provides the advantage of a non-parametric model of a target function (like instance-based and lazy learning such as the $k$-Nearest-Neighbor method (kNN)) using a symbolic representation that is meaningful in the domain (like tree-based approaches).

The promotion of compression addresses the storage and retrieval concerns of kNN, although the selection of codebook vectors (the so-called quantization problem) is a hard problem that is known to be NP-complete [Garey1982]. More recently Kuncheva and Bezdek have worked towards unifying quantization methods in the application to classification problems, referring to the approaches as Nearest Prototype Classifiers (NPC) and proposing a generalized nearest prototype classifier [Kuncheva1998] [Kuncheva1998a].

Parallelization Instance-based approaches are inherently parallel given the generally discrete independent nature in which they are used, specifically in a case or per-query manner. As such, parallel hardware can be exploited in the preparation of the corpus of prototypes (parallel preparation), and more so in the application of the corpus given its read-only usage [Aamodt1994] [Nagendra1996] [Plaza1997]. With regard to vector quantization specifically, there is an industry centered around the design and development of VQ and WTA algorithms and circuits given their usage to compress digital audio and video data [Nakada1999] [Parhi1994].

Cooperative Methods Classical cooperative methods in the broader field of statistical machine learning are referred to as Ensemble Methods [Opitz1999] [Polikar2006] or more recently Multiclassifier Systems [Ghosh2002].

Boosting is based on the principle of combining a set of quasi-independent weak learners that collectively are as effective as a single strong learner [Kearns1988] [Schapire1992]. The seminal approach is called Adaptive Boosting (AdaBoost) that involves the preparation of a series of classifiers, where subsequent classifiers are prepared for the observations that are misclassified by the proceeding classifier models (creation of specialists) [Schapire2003].

Bootstrap Aggregation (bagging) involves partitioning the observations into $N$ randomly chosen subsets (with re-selection), and training a different model on each [Breiman1996]. Although robust to noisy datasets, the approach requires careful consideration as to the consensus mechanism between the independent models for decision making.

Stacked Generalization (stacking) involves creating a sequence of models of generally different types arranged into a stack, where subsequently added models generalize the behavior (success or failure) of the model before it with the intent of correcting erroneous decision making [Wolpert1992] [Ting1999].

Functional Decomposition As demonstrated, it is common in ensemble methods to partition the dataset either explicitly or implicitly to improve the approximation of the underlying target function. A first important decomposition involves partitioning the problem space into sub-spaces based on the attributes, regular groups of attributes called features, and decision attributes such as class labels. A popular method for attribute-based partitioning is called the Random Subspace Method, involving the random partitioning of attributes to which specialized model is prepared for each (commonly used on tree-based approaches) [Ho1998].

A related approach involves a hierarchical partitioning of attributes space into sub-vectors (sub-spaces) used to improve VQ-based compression [Gersho1984]. Another important functional decomposition methods involve the partitioning of the set of observations. The are many ways in which observations may be divided, although common approaches include pre-processing using clustering techniques to divide the set into natural groups, additional statistical approaches that partition based on central tendency and outliers, and re-sampling methods that are required to reduce the volume of observations.

Availability Decomposition The availability observations required to address function approximation in real-world problem domains motivate the current state of the art in Distributed Data Mining (DDM, or sometimes Collective Data Mining), Parallel Data Mining (PDM), and Distributed Knowledge Discovery in Database (DKDD) [Kargupta2000]. The general information availability concerns include 1) the intractable volume of observations, and 2) the spatial (geographical) and temporal distribution of information [Zaki1999]. In many real-world problems it is infeasible to centralize relevant observations for modeling, requiring scalable, load balancing, and incremental acquisition of information [Skillicorn1999].

Meta-Approximation The so-called ensemble or multiple-classifier methods may be considered meta approximation approaches as they are not specific to a given modeling technique. As with function optimization, meta-approaches may be divided into restart methods and meta-learning algorithms. The use of restart methods is a standard practice for connectionist approaches, and more generally in approaches that use random starting conditions and a gradient or local search method of refinement.

The method provides an opportunity for over-coming local optima in the error-response surface, when there is an unknown time remaining until convergence [Magdon-ismail2000], and can exploit parallel hardware to provide a speed advantage [Blas2005]. Ensemble methods and variants are examples of meta approximation approaches, as well as the use of consensus classifiers (gate networks in mixtures of experts) to integrate and weight the decision making properties from ensembles.

Bibliography

[Aamodt1994]	A. Aamodt and E. Plaza, "Case-Based Reasoning: Foundational Issues, Methodological Variations,\n\tand System Approaches", Artificial Intelligence Communications, 1994.
[Adamidis1996]	P. Adamidis and P. Adamidis and V. Petridis, "Co–operating Populations with Different Evolution Behaviours", in Proceedings IEEE International Conference on Evolutionary Computation, 1996.
[Alba2000]	E. Alba and J. M. Troya, "Influence of the Migration Policy in Parallel Distributed GAs with\n\tStructured and Panmictic Populations", Applied Intelligence, 2000.
[Alba2002]	E. Alba and M. Tomassini, "Parallelism and evolutionary algorithms", IEEE Transactions on Evolutionary Computation, 2002.
[Alba2002a]	E. Alba, "Parallel evolutionary algorithms can achieve super-linear performance", Information Processing Letters, 2002.
[Alba2005a]	E. Alba, "Parallel Metaheuristics: A New Class of Algorithms", John Wiley, 2005.
[Belding1995]	T. C. Belding, "The Distributed Genetic Algorithm Revisited", in Proceedings of the 6th International Conference on Genetic Algorithms, 1995.
[Blas2005]	A. Di Blas and A. Jagota and R. Hughey, "Optimizing neural networks on SIMD parallel computers", Parallel Computing, 2005.
[Breiman1996]	L. Breiman, "Bagging predictors", Machine Learning, 1996.
[Brooks1986]	R. Brooks, "A robust layered control system for a mobile robot", IEEE Journal Of Robotics And Automation, 1986.
[Cantu-Paz2000]	E. Cantú–Paz, "Efficient and Accurate Parallel Genetic Algorithms", Kluwer Academic Publishers (Springer), 2000.
[Crainic2005]	T. G. Crainic and N. Hail, "Parallel Metaheuristics Applications", in Parallel Metaheuristics, 2005.
[Deb2001]	K. Deb, "Multi-Objective Optimization Using Evolutionary Algorithms", John Wiley and Sons, 2001.
[Du1997]	B. Du and J. Gu and W. Wang and D. H. K. Tsang, "Multispace search for minimizing the maximum nodal degree", in Proceedings Sixth International Conference on Computer Communications\n\tand Networks, 1997.
[El-Abd2005]	M. El–Abd and M. Kamel, "A Taxonomy of Cooperative Search Algorithms", in Hybrid Metaheuristics, 2005.
[Fayyad1996]	U. Fayyad and G. Piatetsky-Shapiro and P. Smyth, "The KDD process for extracting useful knowledge from volumes of\n\tdata", Communications of the ACM, 1996.
[Fukunaga1998]	A. S. Fukunaga, "Restart Scheduling for genetic algorithms", in Parallel Problem Solving from Nature - PPSN V, 1998.
[Garey1982]	M. Garey and D. Johnson and H. Witsenhausen, "The complexity of the generalized Lloyd–Max problem (Corresp.)", IEEE Transactions on Information Theory, 1982.
[Gersho1984]	A. Gersho and A. Gersho and Y. Shoham, "Hierarchical vector quantization of speech with dynamic codebook\n\tallocation", in Proceedings of the IEEE International Conference on ICASSP '84.\n\tAcoustics, Speech, and Signal Processing, 1984.
[Ghosh2002]	J. Ghosh, "Multiclassifier Systems: Back to the Future", in Proceedings of the Third International Workshop on Multiple Classifier\n\tSystems, 2002.
[Gu1994]	J. Gu, "Multispace search: A new optimization approach", in Algorithms and Computation, 1994.
[Gu1997]	J. Gu, "Multispace Search for Satisfiability and NP–Hard Problems", in Satisfiability Problem: Theory and Applications : DIMACS Workshop, 1997.
[Ho1998]	T. K. Ho, "The random subspace method for constructing decision forests", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
[Hu1994]	X. Hu and R. Shonkwiler and M. Spruill, "Random restarts in global optimization", Technical Report 110592-015, School of Mathematics, Georgia Institute of Technology, 1994.
[Kargupta2000]	H. Kargupta and P. Chan, "Advances in Distributed and Parallel Knowledge Discovery", AAAI Press/MIT Press, 2000.
[Kearns1988]	M. Kearns, "Thoughts on hypothesis boosting", 1988.
[Kumar1992]	V. Kumar, "Algorithms for constraint–satisfaction problems : a survey", The AI magazine (AI mag.), 1992.
[Kuncheva1998]	L. Kuncheva and J. Bezdek, "An integrated framework for generalized nearest prototype classifier\n\tdesign", Int. Journal of Uncertaintly, Fuzziness and Knowledge-Based Systems, 1998.
[Kuncheva1998a]	L. I. Kuncheva and J. C. Bezdek, "Nearest prototype classification: clustering, genetic algorithms,\n\tor random search?", IEEE Transactions on Systems, Man and Cybernetics, Part C, 1998.
[Lesser1990]	V. R. Lesser, "An Overview of DAI: Viewing Distributed AI as Distributed Search", Journal of Japanese Society for Artificial Intelligence-Special Issue\n\ton Distributed Artificial Intelligence, 1990.
[Lin1973]	S. Lin and B. W Kernighan, "An Effective Heuristic Algorithm for the Traveling-Salesman Problems", Operations Research, 1973.
[Liu2000]	P. Liu and D. W. Wang, "Reduction optimization in heterogeneous cluster environments", in Proceedings 14th International Parallel and Distributed Processing\n\tSymposium IPDPS 2000, 2000.
[Magdon-ismail2000]	M. Magdon–Ismail and A. F. Atiya, "The Early Restart Algorithm", Neural Computation, 2000.
[Muselli1997]	M. Muselli, "A Theoretical Approach to Restart in Global Optimization", Journal of Global Optimization, 1997.
[Nagendra1996]	M. V. Nagendra Prasad and V. R. Lesser and S. E. Lander, "Retrieval and reasoning in distributed case bases", Journal of Visual Communication and Image Representation, Special\n\tIssue on Digital Libraries, 1996.
[Nakada1999]	A. Nakada and T. Shibata and M. Konda and T. Morimoto and \n\tT. Ohmi, "A fully parallel vector-quantization processor for real-time motion-picture\n\tcompression", IEEE Journal of Solid-State Circuits, 1999.
[Opitz1999]	D. Opitz and R. Maclin, "Popular Ensemble Methods: An Empirical Study", Journal of Artificial Intelligence Research, 1999.
[Parhi1994]	K. K. Parhi and F. H. Wu and K. Genesan, "Sequential and parallel neural network vector quantizers", IEEE Transactions on Computers, 1994.
[Plaza1997]	E. Plaza and J. Lluis and A. F. Martin, "Cooperative Case-based Reasoning", in Distributed Artificial Intelligence Meets Machine Learning Learning\n\tin Multi-Agent Environments, 1997.
[Polikar2006]	R. Polikar, "Ensemble Based Systems in Decision Making", IEEE Circuits and Systems Magazine, 2006.
[Price1977]	W. L. Price, "A controlled random search procedure for global optimisation", The Computer Journal, 1977.
[Purshouse2003]	R. C. Purshouse and P. J. Fleming, "Conflict, Harmony, and Independence: Relationships in Evolutionary\n\tMulti-criterion Optimisation", in Proceedings of the Second International Conference on Evolutionary\n\tMulti-Criterion Optimization (EMO), 2003.
[Russell2009]	S. Russell and P. Norvig, "Artificial Intelligence: A Modern Approach", Prentice Hall, 2009.
[Schapire1992]	R. E. Schapire, "The strength of weak learnability", Machine Learning, 1992.
[Schapire2003]	R. E. Schapire, "The boosting approach to machine learning: An overview", in Nonlinear Estimation and Classification, 2003.
[Schnekenburger1993]	T. Schnekenburger, "Parallel Randomized Algorithms in Heterogeneous Environments", in Int. Conference on Systems Engineering, 1993.
[Skillicorn1999]	D. Skillicorn, "Strategies for parallel data mining", IEEE Concurrency, 1999.
[Smith1997b]	J. E. Smith and T. C. Fogarty, "Operator and parameter adaptation in genetic algorithms", Soft Computing - A Fusion of Foundations, Methodologies and Applications, 1997.
[Takagi2001]	H. Takagi, "Interactive evolutionary computation: Fusion of the capabilities\n\tof EC optimization and human evaluations", Proceedings of the IEEE, 2001.
[Talbi2001]	E. Talbi, "A taxonomy of hybrid metaheuristics", Journal of Heuristics, 2001.
[Tanese1989]	R. Tanese, "Distributed genetic algorithms", in Proceedings of the third international conference on Genetic algorithms, 1989.
[Ting1999]	K. M. Ting and I. H. Witten, "Issues in Stacked Generalization", Journal of Artificial Intelligence Research, 1999.
[Toern1999]	A. T\örn and M. M. Ali and S. Viitanen, "Stochastic Global Optimization: Problem Classes and Solution Techniques", Journal of Global Optimization, 1999.
[Tsang1993]	E. Tsang, "Foundations of Constraint Satisfaction", Academic Press, 1993.
[Woeginger2003]	G. J. Woeginger, "Exact Algorithms for NP–Hard Problems: A Surveys", Combinatorial Optimization – Eureka, You Shrink!, 2003.
[Wolpert1992]	D. H. Wolpert, "Stacked Generalization", Neural Networks, 1992.
[Wolpert1995]	D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Search", Santa Fe Institute, 1995.
[Wolpert1997]	D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation, 1997.
[Zaki1999]	M. J. Zaki, "Parallel and Distributed Data Mining: An Introduction", in Revised Papers from Large-Scale Parallel Data Mining, Workshop on\n\tLarge-Scale Parallel KDD Systems, SIGKDD, 1999.

Please Note: This content was automatically generated from the book content and may contain minor differences.