    Improving the efficiency of Bayesian Network Based EDAs and their application in Bioinformatics

    Estimation of distribution algorithms (EDAs) are a relatively new class of stochastic optimizers that have received a lot of attention during the last decade. In each generation, EDAs build probabilistic models of promising solutions of an optimization problem to guide the search process. New sets of solutions are obtained by sampling the corresponding probability distributions. Using this approach, EDAs are able to provide the user with a set of models that reveals the dependencies between the variables of an optimization problem while solving it. In order to solve a complex problem, it is necessary to use a probabilistic model which is able to capture these dependencies. Bayesian networks are commonly used for modeling multiple dependencies between variables. Learning Bayesian networks, especially for large problems with a high degree of dependency among their variables, is highly computationally expensive, which makes it the bottleneck of EDAs. Therefore, introducing efficient Bayesian learning algorithms in EDAs seems necessary in order to use them for large problems. In this dissertation, after comparing several Bayesian network learning algorithms, we propose an algorithm, called CMSS-BOA, which uses a recently introduced heuristic called max-min parent children (MMPC) to constrain the model search space. This algorithm does not impose a fixed and small upper bound on the order of interaction between variables, and is able to solve problems with large numbers of variables efficiently. We compare the efficiency of CMSS-BOA with the standard Bayesian network based EDA on several benchmark problems, and finally use it to build a predictor for glycation sites in mammalian proteins.
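
    The generic EDA loop described here (select promising solutions, fit a probabilistic model, sample new candidates) can be illustrated with a minimal univariate sketch: a UMDA with a factorized Bernoulli model rather than the Bayesian network model of CMSS-BOA. The onemax fitness and all parameter values below are illustrative assumptions.

    ```python
    import numpy as np

    def onemax(x):
        """Toy fitness: the number of ones in the bit string."""
        return x.sum()

    def umda(n_vars=50, pop_size=100, n_select=50, generations=100, seed=None):
        """Minimal univariate EDA (UMDA): estimate a factorized Bernoulli
        model from the selected solutions, then sample a new population."""
        rng = np.random.default_rng(seed)
        probs = np.full(n_vars, 0.5)  # initial model: uniform over bit strings
        for _ in range(generations):
            pop = (rng.random((pop_size, n_vars)) < probs).astype(int)
            fitness = np.array([onemax(ind) for ind in pop])
            selected = pop[np.argsort(fitness)[-n_select:]]  # truncation selection
            probs = selected.mean(axis=0).clip(0.05, 0.95)   # re-estimate the model
            if fitness.max() == n_vars:
                break
        return probs, fitness.max()

    if __name__ == "__main__":
        model, best = umda(seed=42)
        print("best fitness:", best)
    ```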

    Study on probabilistic model building genetic network programming

    Degree system: new; report number: Kou 3776; type of degree: Doctor of Engineering; date conferred: 2013/3/15; Waseda University diploma number: Shin 6149. Waseda University

    A Clustering-Based Model-Building EA for Optimization Problems with Binary and Real-Valued Variables

    We propose a novel clustering-based model-building evolutionary algorithm to tackle optimization problems that have both binary and real-valued variables. The search space is clustered every generation, using a distance metric that considers binary and real-valued variables jointly in order to capture and exploit dependencies between variables of different types. After clustering, linkage learning takes place within each cluster to capture and exploit dependencies between variables of the same type. We compare this with a model-building approach that only considers dependencies between variables of the same type. Additionally, since many real-world problems have constraints, we examine the use of several well-known approaches to handling constraints: constraint domination, dynamic penalty, and global competitive ranking. We experimentally analyze the performance of the proposed algorithms on various unconstrained problems as well as a selection of well-known MINLP benchmark problems that all have constraints, and compare our results with the Mixed-Integer Evolution Strategy (MIES). We find that our clustering approach, aimed at processing dependencies between binary and real-valued variables, can significantly improve performance in terms of required population size and number of function evaluations when solving problems that exhibit properties such as multiple optima, strong mixed dependencies, and constraints.
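
    The abstract does not spell out the joint distance metric itself; one plausible sketch combines a normalized Hamming distance on the binary part with a range-normalized Euclidean distance on the real-valued part, so that both variable types contribute on a comparable scale. The function `mixed_distance`, its weight `w`, and the example values are assumptions, not the paper's actual metric.

    ```python
    import numpy as np

    def mixed_distance(a_bin, a_real, b_bin, b_real, ranges, w=0.5):
        """Hypothetical joint distance for mixed binary/real solutions:
        a weighted sum of the normalized Hamming distance (binary part)
        and a range-normalized Euclidean distance (real part), both in [0, 1]."""
        d_bin = np.mean(a_bin != b_bin)
        d_real = np.linalg.norm((a_real - b_real) / ranges) / np.sqrt(len(a_real))
        return w * d_bin + (1 - w) * d_real

    # toy usage with two mixed solutions
    a_bin, a_real = np.array([0, 1, 1]), np.array([0.2, 3.0])
    b_bin, b_real = np.array([1, 1, 0]), np.array([0.8, 1.0])
    print(mixed_distance(a_bin, a_real, b_bin, b_real, ranges=np.array([1.0, 10.0])))
    ```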

    Adaptive algorithms for history matching and uncertainty quantification

    Numerical reservoir simulation models are the basis for many decisions regarding the prediction, optimisation, and improvement of the production performance of oil and gas reservoirs. Because of the uncertainty in model parameters, history matching is required to calibrate models to the dynamic behaviour of the reservoir. Finally, a set of history-matched models is used for reservoir performance prediction and for economic and risk assessment of different development scenarios. Various algorithms are employed to search and sample the parameter space in history matching and uncertainty quantification problems. The choice of algorithm and its implementation, as done through a number of control parameters, have a significant impact on the effectiveness and efficiency of the algorithm and thus on the quality of results and the speed of the process. This thesis is concerned with the investigation, development, and implementation of improved and adaptive algorithms for reservoir history matching and uncertainty quantification problems. A set of evolutionary algorithms is considered and applied to history matching. The shared characteristic of the applied algorithms is adaptation by balancing exploration and exploitation of the search space, which can lead to improved convergence and diversity. This includes the use of estimation of distribution algorithms, which implicitly adapt their search mechanism to the characteristics of the problem. Hybridising them with genetic algorithms, multi-objective sorting algorithms, and real-coded, multi-model and multivariate Gaussian-based models can help these algorithms adapt even further and improve their performance. Finally, diversity measures are used to develop an explicit, adaptive algorithm and to control the algorithm’s performance based on the structure of the problem. Uncertainty quantification in a Bayesian framework can be carried out by resampling the search space using Markov chain Monte Carlo sampling algorithms. Common criticisms of these methods are their low efficiency and their need for control-parameter tuning. A Metropolis-Hastings sampling algorithm with an adaptive multivariate Gaussian proposal distribution and a K-nearest-neighbour approximation has been developed and applied.
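
    The adaptive Metropolis-Hastings sampler mentioned at the end can be sketched as follows: after a warm-up period, the multivariate Gaussian proposal covariance is re-estimated from the chain history (a Haario-style scheme). This sketch omits the thesis's K-nearest-neighbour approximation, and all parameter values are illustrative.

    ```python
    import numpy as np

    def adaptive_mh(log_post, x0, n_iter=5000, adapt_start=500, eps=1e-6, seed=None):
        """Metropolis-Hastings with an adaptive multivariate Gaussian proposal:
        the proposal covariance is re-estimated from the chain history and
        scaled by the classic 2.38^2 / d factor."""
        rng = np.random.default_rng(seed)
        d = len(x0)
        chain = [np.asarray(x0, dtype=float)]
        logp = log_post(chain[0])
        cov, scale = np.eye(d), 2.38**2 / d
        for i in range(1, n_iter):
            if i > adapt_start:  # start adapting after a warm-up period
                cov = np.cov(np.array(chain).T) + eps * np.eye(d)
            proposal = rng.multivariate_normal(chain[-1], scale * cov)
            logp_prop = log_post(proposal)
            if np.log(rng.random()) < logp_prop - logp:  # accept
                chain.append(proposal)
                logp = logp_prop
            else:                                        # reject: repeat state
                chain.append(chain[-1].copy())
        return np.array(chain)

    # toy usage: sample a 2-D standard normal posterior
    samples = adaptive_mh(lambda x: -0.5 * x @ x, x0=np.zeros(2), n_iter=2000)
    ```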

    Investigating hybrids of evolution and learning for real-parameter optimization

    In recent years, more and more advanced techniques have been developed in the field of hybridizing evolution and learning, which means that more applications can benefit from this progress. One example of these advanced techniques is the Learnable Evolution Model (LEM), which adopts learning as a guide for the general evolutionary search. Despite this trend and the progress in LEM, there are still many ideas and attempts that deserve further investigation and testing. For this purpose, this thesis develops a number of new algorithms that combine learning algorithms with evolution in different ways. With these developments, we expect to understand the effects and relations between evolution and learning, and also to achieve better performance in solving complex problems. The machine learning algorithms combined with the standard Genetic Algorithm (GA) are the supervised learning method k-nearest-neighbors (KNN), the Entropy-Based Discretization (ED) method, and the decision tree learning algorithm ID3. We test these algorithms on various real-parameter function optimization problems, especially the functions of the CEC 2005 special session on real-parameter function optimization. Additionally, a medical cancer chemotherapy treatment problem is solved in this thesis by some of our hybrid algorithms. The performance of these algorithms is compared with standard genetic algorithms and other well-known contemporary evolution-and-learning hybrid algorithms, among them the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and variants of Estimation of Distribution Algorithms (EDAs). Some important results have been derived from our experiments on these algorithms. Among them, we found that even very simple learning methods, when properly hybridized with the evolutionary procedure, can provide significant performance improvements; and when more complex learning algorithms are incorporated, the resulting algorithms are very promising and compete very well against state-of-the-art hybrid algorithms, both on well-defined real-parameter function optimization problems and on a practical, evaluation-expensive problem.
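
    As one concrete way to combine KNN learning with a GA in the spirit described above (the exact integration used in the thesis may differ), a classifier trained on the best and worst individuals of the current population can pre-screen candidate offspring. The helper `knn_filter` and its parameters are illustrative.

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def knn_filter(parents, fitness, candidates, keep=50, k=5):
        """LEM-flavoured screening: label the worst and best thirds of the
        population, train a KNN classifier on them, and keep the candidate
        offspring the classifier rates as most likely to be 'good'."""
        order = np.argsort(fitness)
        third = len(parents) // 3
        X = np.vstack([parents[order[:third]], parents[order[-third:]]])
        y = np.array([0] * third + [1] * third)        # 0 = poor, 1 = good
        clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        scores = clf.predict_proba(candidates)[:, 1]   # P(good) for each candidate
        return candidates[np.argsort(scores)[-keep:]]
    ```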

    An Interval-based Multiobjective Approach to Feature Subset Selection Using Joint Modeling of Objectives and Variables

    This paper studies feature subset selection in classification using a multi-objective estimation of distribution algorithm. We consider six functions, namely area under the ROC curve, sensitivity, specificity, precision, F1 measure, and Brier score, for the evaluation of feature subsets and as the objectives of the problem. One characteristic of these objective functions is the existence of noise in their values, which should be appropriately handled during optimization. Our proposed algorithm consists of two major techniques which are specially designed for the feature subset selection problem. The first is a solution ranking method based on interval values to handle the noise in the objectives of this problem. The second is a model estimation method for learning a joint probabilistic model of objectives and variables, which is used to generate new solutions and advance through the search space. To simplify model estimation, l1-regularized regression is used to select a subset of problem variables before model learning. The proposed algorithm is compared with a well-known ranking method for interval-valued objectives and with a standard multi-objective genetic algorithm. In particular, the effects of the two new techniques are experimentally investigated. The experimental results show that the proposed algorithm is able to obtain comparable or better performance on the tested datasets.
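
    The paper's interval-based ranking method is not reproduced here; as a minimal illustration of ordering solutions under interval-valued objectives, one simple rule declares a solution better in an objective only when its whole interval lies below the other's (minimization assumed). The helper below is hypothetical, not the paper's method.

    ```python
    def interval_compare(a, b):
        """Compare two interval-valued objective vectors (minimization).
        Each objective is a (lo, hi) pair; 'a' counts as better in an
        objective only when its entire interval lies below b's interval."""
        better = worse = False
        for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
            if a_hi < b_lo:
                better = True
            elif b_hi < a_lo:
                worse = True
        if better and not worse:
            return 1    # a dominates b
        if worse and not better:
            return -1   # b dominates a
        return 0        # incomparable under interval uncertainty
    ```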

    Real-valued evolutionary multi-modal multi-objective optimization by hill-valley clustering

    In model-based evolutionary algorithms (EAs), the underlying search distribution is adapted to the problem at hand, for example based on dependencies between decision variables. Hill-valley clustering is an adaptive niching method in which a set of solutions is clustered such that each cluster corresponds to a single mode in the fitness landscape. This can be used to adapt the search distribution of an EA to the number of modes, exploring each mode separately. Especially in a black-box setting, where the number of modes is a priori unknown, an adaptive approach is essential for good performance. In this work, we introduce multi-objective hill-valley clustering and combine it with MAMaLGaM, a multi-objective EA, into the multi-objective hill-valley EA (MO-HillVallEA). We empirically show that MO-HillVallEA outperforms MAMaLGaM and other well-known multi-objective optimization algorithms on a set of benchmark functions. Furthermore, and perhaps most importantly, we show that MO-HillVallEA is capable of obtaining and maintaining multiple approximation sets simultaneously over time.
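
    The hill-valley test underlying this kind of clustering is simple to state: two solutions belong to the same mode if no point sampled on the segment between them is worse than both endpoints. A minimal sketch for maximization follows; `n_test` and the equidistant sampling scheme are illustrative choices.

    ```python
    import numpy as np

    def hill_valley_test(f, x, y, n_test=5):
        """Hill-valley test (maximization): if any interior point on the
        segment between x and y is worse than both endpoints, a fitness
        valley separates them and they belong to different modes."""
        threshold = min(f(x), f(y))
        for t in np.linspace(0.0, 1.0, n_test + 2)[1:-1]:  # interior points only
            if f((1 - t) * x + t * y) < threshold:
                return False   # valley found: different modes
        return True            # same hill: cluster the two solutions together
    ```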

    Regularized model learning in EDAs for continuous and multi-objective optimization

    Probabilistic modeling is the defining characteristic of estimation of distribution algorithms (EDAs), which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. l1-regularization is one such technique, with the appealing variable selection property that results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from different aspects when used for optimization in a high-dimensional setting, where the population size of the EDA scales logarithmically with the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs more robust optimization, and is able to achieve significantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random field model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization, and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, specifically models inspired by multi-dimensional Bayesian network classifiers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objective-variable and objective-objective relationships. An extensive experimental study shows the effectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely α-degree Pareto dominance, is introduced and its properties are analyzed. We show that ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on l1-regularization for multi-objective feature subset selection in classification, where six different measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small to medium dimensionality, when using two different Bayesian classifiers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
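
    The l1-regularized Gaussian model estimation described above can be sketched with scikit-learn's graphical lasso, which fits a sparse precision matrix; this is an illustrative stand-in for the thesis's estimators, and the `alpha` value and data shapes are assumptions.

    ```python
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    def sample_regularized_gaussian(selected, n_samples, alpha=0.1, seed=None):
        """EDA-style model step: fit an l1-regularized (sparse) Gaussian to the
        selected solutions with the graphical lasso, then sample new solutions
        from the estimated distribution."""
        gl = GraphicalLasso(alpha=alpha).fit(selected)   # sparse precision estimate
        rng = np.random.default_rng(seed)
        return rng.multivariate_normal(gl.location_, gl.covariance_, size=n_samples)
    ```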