
    Improved Runtime Bounds for the Univariate Marginal Distribution Algorithm via Anti-Concentration

    Unlike traditional evolutionary algorithms, which produce offspring via genetic operators, Estimation of Distribution Algorithms (EDAs) sample solutions from probabilistic models that are learned from selected individuals. It is hoped that EDAs may improve optimisation performance on epistatic fitness landscapes by learning variable interactions. However, hardly any rigorous results are available to support claims about the performance of EDAs, even for fitness functions without epistasis. The expected runtime of the Univariate Marginal Distribution Algorithm (UMDA) on OneMax was recently shown to be in $\mathcal{O}(n\lambda\log\lambda)$ by Dang and Lehre (GECCO 2015). Later, Krejca and Witt (FOGA 2017) proved the lower bound $\Omega(\lambda\sqrt{n}+n\log n)$ via an involved drift analysis. We prove an $\mathcal{O}(n\lambda)$ bound, given some restrictions on the population size. This implies the tight bound $\Theta(n\log n)$ when $\lambda=\mathcal{O}(\log n)$, matching the runtime of classical EAs. Our analysis uses the level-based theorem and anti-concentration properties of the Poisson-Binomial distribution. We expect that these generic methods will facilitate further analysis of EDAs. Comment: 19 pages, 1 figure
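
    In the UMDA, the number of one-bits in a sampled bitstring is a sum of independent Bernoulli($p_i$) variables, where $p_i$ are the current marginals. Such a sum follows a Poisson-Binomial distribution. Anti-concentration bounds of the kind used in such analyses state, roughly, that no single value of this sum is very likely: the largest point probability is of order $1/\sigma$, where $\sigma^2=\sum_i p_i(1-p_i)$. The sketch below only computes the exact distribution by dynamic programming so that this can be inspected numerically. It proves nothing, and the function name and example marginals are our own hypothetical choices.

```python
from typing import List


def poisson_binomial_pmf(p: List[float]) -> List[float]:
    """Return [P(Y = 0), ..., P(Y = n)] for Y = sum of independent
    Bernoulli(p[i]) variables (a Poisson-Binomial random variable)."""
    pmf = [1.0]  # distribution of the empty sum: P(Y = 0) = 1
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - pi)  # this bit is sampled as 0
            nxt[k + 1] += mass * pi      # this bit is sampled as 1
        pmf = nxt
    return pmf


if __name__ == "__main__":
    n = 100
    # Hypothetical UMDA state: most marginals at 1/2, ten at the upper border.
    marginals = [0.5] * (n - 10) + [1.0 - 1.0 / n] * 10
    pmf = poisson_binomial_pmf(marginals)
    var = sum(pi * (1.0 - pi) for pi in marginals)
    print(f"largest point probability: {max(pmf):.4f}")
    print(f"1 / sigma:                 {1.0 / var ** 0.5:.4f}")
```

    The quadratic-time recurrence is exact and simple. For large $n$ one could instead use an FFT-based or normal approximation.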

    Level-Based Analysis of the Univariate Marginal Distribution Algorithm

    Estimation of Distribution Algorithms (EDAs) are stochastic heuristics that search for optimal solutions by learning and sampling from probabilistic models. Despite their popularity in real-world applications, there is little rigorous understanding of their performance. Even for the Univariate Marginal Distribution Algorithm (UMDA), a simple population-based EDA assuming independence between decision variables, the optimisation time on the linear problem OneMax was until recently undetermined. The incomplete theoretical understanding of EDAs is mainly due to a lack of appropriate analytical tools. We show that the recently developed level-based theorem for non-elitist populations, combined with anti-concentration results, yields upper bounds on the expected optimisation time of the UMDA. This approach results in the bound $\mathcal{O}(n\lambda\log\lambda+n^2)$ on two problems, LeadingOnes and BinVal, for population sizes $\lambda>\mu=\Omega(\log n)$, where $\mu$ and $\lambda$ are parameters of the algorithm. We also prove that the UMDA with population sizes $\mu\in\mathcal{O}(\sqrt{n})\cap\Omega(\log n)$ optimises OneMax in expected time $\mathcal{O}(\lambda n)$, and, for larger population sizes $\mu=\Omega(\sqrt{n}\log n)$, in expected time $\mathcal{O}(\lambda\sqrt{n})$. The facility and generality of our arguments suggest that this is a promising approach to derive bounds on the expected optimisation time of EDAs. Comment: To appear in Algorithmica Journal
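
    For reference, here is a minimal sketch of the usual textbook definitions of the three benchmark functions named above, on bitstrings given as lists of 0/1. Two choices are our own assumptions: the function names, and the BinVal convention that `x[0]` is the most significant bit (some papers weight the bits in the opposite order).

```python
from typing import Sequence


def onemax(x: Sequence[int]) -> int:
    """Number of one-bits."""
    return sum(x)


def leading_ones(x: Sequence[int]) -> int:
    """Length of the longest prefix consisting only of ones."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def binval(x: Sequence[int]) -> int:
    """Value of x read as a binary number, x[0] being the most significant bit
    (i.e. the sum of 2**(n-1-i) * x[i])."""
    value = 0
    for bit in x:
        value = 2 * value + bit
    return value


if __name__ == "__main__":
    x = [1, 1, 0, 1]
    print(onemax(x), leading_ones(x), binval(x))
```

    BinVal is linear like OneMax, but its weight $2^{n-1}$ on the first bit exceeds the total weight $2^{n-1}-1$ of all later bits. Consequently, as in LeadingOnes, earlier bits dominate the selection.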

    Semiparametric Multivariate Accelerated Failure Time Model with Generalized Estimating Equations

    The semiparametric accelerated failure time model is not as widely used as the Cox relative risk model, mainly due to computational difficulties. Recent developments in least squares estimation and induced smoothing estimating equations provide promising tools to make accelerated failure time models more attractive in practice. For semiparametric multivariate accelerated failure time models, we propose a generalized estimating equation approach to account for the multivariate dependence through working correlation structures. The marginal error distributions can be either identical, as in sequential event settings, or different, as in parallel event settings. Some regression coefficients can be shared across margins as needed. The initial estimator is a rank-based estimator with Gehan's weight, but obtained from an induced smoothing approach for computational ease. The resulting estimator is consistent and asymptotically normal, with a variance estimated through a multiplier resampling method. In a simulation study, our estimator was up to three times as efficient as the initial estimator, especially with stronger multivariate dependence and a higher censoring percentage. Two real examples demonstrate the utility of the proposed method.
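
    The initial estimator mentioned above combines Gehan's rank weight with induced smoothing. As a rough univariate illustration only (no working correlation structure, no GEE step, no multiplier resampling), the sketch below evaluates an induced-smoothed Gehan estimating function of the form $\tilde U(\beta)=\sum_i\sum_j \Delta_i (X_i-X_j)\,\Phi\big((e_j(\beta)-e_i(\beta))/r_{ij}\big)$. Here $e_i(\beta)=Y_i-X_i^\top\beta$ with $Y_i$ the log observed time, $\Delta_i$ the event indicator, and $r_{ij}^2=(X_i-X_j)^\top\Gamma(X_i-X_j)$. The choice $\Gamma=I/n$, the function names, and the bisection solver are our own assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_gehan_ee(beta: Sequence[float],
                      X: Sequence[Sequence[float]],
                      y: Sequence[float],
                      delta: Sequence[int]) -> List[float]:
    """Induced-smoothed Gehan estimating function for a univariate AFT model.

    y[i] is the log of the observed time; delta[i] is 1 for an observed event
    and 0 for a censored one. Assumes the smoothing matrix Gamma = I / n.
    The result is scaled by 1 / n**2, which does not change its root.
    """
    n, p = len(y), len(beta)
    resid = [y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
    U = [0.0] * p
    for i in range(n):
        if not delta[i]:
            continue  # censored subjects enter only as comparators j
        for j in range(n):
            diff = [X[i][k] - X[j][k] for k in range(p)]
            r = math.sqrt(sum(d * d for d in diff) / n)  # r_ij with Gamma = I/n
            if r == 0.0:
                continue  # X_i == X_j, so this term is zero anyway
            w = std_normal_cdf((resid[j] - resid[i]) / r)  # smooths I(e_j >= e_i)
            for k in range(p):
                U[k] += diff[k] * w
    return [u / n ** 2 for u in U]


def solve_scalar_beta(X, y, delta, lo=-10.0, hi=10.0, tol=1e-8) -> float:
    """Bisection for a single coefficient. The smoothed Gehan function is the
    gradient of a convex (smoothed Gehan) loss, hence nondecreasing in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_gehan_ee([mid], X, y, delta)[0] < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [2.0 * row[0] for row in X]  # noiseless toy data, all events observed
    print(solve_scalar_beta(X, y, [1] * len(y)))
```

    For several covariates one would use a root-finder or Newton-type iteration instead of bisection. The paper's multivariate GEE step would then build on such an initial estimate.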

    From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms

    Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Too large population sizes avoid genetic drift, but slow down the process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk for genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. This in particular shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds these, leading to the asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme leads to a performance close to the one obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal ones, leading to a performance clearly inferior to the one obtained via the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to a much better performance. Comment: Accepted for publication in "Journal of Machine Learning Research". Extended version of our GECCO 2020 paper. This article supersedes arXiv:2004.0714
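
    The restart idea can be sketched as a generic wrapper around any EDA. Each run gets a fixed iteration budget that depends on the population size; if no optimum is found, a fresh run starts with a larger population. This is a hypothetical sketch. The callback `run_with_budget`, the budget function, the initial size `mu0`, and the `update_factor` are placeholders we introduce. The paper derives its budget from the genetic-drift analysis, which is not reproduced here.

```python
from typing import Callable, Optional, Tuple


def smart_restart(run_with_budget: Callable[[int, int], Optional[object]],
                  mu0: int,
                  update_factor: int,
                  budget: Callable[[int], int],
                  max_rounds: int = 30) -> Tuple[Optional[object], int, int]:
    """Run a fresh EDA with population size mu for budget(mu) iterations; if it
    returns no optimum, restart with mu multiplied by update_factor.

    run_with_budget(mu, iterations) is a hypothetical callback that returns a
    solution if an optimum was found within the budget, otherwise None.
    Returns (solution or None, last population size tried, iterations charged),
    where each run is charged its full budget (an upper bound on actual use).
    """
    mu, spent, last_mu = mu0, 0, mu0
    for _ in range(max_rounds):
        last_mu = mu
        iters = budget(mu)
        result = run_with_budget(mu, iters)
        spent += iters
        if result is not None:
            return result, mu, spent
        mu *= update_factor
    return None, last_mu, spent


if __name__ == "__main__":
    # Toy stand-in for an EDA: "succeeds" once mu is large enough (assumption).
    fake_run = lambda mu, iters: "optimum" if mu >= 16 else None
    print(smart_restart(fake_run, mu0=2, update_factor=2, budget=lambda m: 8 * m * m))
```

    Because each run is cut off at its budget, time wasted on too-small population sizes stays bounded by the budgets of the earlier rounds. With a geometric update factor, that waste is a geometric sum.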

    On the limitations of the univariate marginal distribution algorithm to deception and where bivariate EDAs might help

    We introduce a new benchmark problem called Deceptive Leading Blocks (DLB) to rigorously study the runtime of the Univariate Marginal Distribution Algorithm (UMDA) in the presence of epistasis and deception. We show that simple Evolutionary Algorithms (EAs) outperform the UMDA unless the selective pressure $\mu/\lambda$ is extremely high, where $\mu$ and $\lambda$ are the parent and offspring population sizes, respectively. More precisely, we show that the UMDA with a parent population size of $\mu=\Omega(\log n)$ has an expected runtime of $e^{\Omega(\mu)}$ on the DLB problem assuming any selective pressure $\frac{\mu}{\lambda}\geq\frac{14}{1000}$, as opposed to the expected runtime of $\mathcal{O}(n\lambda\log\lambda+n^3)$ for the non-elitist $(\mu,\lambda)$ EA with $\mu/\lambda\leq 1/e$. These results illustrate inherent limitations of univariate EDAs against deception and epistasis, which are common characteristics of real-world problems. In contrast, empirical evidence reveals the efficiency of the bivariate MIMIC algorithm on the DLB problem. Our results suggest that one should consider EDAs with more complex probabilistic models when optimising problems with some degree of epistasis and deception. Comment: To appear in the 15th ACM/SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA XV), Potsdam, Germany

    Upper Bounds on the Runtime of the Univariate Marginal Distribution Algorithm on OneMax

    A runtime analysis of the Univariate Marginal Distribution Algorithm (UMDA) is presented on the OneMax function for wide ranges of its parameters $\mu$ and $\lambda$. If $\mu\ge c\log n$ for some constant $c>0$ and $\lambda=(1+\Theta(1))\mu$, a general bound $O(\mu n)$ on the expected runtime is obtained. This bound crucially assumes that all marginal probabilities of the algorithm are confined to the interval $[1/n,1-1/n]$. If $\mu\ge c'\sqrt{n}\log n$ for a constant $c'>0$ and $\lambda=(1+\Theta(1))\mu$, the behavior of the algorithm changes and the bound on the expected runtime becomes $O(\mu\sqrt{n})$, which typically even holds if the borders on the marginal probabilities are omitted. The results supplement the recently derived lower bound $\Omega(\mu\sqrt{n}+n\log n)$ by Krejca and Witt (FOGA 2017) and turn out to be tight for the two very different values $\mu=c\log n$ and $\mu=c'\sqrt{n}\log n$. They also improve the previously best known upper bound $O(n\log n\log\log n)$ by Dang and Lehre (GECCO 2015). Comment: Version 4: added illustrations and experiments; improved presentation in Section 2.2; to appear in Algorithmica; the final publication is available at Springer via http://dx.doi.org/10.1007/s00453-018-0463-
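
    The UMDA loop analysed above can be sketched as follows:

    - Sample $\lambda$ bitstrings from the product of the marginals.
    - Keep the $\mu$ best.
    - Set each marginal to the frequency of ones among the kept strings.
    - Optionally confine the marginals to $[1/n,1-1/n]$.

    This is a minimal sketch under our own choices: the function name, the generation cap, the tie-breaking by a stable sort, and the use of Python's `random.Random`. Runtime in the abstract counts fitness evaluations, which here is roughly $\lambda$ per generation.

```python
import random
from typing import Optional


def umda_onemax(n: int, mu: int, lam: int, max_gens: int = 10_000,
                borders: bool = True, seed: Optional[int] = None) -> Optional[int]:
    """Run the UMDA on OneMax; return the generation in which the all-ones
    string was first sampled, or None if max_gens was reached."""
    assert n >= 2 and 1 <= mu <= lam
    rng = random.Random(seed)
    p = [0.5] * n  # initial marginal probabilities
    lo, hi = (1.0 / n, 1.0 - 1.0 / n) if borders else (0.0, 1.0)
    for gen in range(1, max_gens + 1):
        # Sample lam offspring independently from the product distribution.
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(lam)]
        pop.sort(key=sum, reverse=True)  # OneMax fitness = number of ones
        if sum(pop[0]) == n:
            return gen
        selected = pop[:mu]  # truncation selection of the mu best
        # New marginal = frequency of ones among the selected, clamped to borders.
        for i in range(n):
            freq = sum(x[i] for x in selected) / mu
            p[i] = min(hi, max(lo, freq))
    return None


if __name__ == "__main__":
    gens = umda_onemax(n=50, mu=30, lam=60, seed=1)
    print("generations:", gens)
```

    With `borders=False`, a marginal that reaches 0 can never produce a one again. The bound above therefore needs the borders in the small-$\mu$ regime.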

    Bounds on Integrals with Respect to Multivariate Copulas

    Finding upper and lower bounds on integrals with respect to copulas is a prominent problem in applied probability. In their 2014 paper, Hofer and Iaco showed how particular two-dimensional copulas are related to optimal solutions of the two-dimensional assignment problem. Using this, they managed to approximate integrals with respect to two-dimensional copulas. In this paper, we further illuminate this connection, extend it to d-dimensional copulas, and thereby generalize the method of Hofer and Iaco to arbitrary dimensions. We also provide convergence statements. As an example, we consider three-dimensional dependence measures.
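
    A simplified illustration of why an assignment problem appears: discretise $[0,1]^2$ into an $m\times m$ grid and place mass on cell midpoints. Each row and column must carry mass $1/m$, so the mass matrix is $1/m$ times a doubly stochastic matrix. By the Birkhoff-von Neumann theorem, a linear objective over such matrices attains its extremes at permutation matrices. This is our own discretisation for illustration, not Hofer and Iaco's construction or the paper's d-dimensional method. The function name is hypothetical, and it brute-forces all $m!$ permutations, so it is only usable for small $m$.

```python
from itertools import permutations
from typing import Callable, Tuple


def discrete_copula_bounds(f: Callable[[float, float], float],
                           m: int) -> Tuple[float, float]:
    """Approximate the min and max of the integral of f over 2-D copulas by
    optimising over permutation-supported mass (1/m at each midpoint
    (u_i, u_sigma(i))). Brute force over all m! permutations."""
    mid = [(k + 0.5) / m for k in range(m)]  # cell midpoints in (0, 1)
    values = [sum(f(mid[i], mid[s[i]]) for i in range(m)) / m
              for s in permutations(range(m))]
    return min(values), max(values)


if __name__ == "__main__":
    # For f(u, v) = u * v the exact extremes over all copulas are 1/6
    # (countermonotone copula W) and 1/3 (comonotone copula M).
    print(discrete_copula_bounds(lambda u, v: u * v, 4))
```

    For larger $m$ one would replace the brute force with an assignment solver, for example SciPy's `scipy.optimize.linear_sum_assignment` on the matrix of values $f(u_i,u_j)$.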