    On the limitations of the univariate marginal distribution algorithm to deception and where bivariate EDAs might help

    We introduce a new benchmark problem called Deceptive Leading Blocks (DLB) to rigorously study the runtime of the Univariate Marginal Distribution Algorithm (UMDA) in the presence of epistasis and deception. We show that simple Evolutionary Algorithms (EAs) outperform the UMDA unless the selective pressure $\mu/\lambda$ is extremely high, where $\mu$ and $\lambda$ are the parent and offspring population sizes, respectively. More precisely, we show that the UMDA with a parent population size of $\mu = \Omega(\log n)$ has an expected runtime of $e^{\Omega(\mu)}$ on the DLB problem for any selective pressure $\mu/\lambda \geq 14/1000$, as opposed to the expected runtime of $\mathcal{O}(n\lambda\log\lambda + n^3)$ for the non-elitist $(\mu,\lambda)$ EA with $\mu/\lambda \leq 1/e$. These results illustrate inherent limitations of univariate EDAs against deception and epistasis, which are common characteristics of real-world problems. In contrast, empirical evidence reveals the efficiency of the bivariate MIMIC algorithm on the DLB problem. Our results suggest that one should consider EDAs with more complex probabilistic models when optimising problems with some degree of epistasis and deception.
    Comment: To appear in the 15th ACM/SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA XV), Potsdam, Germany.
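    A minimal sketch of the UMDA loop discussed above, assuming plain OneMax as a stand-in fitness (the exact DLB definition appears in the paper and is not reproduced here); the population sizes, generation limit, and margin clamp are illustrative defaults, not the values used in the analysis.

```python
# UMDA sketch: sample lambda offspring from a product distribution, keep the
# top mu, and re-estimate the per-bit marginals. OneMax is a stand-in fitness;
# the DLB benchmark from the paper is not reproduced here.
import numpy as np

def umda(fitness, n, lam=200, mu=50, max_gens=500, seed=None):
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                                  # one marginal per bit
    best_seen = -np.inf
    for _ in range(max_gens):
        pop = (rng.random((lam, n)) < p).astype(int)     # sample lambda individuals
        scores = np.array([fitness(x) for x in pop])
        best_seen = max(best_seen, scores.max())
        elite = pop[np.argsort(scores)[::-1][:mu]]       # truncation selection: top mu
        p = elite.mean(axis=0)                           # marginals = selected bit frequencies
        p = np.clip(p, 1.0 / n, 1.0 - 1.0 / n)           # keep margins away from 0 and 1
    return p, best_seen

if __name__ == "__main__":
    one_max = lambda x: int(x.sum())
    _, best = umda(one_max, n=50, seed=0)
    print("best OneMax value found:", best)
```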

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
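    A highly simplified, single-machine sketch of the forward phase with an Early-Dropping-style filter, to make the selection loop concrete. Assumptions: Pearson-correlation p-values against a residual stand in for the conditional independence tests, and the data partitioning, meta-analysis combining, and backward phase of the actual PFBP algorithm are omitted.

```python
# Toy forward selection with Early Dropping and Early Return, loosely inspired
# by the PFBP description above. NOT the parallel, partitioned algorithm:
# correlation p-values against a residual replace its conditional
# independence tests and meta-analysis.
import numpy as np
from scipy import stats

def forward_select(X, y, alpha=0.05, max_features=10):
    n, d = X.shape
    selected, candidates = [], set(range(d))
    while candidates and len(selected) < max_features:
        # crude stand-in for conditioning on the selected set: residualize y on it
        if selected:
            beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
            resid = y - X[:, selected] @ beta
        else:
            resid = y
        pvals = {j: stats.pearsonr(X[:, j], resid)[1] for j in candidates}
        # Early Dropping: features failing the test leave all later iterations
        candidates = {j for j, p in pvals.items() if p <= alpha}
        if not candidates:
            break
        winner = min(candidates, key=pvals.get)   # Early Return of the iteration's winner
        selected.append(winner)
        candidates.discard(winner)
    return selected
```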

    Genetic Information in Agricultural Productivity and Product Development

    A prominent facet of recent changes in agriculture has been the advent of precision breeding techniques. Another has been an increase in the level of information inputs and outputs associated with agricultural production. This paper identifies ways in which these features may complement one another in expanding the variety of processed products, the level of productivity, and the rate of change in productivity. Using a martingale concept of "more information," we identify conditions under which more information increases the incentives to invest and engage in product differentiation. A theory of how genetic uniformity can enhance the rate of learning through process experimentation, and so the rate of technical change, is also developed.
    Keywords: experimentation, genetics, information, martingale, sorting, uniformity, value added.
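    As a generic illustration of the "more information" idea invoked above (and not the paper's specific condition): if one information set refines another, the expected payoff of an optimizing decision-maker cannot decrease, by the law of iterated expectations and Jensen's inequality applied to the convex maximum over actions.

```latex
% Generic fact, not the paper's theorem: a finer information set G
% (refining H) weakly raises the value of an agent with payoff u(a, \omega).
\[
  \mathbb{E}\!\left[\max_{a}\,\mathbb{E}\big[u(a,\omega)\mid\mathcal{G}\big]\right]
  \;\ge\;
  \mathbb{E}\!\left[\max_{a}\,\mathbb{E}\big[u(a,\omega)\mid\mathcal{H}\big]\right]
  \qquad\text{whenever } \mathcal{H}\subseteq\mathcal{G}.
\]
```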

    Simulation-based Bayesian inference for multi-fingered robotic grasping

    Multi-fingered robotic grasping is an undeniable stepping stone to universal picking and dexterous manipulation. Yet, multi-fingered grippers remain challenging to control because of their rich nonsmooth contact dynamics and because of sensor noise. In this work, we aim to plan hand configurations by performing Bayesian posterior inference through the full stochastic forward simulation of the robot in its environment, hence robustly accounting for many of the uncertainties in the system. While previous methods either relied on simplified surrogates of the likelihood function or attempted to learn to directly predict maximum likelihood estimates, we bring a novel simulation-based approach for full Bayesian inference based on a deep neural network surrogate of the likelihood-to-evidence ratio. Hand configurations are found by directly optimizing through the resulting amortized and differentiable expression for the posterior. The geometry of the configuration space is accounted for by proposing a Riemannian manifold optimization procedure through the neural posterior. Simulation and physical benchmarks demonstrate the high success rate of the procedure.
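    A toy sketch of the likelihood-to-evidence ratio idea at the core of the approach above: a classifier is trained to separate jointly simulated pairs (theta, x) from pairs with shuffled x, and its logit then approximates log p(x|theta)/p(x), which can be added to the log-prior and optimized over theta. Assumptions: a one-dimensional Gaussian toy simulator and a small MLP replace the robot simulator and the paper's network; this is not the authors' model or their Riemannian optimization step.

```python
# Likelihood-to-evidence ratio sketch (toy): classify joint vs. shuffled
# (theta, x) pairs; the trained logit approximates log p(x|theta)/p(x).
# The simulator and network below are illustrative stand-ins only.
import torch
import torch.nn as nn

def simulate(theta):                         # toy stochastic forward simulator
    return theta + torch.randn_like(theta)

class RatioNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, theta, x):             # logit of "pair was jointly sampled"
        return self.net(torch.cat([theta, x], dim=-1))

def train(net, steps=2000, batch=256, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        theta = 2.0 * torch.randn(batch, 1)              # draws from a toy prior
        x = simulate(theta)                              # joint pairs
        x_marginal = x[torch.randperm(batch)]            # break the pairing -> marginal pairs
        logits = torch.cat([net(theta, x), net(theta, x_marginal)])
        labels = torch.cat([torch.ones(batch, 1), torch.zeros(batch, 1)])
        loss = bce(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return net

# After training, log_prior(theta) + net(theta, x_obs) gives an (unnormalized)
# log-posterior surrogate that is differentiable in theta, so theta can be
# optimized by gradient ascent, analogous in spirit to the paper's procedure.
```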

    A multiple-phenotype imputation method for genetic studies

    Genetic association studies have yielded a wealth of biologic discoveries. However, these have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of these datasets. Joint genotype-phenotype analyses of complex, high-dimensional datasets represent an important way to move beyond simple GWAS with great potential. The move to high-dimensional phenotypes will raise many new statistical problems. In this paper we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real datasets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.
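    A minimal sketch of the underlying idea of borrowing information across correlated traits: impute each sample's missing phenotypes from the conditional mean of a multivariate normal fitted to complete cases. Assumptions: relatedness between samples, the mixed-model structure, and the variational Bayesian fit used by the actual method are all omitted.

```python
# Naive multi-phenotype imputation via Gaussian conditioning. Stand-in for the
# mixed-model, variational-Bayes method described above: relatedness between
# samples is ignored and the trait covariance comes from complete cases only.
import numpy as np

def impute_phenotypes(Y):
    """Y: (samples x traits) array with np.nan marking missing phenotype values."""
    complete = ~np.isnan(Y).any(axis=1)
    mu = Y[complete].mean(axis=0)
    sigma = np.cov(Y[complete], rowvar=False)
    Y_imp = Y.copy()
    for i in range(Y.shape[0]):
        miss = np.isnan(Y[i])
        if not miss.any() or miss.all():
            continue                          # nothing to impute / nothing to condition on
        obs = ~miss
        # conditional mean: mu_m + S_mo S_oo^{-1} (y_o - mu_o)
        S_oo = sigma[np.ix_(obs, obs)]
        S_mo = sigma[np.ix_(miss, obs)]
        Y_imp[i, miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, Y[i, obs] - mu[obs])
    return Y_imp
```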
    • 

    corecore