
    Genetic programming: the ratio of crossover to mutation as a function of time

    This article studies the sub-tree operators, mutation and crossover, within the context of Genetic Programming. Two standard problems, symbolic linear regression and a non-linear tree, were presented to the algorithm at each stage. The behaviour of the operators with regard to fitness is first established, followed by an analysis of the optimal ratio between crossover and mutation. Subsequently, three algorithms are presented as candidates to dynamically learn the optimal level of this ratio. The results of each algorithm are then compared to each other and to the traditional constant ratio.
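A minimal sketch of what a time-dependent crossover-to-mutation ratio can look like in practice. The linear schedule, the default probabilities, and all function names here are illustrative assumptions; they are not taken from the article and do not reproduce any of its three candidate algorithms.

```python
import random


def crossover_probability(generation, total_generations, start=0.9, end=0.5):
    """Linearly interpolate the crossover probability from start to end.

    The linear schedule and the default values are assumptions made for
    illustration only.
    """
    if total_generations <= 1:
        return start
    t = generation / (total_generations - 1)
    return start + t * (end - start)


def choose_operator(rng, p_crossover):
    """Return 'crossover' with probability p_crossover, otherwise 'mutation'."""
    return "crossover" if rng.random() < p_crossover else "mutation"


def operator_counts(total_generations, offspring_per_generation, seed=0):
    """Count how often each operator is chosen in every generation."""
    rng = random.Random(seed)
    counts = []
    for generation in range(total_generations):
        p = crossover_probability(generation, total_generations)
        ops = [choose_operator(rng, p) for _ in range(offspring_per_generation)]
        counts.append((ops.count("crossover"), ops.count("mutation")))
    return counts
```

A constant ratio is the special case `start == end`; an adaptive scheme would replace `crossover_probability` with a rule driven by observed fitness gains.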

    Simple Algorithms to Calculate Asymptotic Null Distributions of Robust Tests in Case-Control Genetic Association Studies in R

    The case-control study is an important design for testing association between genetic markers and a disease. The Cochran-Armitage trend test (CATT) is one of the most commonly used statistics for the analysis of case-control genetic association studies. The asymptotically optimal CATT can be used when the underlying genetic model (mode of inheritance) is known. However, for most complex diseases, the underlying genetic models are unknown. Thus, tests robust to genetic model misspecification are preferable to the model-dependent CATT. Two robust tests, MAX3 and the genetic model selection (GMS), were recently proposed. Their asymptotic null distributions are often obtained by Monte-Carlo simulations, because they either have not been fully studied or involve multiple integrations. In this article, we study how components of each robust statistic are correlated, and find a linear dependence among the components. Using this new finding, we propose simple algorithms to calculate asymptotic null distributions for MAX3 and GMS, which greatly reduce the computing intensity. Furthermore, we have developed the R package Rassoc implementing the proposed algorithms to calculate the empirical and asymptotic p values for MAX3 and GMS as well as other commonly used tests in case-control association studies. For illustration, Rassoc is applied to the analysis of case-control data of the 17 most significant SNPs reported in four genome-wide association studies.
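To make the statistics concrete, the sketch below computes the CATT for genotype scores (0, x, 1) and takes MAX3 as the largest absolute CATT over x = 0, 1/2, 1 (recessive, additive, dominant). It is written in Python for illustration and is not the Rassoc API. The function names are hypothetical, genotype counts are assumed to be ordered by number of risk alleles (0, 1, 2), and only the test statistics are computed, not the asymptotic null distribution or p values that the article addresses.

```python
import math


def catt(cases, controls, x):
    """Cochran-Armitage trend statistic with genotype scores (0, x, 1).

    cases and controls are genotype counts ordered by number of risk
    alleles (0, 1, 2); this ordering is an assumption of the sketch.
    Raises ZeroDivisionError if the scored genotypes show no variation.
    """
    scores = (0.0, x, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [ri + si for ri, si in zip(cases, controls)]
    numerator = math.sqrt(n) * sum(
        w * (s * ri - r * si) for w, ri, si in zip(scores, cases, controls)
    )
    spread = n * sum(w * w * ni for w, ni in zip(scores, totals)) - sum(
        w * ni for w, ni in zip(scores, totals)
    ) ** 2
    return numerator / math.sqrt(r * s * spread)


def max3(cases, controls):
    """Maximum absolute CATT over the recessive, additive and dominant scores."""
    return max(abs(catt(cases, controls, x)) for x in (0.0, 0.5, 1.0))
```

With x = 0 the squared statistic coincides with the Pearson chi-squared statistic of the 2x2 table that collapses the first two genotypes, which gives a convenient sanity check.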

    Experimentally Attainable Optimal Pulse Shapes Obtained with the Aid of Genetic Algorithms

    We propose a methodology to design optimal pulses for achieving quantum optimal control on molecular systems. Our approach constrains pulse shapes to linear combinations of a fixed number of experimentally relevant pulse functions. Quantum optimal control is obtained by maximizing a multi-target fitness function with genetic algorithms. As a first application of the methodology we generated an optimal pulse that successfully maximized the yield on a selected dissociation channel of a diatomic molecule. Our pulse is obtained as a linear combination of linearly chirped pulse functions. Data recorded along the evolution of the genetic algorithm contained important information regarding the interplay between radiative and diabatic processes. We performed a principal component analysis on these data to retrieve the most relevant processes along the optimal path. Our proposed methodology could be useful for performing quantum optimal control on more complex systems by employing a wider variety of pulse shape functions. Comment: 7 pages, 6 figures.

    From Regular Expression Matching to Parsing

    Given a regular expression R and a string Q, the regular expression parsing problem is to determine if Q matches R and, if so, to determine how it matches, e.g., by a mapping of the characters of Q to the characters in R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses from internet traffic analysis or extracting subparts of genomes from genetic databases. We present new general techniques for efficiently converting a large class of algorithms that determine if a string Q matches regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear space solutions for regular expression parsing.
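The difference between matching and parsing can be illustrated with Python's standard `re` module. This is only a sketch of the problem statement: `re` is a backtracking engine and does not implement the linear-space techniques of the paper. The pattern and the helper names are assumptions chosen for the IP-address example, and the pattern does not check that each octet lies in the range 0 to 255.

```python
import re

# Four dot-separated groups of one to three digits, each captured by name.
IPV4 = re.compile(
    r"(?P<a>\d{1,3})\.(?P<b>\d{1,3})\.(?P<c>\d{1,3})\.(?P<d>\d{1,3})"
)


def matches(q):
    """Matching: report only whether the whole string q matches the pattern."""
    return IPV4.fullmatch(q) is not None


def parse(q):
    """Parsing: also report which span of q each named subpattern consumed."""
    m = IPV4.fullmatch(q)
    if m is None:
        return None
    return {
        name: (m.start(name), m.end(name), m.group(name))
        for name in ("a", "b", "c", "d")
    }
```

For example, `parse("192.168.0.1")` maps the group `a` to the span `(0, 3)` holding `"192"`, whereas `matches` returns only `True`.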

    Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression

    Developing mathematical models of dynamic systems is central to many disciplines of engineering and science. Models facilitate simulations, analysis of the system's behavior, decision making and design of automatic control algorithms. Even inherently model-free control techniques such as reinforcement learning (RL) have been shown to benefit from the use of models, typically learned online. Any model construction method must address the tradeoff between the accuracy of the model and its complexity, a balance that is difficult to strike. In this paper, we propose to employ symbolic regression (SR) to construct parsimonious process models described by analytic equations. We have equipped our method with two different state-of-the-art SR algorithms which automatically search for equations that fit the measured data: Single Node Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In addition to the standard problem formulation in the state-space domain, we show how the method can also be applied to input-output models of the NARX (nonlinear autoregressive with exogenous input) type. We present the approach on three simulated examples with up to 14-dimensional state space: an inverted pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep neural networks and local linear regression shows that SR in most cases outperforms these commonly used alternative methods. We demonstrate on a real pendulum system that the analytic model found enables an RL controller to successfully perform the swing-up task, based on a model constructed from only 100 data samples.
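As a rough illustration of the NARX formulation, the sketch below turns recorded output and input sequences into (regressor, target) pairs that a regression method such as SR could then fit. The lag convention y(k+1) = f(y(k), ..., y(k-ny+1), u(k), ..., u(k-nu+1)) and the function name are assumptions of this sketch; the abstract does not specify them, and no SR algorithm is implemented here.

```python
def narx_samples(y, u, ny, nu):
    """Build (regressor, target) pairs for a NARX-type model.

    Assumed convention: the target y[k + 1] is predicted from the ny most
    recent outputs and the nu most recent inputs, newest first.
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    if ny < 1 or nu < 1:
        raise ValueError("ny and nu must be at least 1")
    first = max(ny, nu) - 1  # earliest k with enough history available
    samples = []
    for k in range(first, len(y) - 1):
        past_outputs = [y[k - i] for i in range(ny)]
        past_inputs = [u[k - i] for i in range(nu)]
        samples.append((past_outputs + past_inputs, y[k + 1]))
    return samples
```

Each regressor has length `ny + nu`, so the lag orders directly set the number of variables the fitted analytic equation may use.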

    Selection of Preprocessing Methodology for Multivariate Regression of Cellular FTIR and Raman Spectra in Radiobiological Analyses

    Vibrational spectra of biological species suffer from the influence of many extraneous interfering factors that require removal through preprocessing before analysis. The present study was conducted to optimise the preprocessing methodology and variable subset selection during regression of confocal Raman microspectroscopy (CRM) and Fourier Transform Infrared microspectroscopy (FTIRM) spectra against ionizing radiation dose. Skin cells were γ-irradiated in vitro and their Raman and FTIRM spectra were used to retrospectively predict the radiation dose using linear and nonlinear partial least squares (PLS) regression algorithms in addition to support vector regression (SVR). The optimal preprocessing methodology (which comprised combinations of spectral filtering, baseline subtraction, scaling and normalization options) was selected using a genetic algorithm (GA) with the root mean squared error of prediction (RMSEP) used as the fitness criterion for selection of the preprocessing chromosome (where this was calculated on an independent set of test spectra randomly selected from the dataset on each pass of the algorithm). The results indicated that GA selection of the optimal preprocessing methodology substantially improved the predictive capacity of the regression algorithms over baseline methodologies, although the optimal preprocessing chromosomes were similar for various regression algorithms, suggesting an optimal preprocessing methodology for radiobiological analyses with biospectroscopy. Feature selection of both FTIRM and CRM spectra using genetic algorithms and multivariate regression provided further decreases in RMSEP, but only with non-linear multivariate regression algorithms.
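The fitness criterion described above can be sketched as follows. RMSEP is computed on held-out test values, and the chromosome with the lowest RMSEP is preferred. The `predict` callback and the representation of a chromosome are hypothetical placeholders for the preprocessing and regression pipeline, which the abstract does not specify; the GA's selection, crossover and mutation steps are omitted.

```python
import math


def rmsep(predicted, actual):
    """Root mean squared error of prediction over paired test values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def best_chromosome(population, predict):
    """Return the chromosome whose pipeline gives the lowest RMSEP.

    predict(chromosome) is a hypothetical callback returning a pair
    (predicted_doses, actual_doses) for the independent test spectra.
    """
    return min(population, key=lambda chromosome: rmsep(*predict(chromosome)))
```

Because lower RMSEP is better, a GA that maximizes fitness would typically use the negative or the reciprocal of this value.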