
    Incorporating ancestors' influence in genetic algorithms

    A new criterion of fitness evaluation for Genetic Algorithms is introduced, in which the fitness value of an individual is determined by considering its own fitness as well as that of its ancestors. Some guidelines are provided for selecting the weighting coefficients that quantify the importance given to the fitness of the individual and of its ancestors. This is done both heuristically and automatically under fixed and adaptive frameworks. The Schema Theorem corresponding to the proposed concept is derived. The effectiveness of this new methodology is demonstrated extensively on the problems of optimizing complex functions, including a noisy one, and selecting optimal neural network parameters.
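
The weighting idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' scheme: the function name `ancestral_fitness`, the single coefficient `alpha`, and the use of the mean of the parents' scores are all assumptions made for the example.

```python
def ancestral_fitness(own, parent_scores, alpha=0.7):
    """Blend an individual's raw fitness with its parents' scores.

    own           -- raw fitness of the individual
    parent_scores -- scores already assigned to its parents (may be empty)
    alpha         -- hypothetical weight on the individual's own fitness;
                     (1 - alpha) goes to the mean of the parents' scores
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not parent_scores:
        # Individuals of the initial population have no ancestors.
        return own
    parent_mean = sum(parent_scores) / len(parent_scores)
    return alpha * own + (1.0 - alpha) * parent_mean


# Because each parent's score was itself blended with its own parents,
# earlier generations contribute with geometrically decaying weight.
child_score = ancestral_fitness(10.0, [4.0, 6.0], alpha=0.5)
```

With `alpha = 1` the sketch reduces to ordinary fitness evaluation; smaller values give more weight to lineage.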

    Scalability of Genetic Programming and Probabilistic Incremental Program Evolution

    This paper discusses the scalability of standard genetic programming (GP) and probabilistic incremental program evolution (PIPE). To investigate the need for both effective mixing and linkage learning, two test problems are considered: the ORDER problem, which is rather easy for any recombination-based GP, and TRAP, the deceptive trap problem, which requires the algorithm to learn interactions among subsets of terminals. The scalability results show that both GP and PIPE scale up polynomially with problem size on the simple ORDER problem, but both scale up exponentially on the deceptive problem. This indicates that while standard recombination is sufficient when no interactions need to be considered, for some problems linkage learning is necessary. These results are in agreement with the lessons learned in the domain of binary-string genetic algorithms (GAs). Furthermore, the paper investigates the effects of introducing unnecessary and irrelevant primitives on the performance of GP and PIPE. Comment: Submitted to GECCO-200
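
To make the notion of a deceptive problem concrete, here is a sketch of the concatenated trap function as it is commonly defined for binary-string GAs. It is the bit-string analogue, not the GP formulation over terminals used in the paper, and the block size `k = 5` is an assumption for the example.

```python
def trap(block):
    """Fully deceptive trap on one block of bits.

    The global optimum is the all-ones block (value k), but every other
    block scores higher the fewer ones it contains, which leads
    bit-by-bit hill climbing toward all zeros.
    """
    k = len(block)
    ones = sum(block)
    return k if ones == k else k - 1 - ones


def concatenated_trap(bits, k=5):
    """Sum of independent order-k traps over consecutive blocks."""
    if len(bits) % k != 0:
        raise ValueError("length of bits must be a multiple of k")
    return sum(trap(bits[i:i + k]) for i in range(0, len(bits), k))


all_ones = concatenated_trap([1] * 10)   # both blocks at the global optimum
all_zeros = concatenated_trap([0] * 10)  # both blocks at the deceptive attractor
```

Solving each block requires treating its `k` bits as a unit, which is why an algorithm that does not learn such interactions scales poorly here.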

    Making and breaking power laws in evolutionary algorithm population dynamics

    Deepening our understanding of the characteristics and behaviors of population-based search algorithms remains an important ongoing challenge in Evolutionary Computation. To date, however, most studies of Evolutionary Algorithms (EAs) have only been able to take place within tightly restricted experimental conditions. For instance, many analytical methods can only be applied to canonical algorithmic forms or can only evaluate evolution over simple test functions. Analysis of EA behavior under more complex conditions is needed to broaden our understanding of this population-based search process. This paper presents an approach to analyzing EA behavior that can be applied to a diverse range of algorithm designs and environmental conditions. The approach is based on evaluating an individual’s impact on population dynamics using metrics derived from genealogical graphs. From experiments conducted over a broad range of conditions, some important conclusions are drawn in this study. First, it is determined that very few individuals in an EA population have a significant influence on future population dynamics, with the impact size fitting a power law distribution. The power law distribution indicates there is a non-negligible probability that single individuals will dominate the entire population, irrespective of population size. Two EA design features are, however, found to cause strong changes to this aspect of EA behavior: i) the population topology and ii) the introduction of completely new individuals. If the EA population topology has a long path length, or if new (i.e. historically uncoupled) individuals are continually inserted into the population, then power law deviations are observed for large impact sizes. It is concluded that such EA designs cannot be dominated by a small number of individuals and hence should theoretically be capable of exhibiting higher degrees of parallel search behavior.

    Computing Individual Risks based on Family History in Genetic Disease in the Presence of Competing Risks

    When considering a genetic disease with variable age at onset (e.g., diabetes, familial amyloid neuropathy, cancers, etc.), computing the individual risk of the disease based on family history (FH) is of critical interest both for clinicians and patients. Such a risk is very challenging to compute because: 1) the genotype X of the individual of interest is in general unknown; 2) the posterior distribution P(X|FH, T > t) changes with t (T is the age at disease onset for the targeted individual); 3) the competing risk of death is not negligible. In this work, we present a model of this problem using a Bayesian network mixed with (right-censored) survival outcomes where hazard rates only depend on the genotype of each individual. We explain how belief propagation can be used to obtain the posterior distribution of genotypes given the FH, and how to obtain a time-dependent posterior hazard rate for any individual in the pedigree. Finally, we use this posterior hazard rate to compute individual risk, with or without the competing risk of death. Our method is illustrated using the Claus-Easton model for breast cancer (BC). This model assumes an autosomal dominant genetic risk factor such that non-carriers (genotype 00) have a BC hazard rate $\lambda_0(t)$ while carriers (genotypes 01, 10 and 11) have a (much greater) hazard rate $\lambda_1(t)$. Both hazard rates are assumed to be piecewise constant with known values (cuts at 20, 30, ..., 80 years). The competing risk of death is derived from the national French registry.
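
The final step described here, turning a piecewise-constant hazard into a cumulative risk with or without a competing risk of death, can be sketched as below. This is only an illustration under simplifying assumptions: it takes a single known disease hazard rather than the genotype-posterior hazard obtained by belief propagation, it assumes both hazards share the same cut points, and the function name and every rate value are hypothetical.

```python
import math


def cumulative_risk(cuts, disease_rates, death_rates, t):
    """Probability of disease onset before age t, with death as a competing risk.

    cuts          -- sorted interval start ages, beginning at 0
    disease_rates -- disease hazard on [cuts[i], cuts[i+1]); last one extends to infinity
    death_rates   -- death hazard on the same intervals (all zeros = no competing risk)
    """
    risk = 0.0
    event_free = 1.0  # probability of being alive and disease-free so far
    for i, start in enumerate(cuts):
        if t <= start:
            break
        end = cuts[i + 1] if i + 1 < len(cuts) else math.inf
        width = min(t, end) - start
        lam, mu = disease_rates[i], death_rates[i]
        total = lam + mu
        if total > 0.0:
            stay = math.exp(-total * width)
            # Of those leaving the event-free state, a share lam/total gets the disease.
            risk += event_free * (lam / total) * (1.0 - stay)
            event_free *= stay
    return risk


# Hypothetical rates, chosen only to exercise the function.
cuts = [0.0, 20.0]
no_death = cumulative_risk(cuts, [0.01, 0.01], [0.0, 0.0], 30.0)
with_death = cumulative_risk(cuts, [0.01, 0.01], [0.02, 0.02], 30.0)
```

Setting the death hazard to zero recovers the usual `1 - exp(-cumulative hazard)`; a positive death hazard lowers the risk because some individuals die before onset.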

    Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

    We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularities, and we seek to leverage this structure to recover covariates relevant to each hierarchically defined cluster of responses. We propose a tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree penalty such that each regression coefficient is penalized in a balanced manner despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast data sets, we demonstrate that our method shows superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression. Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
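
The general shape of a tree-structured group penalty can be sketched as follows: each tree node defines a group of responses, and for every covariate the L2 norm of its coefficients over that group is added with a node weight. The weights here are plain inputs, not the balanced weighting scheme the abstract refers to, and the function name and example values are hypothetical.

```python
import math


def tree_group_penalty(coefs, groups, weights):
    """Sum over covariates and tree nodes of weight * L2 norm of the group.

    coefs   -- coefs[j][k] is the coefficient of covariate j for response k
    groups  -- one list of response indices per tree node (leaves and internal nodes)
    weights -- one non-negative weight per node (assumed given)
    """
    if len(groups) != len(weights):
        raise ValueError("need exactly one weight per group")
    total = 0.0
    for row in coefs:
        for group, weight in zip(groups, weights):
            total += weight * math.sqrt(sum(row[k] ** 2 for k in group))
    return total


# Two responses: two leaves plus a root node that groups both of them.
groups = [[0], [1], [0, 1]]
weights = [1.0, 1.0, 1.0]
coefs = [[3.0, 4.0],   # covariate relevant to both responses
         [0.0, 0.0]]   # covariate relevant to neither
penalty = tree_group_penalty(coefs, groups, weights)
```

Because the root group overlaps both leaves, a coefficient is counted in several norms at once, which is the imbalance that a weighting scheme over the tree is meant to correct.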

    Healthcare-associated outbreak of meticillin-resistant Staphylococcus aureus bacteraemia: role of a cryptic variant of an epidemic clone

    BACKGROUND: New strains of meticillin-resistant Staphylococcus aureus (MRSA) may be associated with changes in rates of disease or clinical presentation. Conventional typing techniques may not detect new clonal variants that underlie changes in epidemiology or clinical phenotype. AIM: To investigate the role of clonal variants of MRSA in an outbreak of MRSA bacteraemia at a hospital in England. METHODS: Bacteraemia isolates of the major UK lineages (EMRSA-15 and -16) from before and after the outbreak were analysed by whole-genome sequencing in the context of epidemiological and clinical data. For comparison, EMRSA-15 and -16 isolates from another hospital in England were sequenced. A clonal variant of EMRSA-16 was identified at the outbreak hospital and a molecular signature test designed to distinguish variant isolates among further EMRSA-16 strains. FINDINGS: By whole-genome sequencing, EMRSA-16 isolates during the outbreak showed strikingly low genetic diversity (P < 1 × 10^-6, Monte Carlo test), compared with EMRSA-15 and EMRSA-16 isolates from before the outbreak or the comparator hospital, demonstrating the emergence of a clonal variant. The variant was indistinguishable from the ancestral strain by conventional typing. This clonal variant accounted for 64/72 (89%) of EMRSA-16 bacteraemia isolates at the outbreak hospital from 2006. CONCLUSIONS: Evolutionary changes in epidemic MRSA strains not detected by conventional typing may be associated with changes in disease epidemiology. Rapid and affordable technologies for whole-genome sequencing are becoming available with the potential to identify and track the emergence of variants of highly clonal organisms.

    Bayesian total evidence dating reveals the recent crown radiation of penguins

    The total-evidence approach to divergence-time dating uses molecular and morphological data from extant and fossil species to infer phylogenetic relationships, species divergence times, and macroevolutionary parameters in a single coherent framework. Current model-based implementations of this approach lack an appropriate model for the tree describing the diversification and fossilization process and can produce estimates that lead to erroneous conclusions. We address this shortcoming by providing a total-evidence method implemented in a Bayesian framework. This approach uses a mechanistic tree prior to describe the underlying diversification process that generated the tree of extant and fossil taxa. Previous attempts to apply the total-evidence approach have used tree priors that do not account for the possibility that fossil samples may be direct ancestors of other samples. The fossilized birth-death (FBD) process explicitly models the diversification, fossilization, and sampling processes and naturally allows for sampled ancestors. This model was recently applied to estimate divergence times based on molecular data and fossil occurrence dates. We incorporate the FBD model and a model of morphological trait evolution into a Bayesian total-evidence approach to dating species phylogenies. We apply this method to extant and fossil penguins and show that the modern penguins radiated much more recently than has been previously estimated, with the basal divergence in the crown clade occurring at ~12.7 Ma and most splits leading to extant species occurring in the last 2 million years. Our results demonstrate that including stem-fossil diversity can greatly improve the estimates of the divergence times of crown taxa. The method is available in BEAST2 (v. 2.4; www.beast2.org) with packages SA (v. at least 1.1.4) and morph-models (v. at least 1.0.4). Comment: 50 pages, 6 figures