1,068 research outputs found

    Population genetics models of local ancestry

    Migrations have played an important role in shaping the genetic diversity of human populations. Understanding genomic data thus requires careful modeling of historical gene flow. Here we consider the effect of relatively recent population structure and gene flow, and interpret the genomes of individuals with ancestry from multiple source populations as mosaics of segments originating from each population. We propose general and tractable models for describing the evolution of these patterns of local ancestry and their impact on genetic diversity. We focus on the length distribution of continuous ancestry tracts and on the variance in total ancestry proportions among individuals. The proposed models agree better with Wright-Fisher simulation data than state-of-the-art models do, and can be used to infer various demographic parameters in gene flow models. Considering HapMap African-American (ASW) data, we find that a model with two distinct phases of 'European' gene flow significantly improves the modeling of both tract lengths and ancestry variances. Comment: 25 pages with 7 figures; Genetics: Published online before print April 4, 201

    Population Structure and Cryptic Relatedness in Genetic Association Studies

    We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple "island" model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree. Comment: Published at http://dx.doi.org/10.1214/09-STS307 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org
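    Of the solutions listed in this abstract, genomic control is the simplest to illustrate. The sketch below is not taken from the review; it assumes 1-degree-of-freedom chi-square association statistics and uses the common convention of never letting the inflation factor fall below 1. The function name `genomic_control` is hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Estimate the inflation factor lambda and rescale the statistics.

    Assumption: each entry is a 1-df chi-square association statistic.
    Lambda is floored at 1.0 (a common convention, not a requirement).
    """
    lam = max(1.0, statistics.median(chi2_stats) / CHI2_1DF_MEDIAN)
    adjusted = [s / lam for s in chi2_stats]
    return lam, adjusted


if __name__ == "__main__":
    lam, adjusted = genomic_control([0.3, 0.9, 1.4, 2.8, 6.1])
    print(f"lambda = {lam:.3f}")
    print([round(a, 3) for a in adjusted])
```

    Dividing every statistic by a single factor corrects uniform inflation only; the abstract's point is that linear mixed models use the kinships among subjects more explicitly.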

    Data adaptive kernel discriminant analysis using information complexity criterion and genetic algorithm

    This dissertation proposes a new hybrid approach, computationally effective and easy to use, for selecting the best subset of predictor variables in discriminant analysis when the data do not follow a normal distribution. The approach combines the information-theoretic measure of complexity (ICOMP) criterion with a genetic algorithm and kernel density estimators in discriminant analysis. It enables researchers to find both the optimal bandwidth matrix for the kernel density estimate and the best model among several competing models, a task that has been a severe obstacle to applying kernel density estimates in discriminant analysis. The proposed approach is applied to four real data sets and compared with linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k-nearest neighbor discriminant analysis (k-NNDA). Based on these applications, we conclude that the proposed approach performs better than LDA and QDA and as well as k-NNDA with respect to classification error rates. With our approach we can do all-possible-subset selection of variables for high-dimensional data to determine the best predictors discriminating between the groups.
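    To make the kernel density step concrete, here is a minimal sketch of discriminant analysis with a Gaussian kernel density estimate. It is a simplification under stated assumptions: a single predictor, one fixed bandwidth shared by all groups, and priors taken from group sizes. The dissertation instead selects the bandwidth matrix and variable subset with ICOMP and a genetic algorithm, which is not reproduced here. The names `gaussian_kde` and `fit_kernel_classifier` are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def fit_kernel_classifier(groups, bandwidth):
    """Build a classifier from {label: [values]} training data.

    Assumption: the bandwidth is fixed and shared by every group.
    """
    total = sum(len(values) for values in groups.values())
    models = {
        label: (len(values) / total, gaussian_kde(values, bandwidth))
        for label, values in groups.items()
    }

    def classify(x):
        # Assign x to the group with the largest prior * estimated density.
        return max(models, key=lambda label: models[label][0] * models[label][1](x))

    return classify


if __name__ == "__main__":
    training = {"a": [0.0, 0.2, 0.4], "b": [2.0, 2.2, 2.4]}
    classify = fit_kernel_classifier(training, bandwidth=0.3)
    print(classify(0.1), classify(2.3))
```

    The classification error of such a rule depends heavily on the bandwidth, which is why the abstract treats bandwidth selection as the main obstacle.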

    Substructural local search in discrete estimation of distribution algorithms

    Doctoral thesis, Electronic and Computer Engineering, Universidade do Algarve, 2009. SFRH/BD/16980/2004. The last decade has seen the rise and consolidation of a new trend of stochastic optimizers known as estimation of distribution algorithms (EDAs). In essence, EDAs build probabilistic models of promising solutions and sample from the corresponding probability distributions to obtain new solutions. This approach has brought a new view to evolutionary computation because, while solving a given problem with an EDA, the user has access to a set of models that reveal probabilistic dependencies between variables, an important source of information about the problem. This dissertation proposes the integration of substructural local search (SLS) in EDAs to speed up convergence to optimal solutions. Substructural neighborhoods are defined by the structure of the probabilistic models used in EDAs, generating adaptive neighborhoods capable of automatic discovery and exploitation of problem regularities. Specifically, the thesis focuses on the extended compact genetic algorithm and the Bayesian optimization algorithm. The utility of SLS in EDAs is investigated for a number of boundedly difficult problems with modularity, overlapping, and hierarchy, while considering important aspects such as scaling and noise. The results show that SLS can substantially reduce the number of function evaluations required to solve some of these problems. More importantly, the speedups obtained can scale up to the square root of the problem size, $O(\sqrt{\ell})$. Fundação para a Ciência e Tecnologia (FCT
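    The build-a-model-then-sample loop described in this abstract can be sketched with the simplest kind of EDA, one that models each bit independently (often called UMDA). This is an illustration only: it is neither the extended compact genetic algorithm nor the Bayesian optimization algorithm studied in the thesis, it includes no substructural local search, and the OneMax objective, parameter values, and function names are assumptions chosen for brevity.

```python
import random


def onemax(bits):
    """Toy objective: the number of ones in the bit string."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """Univariate EDA: estimate per-bit marginals, then sample from them."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # initial model: every bit is a fair coin
    low, high = 1.0 / n_bits, 1.0 - 1.0 / n_bits
    best = None
    for _ in range(generations):
        # Sample a new population from the current probabilistic model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Keep the most promising solutions (truncation selection).
        population.sort(key=onemax, reverse=True)
        selected = population[:n_select]
        if best is None or onemax(selected[0]) > onemax(best):
            best = selected[0]
        # Rebuild the model from the selected solutions; clamp the
        # marginals so that no bit value becomes impossible to sample.
        probs = [
            min(max(sum(ind[i] for ind in selected) / n_select, low), high)
            for i in range(n_bits)
        ]
    return best, probs


if __name__ == "__main__":
    best, probs = umda()
    print(onemax(best), [round(p, 2) for p in probs])
```

    The final `probs` vector is the kind of model the abstract refers to: here it carries only per-bit frequencies, whereas the multivariate models in the thesis also expose dependencies between variables.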

    Author Obfuscation on Indonesian News Articles Using Genetic Algorithms

    Authorship attribution is a method for identifying the author of a text from a group of potential authors, and it can remove the anonymity of unknown authors. Such a method threatens the privacy of anyone who wishes to write anonymously. To address this issue, author obfuscation has been proposed: modifying a text to disguise its author. In this research, a genetic algorithm-based author obfuscation model was created to modify Indonesian news articles so that they avoid identification by authorship attribution while keeping their semantics. The model iteratively changes some words in the article using crossover and mutation techniques guided by a fitness function that involves the identification probability and the similarity to the original article. The model is evaluated on three parameters: safety, soundness, and sensibleness. The model has good safety, since it reduces the given authorship attribution model's accuracy by 0.3018, although the reduction drops to 0.1179 when tested on different models. Its soundness is fairly good, since the similarity of the modified articles to the originals reaches 0.7817. The model obtained a sensibleness score of 2.571 on a scale of 0 to 4, which indicates that some articles are grammatically acceptable but many are messy.
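    The abstract says only that the fitness function involves the identification probability and the similarity to the original article; it does not give the formula. The sketch below shows one plausible shape for such a function and should be read as an assumption throughout: the weighted sum, the `weight` parameter, the Jaccard token overlap used as a similarity stand-in, and the `identify_prob` callable are all hypothetical.

```python
from typing import Callable, Sequence


def jaccard_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Token-set overlap, used here as a stand-in similarity measure."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def fitness(
    candidate: Sequence[str],
    original: Sequence[str],
    identify_prob: Callable[[Sequence[str]], float],
    weight: float = 0.5,
) -> float:
    """Reward candidates that are hard to attribute yet close to the original.

    identify_prob is a hypothetical callable returning the attribution
    model's probability that the true author wrote the candidate text.
    """
    evasion = 1.0 - identify_prob(candidate)
    closeness = jaccard_similarity(candidate, original)
    return weight * evasion + (1.0 - weight) * closeness


if __name__ == "__main__":
    original = ["a", "b", "c", "d"]
    candidate = ["a", "b", "c", "x"]
    print(fitness(candidate, original, identify_prob=lambda tokens: 0.2))
```

    A genetic algorithm would apply this score to each candidate produced by crossover and mutation, keeping the higher-scoring articles for the next iteration.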

    Multiple-line inference of selection on quantitative traits

    Trait differences between species may be attributable to natural selection. However, quantifying the strength of evidence for selection acting on a particular trait is a difficult task. Here we develop a population-genetic test for selection acting on a quantitative trait which is based on multiple-line crosses. We show that using multiple lines increases both the power and the scope of selection inference. First, a test based on three or more lines detects selection with strongly increased statistical significance, and we show explicitly how the sensitivity of the test depends on the number of lines. Second, a multiple-line test makes it possible to distinguish different lineage-specific selection scenarios. Our analytical results are complemented by extensive numerical simulations. We then apply the multiple-line test to QTL data on floral character traits in plant species of the Mimulus genus and on photoperiodic traits in different maize strains, where we find signatures of lineage-specific selection not seen in a two-line test. Comment: 21 pages, 11 figures; to appear in Genetic