Population genetics models of local ancestry
Migrations have played an important role in shaping the genetic diversity of
human populations. Understanding genomic data thus requires careful modeling of
historical gene flow. Here we consider the effect of relatively recent
population structure and gene flow, and interpret genomes of individuals that
have ancestry from multiple source populations as mosaics of segments
originating from each population. We propose general and tractable models for
describing the evolution of these patterns of local ancestry and their impact
on genetic diversity. We focus on the length distribution of continuous
ancestry tracts, and the variance in total ancestry proportions among
individuals. The proposed models offer improved agreement with Wright-Fisher
simulation data when compared to state-of-the-art models, and can be used to
infer various demographic parameters in gene flow models. Considering HapMap
African-American (ASW) data, we find that a model with two distinct phases of
'European' gene flow significantly improves the modeling of both tract lengths
and ancestry variances.
Comment: 25 pages with 7 figures; Genetics: Published online before print
April 4, 201
Population Structure and Cryptic Relatedness in Genetic Association Studies
We review the problem of confounding in genetic association studies, which
arises principally because of population structure and cryptic relatedness.
Many treatments of the problem consider only a simple "island" model of
population structure. We take a broader approach, which views population
structure and cryptic relatedness as different aspects of a single confounder:
the unobserved pedigree defining the (often distant) relationships among the
study subjects. Kinship is therefore a central concept, and we review methods
of defining and estimating kinship coefficients, both pedigree-based and
marker-based. In this unified framework we review solutions to the problem of
population structure, including family-based study designs, genomic control,
structured association, regression control, principal components adjustment and
linear mixed models. The last solution makes the most explicit use of the
kinships among the study subjects, and has an established role in the analysis
of animal and plant breeding studies. Recent computational developments mean
that analyses of human genetic association data are beginning to benefit from
its powerful tests for association, which protect against population structure
and cryptic kinship, as well as intermediate levels of confounding by the
pedigree.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at
http://dx.doi.org/10.1214/09-STS307 by the Institute of Mathematical
Statistics (http://www.imstat.org
Data adaptive kernel discriminant analysis using information complexity criterion and genetic algorithm
This dissertation proposes a new hybrid approach, computationally effective and easy to use, for selecting the best subset of predictor variables in discriminant analysis when the data do not follow a normal distribution. The approach combines the information-theoretic measure of complexity (ICOMP) criterion with a genetic algorithm and kernel density estimators. It enables researchers to find both the optimal bandwidth matrix for the kernel density estimate and the best model among several competing models; the difficulty of doing so has been a severe obstacle to applying kernel density estimation in discriminant analysis. The proposed approach is applied to four real data sets and compared with linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k-nearest neighbor discriminant analysis (k-NNDA). In these applications, the proposed approach performs better than LDA and QDA and as well as k-NNDA with respect to classification error rates. The approach also permits all-possible-subset selection of variables for high-dimensional data to determine the best predictors discriminating between the groups.
Substructural local search in discrete estimation of distribution algorithms
Doctoral thesis, Electronic Engineering and Computing, Universidade do Algarve, 2009. SFRH/BD/16980/2004.
The last decade has seen the rise and consolidation of a new trend of stochastic
optimizers known as estimation of distribution algorithms (EDAs). In essence, EDAs
build probabilistic models of promising solutions and sample from the corresponding
probability distributions to obtain new solutions. This approach has brought a new
view to evolutionary computation because, while solving a given problem with an
EDA, the user has access to a set of models that reveal probabilistic dependencies
between variables, an important source of information about the problem.
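
The build-a-model-then-sample loop described above can be illustrated with a minimal sketch. It assumes the simplest univariate EDA (often called UMDA) on a toy OneMax problem; this is not the extended compact genetic algorithm or the Bayesian optimization algorithm studied in the thesis, and the names `onemax` and `umda` and all parameter values are hypothetical choices for illustration.

```python
import random


def onemax(bits):
    """Toy fitness (an assumption for illustration): the number of ones."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=50, seed=0):
    """Univariate EDA sketch: one independent Bernoulli probability per bit."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit is equally likely 0 or 1
    best = None
    for _ in range(generations):
        # Sample a new population from the current probabilistic model.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        # Keep the most promising solutions (truncation selection).
        pop.sort(key=onemax, reverse=True)
        selected = pop[:n_select]
        if best is None or onemax(pop[0]) > onemax(best):
            best = pop[0]
        # Re-estimate the model from the selected solutions, clamping each
        # probability away from 0 and 1 so no bit value is lost for good.
        lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits
        probs = [min(max(sum(ind[i] for ind in selected) / n_select, lo), hi)
                 for i in range(n_bits)]
    return best, probs


best, probs = umda()
print(onemax(best), [round(p, 2) for p in probs])
```

The returned `probs` list is the learned model. In a multivariate EDA the corresponding model would also expose dependencies between variables, which is the information the abstract refers to; the univariate model here deliberately ignores them.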
This dissertation proposes the integration of substructural local search (SLS)
in EDAs to speed up convergence to optimal solutions. Substructural neighborhoods
are defined by the structure of the probabilistic models used in EDAs,
generating adaptive neighborhoods capable of automatic discovery and exploitation
of problem regularities. Specifically, the thesis focuses on the extended compact
genetic algorithm and the Bayesian optimization algorithm. The utility of SLS in
EDAs is investigated for a number of boundedly difficult problems with modularity,
overlapping, and hierarchy, while considering important aspects such as scaling
and noise. The results show that SLS can substantially reduce the number of function
evaluations required to solve some of these problems. More importantly, the
speedups obtained can scale up to the square root of the problem size, $O(\sqrt{\ell})$.
Fundação para a Ciência e Tecnologia (FCT
Author Obfuscation on Indonesian News Articles Using Genetic Algorithms
Authorship attribution is a method for identifying the author of a text from a group of potential authors, and it can strip away the anonymity of unknown authors. Such a method threatens the privacy of anyone who wishes to write anonymously. To address this issue, author obfuscation modifies a text to disguise its author. In this research, a genetic algorithm-based author obfuscation model was created to modify Indonesian news articles so that they avoid identification by authorship attribution while preserving their semantics. The model iteratively changes some words in the article using crossover and mutation, guided by a fitness function that involves identification probability and similarity to the original article. The model is evaluated on three parameters: safety, soundness, and sensibleness. It shows good safety, reducing the accuracy of the given authorship attribution model by 0.3018, although the reduction drops to 0.1179 when tested on different models. Its soundness is fairly good, with the similarity of the modified articles to the originals reaching 0.7817. In terms of sensibleness, the model scored 2.571 on a scale of 0 to 4, which indicates that some articles are grammatically acceptable but many are messy.
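
A fitness function of the kind described, rewarding a low identification probability and a high similarity to the original, might be sketched as follows. The abstract does not say how the two terms are combined, so the weighted sum, the Jaccard token similarity, and the stand-in attribution model `toy_author_prob` are all assumptions made purely for illustration.

```python
def jaccard(tokens_a, tokens_b):
    """Token-set overlap, used here as an assumed stand-in for similarity."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def fitness(candidate, original, author_prob, weight=0.5):
    """Higher is better: hard to attribute, yet close to the original.

    `author_prob` is a hypothetical callable returning the probability that
    an attribution model assigns the candidate text to its true author.
    """
    safety = 1.0 - author_prob(candidate)
    soundness = jaccard(candidate, original)
    return weight * safety + (1.0 - weight) * soundness


def toy_author_prob(tokens):
    """Hypothetical attribution model: the word 'said' gives the author away."""
    return 1.0 if "said" in tokens else 0.2


original = "the minister said the budget will rise".split()
candidate = "the minister stated the budget will rise".split()
print(fitness(original, original, toy_author_prob))
print(fitness(candidate, original, toy_author_prob))
```

In a genetic algorithm, crossover and mutation would generate candidates such as the one above, and candidates with higher fitness would be more likely to survive into the next generation.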
Multiple-line inference of selection on quantitative traits
Trait differences between species may be attributable to natural selection.
However, quantifying the strength of evidence for selection acting on a
particular trait is a difficult task. Here we develop a population-genetic test
for selection acting on a quantitative trait which is based on multiple-line
crosses. We show that using multiple lines increases both the power and the
scope of selection inference. First, a test based on three or more lines
detects selection with strongly increased statistical significance, and we show
explicitly how the sensitivity of the test depends on the number of lines.
Second, a multiple-line test allows one to distinguish different lineage-specific
selection scenarios. Our analytical results are complemented by extensive
numerical simulations. We then apply the multiple-line test to QTL data on
floral character traits in plant species of the Mimulus genus and on
photoperiodic traits in different maize strains, where we find signatures of
lineage-specific selection not seen in a two-line test.
Comment: 21 pages, 11 figures; to appear in Genetic