2,129 research outputs found

Shrinkage Effect in Ancestral Maximum Likelihood

Authors: Mossel Elchanan; Roch Sebastien; Steel Mike
Publication date: 01/01/2008

Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent; that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in a literature that dates back 20 years. In this short note we prove a general result that implies that AML is statistically inconsistent. In particular we show that AML can 'shrink' short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.

arXiv.org e-Print Archive

CiteSeerX

UC Research Repository

ScholarlyCommons@Penn
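
As a hedged illustration of the distinction drawn in the abstract above, the two criteria can be written side by side. The notation is an assumption introduced here, not the authors': x denotes the observed leaf sequences, a an assignment of ancestral sequences to the internal nodes, T a tree topology, and t its branch lengths.

```latex
\begin{align*}
\text{ML:}\quad  & (\hat T, \hat t) = \arg\max_{T,\,t} \sum_{a} P(x, a \mid T, t) \\
\text{AML:}\quad & (\hat T, \hat t, \hat a) = \arg\max_{T,\,t,\,a} P(x, a \mid T, t)
\end{align*}
```

ML sums over the unobserved ancestral sequences, whereas AML replaces that sum with a joint maximization over them.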

Improving population-specific allele frequency estimates by adapting supplemental data: an empirical Bayes approach

Authors: Coram Marc; Tang Hua
Publication venue: Institute of Mathematical Statistics
Publication date: 12/12/2007

Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than either pooling or not, and sometimes substantially improves over both extremes. The bias introduced is small, as is shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.

Comment: Published at http://dx.doi.org/10.1214/07-AOAS121 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref
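
The trade-off described in the abstract above, between pooling supplemental samples and ignoring them, can be illustrated with a generic Beta-binomial shrinkage estimator. This is a minimal sketch and not the authors' estimator: the function name `shrunk_frequency` and the parameter `weight` (a prior pseudo-sample size) are hypothetical, and `weight` is supplied by hand here, whereas the abstract describes a procedure that adapts the influence of related samples from the data.

```python
def shrunk_frequency(primary_alt, primary_n, supp_alt, supp_n, weight):
    """Posterior-mean allele frequency under a Beta prior.

    The prior is centred on the supplemental-sample frequency and carries
    `weight` pseudo-observations (hypothetical tuning parameter):
      weight = 0       -> no pooling (primary data only)
      weight = supp_n  -> full pooling of both samples
    """
    supp_freq = supp_alt / supp_n
    alpha = weight * supp_freq          # prior pseudo-count of the allele
    beta = weight * (1.0 - supp_freq)   # prior pseudo-count of the other allele
    return (primary_alt + alpha) / (primary_n + alpha + beta)


if __name__ == "__main__":
    # Illustrative counts only: 3 of 20 primary alleles, 40 of 100 supplemental.
    for w in (0.0, 10.0, 100.0):
        print(w, shrunk_frequency(3, 20, 40, 100, w))
```

Intermediate values of `weight` give estimates between the unpooled and fully pooled frequencies, which is the range over which an adaptive method would choose.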

Survival analysis of DNA mutation motifs with penalized proportional hazards

Authors: Feng Jean; Matsen IV Frederick A.; Minin Vladimir N.; Shaw David A.; Simon Noah
Publication date: 21/09/2018

Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with "mutation motifs", which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: a large number of possible motifs makes this statistical problem high dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an $\ell_1$-penalized proportional hazards model to infer mutation motifs and their effects. In order to estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method performs better on simulated data compared to current methods and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel and formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.

arXiv.org e-Print Archive

eScholarship - University of California

FigShare
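
For readers unfamiliar with the terminology in the abstract above, the generic textbook form of an $\ell_1$-penalized proportional hazards fit can be sketched as follows. The symbols are assumptions introduced here rather than the authors' notation: $x$ is a feature vector (for example, motif indicators at a site), $\theta$ the vector of motif effects, $h_0$ a baseline hazard, $\ell(\theta)$ the log-likelihood of the model, and $\lambda \ge 0$ a penalty weight.

```latex
\begin{align*}
h(t \mid x) &= h_0(t)\,\exp\!\left(x^{\top}\theta\right) \\
\hat\theta &= \arg\max_{\theta}\ \Bigl\{ \ell(\theta) - \lambda \lVert \theta \rVert_1 \Bigr\},
\qquad \lVert \theta \rVert_1 = \sum_j \lvert \theta_j \rvert
\end{align*}
```

Larger values of $\lambda$ push more coefficients to exactly zero, which is one way such a penalty can yield the parsimonious models the abstract mentions.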

Evolution of the eyes of vipers with and without infrared-sensing pit organs

Authors: Douglas R. H.; Gower D. J.; Grace M. S.; Hart N. S.; Hunt D. M.; Loew E. R.; McLamb W.; Orlov N.; Partridge J. C.; Peichl L.; Sampaio F. L.; Simoes B. F.; Wagner H. J.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2019

We examined lens and brille transmittance, photoreceptors, visual pigments, and visual opsin gene sequences of viperid snakes with and without infrared-sensing pit organs. Ocular media transmittance is high in both groups. Contrary to previous reports, small as well as large single cones occur in pit vipers. Non-pit vipers differ from pit vipers in having a two-tiered retina, but few taxa have been examined for this poorly understood feature. All vipers sampled express rh1, sws1 and lws visual opsin genes. Opsin spectral tuning varies but not in accordance with the presence/absence of pit organs, and not always as predicted from gene sequences. The visual opsin genes were generally under purifying selection, with positive selection at spectral tuning amino acids in RH1 and SWS1 opsins, and at retinal pocket stabilization sites in RH1 or LWS (and without substantial differences between pit and non-pit vipers). Lack of evidence for sensory trade-off between viperid eyes (in the aspects examined) and pit organs might be explained by the high degree of neural integration of vision and infrared detection; the latter representing an elaboration of an existing sense with addition of a novel sense organ, rather than involving the evolution of a wholly novel sensory system.

City Research Online

Adelaide Research & Scholarship

Publikationsserver der Universität Tübingen

Plymouth Electronic Archive and Research Library

Explore Bristol Research

Inferring dynamic genetic networks with low order independencies

Author: Lèbre Sophie
Publication date: 01/01/2008

In this paper, we propose a novel inference method for dynamic genetic networks which makes it possible to cope with a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of low order conditional dependence graph that we extend here to the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full order conditional dependencies given the past of the process. Then, to cope with the large p and small n estimation case, we propose to approximate DAG G by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. By using this approximation, we set out a non-Bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data analysis. The inference procedure is implemented in the R package 'G1DBN', freely available from the CRAN archive.

arXiv.org e-Print Archive

CiteSeerX

HAL Evry

HAL Descartes

Bayesian estimation of species divergence times using correlated quantitative characters

Authors: dos Reis M; Goswami A; Yang Z; Álvarez-Carretero S
Publication venue: Oxford University Press (OUP)
Publication date: 01/02/2019

Discrete morphological data have been widely used to study species evolution, but the use of quantitative (or continuous) morphological characters is less common. Here, we implement a Bayesian method to estimate species divergence times using quantitative characters. Quantitative character evolution is modelled using Brownian diffusion with character correlation and character variation within populations. Through simulations, we demonstrate that ignoring the population variation (or population “noise”) and the correlation among characters leads to biased estimates of divergence times and rate, especially if the correlation and population noise are high. We apply our new method to the analysis of quantitative characters (cranium landmarks) and molecular data from carnivoran mammals. Our results show that time estimates are affected by whether the correlations and population noise are accounted for or ignored in the analysis. The estimates are also affected by the type of data analysed, with analyses of morphological characters only, molecular data only, or a combination of both showing noticeable differences among the time estimates. Rate variation of morphological characters among the carnivoran species appears to be very high, with Bayesian model selection indicating that the independent-rates model fits the morphological data better than the autocorrelated-rates model. We suggest that using morphological continuous characters, together with molecular data, can bring a new perspective to the study of species evolution. Our new model is implemented in the MCMCtree computer program for Bayesian inference of divergence times.

UCL Discovery

Queen Mary Research Online

Polyploidy breaks speciation barriers in Australian burrowing frogs Neobatrachus

Authors: Booker William; Brennan Ian G.; Donnellan Stephen C.; Doughty Paul; Keogh J. Scott; Lemmon Alan R.; Mahony Michael; Moriarty Lemmon Emily; Novikova Polina; Roberts J. Dale; Van de Peer Yves; Yant Levi
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2020

Polyploidy has played an important role in evolution across the tree of life but it is still unclear how polyploid lineages may persist after their initial formation. While both common and well-studied in plants, polyploidy is rare in animals and generally less understood. The Australian burrowing frog genus Neobatrachus is comprised of six diploid and three polyploid species and offers a powerful animal polyploid model system. We generated exome-capture sequence data from 87 individuals representing all nine species of Neobatrachus to investigate species-level relationships, the origin and inheritance mode of polyploid species, and the population genomic effects of polyploidy on genus-wide demography. We describe rapid speciation of diploid Neobatrachus species and show that the three independently originated polyploid species have tetrasomic or mixed inheritance. We document higher genetic diversity in tetraploids, resulting from widespread gene flow between the tetraploids, asymmetric inter-ploidy gene flow directed from sympatric diploids to tetraploids, and isolation of diploid species from each other. We also constructed models of ecologically suitable areas for each species to investigate the impact of climate on differing ploidy levels. These models suggest substantial change in suitable areas compared to past climate, which corresponds to population genomic estimates of demographic histories. We propose that Neobatrachus diploids may be suffering the early genomic impacts of climate-induced habitat loss, while tetraploids appear to be avoiding this fate, possibly due to widespread gene flow. Finally, we demonstrate that Neobatrachus is an attractive model to study the effects of ploidy on the evolution of adaptation in animals.

Repository@Nottingham

Ghent University Academic Bibliography

Directory of Open Access Journals

The Australian National University

UPSpace at the University of Pretoria

Bayesian Statistical Methods for Genetic Association Studies with Case-Control and Cohort Design

Author: Tachmazidou Ioanna
Publication venue: Epidemiology and Public Health, Imperial College London
Publication date: 01/03/2009

Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. We propose a coalescent-based model for association mapping which potentially increases the power to detect disease-susceptibility variants in genetic association studies with case-control and cohort design. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions and we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium (LD), therein assuming a perfect phylogeny. The haplotype space is then partitioned into disjoint clusters within which the phenotype-haplotype association is assumed to be the same. The novelty of our approach consists in the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common mutation. Our approach is fully Bayesian and we develop Markov Chain Monte Carlo algorithms to sample efficiently over the space of possible partitions. We have also developed a Bayesian survival regression model for high-dimension and small sample size settings. We provide a Bayesian variable selection procedure and shrinkage tool by imposing shrinkage priors on the regression coefficients. We have developed a computationally efficient optimization algorithm to explore the posterior surface and find the maximum a posteriori estimates of the regression coefficients. We compare the performance of the proposed methods in simulation studies and using real datasets to both single-marker analyses and recently proposed multi-marker methods and show that our methods perform similarly in localizing the causal allele while yielding lower false positive rates. Moreover, our methods offer computational advantages over other multi-marker approaches.

Spiral - Imperial College Digital Repository