292 research outputs found

    Birth-death prior on phylogeny and speed dating

    Background. In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We also demonstrate that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rate and time parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we infer the tree topology in addition to the parameters, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion. Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.

    Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests

    Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology in general has an improved statistical power in comparison to seven other methods, and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PRO-CARDIS coronary artery disease case/control cohort and discover three distinct interactions. While analyses on simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we explore strategies for a priori selecting subsets of variant pairs to test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.

    GenPhyloData: realistic simulation of gene family evolution


    Genetic loci on chromosome 5 are associated with circulating levels of interleukin-5 and eosinophil count in a European population with high risk for cardiovascular disease

    IL-5 is a Th2 cytokine which activates eosinophils and is suggested to have an atheroprotective role. Genetic variants in the IL5 locus have been associated with increased risk of CAD and ischemic stroke. In this study we aimed to identify genetic variants associated with IL-5 concentrations and apply a Mendelian randomisation approach to assess IL-5 levels for causal effect on intima-media thickness (IMT) in a European population at high risk of coronary artery disease. We analysed SNPs within robustly associated candidate loci for immune, inflammatory, metabolic and cardiovascular traits. We identified 2 genetic loci for IL-5 levels (chromosome 5, rs56183820, BETA = 0.11, P = 6.73E−5 and chromosome 14, rs4902762, BETA = 0.12, P = 5.76E−6) and one for eosinophil count (rs72797327, BETA = −0.10, P = 1.41E−6). Both chromosome 5 loci were in the vicinity of the IL5 gene; however, the association with IL-5 levels failed to replicate in a meta-analysis of 2 independent cohorts (rs56183820, BETA = 0.04, P = 0.2763, I² = 24, I²-P = 0.2516). No significant associations were observed between SNPs associated with IL-5 levels or eosinophil count and IMT measures. Expression quantitative trait analyses indicate effects of the IL-5 and eosinophil-associated SNPs on RAD50 mRNA expression levels (rs12652920 (r² = 0.93 with rs56183820) BETA = −0.10, P = 8.64E−6 and rs11739623 (r² = 0.96 with rs72797327) BETA = −0.23, P = 1.74E−29, respectively). Our data do not support a role for IL-5 levels and eosinophil count in intima-media thickness; however, SNPs associated with IL-5 and eosinophils might influence stability of the atherosclerotic plaque via modulation of RAD50 levels.

    Causal relevance of blood lipid fractions in the development of carotid atherosclerosis: Mendelian randomization analysis.

    BACKGROUND: Carotid intima-media thickness (CIMT), a subclinical measure of atherosclerosis, is associated with risk of coronary heart disease events. Statins reduce progression of CIMT and coronary heart disease risk in proportion to the reduction in low-density lipoprotein cholesterol. However, interventions targeting triglycerides (TGs) or high-density lipoprotein cholesterol (HDL-C) have produced inconsistent effects on CIMT and coronary heart disease risk, making it uncertain whether such agents are ineffective for coronary heart disease prevention or whether CIMT is an inadequate marker of HDL-C or TG-mediated effects. We aimed to determine the causal association among the 3 major blood lipid fractions and common CIMT using mendelian randomization analysis. METHODS AND RESULTS: Genetic scores specific for low-density lipoprotein cholesterol, HDL-C, and TGs were derived based on single nucleotide polymorphisms from a gene-centric array in ≈5000 individuals (Cardiochip scores) and from a genome-wide association meta-analysis in >100 000 individuals (Global Lipids Genetic Consortium scores). These were used as instruments in a mendelian randomization analysis in 2 prospective cohort studies. A genetically predicted 1 mmol/L higher low-density lipoprotein cholesterol concentration was associated with a higher common CIMT by 0.03 mm (95% confidence interval, 0.01-0.04) and 0.04 mm (95% confidence interval, 0.02-0.06) based on the Cardiochip and Global Lipids Genetic Consortium scores, respectively. HDL-C and TGs were not causally associated with CIMT. CONCLUSIONS: Our findings confirm a causal relationship between low-density lipoprotein cholesterol and CIMT but not with HDL-C and TGs. At present, the suitability of CIMT as a surrogate marker in trials of cardiovascular therapies targeting HDL-C and TGs is questionable and requires further study.
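
    As a rough illustration of the mendelian randomization idea in the abstract above, the simplest single-instrument estimator is the Wald ratio: the instrument's association with the outcome divided by its association with the exposure. The sketch below is not taken from the study, which may have used a different estimator; the function names and all numbers are hypothetical.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Wald ratio estimate for a single genetic instrument.

    beta_gx: association of the genetic score with the exposure
    beta_gy: association of the genetic score with the outcome
    se_gy:   standard error of beta_gy
    Returns (estimate, standard error). The standard error uses the
    first-order approximation that ignores uncertainty in beta_gx.
    """
    if beta_gx == 0:
        raise ValueError("instrument has no association with the exposure")
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation confidence interval (95% for z = 1.96)."""
    return estimate - z * se, estimate + z * se


# Hypothetical summary statistics, chosen only to exercise the functions.
est, se = wald_ratio(beta_gx=0.4, beta_gy=0.02, se_gy=0.006)
low, high = confidence_interval(est, se)
print(f"estimate = {est:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

    The ratio is only interpretable as a causal effect under the usual instrumental-variable assumptions, which the code does not check.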

    Phylogeny of the plant genus Pachypodium (Apocynaceae)

    Background. The genus Pachypodium contains 21 species of succulent, generally spinescent shrubs and trees found in southern Africa and Madagascar. Pachypodium has diversified mostly into arid and semi-arid habitats of Madagascar, and has been cited as an example of a plant group that links the highly diverse arid-adapted floras of Africa and Madagascar. However, a lack of knowledge about phylogenetic relationships within the genus has prevented testing of this and other hypotheses about the group. Methodology/Principal Findings. We use DNA sequence data from the nuclear ribosomal ITS and chloroplast trnL-F region for all 21 Pachypodium species to reconstruct evolutionary relationships within the genus. We compare phylogenetic results to previous taxonomic classifications and geography. Results support three infrageneric taxa from the most recent classification of Pachypodium, and suggest that a group of African species (P. namaquanum, P. succulentum and P. bispinosum) may deserve taxonomic recognition as an infrageneric taxon. However, our results do not resolve relationships among major African and Malagasy lineages of the genus. Conclusions/Significance. We present the first molecular phylogenetic analysis of Pachypodium. Our work has revealed five distinct lineages, most of which correspond to groups recognized in past taxonomic classifications. Our work also suggests that there is a complex biogeographic relationship between Pachypodium of Africa and Madagascar.

    Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss

    Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication–loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn²) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across the multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution. Funding: National Science Foundation (U.S.) (CAREER Award 0644282); National Institutes of Health (U.S.) (Grant RC2 HG005639); National Science Foundation (U.S.), Assembling the Tree of Life (Program) (Grant 0936234).
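
    The general idea of sampling optimal solutions uniformly at random can be illustrated on a much simpler problem than DTL reconciliation. The sketch below is a toy analogue, not the authors' algorithm: a dynamic program over a small cost grid records, for every cell, both the minimum path cost and the number of paths that achieve it, and a sampler then walks forward choosing each optimal successor with probability proportional to its count. The grid, the function names, and the restriction to right/down moves are all assumptions made for illustration.

```python
import random


def optimal_path_table(cost):
    """Fill DP tables for right/down paths through an integer cost grid.

    best[i][j]  = minimum total cost of a path from (i, j) to the last cell
    count[i][j] = number of distinct paths from (i, j) achieving that minimum
    """
    rows, cols = len(cost), len(cost[0])
    best = [[0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for i in reversed(range(rows)):
        for j in reversed(range(cols)):
            if i == rows - 1 and j == cols - 1:
                best[i][j], count[i][j] = cost[i][j], 1
                continue
            options = [(a, b) for a, b in ((i + 1, j), (i, j + 1))
                       if a < rows and b < cols]
            m = min(best[a][b] for a, b in options)
            best[i][j] = cost[i][j] + m
            # Integer costs keep this equality test exact.
            count[i][j] = sum(count[a][b] for a, b in options if best[a][b] == m)
    return best, count


def sample_optimal_path(cost, best, count, rng=random):
    """Draw one minimum-cost path uniformly at random among all optima."""
    rows, cols = len(cost), len(cost[0])
    i, j = 0, 0
    path = [(i, j)]
    while (i, j) != (rows - 1, cols - 1):
        options = [(a, b) for a, b in ((i + 1, j), (i, j + 1))
                   if a < rows and b < cols]
        m = min(best[a][b] for a, b in options)
        optimal = [(a, b) for a, b in options if best[a][b] == m]
        # Weighting by the number of optimal completions makes every
        # complete optimal path equally likely.
        weights = [count[a][b] for a, b in optimal]
        i, j = rng.choices(optimal, weights=weights)[0]
        path.append((i, j))
    return path


# Hypothetical grid: the two corner cells with cost 5 are avoided by all optima.
grid = [[1, 1, 5],
        [1, 1, 1],
        [5, 1, 1]]
best, count = optimal_path_table(grid)
print("minimum cost:", best[0][0], "optimal paths:", count[0][0])
print("one sample:", sample_optimal_path(grid, best, count))
```

    Aggregating many such samples, for example by tallying how often each cell appears, is the toy counterpart of measuring how consistent mappings and event assignments are across optima.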