44 research outputs found

    Modelling gene content across a phylogeny to determine when genes become associated

    In this work, we develop a stochastic model of gene gain and loss with the aim of inferring when (if at all) in evolutionary history an association between two genes arises. The data we consider are a species tree along with information on the presence or absence of two genes in each of the species. The biological motivation for our model is that if two genes are involved in the same biochemical pathway, i.e. they are both required for some function, then the rate of gain or loss of one gene in the pathway should depend upon the presence or absence of the other gene in the pathway. However, if the two genes are not functionally linked, then the rate of gain or loss of one gene should be independent of the state of the other gene. We simulate data under this model to determine under what conditions a shift from the independent rates class to the dependent rates class can be detected. For example, how large a tree is required and how large a shift in the rates is needed before the Akaike information criterion (AIC) supports a model with two rate classes over a simpler model with just one rate class? If a model with two rate classes is preferred, can it correctly detect where on the evolutionary tree the shift occurred? Comment: The Eleventh International Conference on Matrix-Analytic Methods in Stochastic Models (MAM11), 2022, Seoul, Republic of Korea
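
    The AIC comparison mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition AIC = 2k - 2 ln L; the function names, log-likelihood values, and parameter counts below are hypothetical and are not taken from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def prefers_two_rate_classes(loglik_one: float, k_one: int,
                             loglik_two: float, k_two: int) -> bool:
    """Return True if the two-rate-class model has the lower AIC."""
    return aic(loglik_two, k_two) < aic(loglik_one, k_one)


# Hypothetical fitted values (illustration only, not results from the paper):
# a one-class model with 4 parameters and a two-class model with 9 parameters.
one_class_aic = aic(-120.0, 4)    # 2*4 + 240 = 248
two_class_aic = aic(-112.5, 9)    # 2*9 + 225 = 243
two_classes_preferred = prefers_two_rate_classes(-120.0, 4, -112.5, 9)
```

    In this made-up case the extra parameters are justified because the log-likelihood gain (7.5) exceeds the penalty for five additional parameters (5).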

    Stochastic niche-based models for the evolution of species

    Many studies have examined whether one trait is correlated with another trait across a group of present-day species (for example, do species with larger brains tend to have longer gestation times?). Since the introduction of the phylogenetic comparative method, some authors have argued that it is necessary to have a biologically realistic model to generate evolutionary trees that incorporates information about the ecological niche occupied by species. Price presented a simple model along these lines in 1997. He defined a two-dimensional niche space formed by two continuous-valued traits, in which new niches arise with trait values drawn from a bivariate normal distribution. When a new niche arises, it is occupied by a descendant species of whichever current species is closest in ecological niche space. In sequence, more species are then evolved from the already-existing species to which they are ecologically closest. Here we explore ways of extending Price's adaptive radiation model. One extension is to increase the dimensionality of the niche space by considering more than two continuous traits. A second extension is to allow both extinction of species (which may leave unoccupied niches) and removal of niches (which causes species occupying them to go extinct). To model this problem, we consider a continuous-time stochastic process which implicitly defines a phylogeny. To explore whether trees generated under such a model (or under different parametrizations of the model) are realistic, we can compute a variety of summary statistics that can be compared to those of empirically observed phylogenies. For example, there are existing statistics that aim to measure tree balance, the relative rate of diversification, and the phylogenetic signal of traits. Comment: The Eleventh International Conference on Matrix-Analytic Methods in Stochastic Models (MAM11), 2022, Seoul, Republic of Korea
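
    The niche-filling rule described in this abstract can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation: traits are drawn as independent standard normals (zero correlation, unit variance), closeness is Euclidean distance, and timing, extinction, and niche removal are ignored. The function name and its parameters are hypothetical.

```python
import math
import random


def simulate_niche_filling(n_species, n_traits=2, seed=0):
    """Grow a tree by letting each new niche be filled by a descendant
    of the existing species closest to it in trait space.

    Returns (parents, traits): parents[i] is the index of the species
    that species i descends from (None for the root species).
    """
    rng = random.Random(seed)
    # The first species occupies the first niche.
    traits = [[rng.gauss(0.0, 1.0) for _ in range(n_traits)]]
    parents = [None]
    while len(traits) < n_species:
        # A new niche arises with normally distributed trait values.
        niche = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        # The ecologically closest current species gives rise to the occupant.
        closest = min(range(len(traits)),
                      key=lambda i: math.dist(traits[i], niche))
        traits.append(niche)
        parents.append(closest)
    return parents, traits


# Three traits instead of two, as in the higher-dimensional extension.
parents, traits = simulate_niche_filling(n_species=10, n_traits=3, seed=1)
```

    The parent indices define a tree topology only; assigning branch lengths would require the continuous-time process the abstract refers to.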

    A Discontinuous Galerkin Method for Approximating the Stationary Distribution of Stochastic Fluid-Fluid Processes

    The stochastic fluid-fluid model (SFFM) is a Markov process $\{(X_t, Y_t, \varphi_t), t \geq 0\}$, where $\{\varphi_t, t \geq 0\}$ is a continuous-time Markov chain, the first fluid, $\{X_t, t \geq 0\}$, is a classical stochastic fluid process driven by $\{\varphi_t, t \geq 0\}$, and the second fluid, $\{Y_t, t \geq 0\}$, is driven by the pair $\{(X_t, \varphi_t), t \geq 0\}$. Operator-analytic expressions for the stationary distribution of the SFFM, in terms of the infinitesimal generator of the process $\{(X_t, \varphi_t), t \geq 0\}$, are known. However, these operator-analytic expressions do not lend themselves to direct computation. In this paper the discontinuous Galerkin (DG) method is used to construct approximations to these operators, in the form of finite-dimensional matrices, to enable computation. The DG approximations are used to construct approximations to the stationary distribution of the SFFM, and results are verified by simulation. The numerics demonstrate that the DG scheme can have a superior rate of convergence compared to other methods.

    A Stochastic Fluid Model Approach to the Stationary Distribution of the Maximum Priority Process

    In traditional priority queues, we assume that every customer upon arrival has a fixed, class-dependent priority, and that a customer may not commence service if a customer with a higher priority is present in the queue. However, in situations where a performance target in terms of the tails of the class-dependent waiting time distributions has to be met, such models of priority queueing may not be satisfactory. In fact, there could be situations where high priority classes easily meet their performance target for the maximum waiting time, while lower classes do not. Here, we are interested in the stationary distribution at the times of commencement of service of this maximum priority process. Until now, there has been no explicit expression for this distribution. We construct a mapping of the maximum priority process to a tandem fluid queue, which enables us to find expressions for this stationary distribution. We derive the results for the stationary distribution of the maximum priority process at the times of the commencement of service. Comment: The Eleventh International Conference on Matrix-Analytic Methods in Stochastic Models (MAM11), 2022, Seoul, Republic of Korea

    Matrix-analytic methods for the evolution of species trees, gene trees, and their reconciliation

    We consider the reconciliation problem, in which the task is to find a mapping of a gene tree into a species tree so as to maximize the likelihood of such a fitting, given the available data. We describe a model for the evolution of the species tree and a subfunctionalisation model for the evolution of the gene tree, and provide an algorithm to compute the likelihood of the reconciliation. We derive our results using the theory of matrix-analytic methods and describe efficient algorithms for the computation of a range of useful metrics. We illustrate the theory with examples and provide the physical interpretations of the discussed quantities, with a focus on the practical applications of the theory to incomplete data.

    Models for the retention of duplicate genes and their biological underpinnings [version 2; peer review: 2 approved]

    Gene content in genomes changes through several different processes, with gene duplication being an important contributor to such changes. Gene duplication occurs over a range of scales from individual genes to whole genomes, and the dynamics of this process can be context-dependent. Still, there are rules by which genes are retained or lost from genomes after duplication, and probabilistic modeling has enabled characterization of these rules, including their context-dependence. Here, we describe the biology and corresponding mathematical models that are used to understand duplicate gene retention and its contribution to the set of biochemical functions encoded in a genome.

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions, 35 of which are now described by a novel, more significantly associated lead SNP, while the originally reported variant remained as the lead SNP in only 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.

    Contributions of mean and shape of blood pressure distribution to worldwide trends and variations in raised blood pressure: A pooled analysis of 1018 population-based measurement studies with 88.6 million participants

    © The Author(s) 2018. Background: Change in the prevalence of raised blood pressure could be due to both shifts in the entire distribution of blood pressure (representing the combined effects of public health interventions and secular trends) and changes in its high-blood-pressure tail (representing successful clinical interventions to control blood pressure in the hypertensive population). Our aim was to quantify the contributions of these two phenomena to the worldwide trends in the prevalence of raised blood pressure. Methods: We pooled 1018 population-based studies with blood pressure measurements on 88.6 million participants from 1985 to 2016. We first calculated mean systolic blood pressure (SBP), mean diastolic blood pressure (DBP) and prevalence of raised blood pressure by sex and 10-year age group from 20-29 years to 70-79 years in each study, taking into account complex survey design and survey sample weights, where relevant. We used a linear mixed effect model to quantify the association between (probit-transformed) prevalence of raised blood pressure and age-group- and sex-specific mean blood pressure. We calculated the contributions of change in mean SBP and DBP, and of change in the prevalence-mean association, to the change in prevalence of raised blood pressure. Results: In 2005-16, at the same level of population mean SBP and DBP, men and women in South Asia and in Central Asia, the Middle East and North Africa would have the highest prevalence of raised blood pressure, and men and women in the high-income Asia Pacific and high-income Western regions would have the lowest. In most region-sex-age groups where the prevalence of raised blood pressure declined, one half or more of the decline was due to the decline in mean blood pressure. Where prevalence of raised blood pressure has increased, the change was entirely driven by increasing mean blood pressure, offset partly by the change in the prevalence-mean association. Conclusions: Change in mean blood pressure is the main driver of the worldwide change in the prevalence of raised blood pressure, but change in the high-blood-pressure tail of the distribution has also contributed to the change in prevalence, especially in older age groups.
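
    The probit transformation applied to prevalence in the Methods can be sketched as follows. This shows only the transform (the inverse of the standard normal CDF), not the study's mixed-effects model; the function name and the example prevalence are hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(prevalence: float) -> float:
    """Map a proportion in (0, 1) to the real line via the inverse normal CDF."""
    return _STANDARD_NORMAL.inv_cdf(prevalence)


def inverse_probit(z: float) -> float:
    """Map a probit-scale value back to a proportion."""
    return _STANDARD_NORMAL.cdf(z)


# Hypothetical prevalence of 30% (illustration only, not a study result).
z = probit(0.30)
recovered = inverse_probit(z)
```

    Working on the probit scale lets a linear model relate prevalence to mean blood pressure without predicting proportions outside the interval from 0 to 1.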

    Rising rural body-mass index is the main driver of the global obesity epidemic in adults

    Body-mass index (BMI) has increased steadily in most countries in parallel with a rise in the proportion of the population who live in cities (1,2). This has led to a widely reported view that urbanization is one of the most important drivers of the global rise in obesity (3-6). Here we use 2,009 population-based studies, with measurements of height and weight in more than 112 million adults, to report national, regional and global trends in mean BMI segregated by place of residence (a rural or urban area) from 1985 to 2017. We show that, contrary to the dominant paradigm, more than 55% of the global rise in mean BMI from 1985 to 2017 (and more than 80% in some low- and middle-income regions) was due to increases in BMI in rural areas. This large contribution stems from the fact that, with the exception of women in sub-Saharan Africa, BMI is increasing at the same rate or faster in rural areas than in cities in low- and middle-income regions. These trends have in turn resulted in a closing (and in some countries a reversal) of the gap in BMI between urban and rural areas in low- and middle-income countries, especially for women. In high-income and industrialized countries, we noted a persistently higher rural BMI, especially for women. There is an urgent need for an integrated approach to rural nutrition that enhances financial and physical access to healthy foods, to avoid replacing the rural undernutrition disadvantage in poor countries with a more general malnutrition disadvantage that entails excessive consumption of low-quality calories. Peer reviewed