
    A primer for microbiome time-series analysis

    © The Author(s), 2020. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Coenen, A. R., Hu, S. K., Luo, E., Muratore, D., & Weitz, J. S. A primer for microbiome time-series analysis. Frontiers in Genetics, 11 (2020): 310, doi:10.3389/fgene.2020.00310.

    Time series can provide critical insights into the structure and function of microbial communities. Using temporal data to address ecological questions warrants statistical considerations distinct from those of comparative microbiome studies. This primer identifies unique challenges and approaches for analyzing microbiome time series, focusing on (1) identifying compositionally similar samples, (2) inferring putative interactions among populations, and (3) detecting periodic signals. We connect theory, code, and data via a series of hands-on modules built around a motivating biological question in marine microbial ecology. The modules cover characterizing shifts in community structure and activity, identifying expression profiles with a diel periodic signal, and identifying putative interactions within a complex community. Modules are presented as self-contained, open-access, interactive tutorials in R and Matlab. Throughout, we highlight statistical considerations for dealing with autocorrelated and compositional data, with an eye to improving the robustness of inferences from microbiome time series. We hope that this primer helps to broaden the use of time-series analytic methods within the microbial ecology research community.

    This work was supported by the Simons Foundation (SCOPE award ID 329108) and the National Science Foundation (NSF Bio Oc 1829636).
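    The primer stresses that microbiome abundances are compositional and that diel signals are a key target of time-series analysis. A minimal Python/NumPy sketch of one possible workflow follows (it is not the primer's R/Matlab module code): a centered log-ratio (CLR) transform of a count matrix, then a least-squares 24 h harmonic (cosinor) regression that can be applied to each feature. The function names `clr` and `cosinor_fit`, the pseudocount of 0.5, and the synthetic profile are illustrative assumptions.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of a samples x features count matrix.

    The pseudocount is an assumption so that zero counts can be logged.
    """
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtracting each sample's mean log value divides by its geometric mean,
    # so (apart from the pseudocount) only within-sample ratios matter.
    return log_x - log_x.mean(axis=1, keepdims=True)


def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y(t) = m + b*cos(w t) + c*sin(w t), w = 2*pi/period.

    Returns (amplitude, peak time in hours within one period, R^2).
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi / period
    design = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    _, b, c = coef
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    # b*cos(wt) + c*sin(wt) = A*cos(wt - phi), A = hypot(b, c), phi = atan2(c, b).
    amplitude = np.hypot(b, c)
    peak_time = (np.arctan2(c, b) / w) % period
    return amplitude, peak_time, r2


# Usage: samples every 4 h over 3 days; one synthetic profile peaking at 06:00.
t = np.arange(0, 72, 4)
profile = 2.0 + 1.5 * np.cos(2 * np.pi * (t - 6) / 24)
amp, peak, r2 = cosinor_fit(t, profile)
```

In practice one would call `clr` on the full count matrix and then `cosinor_fit` on each column. The amplitude and R^2 are only descriptive here; a significance test would also need to account for autocorrelation, which this sketch does not do.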

    Neighborhood VAR: Efficient estimation of multivariate timeseries with neighborhood information

    Vector autoregression (VAR) models are popular in data science for modeling multivariate time series in the environmental sciences and other applications. However, these models are computationally expensive, because the number of parameters scales quadratically with the number of time series. In this work, we propose a neighborhood vector autoregression (NVAR) model to efficiently analyze high-dimensional multivariate time series. We assume that the time series have underlying neighborhood relationships among them (e.g., spatial or network) that follow from the inherent setting of the problem. When this neighborhood information is available or can be summarized using a distance matrix, we demonstrate that the proposed NVAR method provides computationally efficient and theoretically sound estimates of the model parameters. The performance of the proposed method is compared with other existing approaches in both simulation studies and a real application to a stream nitrogen study.
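    A full VAR(1) model on d series has d² lag coefficients (p·d² for a VAR(p)), which is the quadratic scaling mentioned above. The sketch below illustrates only the general idea of letting a distance matrix decide which lagged series may enter each equation; it is not the NVAR estimator from the paper. The function `fit_neighborhood_var1`, the hard distance cutoff `radius`, and the simulated five-site data are hypothetical.

```python
import numpy as np


def fit_neighborhood_var1(x, dist, radius):
    """Least-squares VAR(1), x[t] = A @ x[t-1] + noise, with A[i, j] forced to 0
    unless dist[i, j] <= radius.

    x: (T, d) array holding d series; dist: (d, d) distance matrix.
    Each row of A is estimated separately from the lagged values of that
    series' neighbors only (the series itself has distance 0, so it is kept).
    """
    x = np.asarray(x, dtype=float)
    dist = np.asarray(dist, dtype=float)
    past, present = x[:-1], x[1:]
    d = x.shape[1]
    a_hat = np.zeros((d, d))
    for i in range(d):
        nbrs = np.flatnonzero(dist[i] <= radius)
        coef, *_ = np.linalg.lstsq(past[:, nbrs], present[:, i], rcond=None)
        a_hat[i, nbrs] = coef
    return a_hat


# Usage: 5 sites on a line; each site is driven only by itself and its
# immediate neighbors (|i - j| <= 1).
rng = np.random.default_rng(0)
d, T = 5, 5000
idx = np.arange(d)
dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
a_true = np.where(dist == 0, 0.5, np.where(dist == 1, 0.2, 0.0))
x = np.zeros((T, d))
for t in range(1, T):
    x[t] = a_true @ x[t - 1] + rng.standard_normal(d)
a_hat = fit_neighborhood_var1(x, dist, radius=1.0)
```

With neighborhoods of size k, each equation has k coefficients instead of d, so the regressions stay small as d grows; the trade-off is that any true effect from outside the neighborhood is forced to zero.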

    Systems Biology of the human microbiome

    © The Author(s), 2017. This is the author's version of the work and is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Current Opinion in Biotechnology 51 (2018): 146-153, doi:10.1016/j.copbio.2018.01.018.

    Recent research has shown that the microbiome, the collection of microorganisms (including bacteria, fungi, and viruses) living on and in a host, is of extraordinary importance to human health, even from conception and development in the uterus. Therefore, to further our ability to diagnose disease, predict treatment outcomes, and identify novel therapeutics, it is essential to include microbiome and microbial metabolic biomarkers in Systems Biology investigations. In clinical studies or, more precisely, in Systems Medicine approaches, we can use the diversity and individual characteristics of the personal microbiome to enhance the resolution of patient stratification. In this review, we explore several Systems Medicine approaches, including Microbiome Wide Association Studies, to understand the role of the human microbiome in health and disease, with a focus on ‘preventive medicine’, or P4 (i.e., personalized, predictive, preventive, participatory) medicine.

    BPB is funded by the Arnold and Mabel Beckman Foundation (Arnold O. Beckman Postdoctoral Fellow).

    A non-linear Granger-causality framework to investigate climate-vegetation dynamics

    Satellite Earth observation has led to the creation of global climate data records of many important environmental and climatic variables. These come in the form of multivariate time series with different spatial and temporal resolutions. Data of this kind provide new means to further unravel the influence of climate on vegetation dynamics. However, as we argue in this article, commonly used statistical methods are often too simplistic to represent complex climate-vegetation relationships because they assume linearity. Therefore, as an extension of linear Granger-causality analysis, we present a novel non-linear framework consisting of several components, such as data collection from various databases, time series decomposition techniques, feature construction methods, and predictive modelling by means of random forests. Experimental results on global data sets indicate that, with this framework, it is possible to detect non-linear patterns that are much less visible with traditional Granger-causality methods. In addition, we discuss extensive experimental results that highlight the importance of considering non-linear aspects of climate-vegetation dynamics.
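    The linear Granger-causality test that this framework extends compares two regressions for a target series: one using only its own lags, and one that also includes lags of a candidate driver. A NumPy sketch of that linear baseline, returning the usual F statistic, follows. The framework described above replaces these linear fits with random forests and engineered features, which this sketch does not attempt. The names `lagged` and `granger_f`, the lag order `p=2`, and the simulated climate and vegetation series are assumptions.

```python
import numpy as np


def lagged(series, p):
    """(T - p, p) matrix whose column k-1 is lag k of series, aligned with series[p:]."""
    s = np.asarray(series, dtype=float)
    t = len(s)
    return np.column_stack([s[p - k : t - k] for k in range(1, p + 1)])


def granger_f(y, x, p=2):
    """F statistic for H0: lags 1..p of x add nothing to an AR(p) model of y."""
    y = np.asarray(y, dtype=float)
    target = y[p:]
    n = len(target)
    restricted = np.hstack([np.ones((n, 1)), lagged(y, p)])  # intercept + own lags
    full = np.hstack([restricted, lagged(x, p)])  # ... plus lags of the driver

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return resid @ resid

    rss_r, rss_f = rss(restricted), rss(full)
    df_num, df_den = p, n - full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)


# Usage: a synthetic "vegetation" series driven by last step's "climate".
rng = np.random.default_rng(1)
T = 1000
climate = rng.standard_normal(T)
vegetation = np.zeros(T)
for t in range(1, T):
    vegetation[t] = 0.5 * vegetation[t - 1] + 0.8 * climate[t - 1] + rng.standard_normal()
f_forward = granger_f(vegetation, climate, p=2)  # does climate help predict vegetation?
f_reverse = granger_f(climate, vegetation, p=2)  # does vegetation help predict climate?
```

A p-value can be obtained from the F distribution with (p, n - 2p - 1) degrees of freedom, for example with `scipy.stats.f.sf`.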

    Do large-scale associations in birds imply biotic interactions or environmental filtering?

    Aim: There has been wide interest in the effect of biotic interactions on species' occurrences and abundances at large spatial scales, coupled with extensive development of statistical methods to study them. Still, evidence for whether the effects of within-trophic-level biotic interactions (e.g. competition and heterospecific attraction) are discernible beyond local scales remains inconsistent. Here, we present a novel hypothesis-testing framework based on joint dynamic species distribution models and functional trait similarity to distinguish between environmental filtering and biotic interactions.

    Location: France and Finland.

    Taxon: Birds.

    Methods: We estimated species-to-species associations within a trophic level, independent of the main environmental variables (mean temperature and total precipitation), for common species at a large spatial scale using joint dynamic species distribution models (a multivariate spatiotemporal delta model). We created hypotheses, based on species' functionality (morphological and/or diet dissimilarity) and habitat preferences, about the sign and strength of the pairwise spatiotemporal associations in order to estimate the extent to which they result from biotic interactions (competition, heterospecific attraction) and/or environmental filtering.

    Results: Spatiotemporal associations were mostly positive (80%), followed by random (15%); only 5% were negative. Where detected, negative spatiotemporal associations in different communities were due to a few species. The relationship between spatiotemporal association and functional dissimilarity among species was negative, which fulfils the predictions of both environmental filtering and heterospecific attraction.

    Main conclusions: We showed that processes leading to species aggregation (a mixture of environmental filtering and heterospecific attraction) seem to dominate assembly rules, and we did not find evidence for competition. Altogether, our hypothesis-testing framework based on joint dynamic species distribution models and functional trait similarity is beneficial for the ecological interpretation of species-to-species associations from data covering several decades and biogeographical regions.
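    The final analytical step above relates estimated species-to-species associations to functional dissimilarity. One standard way to test the relationship between two species-by-species matrices is a Mantel permutation test. The sketch below assumes the association matrix has already been estimated (the joint dynamic species distribution model itself is not reproduced) and that traits are numeric; the paper's actual test may differ, and `trait_dissimilarity`, `mantel`, and the toy data are hypothetical.

```python
import numpy as np


def trait_dissimilarity(traits):
    """Euclidean distances between species on z-scored traits (species x traits)."""
    traits = np.asarray(traits, dtype=float)
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mantel(a, b, n_perm=999, seed=0):
    """Pearson correlation between the upper triangles of two square matrices,
    with a two-sided p-value from jointly permuting the rows and columns of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    iu = np.triu_indices_from(a, k=1)  # each species pair counted once
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])  # relabel species in b
        r = np.corrcoef(a[iu], b[np.ix_(perm, perm)][iu])[0, 1]
        if abs(r) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Usage: 10 hypothetical species x 3 traits; a toy association matrix that
# decreases perfectly with functional dissimilarity.
rng = np.random.default_rng(2)
traits = rng.normal(size=(10, 3))
dissim = trait_dissimilarity(traits)
assoc = -dissim
r, p = mantel(assoc, dissim, n_perm=999)
```

Permuting whole species (rows and columns together) rather than individual pairs respects the fact that pairs sharing a species are not independent.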

    Data-driven causal analysis of observational biological time series

    Complex systems are challenging to understand, especially when they defy manipulative experiments for practical or ethical reasons. Several fields have developed parallel approaches to infer causal relations from observational time series. Yet, these methods are easy to misunderstand and often controversial. Here, we provide an accessible and critical review of three statistical causal discovery approaches (pairwise correlation, Granger causality, and state space reconstruction), using examples inspired by ecological processes. For each approach, we ask what it tests for, what causal statement it might imply, and when it could lead us astray. We devise new ways of visualizing key concepts, describe some novel pathologies of existing methods, and point out how so-called ‘model-free’ causality tests are not assumption-free. We hope that our synthesis will facilitate thoughtful application of methods, promote communication across different fields, and encourage explicit statements of assumptions. A video walkthrough is available (Video 1 or https://youtu.be/AIV0ttQrjK8).
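    State space reconstruction methods such as convergent cross mapping (CCM; Sugihara et al., 2012) rebuild a "shadow" attractor from time-delay copies of one variable and ask how well nearby points on it estimate another variable. A compact NumPy sketch of simplex cross mapping follows. It reports cross-map skill for the full library only, whereas CCM proper checks that skill converges as the library grows, and it omits refinements such as excluding temporally adjacent neighbors. The embedding dimension `e=2`, delay `tau=1`, and the coupled logistic maps are illustrative assumptions; in line with the review's point, this "model-free" test still assumes deterministic dynamics and a suitable embedding.

```python
import numpy as np


def embed(x, e, tau=1):
    """Time-delay embedding: row r is (x[s], x[s - tau], ..., x[s - (e-1)tau]), s = (e-1)tau + r."""
    x = np.asarray(x, dtype=float)
    start = (e - 1) * tau
    return np.column_stack([x[start - k * tau : len(x) - k * tau] for k in range(e)])


def cross_map_skill(source, target, e=2, tau=1):
    """Estimate target[t] from the shadow manifold of source (simplex projection).

    Under CCM logic, high skill suggests that target leaves a signature in
    source, i.e. that target may causally influence source.
    Returns the Pearson correlation between estimates and observations.
    """
    m = embed(source, e, tau)
    y = np.asarray(target, dtype=float)[(e - 1) * tau :]
    d = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point may not be its own neighbor
    est = np.empty(len(m))
    for i in range(len(m)):
        nn = np.argsort(d[i])[: e + 1]  # e + 1 nearest neighbors
        nn_dist = d[i, nn]
        w = np.exp(-nn_dist / max(nn_dist[0], 1e-12))  # weights relative to closest
        est[i] = np.sum(w * y[nn]) / np.sum(w)
    return np.corrcoef(est, y)[0, 1]


# Usage: coupled logistic maps in which x forces y but y does not affect x.
T = 400
x = np.empty(T)
y = np.empty(T)
x[0], y[0] = 0.4, 0.2
for t in range(T - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
skill_x_from_y = cross_map_skill(y, x)  # high values are consistent with x -> y
```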

    Complex models for genetic sequence data

    PhD Thesis

    In this thesis, the aim is to develop biologically motivated Bayesian models in two areas: molecular phylogenetics and time-series metagenomics. In molecular phylogenetics, the goal is generally to learn about the evolutionary history of a collection of species using molecular sequence data, for example, DNA. Evolutionary history is represented graphically using evolutionary trees, where the root of a tree represents the most recent common ancestor of all species in the tree. Substitutions in sequences are modelled through a continuous-time Markov process characterised by an instantaneous rate matrix, which standard models assume to be stationary and time-reversible. These assumptions are biologically questionable and induce a likelihood function that is invariant to a tree's root position. This is detrimental to inference, since a tree's biological interpretation depends on where it is rooted. By relaxing both assumptions, we introduce two new models whose likelihoods can distinguish between rooted trees. These models are non-stationary, with step changes in the rate matrix on each branch. Each rate matrix belongs to a non-reversible family of Lie Markov models, which are closed under matrix multiplication. The two models differ in the non-reversible Lie Markov model used in each. We perform our analysis in the Bayesian framework using Markov chain Monte Carlo methods. We assess the performance of our models using a simulation study, before considering an application to a Drosophila data set, where most models fail to identify a plausible root position.

    In time-series metagenomics, counts of operational taxonomic units (OTUs), which are pragmatic proxies for microbial species, are modelled over time. We have weekly counts of different OTUs from two tanks in a wastewater treatment plant. We develop a Bayesian hierarchical vector autoregressive model of the OTU dynamics that also incorporates environmental and chemical data. Clustering methods are explored to reduce the dimensionality of our data and to mitigate the large proportion of zero counts. We use a seasonal phase-based clustering approach and a symmetric, circulant, tri-diagonal error structure. The autoregressive coefficient matrix is assumed to be sparse, so we explore different sparsity-inducing priors on simulated data sets before selecting the regularised horseshoe prior for our hierarchical model. The chemical and environmental covariates are incorporated through a time-varying mean. Finally, we fit the model to the data from each tank using Hamiltonian Monte Carlo.
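    The time-series half of this thesis combines a sparse VAR with a symmetric, circulant, tri-diagonal error covariance and a covariate-driven time-varying mean. A NumPy sketch of what such a generative model could look like follows. It is an assumption, not the thesis specification, that the VAR acts on deviations from the time-varying mean; the names `circulant_tridiagonal_cov` and `simulate_var1`, the diagonal coefficient matrix, and the sinusoidal weekly mean are also hypothetical. No Bayesian fitting (horseshoe prior, Hamiltonian Monte Carlo) is shown.

```python
import numpy as np


def circulant_tridiagonal_cov(k, sigma2, rho):
    """k x k covariance: sigma2 on the diagonal, rho between cyclically adjacent
    clusters (i, i +/- 1 mod k), zero elsewhere."""
    assert k >= 3, "needs at least 3 clusters for distinct cyclic neighbors"
    cov = sigma2 * np.eye(k)
    for i in range(k):
        cov[i, (i + 1) % k] = rho
        cov[(i + 1) % k, i] = rho
    return cov


def simulate_var1(a, mu, cov, seed=0):
    """Simulate z[t] = mu[t] + a @ (z[t-1] - mu[t-1]) + eps[t], eps[t] ~ N(0, cov).

    mu: (T, k) time-varying mean, e.g. standing in for covariate effects.
    """
    rng = np.random.default_rng(seed)
    t_len, k = mu.shape
    chol = np.linalg.cholesky(cov)  # eps = chol @ standard normal has covariance cov
    z = np.empty((t_len, k))
    z[0] = mu[0]
    for t in range(1, t_len):
        eps = chol @ rng.standard_normal(k)
        z[t] = mu[t] + a @ (z[t - 1] - mu[t - 1]) + eps
    return z


# Usage: 4 hypothetical OTU clusters, weekly steps with an annual (52-week) mean.
k, T = 4, 20000
cov = circulant_tridiagonal_cov(k, sigma2=1.0, rho=0.3)
a = 0.6 * np.eye(k)  # hypothetical sparse (here diagonal) coefficient matrix
weeks = np.arange(T)
mu = np.outer(np.sin(2 * np.pi * weeks / 52.0), np.ones(k))
z = simulate_var1(a, mu, cov, seed=3)
```

Because a symmetric circulant matrix of this form has eigenvalues sigma2 + 2·rho·cos(2πj/k) for j = 0, ..., k-1, choosing |rho| < sigma2/2 guarantees a valid (positive definite) covariance.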

    Inferring ecological interactions from dynamics in phage-bacteria communities

    Characterizing how viruses interact with microbial hosts is critical to understanding microbial community structure and function. However, existing methods for quantifying bacteria-phage interactions are not widely applicable to natural communities. First, many bacteria are not culturable, preventing direct experimental testing. Second, “-omics”-based methods, while high in accuracy and specificity, have been shown to be extremely low in power. Third, inference methods based on time-series or co-occurrence data, while promising, have for the most part not been rigorously tested. This thesis focuses on this final category of quantification strategies: inference methods. We further our understanding of both the potential and the limitations of several inference methods, focusing primarily on time-series data with high time resolution. We emphasize quantifying efficacy by using time-series data from multi-strain bacteria-phage communities with known infection networks, employing both in silico simulated bacteria-phage communities and an in vitro community experiment. We review existing correlation-based inference methods, extend theory and characterize tradeoffs for model-based inference that uses convex optimization, characterize pairwise interactions in a 5x5 virus-microbe community experiment using Markov chain Monte Carlo, and present analytic tools for microbiome time-series analysis when a dynamical model is unknown. Together, these chapters bridge gaps in the existing literature on inference of ecological interactions from time-series data.

    Ph.D.
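    As an illustration of model-based inference posed as a convex problem, the sketch below assumes a deliberately simplified, hypothetical host model in which each host's per-capita growth rate falls linearly with virus densities: d(log B_i)/dt = r_i - Σ_j M_ij V_j. Under that assumption, forward differences of log host density are linear in the virus time series, so r and the interaction matrix M follow from ordinary least squares. This is not the thesis's model: it ignores host density dependence and virus dynamics, and real data would call for noise handling and constraints such as non-negativity (for example with `scipy.optimize.nnls`). `infer_infection_matrix` and the synthetic data are hypothetical.

```python
import numpy as np


def infer_infection_matrix(log_hosts, viruses, dt):
    """Estimate r and M in d(log B_i)/dt = r_i - sum_j M[i, j] * V_j.

    log_hosts: (T, n_hosts) log host densities; viruses: (T, n_viruses)
    virus densities, both sampled every dt time units. Per-capita growth is
    approximated by forward differences of log density, and (r_i, M[i, :])
    for every host is found by linear least squares, a convex problem.
    """
    log_hosts = np.asarray(log_hosts, dtype=float)
    viruses = np.asarray(viruses, dtype=float)
    growth = np.diff(log_hosts, axis=0) / dt  # (T - 1, n_hosts)
    n = growth.shape[0]
    design = np.hstack([np.ones((n, 1)), -viruses[:-1]])  # intercept and -V_j
    coef, *_ = np.linalg.lstsq(design, growth, rcond=None)  # (1 + n_viruses, n_hosts)
    return coef[0], coef[1:].T


# Usage: synthetic data generated from the same (hypothetical) log-linear model.
rng = np.random.default_rng(4)
T, dt = 200, 0.1
m_true = np.array([[0.8, 0.0], [0.0, 0.5], [0.3, 0.6]])  # 3 hosts x 2 viruses
r_true = np.array([0.6, 0.4, 0.7])
viruses = rng.uniform(0.1, 1.0, size=(T, 2))
log_hosts = np.zeros((T, 3))
for t in range(T - 1):
    log_hosts[t + 1] = log_hosts[t] + dt * (r_true - m_true @ viruses[t])
r_hat, m_hat = infer_infection_matrix(log_hosts, viruses, dt)
```

Because the synthetic data follow the assumed model exactly, the estimates recover the true parameters; with real, noisy, sparsely sampled data, the trade-offs this thesis characterizes (sampling resolution, noise, model mismatch) become the main concern.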
