
    APPLICATIONS OF MACHINE LEARNING IN MICROBIAL FORENSICS

    Microbial ecosystems are complex, with hundreds of members interacting with each other and with the environment. The intricate and hidden behaviors underlying these interactions make research questions challenging, but they can be better understood through machine learning. However, most machine learning used in microbiome work is a black-box form of investigation: accurate predictions can be made, but the logic driving them is hidden behind nontransparent layers of complexity. Accordingly, the goal of this dissertation is to provide an interpretable and in-depth machine learning approach to investigate microbial biogeography and to use microorganisms as novel tools to detect geospatial location and object provenance (previous known origin). These contributions are followed by a framework that allows extraction of interpretable metrics and actionable insights from microbiome-based machine learning models. The first part of this work provides an overview of machine learning in the context of microbial ecology, human microbiome studies and environmental monitoring, outlining common practice and shortcomings. The second part presents a field study that demonstrates how machine learning can be used to characterize global patterns in microbial biogeography, using microbes from ports located around the world. The third part studies the persistence and stability of natural microbial communities from the environment that have colonized objects (vessels) and stay attached as they travel through the water. Finally, the last part of this dissertation provides a robust framework for investigating the microbiome. This framework provides a reasonable understanding of the data being used in microbiome-based machine learning and allows researchers to better understand and interpret results. 
Together, these extensive experiments help show how to carry an in-silico design, which characterizes candidate microbial biomarkers from real-world settings, through to a rapid, field-deployable diagnostic assay. The work presented here provides evidence for the use of microbial forensics as a toolkit to expand our basic understanding of microbial biogeography, microbial community stability and persistence in complex systems, and the ability of machine learning to be applied to downstream molecular detection platforms for rapid and accurate detection.

    Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring

    Advances in nucleic acid sequencing technology have expanded our ability to profile microbial diversity. The resulting large datasets of taxonomic and functional diversity are key to better understanding microbial ecology. Machine learning has proven to be a useful approach for analyzing microbial community data and making predictions about outcomes including human and environmental health. Applied to microbial community profiles, it has been used to predict disease states in human health, to assess environmental quality and the presence of contamination in the environment, and as trace evidence in forensics. Machine learning has appeal as a powerful tool that can provide deep insights into microbial communities and identify patterns in microbial community data. However, machine learning models are often used as black boxes to predict a specific outcome, with little understanding of how the models arrived at their predictions. Complex machine learning algorithms often favor accuracy and performance at the expense of interpretability. To bring machine learning into more translational microbiome research and to strengthen our ability to extract meaningful biological information, it is important for models to be interpretable. Here we review current trends in machine learning applications in microbial ecology, as well as some of the important challenges and opportunities for broader application of machine learning to understanding microbial communities.
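
As an illustration of what interpretability can mean in practice, the sketch below shows permutation importance, one common model-agnostic technique for asking which inputs a black-box model relies on. The abstract does not say which techniques the review covers, so this is an assumed example only. The toy abundance data, the `predict` stand-in model and all function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of samples for which the model's prediction matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Drop in accuracy when each feature column is shuffled in turn.

    A large drop suggests the model depends on that feature; a drop near
    zero suggests the feature is ignored.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [row[:j] + [value] + row[j + 1:]
                    for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(predict, shuffled, labels))
    return importances


# Hypothetical data: relative abundance of two taxa per sample.
# By construction, only taxon 0 determines the label.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def predict(row):
    """Stand-in for any fitted black-box classifier (an assumption)."""
    return int(row[0] > 0.5)


importances = permutation_importance(predict, rows, labels, n_features=2)
print(importances)
```

In this toy setup the importance of taxon 1 is exactly zero because the stand-in model never reads it, while shuffling taxon 0 costs roughly half of the accuracy.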

    Oak forest carbon and water simulations: Model intercomparisons and evaluations against independent data

    Models represent our primary method for integration of small-scale, process-level phenomena into a comprehensive description of forest-stand or ecosystem function. They also represent a key method for testing hypotheses about the response of forest ecosystems to multiple changing environmental conditions. This paper describes the evaluation of 13 stand-level models varying in their spatial, mechanistic, and temporal complexity for their ability to capture intra- and interannual components of the water and carbon cycle for an upland, oak-dominated forest of eastern Tennessee. Comparisons between model simulations and observations were conducted for hourly, daily, and annual time steps. Data for the comparisons were obtained from a wide range of methods including eddy covariance, sapflow, chamber-based soil respiration, biometric estimates of stand-level net primary production and growth, and soil water content by time or frequency domain reflectometry. Response surfaces of carbon and water flux as a function of environmental drivers, and a variety of goodness-of-fit statistics (bias, absolute bias, and model efficiency), were used to judge model performance. No single model consistently performed best at all time steps or for all variables considered. Intermodel comparisons showed good agreement for water cycle fluxes, but considerable disagreement among models for predicted carbon fluxes. The mean of all model outputs, however, was nearly always the best fit to the observations. Not surprisingly, models missing key forest components or processes, such as roots or modeled soil water content, were unable to provide accurate predictions of ecosystem responses to short-term drought phenomena. Nevertheless, an inability to correctly capture short-term physiological processes under drought was not necessarily an indicator of poor annual water and carbon budget simulations. 
This is possible because droughts in the subject ecosystem were of short duration and therefore had a small cumulative impact. Models using hourly time steps and detailed mechanistic processes, and having a realistic spatial representation of the forest ecosystem, provided the best predictions of observed data. Predictive ability of all models deteriorated under drought conditions, suggesting that further work is needed to evaluate and improve ecosystem model performance under unusual conditions, such as drought, that are a common focus of environmental change discussions.
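
The goodness-of-fit statistics named in this abstract can be sketched as follows, assuming their commonly used definitions: bias as the mean simulated-minus-observed difference, absolute bias as the mean absolute difference, and model efficiency as one minus the ratio of squared error to the observed variation about its mean. The paper's exact formulas may differ. The flux values are made up for illustration and are chosen so that the two hypothetical models err in opposite directions, which is one way a multi-model mean can fit better than its members.

```python
from statistics import fmean


def bias(sim, obs):
    """Mean signed error (simulated minus observed)."""
    return fmean(s - o for s, o in zip(sim, obs))


def absolute_bias(sim, obs):
    """Mean absolute error."""
    return fmean(abs(s - o) for s, o in zip(sim, obs))


def model_efficiency(sim, obs):
    """1 - SSE / SST; 1.0 is a perfect fit, 0.0 is no better than the observed mean."""
    obs_mean = fmean(obs)
    sse = sum((o - s) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def ensemble_mean(simulations):
    """Point-by-point mean across several model outputs."""
    return [fmean(values) for values in zip(*simulations)]


# Hypothetical observations and two hypothetical model outputs (arbitrary units).
obs = [2.0, 3.0, 4.0, 5.0]
model_a = [2.5, 3.5, 4.5, 5.5]  # consistently too high
model_b = [1.5, 2.5, 3.5, 4.5]  # consistently too low
mean_sim = ensemble_mean([model_a, model_b])

for name, sim in [("A", model_a), ("B", model_b), ("mean", mean_sim)]:
    print(name, bias(sim, obs), absolute_bias(sim, obs), model_efficiency(sim, obs))
```

Signed bias can hide compensating errors, which is why it is usually reported alongside absolute bias and an efficiency score.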

    Modelling plant trait variability in changing arid environments

    Communities in arid environments are especially vulnerable to global change because they experience highly unpredictable environmental conditions. The fate of communities in an uncertain future may be elucidated by understanding the drivers of these communities. The interplay between community drivers may be unravelled by using approaches based on functional traits because traits describe plant strategies and the responses of communities to environmental changes. Furthermore, inter- and intraspecific trait variability provides the necessary cues to identify survival strategies of desert plants under fluctuating environmental conditions. However, studying desert plant communities is challenging due to the spatial and temporal heterogeneity of arid environments. Modelling approaches support and complement empirical trait-based approaches in exploring desert plant communities and their drivers and dynamics in changing arid environments. The overarching aim of this thesis was to explore intra- and inter-specific variability of functional traits in arid environments and to investigate how this variability affects the ability of plants to tolerate aridity stress and succeed in competition with their neighbours. To address this aim, I developed, implemented and analysed a spatially explicit individual- and trait-based simulation model, conducted a simulation experiment, analysed data from model simulations and empirical experiments and synthesized the literature on trait-based models and metamodelling approaches. My research was focused on annual plant communities dominated by the True Rose of Jericho (Anastatica hierochuntica L.) in the Negev desert in Israel. 
According to the review in chapter 1, trait-based models are a suitable method to predict changes in community patterns under global change and to understand the underlying mechanisms of community assembly and dynamics. Combining modelling and trait-based approaches overcomes technical challenges, scaling problems, and data scarcity. Specifically, a combination of trait-based approaches and individual-based modelling was recommended to simplify the parameterization of models and to capture plant-plant interactions at the individual level, and to explain community dynamics. In chapter 2, in line with the major claim of chapter 1, the spatially explicit trait- and individual-based ATID-model was developed, implemented and analysed to explore how community dynamics arise from plant traits and the interactions among plants and with their environment. The sensitivity analysis of the model highlighted plant functional traits as key drivers of community dynamics and indicated that environmental factors were less important in the model. The outlined traits included both those traits that are involved in plant-plant interactions, such as relative growth rate and maximum biomass, and those that promote tolerance to abiotic stress, such as dormancy and germination probability. Among the environmental factors, the most influential factors were soil water availability and precipitation. The special role of functional traits in the community dynamics of desert annual plants indicates the importance of trait-based strategies as an adaptation to the stressful arid environment. Chapter 3 addresses the results from a simulation experiment that was conducted in the ATID-model. This experiment explored the influence of functional traits involved in two survival strategies defined in the study as ‘protective-competition’ and ‘escape-colonization’ strategies on community dynamics. 
These strategies differed not only in seed size and the number of seeds, but also in the plant functional traits related to competition and survival, which were highlighted in the sensitivity analysis of the model from chapter 2. Merging the colonization-competition trade-off with escape in time and space into one strategy set provided a more realistic representation of species because the merged strategies related to the entire plant life cycle. To gain a better understanding of empirical trait distributions, chapter 4 analysed data on intraspecific trait variability and trait spaces of the desert annual plant A. hierochuntica from a nethouse experiment. High salinity had significant effects on the average values of plant functional traits. Additionally, salinity stress affected the intraspecific trait spaces differentially with respect to the environmental conditions of the site of origin. Trait spaces of the populations originating from the same site but exposed to different salt stress levels became more dissimilar with increasing environmental aridity. Thus, intraspecific trait variability and salinity effects turned out to be essential in revealing population- and community-level processes in deserts and should be considered in future versions of the ATID-model. In support of the future development of the ATID-model developed in chapter 2, common metamodel types and the purposes of their usage for individual-based models were reviewed and evaluated in chapter 5. The review considered 40 metamodels applied for sensitivity analysis, calibration, prediction and scaling-up of individual-based models and can be used as a guide for the implementation and validation of metamodels. Overall, this thesis, and particularly the ATID-model analyses, highlights how trait-based modelling approaches can contribute to understanding the interplay between key drivers of desert plant communities in arid environments. 
The accompanying analysis of the nethouse experiment and critical literature reviews outline future extensions of the model and the ways to overcome the technical challenges and data scarcity identified in this thesis. Moreover, this thesis advocates for more intensive studies of the strategies of desert annual plants to survive in temporally and spatially heterogeneous environments with a focus on plant functional traits. Thus, the modelling framework presented in this thesis provides the basis for future research on the fate of communities in arid environments under global change.

    Control and optimization methods in biomedical systems: from cells to humans

    Optimization and control theory are well-developed techniques to quantify, model, understand and optimize real-world systems, and they have been widely used in engineering, economics, and science. In this thesis, we focus on applications in biomedical systems ranging from cells to microbial communities, and to something as complex as the human body. The first problem we consider is that of medication dosage control for drugs delivered intravenously to the patient. We focus specifically on a blood thinner (called bivalirudin) used in the post-cardiac-surgery Intensive Care Unit (ICU). We develop two approaches (a model-free and a model-based one) that predict the effect of bivalirudin. After obtaining the model and its best-fit parameters by solving a non-linear optimization problem, we develop automatic dosage controllers that adaptively regulate its effect to desired levels. Our algorithms are validated using actual data from a large hospital in the Boston area. In the second problem, we introduce a cellular objective function inference mechanism in metabolic networks. We develop an inverse optimization method, called InvFBA (Inverse Flux Balance Analysis), to infer the objective functions of growing cells by using their reaction fluxes. InvFBA can be seen as an inverse version of FBA (Flux Balance Analysis), which predicts the distribution of the cell's reaction fluxes by using a hypothetical objective function. The objective functions can be linear, quadratic or non-parametric. The efficiency of the InvFBA approach matches the structure of the FBA and ensures scalability to large networks and optimality of the solution. After testing our algorithm on simulated E. coli data and time-dependent S. oneidensis fluxes inferred from gene expression data, we apply our inverse approach to flux measurements in long-term evolved E. coli strains, revealing objective functions that provide insight into metabolic adaptation trajectories. 
In the final problem in this thesis, we formulate a novel resource allocation problem in microbial ecosystems. We consider a given number of microbial species living symbiotically in a community and a list of all metabolic reactions present in the community, expressed in terms of the metabolite proportions involved in each reaction. We are interested in allocating reactions to organisms so that each organism maintains a minimal level of growth and the community optimizes certain objectives, such as maximizing growth and/or the uptake of specific compounds from the common environment. We leverage tools from Flux Balance Analysis (FBA) and formulate the problem as a mixed integer linear programming problem. We test our method in a toy model involving two organisms that can only survive through cross-feeding, demonstrating that the method can recover this interaction. We also test the method in a community of two simplified bacteria described in terms of their core, simplified metabolic network. We demonstrate that the method can obtain syntrophic cross-feeding species that would be very difficult to design manually.
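
For orientation, the forward FBA problem referred to in this abstract is conventionally written as a linear program. The formulation below is the standard textbook form with a linear objective, not a quotation from the thesis, and the symbols are assumed notation: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ the objective coefficients, and $v_{\min}$, $v_{\max}$ the flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{s.t.} \quad & S v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

Read this way, the inverse problem takes measured fluxes as given and looks for an objective, here the vector $c$, under which those fluxes are optimal or nearly so. The quadratic and non-parametric objectives mentioned in the abstract fall outside this linear sketch.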

    Temporal and Causal Inference with Longitudinal Multi-omics Microbiome Data

    Microbiomes are communities of microbes inhabiting an environmental niche. Thanks to next generation sequencing technologies, it is now possible to study microbial communities, their impact on the host environment, and their role in specific diseases and health. Technology has also triggered the increased generation of multi-omics microbiome data, including metatranscriptomics (quantitative survey of the complete metatranscriptome of the microbial community), metabolomics (quantitative profile of the entire set of metabolites present in the microbiome's environmental niche), and host transcriptomics (gene expression profile of the host). Consequently, another major challenge in microbiome data analysis is the integration of multi-omics data sets and the construction of unified models. Finally, since microbiomes are inherently dynamic, to fully understand the complex interactions that take place within these communities, longitudinal studies are critical. Although the analysis of longitudinal microbiome data has been attempted, these approaches do not attempt to probe interactions between taxa, do not offer holistic analyses, and do not investigate causal relationships. In this work we propose approaches to address all of the above challenges. We propose novel analysis pipelines to analyze multi-omic longitudinal microbiome data, and to infer temporal and causal relationships between the different entities involved. As a first step, we showed how to deal with longitudinal metagenomic data sets by building a pipeline, PRIMAL, which takes microbial abundance data as input and outputs a dynamic Bayesian network model that is highly predictive, suggests significant interactions between the different microbes, and proposes important connections from clinical variables. A significant innovation of our work is its ability to deal with differential rates of the internal biological processes in different individuals. 
Second, we showed how to analyze longitudinal multi-omic microbiome datasets. Our pipeline, PALM, significantly extends the previous state of the art by allowing for the integration of longitudinal metatranscriptomics, host transcriptomics, and metabolomics data in addition to longitudinal metagenomics data. PALM achieves predictive power comparable to the PRIMAL pipeline while discovering a web of interactions between the entities of far greater complexity. An important innovation of PALM is the use of a multi-omic Skeleton framework that incorporates prior knowledge in the learning of the models. Another major innovation of this work is devising a suite of validation methods, both in silico and in vitro, enhancing the utility and validity of PALM. Finally, we propose a suite of novel methods (unrolling and de-confounding), called METALICA, consisting of tools and techniques that make it possible to uncover significant details about the nature of microbial interactions. We also show methods to validate such interactions using ground truth databases. The proposed methods were tested using an IBD multi-omics dataset.