952 research outputs found

    A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters

    Biolog phenotype microarrays enable simultaneous, high throughput analysis of cell cultures in different environments. The output is high-density time-course data showing redox curves (approximating growth) for each experimental condition. The software provided with the Omnilog incubator/reader summarizes each time-course as a single datum, so most of the information is not used. However, the time courses can be extremely varied and often contain detailed qualitative (shape of curve) and quantitative (values of parameters) information. We present a novel, Bayesian approach to estimating parameters from Phenotype Microarray data, fitting growth models using Markov Chain Monte Carlo methods to enable high throughput estimation of important information, including length of lag phase, maximal "growth" rate and maximum output. We find that the Baranyi model for microbial growth is useful for fitting Biolog data. Moreover, we introduce a new growth model that allows for diauxic growth with a lag phase, which is particularly useful where Phenotype Microarrays have been applied to cells grown in complex mixtures of substrates, for example in industrial or biotechnological applications, such as worts in brewing. Our approach provides more useful information from Biolog data than existing, competing methods, and allows for valuable comparisons between data series and across different models.
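
As a rough illustration of the three quantities named in this abstract (lag length, maximal rate, maximum output), the sketch below evaluates a commonly used explicit form of the Baranyi growth curve. It is not taken from the paper: the function name `baranyi`, the parameter names, and the choice of this particular parameterization are assumptions, and the Bayesian MCMC fitting and the diauxic model are not shown.

```python
import math


def baranyi(t, y0, ymax, mu_max, lag):
    """Evaluate a Baranyi-type growth curve at time t (log-scale signal).

    Hypothetical parameter names, chosen for illustration only:
      y0     -- initial level of the signal
      ymax   -- plateau (maximum output)
      mu_max -- maximal specific rate
      lag    -- length of the lag phase
    """
    h0 = mu_max * lag  # dimensionless "work to be done" before growth starts
    # Adjustment function A(t): close to 0 early on, close to t - lag later.
    a = t + (1.0 / mu_max) * math.log(
        math.exp(-mu_max * t) + math.exp(-h0) - math.exp(-mu_max * t - h0)
    )
    # Exponential growth along A(t), damped as the curve approaches ymax.
    return y0 + mu_max * a - math.log(
        1.0 + (math.exp(mu_max * a) - 1.0) / math.exp(ymax - y0)
    )


if __name__ == "__main__":
    # Print a short trajectory for one illustrative parameter set.
    for hour in range(0, 49, 8):
        print(hour, round(baranyi(hour, y0=1.0, ymax=6.0, mu_max=0.5, lag=5.0), 3))
```

In a Bayesian treatment such as the one the abstract describes, a curve of this kind would serve as the mean of the likelihood, with priors placed on the four parameters; how the authors actually set this up is not stated here.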

    Statistical analysis tools for metabolic and genomic bacterial data

    This thesis introduces statistical analysis methods for two types of bacterial data: metabolic data produced by phenotype microarray technology, and genomic data produced by sequencing technologies. Because both technologies produce vast amounts of data and have special features of their own, there is a need for bioinformatics tools that adequately process and analyze the information produced. As in all biomolecular data analyses, the interplay between biological components poses an additional challenge to method development. A specific complication with the metabolic data is the small number of replicates, which results from the high cost of performing the experiments. For the sequence data, genome-wide analysis tools are desired, since such methods have not yet been widely developed for bacteria, even though they exist for eukaryotic genetics. The thesis briefly reviews the current methods and introduces new approaches to the above-mentioned problems.
    This thesis develops new statistical analysis methods for phenotype microarray data and gene sequence data; the former describes the metabolic activity of cells, and the latter reveals the genetic code of the cell. Statistical methods are needed when the information produced by these measurement techniques is to be used, for example, for medical purposes such as developing new treatments. Modern molecular-level measurement instruments characteristically produce a large number of observations. In addition, each technique has its own special features that must be taken into account when interpreting the data. For example, when analyzing phenotype microarray data, the multidimensional nature of the data must be considered: a single experiment can examine thousands of phenotypes over time. The development of statistical methods and reliable statistical testing is further hampered by small numbers of replicates and limited data availability, which in turn results from phenotype microarray technology still being a relatively unknown, little-used method that is perceived as difficult to interpret. When analyzing gene sequences, the special characteristics of the organism under study must be considered, since organisms differ in their genetic properties. In humans, the association of genetic traits with many diseases, such as cancers, has been studied using, for example, genome-wide association analysis methods. In this thesis we present a genome-wide method developed for analyzing bacterial gene sequences, which can be used, for example, to map genetic factors affecting antibiotic resistance in bacteria.

    Gene Regulatory Network Analysis and Web-based Application Development

    Microarray data is a valuable source for gene regulatory network analysis. Using earthworm microarray data analysis as an example, this dissertation demonstrates that a bioinformatics-guided reverse engineering approach can be applied to analyze time-series data to uncover the underlying molecular mechanism. My network reconstruction results reinforce previous findings that certain neurotransmitter pathways are the target of two chemicals, carbaryl and RDX. This study also concludes that perturbations to these pathways by sublethal concentrations of these two chemicals were temporary, and earthworms were capable of fully recovering. Moreover, differential networks (DNs) analysis indicates that many pathways other than those related to synaptic and neuronal activities were altered during the exposure phase. A novel DNs approach is developed in this dissertation to connect pathway perturbation with toxicity threshold setting from Live Cell Array (LCA) data. Findings from this proof-of-concept study suggest that this DNs approach has great potential to provide a novel and sensitive tool for threshold setting in chemical risk assessment. In addition, a web-based tool “Web-BLOM” was developed for the reconstruction of gene regulatory networks from time-series gene expression profiles including microarray and LCA data. This tool consists of several modular components: a database, the gene network reconstruction model and a user interface. The Bayesian Learning and Optimization Model (BLOM), originally implemented in MATLAB, was adopted by Web-BLOM to provide an online reconstruction of large-scale gene regulation networks. Compared to other network reconstruction models, BLOM can infer larger networks with comparable accuracy, identify hub genes and is much more computationally efficient.

    Genetic algorithm based two-mode clustering of metabolomics data

    Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources.

    Integration of host, pathogen and microbiome -omics data for studying infectious diseases

    In an ever-growing worldwide population, human infectious diseases are an increasingly serious problem for public health. In particular, more than a million deaths and millions of infectious disease cases per year caused by fungal pathogens have been reported globally in recent years. Hence, more investments must be put into fungal research to overcome the problem. The opportunistic pathogen Candida albicans and the airborne Aspergillus fumigatus are the two most prevalent fungal pathogens causing serious issues in medical care units. Despite the recent advances in fungal research, there is little knowledge about the role of fungal metabolism in developing the infection when coexisting within the human body with microbial community members in different organs. This dissertation applied computational tools, and implemented systems biology approaches to uncover key factors in the colonization of the pathogens, especially C. albicans and A. fumigatus, from a systems biology perspective and unseen by wet-lab experiments alone. Next to multi-omics data analysis, a major effort was put into genome-scale metabolic models (GEMs) generation and analysis as a promising approach to shed light on the role of metabolism in developing the infection. In brief, this thesis sheds light on key factors leading to the inhibition or promotion of fungal growth. This especially includes the first available GEM reconstruction of C. albicans to theoretically study the intricate interaction of the fungus with the human host and the microbial community members. Lastly, a platform of 252 A. fumigatus GEMs at the strain resolution was generated. It revealed the phenotypic diversity of A. fumigatus strains isolated from different hospitals and farms in Germany and explained the contribution of the fungus to the shaping of the metabolic landscape of the lung microbiome in a favorable manner for fungal growth

    Multivariable association discovery in population-scale meta-omics studies.

    It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
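
The abstract mentions controlling false discovery when many feature-by-metadata associations are tested at once. One standard way to do this is the Benjamini-Hochberg adjustment, sketched below in plain Python. This is a generic illustration, not MaAsLin 2 code: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the abstract itself does not say which correction the tool applies.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    The output is in the same order as the input list.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each q-value is the minimum of p * m / rank over this
    # rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, one per tested association.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Associations whose adjusted value falls below a chosen threshold would then be reported; the threshold itself is a study-level decision.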

    Identifying Multiple Potential Metabolic Cycles in Time-Series from Biolog Experiments

    Biolog Phenotype Microarray (PM) is a technology allowing simultaneous screening of the metabolic behaviour of bacteria under a large number of different conditions. Bacteria may often undergo several cycles of metabolic activity during a Biolog experiment. We introduce a novel algorithm to identify these metabolic cycles in PM experimental data, thus increasing the potential of PM technology in microbiology. Our method is based on a statistical decomposition of the time-series measurements into a set of growth models. We show that the method is robust to measurement noise and accurately captures the biologically relevant signals from the data. Our implementation is made freely available as a part of an R package for PM data analysis and can be found at www.helsinki.fi/bsg/software/Biolog_Decomposition.

    Bayesian Network Modeling and Inference in Plant Gene Networks And Analysis of Sequencing and Imaging Data

    Scientific and technological advancements over the years have made curing, preventing or managing all diseases a goal that seems to be within reach. The approach to manipulating biological systems is multifaceted. This dissertation focuses on two problems that pose fundamental challenges in developing methods to control biological systems: the first is modeling complex interactions in biological systems; the second is the faithful representation and analysis of biological data obtained from scientific equipment. The first part of this dissertation discusses modeling and inference in gene networks, and Bayesian inference. We then describe the application of Bayesian network modeling to represent interactions among genes, integrating gene expression data in order to identify potential points of intervention in the gene network. We conclude with a summary of evolving directions for modeling gene interactions. The second topic of this dissertation is taming biological data to obtain actionable insights. We introduce the challenges in representing and analyzing high throughput sequencing data and proceed to describe the analysis of imaging data in the dynamic environment of cancer cells. We then discuss the problem of analyzing high throughput RNA sequencing data in order to pinpoint genes that behave differently under monitored experimental conditions. Finally, we address the problem of deciphering and quantifying gene-level activity from epifluorescent imaging data.