
    Improving data extraction methods for large molecular biology datasets.

    In the past, an experiment involving a pairwise comparison normally involved one or a few dependent variables. Now, thousands of dependent variables can be measured simultaneously in a single experiment, be it detecting genes via a microarray experiment, sequencing genomes, or detecting microbial species based on DNA fragments using molecular techniques. How we analyze such large collections of data will be a major scientific focus over the next decade. Statistical methods that were once acceptable for comparing a few conditions are being revised to handle thousands of experiments. Molecular biology techniques that once explored a single gene or species have evolved and can now generate complex datasets that require new strategies and ways of thinking to yield biologically meaningful results. The central theme of this dissertation is to develop strategies that address a number of issues present in these large-scale datasets. In Chapter 1, I describe a microarray analytical method that can be applied to experiments with few replicates. Chapters 2-4 focus on how best to analyze data from ARISA, a PCR-based molecular method for rapidly generating a fingerprint of microbial diversity. Chapter 2 focuses on qualifying ARISA data so that the data best represent their biological source prior to further analysis. Chapter 3 focuses on how best to compare ARISA profiles to one another. Chapter 4 focuses on developing a software tool that implements the data processing and clustering strategies from Chapters 2 and 3. The findings described herein provide the scientific community with improved analytical strategies in both the microarray and ARISA research areas.
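    The abstract does not say which filtering rule or similarity measure the dissertation uses to qualify and compare ARISA profiles. As a minimal sketch under stated assumptions, the code below treats a profile as peak heights keyed by binned fragment length, converts it to relative abundance with an optional hypothetical minimum-fraction cutoff, and compares two profiles with Bray-Curtis dissimilarity, a common choice for community fingerprints. The profile values, the `min_fraction` parameter, and the function names are illustrative only.

```python
from typing import Dict

# Assumption: a profile maps a binned fragment length (bp) to a peak height.
Profile = Dict[int, float]


def relative_abundance(profile: Profile, min_fraction: float = 0.0) -> Profile:
    """Scale peaks to relative abundance, drop peaks below the (hypothetical)
    min_fraction cutoff, then rescale the kept peaks to sum to 1."""
    total = sum(profile.values())
    if total == 0:
        return {}
    kept = {bp: h / total for bp, h in profile.items() if h / total >= min_fraction}
    kept_total = sum(kept.values())
    return {bp: f / kept_total for bp, f in kept.items()} if kept_total else {}


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity: 0.0 for identical profiles,
    1.0 when the profiles share no fragment lengths."""
    shared = sum(min(a.get(bp, 0.0), b.get(bp, 0.0)) for bp in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Made-up example profiles.
sample_1 = relative_abundance({400: 50.0, 520: 30.0, 610: 20.0})
sample_2 = relative_abundance({400: 40.0, 520: 40.0, 700: 20.0})
print(round(bray_curtis(sample_1, sample_2), 3))  # → 0.3
```

    A dissimilarity matrix built this way can feed a standard clustering step; whether peak heights or peak areas, and which cutoff, are appropriate is exactly the kind of question the dissertation's Chapters 2 and 3 address.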

    Bacterial Community Reconstruction Using A Single Sequencing Reaction

    Bacteria are the unseen majority on our planet, with millions of species, and they make up most of the living protoplasm. While current methods enable in-depth study of a small number of communities, a simple tool for breadth studies of bacterial population composition across a large number of samples is lacking. We propose a novel approach for reconstructing the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. This method is based on compressive sensing theory, which deals with the reconstruction of a sparse signal from a small number of measurements. Using the fact that, in many cases, each bacterial community is composed of a small subset of the known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture. Using simulations, we show that sequencing a few hundred base pairs of the 16S rRNA gene may provide enough information to reconstruct mixtures containing tens of species out of tens of thousands, even in the presence of realistic measurement noise. Finally, we show promising initial results when applying our method to reconstruct a toy experimental mixture of five species. Our approach may offer a practical and efficient way to identify the bacterial species composition of biological samples. Comment: 28 pages, 12 figures
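    The reconstruction idea can be illustrated as recovering a sparse, nonnegative abundance vector x from one mixed measurement y = Mx. A minimal sketch, assuming aligned, equal-length reference sequences and a noiseless signal equal to the abundance-weighted fraction of each base at each position (a simplification of a real Sanger chromatogram). The solver is a generic stand-in, nonnegative L1-penalized least squares by projected gradient descent, not the authors' algorithm; the five reference sequences are made up, and this toy problem is far smaller than the tens-of-thousands-of-species setting the abstract describes.

```python
from typing import List

BASES = "ACGT"


def design_matrix(refs: List[str]) -> List[List[float]]:
    """One row per (position, base); entry j is 1.0 if reference j has that base there."""
    return [[1.0 if ref[pos] == base else 0.0 for ref in refs]
            for pos in range(len(refs[0])) for base in BASES]


def mix(matrix: List[List[float]], freqs: List[float]) -> List[float]:
    """Noiseless mixed signal: abundance-weighted base fraction for each row."""
    return [sum(a * f for a, f in zip(row, freqs)) for row in matrix]


def reconstruct(refs: List[str], signal: List[float],
                lam: float = 0.01, iters: int = 2000) -> List[float]:
    """Minimize 0.5*||Mx - y||^2 + lam*sum(x) subject to x >= 0."""
    m = design_matrix(refs)
    n = len(refs)
    # 1 / ||M||_F^2 never exceeds 1 / lambda_max(M^T M), so the step is safe.
    step = 1.0 / sum(v * v for row in m for v in row)
    x = [0.0] * n
    for _ in range(iters):
        residual = [p - s for p, s in zip(mix(m, x), signal)]
        grad = [sum(row[j] * r for row, r in zip(m, residual)) for j in range(n)]
        # Gradient step (lam is the gradient of lam*sum(x) for x >= 0),
        # then projection onto the nonnegative orthant.
        x = [max(0.0, xj - step * (gj + lam)) for xj, gj in zip(x, grad)]
    return x


# Hypothetical reference sequences and a two-species mixture.
refs = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG", "AACCGGTT"]
true_freqs = [0.7, 0.0, 0.0, 0.0, 0.3]
signal = mix(design_matrix(refs), true_freqs)
estimate = reconstruct(refs, signal)
print([round(v, 2) for v in estimate])  # → [0.7, 0.0, 0.0, 0.0, 0.3]
```

    The L1 penalty is what favors sparse answers; it also shrinks the nonzero estimates slightly (here by about 0.001), which is why the printout is rounded.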

    Multivariate Analysis in Metabolomics

    Metabolomics aims to provide a global snapshot of all small-molecule metabolites in cells and biological fluids, free of the observational biases inherent to more focused studies of metabolism. However, the staggeringly high information content of such global analyses introduces a challenge of its own: efficiently drawing biologically relevant conclusions from any given metabolomics dataset requires specialized forms of data analysis. One approach to finding meaning in metabolomics datasets involves multivariate analysis (MVA) methods such as principal component analysis (PCA) and partial least squares projection to latent structures (PLS), in which the spectral features contributing most to variation or separation are identified for further analysis. However, as with any mathematical treatment, these methods are not a panacea; this review discusses the use of multivariate analysis for metabolomics, as well as common pitfalls and misconceptions.
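    To make the PCA step concrete, the sketch below mean-centers a small samples-by-features matrix, forms the sample covariance, finds the first principal component loading by power iteration, and projects each sample onto it. It is a teaching sketch under assumptions, not the review's procedure: the intensities are made up, no scaling (such as unit-variance or Pareto scaling) is applied, and PLS is not shown. In practice a library implementation such as scikit-learn's `PCA` would be used.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mean_center(data: Matrix) -> Matrix:
    """Subtract each column mean so PCA describes variation around the average sample."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (p x p) of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_loading(cov: Matrix, iters: int = 500) -> List[float]:
    """Power iteration toward the dominant eigenvector (the PC1 loading);
    it fails only if the start vector is orthogonal to that eigenvector."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1(data: Matrix) -> Tuple[List[float], List[float]]:
    """Return the PC1 loading vector and each sample's PC1 score."""
    centered = mean_center(data)
    loading = first_loading(covariance(centered))
    scores = [sum(x * weight for x, weight in zip(row, loading)) for row in centered]
    return loading, scores


# Made-up intensities: rows are samples, columns are spectral features.
samples = [
    [10.0, 2.0, 5.0], [11.0, 2.2, 5.1], [9.5, 1.9, 4.9],  # group A
    [4.0, 6.0, 5.0], [4.5, 6.2, 5.2], [3.8, 5.8, 4.8],    # group B
]
loading, scores = pc1(samples)
print([round(w, 2) for w in loading])
print([round(s, 2) for s in scores])
```

    The two groups land on opposite sides of zero along PC1, and the loading shows that features 1 and 2 drive the separation with opposite signs. The overall sign of a principal component is arbitrary, one of the interpretation pitfalls worth keeping in mind.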

    Microbial eukaryotic distributions and diversity patterns in a deep-sea methane seep ecosystem

    Although chemosynthetic ecosystems are known to support diverse assemblages of microorganisms, the ecological and environmental factors that structure microbial eukaryotes (heterotrophic protists and fungi) are poorly characterized. In this study, we examined the geographic, geochemical, and ecological factors that influence microbial eukaryotic composition and distribution patterns within Hydrate Ridge, a methane seep ecosystem off the coast of Oregon, using a combination of high-throughput 18S rRNA tag sequencing, terminal restriction fragment length polymorphism fingerprinting, and cloning and sequencing of full-length 18S rRNA genes. Microbial eukaryotic composition and diversity varied as a function of substrate (carbonate versus sediment), activity (low-activity versus active seep sites), sulfide concentration, and region (North versus South Hydrate Ridge). Sulfide concentration was correlated with changes in microbial eukaryotic composition and richness. This work also revealed the influence of oxygen content in the overlying water column and of water depth on microbial eukaryotic composition and diversity, and identified patterns distinct from those previously observed for bacteria, archaea, and macrofauna in methane seep ecosystems. Characterizing the structure of microbial eukaryotic communities in response to environmental variability is a key step toward understanding whether and how microbial eukaryotes influence seep ecosystem structure and function.

    Computational analysis of microbial flow cytometry data

    Flow cytometry is an important technology for the study of microbial communities. It makes it possible to rapidly generate phenotypic single-cell data that are quantitative, multivariate, and of high temporal resolution. The complexity and volume of these data necessitate an objective and streamlined data processing workflow that extends beyond commercial instrument software. No comprehensive overview of the steps required for the computational analysis of microbial flow cytometry data currently exists. In this review, we provide an overview of the full data analysis pipeline, from measurement to data interpretation, tailored to studies in microbial ecology. At every step, we highlight potentially useful computational methods and provide a short nontechnical description of each. We place this overview in the context of a number of open challenges in the field and offer further motivation for the use of standardized flow cytometry in microbial ecology research.
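    Two early steps of such a pipeline, transforming raw fluorescence values and gating events to separate cells from background, can be sketched briefly. This is an illustration under assumptions, not a method taken from the review: the two channels, the arcsinh cofactor of 150, the rectangular gate bounds, and the event values are all hypothetical, and real microbial studies often use polygon or data-driven gates instead.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical two-channel event: (green fluorescence, red fluorescence).
Event = Tuple[float, float]


def asinh_transform(events: Sequence[Event], cofactor: float = 150.0) -> List[Event]:
    """Compress dynamic range: asinh is near-linear near zero and log-like for large values."""
    return [(math.asinh(g / cofactor), math.asinh(r / cofactor)) for g, r in events]


def in_gate(event: Event, low: Event, high: Event) -> bool:
    """Rectangular gate: every channel must fall within [low, high]."""
    return all(lo <= v <= hi for v, lo, hi in zip(event, low, high))


def gated_fraction(events: Sequence[Event], low: Event, high: Event,
                   cofactor: float = 150.0) -> float:
    """Fraction of events that fall inside the gate after transformation."""
    transformed = asinh_transform(events, cofactor)
    if not transformed:
        return 0.0
    inside = sum(1 for e in transformed if in_gate(e, low, high))
    return inside / len(transformed)


# Made-up events: two bright "cells" and two dim background events.
raw = [(20000.0, 500.0), (30000.0, 800.0), (50.0, 40.0), (10.0, 5.0)]
print(gated_fraction(raw, low=(3.0, 0.5), high=(10.0, 5.0)))  # → 0.5
```

    Keeping the gate fixed across samples is one way to make the step objective and reproducible, in the spirit of the standardized workflow the review argues for.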

    Differences in Fecal Metabolite Profiles from Geographically Distinct Populations of Adolescents

    Microbiota of the gastrointestinal tract have a variety of functions within the human body. They participate in protecting the host from pathogens, aid in immune system development and regulation, and carry out a variety of metabolic functions. This study focuses on the ability of gut microbiota to create metabolites through the degradation of food products. Using ¹H NMR on fecal water extracts, I compared the metabolite profiles of two geographically distinct cohorts: healthy adolescents from Egypt (n=28) and healthy adolescents from the United States (n=14). Multivariate statistical analyses of binned NMR data confirmed that samples separated into groups corresponding to sample class. Quantification revealed several metabolites that differed between groups. For example, levels of short-chain fatty acids were higher in the Egyptian adolescents, and most quantified amino acids were higher in the US adolescents. Multivariate statistical analyses of the quantified metabolite data showed separation based on the variability within the samples and placed samples into the correct class. Therefore, I concluded that fecal metabolite profiles differ between Egyptian and United States adolescents, and that these differences in metabolite levels may be linked to dietary differences between the two cohorts studied.
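    Binning reduces a spectrum to a fixed-length vector of summed intensities so that samples can be compared feature by feature in multivariate analyses. The abstract does not state the bin width or normalization used, so the sketch below assumes 0.04 ppm buckets and total-area normalization, both widely used in NMR metabolomics; the spectrum values and the function name are made up for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a spectrum is a list of (chemical shift in ppm, intensity) pairs.
Spectrum = List[Tuple[float, float]]


def bin_spectrum(spectrum: Spectrum, width: float = 0.04,
                 normalize: bool = True) -> Dict[int, float]:
    """Sum intensities into fixed-width ppm bins keyed by bin index floor(ppm / width).

    With normalize=True, each bin is divided by the total intensity
    (total-area normalization) so samples of different concentration compare.
    """
    bins: Dict[int, float] = {}
    for ppm, intensity in spectrum:
        index = int(ppm // width)
        bins[index] = bins.get(index, 0.0) + intensity
    if normalize:
        total = sum(bins.values())
        if total:
            bins = {k: v / total for k, v in bins.items()}
    return bins


# Made-up spectrum: two points fall in the same 0.04 ppm bin.
spectrum = [(1.325, 2.0), (1.335, 3.0), (3.21, 5.0)]
print(bin_spectrum(spectrum))  # → {33: 0.5, 80: 0.5}
```

    Bin index 33 covers 1.32 to 1.36 ppm. Points exactly on a bin edge are sensitive to floating-point rounding, so real pipelines often work from the instrument's digitized ppm axis rather than recomputing edges this way.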

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platform (MS- or NMR-spectroscopy-based) used for data acquisition. Improved instrumentation in metabolomics generates increasingly complex datasets, creating a need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources, in the form of tools, software, and databases, is currently lacking. Here, therefore, we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both new and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate the data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most of the tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialized tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.