Improving data extraction methods for large molecular biology datasets.
In the past, an experiment involving a pairwise comparison normally involved one or a few dependent variables. Now, thousands of dependent variables can be measured simultaneously in a single experiment, be it detecting genes via a microarray experiment, sequencing genomes, or detecting microbial species based on DNA fragments using molecular techniques. How we analyze such large collections of data will be a major scientific focus over the next decade. Statistical methods that were once acceptable for comparing a few conditions are being revised to handle thousands of experiments. Molecular biology techniques that explored one gene or species have evolved and are now capable of generating complex datasets requiring new strategies and ways of thinking in order to discover biologically meaningful results. The central theme of this dissertation is to develop strategies that deal with a number of issues that are present in these large-scale datasets. In chapter 1, I describe a microarray analytical method that can be applied to low-replicate experiments. In chapters 2-4, the focus is how best to analyze data from ARISA (a PCR-based molecular method for rapidly generating a fingerprint of microbial diversity). Chapter 2 focuses on qualifying ARISA data so that the data best represent their biological source prior to further analysis. Chapter 3 focuses on how best to compare ARISA profiles to one another. Chapter 4 focuses on developing a software tool that implements the data processing and clustering strategies from chapters 2 and 3. The findings described herein provide the scientific community with improved analytical strategies in both the microarray and ARISA research areas.
Bacterial Community Reconstruction Using A Single Sequencing Reaction
Bacteria are the unseen majority on our planet, with millions of species and
comprising most of the living protoplasm. While current methods enable in-depth
study of a small number of communities, a simple tool for breadth studies of
bacterial population composition in a large number of samples is lacking. We
propose a novel approach for reconstruction of the composition of an unknown
mixture of bacteria using a single Sanger-sequencing reaction of the mixture.
This method is based on compressive sensing theory, which deals with
reconstruction of a sparse signal using a small number of measurements.
Utilizing the fact that in many cases each bacterial community is comprised of
a small subset of the known bacterial species, we show the feasibility of this
approach for determining the composition of a bacterial mixture. Using
simulations, we show that sequencing a few hundred base-pairs of the 16S rRNA
gene sequence may provide enough information for reconstruction of mixtures
containing tens of species, out of tens of thousands, even in the presence of
realistic measurement noise. Finally, we show initial promising results when
applying our method for the reconstruction of a toy experimental mixture with
five species. Our approach may offer a practical and efficient way of
identifying bacterial species compositions in biological samples.
Comment: 28 pages, 12 figures
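The sparse-reconstruction idea in this abstract can be illustrated with a toy sketch. It is not the authors' method: the abstract does not specify a solver, so a simple greedy matching pursuit is swapped in here, and the library (all 64 three-base sequences standing in for reference 16S sequences), the noiseless mixed read, and every function name are assumptions made for illustration. Each candidate sequence contributes its abundance to one (position, base) slot per position, so 12 measurements are used to recover a mixture over 64 unknowns.

```python
from itertools import product

BASES = "ACGT"

def signature(seq):
    """0/1 vector with one slot per (position, base) pair."""
    return [1.0 if seq[p] == b else 0.0 for p in range(len(seq)) for b in BASES]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mix(abundances):
    """Simulate a noiseless mixed read: abundance-weighted sum of signatures."""
    length = 4 * len(next(iter(abundances)))
    y = [0.0] * length
    for seq, freq in abundances.items():
        for i, value in enumerate(signature(seq)):
            y[i] += freq * value
    return y

def matching_pursuit(library, y, max_species=5, tol=1e-9):
    """Greedily explain y with few library sequences (hypothetical helper)."""
    residual = list(y)
    estimate = {}
    for _ in range(max_species):
        if dot(residual, residual) < tol:
            break  # the signal is fully explained
        # Pick the candidate whose signature best matches the residual.
        best = max(library, key=lambda s: dot(signature(s), residual))
        column = signature(best)
        coef = dot(column, residual) / dot(column, column)
        if coef <= 0:
            break  # abundances cannot be negative
        estimate[best] = estimate.get(best, 0.0) + coef
        residual = [r - coef * c for r, c in zip(residual, column)]
    return estimate

# Toy usage: a two-species mixture hidden among 64 candidates.
library = ["".join(kmer) for kmer in product(BASES, repeat=3)]
observed = mix({"ACG": 0.6, "TTA": 0.4})
estimate = matching_pursuit(library, observed)
print(estimate)
```

Greedy pursuit is chosen only because it fits in a few lines; real chromatograms are noisy and real reference sequences are highly similar, which is where a more careful sparse solver would matter.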
Multivariate Analysis in Metabolomics
Metabolomics aims to provide a global snapshot of all small-molecule metabolites in cells and biological fluids, free of observational biases inherent to more focused studies of metabolism. However, the staggeringly high information content of such global analyses introduces a challenge of its own: efficiently forming biologically relevant conclusions from any given metabolomics dataset requires specialized forms of data analysis. One approach to finding meaning in metabolomics datasets involves multivariate analysis (MVA) methods such as principal component analysis (PCA) and partial least squares projection to latent structures (PLS), where spectral features contributing most to variation or separation are identified for further analysis. However, as with any mathematical treatment, these methods are not a panacea; this review discusses the use of multivariate analysis for metabolomics, as well as common pitfalls and misconceptions.
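As a rough illustration of how PCA flags the features that contribute most to variation, the sketch below computes the first principal component of a small table by power iteration on the covariance matrix. It is a minimal standard-library sketch, not the review's procedure: the function names and the toy intensity table are assumptions, and real metabolomics workflows would normally scale the data and use an established numerical library.

```python
import math

def center(rows):
    """Subtract each column's mean so variation is measured around zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue by power iteration.

    Assumes the starting vector is not orthogonal to the leading
    eigenvector, which holds for typical data but is not guaranteed.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return v, eigenvalue

# Hypothetical table: rows are samples, columns are binned spectral features.
samples = [[1.0, 2.1, 5.0], [2.0, 3.9, 5.1], [3.0, 6.2, 4.9], [4.0, 8.0, 5.0]]
centered = center(samples)
cov = covariance(centered)
loadings, variance = first_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
print(loadings, explained, scores)
```

The loadings indicate which features dominate the component, and the scores place each sample along it; a large loading reflects variance, not necessarily biological relevance, which is one of the pitfalls such reviews warn about.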
Microbial eukaryotic distributions and diversity patterns in a deep-sea methane seep ecosystem
Although chemosynthetic ecosystems are known to support diverse assemblages of microorganisms, the ecological and environmental factors that structure microbial eukaryotes (heterotrophic protists and fungi) are poorly characterized. In this study, we examined the geographic, geochemical and ecological factors that influence microbial eukaryotic composition and distribution patterns within Hydrate Ridge, a methane seep ecosystem off the coast of Oregon, using a combination of high-throughput 18S rRNA tag sequencing, terminal restriction fragment length polymorphism fingerprinting, and cloning and sequencing of full-length 18S rRNA genes. Microbial eukaryotic composition and diversity varied as a function of substrate (carbonate versus sediment), activity (low activity versus active seep sites), sulfide concentration, and region (North versus South Hydrate Ridge). Sulfide concentration was correlated with changes in microbial eukaryotic composition and richness. This work also revealed the influence of oxygen content in the overlying water column and water depth on microbial eukaryotic composition and diversity, and identified distinct patterns from those previously observed for bacteria, archaea and macrofauna in methane seep ecosystems. Characterizing the structure of microbial eukaryotic communities in response to environmental variability is a key step towards understanding if and how microbial eukaryotes influence seep ecosystem structure and function.
Computational analysis of microbial flow cytometry data
Flow cytometry is an important technology for the study of microbial communities. It grants the ability to rapidly generate phenotypic single-cell data that are quantitative, multivariate, and of high temporal resolution. The complexity and amount of data necessitate an objective and streamlined data processing workflow that extends beyond commercial instrument software. No full overview of the necessary steps regarding the computational analysis of microbial flow cytometry data currently exists. In this review, we provide an overview of the full data analysis pipeline, ranging from measurement to data interpretation, tailored toward studies in microbial ecology. At every step, we highlight computational methods that are potentially useful, for which we provide a short nontechnical description. We place this overview in the context of a number of open challenges to the field and offer further motivation for the use of standardized flow cytometry in microbial ecology research.
Differences in Fecal Metabolite Profiles from Geographically Distinct Populations of Adolescents
Microbiota of the gastrointestinal tract have a variety of functions within the human body. They participate in protection of the host from pathogens, aid in immune system development and regulation, and carry out a variety of metabolic functions. This study focuses on the ability of gut microbiota to create metabolites through the degradation of food products. Using 1H NMR on fecal water extracts, I compared the metabolite profiles of two geographically distinct cohorts: healthy adolescents from Egypt (n=28) and healthy adolescents from the United States (n=14). Multivariate statistical analyses of binned NMR data confirmed that samples separated into groups corresponding to sample class. Quantification of metabolites revealed several metabolites that differed between groups. For example, levels of short chain fatty acids were higher in the Egyptian adolescents and most quantified amino acids were higher in the US adolescents. Multivariate statistical analyses of the quantified metabolite data showed separation based on the variability within the samples and placed samples into the correct class. Therefore, I concluded that fecal metabolite profiles differ between Egyptian and United States adolescents, and that these differences in metabolite levels may be linked to dietary differences between the two cohorts.
Updates in metabolomics tools and resources: 2014-2015
Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources, in the form of tools, software, and databases, is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.
- …