Assessing 16S rRNA Marker-Gene Survey Measurement Process Using Mixtures of Environmental Samples

Abstract

Microbial communities play a fundamental role in environmental and human health. Targeted sequencing of the 16S rRNA gene, known as a 16S rRNA marker-gene survey, is used to measure and thus characterize these communities. The 16S rRNA marker-gene survey measurement process includes a number of molecular laboratory and computational steps. A rigorous measurement assessment framework can evaluate measurement method performance, in turn improving the validity of marker-gene survey study conclusions. In this dissertation, I present a novel framework and mixture dataset for assessing 16S rRNA marker-gene survey bioinformatic methods. Additionally, I developed software to facilitate working with 16S rRNA reference sequence databases and 16S rRNA marker-gene survey feature data. The computational steps, collectively referred to as bioinformatic pipelines, combine multiple algorithms to convert raw sequence data into a count table, which is subsequently used to test biological hypotheses. Algorithm choice and parameter settings can significantly impact pipeline results. The assessment framework and software developed for this dissertation improve upon existing assessment methods and can be used to evaluate new computational methods and optimize existing pipelines. Furthermore, the assessment framework presented here can be applied to other microbial community measurement methods, such as shotgun metagenomics.
