Studying microbial community diversities by developing high-throughput experimental techniques and computational tools

Abstract

Since the advent of high-throughput technologies, the understanding of microbial biodiversity has been rapidly transformed. Amplicon sequencing of phylogenetic markers, especially 16S rRNA genes, has become a widely adopted tool for discovering microbial taxonomic diversity in virtually all habitats, whether aquatic or terrestrial, and in ecosystems from local to global scale. Although high-throughput sequencing, such as Illumina-based technologies (e.g. MiSeq), has revolutionized microbial ecology, adopting amplicon sequencing for environmental microbial community analysis is challenging because of the low base diversity of the target region. In this study, a new phasing amplicon sequencing (PAS) approach was developed that shifts the sequencing phases among different community samples from both directions by adding various numbers of bases (0–7) as spacers to both forward and reverse primers. First, our results indicated that the PAS method substantially ameliorated the problem of unbalanced base composition. Second, the PAS method substantially improved sequence read base quality (on average, 10 % more bases above Q30). Third, the PAS method effectively increased raw sequence throughput (~15 % more raw reads). In addition, the PAS method significantly increased the number of effective reads (9–47 %) and the effective read length (16–96 more bases) after quality trimming at Q30 with a window size of 5. The PAS method also roughly halved the sequencing errors (0.54–1.1 % lower). Finally, the two-step PCR amplification of the PAS method effectively ameliorated the amplification biases introduced by long barcoded PCR primers. The developed strategy is robust for 16S rRNA gene amplicon sequencing, and a similar strategy could also be used for sequencing other genes important to ecosystem functional processes.
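The effect of phase shifting on base composition can be illustrated with a minimal Python sketch. The template sequence, the spacer bases and the helper names below are hypothetical and chosen only for illustration; they are not the primers or spacers used in the study. The sketch compares, cycle by cycle, how dominant the most frequent base is when eight samples start in the same phase and when they carry spacers of 0–7 bases.

```python
from collections import Counter

# Illustrative stand-in for the conserved start of an amplicon (hypothetical).
TEMPLATE = "GTGCCAGCAGCCGCGGTAATACGGAGGGTGC"
# Hypothetical spacer source; a spacer of length k is its first k bases.
SPACER_SOURCE = "TCAGTCA"


def add_spacer(template, n_bases, spacer_source=SPACER_SOURCE):
    """Prepend a spacer of n_bases (0-7) so the read starts in a shifted phase."""
    if not 0 <= n_bases <= len(spacer_source):
        raise ValueError("spacer length must be between 0 and 7")
    return spacer_source[:n_bases] + template


def dominant_base_fraction(reads, n_cycles):
    """For each sequencing cycle, return the share of reads showing the most common base."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        fractions.append(max(counts.values()) / len(reads))
    return fractions


# Eight samples without spacers: every cycle shows a single base.
unphased = [add_spacer(TEMPLATE, 0) for _ in range(8)]
# Eight samples with spacers of 0 to 7 bases: cycles are out of phase.
phased = [add_spacer(TEMPLATE, k) for k in range(8)]

unphased_fractions = dominant_base_fraction(unphased, 20)
phased_fractions = dominant_base_fraction(phased, 20)

print("unphased:", unphased_fractions)
print("phased:  ", phased_fractions)
```

A fraction of 1.0 at a cycle means every read shows the same base there, which is the low-diversity situation described above; staggering the reads spreads each cycle over several bases.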
To facilitate the analysis of data produced by amplicon sequencing technologies, a data analysis pipeline was developed and now serves more than 200 users with data processing and preliminary analysis of amplicon sequences. Publicly available pipelines, such as QIIME (Caporaso, Kuczynski et al. 2010, Caporaso, Lauber et al. 2012) and MOTHUR (Schloss, Westcott et al. 2009), are mostly standalone services and require at least minimal programming skills to perform the analysis. Our pipeline provides a more user-friendly web interface, so users only need to click buttons rather than type command lines to perform the basic data analysis. Besides this convenience, the Galaxy platform provides an organized way to upload, store, track and share the data histories of different projects. The pipeline is also flexible: new programs developed by others can be added, and the data source is not limited to 16S rRNA genes but also includes functional gene amplicon sequences. The pipeline has served the research community for several years, and more than a dozen papers have been published using it.
As a practical application of amplicon sequencing, we then investigated the biodiversity of soil fungal communities in six North American forests. The biodiversity of fungi has been studied across many habitats, but the spatial patterns of fungal diversity and the possible mechanisms behind them still need exploration. In this study, soil samples were collected from six forest sites spanning a wide range of latitudes in North America, with a nested design at each site, to uncover the diversity patterns of soil fungal communities in forest systems. Fungal richness follows a clear latitudinal gradient, and temperature, precipitation, pH and nitrogen concentration also contribute to the prediction of the richness of the soil fungal communities.
The compositions of the fungal communities are distinct from each other across the six forest sites. The main drivers of the alpha diversity of fungi in forest soil are latitude, along with the mean annual temperature, precipitation, soil pH, soil total carbon, and soil total nitrogen. These seven variables can be used to predict the α-diversity of the soil fungal communities, and more than 70% of the variance can be explained by these variables alone. As for the β-diversity, the dissimilarities among the fungal communities increase significantly as the distance between the sampling sites becomes larger. The distance-decay curve captures this pattern and indicates that the turnover rates of the fungal species differ between the local and continental scales. We further showed that the key drivers of the differences in fungal community composition depend strongly on the spatial scale, and that geographic distance is the major contributor to these differences. In summary, this study of the fungal communities in North American forest soils has revealed several patterns, along with the possible drivers behind them, which provides insights into the nature of soil fungal communities.
Now that advanced high-throughput technologies have enabled researchers to gain unprecedented insights into the diversity of microbial communities without culturing and identifying individuals, merely knowing the answer to “who is there” is no longer enough; the question now is how to link the ‘measurable’ community structures to ecosystem functioning. If this connection can be established, it becomes possible to understand how the disturbances brought about by human activities and global climate change will alter the ecosystem functioning carried out by microbial communities. Functional diversity, which measures the range of things that organisms do in the surrounding ecosystem, has shown its power in linking microbial communities to the dynamics of ecosystems.
In the final part of this study, we provide a framework that uses Rao’s entropy to quantify microbial functional diversity based on GeoChip (a high-throughput functional gene array), with the phylogenetic distances between probes taken into account in the calculation. This index falls into the category of trait-based functional diversity, with the advantage that key functional traits related to ecosystem functioning are pre-selected in the GeoChip design. The functional diversity index can be partitioned into α- and β-diversity, which extends the understanding of functional diversity patterns to different temporal or spatial scales. Functional redundancy can also be defined following the definition of functional diversity; it is more a measure of gene similarity or convergence than the traditionally defined ‘functional redundancy’ for multiple functionalities in an ecosystem. Given the hypothesis that sequence similarity leads to functional similarity, the new definition of functional redundancy can reveal the level of redundancy of functional traits within the same gene. We applied this functional diversity framework to study the dynamic changes, over a 9-month period, of microbial communities in a contaminated groundwater system (with U(VI), SO₄²⁻, NO₃⁻, etc.) after a one-time emulsified vegetable oil (EVO) amendment, a treatment that has been shown to effectively reduce U(VI) for a considerable period (around one year). Using acetate production as the measure of the EVO degradation process, the functional diversity of the key gene responsible for EVO degradation correlated significantly with the function itself (R² = 0.685, p = 0.021), whereas other functional indices such as gene richness did not show such a strong relationship.
When functional diversity was used to profile the functional structure of the whole community, statistical tests also showed that changes in environmental variables do shift the community functional structure, whereas this connection is not as clear when other indices are used to represent the community functional structure. In summary, the new framework of functional diversity integrates both functional traits and their phylogenetic signals, and has proven to be a more sensitive indicator of ecological functions than the traditionally used gene richness.
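As a rough illustration of the index described above, the following Python sketch computes Rao's quadratic entropy, Q = Σᵢ Σⱼ dᵢⱼ pᵢ pⱼ, from relative abundances p and pairwise distances d. It then splits the pooled (γ) diversity additively into a mean within-sample (α) component and a between-sample (β) component. The additive, equal-weight partition, the use of probe signal as abundance, the function names and the toy data are assumptions made for this example; the abstract does not specify how the study performs the partition.

```python
import numpy as np


def rao_quadratic_entropy(abundances, distances):
    """Rao's quadratic entropy: sum over all ordered pairs of d_ij * p_i * p_j."""
    a = np.asarray(abundances, dtype=float)
    d = np.asarray(distances, dtype=float)
    total = a.sum()
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    p = a / total  # convert signals to relative abundances
    return float(p @ d @ p)


def partition_rao(samples, distances):
    """Additive partition (assumed here): gamma = alpha + beta, samples weighted equally."""
    samples = np.asarray(samples, dtype=float)
    rel = samples / samples.sum(axis=1, keepdims=True)
    # alpha: mean within-sample diversity
    alpha = float(np.mean([rao_quadratic_entropy(row, distances) for row in rel]))
    # gamma: diversity of the pooled community (mean relative abundances)
    gamma = rao_quadratic_entropy(rel.mean(axis=0), distances)
    return alpha, gamma - alpha, gamma


# Toy data (hypothetical): three probes of one gene, two samples.
# Probes 0 and 1 are closely related; probe 2 is distant from both.
DISTANCES = [
    [0.0, 0.2, 1.0],
    [0.2, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
SAMPLES = [
    [10, 10, 0],  # sample A: only the two similar probes detected
    [0, 0, 20],   # sample B: only the distant probe detected
]

alpha, beta, gamma = partition_rao(SAMPLES, DISTANCES)
print(f"alpha={alpha:.3f} beta={beta:.3f} gamma={gamma:.3f}")
```

In this toy case, sample A contains two probes but scores low because they are closely related, which is the sense in which the index weights diversity by phylogenetic distance rather than by gene count alone. For an arbitrary distance matrix the additive β component is not guaranteed to be non-negative.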
