    Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era

    Metagenomics has become one of the indispensable tools in microbial ecology over the last few decades, and a new revolution in metagenomic studies is about to begin, driven by recent advances in sequencing techniques. The massive data production and substantial cost reduction of next-generation sequencing have led to the rapid growth of metagenomic research, both quantitatively and qualitatively. It is evident that metagenomics will become a standard tool for studying the diversity and function of microbes in the near future, as fingerprinting methods were previously. As data accumulate ever faster, the need for bioinformatic tools and associated databases to handle those datasets has become more urgent. To facilitate the bioinformatics analysis of metagenomic data, we review some recent tools and databases that are widely used in this field and give insights into the current challenges and future of metagenomics from a bioinformatics perspective.

    Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many statistical and computational tools and databases for metagenomics have been developed to exploit the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

    DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

    We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq

    Gut microbiome diversity detected by high-coverage 16S and shotgun sequencing of paired stool and colon sample

    The gut microbiome has a fundamental role in human health and disease. However, studying the complex structure and function of the gut microbiome using next-generation sequencing is challenging and prone to reproducibility problems. Here, we obtained cross-sectional colon biopsies and faecal samples from nine participants in our COLSCREEN study and sequenced them at high coverage using Illumina paired-end shotgun (for faecal samples) and IonTorrent 16S (for paired faecal samples and colon biopsies) technologies. The metagenomes consisted of between 47 and 92 million reads per sample, and the targeted sequencing covered more than 300 k reads per sample across seven hypervariable regions of the 16S gene. Our data are freely available and coupled with code for the presented metagenomic analysis using up-to-date bioinformatics algorithms. These results will add to the informed insights for designing comprehensive microbiome analyses and also provide data for further testing towards unambiguous gut microbiome analysis.

    A Comparison of rpoB and 16S rRNA as Markers in Pyrosequencing Studies of Bacterial Diversity

    Background: The 16S rRNA gene is the gold standard in molecular surveys of bacterial and archaeal diversity, but it has the disadvantages that it is often multiple-copy, has little resolution below the species level and cannot be readily interpreted in an evolutionary framework. We compared the 16S rRNA marker with the single-copy, protein-coding rpoB marker by amplifying and sequencing both from a single soil sample. Because the higher genetic resolution of the rpoB gene prohibits its use as a universal marker, we employed consensus-degenerate primers targeting the Proteobacteria. Methodology/Principal Findings: Pyrosequencing can be problematic because of the poor resolution of homopolymer runs. As these erroneous runs disrupt the reading frame of protein-coding sequences, removal of sequences containing nonsense mutations was found to be a valuable filter in addition to flowgram-based denoising. Although both markers gave similar estimates of total diversity, the rpoB marker revealed more species, requiring an order of magnitude fewer reads to obtain 90% of the true diversity. The application of population genetic methods was demonstrated on a particularly abundant sequence cluster. Conclusions/Significance: The rpoB marker can be a complement to the 16S rRNA marker for high throughput microbial diversity studies focusing on specific taxonomic groups. Additional error filtering is possible and tests for recombination or selection can be employed.
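
The nonsense-mutation filter mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every read starts in a known reading frame and that the three standard stop codons apply; the function names `has_inframe_stop` and `filter_nonsense` are hypothetical. The idea is that a homopolymer indel shifts the reading frame, so a premature stop codon tends to appear downstream and the read can be discarded.

```python
# Illustrative sketch only; function names and the fixed reading frame are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons


def has_inframe_stop(seq, frame=0):
    """Return True if seq contains a stop codon in the given reading frame."""
    seq = seq.upper()
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


def filter_nonsense(reads, frame=0):
    """Keep only the reads that have no in-frame stop codon."""
    return [r for r in reads if not has_inframe_stop(r, frame)]


if __name__ == "__main__":
    # Made-up reads: the second carries one extra T, which shifts the frame
    # and creates an in-frame TAA.
    reads = ["ATGGCTAAAGGT", "ATGGCTTAAAGGT"]
    print(filter_nonsense(reads))
```

A filter of this kind only applies to protein-coding markers such as rpoB; it has no counterpart for the non-coding 16S rRNA gene.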

    The hidden diversity of archaea and bacteria in the human microbiome : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Microbiology at Massey University, Manawatū, New Zealand

    Figures are re-used with permission. Current methods of metagenomic analysis require deep sequencing to identify microorganisms that are present at low abundance in complex microbiomes, including the human gut microbiome. The few known archaeal taxa present in the human gut are low in abundance in comparison to bacteria. This raises the question of whether the full diversity of human gut-associated archaea is known. To increase the resolution of metagenomic analysis, a new DNA normalization technique utilizing duplex-specific nuclease (DSN) was used to enrich for DNA from “rare” archaeal and bacterial taxa isolated from two human metagenomic faecal samples. This DSN-based normalization method failed to enrich for archaeal DNA, as it was digested by the DSN; however, it succeeded in enriching for low-abundance bacterial DNA. This indicated that further optimization of the normalization method is required to enrich for low-abundance archaeal DNA in human metagenomic samples. Whole metagenome shotgun sequencing (WGS) was also used to identify the microbial community composition of the participants' gut microbiota, including archaea. WGS identified a higher than anticipated diversity of archaeal taxa in the gut microbiomes of both participants. Despite this higher diversity, the low abundance of archaea in the human gut still renders them part of the rare biosphere. We envisage that with further optimization of DSN-based normalization, enrichment of “rare” taxa will improve detection resolution and therefore enhance our current understanding of the diversity of both archaeal and bacterial species in the human gut microbiome.

    Development of an optimal bioinformatics system for the analysis of metagenome data generated by next-generation sequencing instruments

    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, February 2014. Jongsik Chun (천종식). A metagenome is the total DNA directly extracted from an environment, and the purpose of metagenomics is to reveal the function of the metagenome as well as its taxonomic structure. There are two analysis approaches in metagenomics: the amplicon-based approach and the random shotgun-based approach. Both approaches require large numbers of sequencing reads, which Sanger sequencing could not supply. However, high-throughput sequencing at relatively low cost by Next Generation Sequencing (NGS) technologies meets this requirement. In addition, the advent of NGS technologies gave rise to the development of the bioinformatic algorithms necessary for processing these large and complex sequencing data. Consequently, the large amount of sequencing data obtained from NGS, together with appropriate bioinformatic algorithms, helped metagenomics become an essential tool for microbiology. However, limitations arising from NGS sequencing errors, short read lengths, and the lack of an analysis system still hinder accurate metagenome analysis. Therefore, evaluation of currently used NGS error-handling algorithms and development of a systematic pipeline with more efficient algorithms are required to improve the accuracy of analysis. In this study, bioinformatic pipelines were constructed for both metagenome analysis approaches. The pipelines were designed to improve the accuracy of the final result by minimizing the effect of errors and short read length. For amplicon-based metagenomics, two analysis pipelines were developed, one for 454 pyrosequencing and one for Illumina MiSeq. During construction of the 454 pyrosequencing pipeline, a new error-handling algorithm was developed to treat homopolymer and PCR errors. Upon completion of the pipeline, a household microbial community was analyzed using 454 pyrosequencing data as a case study.
As for Illumina MiSeq data, the most appropriate sequencing conditions and sequencing target region were determined. Paired-end merging programs were evaluated, and the correlation between sequencing errors and quality was studied to correct the errors within 3 overlap regions. A novel iterative consensus clustering method was developed to correct the errors occurring ubiquitously in a single read. For the shotgun metagenomics approach, a bioinformatic analysis system for Illumina MiSeq paired-end data was constructed. Unlike targeted amplicon sequencing reads, most shotgun sequencing reads are not merged, so short reads are used for both functional and taxonomic profiling. However, a short read carries less information than a longer contig, so the use of short reads is likely to bias characterization of the metagenome. Therefore, development of the analysis system focused on creating longer contigs by means of mapping and de novo assembly. For raw read mapping, a method for dynamically constructing the set of mapping genomes was developed. The list of mapping genomes was selected from the taxonomic profile inferred from the ribosomal RNA profiles. The sequences of the selected genomes were downloaded from Ezbiocloud. By mapping raw reads to the genome sequences, longer contigs can be obtained for relatively simple metagenomes such as fecal matter. However, for complex metagenomes such as soil samples, neither mapping nor de novo assembly performed properly, owing to a lack of sequencing coverage and the large number of uncultured microorganisms in the metagenome. In addition to the pipeline construction, visualization tools were developed to display the resulting taxonomic and functional profiles at the same time. A newly developed Java-based standalone sequence alignment editing application was named EzEditor.
As both conserved functional coding sequences and the 16S rRNA gene have been used extensively in bacterial molecular phylogenetics, codon-based sequence alignment editing functions are required for the coding genes. EzEditor provides a simultaneous DNA and protein sequence alignment editing interface, which enables robust sequence alignment for both protein and rRNA sequences. EzEditor can be applied to various analyses involving molecular sequences, not only as a basic sequence editor but also for phylogenetic applications.
Table of contents: Abstract; Table of Contents; Abbreviations; Figure List; Table List; Chapter 1 General Introduction (1.1 Bioinformatics; 1.2 Next Generation Sequencing; 1.3 Metagenomics; 1.4 Objectives of This Study); Chapter 2 Amplicon-based Metagenome Analysis Systems (2.1 Introduction; 2.2 Analysis System for 454 Pyrosequencing: 2.2.1 Methods, 2.2.2 Results; 2.3 Analysis System for Illumina MiSeq: 2.3.1 Methods, 2.3.2 Results; 2.4 Summary and Discussion); Chapter 3 Shotgun-based Metagenome Analysis System (3.1 Introduction: 3.1.1 Tools for Metagenomics; 3.2 Methods; 3.3 Results; 3.4 Summary and Discussion); Chapter 4 EzEditor: A versatile Molecular Sequence Editor for Both Ribosomal RNA and Protein Coding Genes (4.1 Overview; 4.2 Features of EzEditor: 4.2.1 Algorithms and Models Implemented in EzEditor, 4.2.2 Miscellaneous Functions; 4.3 Summary and Discussion); Conclusions; References; Appendix I. Estimated Diversity Index of Household Microbiome; Abstract in Korean.

    Investigations on microbiome of the used clinical device revealed many uncultivable newer bacterial species associated with persistent chronic infections

    Introduction. Chronic persistent device-related infections (DRIs) often give culture-negative results in a microbiological investigation. In such cases, investigation of the device metagenome might have diagnostic value. Materials and Methods. 16S rRNA gene sequence analysis and next-generation sequencing (NGS) of the clinical metagenome were performed to detect bacterial diversity on invasive medical devices possibly involved in culture-negative DRIs. Device samples were first subjected to microbiological investigation, followed by metagenome analysis. Environmental DNA (e-DNA) isolated from device samples was subjected to 16S rRNA gene amplification followed by Sanger sequencing (n=14). In addition, NGS of the device metagenome was performed (n=12). Only five samples were common to both methods. Results. Microbial growth was observed in only nine cases; among these, five cases were considered significant growth, and in the remaining four cases, growth was considered either insignificant or contaminated. Culture and sequencing analysis yielded identical results in only six cases. In culture-negative cases, Sanger sequencing of the 16S rRNA gene and NGS of the 16S rDNA microbiome were able to identify the presence of rarely described human pathogens, namely Streptococcus infantis, Gemella haemolysans, Meiothermus silvanus, Schlegelella aquatica, Rothia mucilaginosa, Serratia nematodiphila, and Enterobacter asburiae, along with some known common nosocomial pathogens. Bacterial species such as M. silvanus and S. nematodiphila, which have never been reported in human infection, were also identified. Conclusions. The results from the small number of diverse samples in this pilot study may pave the way for studying a larger number of device samples, which may validate the diversity observed. The study shows that a culture-free, holistic metagenomic approach using NGS could help identify the pathogens in culture-negative chronic DRIs.

    Oral pathobiont induces systemic inflammation and metabolic changes associated with alteration of gut microbiota.

    Periodontitis has been implicated as a risk factor for metabolic disorders such as type 2 diabetes, atherosclerotic vascular diseases, and non-alcoholic fatty liver disease. Although bacteremias from dental plaque and/or elevated circulating inflammatory cytokines emanating from the inflamed gingiva are suspected mechanisms linking periodontitis and these diseases, direct evidence is lacking. We hypothesize that disturbances of the gut microbiota by swallowed bacteria induce a metabolic endotoxemia leading to metabolic disorders. To investigate this hypothesis, changes in the gut microbiota, insulin and glucose intolerance, and levels of tissue inflammation were analysed in mice after oral administration of Porphyromonas gingivalis, a representative periodontopathogen. Pyrosequencing revealed that the population belonging to Bacteroidales was significantly elevated in P. gingivalis-administered mice, which coincided with increases in insulin resistance and systemic inflammation. In P. gingivalis-administered mice, blood endotoxin levels tended to be higher, whereas gene expression of tight junction proteins in the ileum was significantly decreased. These results provide a new paradigm for the interrelationship between periodontitis and systemic diseases.

    Mining for the rumen rare biosphere : a thesis presented in partial fulfilment of the requirement for the degree of Masters in Microbiology, Massey University, Manawatu, New Zealand

    The microbial diversity present in the gut microbiome of ruminant animals is of great interest due to its effect on the New Zealand economy. The rumen, a forestomach of ruminants, is a large fermentation chamber. The microbiome within the rumen influences production of milk and meat, and additionally impacts on climate change through the emission of enteric methane. Although the core microbiome has been studied intensely, the rare biosphere, which comprises the rare microorganisms present at less than 0.1% abundance, is still largely unknown. Recent developments in methods for subtraction, or normalisation, of the dominant microorganisms from analysis of complex microbiomes, including treatment with duplex-specific nuclease (DSN), have made it possible to increase the number of sequences from low-abundance microorganisms. Decreasing the presence of dominant species while simultaneously increasing that of low-abundance ones allows the exploration of the rare biosphere and the discovery of taxa which otherwise would not have been identified. By applying DSN-based normalisation to metagenomic DNA isolated from the rumen microbiome, we have demonstrated that the low-abundance microorganisms can be amplified to a detectable level while decreasing the abundance of sequences from dominant species. The outcome of DNA normalisation, primarily taxonomic assignment and phylogeny, was assessed by using the gene encoding the β subunit of bacterial RNA polymerase, rpoB, as well as the “gold standard” 16S rRNA as phylogenetic markers. We have demonstrated that rpoB could be effectively used for determining the rumen microbial community profile and could become, with broader adoption by researchers, a valuable resource for microbial ecology studies.
We suggest that DSN-based normalisation could be utilised for in-depth exploration of the rare biosphere as a whole, resulting in the discovery of new species and new genes and in an increased understanding of the role that these rare microorganisms play in the rumen microbiome. The inclusion of rpoB, alone or in combination with the 16S rRNA marker, in microbial ecology studies could lead to more accurate classification of the taxa.
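
The abundance cut-off that this abstract uses to define the rare biosphere (less than 0.1% abundance) can be expressed as a short sketch. This is an illustration under stated assumptions, not the thesis's method: the input is assumed to be a table of read counts per taxon, the function name `rare_taxa` is hypothetical, and the counts in the example are made up.

```python
# Illustrative sketch only; the function name and example counts are assumptions.
def rare_taxa(counts, threshold=0.001):
    """Return {taxon: relative abundance} for taxa below the threshold.

    counts maps taxon name to read count; threshold=0.001 corresponds to 0.1%.
    Taxa with zero reads are ignored because they were not detected at all.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        taxon: n / total
        for taxon, n in counts.items()
        if 0 < n / total < threshold
    }


if __name__ == "__main__":
    # Made-up read counts for three placeholder taxa (10,000 reads in total).
    example = {"taxon_A": 9000, "taxon_B": 995, "taxon_C": 5}
    print(rare_taxa(example))
```

Applying the same function to counts taken before and after normalisation would show which taxa cross the threshold, under the assumption that both count tables are available.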