DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

Author: Lee Byunghan
Moon Taesup
Weissman Tsachy
Yoon Sungroh
Publication date: 01/01/2017

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq
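The "general setting" mentioned in the abstract is that of the discrete universal denoiser (DUDE). As a rough illustration only, the sketch below applies the generic two-pass DUDE rule to substitution errors over the alphabet ACGT. It is not the DUDE-Seq implementation and does not handle homopolymer indels. The function names, the symmetric channel, the context length `k`, and the Hamming loss are all assumptions chosen for the example, and NumPy is assumed to be available.

```python
from collections import defaultdict

import numpy as np

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def symmetric_channel(delta):
    """Hypothetical noise model: pi[x, z] = P(observed z | true x)."""
    m = len(ALPHABET)
    pi = np.full((m, m), delta / (m - 1))
    np.fill_diagonal(pi, 1.0 - delta)
    return pi


def dude_denoise(noisy, pi, k=2):
    """Generic DUDE for substitution errors with a two-sided context of length k."""
    m = len(ALPHABET)
    loss = 1.0 - np.eye(m)  # Hamming loss: loss[x, x_hat]

    # Pass 1: for every two-sided context, count the symbols seen at its center.
    counts = defaultdict(lambda: np.zeros(m))
    for i in range(k, len(noisy) - k):
        context = (noisy[i - k:i], noisy[i + 1:i + k + 1])
        counts[context][INDEX[noisy[i]]] += 1

    # Pass 2: pick the reconstruction that minimizes the estimated loss.
    out = list(noisy)  # the first and last k symbols are left unchanged
    for i in range(k, len(noisy) - k):
        context = (noisy[i - k:i], noisy[i + 1:i + k + 1])
        z = INDEX[noisy[i]]
        # q estimates the counts of the clean symbols: q = (pi^T)^{-1} m
        q = np.linalg.solve(pi.T, counts[context])
        # scores[x_hat] = sum_x q[x] * pi[x, z] * loss[x, x_hat]
        scores = (q * pi[:, z]) @ loss
        out[i] = ALPHABET[int(np.argmin(scores))]
    return "".join(out)
```

In this sketch, changing the sequencing platform would amount to passing a different matrix `pi`, which mirrors the abstract's remark that the method adapts through "simple updates of the noise model".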

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

A Quantitative Sequencing Framework for Absolute Abundance Measurements of Mucosal and Lumenal Microbial Communities

Author: Barlow Jacob T.
Bogatyrev Said R.
Ismagilov Rustem F.
Publication venue: Nature Publishing Group
Publication date: 18/02/2020

A fundamental goal in microbiome studies is determining which microbes affect host physiology. Standard methods for determining changes in microbial taxa measure relative, rather than absolute abundances. Moreover, studies often analyze only stool, despite microbial diversity differing substantially among gastrointestinal (GI) locations. Here, we develop a quantitative framework to measure absolute abundances of individual bacterial taxa by combining the precision of digital PCR with the high-throughput nature of 16S rRNA gene amplicon sequencing. In a murine ketogenic-diet study, we compare microbial loads in lumenal and mucosal samples along the GI tract. Quantitative measurements of absolute (but not relative) abundances reveal decreases in total microbial loads on the ketogenic diet and enable us to determine the differential effects of diet on each taxon in stool and small-intestine mucosa samples. This rigorous quantitative microbial analysis framework, appropriate for diverse GI locations, enables mapping microbial biogeography of the mammalian GI tract and more accurate analyses of changes in microbial taxa in microbiome studies
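The abstract does not spell out how the two measurements are combined. One simple way to picture it, offered here as an assumption rather than the authors' exact procedure, is to scale each taxon's relative abundance from sequencing by the total load measured by digital PCR. The function name, taxon labels, and numbers below are hypothetical.

```python
def absolute_abundances(read_counts, total_load):
    """Scale per-taxon read fractions by a total microbial load.

    read_counts: dict mapping taxon name -> amplicon read count (hypothetical data)
    total_load:  total 16S rRNA gene copies per gram, e.g. from digital PCR
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    return {
        taxon: total_load * count / total_reads
        for taxon, count in read_counts.items()
    }


# Two hypothetical samples with identical relative composition
# but a tenfold difference in total load.
reads = {"taxon_a": 600, "taxon_b": 400}
control = absolute_abundances(reads, total_load=1e9)
treated = absolute_abundances(reads, total_load=1e8)
```

Here the relative abundances of the two samples are indistinguishable, while the absolute values differ tenfold. This is the kind of change the abstract says relative measurements cannot reveal.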

Caltech Authors

CaltechDATA (California Institute of Technology Research Data Repository)

Bioinformatics tools for analysing viral genomic data

Author: Davison A.
Gu Q.
Hughes J.
Maabar M.
Modha S.
Orton R.J.
Vattipally Sreenu
Wilkie G.S.
Publication venue: 'O.I.E (World Organisation for Animal Health)'
Publication date: 01/04/2016

The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing

Enlighten

Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads.

Author: Arthur Timothy D
Bankevich Anton
Boland Brigid S
Brennan Caitriona
Chang John T
Chen Feng
Conrad Douglas J
Dang Jason W
Dorrestein Pieter C
Fedarko Marcus
Gaffney James
Green Cliff
Humphrey Greg C
Jepsen Kristen
Khosroheidari Mahdieh
Knight Rob
Liyanage Marlon
Martino Cameron
Minich Jeremiah
Nurk Sergey
Pevzner Pavel A
Phelan Vanessa V
Quinn Robert A
Rana Tariq M
Salido Rodolfo A
Sandborn William J
Sanders Jon G
Sanders Karenina
Smarr Larry
Xu Zhenjiang Z
Zhu Qiyun
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples, rather than the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing

eScholarship - University of California

mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

Author: Arron Shiffer
Benjamin Wolfe
Corinne F. Maurice
J. Gregory Caporaso
Jai Ram Rideout
Josh D. Neufeld
Nicholas A. Bokulich
Peter J. Turnbaugh
Rachel J. Dutton
Rob Knight
William G. Mercurio
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community

Repository for Publications and Research Data

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Prospecting environmental mycobacteria: combined molecular approaches reveal unprecedented diversity

Background: Environmental mycobacteria (EM) include species commonly found in various terrestrial and aquatic environments, encompassing animal and human pathogens in addition to saprophytes. Approximately 150 EM species can be separated into fast and slow growers based on sequence and copy number differences of their 16S rRNA genes. Cultivation methods are not appropriate for diversity studies; few studies have investigated EM diversity in soil despite their importance as potential reservoirs of pathogens and their hypothesized role in masking or blocking M. bovis BCG vaccine. Methods: We report here the development, optimization and validation of molecular assays targeting the 16S rRNA gene to assess diversity and prevalence of fast and slow growing EM in representative soils from semi tropical and temperate areas. New primer sets were also designed to target slow growing mycobacteria specifically and were used with PCR-DGGE, tag-encoded Titanium amplicon pyrosequencing and quantitative PCR. Results: PCR-DGGE and pyrosequencing provided a consensus of EM diversity; for example, a high abundance of pyrosequencing reads and DGGE bands corresponded to M. moriokaense, M. colombiense and M. riyadhense. As expected, pyrosequencing provided more comprehensive information; additional prevalent species included M. chlorophenolicum, M. neglectum, M. gordonae, M. aemonae. Prevalence of the total Mycobacterium genus in the soil samples ranged from 2.3×10⁷ to 2.7×10⁸ gene targets g⁻¹; slow growers prevalence from 2.9×10⁵ to 1.2×10⁷ cells g⁻¹. Conclusions: This combined molecular approach enabled an unprecedented qualitative and quantitative assessment of EM across soil samples. Good concordance was found between methods and the bioinformatics analysis was validated by random resampling. Sequences from most pathogenic groups associated with slow growth were identified in extenso in all soils tested with a specific assay, allowing them to be unmasked from the whole Mycobacterium genus, in which, as minority members, they would have remained undetected

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

FigShare

Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of data

arXiv.org e-Print Archive

Crossref