2,861 research outputs found

Pseudoalignment for metagenomic read assignment

Author: Bray N.
Melsted P.
Pachter L.
Pimentel H.
Schaeffer L.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/07/2017

Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains. Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
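The coupling of pseudoalignment with EM described in this abstract can be illustrated with a minimal sketch. It assumes that pseudoalignment has already produced, for each read, the set of genome indices the read is compatible with; the function name `em_abundances` and its parameters are hypothetical, and the sketch omits genome-length normalization, so it should not be read as the authors' implementation.

```python
from collections import Counter


def em_abundances(read_compat, n_genomes, n_iter=200, tol=1e-9):
    """Estimate relative genome abundances from per-read compatibility sets.

    read_compat: iterable of iterables of genome indices (hypothetical input,
    standing in for the output of a pseudoalignment step).
    """
    # Collapse reads with identical compatibility sets into equivalence
    # classes, so the EM loop runs over classes rather than single reads.
    classes = Counter(frozenset(c) for c in read_compat if c)
    total = sum(classes.values())

    # Start from a uniform abundance estimate.
    theta = [1.0 / n_genomes] * n_genomes

    for _ in range(n_iter):
        expected = [0.0] * n_genomes
        # E-step: split each class count among its compatible genomes in
        # proportion to the current abundance estimates.
        for genomes, count in classes.items():
            denom = sum(theta[g] for g in genomes)
            for g in genomes:
                expected[g] += count * theta[g] / denom
        # M-step: renormalize expected counts into abundances.
        new_theta = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Toy data: 6 reads unique to genome 0, 2 unique to genome 1,
    # and 4 ambiguous reads compatible with both.
    reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
    print(em_abundances(reads, n_genomes=2))
```

On the toy data, the ambiguous reads are shared out according to the evidence from the unique reads, so the estimates converge toward roughly 0.75 and 0.25. A realistic tool would also account for genome length and sequencing error.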

Caltech Authors

Recommended from our members

Deconvolute individual genomes from metagenome sequences through short read clustering.

Author: Deng Li
Li Kexue
Lu Yakang
Shi Lizhen
Wang Lili
Wang Zhong
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false-negative (under-clustering) or false-positive (over-clustering) problems. Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.

eScholarship - University of California

Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease

Author: Barrientos Somarribas Mauricio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/09/2019

The human microbiome is a very active area of research due to its potential to explain health and disease. Advances in high-throughput DNA sequencing in the last decade have catalyzed the growth of microbiome research; DNA sequencing provides a cost-effective method to characterize entire microbial communities directly, including unculturable microbes which were previously difficult to study. 16S rRNA sequencing and shotgun metagenomics, coupled with bioinformatics methods, have powered the characterization of the human microbiome in different parts of the body. This has led to the discovery of novel links between the microbiome and diseases such as allergies, cancer, and autoimmune diseases. This thesis focuses on the application of both 16S rRNA sequencing and shotgun metagenomics for the characterization of the human microbiome and its relationship with health and disease. We established two methodologies to address these questions. The first methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens involved in disease using shotgun metagenomics technology. In paper I, we apply the proposed pipeline to explore the hypothesis of viral infection as a putative cause of childhood Acute Lymphoblastic Leukemia. In paper II, we propose a complementary method to the pipeline to improve the detection of unknown viruses, especially those with little or no homology to currently known viruses. We applied this method to a collection of viral-enriched libraries, which resulted in the characterization of a new viral-like genome. The second methodology was developed to explore and generate hypotheses from a human skin microbiome dataset of Psoriasis and Atopic Dermatitis patients. The results of the analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration of the dataset to discover different aspects of how the microbiome is linked to both diseases. Paper IV follows up on the results of paper III but focuses on characterizing the skin site microbiome variability in Atopic Dermatitis.

Publications from Karolinska Institutet

Statistical methods for analyzing sequencing data with applications in modern biomedical analysis and personalized medicine

Author: Manimaran Solaiappan
Publication date: 13/03/2017

There has been tremendous advancement in sequencing technologies; the rate at which sequencing data can be generated has increased multifold while the cost of sequencing continues to fall. Sequencing data provide novel insights into the ecological environment of microbes as well as human health and disease status but challenge investigators with a variety of computational issues. This thesis focuses on three common problems in the analysis of high-throughput data. The goals of the first project are to (1) develop a statistical framework and a complete software pipeline for metagenomics that identifies microbes to the strain level, thus facilitating a personalized drug treatment targeting the strain; and (2) estimate the relative content of microbes in a sample as accurately and as quickly as possible. The second project focuses on the analysis of the microbiome variation across multiple samples. Studying the variation of microbiomes under different conditions within an organism or environment is the key to diagnosing diseases and providing personalized treatments. The goals are to (1) identify various statistical diversity measures; (2) develop confidence regions for the relative abundance estimates; (3) perform multi-dimensional and differential expression analysis; and (4) develop a complete pipeline for multi-sample microbiome analysis. The third project is focused on batch effect analysis. When analyzing high-dimensional data, non-biological experimental variation or “batch effects” confound the true associations between the conditions of interest and the outcome variable. Batch effects exist even after normalization. Hence, unless the batch effects are identified and corrected, any attempt at downstream analysis will likely be error-prone and may lead to false positive results. The goals are to (1) analyze the effect of correlation in the batch-adjusted data and develop new techniques to account for correlation in a two-step hypothesis testing approach; and (2) develop a software pipeline to identify whether batch effects are present in the data and adjust for batch effects in a suitable way. In summary, we developed software pipelines called PathoScope, PathoStat and BatchQC as part of these projects and validated our techniques using simulation and real data sets.

Boston University Institutional Repository (OpenBU)

Legume-rhizobia interactions in a complex microbiome

Author: Chia Ming-Dao
Publication date: 01/01/2018

Biological nitrogen fixation is important for agriculture, carbon sequestration, and ecosystem restoration. This is primarily conducted by rhizobia (nitrogen-fixing bacteria) in association with legume plants. Most research in improving rhizobial strains involves single-strain experiments. However, improved metagenomics methods have demonstrated considerable differences between single-strain inoculations and strain behaviour when exposed to a complex microbiome. To identify some of these differences, this experiment applies two treatment factors in a controlled environment of containers with autoclaved sand. All main experimental containers were inoculated with several strains of Bradyrhizobium japonicum. The first treatment factor was the planting of surface-sterilised seedlings of host plant Acacia acuminata; the second treatment factor was inoculation with an external soil microbiome. Several negative controls without planting or inoculation were also present. A novel method of whole genome metagenomic sequencing to observe known strain abundance, without amplification or culturing, was developed. Using this method, abundance patterns of these B. japonicum strains were compared between initial inoculation and the end of a growth period of several weeks. Analysis reveals a single strain as the preferred nodulation strain within this experiment, but also shows that all strains inoculated continued to persist in the substrate at detectable levels. The use of long reads from the MinION DNA sequencer also made it possible to look for horizontal gene transfer events. None were detected in an initial screen, but a framework for further inspection of this dataset for such events is described.

The Australian National University

The fecal resistome of dairy cattle is associated with diet during nursing.

Author: DePeters Edward J
Johnson Daisy
Lemay Danielle G
Liu Jinxin
Maldonado-Gomez Maria X
Mills David A
Taft Diana H
Treiber Michelle L
Publication venue: eScholarship, University of California
Publication date: 01/09/2019

Antimicrobial resistance is a global public health concern, and livestock play a significant role in selecting for resistance and maintaining such reservoirs. Here we study the succession of the dairy cattle resistome during early life using metagenomic sequencing, as well as the relationship between resistome, gut microbiota, and diet. In our dataset, the gut of dairy calves serves as a reservoir of 329 antimicrobial resistance genes (ARGs) presumably conferring resistance to 17 classes of antibiotics, and the abundance of ARGs declines gradually during nursing. ARGs appear to co-occur with antibacterial biocide or metal resistance genes. Colostrum is a potential source of ARGs observed in calves at day 2. The dynamic changes in the resistome are likely a result of gut microbiota assembly, which is closely associated with diet transition in dairy calves. Modifications in the resistome may be possible via early-life dietary interventions to reduce overall antimicrobial resistance.

eScholarship - University of California

A pipeline for targeted metagenomics of environmental bacteria.

Author: Bowers Robert M
Fuchs Bernhard M
Goudeau Danielle
Grieb Anissa
Lee Janey
Malmstrom Rex R
Oggerin Monike
Woyke Tanja
Publication venue: eScholarship, University of California
Publication date: 15/02/2020

Background: Metagenomics and single cell genomics provide a window into the genetic repertoire of yet uncultivated microorganisms, but both methods are usually taxonomically untargeted. The combination of fluorescence in situ hybridization (FISH) and fluorescence activated cell sorting (FACS) has the potential to enrich taxonomically well-defined clades for genomic analyses. Methods: Cells hybridized with a taxon-specific FISH probe are enriched based on their fluorescence signal via flow cytometric cell sorting. A recently developed FISH procedure, the hybridization chain reaction (HCR)-FISH, provides the high signal intensities required for flow cytometric sorting while maintaining the integrity of the cellular DNA for subsequent genome sequencing. Sorted cells are subjected to shotgun sequencing, resulting in targeted metagenomes of low diversity. Results: Pure cultures of different taxonomic groups were used to (1) adapt and optimize the HCR-FISH protocol and (2) assess the effects of various cell fixation methods on both the signal intensity for cell sorting and the quality of subsequent genome amplification and sequencing. Best results were obtained for ethanol-fixed cells in terms of both HCR-FISH signal intensity and genome assembly quality. Our newly developed pipeline was successfully applied to a marine plankton sample from the North Sea, yielding good quality metagenome-assembled genomes from a yet uncultivated flavobacterial clade. Conclusions: With the developed pipeline, targeted metagenomes at various taxonomic levels can be efficiently retrieved from environmental samples. The resulting metagenome-assembled genomes allow for the description of yet uncharacterized microbial clades.

eScholarship - University of California

MPG.PuRe