Pseudoalignment for metagenomic read assignment
Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.
Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
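As a rough illustration of how an EM step can resolve ambiguous read assignments, the sketch below estimates relative genome abundances from per-read compatibility sets. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `em_abundances`, its parameters, and the input format (one tuple of compatible genome indices per read, standing in for pseudoalignment output) are hypothetical, and genome-length normalisation is deliberately ignored.

```python
from collections import Counter


def em_abundances(read_hits, n_genomes, n_iter=200, tol=1e-9):
    """Estimate relative genome abundances from ambiguous read assignments.

    read_hits: iterable of tuples of genome indices each read is compatible
               with (hypothetical stand-in for pseudoalignment output).
    Assumption: all genomes have equal length, so no length correction.
    """
    # Collapse identical compatibility sets into equivalence classes with counts.
    classes = Counter(tuple(sorted(set(h))) for h in read_hits if h)
    total = sum(classes.values())
    if total == 0:
        raise ValueError("no reads with at least one compatible genome")

    theta = [1.0 / n_genomes] * n_genomes  # uniform starting point
    for _ in range(n_iter):
        expected = [0.0] * n_genomes
        # E-step: share each class's reads among its genomes in proportion to theta.
        for genomes, count in classes.items():
            denom = sum(theta[g] for g in genomes)
            for g in genomes:
                expected[g] += count * theta[g] / denom
        # M-step: renormalise the expected read counts into abundances.
        new_theta = [x / total for x in expected]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    # Three reads unique to genome 0, one unique to genome 1, four shared.
    reads = [(0,)] * 3 + [(1,)] + [(0, 1)] * 4
    print(em_abundances(reads, n_genomes=2))
```

In this toy input the uniquely assigned reads fix the ratio between the two genomes, and the shared reads are split accordingly as the iterations proceed.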
Recommended from our members
Deconvolute individual genomes from metagenome sequences through short read clustering.
Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false-negative (under-clustering) or false-positive (over-clustering) problems. Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.
Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease
The human microbiome is a very active area of research due to its potential to explain
health and disease. Advances in high throughput DNA sequencing in the last decade have
catalyzed the growth of microbiome research; DNA sequencing allows for a cost-effective
method to characterize entire microbial communities directly, including unculturable
microbes which were previously difficult to study. 16S rRNA sequencing and shotgun
metagenomics, coupled with bioinformatics methods, have powered the characterization of
the human microbiome in different parts of the body. This has led to the discovery of novel
links between the microbiome and diseases such as allergies, cancer, and autoimmune
diseases.
This thesis focuses on the application of both 16S rRNA sequencing and shotgun
metagenomics for the characterization of the human microbiome and its relationship with
health and disease. We established two methodologies to address these questions. The first
methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens
involved in disease using shotgun metagenomics technology. In Paper I, we apply the
proposed pipeline to explore the hypothesis of viral infection as a putative cause of
childhood Acute Lymphoblastic Leukemia. In Paper II, we propose a complementary
method to the pipeline to improve the detection of unknown viruses, especially those with
little or no homology to currently known viruses. We applied this method to a collection of
viral-enriched libraries, which resulted in the characterization of a new viral-like genome.
The second methodology was developed to explore and generate hypotheses from a human
skin microbiome dataset of Psoriasis and Atopic Dermatitis patients. The results of the
analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration
of the dataset to discover different aspects of how the microbiome is linked to both
diseases. Paper IV follows up on the results of Paper III but focuses on characterizing
the skin site microbiome variability in Atopic Dermatitis.
Statistical methods for analyzing sequencing data with applications in modern biomedical analysis and personalized medicine
There has been tremendous advancement in sequencing technologies; the rate at which sequencing data can be generated has increased multifold while the cost of sequencing continues to fall. Sequencing data provide novel insights into the ecological environment of microbes as well as human health and disease status, but challenge investigators with a variety of computational issues. This thesis focuses on three common problems in the analysis of high-throughput data. The goals of the first project are to (1) develop a statistical framework and a complete software pipeline for metagenomics that identifies microbes to the strain level, thus facilitating personalized drug treatment targeting the strain; and (2) estimate the relative content of microbes in a sample as accurately and as quickly as possible.
The second project focuses on the analysis of the microbiome variation across multiple samples. Studying the variation of microbiomes under different conditions within an organism or environment is the key to diagnosing diseases and providing personalized treatments. The goals are to (1) identify various statistical diversity measures; (2) develop confidence regions for the relative abundance estimates; (3) perform multi-dimensional and differential expression analysis; and (4) develop a complete pipeline for multi-sample microbiome analysis.
The third project focuses on batch effect analysis. In high-dimensional data, non-biological experimental variation, or “batch effects”, confounds the true associations between the conditions of interest and the outcome variable. Batch effects exist even after normalization. Hence, unless the batch effects are identified and corrected, any downstream analysis will likely be error-prone and may lead to false positive results. The goals are to (1) analyze the effect of correlation in the batch-adjusted data and develop new techniques to account for correlation in the two-step hypothesis testing approach; and (2) develop a software pipeline to identify whether batch effects are present in the data and adjust for them in a suitable way.
In summary, we developed software pipelines called PathoScope, PathoStat and BatchQC as part of these projects and validated our techniques using simulation and real data sets.
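The abstract does not say which statistical diversity measures the second project uses. As an illustration only, the sketch below computes two widely used ones, the Shannon index and the Gini-Simpson index, from a vector of per-taxon read counts. The function names and the input format are assumptions made for this example and are not taken from PathoStat.

```python
import math


def _proportions(counts):
    """Convert non-negative per-taxon counts into relative abundances."""
    total = sum(counts)
    if total <= 0 or any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative with a positive sum")
    return [c / total for c in counts]


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p), skipping taxa with zero reads."""
    return -sum(p * math.log(p) for p in _proportions(counts) if p > 0)


def gini_simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in _proportions(counts))


if __name__ == "__main__":
    sample = [40, 30, 20, 10]  # hypothetical read counts for four taxa
    print(shannon_index(sample), gini_simpson_index(sample))
```

Both indices are zero for a sample dominated by a single taxon and increase as reads are spread more evenly across taxa.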
Legume-rhizobia interactions in a complex microbiome
Biological nitrogen fixation is important for agriculture, carbon
sequestration, and ecosystem restoration. This is primarily
conducted by rhizobia (nitrogen fixing bacteria) in association
with legume plants.
Most research in improving rhizobial strains involves single
strain experiments. However, improved metagenomics methods have
demonstrated considerable differences between single strain
inoculations and strain behaviour when exposed to a complex
microbiome.
To identify some of these differences, this experiment applies two
treatment factors in a controlled environment of containers with
autoclaved sand. All main experimental containers were inoculated
with several strains of Bradyrhizobium japonicum. The first
treatment factor was the planting of surface-sterilised seedlings
of host plant Acacia acuminata; the second treatment factor was
inoculation with an external soil microbiome. Several negative
controls without planting or inoculation were also present.
A novel method of whole genome metagenomic sequencing to observe
known strain abundance, without amplification or culturing, was
developed. Using this method, abundance patterns of these B.
japonicum strains were compared between initial inoculation and
the end of a growth period of several weeks. Analysis reveals a
single strain as the preferred nodulation strain within this
experiment, but also shows that all strains inoculated continued
to persist in the substrate at detectable levels.
The use of long reads from the MinION DNA sequencer also made it
possible to screen for horizontal gene transfer
events. None were detected in an initial screen, but a framework
for further inspection of this dataset for such events is
described.
The fecal resistome of dairy cattle is associated with diet during nursing.
Antimicrobial resistance is a global public health concern, and livestock play a significant role in selecting for resistance and maintaining such reservoirs. Here we study the succession of the dairy cattle resistome during early life using metagenomic sequencing, as well as the relationship between resistome, gut microbiota, and diet. In our dataset, the gut of dairy calves serves as a reservoir of 329 antimicrobial resistance genes (ARGs) presumably conferring resistance to 17 classes of antibiotics, and the abundance of ARGs declines gradually during nursing. ARGs appear to co-occur with antibacterial biocide or metal resistance genes. Colostrum is a potential source of ARGs observed in calves at day 2. The dynamic changes in the resistome are likely a result of gut microbiota assembly, which is closely associated with diet transition in dairy calves. Modifications in the resistome may be possible via early-life dietary interventions to reduce overall antimicrobial resistance.
A pipeline for targeted metagenomics of environmental bacteria.
Background: Metagenomics and single cell genomics provide a window into the genetic repertoire of yet uncultivated microorganisms, but both methods are usually taxonomically untargeted. The combination of fluorescence in situ hybridization (FISH) and fluorescence activated cell sorting (FACS) has the potential to enrich taxonomically well-defined clades for genomic analyses.
Methods: Cells hybridized with a taxon-specific FISH probe are enriched based on their fluorescence signal via flow cytometric cell sorting. A recently developed FISH procedure, the hybridization chain reaction (HCR)-FISH, provides the high signal intensities required for flow cytometric sorting while maintaining the integrity of the cellular DNA for subsequent genome sequencing. Sorted cells are subjected to shotgun sequencing, resulting in targeted metagenomes of low diversity.
Results: Pure cultures of different taxonomic groups were used to (1) adapt and optimize the HCR-FISH protocol and (2) assess the effects of various cell fixation methods on both the signal intensity for cell sorting and the quality of subsequent genome amplification and sequencing. Best results were obtained for ethanol-fixed cells in terms of both HCR-FISH signal intensity and genome assembly quality. Our newly developed pipeline was successfully applied to a marine plankton sample from the North Sea yielding good quality metagenome assembled genomes from a yet uncultivated flavobacterial clade.
Conclusions: With the developed pipeline, targeted metagenomes at various taxonomic levels can be efficiently retrieved from environmental samples. The resulting metagenome assembled genomes allow for the description of yet uncharacterized microbial clades.