
    Pseudoalignment for metagenomic read assignment

    Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, so reads are typically assigned to the taxonomic levels at which they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains. Results: We find that the recent idea of pseudoalignment, introduced in the RNA-Seq context, is highly applicable in the metagenomics setting. When pseudoalignment is coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
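
    The abstract above does not spell out how pseudoalignment output feeds the EM step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that pseudoalignment has already grouped reads into equivalence classes (sets of genomes a read is compatible with), and it ignores genome-length normalization, which a real tool would likely include. The function name `em_abundances` and the input layout are hypothetical.

```python
def em_abundances(eq_classes, n_genomes, n_iter=1000, tol=1e-8):
    """Estimate relative genome abundances from equivalence-class counts.

    eq_classes: dict mapping a tuple of genome indices (the genomes a group
                of reads is compatible with) to the number of such reads.
                This layout is an assumption made for illustration.
    Returns a list of abundances that sums to 1.
    """
    # Start from a uniform guess over all genomes.
    theta = [1.0 / n_genomes] * n_genomes

    for _ in range(n_iter):
        expected = [0.0] * n_genomes

        # E-step: split each class's reads among its compatible genomes
        # in proportion to the current abundance estimates.
        for genomes, count in eq_classes.items():
            denom = sum(theta[g] for g in genomes)
            if denom == 0.0:
                continue  # no compatible genome has any mass left
            for g in genomes:
                expected[g] += count * theta[g] / denom

        # M-step: renormalize expected read counts into abundances.
        total = sum(expected)
        new_theta = [x / total for x in expected]

        # Stop once the estimates no longer change appreciably.
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break

    return theta


if __name__ == "__main__":
    # 30 reads unique to genome 0, 10 unique to genome 1, 60 ambiguous.
    print(em_abundances({(0,): 30, (1,): 10, (0, 1): 60}, n_genomes=2))
```

    In this toy input the ambiguous reads end up split in the same 3:1 ratio as the unique reads, so the estimates converge to roughly 0.75 and 0.25. Working on equivalence-class counts rather than individual reads keeps each iteration proportional to the number of distinct classes.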

    Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease

    The human microbiome is a very active area of research due to its potential to explain health and disease. Advances in high-throughput DNA sequencing in the last decade have catalyzed the growth of microbiome research; DNA sequencing provides a cost-effective way to characterize entire microbial communities directly, including unculturable microbes that were previously difficult to study. 16S rRNA sequencing and shotgun metagenomics, coupled with bioinformatics methods, have powered the characterization of the human microbiome in different parts of the body. This has led to the discovery of novel links between the microbiome and diseases such as allergies, cancer, and autoimmune diseases. This thesis focuses on the application of both 16S rRNA sequencing and shotgun metagenomics to the characterization of the human microbiome and its relationship with health and disease. We established two methodologies to address these questions. The first methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens involved in disease using shotgun metagenomics technology. In Paper I, we apply the proposed pipeline to explore the hypothesis of viral infection as a putative cause of childhood acute lymphoblastic leukemia. In Paper II, we propose a method complementary to the pipeline to improve the detection of unknown viruses, especially those with little or no homology to currently known viruses. We applied this method to a collection of viral-enriched libraries, which resulted in the characterization of a new viral-like genome. The second methodology was developed to explore and generate hypotheses from a human skin microbiome dataset of psoriasis and atopic dermatitis patients. The results of the analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration of the dataset to discover different aspects of how the microbiome is linked to both diseases. Paper IV follows up on the results of Paper III but focuses on characterizing the skin site microbiome variability in atopic dermatitis.

    Statistical methods for analyzing sequencing data with applications in modern biomedical analysis and personalized medicine

    There has been tremendous advancement in sequencing technologies; the rate at which sequencing data can be generated has increased multifold while the cost of sequencing continues to fall. Sequencing data provide novel insights into the ecological environment of microbes as well as human health and disease status, but challenge investigators with a variety of computational issues. This thesis focuses on three common problems in the analysis of high-throughput data. The goals of the first project are to (1) develop a statistical framework and a complete software pipeline for metagenomics that identifies microbes to the strain level and thus facilitates personalized drug treatment targeting the strain; and (2) estimate the relative content of microbes in a sample as accurately and as quickly as possible. The second project focuses on the analysis of microbiome variation across multiple samples. Studying the variation of microbiomes under different conditions within an organism or environment is the key to diagnosing diseases and providing personalized treatments. The goals are to (1) identify various statistical diversity measures; (2) develop confidence regions for the relative abundance estimates; (3) perform multi-dimensional and differential expression analysis; and (4) develop a complete pipeline for multi-sample microbiome analysis. The third project focuses on batch effect analysis. When analyzing high-dimensional data, non-biological experimental variation, or "batch effects", confounds the true associations between the conditions of interest and the outcome variable. Batch effects exist even after normalization. Hence, unless the batch effects are identified and corrected, any attempt at downstream analysis will likely be error-prone and may lead to false positive results. The goals are to (1) analyze the effect of correlation in the batch-adjusted data and develop new techniques to account for correlation in a two-step hypothesis testing approach; and (2) develop a software pipeline to identify whether batch effects are present in the data and adjust for them in a suitable way. In summary, we developed software pipelines called PathoScope, PathoStat and BatchQC as part of these projects and validated our techniques using simulation and real data sets.

    Legume-rhizobia interactions in a complex microbiome

    Biological nitrogen fixation is important for agriculture, carbon sequestration, and ecosystem restoration. It is primarily conducted by rhizobia (nitrogen-fixing bacteria) in association with legume plants. Most research on improving rhizobial strains involves single-strain experiments. However, improved metagenomics methods have demonstrated considerable differences between single-strain inoculations and strain behaviour when exposed to a complex microbiome. To identify some of these differences, this experiment applies two treatment factors in a controlled environment of containers with autoclaved sand. All main experimental containers were inoculated with several strains of Bradyrhizobium japonicum. The first treatment factor was the planting of surface-sterilised seedlings of the host plant Acacia acuminata; the second treatment factor was inoculation with an external soil microbiome. Several negative controls without planting or inoculation were also present. A novel method of whole-genome metagenomic sequencing to observe known strain abundance, without amplification or culturing, was developed. Using this method, abundance patterns of these B. japonicum strains were compared between initial inoculation and the end of a growth period of several weeks. The analysis reveals a single strain as the preferred nodulation strain within this experiment, but also shows that all inoculated strains continued to persist in the substrate at detectable levels. The use of long reads from the MinION DNA sequencer also made it possible to look for horizontal gene transfer events. None were detected in an initial screen, but a framework for further inspection of this dataset for such events is described.