14 research outputs found

    Pseudoalignment for metagenomic read assignment

    Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains. Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
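
    The coupling of pseudoalignment with EM described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that pseudoalignment has already grouped reads into equivalence classes (the set of genomes each read is compatible with), and it ignores genome length and sequencing error for brevity. The function name `em_abundances` and the input layout are hypothetical.

```python
def em_abundances(eq_classes, n_genomes, n_iter=200, tol=1e-10):
    """Estimate relative read abundances per genome with a plain EM loop.

    eq_classes: dict mapping a tuple of genome indices (the genomes a group
                of reads is compatible with) to the number of such reads.
    Assumption: genome length is ignored, so the result is the share of
    reads per genome, not a length-normalised abundance.
    """
    total_reads = sum(eq_classes.values())
    # Start from a uniform guess over all genomes.
    theta = [1.0 / n_genomes] * n_genomes

    for _ in range(n_iter):
        expected = [0.0] * n_genomes
        # E-step: split each class's reads among its genomes in
        # proportion to the current abundance estimates.
        for genomes, n_reads in eq_classes.items():
            denom = sum(theta[g] for g in genomes)
            if denom == 0.0:
                continue
            for g in genomes:
                expected[g] += n_reads * theta[g] / denom
        # M-step: renormalise the expected counts into abundances.
        new_theta = [c / total_reads for c in expected]
        change = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if change < tol:
            break
    return theta


# Toy input: 30 reads unique to genome 0, 10 unique to genome 1,
# and 60 ambiguous reads compatible with both.
toy_classes = {(0,): 30, (1,): 10, (0, 1): 60}
toy_theta = em_abundances(toy_classes, n_genomes=2)
```

    In the toy input the ambiguous reads are shared out according to the evidence from the unique reads, so the estimates converge to roughly 0.75 and 0.25 rather than an even split.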

    Comparison of Metagenomics and Metatranscriptomics Tools: A Guide to Making the Right Choice

    The study of microorganisms is a field of great interest due to their environmental (e.g., soil contamination) and biomedical (e.g., parasitic diseases, autism) importance. The advent of revolutionary next-generation sequencing techniques, and their application to the hypervariable regions of the 16S, 18S or 23S ribosomal subunits, have allowed a wide variety of organisms to be studied in greater depth, including bacteria, archaea, eukaryotes and fungi. Additionally, together with the development of analysis software, the creation of specific databases (e.g., SILVA or RDP) has boosted the enormous growth of these studies. As the cost of sequencing per sample has continuously decreased, new protocols have also emerged, such as shotgun sequencing, which allows the profiling of all taxonomic domains in a sample. The sequencing of hypervariable regions and shotgun sequencing are technologies that enable the taxonomic classification of microorganisms from the DNA present in microbial communities. However, they are not capable of measuring what is actively expressed. By contrast, we argue that metatranscriptomics is a “new” technology that makes it possible to identify the mRNAs of a microbial community and to quantify gene expression levels and active biological pathways. Furthermore, it can also be used to characterise symbiotic interactions between the host and its microbiome. In this manuscript, we examine the three technologies above and discuss the implementation of different software and databases, which greatly affect the reliability of the results. Finally, we have developed two easy-to-use pipelines leveraging Nextflow technology. These aim to provide everything required for an average user to perform a metagenomic analysis of marker genes with QIIME2 and a metatranscriptomic study using Kraken2/Bracken. Funding: regional Andalusian Government, POSTDOC_21 _0039

    Analysis of metagenomics and microbiome data from NGS experiments using the quasi-mapping technique

    The development of high-throughput sequencing technologies has transformed our capacity to investigate the composition and dynamics of the microbial communities that populate terrestrial and aquatic ecosystems as well as the human skin, gut and oral cavity. Sequenced metagenomic samples usually comprise reads from a large number of different bacterial and viral communities, and hence tend to result in very large files. The purpose of the present study was the design and implementation of a computational pipeline that can identify and quantify organisms at strain level in complex microbiomic, metagenomic and metatranscriptomic Next Generation Sequencing (NGS) samples. The pipeline can be used as a metagenome classifier on 16S rRNA sequencing and shotgun metagenomic sequencing datasets, and can also analyze mixed tissue-specific DNA/RNA NGS samples consisting of the host (human, mouse, other mammalian species) and its microbiome. The main functions of the implemented pipeline are the identification and quantification of the microbiome in NGS samples, abundance estimation at the family, genus, species and subspecies taxonomic ranks, and the filtering of the estimated results based on user-defined criteria. This study presents the results obtained by applying the pipeline to both microbiome and mixed host-microbiome simulated NGS datasets, as well as to real tissue-specific Mus musculus RNA datasets obtained from NCBI’s GEO. The comparison between the implemented pipeline and state-of-the-art metagenomic classification tools showed that in every case the pipeline produces more accurate abundance estimates and in many cases is also faster.
metaHost is a rapid and accurate pipeline that identifies and quantifies microbiome organisms at strain level in complex metagenomic and metatranscriptomic NGS samples, based on a fully automated workflow that is easily adaptable to the needs of its users.
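
    The rank-level abundance estimation and user-defined filtering mentioned in this abstract can be sketched as follows. This is an illustration only and does not reproduce metaHost's code or interface. The function names `rollup` and `filter_min`, the lineage table and the abundance values are all hypothetical.

```python
from collections import defaultdict


def rollup(strain_abundance, lineage, rank):
    """Sum strain-level abundances up to a higher taxonomic rank.

    strain_abundance: dict of strain name -> relative abundance.
    lineage: dict of strain name -> dict of rank name -> taxon name.
    """
    totals = defaultdict(float)
    for strain, abundance in strain_abundance.items():
        totals[lineage[strain][rank]] += abundance
    return dict(totals)


def filter_min(abundances, min_abundance):
    """Keep only taxa at or above a user-chosen abundance threshold."""
    return {t: a for t, a in abundances.items() if a >= min_abundance}


# Hypothetical strain-level estimates and their lineages.
example_strains = {
    "E. coli K-12": 0.5,
    "E. coli O157:H7": 0.25,
    "S. enterica LT2": 0.25,
}
example_lineage = {
    "E. coli K-12": {"family": "Enterobacteriaceae", "genus": "Escherichia",
                     "species": "Escherichia coli"},
    "E. coli O157:H7": {"family": "Enterobacteriaceae", "genus": "Escherichia",
                        "species": "Escherichia coli"},
    "S. enterica LT2": {"family": "Enterobacteriaceae", "genus": "Salmonella",
                        "species": "Salmonella enterica"},
}

genus_level = rollup(example_strains, example_lineage, "genus")
abundant_genera = filter_min(genus_level, 0.5)
```

    Estimating at strain level first and summing upwards keeps the rank-level totals consistent with one another, which is one plausible reason to structure such a step this way.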

    Obligate biotroph downy mildew consistently induces near-identical protective microbiomes in Arabidopsis thaliana

    Hyaloperonospora arabidopsidis (Hpa) is an obligately biotrophic downy mildew that is routinely cultured on Arabidopsis thaliana hosts that harbour complex microbiomes. We hypothesized that the culturing procedure proliferates Hpa-associated microbiota (HAM) in addition to the pathogen and exploited this model system to investigate which microorganisms consistently associate with Hpa. Using amplicon sequencing, we found nine bacterial sequence variants that are shared between at least three out of four Hpa cultures in the Netherlands and Germany and comprise 34% of the phyllosphere community of the infected plants. Whole-genome sequencing showed that representative HAM bacterial isolates from these distinct Hpa cultures are isogenic and that an additional seven published Hpa metagenomes contain numerous sequences of the HAM. Although we showed that HAM benefit from Hpa infection, HAM negatively affect Hpa spore formation. Moreover, we show that pathogen-infected plants can selectively recruit HAM to both their roots and shoots and form a soil-borne infection-associated microbiome that helps resist the pathogen. Understanding the mechanisms by which infection-associated microbiomes are formed might enable breeding of crop varieties that select for protective microbiomes.

    Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease

    The human microbiome is a very active area of research due to its potential to explain health and disease. Advances in high-throughput DNA sequencing in the last decade have catalyzed the growth of microbiome research; DNA sequencing offers a cost-effective way to characterize entire microbial communities directly, including unculturable microbes that were previously difficult to study. 16S rRNA sequencing and shotgun metagenomics, coupled with bioinformatics methods, have powered the characterization of the human microbiome in different parts of the body. This has led to the discovery of novel links between the microbiome and diseases such as allergies, cancer, and autoimmune diseases. This thesis focuses on the application of both 16S rRNA sequencing and shotgun metagenomics for the characterization of the human microbiome and its relationship with health and disease. We established two methodologies to address these questions. The first methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens involved in disease using shotgun metagenomics technology. In Paper I, we apply the proposed pipeline to explore the hypothesis of viral infection as a putative cause of childhood Acute Lymphoblastic Leukemia. In Paper II, we propose a complementary method to the pipeline to improve the detection of unknown viruses, especially those with little or no homology to currently known viruses. We applied this method to a collection of viral-enriched libraries, which resulted in the characterization of a new viral-like genome. The second methodology was developed to explore and generate hypotheses from a human skin microbiome dataset of Psoriasis and Atopic Dermatitis patients. The results of the analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration of the dataset to discover different aspects of how the microbiome is linked to both diseases.
Paper IV follows up on the results of Paper III but focuses on characterizing the skin site microbiome variability in Atopic Dermatitis.