Pseudoalignment for metagenomic read assignment
Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.
Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
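The pairing of pseudoalignment with EM can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_index`, `pseudoalign`, `em_abundances`), the toy genomes, the tiny k-mer size and two simplifications (k-mers missing from the index are skipped, and genome length is ignored in the EM) are all assumptions made for illustration. Each read is reduced to the set of genomes compatible with all of its indexed k-mers, and EM then divides the ambiguous reads among strains in proportion to the current abundance estimates.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of reference genomes that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def pseudoalign(read: str, index: Dict[str, Set[str]], k: int) -> FrozenSet[str]:
    """Return the genomes compatible with every indexed k-mer of the read.

    Simplification (assumption): k-mers absent from the index are skipped;
    an empty result means the read stays unassigned.
    """
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return frozenset(compatible or ())


def em_abundances(class_counts: Counter, genomes: List[str],
                  n_iter: int = 200) -> Dict[str, float]:
    """Estimate genome proportions from equivalence-class counts with EM.

    Simplification (assumption): genome lengths are ignored.
    """
    informative = {cls: n for cls, n in class_counts.items() if cls}
    total = sum(informative.values())
    if total == 0:
        raise ValueError("no reads were pseudoaligned")
    theta = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        # E-step: split each class's reads in proportion to current abundances.
        expected = dict.fromkeys(genomes, 0.0)
        for cls, n in informative.items():
            z = sum(theta[g] for g in cls)
            for g in cls:
                expected[g] += n * theta[g] / z
        # M-step: renormalise expected read counts into proportions.
        theta = {g: expected[g] / total for g in genomes}
    return theta


if __name__ == "__main__":
    refs = {"strainA": "ACGTACGTGGA", "strainB": "ACGTACGTCCA"}  # toy genomes
    k = 4
    index = build_index(refs, k)
    reads = ["ACGTACG"] * 6 + ["CGTGGA"] * 3 + ["GTCCA"]
    counts = Counter(pseudoalign(r, index, k) for r in reads)
    theta = em_abundances(counts, sorted(refs))
    print({g: round(p, 3) for g, p in theta.items()})  # {'strainA': 0.75, 'strainB': 0.25}
```

In the toy data, six reads are shared by both strains, three are unique to strainA and one is unique to strainB. EM assigns the shared reads in the same 3:1 ratio as the unique ones, which gives 0.75 and 0.25. Real tools use much longer k-mers and much larger indexes.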
Bioactive glycans in a microbiome-directed food for children with malnutrition
Evidence is accumulating that perturbed postnatal development of the gut microbiome contributes to childhood malnutrition.
Comparison of Metagenomics and Metatranscriptomics Tools: A Guide to Making the Right Choice
The study of microorganisms is a field of great interest due to their environmental (e.g., soil contamination) and biomedical (e.g., parasitic diseases, autism) importance. The advent of revolutionary next-generation sequencing techniques, and their application to the hypervariable regions of the 16S, 18S or 23S ribosomal subunits, has enabled more in-depth study of a large variety of organisms, including bacteria, archaea, fungi and other eukaryotes. Additionally, together with the development of analysis software, the creation of specific databases (e.g., SILVA or RDP) has driven the enormous growth of these studies. As the cost of sequencing per sample has continuously decreased, new protocols have also emerged, such as shotgun sequencing, which allows the profiling of all taxonomic domains in a sample. The sequencing of hypervariable regions and shotgun sequencing are technologies that enable the taxonomic classification of microorganisms from the DNA present in microbial communities. However, they are not capable of measuring what is actively expressed. Conversely, we advocate metatranscriptomics, a “new” technology that makes it possible to identify the mRNAs of a microbial community, quantifying gene expression levels and active biological pathways. It can also be used to characterise symbiotic interactions between the host and its microbiome. In this manuscript, we examine the three technologies above and discuss the implementation of different software and databases, whose choice greatly affects the reliability of the results. Finally, we have developed two easy-to-use pipelines leveraging Nextflow. These aim to provide everything an average user requires to perform a metagenomic analysis of marker genes with QIIME 2 and a metatranscriptomic study using Kraken2/Bracken.
Regional Andalusian Government POSTDOC_21 _0039
Analysis of metagenomics and microbiome data from NGS experiments using the quasi-mapping technique
The development of high-throughput sequencing technologies has transformed our capacity to investigate the composition and dynamics of the microbial communities that populate terrestrial and aquatic ecosystems as well as the human skin, gut and oral cavity. Sequenced metagenomic samples usually comprise reads from a large number of different microorganisms (bacteria, viruses, etc.), and hence tend to result in very large files.
The purpose of the present study was the design and implementation of a computational tool (pipeline) able to identify and quantify organisms at strain level in complex microbiome, metagenomic and metatranscriptomic Next Generation Sequencing (NGS) samples. The pipeline can be used as a metagenome classifier for 16S rRNA sequencing and shotgun metagenomic sequencing datasets, and it can also analyze mixed tissue-specific DNA/RNA NGS samples consisting of the host (human, mouse or other mammalian species) and its microbiome.
The main functions of the implemented pipeline are the identification and quantification of the microbiome in NGS samples, abundance estimation at the family, genus, species and subspecies taxonomic ranks, and filtering of the estimated results based on user-defined criteria.
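Abundance estimation at several ranks, followed by user-defined filtering, can be pictured as summing strain-level estimates along each strain's lineage and then applying a threshold. The sketch below is a hypothetical illustration, not metaHost's code. The lineage table, the example abundances, the function names `aggregate` and `filter_rank`, and the two filter criteria (a minimum relative abundance and an optional top-N cut) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Ranks named in the abstract; lineage tuples below use the same order.
RANKS = ("family", "genus", "species", "subspecies")
Lineage = Tuple[str, str, str, str]


def aggregate(strain_abundance: Dict[str, float],
              lineage: Dict[str, Lineage]) -> Dict[str, Dict[str, float]]:
    """Sum strain-level relative abundances into every ancestor rank."""
    table = {rank: defaultdict(float) for rank in RANKS}
    for strain, abundance in strain_abundance.items():
        for rank, taxon in zip(RANKS, lineage[strain]):
            table[rank][taxon] += abundance
    return {rank: dict(taxa) for rank, taxa in table.items()}


def filter_rank(abundances: Dict[str, float],
                min_abundance: float = 0.0,
                top_n: Optional[int] = None) -> Dict[str, float]:
    """Keep taxa at or above a user-chosen threshold, optionally only the top N."""
    kept = sorted(((taxon, a) for taxon, a in abundances.items() if a >= min_abundance),
                  key=lambda item: item[1], reverse=True)
    if top_n is not None:
        kept = kept[:top_n]
    return dict(kept)


if __name__ == "__main__":
    # Hypothetical strain-level estimates, e.g. the output of an EM step.
    lineage = {
        "K-12 MG1655": ("Enterobacteriaceae", "Escherichia", "Escherichia coli", "K-12 MG1655"),
        "O157:H7 Sakai": ("Enterobacteriaceae", "Escherichia", "Escherichia coli", "O157:H7 Sakai"),
        "NCTC 9343": ("Bacteroidaceae", "Bacteroides", "Bacteroides fragilis", "NCTC 9343"),
    }
    strain_abundance = {"K-12 MG1655": 0.5, "O157:H7 Sakai": 0.2, "NCTC 9343": 0.3}
    by_rank = aggregate(strain_abundance, lineage)
    print(filter_rank(by_rank["species"], min_abundance=0.5))  # {'Escherichia coli': 0.7}
```

Each taxon at a higher rank is assigned the sum of the abundances of its strains. Filtering is applied independently at each rank, so a threshold can remove a rare species while keeping the family that contains it.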
This study presents the results obtained by applying the pipeline to both microbiome-only and mixed host-microbiome simulated NGS datasets, as well as to real tissue-specific Mus musculus RNA datasets obtained from NCBI’s GEO. The comparison between the implemented pipeline and state-of-the-art metagenomic classification tools showed that in every case the pipeline produces more accurate abundance estimates and in many cases is also faster.
metaHost is a rapid and accurate pipeline that identifies and quantifies microbiome organisms at strain level in complex metagenomic and metatranscriptomic NGS samples, based on a fully automated workflow that is easily adaptable to the needs of its users.
Obligate biotroph downy mildew consistently induces near-identical protective microbiomes in Arabidopsis thaliana
Hyaloperonospora arabidopsidis (Hpa) is an obligately biotrophic downy mildew that is routinely cultured on Arabidopsis thaliana hosts that harbour complex microbiomes. We hypothesized that the culturing procedure proliferates Hpa-associated microbiota (HAM) in addition to the pathogen and exploited this model system to investigate which microorganisms consistently associate with Hpa. Using amplicon sequencing, we found nine bacterial sequence variants that are shared between at least three out of four Hpa cultures in the Netherlands and Germany and comprise 34% of the phyllosphere community of the infected plants. Whole-genome sequencing showed that representative HAM bacterial isolates from these distinct Hpa cultures are isogenic and that an additional seven published Hpa metagenomes contain numerous sequences of the HAM. Although we showed that HAM benefit from Hpa infection, HAM negatively affect Hpa spore formation. Moreover, we show that pathogen-infected plants can selectively recruit HAM to both their roots and shoots and form a soil-borne infection-associated microbiome that helps resist the pathogen. Understanding the mechanisms by which infection-associated microbiomes are formed might enable breeding of crop varieties that select for protective microbiomes.
Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease
The human microbiome is a very active area of research due to its potential to explain health and disease. Advances in high-throughput DNA sequencing in the last decade have catalyzed the growth of microbiome research; DNA sequencing provides a cost-effective way to characterize entire microbial communities directly, including unculturable microbes that were previously difficult to study. 16S rRNA sequencing and shotgun metagenomics, coupled with bioinformatics methods, have powered the characterization of the human microbiome in different parts of the body. This has led to the discovery of novel links between the microbiome and diseases such as allergies, cancer, and autoimmune diseases.
This thesis focuses on the application of both 16S rRNA sequencing and shotgun metagenomics to the characterization of the human microbiome and its relationship with health and disease. We established two methodologies to address these questions. The first methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens involved in disease using shotgun metagenomics. In Paper I, we apply the proposed pipeline to explore the hypothesis of viral infection as a putative cause of childhood acute lymphoblastic leukemia. In Paper II, we propose a method complementary to the pipeline to improve the detection of unknown viruses, especially those with little or no homology to currently known viruses. We applied this method to a collection of viral-enriched libraries, which resulted in the characterization of a new viral-like genome. The second methodology was developed to explore and generate hypotheses from a human skin microbiome dataset of psoriasis and atopic dermatitis patients. The results of the analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration of the dataset to discover different aspects of how the microbiome is linked to both diseases. Paper IV follows up on the results of Paper III but focuses on characterizing the skin site microbiome variability in atopic dermatitis.