824 research outputs found
Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics
Whole genome sequencing (WGS) provides the highest resolution for genome-based species identification and can provide insight into the antimicrobial resistance and virulence potential of a single microbiological isolate during the diagnostic process. In contrast, metagenomic sequencing allows the analysis of DNA segments from multiple microorganisms within a community, using either an amplicon- or a shotgun-based approach. However, WGS and shotgun metagenomic data are rarely combined, although such an approach may generate additive or synergistic information that is critical for, e.g., patient management, infection control, and pathogen surveillance. To produce a combined workflow with actionable outputs, we need to understand the pre-to-post analytical process of both technologies. This will require specific databases storing interlinked sequencing data and metadata, as well as customized bioinformatic analytical pipelines. This review article provides an overview of the critical steps and potential clinical applications of combining WGS and metagenomics for microbiological diagnosis.
Literature on applied machine learning in metagenomic classification: A scoping review
Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries’ search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched with considerable potential for improvement
Use of Whole Genome Shotgun Sequencing for the Analysis of Microbial Communities in Arabidopsis thaliana Leaves
Microorganisms, which include all Bacteria and Archaea and some Eukaryotes, inhabit all
imaginable habitats on the planet, from water vents in the deep ocean to extreme environments of
high temperature and salinity. Microbes also constitute the most diverse group of organisms in terms
of genetic information, metabolic function, and taxonomy. Furthermore, many of these microbes
establish complex interactions with each other and with many multicellular organisms. The
collection of microbes that share a body space with a plant or animal is called the microbiota, and
their genetic information is called the microbiome.
The microbiota has emerged as a crucial determinant of a host’s overall health and
understanding it has become crucial in many biological fields. In mammals, the gut microbiota has
been linked to important diseases such as diabetes, inflammatory bowel disease, and dementia. In
plants, the microbiota can provide protection against certain pathogens or confer resistance against
harsh environmental conditions such as drought. Furthermore, the leaves of plants represent one of
the largest surface areas that can potentially be colonized by microbes.
The advent of sequencing technologies has allowed researchers to study microbial communities
at unprecedented resolution and scale. By targeting individual loci such as the 16S rDNA locus in
bacteria, many species can be studied simultaneously, along with properties such as relative
abundance, without the need to isolate each target taxon individually. Decreasing costs of DNA
sequencing have also led to whole shotgun sequencing, where instead of targeting a single locus or a
number of loci, random fragments of DNA are sequenced. This effectively renders the entire
microbiome accessible to study, an approach referred to as metagenomics. Consequently, many more areas of
investigation are open, such as the exploration of within-host genetic diversity, functional analysis, or
assembly of individual genomes from metagenomes.
In this study, I describe the analysis of metagenomic sequencing data from microbial
communities in leaves of wild Arabidopsis thaliana individuals from southwest Germany. As a model
organism, A. thaliana not only is accessible in the wild but also has a rich body of previous research
in plant-microbe interactions. In the first section, I describe how whole shotgun sequencing of leaf
DNA extracts can be used to accurately describe the taxonomic composition of the microbial
community of individual hosts. The nature of whole shotgun sequencing is used to estimate true
microbial abundances, which cannot be done with amplicon sequencing. I show that this
community varies across hosts, although some trends emerge, such as the dominance of the bacterial
genera Pseudomonas and Sphingomonas. Moreover, even though there is variation between
individuals, I explore the influence of site of origin and host genotype. Finally, metagenomic
assembly is applied to individual samples, showing the limitations of WGS in plant leaves.
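One way to picture why shotgun data can yield an abundance estimate that amplicon data cannot is that host reads are sequenced alongside microbial reads and can serve as an internal reference. The Python sketch below is only an illustration under stated assumptions: it normalizes read counts by genome size and reports microbial coverage per unit of host coverage. The abstract does not specify the thesis's actual estimator, and the function names, the fixed read length, and the genome sizes in the usage note are hypothetical.

```python
def coverage(read_count, read_length, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * read_length / genome_size


def microbial_load(microbe_reads, microbe_genome_size,
                   host_reads, host_genome_size, read_length=150):
    """Microbial coverage per unit of host coverage.

    Assumption (not taken from the abstract): all reads share one length,
    and coverage is a usable proxy for genome copy number.
    """
    host_cov = coverage(host_reads, read_length, host_genome_size)
    if host_cov == 0:
        raise ValueError("no host reads: load is undefined")
    microbe_cov = coverage(microbe_reads, read_length, microbe_genome_size)
    return microbe_cov / host_cov
```

For instance, calling `microbial_load(6000, 6_000_000, 1_350_000, 135_000_000)` with these made-up counts compares the depth of a 6 Mb bacterial genome with that of a 135 Mb host genome. Amplicon data lack the host denominator, so they only support relative abundances.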
In the second section, I explore the genomic diversity of the most abundant genera,
Pseudomonas and Sphingomonas. I use a core genome approach in which a set of common genes is
obtained from previously sequenced and assembled genomes. Thereafter, the gene sequences of
the core genome are used as a reference for short genome mapping. From these mappings,
individual strain mixtures are inferred based on the frequency distribution of non-reference bases at
each detected single nucleotide polymorphism (SNP). Finally, SNPs are used to derive the
population structure of strain mixtures across samples and with known reference genomes.
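The frequency distribution of non-reference bases mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's pipeline: it assumes per-site base counts have already been extracted from the mappings, and the function names and the input layout (a reference base paired with a dictionary of base counts) are hypothetical.

```python
def non_reference_frequency(ref_base, base_counts):
    """Fraction of mapped bases at one site that differ from the reference."""
    total = sum(base_counts.values())
    if total == 0:
        return None  # no coverage at this site
    return (total - base_counts.get(ref_base, 0)) / total


def frequency_spectrum(sites, bins=10):
    """Histogram of non-reference frequencies across variable sites.

    `sites` is an iterable of (ref_base, base_counts) pairs; sites with no
    coverage or no non-reference bases are skipped.
    """
    hist = [0] * bins
    for ref_base, base_counts in sites:
        freq = non_reference_frequency(ref_base, base_counts)
        if freq is None or freq == 0.0:
            continue
        # Clamp freq == 1.0 into the last bin.
        hist[min(int(freq * bins), bins - 1)] += 1
    return hist
```

In a sample dominated by a single strain, most SNPs would be expected to fall near a frequency of 1, whereas a mixture of strains tends to produce peaks at intermediate frequencies that reflect the mixing proportions.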
In conclusion, this thesis provides insights into the use of metagenomic sequencing to study
microbial populations in wild plants. I identify the strengths and weaknesses of using whole genome
sequencing for this purpose, as well as a way to study strain-level dynamics of prevalent taxa within
a single host.
Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era
Metagenomics has become one of the indispensable tools in microbial ecology for the last few decades, and a new revolution in metagenomic studies is now about to begin, with the help of recent advances of sequencing techniques. The massive data production and substantial cost reduction in next-generation sequencing have led to the rapid growth of metagenomic research both quantitatively and qualitatively. It is evident that metagenomics will be a standard tool for studying the diversity and function of microbes in the near future, as fingerprinting methods did previously. As the speed of data accumulation is accelerating, bioinformatic tools and associated databases for handling those datasets have become more urgent and necessary. To facilitate the bioinformatics analysis of metagenomic data, we review some recent tools and databases that are used widely in this field and give insights into the current challenges and future of metagenomics from a bioinformatics perspective.
The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge.
Tailoring bioinformatics strategies for the characterization of the human microbiome in health and disease
The human microbiome is a very active area of research due to its potential to explain
health and disease. Advances in high throughput DNA sequencing in the last decade have
catalyzed the growth of microbiome research; DNA sequencing allows for a cost-effective
method to characterize entire microbial communities directly, including unculturable
microbes, which were previously difficult to study. 16S rRNA sequencing and shotgun
metagenomics, coupled with bioinformatics methods, have powered the characterization of
the human microbiome in different parts of the body. This has led to the discovery of novel
links between the microbiome and diseases such as allergies, cancer, and autoimmune
diseases.
This thesis focuses on the application of both 16S rRNA sequencing and shotgun
metagenomics for the characterization of the human microbiome and its relationship with
health and disease. We established two methodologies to address these questions. The first
methodology is a bench-to-bioinformatics pipeline to discover putative viral pathogens
involved in disease using shotgun metagenomics technology. In paper I, we apply the
proposed pipeline to explore the hypothesis of viral infection as a putative cause of
childhood Acute Lymphoblastic Leukemia. In paper II, we propose a complementary
method to the pipeline to improve the detection of unknown viruses, especially those with
little or no homology to currently known viruses. We applied this method to a collection of
viral-enriched libraries, which resulted in the characterization of a new viral-like genome.
The second methodology was developed to explore and generate hypotheses from a human
skin microbiome dataset of Psoriasis and Atopic Dermatitis patients. The results of the
analysis are presented in Paper III and Paper IV. Paper III is a purely data-driven exploration
of the dataset to discover different aspects of how the microbiome is linked to both
diseases. Paper IV follows up on the results of Paper III but focuses on characterizing
the skin site microbiome variability in Atopic Dermatitis.
Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards
The challenges of defining the human nasopharyngeal resistome
The nasopharynx is an important microbial reservoir for the emergence and spread of antibiotic-resistant organisms. The nasopharyngeal resistome is an extensive, adaptable reservoir of antibiotic-resistance genes (ARGs) within this niche. Metagenomic sequencing decodes the genetic material of all organisms within a sample using next-generation technologies, permitting unbiased discovery of novel ARGs and associated mobile genetic elements (MGEs). The challenges of sequencing a low-biomass bacterial sample have limited exploration of the nasopharyngeal resistome. Here, we explore the current understanding of the nasopharyngeal resistome, particularly the role of MGEs in propagating antimicrobial resistance (AMR), explore the advantages and limitations of metagenomic sequencing technologies and bioinformatic pipelines for nasopharyngeal resistome analysis, and highlight the key outstanding questions for future research
Understanding host-microbe interactions in maize kernel and sweetpotato leaf metagenomic profiles.
Functional and quantitative metagenomic profiling remains challenging and limits our understanding of host-microbe interactions. This body of work aims to mitigate these challenges by using a novel quantitative reduced representation sequencing strategy (OmeSeq-qRRS), developing fully automated software for quantitative metagenomic/microbiome profiling (Qmatey: quantitative metagenomic alignment and taxonomic identification using exact-matching), and implementing these tools for understanding plant-microbe-pathogen interactions in maize and sweetpotato. The next generation sequencing-based OmeSeq-qRRS leverages the strengths of shotgun whole genome sequencing and costs less than the more affordable amplicon sequencing method. The novel FASTQ data compression/indexing and enhanced multithreading of MegaBLAST in Qmatey allow for computational speeds several thousand-fold faster than typical runs. Regardless of sample number, the analytical pipeline can be completed within days for genome-wide sequence data and provides broad-spectrum taxonomic profiling (virus to eukaryotes). As a proof of concept, these protocols and novel analytical pipelines were implemented to characterize the viruses within the leaf microbiome of a sweetpotato population that represents the global genetic diversity and the kernel microbiomes of genetically modified (GMO) and non-GMO maize hybrids. The metagenome profiles and high-density SNP data were integrated to identify host genetic factors (disease resistance and intracellular transport candidate genes) that underpin sweetpotato-virus interactions. Additionally, microbial community dynamics were observed in the presence of pathogens, leading to the identification of multipartite interactions that modulate disease severity through co-infection and species competition.
This study highlights a low-cost, quantitative, and strain/species-level metagenomic profiling approach, new tools that complement the assay’s novel features and provide fast computation, and the potential for advancing functional metagenomic studies.