A base composition analysis of natural patterns for the preprocessing of metagenome sequences
Background: On the premise that sequence reads and contigs often exhibit the same kinds of base usage that are observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs), which are permutations of bacterial restriction sites, and a base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Results: Our method is able to differentiate organisms and their reads or contigs. The framework shows how to determine the relatedness between these reads or contigs by comparing their base composition. In particular, we show that two types of organismal sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By applying each of the four possible spectrum sets, which together encompass all known restriction sites, we provide evidence that each set differs in its ability to differentiate sequence data. Furthermore, we show that selecting a spectrum set that is relevant to one organism, but not to the others in the data set, greatly improves the performance of sequence differentiation, even when the read, contig or sequence fragment is short. Conclusions: We demonstrate a proof of concept of our method by applying it to ten trials, each with two or three freshly selected sequence fragments (reads and contigs) per experiment, across the six organisms of our set. Here we describe a novel and computationally efficient pre-processing step for metagenome sequencing and assembly tasks.
Furthermore, our base composition method has applications in phylogeny, where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often share much conserved sequence.
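The abstract does not spell out how motif proportions are computed. As a minimal sketch, assuming that "coverage" means the fraction of bases in a fragment that fall inside at least one occurrence of a spectrum-set motif, and that a spectrum set is built from the distinct base permutations of a single restriction site (the EcoRI site GAATTC serves only as an illustration), a coverage profile and a simple distance between two fragments could be computed as follows. The helpers `spectrum_set`, `motif_coverage` and `coverage_distance` are hypothetical names, not the authors' tool.

```python
from itertools import permutations


def spectrum_set(site: str) -> set[str]:
    """Hypothetical spectrum set: every distinct base permutation of one restriction site."""
    return {"".join(p) for p in permutations(site.upper())}


def motif_coverage(seq: str, motifs: set[str]) -> float:
    """Fraction of positions in seq covered by at least one occurrence of any motif."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            for i in range(start, start + len(motif)):
                covered[i] = True
            start = seq.find(motif, start + 1)  # step by one so overlapping hits count
    return sum(covered) / len(seq)


def coverage_distance(a: str, b: str, sets: list[set[str]]) -> float:
    """Euclidean distance between the coverage profiles of two fragments (assumed metric)."""
    return sum((motif_coverage(a, s) - motif_coverage(b, s)) ** 2 for s in sets) ** 0.5
```

Under these assumptions, fragments whose coverage profiles lie close together (for example, across `[spectrum_set("GAATTC"), spectrum_set("GGATCC")]`) would be candidates for the same source organism; Euclidean distance is an arbitrary choice that any other profile distance could replace.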
Metagenome – Processing and Analysis
Metagenome means “multiple genomes”, and the study of culture-independent genomic content in the environment is called metagenomics. With the advent of powerful and economical next-generation sequencing technology, sequencing has become cheaper and faster, and the study of genes and phenotypes is therefore transitioning from single organisms to the communities present in natural environmental samples. Once sequence data are obtained from an environmental sample, the challenge is to process, assemble and bin the metagenome data in order to obtain as accurate and complete a representation of the community's populations as possible, or a high-confidence draft assembly. In this paper we describe the existing bioinformatics workflow for processing metagenomic data. Next, we examine one way of parallelizing a sequence similarity program on a High Performance Computing (HPC) cluster, since sequence similarity is the most common and frequently used technique throughout the metagenome data processing and analysis steps. To address the challenges involved in analyzing the result file produced by the sequence similarity program, we developed a web application called Contig Analysis Tool (CAT). We then applied these tools and techniques to real-world virome metagenomic data, i.e., the genomes of all the viruses present in environmental samples from microbial mats in hot springs of Yellowstone National Park. The assembly and binning of virome data are particularly challenging for the following reasons: 1. existing databases contain few viral sequences for sequence similarity searches; 2. there is no reference genome; and 3. viruses lack phylogenetic marker genes like those present in bacteria and archaea. We show how we overcame these problems by performing sequence similarity searches using CRISPR data and sequence composition analysis based on tetranucleotide frequencies.
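Tetranucleotide analysis is commonly implemented by counting every overlapping 4-mer in a contig and normalizing the counts into a frequency vector; contigs from the same genome tend to have similar vectors, which is what makes the signal useful for binning. The sketch below is a minimal illustration under assumptions the abstract does not state: reverse complements are merged into 136 canonical tetranucleotides (a common but not universal choice), windows containing non-ACGT characters are skipped, and contigs are compared with Euclidean distance. The function names are hypothetical and do not describe the authors' pipeline or CAT.

```python
from itertools import product
import math

# Translation table for reverse-complementing uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


# 256 tetranucleotides collapse to 136 canonical forms (16 are their own reverse complement).
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_profile(seq: str) -> list[float]:
    """Normalized canonical tetranucleotide frequencies of one contig."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= VALID:  # skip windows with N or other ambiguity codes
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL_TETRAMERS]


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two profiles; smaller suggests a shared source genome."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Merging reverse complements makes the profile independent of which strand a contig was assembled on, which matters because assemblers report contigs in arbitrary orientation.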
Microbial community dynamics and coexistence in a sulfide-driven phototrophic bloom
© The Author(s), 2020. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Bhatnagar, S., Cowley, E. S., Kopf, S. H., Pérez Castro, S., Kearney, S., Dawson, S. C., Hanselmann, K., & Ruff, S. E. Microbial community dynamics and coexistence in a sulfide-driven phototrophic bloom. Environmental Microbiome, 15(1), (2020): 3, doi:10.1186/s40793-019-0348-0.
Background: Lagoons are common along coastlines worldwide and are important for biogeochemical element cycling, coastal biodiversity, coastal erosion protection and blue carbon sequestration. These ecosystems are frequently disturbed by weather, tides, and human activities. Here, we investigated a shallow lagoon in New England. The brackish ecosystem releases hydrogen sulfide particularly upon physical disturbance, causing blooms of anoxygenic sulfur-oxidizing phototrophs. To study the habitat, microbial community structure, assembly and function, we carried out in situ experiments investigating the bloom dynamics over time.
Results: Phototrophic microbial mats and permanently or seasonally stratified water columns commonly contain multiple phototrophic lineages that coexist based on their light, oxygen and nutrient preferences. We describe similar coexistence patterns and ecological niches in estuarine planktonic blooms of phototrophs. The water column showed steep gradients of oxygen, pH, sulfate, sulfide, and salinity. The upper part of the bloom was dominated by aerobic phototrophic Cyanobacteria, and the middle and lower parts by anoxygenic purple sulfur bacteria (Chromatiales) and green sulfur bacteria (Chlorobiales), respectively. We show stable coexistence of phototrophic lineages from five bacterial phyla and present metagenome-assembled genomes (MAGs) of two uncultured Chlorobaculum and Prosthecochloris species. In addition to genes involved in sulfur oxidation and photopigment biosynthesis, the MAGs contained complete operons encoding terminal oxidases. The metagenomes also contained numerous contigs affiliated with Microviridae viruses, potentially affecting Chlorobi. Our data suggest a short sulfur cycle within the bloom in which elemental sulfur produced by sulfide-oxidizing phototrophs is most likely reduced back to sulfide by Desulfuromonas sp.
Conclusions: The release of sulfide creates a habitat selecting for anoxygenic sulfur-oxidizing phototrophs, which in turn create a niche for sulfur reducers. Strong syntrophism between these guilds apparently drives a short sulfur cycle that may explain the rapid development of the bloom. The fast growth and high biomass yield of Chlorobi-affiliated organisms implies that the studied lineages of green sulfur bacteria can thrive in hypoxic habitats. This oxygen tolerance is corroborated by oxidases found in MAGs of uncultured Chlorobi. The findings improve our understanding of the ecology and ecophysiology of anoxygenic phototrophs and their impact on the coupled biogeochemical cycles of sulfur and carbon.
This work was carried out at the Microbial Diversity summer course at the Marine Biological Laboratory in Woods Hole, MA. The course was supported by grants from National Aeronautics and Space Administration, the US Department of Energy, the Simons Foundation, the Beckman Foundation, and the Agouron Institute. Additional funding for SER was provided by the Marine Biological Laboratory.
The MGX framework for microbial community analysis
Jaenicke S. The MGX framework for microbial community analysis. Bielefeld: Universität Bielefeld; 2020
A reservoir of 'historical' antibiotic resistance genes in remote pristine Antarctic soils
Background: Soil bacteria naturally produce antibiotics as a competitive mechanism, with a concomitant evolution, and exchange by horizontal gene transfer, of a range of antibiotic resistance mechanisms. Surveys of bacterial resistance elements in edaphic systems have originated primarily from human-impacted environments, with relatively little information from remote and pristine environments, where the resistome may comprise the ancestral gene diversity.
Methods: We used shotgun metagenomics to assess antibiotic resistance gene (ARG) distribution in 17 pristine and remote Antarctic surface soils within the undisturbed Mackay Glacier region. We also interrogated the phylogenetic placement of ARGs compared to environmental ARG sequences and tested for the presence of horizontal gene transfer elements flanking ARGs.
Results: In total, 177 naturally occurring ARGs were identified, most of which encoded single or multi-drug efflux pumps. Resistance mechanisms for the inactivation of aminoglycosides, chloramphenicol and beta-lactam antibiotics were also common. Gram-negative bacteria harboured most ARGs (71%), with fewer genes from Gram-positive Actinobacteria and Bacilli (Firmicutes) (9%), reflecting the taxonomic composition of the soils. Strikingly, the abundance of ARGs per sample had a strong, negative correlation with species richness (r=-0.49, P < 0.05). This result, coupled with a lack of mobile genetic elements flanking ARGs, suggests that these genes are ancient acquisitions of horizontal transfer events.
Conclusions: ARGs in these remote and uncontaminated soils most likely represent functional, efficient historical genes that have since been vertically inherited over generations. The historical ARGs in these pristine environments carry a strong phylogenetic signal and form a monophyletic group relative to ARGs from other similar environments.
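The abstract reports a correlation coefficient between ARG abundance and species richness without giving the underlying per-sample values. The sketch below shows how such a coefficient is computed from paired measurements; it assumes Pearson's (rather than a rank) correlation was used, and the function names are hypothetical. A P-value additionally requires comparing the t statistic against a t distribution with n - 2 degrees of freedom, which is what `scipy.stats.pearsonr` reports alongside r.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance sums; the shared 1/(n - 1) factor cancels in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), to be compared with a t distribution with n - 2 df."""
    if abs(r) >= 1:
        raise ValueError("t is unbounded for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

A negative r, as reported here, means that samples with more species tended to carry fewer ARGs; the magnitude of r says how tightly the samples follow a straight-line trend.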
Use of Whole Genome Shotgun Sequencing for the Analysis of Microbial Communities in Arabidopsis thaliana Leaves
Microorganisms, such as all Bacteria and Archaea and some Eukaryotes, inhabit all
imaginable habitats on the planet, from water vents in the deep ocean to extreme environments of
high temperature and salinity. Microbes also constitute the most diverse group of organisms in terms
of genetic information, metabolic function, and taxonomy. Furthermore, many of these microbes
establish complex interactions with each other and with many other multicellular organisms. The
collection of microbes that share a body space with a plant or animal is called the microbiota, and
their genetic information is called the microbiome.
The microbiota has emerged as a crucial determinant of a host’s overall health, and
understanding it has become important in many biological fields. In mammals, the gut microbiota has
been linked to important diseases such as diabetes, inflammatory bowel disease, and dementia. In
plants, the microbiota can provide protection against certain pathogens or confer resistance against
harsh environmental conditions such as drought. Furthermore, the leaves of plants represent one of
the largest surface areas that can potentially be colonized by microbes.
The advent of sequencing technologies has allowed researchers to study microbial communities
at unprecedented resolution and scale. By targeting individual loci such as the 16S rDNA locus in
bacteria, many species can be studied simultaneously, along with properties such as their relative
abundance, without the need to isolate target taxa individually. Decreasing costs of DNA
sequencing have also led to whole genome shotgun sequencing, where instead of targeting one or a
few loci, random fragments of DNA are sequenced. This approach, referred to as metagenomics,
effectively renders the entire microbiome accessible to study. Consequently, many more areas of
investigation are open, such as the exploration of within-host genetic diversity, functional analysis, or
the assembly of individual genomes from metagenomes.
In this study, I describe the analysis of metagenomic sequencing data from microbial
communities in leaves of wild Arabidopsis thaliana individuals from southwest Germany. As a model
organism, A. thaliana is not only accessible in the wild but also has a rich body of previous research
on plant-microbe interactions. In the first section, I describe how whole genome shotgun sequencing
of leaf DNA extracts can be used to accurately describe the taxonomic composition of the microbial
community of individual hosts. The untargeted nature of whole genome shotgun sequencing is used to
estimate true microbial abundances, which cannot be done with amplicon sequencing. I show that this
community varies across hosts, although some trends emerge, such as the dominance of the bacterial
genera Pseudomonas and Sphingomonas. Moreover, I explore the influence of site of origin and host
genotype on this variation between individuals. Finally, metagenomic assembly is applied to
individual samples, showing the limitations of WGS in plant leaves.
In the second section, I explore the genomic diversity of the most abundant genera:
Pseudomonas and Sphingomonas. I use a core genome approach, where a set of common genes is
obtained from previously sequenced and assembled genomes. The gene sequences of the core
genome are then used as a reference for short-read mapping. Based on these mappings,
individual strain mixtures are inferred from the frequency distribution of non-reference bases at
each detected single nucleotide polymorphism (SNP). Finally, SNPs are used to derive the
population structure of strain mixtures across samples and relative to known reference genomes.
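The strain-mixture step rests on a simple observation: at a SNP where only one strain is present, nearly all mapped reads carry the same base, whereas a mixture of strains produces intermediate non-reference frequencies. A minimal sketch of extracting those frequencies from per-position base counts follows. It assumes the read mappings have already been reduced to one base-count dictionary per reference position; the `min_depth` and `min_alt` thresholds, the function names and the 10-bin histogram are hypothetical choices, not the parameters used in this thesis.

```python
def nonref_frequencies(reference: str, pileup: list[dict[str, int]],
                       min_depth: int = 10, min_alt: int = 2) -> dict[int, float]:
    """Map each candidate SNP position to its non-reference base frequency."""
    freqs = {}
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency
        alt = depth - counts.get(ref_base, 0)
        if alt < min_alt:
            continue  # treat as invariant (or sequencing error), not a SNP
        freqs[pos] = alt / depth
    return freqs


def frequency_histogram(freqs: dict[int, float], bins: int = 10) -> list[int]:
    """Count SNPs per frequency bin; peaks hint at the proportions of strains in the mixture."""
    hist = [0] * bins
    for f in freqs.values():
        hist[min(int(f * bins), bins - 1)] += 1  # a frequency of exactly 1.0 goes into the last bin
    return hist
```

In a single-strain sample, these frequencies would be expected to cluster near 1.0 (positions where that strain differs from the reference), while a two-strain mixture at roughly 70:30 would add peaks near 0.3 or 0.7.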
In conclusion, this thesis provides insights into the use of metagenomic sequencing to study
microbial populations in wild plants. I identify the strengths and weaknesses of using whole genome
sequencing for this purpose, and I present a way to study the strain-level dynamics of prevalent taxa
within a single host.
Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Handling Temperature Bursts Reaching 464°C: Different Microbial Strategies in the Sisters Peak Hydrothermal Chimney
The active venting Sisters Peak (SP) chimney on the Mid-Atlantic Ridge holds the current record for the hottest hydrothermal fluids ever measured (400 degrees C, accompanied by sudden temperature bursts reaching 464 degrees C). Given this unprecedented temperature regime, we investigated the biome of the chimney with a focus on special microbial adaptations for thermal tolerance. The SP metagenome reveals considerable differences in taxonomic composition from those of other hydrothermal vent and subsurface samples; these differences could be better explained by temperature than by the other available abiotic parameters. The species to which SP genes were most commonly assigned were the thermophiles Aciduliprofundum sp. strain MAR08-339 (11.8%), Hippea maritima (3.8%), Caldisericum exile (1.5%), and Caminibacter mediatlanticus (1.4%), as well as the mesophile Niastella koreensis (2.8%). A statistical analysis of associations between taxonomic and functional gene assignments revealed specific overrepresented functional categories: for Aciduliprofundum, protein biosynthesis, nucleotide metabolism, and energy metabolism genes; for Hippea and Caminibacter, cell motility and/or DNA replication and repair system genes; and for Niastella, cell wall and membrane biogenesis genes. Cultured representatives of these organisms inhabit different thermal niches: Aciduliprofundum has an optimal growth temperature of 70 degrees C, Hippea and Caminibacter have optimal growth temperatures around 55 degrees C, and Niastella grows between 10 and 37 degrees C. Therefore, we posit that the different enrichment profiles of functional categories reflect distinct microbial strategies for dealing with the impacts of the sudden local temperature bursts in disparate regions of the chimney.
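The abstract does not name the statistical test used to find overrepresented functional categories. One standard option for this kind of question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given N annotated genes, K of which fall in a functional category, and n genes assigned to a taxon, it asks how surprising it is to see k or more category genes among that taxon's genes. The sketch below is this swapped-in technique, not the authors' method; the function names and any counts passed to them are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K category genes, n taxon genes)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Category fraction among the taxon's genes divided by the category fraction overall."""
    return (k / n) / (K / N)
```

Because such a test would be repeated for every taxon and functional category pair, the resulting P-values would normally need a multiple-testing correction (for example Benjamini-Hochberg) before any category is called overrepresented.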
De novo Nanopore read quality improvement using deep learning.
BACKGROUND: Long-read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for the identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve the read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub
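The final "scrubbing" step can be pictured as cutting each read at low-quality windows and keeping the high-quality pieces. The sketch below assumes that a model (a CNN in MiniScrub, but any scorer would do here) has already assigned one score per fixed-size window of a read, with higher scores meaning better quality. The function name and the `threshold` and `min_len` parameters are hypothetical, for illustration only, and are not MiniScrub's actual options or implementation.

```python
def scrub_read(seq: str, window_scores: list[float], window_size: int,
               threshold: float = 0.5, min_len: int = 500) -> list[str]:
    """Split a read at windows scored below threshold and keep pieces of at least min_len."""
    pieces = []
    run_start = None  # index of the first window in the current high-quality run
    n = len(window_scores)
    for i in range(n + 1):  # the extra iteration closes a run that reaches the read's end
        good = i < n and window_scores[i] >= threshold
        if good and run_start is None:
            run_start = i
        elif not good and run_start is not None:
            begin = run_start * window_size
            end = min(i * window_size, len(seq))  # the last window may be shorter
            if end - begin >= min_len:
                pieces.append(seq[begin:end])
            run_start = None
    return pieces
```

Splitting rather than masking keeps known-bad sequence away from the assembler at the cost of shorter reads; dropping pieces below `min_len` reflects that very short fragments contribute little to long-read assembly.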