81 research outputs found

    Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences

    A major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing; however, this approach can generate biased estimates because the copy number of the gene differs even between closely related organisms, and because of PCR artifacts. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels. We propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level. We present a novel taxonomic classifier, MetaPhyler (http://metaphyler.cbcb.umd.edu), which uses phylogenetic marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, Megan and PhymmBL). We also present interesting results from the analysis of a real metagenomic dataset. We have introduced a novel taxonomic classification method for analyzing microbial diversity from whole-metagenome shotgun sequences. Compared with previous approaches, MetaPhyler is much more accurate in estimating the phylogenetic composition. In addition, we have shown that MetaPhyler can be used to guide the discovery of novel organisms from metagenomic samples. https://doi.org/10.1186/1471-2164-12-S2-S

    The impact of the neisserial DNA uptake sequences on genome evolution and stability

    A study of the origin and distribution of the abundant short DNA uptake sequence (DUS) in six genomes of Neisseria suggests that transformation and recombination are tightly linked in evolution and that recombination has a key role in the establishment of DUS.

    Faster Pan-Genome Construction for Efficient Differentiation of Naturally Occurring and Engineered Plasmids with Plaster

    As sequence databases grow, characterizing diversity across extremely large collections of genomes requires the development of efficient methods that avoid costly all-vs-all comparisons [Marschall et al., 2018]. In addition to exponential increases in the number of natural genomes being sequenced, improved techniques for the creation of human-engineered sequences are ushering in a new wave of synthetic genome sequence databases that grow alongside naturally occurring genome databases. In this paper, we analyze the full diversity of available sequenced natural and synthetic plasmid genome sequences. This diversity can be represented by a data structure that captures all presently available nucleotide sequences, known as a pan-genome. In our case, we construct a single linear pan-genome nucleotide sequence that captures this diversity. To process such a large number of sequences, we introduce the plaster algorithmic pipeline. Using plaster, we are able to construct the full synthetic plasmid pan-genome from 51,047 synthetic plasmid sequences as well as a natural pan-genome from 6,642 natural plasmid sequences. We demonstrate the efficacy of plaster by comparing its speed against another pan-genome construction method and by demonstrating that nearly all plasmids align well to their corresponding pan-genome. Finally, we explore the use of pan-genome sequence alignment to distinguish between naturally occurring and synthetic plasmids. We believe this approach will lead to new techniques for rapid characterization of engineered plasmids. Applications for this work include detection of genome editing, tracking an unknown plasmid back to its lab of origin, and identifying naturally occurring sequences that may be of use to the synthetic biology community. The source code for fully reconstructing the natural and synthetic plasmid pan-genomes, as well as for plaster, is publicly available and can be downloaded at https://gitlab.com/qiwangrice/plaster.git

    Traumatic Brain Injury in Mice Induces Acute Bacterial Dysbiosis Within the Fecal Microbiome

    The secondary injury cascade that is activated following traumatic brain injury (TBI) induces responses from multiple physiological systems, including the immune system. These responses are not limited to the area of brain injury; they can also alter peripheral organs such as the intestinal tract. Gut microbiota play a role in the regulation of immune cell populations and microglia activation, and microbiome dysbiosis is implicated in immune dysregulation and behavioral abnormalities. However, changes to the gut microbiome induced after acute TBI remain largely unexplored. In this study, we have investigated the impact of TBI on bacterial dysbiosis. To test the hypothesis that TBI results in changes in microbiome composition, we performed controlled cortical impact (CCI) or sham injury in 9-week-old male C57BL/6J mice. Fresh stool pellets were collected at baseline and at 24 h post-CCI. 16S rRNA-based microbiome analysis was performed to identify differential abundance in bacteria at the genus and species level. In all baseline vs. 24 h post-CCI samples, we evaluated species-level differential abundances via clustered and annotated operational taxonomic units (OTU). At a high-level view, we observed significant changes in two genera after TBI, Marvinbryantia and Clostridiales. At the species level, we found significant decreases in three species (Lactobacillus gasseri, Ruminococcus flavefaciens, and Eubacterium ventriosum), and significant increases in two additional species (Eubacterium sulci and Marvinbryantia formatexigens). These results pinpoint critical changes in the genus-level and species-level microbiome composition in injured mice compared to baseline, highlighting a previously unreported acute dysbiosis in the microbiome after TBI.

    MetaCarvel: linking assembly graph motifs to biological variants

    Reconstructing genomic segments from metagenomics data is a highly complex task. In addition to general challenges, such as repeats and sequencing errors, metagenomic assembly needs to tolerate the uneven depth of coverage among organisms in a community and differences between nearly identical strains. Previous methods have addressed these issues by smoothing genomic variants. We present a variant-aware metagenomic scaffolder called MetaCarvel, which combines new strategies for repeat detection with graph analytics for the discovery of variants. We show that MetaCarvel can accurately reconstruct genomic segments from complex microbial mixtures and correctly identify and characterize several classes of common genomic variants. https://doi.org/10.1186/s13059-019-1791-

    Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths

    Background: Late Pleistocene North America hosted at least two divergent and ecologically distinct species of mammoth: the periglacial woolly mammoth (Mammuthus primigenius) and the subglacial Columbian mammoth (Mammuthus columbi). To date, mammoth genetic research has been entirely restricted to woolly mammoths, rendering their genetic evolution difficult to contextualize within broader Pleistocene paleoecology and biogeography. Here, we take an interspecific approach to clarifying mammoth phylogeny by targeting Columbian mammoth remains for mitogenomic sequencing. Results: We sequenced the first complete mitochondrial genome of a classic Columbian mammoth, as well as the first complete mitochondrial genome of a North American woolly mammoth. Somewhat contrary to conventional paleontological models, which posit that the two species were highly divergent, the M. columbi mitogenome we obtained falls securely within a subclade of endemic North American M. primigenius. Conclusions: Though limited, our data suggest that the two species interbred at some point in their evolutionary histories. One potential explanation is that woolly mammoth haplotypes entered Columbian mammoth populations via introgression at subglacial ecotones, a scenario with compelling parallels in extant elephants and consistent with certain regional paleontological observations. This highlights the need for multi-genomic data to sufficiently characterize mammoth evolutionary history. Our results demonstrate that the use of next-generation sequencing technologies holds promise in obtaining such data, even from non-cave, non-permafrost Pleistocene depositional contexts. http://deepblue.lib.umich.edu/bitstream/2027.42/112426/1/13059_2011_Article_2544.pd

    Automated ensemble assembly and validation of microbial genomes

    The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs. https://doi.org/10.1186/1471-2105-15-12

    Improved understanding of biorisk for research involving microbial modification using annotated sequences of concern

    Regulation of research on microbes that cause disease in humans has historically been focused on taxonomic lists of ‘bad bugs’. However, given our increased knowledge of these pathogens through inexpensive genome sequencing, five decades of research in microbial pathogenesis, and the burgeoning capacity of synthetic biologists, the limitations of this approach are apparent. With heightened scientific and public attention focused on biosafety and biosecurity, and an ongoing review by US authorities of dual-use research oversight, this article proposes the incorporation of sequences of concern (SoCs) into the biorisk management regime governing genetic engineering of pathogens. SoCs enable pathogenesis in all microbes infecting hosts that are ‘of concern’ to human civilization. Here we review the functions of SoCs (FunSoCs) and discuss how they might bring clarity to potentially problematic research outcomes involving infectious agents. We believe that annotation of SoCs with FunSoCs has the potential to improve the likelihood that dual-use research of concern is recognized by both scientists and regulators before it occurs.