869 research outputs found

    Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences

    Get PDF
    A major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing, however this approach can generate biased estimates due to differences in the copy number of the gene between even closely related organisms, and due to PCR artifacts. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels. We propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level. We present a novel taxonomic classifier MetaPhyler (http://metaphyler.cbcb.umd.edu), which uses phylogenetic marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, Megan and PhymmBL). We also present interesting results by analyzing a real metagenomic dataset. We have introduced a novel taxonomic classification method for analyzing the microbial diversity from whole-metagenome shotgun sequences. Compared with previous approaches, MetaPhyler is much more accurate in estimating the phylogenetic composition. In addition, we have shown that MetaPhyler can be used to guide the discovery of novel organisms from metagenomic samples.https://doi.org/10.1186/1471-2164-12-S2-S

    The impact of the neisserial DNA uptake sequences on genome evolution and stability

    Get PDF
    A study of the origin and distribution of the abundant short DNA uptake sequence (DUS) in six genomes of Neisseria suggests that transformation and recombination are tightly linked in evolution and that recombination has a key role in the establishment of DUS

    Faster Pan-Genome Construction for Efficient Differentiation of Naturally Occurring and Engineered Plasmids with Plaster

    Get PDF
    As sequence databases grow, characterizing diversity across extremely large collections of genomes requires the development of efficient methods that avoid costly all-vs-all comparisons [Marschall et al., 2018]. In addition to exponential increases in the amount of natural genomes being sequenced, improved techniques for the creation of human engineered sequences is ushering in a new wave of synthetic genome sequence databases that grow alongside naturally occurring genome databases. In this paper, we analyze the full diversity of available sequenced natural and synthetic plasmid genome sequences. This diversity can be represented by a data structure that captures all presently available nucleotide sequences, known as a pan-genome. In our case, we construct a single linear pan-genome nucleotide sequence that captures this diversity. To process such a large number of sequences, we introduce the plaster algorithmic pipeline. Using plaster we are able to construct the full synthetic plasmid pan-genome from 51,047 synthetic plasmid sequences as well as a natural pan-genome from 6,642 natural plasmid sequences. We demonstrate the efficacy of plaster by comparing its speed against another pan-genome construction method as well as demonstrating that nearly all plasmids align well to their corresponding pan-genome. Finally, we explore the use of pan-genome sequence alignment to distinguish between naturally occurring and synthetic plasmids. We believe this approach will lead to new techniques for rapid characterization of engineered plasmids. Applications for this work include detection of genome editing, tracking an unknown plasmid back to its lab of origin, and identifying naturally occurring sequences that may be of use to the synthetic biology community. The source code for fully reconstructing the natural and synthetic plasmid pan-genomes as well for plaster are publicly available and can be downloaded at https://gitlab.com/qiwangrice/plaster.git

    genoPlotR: comparative gene and genome visualization in R

    Get PDF
    Summary: The amount of gene and genome data obtained by next-generation sequencing technologies generates a need for comparative visualization tools. Complementing existing software for comparison and exploration of genomics data, genoPlotR automatically creates publication-grade linear maps of gene and genomes, in a highly automatic, flexible and reproducible way

    Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products

    Get PDF
    Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of “binning” the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a simple synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provides a long-range signal of strain-specific genotypes, indicating such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that are not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique

    Joint scaling laws in functional and evolutionary categories in prokaryotic genomes

    Get PDF
    We propose and study a class-expansion/innovation/loss model of genome evolution taking into account biological roles of genes and their constituent domains. In our model numbers of genes in different functional categories are coupled to each other. For example, an increase in the number of metabolic enzymes in a genome is usually accompanied by addition of new transcription factors regulating these enzymes. Such coupling can be thought of as a proportional "recipe" for genome composition of the type "a spoonful of sugar for each egg yolk". The model jointly reproduces two known empirical laws: the distribution of family sizes and the nonlinear scaling of the number of genes in certain functional categories (e.g. transcription factors) with genome size. In addition, it allows us to derive a novel relation between the exponents characterising these two scaling laws, establishing a direct quantitative connection between evolutionary and functional categories. It predicts that functional categories that grow faster-than-linearly with genome size to be characterised by flatter-than-average family size distributions. This relation is confirmed by our bioinformatics analysis of prokaryotic genomes. This proves that the joint quantitative trends of functional and evolutionary classes can be understood in terms of evolutionary growth with proportional recipes.Comment: 39 pages, 21 figure

    MetAMOS: A modular and open source metagenomic assembly and analysis pipeline

    Get PDF
    © 2013 Treangen et al. We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS

    Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths

    Get PDF
    Abstract Background Late Pleistocene North America hosted at least two divergent and ecologically distinct species of mammoth: the periglacial woolly mammoth (Mammuthus primigenius) and the subglacial Columbian mammoth (Mammuthus columbi). To date, mammoth genetic research has been entirely restricted to woolly mammoths, rendering their genetic evolution difficult to contextualize within broader Pleistocene paleoecology and biogeography. Here, we take an interspecific approach to clarifying mammoth phylogeny by targeting Columbian mammoth remains for mitogenomic sequencing. Results We sequenced the first complete mitochondrial genome of a classic Columbian mammoth, as well as the first complete mitochondrial genome of a North American woolly mammoth. Somewhat contrary to conventional paleontological models, which posit that the two species were highly divergent, the M. columbi mitogenome we obtained falls securely within a subclade of endemic North American M. primigenius. Conclusions Though limited, our data suggest that the two species interbred at some point in their evolutionary histories. One potential explanation is that woolly mammoth haplotypes entered Columbian mammoth populations via introgression at subglacial ecotones, a scenario with compelling parallels in extant elephants and consistent with certain regional paleontological observations. This highlights the need for multi-genomic data to sufficiently characterize mammoth evolutionary history. Our results demonstrate that the use of next-generation sequencing technologies holds promise in obtaining such data, even from non-cave, non-permafrost Pleistocene depositional contexts.http://deepblue.lib.umich.edu/bitstream/2027.42/112426/1/13059_2011_Article_2544.pd
    • 

    corecore