43 research outputs found

    Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and the functions they encode. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are also challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Compute Cloud and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

    These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure

    K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer
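
    The Count-Min Sketch idea described in this abstract can be illustrated with a minimal sketch. This is not khmer's implementation or API: the class name, the `kmers` helper, the table sizes, and the choice of BLAKE2b as the hash function are all assumptions made for illustration. It shows the two properties the abstract mentions: counts can be updated and queried online, and a reported count can exceed the true count but never fall below it.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch (illustrative only, not khmer's code)."""

    def __init__(self, width, depth):
        self.width = width    # counters per row (assumed parameter)
        self.depth = depth    # number of independent hash rows
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # One hash per row; the row index is used as the salt so that
        # each row behaves like a different hash function.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        # Online update: increment one counter in every row.
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def count(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # is the tightest estimate and never undercounts.
        return min(self.tables[row][col] for row, col in self._slots(item))


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


sketch = CountMinSketch(width=1024, depth=4)
for kmer in kmers("ACGTACGTAC", 4):
    sketch.add(kmer)
```

    Only the counters are stored, not the k-mers themselves, so memory use is fixed by `width * depth` regardless of how many distinct k-mers are seen; shrinking the table trades memory for a higher overcount rate.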

    A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

    Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers. These algorithms are challenged by the continued improvement in sequencing throughput. We here describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting the content of the generated contigs. We apply digital normalization to the assembly of microbial genomic data, amplified single-cell genomic data, and transcriptomic data. Our implementation is freely available for use and modification.
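
    A single-pass, reference-free normalization of this kind can be sketched as follows. The abstract does not give the decision rule, so this example assumes one: a read's coverage is estimated as the median abundance of its k-mers among the reads kept so far, and the read is discarded once that estimate reaches a cutoff. The function name, the parameters `k` and `cutoff`, and the use of an exact `Counter` in place of a memory-efficient counting structure are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from statistics import median


def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only while its estimated coverage is below `cutoff`.

    Simplified sketch: exact counts, forward strand only
    (reverse complements are ignored).
    """
    counts = Counter()   # k-mer abundances among reads kept so far
    kept = []
    for read in reads:   # single pass over the data
        read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not read_kmers:
            continue     # read shorter than k: nothing to estimate
        # Median k-mer abundance is the assumed coverage estimate;
        # it is robust to a few erroneous, low-abundance k-mers.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
normalized = digital_normalization(reads, k=4, cutoff=2)
```

    In this toy input the repeated read is retained only until its estimated coverage reaches the cutoff, while the read seen once is always retained, which mirrors the idea of discarding redundant data without losing low-coverage content.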

    Assembling large, complex environmental metagenomes

    The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license. Comment: Includes supporting information.

    Identification of Soil Microbes Capable of Utilizing Cellobiosan

    Approximately 100 million tons of anhydrosugars, such as levoglucosan and cellobiosan, are produced through biomass burning every year. These sugars are also produced through fast pyrolysis, the controlled thermal depolymerization of biomass. While the microbial pathways associated with levoglucosan utilization have been characterized, there is little known about cellobiosan utilization. Here we describe the isolation and characterization of six cellobiosan-utilizing microbes from soil samples. Each of these organisms is capable of using both cellobiosan and levoglucosan as sole carbon source, though in both minimal and rich media cellobiosan supported significantly higher biomass production than levoglucosan. Ribosomal sequencing was used to identify the closest reported match for these organisms: Sphingobacterium multivorum, Acinetobacter oleivorans JC3-1, Enterobacter sp SJZ-6, and Microbacterium sps FXJ8.207 and 203, and a fungal species, Cryptococcus sp. The commercially-acquired Enterobacter cloacae DSM 16657 showed growth on levoglucosan and cellobiosan, supporting our isolate identification. Analysis of an existing database of 16S rRNA amplicons from Iowa soil samples confirmed the representation of our five bacterial isolates and four previously-reported levoglucosan-utilizing bacterial isolates in other soil samples and provided insight into their population distributions. Phylogenetic analysis of the 16S rRNA and 18S rRNA of strains previously reported to utilize levoglucosan and our newfound isolates showed that the organisms isolated in this study are distinct from previously described anhydrosugar-utilizing microbial species.

    Allelic Variation in Outer Membrane Protein A and Its Influence on Attachment of Escherichia coli to Corn Stover

    Understanding the genetic factors that govern microbe-sediment interactions in aquatic environments is important for water quality management and reduction of waterborne disease outbreaks. Although chemical properties of bacteria have been identified that contribute to initiation of attachment, the outer membrane proteins that contribute to these chemical properties still remain unclear. In this study we explored the attachment of 78 Escherichia coli environmental isolates to corn stover, a representative agricultural residue. Outer membrane proteome analysis led to the observation of amino acid variations, some of which had not been previously described, in outer membrane protein A (OmpA) at 10 distinct locations, including each of the four extracellular loops, three of the eight transmembrane segments, the proline-rich linker and the dimerization domain. Some of the polymorphisms within loops 1, 2, and 3 were found to significantly co-occur. Grouping of sequences according to the outer loop polymorphisms revealed five distinct patterns that each occur in at least 5% of our isolates. The two most common patterns, I and II, are encoded by 33.3 and 20.5% of these isolates and differ at each of the four loops. Statistically significant differences in attachment to corn stover were observed among isolates expressing different versions of OmpA and when different versions of OmpA were expressed in the same genetic background. Most notable was the increased corn stover attachment associated with a loop 3 sequence of SNFDGKN relative to the standard SNVYGKN sequence. These results provide further insight into the allelic variation of OmpA and implicate OmpA in contributing to attachment to corn stover

    Temporal Dynamics of Bacterial Communities in Soil and Leachate Water After Swine Manure Application

    Application of swine manure to agricultural land allows recycling of plant nutrients, but excess nitrate, phosphorus and fecal bacteria impact surface and drainage water quality. While agronomic and water quality impacts are well studied, little is known about the impact of swine manure slurry on soil microbial communities. We applied swine manure to intact soil columns collected from plots maintained under chisel plow or no-till with corn and soybean rotation. Targeted 16S-rRNA gene sequencing was used to characterize and to identify shifts in bacterial communities in soil over 108 days after swine manure application. In addition, six simulated rainfalls were applied during this time. Drainage water from the columns and surface soil were sampled, and DNA was extracted and sequenced. Unique DNA sequences (OTUs) associated with 12 orders of bacteria were responsible for the majority of OTUs stimulated by manure application. Proteobacteria were most prevalent, followed by Bacteroidetes, Firmicutes, Actinobacteria, and Spirochaetes. While the majority of the 12 orders decreased after day 59, relative abundances of genes associated with Rhizobiales and Actinomycetales in soil increased. Bacterial orders which were stimulated by manure application in soil had varied responses in drainage waters over the course of the experiment. We also identified a “manure-specific core” of five genera that comprised 13% of the manure community and were not significantly abundant in non-manured control soils. Of these five genera, Clostridium sensu stricto was the only genus which did not return to pre-manure relative abundance in soil by day 108. Our results show that enrichment responses after manure amendment could result from displacement of native soil bacteria by manure-borne bacteria during the application process or growth of native bacteria using manure-derived available nutrients.

    The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus

    Background: The barber's pole worm, Haemonchus contortus, is one of the most economically important parasites of small ruminants worldwide. Although this parasite can be controlled using anthelmintic drugs, resistance against most drugs in common use has become a widespread problem. We provide a draft of the genome and the transcriptomes of all key developmental stages of H. contortus to support biological and biotechnological research areas of this and related parasites. Results: The draft genome of H. contortus is 320 Mb in size and encodes 23,610 protein-coding genes. On a fundamental level, we elucidate transcriptional alterations taking place throughout the life cycle, characterize the parasite's gene silencing machinery, and explore molecules involved in development, reproduction, host-parasite interactions, immunity, and disease. The secretome of H. contortus is particularly rich in peptidases linked to blood-feeding activity and interactions with host tissues, and a diverse array of molecules is involved in complex immune responses. On an applied level, we predict drug targets and identify vaccine molecules. Conclusions: The draft genome and developmental transcriptome of H. contortus provide a major resource to the scientific community for a wide range of genomic, genetic, proteomic, metabolomic, evolutionary, biological, ecological, and epidemiological investigations, and a solid foundation for biotechnological outcomes, including new anthelmintics, vaccines and diagnostic tests. This first draft genome of any strongylid nematode paves the way for a rapid acceleration in our understanding of a wide range of socioeconomically important parasites of one of the largest nematode orders

    Strategies to improve reference databases for soil microbiomes

    Microbial populations in the soil are critical in our lives. The soil microbiome helps to grow our food, nourishing and protecting plants, while also providing important ecological services such as erosion protection, water filtration and climate regulation. We are increasingly aware of the tremendous microbial diversity that has a role in soil health; yet, despite significant efforts to isolate microbes from the soil, we have accessed only a small fraction of its biodiversity. Even with novel cell isolation techniques