43 research outputs found
Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)
Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Compute Cloud and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow
These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
K-mer abundance analysis is widely used for many purposes in nucleotide
sequence analysis, including data preprocessing for de novo assembly, repeat
detection, and sequencing coverage estimation. We present the khmer software
package for fast and memory efficient online counting of k-mers in sequencing
data sets. Unlike previous methods based on data structures such as hash
tables, suffix arrays, and trie structures, khmer relies entirely on a simple
probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits
online updating and retrieval of k-mer counts in memory which is necessary to
support online k-mer analysis algorithms. On sparse data sets this data
structure is considerably more memory efficient than any exact data structure.
In exchange, the use of a Count-Min Sketch introduces a systematic overcount
for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we
analyze the speed, the memory usage, and the miscount rate of khmer for
generating k-mer frequency distributions and retrieving k-mer counts for
individual k-mers. We also compare the performance of khmer to several other
k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC,
Turtle and KAnalyze. Finally, we examine the effectiveness of profiling
sequencing error, k-mer abundance trimming, and digital normalization of reads
in the context of high khmer false positive rates. khmer is implemented in C++
wrapped in a Python interface, offers a tested and robust API, and is freely
available under the BSD license at github.com/ged-lab/khmer
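As an illustration of the data structure described in this abstract, here is a minimal Count-Min Sketch for k-mer counting in plain Python. It is a sketch under stated assumptions and does not reproduce khmer's API or implementation: the class name, the `kmers` helper, the table sizes, and the use of salted BLAKE2b hashes are all hypothetical choices made for readability.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: `depth` hash tables of `width` counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, kmer):
        # One independent hash per row, obtained by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode("ascii"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, kmer):
        # Online update: increment one counter per row.
        for row, col in self._cells(kmer):
            self.tables[row][col] += 1

    def count(self, kmer):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and is never below the true count.
        return min(self.tables[row][col] for row, col in self._cells(kmer))


def kmers(sequence, k):
    """Yield every overlapping substring of length k (hypothetical helper)."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


sketch = CountMinSketch(width=1024, depth=4)
for kmer in kmers("ACGTACGTAC", 4):
    sketch.add(kmer)
```

Only the counters are stored, never the k-mers themselves, which is why memory use depends on the table dimensions rather than on the number of distinct k-mers. The trade-off is the one named in the abstract: `sketch.count("ACGT")` may overestimate the true count when unrelated k-mers collide in every row, but it cannot underestimate it.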
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified
single-cell genomes, and metagenomes has enabled investigation of a wide range
of organisms and ecosystems. However, sampling variation in short-read data
sets and high sequencing error rates of modern sequencers present many new
computational challenges in data interpretation. These challenges have led to
the development of new classes of mapping tools and de novo assemblers.
These algorithms are challenged by the continued improvement in sequencing
throughput. We here describe digital normalization, a single-pass computational
algorithm that systematizes coverage in shotgun sequencing data sets, thereby
decreasing sampling variation, discarding redundant data, and removing the
majority of errors. Digital normalization substantially reduces the size of
shotgun data sets and decreases the memory and time requirements for de novo
sequence assembly, all without significantly impacting the content of the
generated contigs. We apply digital normalization to the assembly of microbial
genomic data, amplified single-cell genomic data, and transcriptomic data. Our
implementation is freely available for use and modification
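The single-pass idea can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes the keep-or-discard decision is based on the median abundance of a read's k-mers among the reads kept so far, it uses an exact `Counter` where a memory-efficient probabilistic counter would be used in practice, and the function names, `k`, and `cutoff` values are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(sequence, k):
    """Return every overlapping substring of length k (hypothetical helper)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def digital_normalize(reads, k=4, cutoff=2):
    """Keep a read only if its estimated coverage is still below `cutoff`.

    Coverage is estimated as the median count of the read's k-mers among
    the reads kept so far, so each read is examined exactly once.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            # Reads shorter than k carry no k-mers and are dropped.
            continue
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to coverage
    return kept


reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
normalized = digital_normalize(reads, k=4, cutoff=2)
```

In this toy input the redundant copies of the first read stop being kept once their median k-mer count reaches the cutoff, while the low-coverage read is retained. Using the median rather than the mean makes the estimate less sensitive to a few erroneous k-mers within a read.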
Assembling large, complex environmental metagenomes
The large volumes of sequencing data required to sample complex environments
deeply pose new challenges to sequence analysis approaches. De novo metagenomic
assembly effectively reduces the total amount of data to be analyzed but
requires significant computational resources. We apply two pre-assembly
filtering approaches, digital normalization and partitioning, to make large
metagenome assemblies more computationally tractable. Using a human gut mock
community dataset, we demonstrate that these methods result in assemblies
nearly identical to assemblies from unprocessed data. We then assemble two
large soil metagenomes from matched Iowa corn and native prairie soils. The
predicted functional content and phylogenetic origin of the assembled contigs
indicate significant taxonomic differences despite similar function. The
assembly strategies presented are generic and can be extended to any
metagenome; full source code is freely available under a BSD license.
Identification of Soil Microbes Capable of Utilizing Cellobiosan
Approximately 100 million tons of anhydrosugars, such as levoglucosan and cellobiosan, are produced through biomass burning every year. These sugars are also produced through fast pyrolysis, the controlled thermal depolymerization of biomass. While the microbial pathways associated with levoglucosan utilization have been characterized, there is little known about cellobiosan utilization. Here we describe the isolation and characterization of six cellobiosan-utilizing microbes from soil samples. Each of these organisms is capable of using both cellobiosan and levoglucosan as sole carbon source, though in both minimal and rich media cellobiosan supported significantly higher biomass production than levoglucosan. Ribosomal sequencing was used to identify the closest reported match for these organisms: Sphingobacterium multivorum, Acinetobacter oleivorans JC3-1, Enterobacter sp SJZ-6, and Microbacterium sps FXJ8.207 and 203 and a fungal species Cryptococcus sp. The commercially-acquired Enterobacter cloacae DSM 16657 showed growth on levoglucosan and cellobiosan, supporting our isolate identification. Analysis of an existing database of 16S rRNA amplicons from Iowa soil samples confirmed the representation of our five bacterial isolates and four previously-reported levoglucosan-utilizing bacterial isolates in other soil samples and provided insight into their population distributions. Phylogenetic analysis of the 16S rRNA and 18S rRNA of strains previously reported to utilize levoglucosan and our newfound isolates showed that the organisms isolated in this study are distinct from previously described anhydrosugar-utilizing microbial species
Allelic Variation in Outer Membrane Protein A and Its Influence on Attachment of Escherichia coli to Corn Stover
Understanding the genetic factors that govern microbe-sediment interactions in aquatic environments is important for water quality management and reduction of waterborne disease outbreaks. Although chemical properties of bacteria have been identified that contribute to initiation of attachment, the outer membrane proteins that contribute to these chemical properties still remain unclear. In this study we explored the attachment of 78 Escherichia coli environmental isolates to corn stover, a representative agricultural residue. Outer membrane proteome analysis led to the observation of amino acid variations, some of which had not been previously described, in outer membrane protein A (OmpA) at 10 distinct locations, including each of the four extracellular loops, three of the eight transmembrane segments, the proline-rich linker and the dimerization domain. Some of the polymorphisms within loops 1, 2, and 3 were found to significantly co-occur. Grouping of sequences according to the outer loop polymorphisms revealed five distinct patterns that each occur in at least 5% of our isolates. The two most common patterns, I and II, are encoded by 33.3 and 20.5% of these isolates and differ at each of the four loops. Statistically significant differences in attachment to corn stover were observed among isolates expressing different versions of OmpA and when different versions of OmpA were expressed in the same genetic background. Most notable was the increased corn stover attachment associated with a loop 3 sequence of SNFDGKN relative to the standard SNVYGKN sequence. These results provide further insight into the allelic variation of OmpA and implicate OmpA in contributing to attachment to corn stover
Temporal Dynamics of Bacterial Communities in Soil and Leachate Water After Swine Manure Application
Application of swine manure to agricultural land allows recycling of plant nutrients, but excess nitrate, phosphorus and fecal bacteria impact surface and drainage water quality. While agronomic and water quality impacts are well studied, little is known about the impact of swine manure slurry on soil microbial communities. We applied swine manure to intact soil columns collected from plots maintained under chisel plow or no-till with corn and soybean rotation. Targeted 16S-rRNA gene sequencing was used to characterize and to identify shifts in bacterial communities in soil over 108 days after swine manure application. In addition, six simulated rainfalls were applied during this time. Drainage water from the columns and surface soil were sampled, and DNA was extracted and sequenced. Unique DNA sequences (OTUs) associated with 12 orders of bacteria were responsible for the majority of OTUs stimulated by manure application. Proteobacteria were most prevalent, followed by Bacteroidetes, Firmicutes, Actinobacteria, and Spirochaetes. While the majority of the 12 orders decreased after day 59, relative abundances of genes associated with Rhizobiales and Actinomycetales in soil increased. Bacterial orders which were stimulated by manure application in soil had varied responses in drainage waters over the course of the experiment. We also identified a “manure-specific core” of five genera that comprised 13% of the manure community and were not significantly abundant in non-manured control soils. Of these five genera, Clostridium sensu stricto was the only genus which did not return to pre-manure relative abundance in soil by day 108. Our results show that enrichment responses after manure amendment could result from displacement of native soil bacteria by manure-borne bacteria during the application process or growth of native bacteria using manure-derived available nutrients
The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus
Background: The barber's pole worm, Haemonchus contortus, is one of the most economically important parasites of small ruminants worldwide. Although this parasite can be controlled using anthelmintic drugs, resistance against most drugs in common use has become a widespread problem. We provide a draft of the genome and the transcriptomes of all key developmental stages of H. contortus to support biological and biotechnological research areas of this and related parasites.
Results: The draft genome of H. contortus is 320 Mb in size and encodes 23,610 protein-coding genes. On a fundamental level, we elucidate transcriptional alterations taking place throughout the life cycle, characterize the parasite's gene silencing machinery, and explore molecules involved in development, reproduction, host-parasite interactions, immunity, and disease. The secretome of H. contortus is particularly rich in peptidases linked to blood-feeding activity and interactions with host tissues, and a diverse array of molecules is involved in complex immune responses. On an applied level, we predict drug targets and identify vaccine molecules.
Conclusions: The draft genome and developmental transcriptome of H. contortus provide a major resource to the scientific community for a wide range of genomic, genetic, proteomic, metabolomic, evolutionary, biological, ecological, and epidemiological investigations, and a solid foundation for biotechnological outcomes, including new anthelmintics, vaccines and diagnostic tests. This first draft genome of any strongylid nematode paves the way for a rapid acceleration in our understanding of a wide range of socioeconomically important parasites of one of the largest nematode orders
Strategies to improve reference databases for soil microbiomes
Microbial populations in the soil are critical in our lives. The soil microbiome helps to grow our food, nourishing and protecting plants, while also providing important ecological services such as erosion protection, water filtration and climate regulation. We are increasingly aware of the tremendous microbial diversity that has a role in soil health; yet, despite significant efforts to isolate microbes from the soil, we have accessed only a small fraction of its biodiversity. Even with novel cell isolation techniques