87 research outputs found

    MetaGT : A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data

    While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data make it possible to determine which genes are active in the community and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects related to metatranscriptome analysis increases every year and the scope of its applications expands. However, several problems complicate metatranscriptome analysis: the complexity of microbial communities, the wide dynamic range of transcriptome expression and, importantly, the lack of high-quality computational methods for assembling meta-RNA sequencing data. These factors reduce the contiguity and completeness of metatranscriptome assemblies and therefore affect downstream analysis. Here we present MetaGT, a pipeline for de novo assembly of metatranscriptomes, which is based on the idea of combining metatranscriptomic and metagenomic data sequenced from the same sample. MetaGT assembles metatranscriptomic contigs and fills in missing regions based on their alignments to the metagenome assembly. This approach makes it possible to overcome the described complexities, obtain complete RNA sequences and, additionally, estimate their abundances. Using various publicly available real and simulated datasets, we demonstrate that MetaGT yields significant improvement in coverage and completeness of metatranscriptome assemblies compared to existing methods that do not exploit metagenomic data. The pipeline is implemented in NextFlow and is freely available from https://github.com/ablab/metaGT. Peer reviewe

    Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data

    Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP

    Next-generation sequencing (NGS) in the microbiological world : how to make the most of your money

    The Sanger sequencing method produces relatively long DNA sequences of unmatched quality and has long been considered the gold standard for sequencing DNA. Many improvements to the Sanger method, culminating in fluorescent dyes coupled with automated capillary electrophoresis, enabled the sequencing of the first genomes. Nevertheless, using this technology to sequence whole genomes was costly, laborious and time-consuming, even for relatively small genomes. A major technological advance was the introduction of next-generation sequencing (NGS), pioneered by 454 Life Sciences in the early part of the 21st century. NGS allowed scientists to sequence thousands to millions of DNA molecules in a single machine run. Since then, new NGS technologies have emerged and existing NGS platforms have been improved, enabling the production of genome sequences at an unprecedented rate as well as broadening the spectrum of NGS applications. The current affordability of generating genomic information, especially with microbial samples, has resulted in a false sense of simplicity that belies the fact that many researchers still consider these technologies a black box. In this review, our objective is to identify and discuss four steps that we consider crucial to the success of any NGS-related project. These steps are: (1) the definition of the research objectives beyond sequencing and appropriate experimental planning, (2) library preparation, (3) sequencing and (4) data analysis. The goal of this review is to give an overview of the process, from sample to analysis, and discuss how to optimize your resources to achieve the most from your NGS-based research. Regardless of the evolution and improvement of the sequencing technologies, these four steps will remain relevant.

    Spatially distinct, temporally stable microbial populations mediate biogeochemical cycling at and below the seafloor in hydrothermal vent fluids

    © The Author(s), 2017. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Environmental Microbiology 20 (2018): 769–784, doi:10.1111/1462-2920.14011. At deep-sea hydrothermal vents, microbial communities thrive across geochemical gradients above, at, and below the seafloor. In this study, we determined the gene content and transcription patterns of microbial communities and specific populations to understand the taxonomy and metabolism both spatially and temporally across geochemically different diffuse fluid hydrothermal vents. Vent fluids were examined via metagenomic, metatranscriptomic, genomic binning, and geochemical analyses from Axial Seamount, an active submarine volcano on the Juan de Fuca Ridge in the NE Pacific Ocean, from 2013 to 2015 at three different vents: Anemone, Marker 33, and Marker 113. Results showed that individual vent sites maintained microbial communities and specific populations over time, but with spatially distinct taxonomic, metabolic potential, and gene transcription profiles. The geochemistry and physical structure of each vent both played important roles in shaping the dominant organisms and metabolisms present at each site. Genomic binning identified key populations of SUP05, Aquificales and methanogenic archaea carrying out important transformations of carbon, sulfur, hydrogen, and nitrogen, with groups that appear unique to individual sites. This work highlights the connection between microbial metabolic processes, fluid chemistry, and microbial population dynamics at and below the seafloor and increases understanding of the role of hydrothermal vent microbial communities in deep ocean biogeochemical cycles. Gordon and Betty Moore Foundation Grant Number: GBMF3297; NSF Center for Dark Energy Biosphere Investigations Grant Number: OCE-0939564; Schmidt Ocean Institut

    Eukaryotic Plant Pathogen Detection Through High Throughput DNA/RNA Sequencing Data Analysis

    Plant pathogen detection is crucial for developing appropriate management techniques. A variety of tools are available for rapid plant pathogen detection. Most tools rely on unique features of the pathogen to detect its presence. Immunoassays rely on unique proteins while genetic approaches rely on unique DNA signatures. However, most of these tools can detect only a limited number of pathogens at once. E-probe Diagnostics Nucleic acid Analysis (EDNA) is a bioinformatic tool originally designed as a theoretical approach to detect multiple plant pathogens at once. EDNA uses metagenomic databases and bioinformatics to infer the presence/absence of plant pathogens in a given sample. Additionally, EDNA relies on the continuous design and curation of unique signatures termed e-probes. EDNA has been successfully validated in viral, bacterial and eukaryotic plant pathogens. However, most of these validations have been performed solely at the species level and only using DNA sequencing. My thesis involved the refinement of EDNA to increase its detection scope to include plant pathogens at the strain/isolate level. Additional refinements included increasing EDNA’s capacity to use transcriptomic analysis to detect actively infecting plant pathogens and metabolic pathways. Detection of actively infecting/growing plant pathogens was performed using Sclerotinia minor as a eukaryotic model system. We sequenced and annotated the genome of S. minor so that it could be used for e-probe generation. In vitro detection of actively growing S. minor was successfully achieved using EDNA for RNA sequencing analysis. However, actively infecting S. minor in peanut was not detectable. EDNA’s capacity to detect the aflatoxin metabolic pathway was also assessed. Actively aflatoxin-producing A. flavus strains (AF70) were successfully used to differentially detect the production of aflatoxin when A. flavus grows in an environment conducive to the production of aflatoxin (maize).
Finally, EDNA’s detection scope was assessed with eukaryotic strains having very low genetic diversity within their species (Pythium aphanidermatum). We were able to successfully discriminate the P. aphanidermatum P16 strain from P. aphanidermatum BR444; concomitantly, these two strains were differentiated from other related species (Globisporangium irregulare and Pythium deliense) in the same detection run trial. Plant Patholog