
    Computational Workflow for the Fine-Grained Analysis of Metagenomic Samples

    The development of new data-acquisition technologies has led to an enormous availability of information in almost every existing field of scientific research, while also enabling a specialization that results in field-specific software developments. To make it easier for the end user to obtain results from their data, a new computing paradigm has emerged strongly: automated workflows for processing information, which have prevailed thanks to the support they provide for assembling a complete and robust processing system. Bioinformatics is a clear example, where many institutions offer specific processing services that, in general, must be combined to obtain a global result. Workflow managers such as Galaxy [1], Swift [2] or Taverna [3] are used for the analysis of data (among others) produced by new DNA sequencing technologies, such as Next Generation Sequencing [4], which generate huge amounts of data in the field of genomics and, in particular, metagenomics. Metagenomics studies the species present in an uncultured sample collected directly from the environment; studies of interest attempt to observe variations in the composition of samples in order to identify significant differences that correlate with characteristics (phenotype) of the individuals the samples belong to, including the functional analysis of the species present in a metagenome in order to understand their consequences. Analyzing complete genomes is already a computationally demanding task, so analyzing metagenomes, in which not just one species' genome is present but those of the several species coexisting in the sample, is a herculean task. 
Metagenomic analysis therefore requires efficient algorithms capable of processing these data effectively and within a reasonable time. Some of the difficulties to be overcome are: (1) comparing samples against reference databases; (2) mapping reads to genomes using similarity estimators; (3) the processed data tend to be large and need practical means of access; (4) the particularities of each sample require specific, new programs for its analysis; (5) the visual representation of n-dimensional results for interpretation; and (6) the quality-assurance and uncertainty-verification processes at each stage. To this end we present a complete yet adaptable workflow, divided into pluggable, reusable modules connected through defined data structures, which also allows easy extension and customization to meet the demands of new experiments
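The modular design described above can be sketched in Python (a hypothetical illustration, not the authors' actual implementation): each stage consumes and produces a shared sample record, so stages can be reordered, replaced, or extended independently through a defined data structure.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Sample:
    # shared record passed between stages (hypothetical schema)
    name: str
    reads: List[str]
    annotations: dict = field(default_factory=dict)

def quality_filter(s: Sample) -> Sample:
    # toy quality-control stage: drop reads below a minimum length
    s.reads = [r for r in s.reads if len(r) >= 5]
    return s

def taxonomic_assignment(s: Sample) -> Sample:
    # placeholder mapping stage: tag each read with a fake "genome" key
    s.annotations["taxa"] = {r[:2] for r in s.reads}
    return s

def run_pipeline(sample: Sample,
                 stages: List[Callable[[Sample], Sample]]) -> Sample:
    # stages are interchangeable as long as they respect the Sample contract
    for stage in stages:
        sample = stage(sample)
    return sample

result = run_pipeline(
    Sample("S1", ["ACGTACGT", "ACG", "TTGGCCAA"]),
    [quality_filter, taxonomic_assignment],
)
print(result.reads)  # the short read was removed before assignment
```

Because every stage shares the same input/output contract, extending the workflow for a new experiment amounts to appending another function to the stage list.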

    High-performance integrated virtual environment (HIVE) tools and applications for big data analysis

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis

    CowPI: A rumen microbiome focussed version of the PICRUSt functional inference software

    Metataxonomic 16S rDNA based studies are a commonplace and useful tool in the research of the microbiome, but they do not provide the full investigative power of metagenomics and metatranscriptomics for revealing the functional potential of microbial communities. However, the use of metagenomic and metatranscriptomic technologies is hindered by high costs and the skills barrier necessary to generate and interpret the data. To address this, a tool for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was developed for inferring the functional potential of an observed microbiome profile, based on 16S data. This allows functional inferences to be made from metataxonomic 16S rDNA studies with little extra work or cost, but its accuracy relies on the availability of completely sequenced genomes of representative organisms from the community being investigated. The rumen microbiome is an example of a community traditionally underrepresented in genome and sequence databases, but recent efforts by projects such as the Global Rumen Census and Hungate 1000 have resulted in a wide sampling of 16S rDNA profiles and over 500 fully sequenced microbial genomes from this environment. Using this information we have developed CowPI, a focused version of the PICRUSt tool, provided for use by the wider scientific community in the study of the rumen microbiome. We evaluated the accuracy of CowPI and PICRUSt using two 16S datasets from the rumen microbiome: one generated from rDNA and the other from rRNA, where corresponding metagenomic and metatranscriptomic data were also available. We show that the functional profiles predicted by CowPI better match estimates for both the metagenomic and metatranscriptomic datasets than PICRUSt, and capture the higher degree of genetic variation and larger pangenomes of rumen organisms. 
Nonetheless, whilst being closer in terms of predictive power for the rumen microbiome, there were differences when compared to both the metagenomic and metatranscriptomic data, and so we recommend that, where possible, functional inferences from 16S data should not replace metagenomic and metatranscriptomic approaches. The tool can be accessed at http://www.cowpi.org and is provided to the wider scientific community for use in the study of the rumen microbiome
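The core inference step behind PICRUSt-style tools such as CowPI can be illustrated with a minimal Python sketch (the gene copy numbers and KEGG ortholog IDs below are invented for illustration): each function's predicted abundance is the sum, over taxa, of the taxon's 16S-derived abundance multiplied by a precomputed per-genome gene copy number.

```python
# Hypothetical precomputed gene copy numbers per reference genome,
# as would be derived from fully sequenced genomes (e.g. Hungate 1000).
gene_copies = {
    "TaxonA": {"K00001": 2, "K00002": 0},
    "TaxonB": {"K00001": 1, "K00002": 3},
}

def infer_functions(taxon_abundance):
    # predicted function abundance = sum over taxa of
    # (taxon abundance) x (gene copy number in that taxon's genome)
    profile = {}
    for taxon, abund in taxon_abundance.items():
        for ko, copies in gene_copies.get(taxon, {}).items():
            profile[ko] = profile.get(ko, 0) + abund * copies
    return profile

# a toy 16S abundance profile: 10 counts of TaxonA, 5 of TaxonB
predicted = infer_functions({"TaxonA": 10, "TaxonB": 5})
print(predicted)  # K00001: 10*2 + 5*1 = 25; K00002: 5*3 = 15
```

The accuracy of such an inference depends entirely on how well the reference copy-number table represents the community, which is why a rumen-specific table improves on the generic PICRUSt one.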

    MSG-Fast: Metagenomic shotgun data fast annotation using microbial gene catalogs

    Background: Current methods for annotating metagenomic shotgun sequencing (MGS) data rely on a computationally intensive, low-stringency approach of mapping each read to a generic database of proteins or reference microbial genomes. Results: We developed MGS-Fast, an analysis approach for shotgun whole-genome metagenomic data utilizing Bowtie2 DNA-DNA alignment of reads against the integrated catalog of reference genes, a database of well-annotated genes compiled from human microbiome data. This method is rapid and provides high-stringency matches (>90% DNA sequence identity) of the metagenomic reads to genes with annotated functions. We demonstrate the use of this method with data from a study of liver disease, synthetic reads, and Human Microbiome Project shotgun data, to detect differentially abundant Kyoto Encyclopedia of Genes and Genomes (KEGG) gene functions in these experiments. This rapid annotation method is freely available as a Galaxy workflow within a Docker image. Conclusions: MGS-Fast can confidently transfer functional annotations from gene databases to metagenomic reads, with speed and accuracy
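The high-stringency filter described above can be sketched as follows (a simplified illustration: real pipelines derive identity from the NM tag and CIGAR string in Bowtie2's SAM output rather than from raw string comparison as done here):

```python
def percent_identity(read: str, ref: str) -> float:
    # naive position-by-position identity between a read and the
    # catalog gene segment it aligned to (toy substitute for SAM parsing)
    matches = sum(1 for a, b in zip(read, ref) if a == b)
    return 100.0 * matches / max(len(read), len(ref))

def high_stringency(alignments, threshold=90.0):
    # keep an alignment only if DNA identity exceeds the threshold,
    # so the gene's functional annotation can be transferred confidently
    return [(read, gene) for read, gene, ref in alignments
            if percent_identity(read, ref) > threshold]

# hypothetical (read, gene_id, aligned reference segment) triples
alignments = [
    ("ACGTACGTAC", "gene1", "ACGTACGTAC"),  # 100% identity -> kept
    ("ACGTACGTAC", "gene2", "ACGTTTTTAC"),  # 70% identity  -> dropped
]
print(high_stringency(alignments))
```

Filtering at >90% identity is what allows annotations to be transferred from the gene catalog to reads without the low-stringency ambiguity of protein-level mapping.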

    Next-generation sequencing (NGS) in the microbiological world: how to make the most of your money

    The Sanger sequencing method produces relatively long DNA sequences of unmatched quality and was long considered the gold standard for sequencing DNA. Many improvements of the Sanger method, culminating in fluorescent dyes coupled with automated capillary electrophoresis, enabled the sequencing of the first genomes. Nevertheless, using this technology to sequence whole genomes was costly, laborious and time consuming, even for genomes that are relatively small in size. A major technological advance was the introduction of next-generation sequencing (NGS), pioneered by 454 Life Sciences in the early part of the 21st century. NGS allowed scientists to sequence thousands to millions of DNA molecules in a single machine run. Since then, new NGS technologies have emerged and existing NGS platforms have been improved, enabling the production of genome sequences at an unprecedented rate as well as broadening the spectrum of NGS applications. The current affordability of generating genomic information, especially with microbial samples, has resulted in a false sense of simplicity that belies the fact that many researchers still consider these technologies a black box. In this review, our objective is to identify and discuss four steps that we consider crucial to the success of any NGS-related project. These steps are: (1) the definition of the research objectives beyond sequencing and appropriate experimental planning, (2) library preparation, (3) sequencing and (4) data analysis. The goal of this review is to give an overview of the process, from sample to analysis, and to discuss how to optimize your resources to achieve the most from your NGS-based research. Regardless of the evolution and improvement of the sequencing technologies, these four steps will remain relevant

    ANASTASIA: An Automated Metagenomic Analysis Pipeline for Novel Enzyme Discovery Exploiting Next Generation Sequencing Data

    Metagenomic analysis of environmental samples provides deep insight into the enzymatic mixture of the corresponding niches, and is capable of revealing peptide sequences with novel functional properties by exploiting the high performance of next-generation sequencing (NGS) technologies. At the same time, due to their ever-increasing complexity, there is a compelling need for ever larger computational configurations to ensure proper bioinformatic analysis and fine annotation. Aiming to address the challenges of such an endeavor, we have developed a novel web-based application named ANASTASIA (automated nucleotide aminoacid sequences translational plAtform for systemic interpretation and analysis). ANASTASIA provides a rich environment of bioinformatic tools, both publicly available ones and novel, proprietary algorithms, integrated within numerous automated algorithmic workflows, which enables versatile data-processing tasks for (meta)genomic sequence datasets. ANASTASIA was initially developed in the framework of the European FP7 project HotZyme, whose aim was to perform exhaustive analysis of metagenomes derived from thermal springs around the globe and to discover new enzymes of industrial interest. ANASTASIA has since evolved into a stable and extensible environment for diversified metagenomic functional analyses, for a range of applications spanning industrial biotechnology to biomedicine, within the frame of the ELIXIR-GR project. As a showcase, we report the successful in silico mining of a novel thermostable esterase, termed "EstDZ4", from a metagenomic sample collected from a hot spring located in Krisuvik, Iceland

    CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports the use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.
    https://doi.org/10.1186/1471-2105-12-35