9 research outputs found

    Cyberinfrastructure resources enabling creation of the loblolly pine reference transcriptome

    This paper was presented at the XSEDE 15 conference. Today's genomics technologies generate more sequence data than ever before possible, and at substantially lower costs, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has held a central role in biological discovery in both model and non-model organisms, with and without whole genome sequence references. A major limitation in effective building of transcriptome references is no longer the sequencing data generation itself, but the computing infrastructure and expertise needed to assemble, analyze and manage the data. Here we describe a currently available resource dedicated to achieving such goals, and its use for extensive RNA assembly of up to 1.3 billion reads representing the massive transcriptome of loblolly pine, using four major assembly software installations. The Mason cluster, an XSEDE second tier resource at Indiana University, provides the necessary fast CPU cycles, large memory, and high I/O throughput for conducting large-scale genomics research. The National Center for Genome Analysis Support, or NCGAS, provides technical support in using HPC systems, bioinformatic support for determining the appropriate method to analyze a given dataset, and practical assistance in running computations. We demonstrate that a sufficient supercomputing resource and good workflow design are elements that are essential to large eukaryotic genomics and transcriptomics projects such as the complex transcriptome of loblolly pine, gene expression data that inform annotation and functional interpretation of the largest genome sequence reference to date. This work was supported in part by USDA NIFA grant 2011-67009-30030, PineRefSeq, led by the University of California, Davis, and NCGAS funded by NSF under award No. 1062432.

    National Center for Genome Analysis Program Year 3 Report – September 15, 2013 – September 14, 2014

    On September 15, 2011, Indiana University (IU) received three years of support to establish the National Center for Genome Analysis Support (NCGAS). This technical report describes the activities of the third 12 months of NCGAS. The facilities supported by the Research Technologies division at Indiana University are supported by a number of grants. The authors would like to acknowledge that although the National Center for Genome Analysis Support is funded by NSF 1062432, our work would not be possible without the generous support of the following awards received by our parent organization, the Pervasive Technology Institute at Indiana University.
    • The Indiana University Pervasive Technology Institute was supported in part by two grants from the Lilly Endowment, Inc.
    • NCGAS has also been supported directly by the Indiana METACyt Initiative. The Indiana METACyt Initiative of Indiana University is supported in part by the Lilly Endowment, Inc.
    • This material is based in part upon work supported by the National Science Foundation under Grant No. CNS-0521433. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).

    Doctor of Philosophy

    The MAKER genome annotation and curation software tool was developed in response to increased demand for genome annotation services, secondary to decreased genome sequencing costs. MAKER currently has over 1000 registered users throughout the world. This wide adoption of MAKER has uncovered the need for additional functionalities. Here I addressed moving MAKER into the domain of plant annotation, expanding MAKER to include new methods of gene and noncoding RNA annotation, and improving usability of MAKER through documentation and community outreach. To move MAKER into the plant annotation domain, I benchmarked MAKER on the well-annotated Arabidopsis thaliana genome. MAKER performs well on the Arabidopsis genome in de novo genome annotation and was able to improve the current TAIR10 gene models by incorporating mRNA-seq data not available during the original annotation efforts. In addition to this benchmarking, I annotated the genome of the sacred lotus Nelumbo nucifera. I enabled noncoding RNA annotation in MAKER by adding the ability for MAKER to run and process the outputs of tRNAscan-SE and snoscan. These functionalities were tested on the Arabidopsis genome and used to annotate tRNAs and snoRNAs in Zea mays. The resulting version of MAKER was named MAKER-P. I added the functionality of a combiner by adding EVidenceModeler to the MAKER code base. As the number of MAKER users has grown, so have the help requests sent to the MAKER developers list. Motivated by the belief that improving the MAKER documentation would obviate the need for many of these requests, I created a media wiki that was linked to the MAKER download page, and the MAKER developers list was made searchable. Additionally, I have written a unit on genome annotation using MAKER for Current Protocols in Bioinformatics. In response to these efforts I have seen a corresponding decrease in help requests, even though the number of registered MAKER users continues to increase.
Taken together, these products and activities have moved MAKER into the domain of plant annotation, expanded MAKER to include new methods of gene and noncoding RNA annotation, and improved the usability of MAKER through documentation and community outreach.

    Forest genomics and biotechnology

    This Research Topic addresses research in genomics and biotechnology to improve the growth and quality of forest trees for wood, pulp, biorefineries and carbon capture. Forests are the world’s greatest repository of terrestrial biomass and biodiversity. Forests serve critical ecological services, supporting the preservation of fauna and flora, and water resources. Planted forests also offer a renewable source of timber, for pulp and paper production, and the biorefinery. Despite their fundamental role for society, thousands of hectares of forests are lost annually due to deforestation, pests and pathogens and urban development. As a consequence, there is an increasing need to develop trees that are more productive under lower inputs, while understanding how they adapt to the environment and respond to biotic and abiotic stress. Forest genomics and biotechnology, disciplines that study the genetic composition of trees and the methods required to modify them, began over a quarter of a century ago with the development of the first genetic maps and establishment of early methods of genetic transformation. Since then, genomics and biotechnology have impacted all research areas of forestry. Genome analyses of tree populations have uncovered genes involved in adaptation and response to biotic and abiotic stress. Genes that regulate growth and development have been identified, and in many cases their mechanisms of action have been described. Genetic transformation is now widely used to understand the roles of genes and to develop germplasm that is more suitable for commercial tree plantations. However, in contrast to many annual crops that have benefited from centuries of domestication and extensive genomic and biotechnology research, in forestry the field is still in its infancy. Thus, tremendous opportunities remain unexplored. 
This Research Topic aims to briefly summarize recent findings, to discuss long-term goals, and to think ahead about future developments and how these can be applied to improve the growth and quality of forest trees. Mini-review articles are sought in forest genomics and biotechnology, with a focus on future directions applied to (1) genetic engineering, (2) adaptation, (3) genomics of conifers and hardwoods, (4) cell wall and wood formation, (5) development, (6) metabolic engineering, (7) biotic and abiotic resistance, and (8) the biorefinery.

    Unruptured brain arteriovenous malformations : primary ONYX embolization in ARUBA (A Randomized Trial of Unruptured Brain Arteriovenous Malformations)-eligible patients

    Background and Purpose: In light of evidence from ARUBA (A Randomized Trial of Unruptured Brain Arteriovenous Malformations), neurovascular specialists had to reconsider deliberate treatment of unruptured brain arteriovenous malformations (uBAVMs). Our objective was to determine the outcomes of uBAVMs treated with primary embolization using ethylene vinyl alcohol (Onyx). Methods: Patients with uBAVM who met the inclusion criteria of ARUBA and were treated with primary Onyx embolization were included in this retrospective study. The primary outcome was the modified Rankin Scale score. Secondary outcomes were stroke or death because of uBAVM or intervention and uBAVM obliteration. Results: Sixty-one patients (mean age, 38 years) were included. The median observation period was 60 months. Patients were treated by embolization alone (41.0%), embolization and radiosurgery (57.4%), or embolization and excision (1.6%). Occlusion was achieved in 44 of 57 patients with completed treatment (77.2%). Forty-seven patients (77.1%) had no clinical impairment at the end of observation (modified Rankin Scale score of <2). Twelve patients (19.7%) reached the outcome of stroke or death because of uBAVM or intervention. Treatment-related mortality was 6.6% (4 patients). Conclusions: In uBAVM, Onyx embolization alone or combined with stereotactic radiosurgery achieves a high occlusion rate. Morbidity remains a challenge, even if it seems lower than in the ARUBA trial.

    Plataforma de supercomputación para bioinformática (Supercomputing platform for bioinformatics)

    In 2007 the University of Málaga expanded its computing resources and moved them to a new center devoted exclusively to research: the Supercomputing and Bioinnovation building, located in the Parque Tecnológico de Andalucía. This building would also house the Plataforma Andaluza de Bioinformática (Andalusian Bioinformatics Platform), along with other units and laboratories with highly specialized instrumentation. Since then I have worked as administrator of the center's supercomputing resources and as part of the bioinformatics team, supporting a large number of researchers in their daily work. Seeing both sides made it easy to detect gaps in bioinformatics that could be filled by suitably applying the available computing resources, and that was the seed that led us to begin the first pieces of work that make up this study. Because it was carried out in such a problem-solving-oriented environment, this thesis is eminently practical: each contribution rests on substantial theoretical study but culminates in a concrete practical result that can be applied to everyday problems in bioinformatics or even in other research areas. Thus, to make supercomputing resources more accessible to bioinformaticians, we have created an automatic generator of web interfaces for command-line programs, which lets jobs run on supercomputing resources transparently to the user. We also contribute a virtual desktop system that gives remote access to a set of preinstalled programs with visual interfaces for analyzing small datasets or visualizing more complex results generated with supercomputing resources.
To optimize the use of supercomputing resources we have designed a new algorithm for distributed task execution, which can be used both in designing new tools and in optimizing the execution of existing programs. In addition, concerned by the growth in the amount of data produced by high-throughput sequencing techniques, we contribute a new sequence compression format that, besides reducing storage space, allows any stored sequence to be searched for and extracted quickly without decompressing the whole file. In developing new algorithms to solve specific biological problems, we provide four new tools covering the search for divergent regions in alignments, the preprocessing and cleaning of reads obtained by high-throughput sequencing, the analysis of transcriptomes of non-model species obtained through de novo assemblies, and a prototype for annotating incomplete genomic sequences. As a solution for the dissemination and long-term storage of results from various research projects, a generic system of virtual machines for transcriptomics databases has been developed and is already being used in several projects. Finally, in order to disseminate the results of our work, all the algorithms and tools produced in this thesis have been published as open source at https://github.com/dariogf.