
    Cyberinfrastructure resources enabling creation of the loblolly pine reference transcriptome

    This paper was presented at the XSEDE 15 conference.

    Today's genomics technologies generate more sequence data than ever before, at substantially lower cost, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has held a central role in biological discovery in both model and non-model organisms, with and without whole-genome sequence references. A major limitation in building transcriptome references is no longer generating the sequencing data itself, but the computing infrastructure and expertise needed to assemble, analyze, and manage the data. Here we describe a currently available resource dedicated to these goals, and its use for extensive RNA assembly of up to 1.3 billion reads representing the massive transcriptome of loblolly pine, using four major assembly software packages. The Mason cluster, an XSEDE second-tier resource at Indiana University, provides the fast CPU cycles, large memory, and high I/O throughput needed for large-scale genomics research. The National Center for Genome Analysis Support (NCGAS) provides technical support in using HPC systems, bioinformatics support in determining the appropriate method to analyze a given dataset, and practical assistance in running computations. We demonstrate that a sufficient supercomputing resource and good workflow design are essential to large eukaryotic genomics and transcriptomics projects such as the complex transcriptome of loblolly pine, whose gene expression data inform annotation and functional interpretation of the largest genome sequence reference to date.

    This work was supported in part by USDA NIFA grant 2011-67009-30030, PineRefSeq, led by the University of California, Davis, and by NCGAS, funded by NSF under award No. 1062432.

    Jetstream: A self-provisioned, scalable science and engineering cloud environment

    The paper describes the motivation behind Jetstream, its functions, hardware configuration, software environment, user interface, design, use cases, relationships with other projects such as Wrangler and iPlant, and challenges in implementation.

    Funded by the National Science Foundation under Award #ACI-144560.

    National Center for Genome Analysis Program Year 3 Report – September 15, 2013 – September 14, 2014

    On September 15, 2011, Indiana University (IU) received three years of support to establish the National Center for Genome Analysis Support (NCGAS). This technical report describes the activities of the third 12 months of NCGAS.

    The facilities of the Research Technologies division at Indiana University are supported by a number of grants. The authors would like to acknowledge that although the National Center for Genome Analysis Support is funded by NSF 1062432, our work would not be possible without the generous support of the following awards received by our parent organization, the Pervasive Technology Institute at Indiana University:

    • The Indiana University Pervasive Technology Institute was supported in part by two grants from the Lilly Endowment, Inc.
    • NCGAS has also been supported directly by the Indiana METACyt Initiative. The Indiana METACyt Initiative of Indiana University is supported in part by the Lilly Endowment, Inc.
    • This material is based in part upon work supported by the National Science Foundation under Grant No. CNS-0521433.

    Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).

    National Center for Genome Analysis Program Year 2 Report – September 15, 2012 – September 14, 2013

    On September 15, 2011, Indiana University (IU) received three years of support to establish the National Center for Genome Analysis Support (NCGAS). This technical report describes the activities of the second 12 months of NCGAS.

    The facilities of the Research Technologies division at Indiana University are supported by a number of grants. The authors would like to acknowledge that although the National Center for Genome Analysis Support is funded by NSF 1062432, our work would not be possible without the generous support of the following awards received by our parent organization, the Pervasive Technology Institute at Indiana University:

    • The Indiana University Pervasive Technology Institute was supported in part by two grants from the Lilly Endowment, Inc.
    • NCGAS has also been supported directly by the Indiana METACyt Initiative. The Indiana METACyt Initiative of Indiana University is supported in part by the Lilly Endowment, Inc.
    • This material is based in part upon work supported by the National Science Foundation under Grant No. CNS-0521433.

    Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).

    Galaxy based BLAST submission to distributed national high throughput computing resources

    To assist the bioinformatics community in leveraging the national cyberinfrastructure, the National Center for Genome Analysis Support (NCGAS), along with Indiana University's High Throughput Computing (HTC) group, has engineered a method that uses Galaxy to submit BLAST jobs to the Open Science Grid (OSG). OSG is a collaboration of resource providers that utilize opportunistic cycles at more than 100 universities and research centers in the US. BLAST jobs make up a significant portion of the research conducted on NCGAS resources; moving jobs that are conducive to an HTC environment to the national cyberinfrastructure would alleviate load on NCGAS resources and provide a cost-effective way to obtain more cycles and reduce the unmet needs of bioinformatics researchers. Until now, researchers have tackled this issue by purchasing additional resources or enlisting collaborators doing the same type of research, while HTC experts have focused on expanding the number of resources available to historically HTC-friendly science workflows. In this paper, we bring together expertise from both areas to show how a bioinformatics researcher using their normal interface, Galaxy, can seamlessly access the OSG, which routinely supplies researchers with millions of compute hours daily. Efficient use of these results will supply additional compute time to researchers and help meet the still-unmet need for BLAST computing cycles.

    This material is based upon work supported by the National Science Foundation under Grant No. ABI-1062432, Craig Stewart, PI; William Barnett, Matthew Hahn, and Michael Lynch, co-PIs. This work was supported in part by the Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute. Any opinions presented here are those of the presenter(s) and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

    Report of the 2014 NSF Cybersecurity Summit for Large Facilities and Cyberinfrastructure

    This event was supported in part by the National Science Foundation under Grant Number 1234408. Any opinions, findings, and conclusions or recommendations expressed at the event or in this report are those of the authors and do not necessarily reflect the views of the National Science Foundation

    Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services

    We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis, including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or any local computing infrastructure. We report performance and cost results for some representative workloads.

    Usage of Indiana University computation and data cyberinfrastructure in FY 2011/2012 and assessment of future needs

    This report details the past and current cyberinfrastructure resources that have been deployed by the Research Technologies (RT) division of University Information Technology Services to support research and scholarly activities at IU. It also presents data and a detailed analysis of system usage and services supported by RT for the FY 2011/2012 period, projects future usage trends based on these data, and provides several recommendations for the most effective ways to meet the growing need for high performance computing resources in research and scholarly endeavors.

    This research was supported in part by the Pervasive Technology Institute, the Indiana Metabolomics and Cytomics Initiative, and the Indiana Genomics Initiative, all of which have been supported in part by the Lilly Endowment, Inc.; by Grant number 1U24AA014818-01 from NIAAA/NIH, whose contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIAAA/NIH; by the National Science Foundation under Grants CDA-9601632, EIA-0116050, ACI-0338618, OCI-0451237, OCI-0535258, OCI-0504075, CNS-0723054, and CNS-0521433; and by Shared University Research grants from IBM, Inc. to Indiana University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies represented above.

    The iPlant Collaborative: Cyberinfrastructure for Plant Biology

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote interactions between biology and computer science research, and to train the next generation of scientists in the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products, and the expectation that natural ecosystems be managed sustainably, will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops that address specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets to the iPlant cyberinfrastructure, enabling plant scientists to perform complex analyses on large datasets without needing to master the command line or high-performance computational services.