    Extreme Scale De Novo Metagenome Assembly

    Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and can therefore run seamlessly on shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads (2.6 TBytes in size). Comment: Accepted to SC1
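
    To make the de Bruijn graph idea concrete, the following is a minimal single-process sketch, not MetaHipMer's implementation. It builds a graph for one fixed k and extends a contig along unambiguous edges; the function names, the toy reads, and the choice k=4 are all assumptions made for illustration. An iterative approach would presumably repeat a step like this over several values of k.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix (hypothetical layout).
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while exactly one successor exists."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Toy overlapping reads (assumed data, error-free for simplicity).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
print(contig)
```

    Real reads contain errors and repeats, which create branches in the graph; the walk above simply stops at the first ambiguous node.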

    MSPKmerCounter: A Fast and Memory Efficient Approach for K-mer Counting

    A major challenge in next-generation genome sequencing (NGS) is to assemble massive numbers of overlapping short reads that are randomly sampled from DNA fragments. To complete the assembly, one needs to finish a fundamental task in many leading assembly algorithms: counting the number of occurrences of k-mers (length-k substrings in sequences). The counting results are critical for many components in assembly (e.g. variant detection and read error correction). For large genomes, the k-mer counting task can easily consume a huge amount of memory, making large-scale parallel assembly on commodity servers impossible. In this paper, we develop MSPKmerCounter, a disk-based approach, to efficiently perform k-mer counting for large genomes using a small amount of memory. Our approach is based on a novel technique called Minimum Substring Partitioning (MSP). MSP breaks short reads into multiple disjoint partitions such that each partition can be loaded into memory and processed individually. By leveraging the overlaps among the k-mers derived from the same short read, MSP can achieve a very high compression ratio so that the I/O cost can be significantly reduced. For the task of k-mer counting, MSPKmerCounter offers a very fast and memory-efficient solution. Experimental results on large real-life short read data sets demonstrate that MSPKmerCounter can achieve better overall performance than state-of-the-art k-mer counting approaches. MSPKmerCounter is available at http://www.cs.ucsb.edu/~yangli/MSPKmerCounte
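
    The partitioning idea can be sketched in a few lines. The code below is an in-memory illustration under simplifying assumptions, not the MSPKmerCounter implementation: partitions are keyed directly by the lexicographically smallest length-p substring of each k-mer (a real tool would more likely hash that key to a bounded number of disk files), reverse complements are ignored, and the function names and the values of k and p are hypothetical.

```python
from collections import Counter, defaultdict


def min_substring(kmer, p):
    """Lexicographically smallest length-p substring of a k-mer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def partition_reads(reads, k, p):
    """Merge runs of adjacent k-mers that share a minimum p-substring.

    Each run is stored as one substring of the read (a "super k-mer"),
    which is where the compression comes from.
    """
    partitions = defaultdict(list)
    for read in reads:
        if len(read) < k:
            continue
        start = 0
        current = min_substring(read[0:k], p)
        for i in range(1, len(read) - k + 1):
            m = min_substring(read[i:i + k], p)
            if m != current:
                # k-mers at positions start..i-1 share `current`.
                partitions[current].append(read[start:i - 1 + k])
                start, current = i, m
        partitions[current].append(read[start:])
    return partitions


def count_kmers(reads, k, p):
    """Count k-mers one partition at a time."""
    counts = Counter()
    for super_kmers in partition_reads(reads, k, p).values():
        # Identical k-mers always land in the same partition, so each
        # partition could be loaded and counted independently.
        for s in super_kmers:
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
    return counts


reads = ["ATGGCGTGCAAT", "GGCGTGCAATGG"]  # toy data (assumed)
counts = count_kmers(reads, k=4, p=2)
```

    Because every occurrence of a given k-mer has the same minimum p-substring, counting partition by partition gives the same totals as counting all k-mers at once.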

    Research in the effective implementation of guidance computers with large scale arrays Interim report

    Functional logic character implementation in breadboard design of NASA modular compute

    T-infinity: The Dependency Inversion Principle for Rapid and Sustainable Multidisciplinary Software Development

    The CFD Vision 2030 Study recommends that NASA should develop and maintain an integrated simulation and software development infrastructure to enable rapid CFD technology maturation.... [S]oftware standards and interfaces must be emphasized and supported whenever possible, and open source models for noncritical technology components should be adopted. The current paper presents an approach to an open source development architecture, named T-infinity, for accelerated research in CFD, leveraging the Dependency Inversion Principle to realize plugins that communicate through collections of functions without exposing internal data structures. Steady state flow visualization, mesh adaptation, fluid-structure interaction, and overset domain capabilities are demonstrated through compositions of plugins via standardized abstract interfaces without the need for source code dependencies between disciplines. Plugins interact through abstract interfaces, thereby avoiding N^2 direct code-to-code data structure coupling, where N is the number of codes. This plugin architecture enhances sustainable development by controlling the interaction between components to limit software complexity growth. The use of T-infinity abstract interfaces enables multidisciplinary application developers to leverage legacy applications alongside newly-developed capabilities. A description of interface details is deferred until they are more thoroughly tested and can be closed to modification.
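
    As a generic illustration of the Dependency Inversion Principle described here, the sketch below shows two plugins that store their data differently but are used through one abstract interface. Every class and function name is hypothetical; this is not the T-infinity interface, whose details the abstract explicitly defers.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class MeshInterface(ABC):
    """Abstract interface: a collection of functions, no exposed storage."""

    @abstractmethod
    def node_count(self) -> int:
        ...

    @abstractmethod
    def node_coordinates(self, node_id: int) -> Tuple[float, float, float]:
        ...


class LineMeshPlugin(MeshInterface):
    """Hypothetical plugin storing evenly spaced points on [0, 1]."""

    def __init__(self, nx: int):
        self._xs = [i / (nx - 1) for i in range(nx)]  # internal detail

    def node_count(self) -> int:
        return len(self._xs)

    def node_coordinates(self, node_id: int) -> Tuple[float, float, float]:
        return (self._xs[node_id], 0.0, 0.0)


class PointCloudPlugin(MeshInterface):
    """Hypothetical plugin with a different internal layout."""

    def __init__(self, points):
        self._points = {i: p for i, p in enumerate(points)}

    def node_count(self) -> int:
        return len(self._points)

    def node_coordinates(self, node_id: int) -> Tuple[float, float, float]:
        return self._points[node_id]


def x_extent(mesh: MeshInterface) -> Tuple[float, float]:
    """Consumer code depends only on the interface, never on a plugin."""
    xs = [mesh.node_coordinates(i)[0] for i in range(mesh.node_count())]
    return (min(xs), max(xs))


line_extent = x_extent(LineMeshPlugin(5))
cloud_extent = x_extent(PointCloudPlugin([(2.0, 0.0, 0.0), (-1.0, 3.0, 0.0)]))
```

    With N codes, each one implements the shared interface once, instead of every pair of codes needing its own data structure adapter.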

    Unified Framework for Finite Element Assembly

    At the heart of any finite element simulation is the assembly of matrices and vectors from discrete variational forms. We propose a general interface between problem-specific and general-purpose components of finite element programs. This interface is called Unified Form-assembly Code (UFC). A wide range of finite element problems is covered, including mixed finite elements and discontinuous Galerkin methods. We discuss how the UFC interface enables implementations of variational form evaluation to be independent of mesh and linear algebra components. UFC does not depend on any external libraries, and is released into the public domain.
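
    The separation between problem-specific and general-purpose code can be illustrated with a small assembly loop. This is a pure-Python sketch under stated assumptions, not the UFC interface itself: the class name, the method signature, and the dense list-of-lists matrix are chosen for brevity, and the example form is the 1D Poisson stiffness matrix with linear elements.

```python
class PoissonForm1D:
    """Problem-specific part: element matrix for -u'' with linear elements."""

    def tabulate_tensor(self, x0, x1):
        # Local stiffness matrix on an interval of length h = x1 - x0.
        h = x1 - x0
        return [[1.0 / h, -1.0 / h],
                [-1.0 / h, 1.0 / h]]


def assemble(form, vertices, cells):
    """General-purpose part: knows the mesh and the matrix, not the PDE."""
    n = len(vertices)
    A = [[0.0] * n for _ in range(n)]
    for cell in cells:
        local = form.tabulate_tensor(*(vertices[v] for v in cell))
        # Scatter local entries into the global matrix.
        for a, i in enumerate(cell):
            for b, j in enumerate(cell):
                A[i][j] += local[a][b]
    return A


# Two elements on [0, 1] (toy mesh, assumed).
vertices = [0.0, 0.5, 1.0]
cells = [(0, 1), (1, 2)]
A = assemble(PoissonForm1D(), vertices, cells)
```

    Swapping in a different form object changes the equation being assembled without touching the loop, and replacing the list-of-lists with a sparse matrix would not touch the form.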

    The finite element machine: An experiment in parallel processing

    The finite element machine is a prototype computer designed to support parallel solutions to structural analysis problems. The hardware architecture and support software for the machine, initial solution algorithms and test applications, and preliminary results are described.

    Using Rapid Prototyping in Computer Architecture Design Laboratories

    This paper describes the undergraduate computer architecture courses and laboratories introduced at Georgia Tech during the past two years. A core sequence of six required courses for computer engineering students has been developed. In this paper, emphasis is placed upon the new core laboratories, which use commercial CAD tools, FPGAs, hardware emulators, and a VHDL-based rapid prototyping approach to simulate, synthesize, and implement prototype computer hardware.