1,968 research outputs found

Extreme Scale De Novo Metagenome Assembly

Author: Arndt Bill
Buluc Aydin
Egan Rob
Georganas Evangelos
Goltsman Eugene
Hofmeyr Steven
Oliker Leonid
Tritt Andrew
Yelick Katherine
Publication date: 01/01/2018

Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiomes' genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore can run seamlessly on shared and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads (2.6 TBytes). Comment: Accepted to SC1

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

MSPKmerCounter: A Fast and Memory Efficient Approach for K-mer Counting

Author: Li Yang
Xifeng Yan
Publication date: 25/05/2015

A major challenge in next-generation genome sequencing (NGS) is to assemble massive numbers of overlapping short reads that are randomly sampled from DNA fragments. To complete the assembly, one needs to finish a fundamental task in many leading assembly algorithms: counting the number of occurrences of k-mers (length-k substrings in sequences). The counting results are critical for many components in assembly (e.g., variant detection and read error correction). For large genomes, the k-mer counting task can easily consume a huge amount of memory, making large-scale parallel assembly on commodity servers impossible. In this paper, we develop MSPKmerCounter, a disk-based approach, to efficiently perform k-mer counting for large genomes using a small amount of memory. Our approach is based on a novel technique called Minimum Substring Partitioning (MSP). MSP breaks short reads into multiple disjoint partitions such that each partition can be loaded into memory and processed individually. By leveraging the overlaps among the k-mers derived from the same short read, MSP can achieve an astonishing compression ratio, so that the I/O cost can be significantly reduced. For the task of k-mer counting, MSPKmerCounter offers a very fast and memory-efficient solution. Experimental results on large real-life short-read data sets demonstrate that MSPKmerCounter can achieve better overall performance than state-of-the-art k-mer counting approaches. MSPKmerCounter is available at http://www.cs.ucsb.edu/~yangli/MSPKmerCounte
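The partition-then-count idea in this abstract can be illustrated with a small in-memory sketch. It is not the MSPKmerCounter implementation: the function names, the use of the lexicographically smallest length-p substring as the partition key, the CRC32 bucketing, and the omission of reverse complements and disk I/O are all assumptions made for illustration. Consecutive k-mers that share the same key are stored once as a longer "super k-mer", which is where the compression comes from.

```python
import zlib
from collections import Counter, defaultdict


def minimizer(kmer, p):
    """Lexicographically smallest length-p substring of a k-mer (assumed key)."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def bucket(key, n_parts):
    """Map a partition key to a partition index with a deterministic hash."""
    return zlib.crc32(key.encode("ascii")) % n_parts


def partition_reads(reads, k, p, n_parts):
    """Split reads into super k-mers and group them by partition."""
    parts = defaultdict(list)
    for read in reads:
        if len(read) < k:
            continue  # read too short to contain any k-mer
        start = 0
        current = minimizer(read[0:k], p)
        for i in range(1, len(read) - k + 1):
            m = minimizer(read[i:i + k], p)
            if m != current:
                # k-mers start..i-1 share one key: store them as one substring
                parts[bucket(current, n_parts)].append(read[start:i - 1 + k])
                start, current = i, m
        parts[bucket(current, n_parts)].append(read[start:])
    return parts


def count_partition(super_kmers, k):
    """Count the k-mers of one partition; only this partition is in memory."""
    counts = Counter()
    for s in super_kmers:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def count_kmers(reads, k, p, n_parts):
    """Count all k-mers by processing the partitions one at a time."""
    total = Counter()
    for super_kmers in partition_reads(reads, k, p, n_parts).values():
        total.update(count_partition(super_kmers, k))
    return total
```

Because every occurrence of a given k-mer has the same key, all of its occurrences land in the same partition, so the per-partition counts can be merged without reconciling duplicates. In a disk-based setting each partition would be written to its own file and loaded separately.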

arXiv.org e-Print Archive

CiteSeerX

Research in the effective implementation of guidance computers with large scale arrays Interim report

Author: Burke J. A.
Disparte C. P.
Erwin F. D.
Mc Kevitt J. F.
Pariser J. J.
Schardin C. H.

Functional logic character implementation in breadboard design of NASA modular compute

NASA Technical Reports Server

T-infinity: The Dependency Inversion Principle for Rapid and Sustainable Multidisciplinary Software Development

Author: Anderson William K.
Biedron Robert T.
Carlson Jan-Renee
Druyor Cameron T.
Jacobson Kevin E.
Jones William T.
Kleb William L.
Nielsen Eric J.
O'Connell Matthew D.
Park Michael A.
Thompson Kyle B.
Zhang Cindy

The CFD Vision 2030 Study recommends that "NASA should develop and maintain an integrated simulation and software development infrastructure to enable rapid CFD technology maturation.... [S]oftware standards and interfaces must be emphasized and supported whenever possible, and open source models for noncritical technology components should be adopted." The current paper presents an approach to an open source development architecture, named T-infinity, for accelerated research in CFD, leveraging the Dependency Inversion Principle to realize plugins that communicate through collections of functions without exposing internal data structures. Steady state flow visualization, mesh adaptation, fluid-structure interaction, and overset domain capabilities are demonstrated through compositions of plugins via standardized abstract interfaces without the need for source code dependencies between disciplines. Plugins interact through abstract interfaces, thereby avoiding N^2 direct code-to-code data structure coupling, where N is the number of codes. This plugin architecture enhances sustainable development by controlling the interaction between components to limit software complexity growth. The use of T-infinity abstract interfaces enables multidisciplinary application developers to leverage legacy applications alongside newly developed capabilities. A description of interface details is deferred until they are more thoroughly tested and can be closed to modification.
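A minimal sketch of the Dependency Inversion idea described in this abstract, assuming a hypothetical `MeshInterface` with two functions. None of these names come from T-infinity; they only illustrate how a consumer can depend on an abstract collection of functions rather than on another code's data structures.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple

Point = Tuple[float, float, float]


class MeshInterface(ABC):
    """Hypothetical abstract interface: functions only, no exposed storage."""

    @abstractmethod
    def node_count(self) -> int:
        ...

    @abstractmethod
    def node_coordinates(self, node: int) -> Point:
        ...


class StructuredLineMesh(MeshInterface):
    """Plugin A: stores evenly spaced x-coordinates in a private list."""

    def __init__(self, n: int, length: float) -> None:
        self._xs = [length * i / (n - 1) for i in range(n)]

    def node_count(self) -> int:
        return len(self._xs)

    def node_coordinates(self, node: int) -> Point:
        return (self._xs[node], 0.0, 0.0)


class DictMesh(MeshInterface):
    """Plugin B: a different internal layout behind the same interface."""

    def __init__(self, points: Dict[int, Point]) -> None:
        self._points = dict(points)

    def node_count(self) -> int:
        return len(self._points)

    def node_coordinates(self, node: int) -> Point:
        return self._points[node]


def bounding_box(mesh: MeshInterface) -> Tuple[Point, Point]:
    """Consumer written once against the interface, not against any plugin."""
    pts = [mesh.node_coordinates(i) for i in range(mesh.node_count())]
    lo = tuple(min(p[d] for p in pts) for d in range(3))
    hi = tuple(max(p[d] for p in pts) for d in range(3))
    return lo, hi
```

With N codes, each code implements or consumes the shared interface once, so adding a new plugin does not require touching the others. Direct pairwise coupling would instead need a converter for each pair of internal data structures.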

NASA Technical Reports Server

Unified Framework for Finite Element Assembly

Author: Alnæs Martin Sandve
Langtangen Hans Petter
Logg Anders
Mardal Kent-Andre
Skavhaug Ola
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2009

At the heart of any finite element simulation is the assembly of matrices and vectors from discrete variational forms. We propose a general interface between problem-specific and general-purpose components of finite element programs. This interface is called Unified Form-assembly Code (UFC). A wide range of finite element problems is covered, including mixed finite elements and discontinuous Galerkin methods. We discuss how the UFC interface enables implementations of variational form evaluation to be independent of mesh and linear algebra components. UFC does not depend on any external libraries, and is released into the public domain.
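The separation between problem-specific and general-purpose components can be sketched for a 1D Poisson problem with linear elements. This is not the UFC API: `tabulate_element_matrix` and `assemble` are hypothetical names, and a dense nested list stands in for a real linear algebra backend.

```python
def tabulate_element_matrix(x0, x1):
    """Problem-specific part: local stiffness matrix of -u'' on [x0, x1]
    for linear (P1) basis functions."""
    h = x1 - x0
    return [[1.0 / h, -1.0 / h],
            [-1.0 / h, 1.0 / h]]


def assemble(nodes, cells, element_matrix):
    """General-purpose part: loop over cells and add each local matrix
    into the global matrix. It knows nothing about the variational form."""
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for cell in cells:
        Ae = element_matrix(nodes[cell[0]], nodes[cell[1]])
        for a, i in enumerate(cell):      # local row a -> global row i
            for b, j in enumerate(cell):  # local column b -> global column j
                A[i][j] += Ae[a][b]
    return A


# Two cells of width 0.5 on the unit interval.
nodes = [0.0, 0.5, 1.0]
cells = [(0, 1), (1, 2)]
A = assemble(nodes, cells, tabulate_element_matrix)
```

Swapping in a different `element_matrix` callable changes the variational form without touching the assembly loop, which is the kind of independence the abstract describes.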

arXiv.org e-Print Archive

Crossref

Chalmers Research

The finite element machine: An experiment in parallel processing

Author: Adams L.
Crockett T. W.
Knott J. D.
Peebles S. W.
Storaasli O. O.

The finite element machine is a prototype computer designed to support parallel solutions to structural analysis problems. The hardware architecture and support software for the machine, initial solution algorithms and test applications, and preliminary results are described.

NASA Technical Reports Server

Using Rapid Prototyping in Computer Architecture Design Laboratories

Author: Binh Dao
Henry Owen
James Hamblen
Sudhakar Yalamanchili
Publication date: 01/01/1996

This paper describes the undergraduate computer architecture courses and laboratories introduced at Georgia Tech during the past two years. A core sequence of six required courses for computer engineering students has been developed. In this paper, emphasis is placed upon the new core laboratories, which utilize commercial CAD tools, FPGAs, hardware emulators, and a VHDL-based rapid prototyping approach to simulate, synthesize, and implement prototype computer hardware.

CiteSeerX

Crossref