Search CORE

91 research outputs found

Extreme Scale De Novo Metagenome Assembly

Author: Arndt Bill
Buluc Aydin
Egan Rob
Georganas Evangelos
Goltsman Eugene
Hofmeyr Steven
Oliker Leonid
Tritt Andrew
Yelick Katherine
Publication venue
Publication date: 01/01/2018
Field of study

Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into the accurate representation of the underlying microbiomes's genomes. State-of-the-art tools require big shared memory machines and cannot handle contemporary metagenome datasets that exceed Terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore can run seamlessly on shared and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads - size 2.6 TBytes.Comment: Accepted to SC1

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Architecture for an Artificial Immune System

Author: Brooks R. A.
Jerne N. K.
Stephanie Forrest
Steven A. Hofmeyr
Publication venue: 'MIT Press - Journals'
Publication date
Field of study

Crossref

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment

Author: Buluç Aydın
Ding Nan
Ellis Marquita
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Santambrogio Marco D.
Yelick Katherine
Zeni Alberto
Publication venue
Publication date: 01/01/2020
Field of study

Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present the first GPU optimization of the popular X-drop alignment algorithm, that we named LOGAN. Results show that our high-performance multi-GPU implementation achieves up to 181.6 GCUPS and speed-ups up to 6.6x and 30.7x using 1 and 6 NVIDIA Tesla V100, respectively, over the state-of-the-art software running on two IBM Power9 processors using 168 CPU threads, with equivalent accuracy. We also demonstrate a 2.3x LOGAN speed-up versus ksw2, a state-of-art vectorized algorithm for sequence alignment implemented in minimap2, a long-read mapping software. To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6x. Finally, we adapt the Roofline model for LOGAN and demonstrate that our implementation is near-optimal on the NVIDIA Tesla V100s

arXiv.org e-Print Archive

eScholarship - University of California

to Computer Security By

Author: Steven Andrew Hofmeyr
Steven Andrew Hofmeyr
Steven Andrew Hofmeyr
Steven Hofmeyr
Publication venue
Publication date
Field of study

CiteSeerX

Terabase-scale metagenome coassembly with MetaHipMer

Author: Hofmeyr Steven,
Publication venue
Publication date: 10/08/2020
Field of study

Ezid

An Immunological Model of Distributed Detection and Its Application to Computer Security

Author: Steven Andrew Hofmeyr
Publication venue
Publication date
Field of study

This dissertation explores an immunological model of distributed detection, called negative detection, and studies its performance in the domain of intrusion detection on computer networks. The goal of the detection system is to distinguish between illegitimate behaviour (nonself ), and legitimate behaviour (self ). The detection system consists of sets of negative detectors that detect instances of nonself; these detectors are distributed across multiple locations. The negative detection model was developed previously; this research extends that previous work in several ways. Firstly, analyses are derived for the negative detection model. In particular, a framework for explicitly incorporating distribution is developed, and is used to demonstrate that negative detection is both scalable and robust. Furthermore, it is shown that any scalable distributed detection system that requires communication (memory sharing) is always less robust than a system that does not require communication..

CiteSeerX