
    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly, for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to established lists, arguing that at least two key patterns, sorting and hashing, are missing.
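    To make the hashing motif concrete, consider k-mer counting, a step underlying profiling and assembly. The sketch below is a minimal serial version plus a hash-based ownership function of the kind a distributed implementation might use to route asynchronous updates; the function names and the use of Python are illustrative, not taken from the paper.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across a set of reads.
    In a distributed setting, each increment would instead be sent as an
    asynchronous remote update to the process that owns the k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def owner(kmer, nprocs):
    """Hash-based partitioning: the k-mer's hash picks its owner process,
    spreading updates irregularly across the machine. (A real system
    would use a seed-stable hash rather than Python's salted hash().)"""
    return hash(kmer) % nprocs

print(kmer_counts(["GATTACA", "TTACAG"], k=3))
```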

    Extreme Scale De Novo Metagenome Assembly

    Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiomes' genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore runs seamlessly on shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads and 2.6 TB in size.
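    To make the de Bruijn graph approach concrete, here is a minimal sketch of how such a graph is built from reads for one value of k; an iterative assembler repeats this for several k values, carrying contigs forward. This illustrates the underlying data structure only, not MetaHipMer's distributed UPC implementation.

```python
def de_bruijn_edges(reads, k):
    """Build the edge set of a de Bruijn graph: nodes are (k-1)-mers, and
    every k-mer in a read contributes an edge from its prefix to its suffix."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges

# Overlapping reads produce a path that spells out the underlying sequence.
print(sorted(de_bruijn_edges(["GATTACA", "TTACAG"], k=4)))
```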

    A Secure Cooperative Sensing Protocol for Cognitive Radio Networks

    Cognitive radio networks sense spectrum occupancy and manage themselves to operate in unused bands without disturbing licensed users. Spectrum sensing is more accurate if performed jointly by several reliable nodes. Although cooperative sensing is an active area of research, the secure authentication of local sensing reports remains unsolved, leaving the system open to falsified results. This paper presents a distributed protocol based on digital signatures and hash functions, together with an analysis of its security features. The system allows a final sensing decision to be determined from multiple sources in a quick and secure way.
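    A minimal sketch of the sign-and-verify step such a protocol relies on, using Ed25519 signatures over a hashed report (via the third-party `cryptography` package). The report fields, the JSON encoding, and the final fusion step are illustrative assumptions, not the paper's protocol.

```python
import hashlib, json, time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def sign_report(priv, node_id, channel, busy):
    # Illustrative report layout; the paper's message format may differ.
    report = {"node": node_id, "channel": channel, "busy": busy, "ts": time.time()}
    payload = json.dumps(report, sort_keys=True).encode()
    digest = hashlib.sha256(payload).digest()   # hash the report, then sign the digest
    return report, priv.sign(digest)

def verify_report(pub, report, sig):
    digest = hashlib.sha256(json.dumps(report, sort_keys=True).encode()).digest()
    try:
        pub.verify(sig, digest)                 # raises InvalidSignature on tampering
        return True
    except InvalidSignature:
        return False

# A fusion center would accept only authenticated reports, then combine them
# (e.g., by majority vote) into the final sensing decision.
priv = Ed25519PrivateKey.generate()
report, sig = sign_report(priv, "node-7", channel=5, busy=True)
assert verify_report(priv.public_key(), report, sig)
```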

    Using cascading Bloom filters to improve the memory usage for de Bruijn graphs

    De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3], which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with an insignificant impact on construction time. At the same time, our experiments showed better query times than [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs.
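    A minimal sketch of the cascading idea, under simplifying assumptions: level 1 stores the true k-mer set; level 2 stores level 1's false positives, drawn from the set of k-mers that can actually be queried (e.g., neighbors of true k-mers during graph traversal); level 3 stores the true k-mers that level 2 wrongly accepts; and so on, with a small exact set closing the cascade. Bit-array sizes, the hashing scheme, and parameter choices here are illustrative, not those of the paper.

```python
import hashlib

class Bloom:
    """Plain Bloom filter (one byte per bit for clarity; real filters pack bits)."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, bytearray(m)
    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m
    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1
    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

def build_cascade(true_set, query_universe, m, k, depth=3):
    """Alternate levels: even levels store true elements, odd levels store the
    previous level's false positives; the returned residual is a small exact set."""
    filters, ins, outs = [], set(true_set), set(query_universe) - set(true_set)
    for _ in range(depth):
        f = Bloom(m, k)
        for x in ins:
            f.add(x)
        filters.append(f)
        ins, outs = {x for x in outs if x in f}, ins  # next level holds this level's FPs
    return filters, ins

def member(filters, residual, item):
    # Assumes queries come from true_set | query_universe, as in graph traversal.
    for d, f in enumerate(filters):
        if item not in f:
            return d % 2 == 1   # rejected by an odd level => item is a true k-mer
    return (item in residual) == (len(filters) % 2 == 0)
```

    The exact residual set stays small precisely because each level only needs to store the false positives surviving from the previous one, which shrink geometrically.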

    Efficient pebbling for list traversal synopses

    We show how to support efficient back traversal in a unidirectional list, using small memory and with essentially no slowdown in forward steps. Using $O(\log n)$ memory for a list of size $n$, the $i$'th back-step from the farthest point reached so far takes $O(\log i)$ time in the worst case, while the overhead per forward step is at most $\epsilon$ for an arbitrarily small constant $\epsilon > 0$. An arbitrary sequence of forward and back steps is allowed. A full trade-off between memory usage and time per back-step is presented: $k$ vs. $k n^{1/k}$ and vice versa. Our algorithms are based on a novel pebbling technique which moves pebbles on a virtual binary, or $t$-ary, tree that can only be traversed in a pre-order fashion. The compact data structures used by the pebbling algorithms, called list traversal synopses, extend to general directed graphs and have other interesting applications, including memory-efficient hash-chain implementation. Perhaps the most surprising application is in showing that, for any program, arbitrary rollback steps can be efficiently supported with small overhead in memory and marginal overhead in ordinary execution. More concretely: let $P$ be a program that runs for at most $T$ steps, using memory of size $M$. Then, at the cost of recording the input used by the program and increasing the memory by a factor of $O(\log T)$ to $O(M \log T)$, the program $P$ can be extended to support an arbitrary sequence of forward execution and rollback steps: the $i$'th rollback step takes $O(\log i)$ time in the worst case, while forward steps take $O(1)$ time in the worst case and $1 + \epsilon$ amortized time per step.
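    The sketch below conveys the underlying memory/time trade-off with a much simpler checkpointing scheme than the paper's pebbling technique: it keeps $O(\log n)$ checkpoints (one per power of two) over a forward-only list and replays from the nearest one on a back-step. It does not achieve the paper's $O(\log i)$ worst-case bound; the node layout and all names are assumptions for illustration.

```python
class ListCursor:
    """Cursor over a singly linked list (nodes: {'val': v, 'next': node}) that
    supports back-steps while keeping only O(log n) checkpoints. A simplified
    sketch of the trade-off: the paper's pebbling technique additionally
    bounds the i'th back-step by O(log i) worst case, which this does not."""

    def __init__(self, head):
        self.head = head                  # position 0 is always recoverable
        self.pos, self.node = 0, head
        self.ckpt = {}                    # level j -> (position, node), position % 2**j == 0

    def forward(self):
        self.node, self.pos = self.node['next'], self.pos + 1
        j = 0
        while self.pos % (1 << j) == 0:   # refresh every level dividing the new position
            self.ckpt[j] = (self.pos, self.node)
            j += 1

    def back(self):
        target = self.pos - 1
        # start from the nearest checkpoint at or before the target (head as fallback)
        p, n = max([(0, self.head)] +
                   [c for c in self.ckpt.values() if c[0] <= target],
                   key=lambda c: c[0])
        while p < target:                 # replay forward to the target
            n, p = n['next'], p + 1
        self.ckpt = {j: c for j, c in self.ckpt.items() if c[0] <= target}
        self.pos, self.node = target, n

# tiny usage example: walk forward nine steps, then two steps back
head = None
for v in range(9, -1, -1):
    head = {'val': v, 'next': head}
cur = ListCursor(head)
for _ in range(9):
    cur.forward()
cur.back(); cur.back()
print(cur.node['val'])  # 7
```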