3,030 research outputs found
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing
Extreme Scale De Novo Metagenome Assembly
Metagenome assembly is the process of transforming a set of short,
overlapping, and potentially erroneous DNA segments from environmental samples
into the accurate representation of the underlying microbiomes's genomes.
State-of-the-art tools require big shared memory machines and cannot handle
contemporary metagenome datasets that exceed Terabytes in size. In this paper,
we introduce the MetaHipMer pipeline, a high-quality and high-performance
metagenome assembler that employs an iterative de Bruijn graph approach.
MetaHipMer leverages a specialized scaffolding algorithm that produces long
scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is
end-to-end parallelized using the Unified Parallel C language and therefore can
run seamlessly on shared and distributed-memory systems. Experimental results
show that MetaHipMer matches or outperforms the state-of-the-art tools in terms
of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and
is able to assemble previously intractable grand challenge metagenomes. We
demonstrate the unprecedented capability of MetaHipMer by computing the first
full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion
reads - size 2.6 TBytes.Comment: Accepted to SC1
A Secure Cooperative Sensing Protocol for Cognitive Radio Networks
Cognitive radio networks sense spectrum occupancy
and manage themselves to operate in unused bands without disturbing licensed users. Spectrum sensing is more accurate if jointly performed by several reliable nodes. Even though cooperative sensing is an active area of research, the secure
authentication of local sensing reports remains unsolved, thus empowering false results. This paper presents a distributed protocol based on digital signatures and hash functions, and an
analysis of its security features. The system allows determining a final sensing decision from multiple sources in a quick and secure way.Las redes de radio cognitiva detectora de espectro se las arreglan para operar en las nuevas bandas sin molestar a los usuarios con licencia. La detección de espectro es más precisa
si el conjunto está realizado por varios nodos fiables. Aunque la detección cooperativa es un área activa de investigación, la autenticación segura de informes locales de detección no ha sido resuelta, por lo tanto se pueden dar resultados falsos. Este trabajo presenta un protocolo distribuido basado en firmas digitales y en funciones hash, y un análisis de sus características de seguridad. El sistema permite determinar una decisión final de detección de múltiples fuentes de una manera rápida y segura.Les xarxes de ràdio cognitiva detectora d'espectre se les arreglen per operar en les noves bandes sense destorbar els usuaris amb llicència. La detecció d'espectre és més precisa
si el conjunt està realitzat per diversos nodes fiables. Encara que la detecció cooperativa és una àrea activa d'investigació, l'autenticació segura d'informes locals de detecció no ha estat resolta, per tant es poden donar resultats falsos. Aquest treball presenta un protocol distribuït basat en signatures digitals i en funcions hash, i una anàlisi de les seves característiques de seguretat. El sistema permet determinar una decisió final de detecció de múltiples fonts d'una manera ràpida i segura
Using cascading Bloom filters to improve the memory usage for de Brujin graphs
De Brujin graphs are widely used in bioinformatics for processing
next-generation sequencing data. Due to a very large size of NGS datasets, it
is essential to represent de Bruijn graphs compactly, and several approaches to
this problem have been proposed recently. In this work, we show how to reduce
the memory required by the algorithm of [3] that represents de Brujin graphs
using Bloom filters. Our method requires 30% to 40% less memory with respect to
the method of [3], with insignificant impact to construction time. At the same
time, our experiments showed a better query time compared to [3]. This is, to
our knowledge, the best practical representation for de Bruijn graphs.Comment: 12 pages, submitte
Efficient pebbling for list traversal synopses
We show how to support efficient back traversal in a unidirectional list,
using small memory and with essentially no slowdown in forward steps. Using
memory for a list of size , the 'th back-step from the
farthest point reached so far takes time in the worst case, while
the overhead per forward step is at most for arbitrary small
constant . An arbitrary sequence of forward and back steps is
allowed. A full trade-off between memory usage and time per back-step is
presented: vs. and vice versa. Our algorithms are based on a
novel pebbling technique which moves pebbles on a virtual binary, or -ary,
tree that can only be traversed in a pre-order fashion. The compact data
structures used by the pebbling algorithms, called list traversal synopses,
extend to general directed graphs, and have other interesting applications,
including memory efficient hash-chain implementation. Perhaps the most
surprising application is in showing that for any program, arbitrary rollback
steps can be efficiently supported with small overhead in memory, and marginal
overhead in its ordinary execution. More concretely: Let be a program that
runs for at most steps, using memory of size . Then, at the cost of
recording the input used by the program, and increasing the memory by a factor
of to , the program can be extended to support an
arbitrary sequence of forward execution and rollback steps: the 'th rollback
step takes time in the worst case, while forward steps take O(1)
time in the worst case, and amortized time per step.Comment: 27 page
- …