
    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly, for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to established lists, arguing that at least two key patterns, sorting and hashing, are missing.
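    To make the hashing motif concrete, consider k-mer counting, a step underlying profiling and assembly. The sketch below is a minimal serial version plus a hash-based ownership function of the kind a distributed implementation might use to route asynchronous updates; the function names and the use of Python are illustrative, not taken from the paper.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across a set of reads.
    In a distributed setting, each increment would instead be sent as an
    asynchronous remote update to the process that owns the k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def owner(kmer, nprocs):
    """Hash-based partitioning: the k-mer's hash picks its owner process,
    spreading updates irregularly across the machine. (A real system
    would use a seed-stable hash rather than Python's salted hash().)"""
    return hash(kmer) % nprocs

print(kmer_counts(["GATTACA", "TTACAG"], k=3))
```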

    Extreme Scale De Novo Metagenome Assembly

    Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiomes' genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore runs seamlessly on shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads and 2.6 TB in size.
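    To make the de Bruijn graph approach concrete, here is a minimal sketch of how such a graph is built from reads for one value of k; an iterative assembler repeats this for several k values, carrying contigs forward. This illustrates the underlying data structure only, not MetaHipMer's distributed UPC implementation.

```python
def de_bruijn_edges(reads, k):
    """Build the edge set of a de Bruijn graph: nodes are (k-1)-mers, and
    every k-mer in a read contributes an edge from its prefix to its suffix."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges

# Overlapping reads produce a path that spells out the underlying sequence.
print(sorted(de_bruijn_edges(["GATTACA", "TTACAG"], k=4)))
```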

    A Secure Cooperative Sensing Protocol for Cognitive Radio Networks

    Cognitive radio networks sense spectrum occupancy and manage themselves to operate in unused bands without disturbing licensed users. Spectrum sensing is more accurate if performed jointly by several reliable nodes. Although cooperative sensing is an active area of research, the secure authentication of local sensing reports remains unsolved, leaving the system open to falsified results. This paper presents a distributed protocol based on digital signatures and hash functions, together with an analysis of its security features. The system allows a final sensing decision to be determined from multiple sources in a quick and secure way.
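    A minimal sketch of the sign-and-verify step such a protocol relies on, using Ed25519 signatures over a hashed report (via the third-party `cryptography` package). The report fields, the JSON encoding, and the final fusion step are illustrative assumptions, not the paper's protocol.

```python
import hashlib, json, time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def sign_report(priv, node_id, channel, busy):
    # Illustrative report layout; the paper's message format may differ.
    report = {"node": node_id, "channel": channel, "busy": busy, "ts": time.time()}
    payload = json.dumps(report, sort_keys=True).encode()
    digest = hashlib.sha256(payload).digest()   # hash the report, then sign the digest
    return report, priv.sign(digest)

def verify_report(pub, report, sig):
    digest = hashlib.sha256(json.dumps(report, sort_keys=True).encode()).digest()
    try:
        pub.verify(sig, digest)                 # raises InvalidSignature on tampering
        return True
    except InvalidSignature:
        return False

# A fusion center would accept only authenticated reports, then combine them
# (e.g., by majority vote) into the final sensing decision.
priv = Ed25519PrivateKey.generate()
report, sig = sign_report(priv, "node-7", channel=5, busy=True)
assert verify_report(priv.public_key(), report, sig)
```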

    Using cascading Bloom filters to improve the memory usage for de Bruijn graphs

    De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3], which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with an insignificant impact on construction time. At the same time, our experiments showed better query times than [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs.
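    A minimal sketch of the cascading idea, under simplifying assumptions: level 1 stores the true k-mer set; level 2 stores level 1's false positives, drawn from the set of k-mers that can actually be queried (e.g., neighbors of true k-mers during graph traversal); level 3 stores the true k-mers that level 2 wrongly accepts; and so on, with a small exact set closing the cascade. Bit-array sizes, the hashing scheme, and parameter choices here are illustrative, not those of the paper.

```python
import hashlib

class Bloom:
    """Plain Bloom filter (one byte per bit for clarity; real filters pack bits)."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, bytearray(m)
    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m
    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1
    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

def build_cascade(true_set, query_universe, m, k, depth=3):
    """Alternate levels: even levels store true elements, odd levels store the
    previous level's false positives; the returned residual is a small exact set."""
    filters, ins, outs = [], set(true_set), set(query_universe) - set(true_set)
    for _ in range(depth):
        f = Bloom(m, k)
        for x in ins:
            f.add(x)
        filters.append(f)
        ins, outs = {x for x in outs if x in f}, ins  # next level holds this level's FPs
    return filters, ins

def member(filters, residual, item):
    # Assumes queries come from true_set | query_universe, as in graph traversal.
    for d, f in enumerate(filters):
        if item not in f:
            return d % 2 == 1   # rejected by an odd level => item is a true k-mer
    return (item in residual) == (len(filters) % 2 == 0)
```

    The exact residual set stays small precisely because each level only needs to store the false positives surviving from the previous one, which shrink geometrically.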

    Efficient pebbling for list traversal synopses

    We show how to support efficient back traversal in a unidirectional list, using small memory and with essentially no slowdown in forward steps. Using $O(\log n)$ memory for a list of size $n$, the $i$'th back-step from the farthest point reached so far takes $O(\log i)$ time in the worst case, while the overhead per forward step is at most $\epsilon$ for an arbitrarily small constant $\epsilon > 0$. An arbitrary sequence of forward and back steps is allowed. A full trade-off between memory usage and time per back-step is presented: $k$ vs. $k n^{1/k}$ and vice versa. Our algorithms are based on a novel pebbling technique which moves pebbles on a virtual binary, or $t$-ary, tree that can only be traversed in a pre-order fashion. The compact data structures used by the pebbling algorithms, called list traversal synopses, extend to general directed graphs and have other interesting applications, including memory-efficient hash-chain implementation. Perhaps the most surprising application is in showing that, for any program, arbitrary rollback steps can be efficiently supported with small overhead in memory and marginal overhead in ordinary execution. More concretely: let $P$ be a program that runs for at most $T$ steps, using memory of size $M$. Then, at the cost of recording the input used by the program and increasing the memory by a factor of $O(\log T)$ to $O(M \log T)$, the program $P$ can be extended to support an arbitrary sequence of forward execution and rollback steps: the $i$'th rollback step takes $O(\log i)$ time in the worst case, while forward steps take $O(1)$ time in the worst case and $1 + \epsilon$ amortized time per step.
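    The sketch below conveys the underlying memory/time trade-off with a much simpler checkpointing scheme than the paper's pebbling technique: it keeps $O(\log n)$ checkpoints (one per power of two) over a forward-only list and replays from the nearest one on a back-step. It does not achieve the paper's $O(\log i)$ worst-case bound; the node layout and all names are assumptions for illustration.

```python
class ListCursor:
    """Cursor over a singly linked list (nodes: {'val': v, 'next': node}) that
    supports back-steps while keeping only O(log n) checkpoints. A simplified
    sketch of the trade-off: the paper's pebbling technique additionally
    bounds the i'th back-step by O(log i) worst case, which this does not."""

    def __init__(self, head):
        self.head = head                  # position 0 is always recoverable
        self.pos, self.node = 0, head
        self.ckpt = {}                    # level j -> (position, node), position % 2**j == 0

    def forward(self):
        self.node, self.pos = self.node['next'], self.pos + 1
        j = 0
        while self.pos % (1 << j) == 0:   # refresh every level dividing the new position
            self.ckpt[j] = (self.pos, self.node)
            j += 1

    def back(self):
        target = self.pos - 1
        # start from the nearest checkpoint at or before the target (head as fallback)
        p, n = max([(0, self.head)] +
                   [c for c in self.ckpt.values() if c[0] <= target],
                   key=lambda c: c[0])
        while p < target:                 # replay forward to the target
            n, p = n['next'], p + 1
        self.ckpt = {j: c for j, c in self.ckpt.items() if c[0] <= target}
        self.pos, self.node = target, n

# tiny usage example: walk forward nine steps, then two steps back
head = None
for v in range(9, -1, -1):
    head = {'val': v, 'next': head}
cur = ListCursor(head)
for _ in range(9):
    cur.forward()
cur.back(); cur.back()
print(cur.node['val'])  # 7
```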