Search CORE

154 research outputs found

NCUWM Talk Abstracts 2015

Author
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2015
Field of study

An extensive English language bibliography on graph theory and its applications

Author: Deo N.
Publication venue
Publication date
Field of study

Bibliography on graph theory and its application

NASA Technical Reports Server

Proceedings of the 3rd International Workshop on Optimal Networks Topologies IWONT 2010

Author
Publication venue: 'Iniciativa Digital Politecnica'
Publication date: 01/02/2011
Field of study

Peer Reviewe

UPCommons. Portal del coneixement obert de la UPC

Parallel and scalable combinatorial string algorithms on distributed memory systems

Author: Flick Patrick
Publication venue: Georgia Institute of Technology
Publication date: 29/05/2019
Field of study

Methods for processing and analyzing DNA and genomic data are built upon combinatorial graph and string algorithms. The advent of high-throughput DNA sequencing is enabling the generation of billions of reads per experiment. Classical and sequential algorithms can no longer deal with these growing data sizes - which for the last 10 years have greatly out-paced advances in processor speeds. Processing and analyzing state-of-the-art genomic data sets require the design of scalable and efficient parallel algorithms and the use of large computing clusters. Suffix arrays and trees are fundamental string data structures, which lie at the foundation of many string algorithms, with important applications in text processing, information retrieval, and computational biology. Conversely, the parallel construction of these indices is an actively studied problem. However, prior approaches lacked good worst-case run-time guarantees and exhibit poor scaling and overall performance. In this work, we present our distributed-memory parallel algorithms for indexing large datasets, including algorithms for the distributed construction of suffix arrays, LCP arrays, and suffix trees. We formulate a generalized version of the All-Nearest-Smaller-Values problem, provide an optimal distributed solution, and apply it to the distributed construction of suffix trees - yielding a work-optimal parallel algorithm. Our algorithms for distributed suffix array and suffix tree construction improve the state-of-the-art by simultaneously improving worst-case run-time bounds and achieving superior practical performance. Next, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA) - based on the suffix and LCP arrays, the DESA consists of these and additional distributed data structures. The DESA is designed to allow efficient pattern search queries in distributed memory while requiring at most O(n/p) memory per process. We present efficient distributed-memory parallel algorithms for querying, as well as for the efficient construction of this distributed index. Finally, we present our work on distributed-memory algorithms for clustering de Bruijn graphs and its application to solving a grand challenge metagenomic dataset.Ph.D

Scholarly Materials And Research @ Georgia Tech