790 research outputs found

    Combined Data Structure for Previous- and Next-Smaller-Values

    Get PDF
    Let AA be a static array storing nn elements from a totally ordered set. We present a data structure of optimal size at most nlog2(3+22)+o(n)n\log_2(3+2\sqrt{2})+o(n) bits that allows us to answer the following queries on AA in constant time, without accessing AA: (1) previous smaller value queries, where given an index ii, we wish to find the first index to the left of ii where AA is strictly smaller than at ii, and (2) next smaller value queries, which search to the right of ii. As an additional bonus, our data structure also allows to answer a third kind of query: given indices i<ji<j, find the position of the minimum in A[i..j]A[i..j]. Our data structure has direct consequences for the space-efficient storage of suffix trees.Comment: to appear in Theoretical Computer Scienc

    On the Complexity of the (Approximate) Nearest Colored Node Problem

    Get PDF
    Given a graph G=(V,E) where each vertex is assigned a color from the set C={c_1, c_2, .., c_sigma}. In the (approximate) nearest colored node problem, we want to query, given v in V and c in C, for the (approximate) distance dist^(v, c) from v to the nearest node of color c. For any integer 1 <= k <= log n, we present a Color Distance Oracle (also often referred to as Vertex-label Distance Oracle) of stretch 4k-5 using space O(kn sigma^{1/k}) and query time O(log{k}). This improves the query time from O(k) to O(log{k}) over the best known Color Distance Oracle by Chechik [Chechik, 2012]. We then prove a lower bound in the cell probe model showing that even for unweighted undirected paths any static data structure that uses space S requires at least Omega (log (log{sigma} / log(S/n)+log log{n})) query time to give a distance estimate of stretch O(polylog(n)). This implies for the important case when sigma = Theta(n^{epsilon}) for some constant 0 < epsilon < 1, that our Color Distance Oracle has asymptotically optimal query time in regard to k, and that recent Color Distance Oracles for trees [Tsur, 2018] and planar graphs [Mozes and Skop, 2018] achieve asymptotically optimal query time in regard to n. We also investigate the setting where the data structure additionally has to support color-reassignments. We present the first Color Distance Oracle that achieves query times matching our lower bound from the static setting for large stretch yielding an exponential improvement over the best known query time [Chechik, 2014]. Finally, we give new conditional lower bounds proving the hardness of answering queries if edge insertions and deletion are allowed that strictly improve over recent bounds in time and generality

    Data Structures for Categorical Path Counting Queries

    Get PDF

    GPU accelerating distributed succinct de Bruijn graph construction

    Get PDF
    The research and methods in the field of computational biology have grown in the last decades, thanks to the availability of biological data. One of the applications in computational biology is genome sequencing or sequence alignment, a method to arrange sequences of, for example, DNA or RNA, to determine regions of similarity between these sequences. Sequence alignment applications include public health purposes, such as monitoring antimicrobial resistance. Demand for fast sequence alignment has led to the usage of data structures, such as the de Bruijn graph, to store a large amount of information efficiently. De Bruijn graphs are currently one of the top data structures used in indexing genome sequences, and different methods to represent them have been explored. One of these methods is the BOSS data structure, a special case of Wheeler graph index, which uses succinct data structures to represent a de Bruijn graph. As genomes can take a large amount of space, the construction of succinct de Bruijn graphs is slow. This has led to experimental research on using large-scale cluster engines such as Apache Spark and Graphic Processing Units (GPUs) in genome data processing. This thesis explores the use of Apache Spark and Spark RAPIDS, a GPU computing library for Apache Spark, in the construction of a succinct de Bruijn graph index from genome sequences. The experimental results indicate that Spark RAPIDS can provide up to 8 times speedups to specific operations, but for some other operations has severe limitations that limit its processing power in terms of succinct de Bruijn graph index construction
    corecore