7 research outputs found

    Cross-Document Pattern Matching

    We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the weighted level ancestor problem.

    Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss

    Motivation: Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction

    A Simple Linear-Space Data Structure for Constant-Time Range Minimum Query

    Abstract. We revisit the range minimum query problem and present a new O(n)-space data structure that supports queries in O(1) time. Although previous data structures exist whose asymptotic bounds match ours, our goal is to introduce a new solution that is simple, intuitive, and practical without increasing asymptotic costs for query time or space.
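To make the query in this abstract concrete, here is a minimal Python sketch of a sparse table, a classic textbook baseline for range minimum queries. It is not the data structure from the paper: it answers queries in O(1) time but uses O(n log n) space rather than O(n). The names `build_sparse_table` and `range_min_index` are illustrative assumptions.

```python
from typing import List


def build_sparse_table(values: List[int]) -> List[List[int]]:
    """Precompute table[k][i] = index of the leftmost minimum of values[i : i + 2**k]."""
    n = len(values)
    table = [list(range(n))]  # level 0: every window has length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            # A window of length 2**k is the union of two windows of length 2**(k-1).
            left, right = prev[i], prev[i + half]
            row.append(left if values[left] <= values[right] else right)
        table.append(row)
        k += 1
    return table


def range_min_index(values: List[int], table: List[List[int]], lo: int, hi: int) -> int:
    """Return the index of the leftmost minimum in values[lo..hi] (inclusive)."""
    if not 0 <= lo <= hi < len(values):
        raise IndexError("invalid query range")
    # Largest power of two that fits in the range; the two windows may overlap.
    k = (hi - lo + 1).bit_length() - 1
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if values[left] <= values[right] else right


values = [5, 2, 4, 7, 1, 3, 1]
table = build_sparse_table(values)
print(range_min_index(values, table, 0, 3))  # index of the minimum of 5, 2, 4, 7
```

Overlapping the two windows is safe because taking a minimum is idempotent; that is what allows a constant number of table lookups per query.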

    Almost optimal exact distance oracles for planar graphs

    We consider the problem of preprocessing a weighted directed planar graph in order to quickly answer exact distance queries. The main tension in this problem is between space S and query time Q, and since the mid-1990s all results had polynomial time-space tradeoffs, e.g., Q = \tilde{Θ}(n/√S) or Q = \tilde{Θ}(n^{5/2}/S^{3/2}). In this article we show that there is no polynomial tradeoff between time and space and that it is possible to simultaneously achieve almost optimal space n^{1+o(1)} and almost optimal query time n^{o(1)}. More precisely, we achieve the following space-time tradeoffs: n^{1+o(1)} space and log^{2+o(1)} n query time, n log^{2+o(1)} n space and n^{o(1)} query time, n^{4/3+o(1)} space and log^{1+o(1)} n query time. We reduce a distance query to a variety of point location problems in additively weighted Voronoi diagrams and develop new algorithms for the point location problem itself using several partially persistent dynamic tree data structures.

    Path Minima Queries in Dynamic Weighted Trees

    Abstract. In the path minima problem on trees each tree edge is assigned a weight and a query asks for the edge with minimum weight on a path between two nodes. For the dynamic version of the problem on a tree, where the edge-weights can be updated, we give comparison-based and RAM data structures that achieve optimal query time. These structures support inserting a node on an edge, inserting a leaf, and contracting edges. When only insertion and deletion of leaves in a tree are needed, we give two data structures that achieve optimal and significantly lower query times than when updating the edge-weights is allowed. One is a semigroup structure for which the edge-weights are from an arbitrary semigroup and queries ask for the semigroup-sum of the edge-weights on a given path. For the other structure the edge-weights are given in the word RAM. We complement these upper bounds with lower bounds for different variants of the problem.
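As an illustration of what a path minima query computes, the following Python sketch answers it naively by walking both endpoints up to their lowest common ancestor. This is only a baseline that takes time proportional to the path length; it is not one of the data structures described in the abstract. The array layout (`parent`, `depth`, `weight`) and the function name `path_min_edge` are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def path_min_edge(
    parent: List[Optional[int]],
    depth: List[int],
    weight: List[Optional[int]],
    u: int,
    v: int,
) -> Tuple[int, int]:
    """Return (w, x) for the lightest edge on the u-v path.

    The edge is identified by its lower endpoint x, i.e. it joins x and parent[x],
    and weight[x] is its weight. The root has parent None and no edge weight.
    """
    if u == v:
        raise ValueError("the path between a node and itself has no edges")
    best = None
    while u != v:
        # Always climb from the deeper endpoint, so both meet at the LCA.
        if depth[u] < depth[v]:
            u, v = v, u
        candidate = (weight[u], u)
        if best is None or candidate < best:
            best = candidate
        u = parent[u]
    return best


# Hypothetical tree: 0 is the root; 1 and 2 are its children;
# 3 and 4 are children of 1; 5 is a child of 2.
parent = [None, 0, 0, 1, 1, 2]
depth = [0, 1, 1, 2, 2, 2]
weight = [None, 4, 9, 7, 2, 5]
print(path_min_edge(parent, depth, weight, 3, 5))
```

Updating an edge weight in this representation is a single array write, which shows the tradeoff the abstract addresses: trivial updates here come at the cost of query time linear in the path length.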

    In-Memory Storage for Labeled Tree-Structured Data

    In this thesis, we design in-memory data structures for labeled and weighted trees, so that various types of path queries or operations can be supported with efficient query time. We assume the word RAM model with word size w, which permits random accesses to w-bit memory cells. Our data structures are space-efficient and many of them are even succinct. These succinct data structures occupy space close to the information theoretic lower bounds of the input trees within lower order terms. First, we study the problems of supporting various path queries over weighted trees. A path counting query asks for the number of nodes on a query path whose weights lie within a query range, while a path reporting query asks for these nodes to be reported. A path median query asks for the median weight on a path between two given nodes, and a path selection query returns the k-th smallest weight. We design succinct data structures to support path counting queries in O(lg σ / lg lg n + 1) time, path reporting queries in O((occ + 1)(lg σ / lg lg n + 1)) time, and path median and path selection queries in O(lg σ / lg lg σ) time, where n is the size of the input tree, the weights of nodes are drawn from [1..σ] and occ is the size of the output. Our results not only greatly improve the best known data structures [31, 75, 65], but also match the lower bounds for path counting, median and selection queries [86, 87, 71] when σ = Ω(n/polylog(n)). Second, we study the problem of representing labeled ordinal trees succinctly. Our new representations support a much broader collection of operations than previous work. In our approach, labels of nodes are stored in a preorder label sequence, which can be compressed using any succinct representation of strings that supports access, rank and select operations. Thus, we present a framework for succinct representations of labeled ordinal trees that is able to handle large alphabets. This answers an open problem presented by Geary et al. [54], which asks for representations of labeled ordinal trees that remain space-efficient for large alphabets. We further extend our work and present the first succinct representations for dynamic labeled ordinal trees that support several label-based operations including finding the level ancestor with a given label. Third, we study the problems of supporting path minimum and semigroup path sum queries. In the path minimum problem, we preprocess a tree on n weighted nodes, such that given an arbitrary path, the node with the smallest weight along this path can be located. We design novel succinct indices for this problem under the indexing model, for which weights of nodes are read-only and can be accessed with ranks of nodes in the preorder traversal sequence of the input tree. One of our index structures supports queries in O(α(m, n)) time, and occupies O(m) bits of space in addition to the space required for the input tree, where m is an integer greater than or equal to n and α(m, n) is the inverse-Ackermann function. Following the same approach, we also develop succinct data structures for semigroup path sum queries, for which a query asks for the sum of weights along a given query path. Then, using the succinct indices for path minimum queries, we achieve three different time-space tradeoffs for path reporting queries. Finally, we study the problems of supporting various path queries in dynamic settings. We propose the first non-trivial linear-space solution that supports path reporting in O((lg n / lg lg n)^2 + occ · lg n / lg lg n) query time, where n is the size of the input tree and occ is the output size, and the insertion and deletion of a node of an arbitrary degree in O(lg^{2+ε} n) amortized time, for any constant ε ∈ (0, 1). Obvious solutions based on directly dynamizing solutions to the static version of this problem all require Ω((lg n / lg lg n)^2) time for each node reported. We also design data structures that support path counting and path reporting queries in O((lg n / lg lg n)^2) time, and insertions and deletions in O((lg n / lg lg n)^2) amortized time. This matches the best known results for dynamic two-dimensional range counting [62] and range selection [63], which can be viewed as special cases of path counting and path selection.