15 research outputs found

    Range Shortest Unique Substring queries

    Get PDF
    Let be a string of length n and be the substring of starting at position i and ending at position j. A substring of is a repeat if it occurs more than once in; otherwise, it is a unique substring of. Repeats and unique substrings are of great interest in computational biology and in information retrieval. Given string as input, the Shortest Unique Substring problem is to find a shortest substring of that does not occur elsewhere in. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over answering the following type of online queries efficiently. Given a range, return a shortest substring of with exactly one occurrence in. We present an -word data structure with query time, where is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al. ICALP 2018]

    Lower Bounds for Multilinear Order-Restricted ABPs

    Get PDF
    Proving super-polynomial lower bounds on the size of syntactic multilinear Algebraic Branching Programs (smABPs) computing an explicit polynomial is a challenging problem in Algebraic Complexity Theory. The order in which variables in {x_1,...,x_n} appear along any source to sink path in an smABP can be viewed as a permutation in S_n. In this article, we consider the following special classes of smABPs where the order of occurrence of variables along a source to sink path is restricted: 1) Strict circular-interval ABPs: For every sub-program the index set of variables occurring in it is contained in some circular interval of {1,..., n}. 2) L-ordered ABPs: There is a set of L permutations (orders) of variables such that every source to sink path in the smABP reads variables in one of these L orders, where L 0. We prove exponential (i.e., 2^{Omega(n^delta)}, delta>0) lower bounds on the size of above models computing an explicit multilinear 2n-variate polynomial in VP. As a main ingredient in our lower bounds, we show that any polynomial that can be computed by an smABP of size S, can be written as a sum of O(S) many multilinear polynomials where each summand is a product of two polynomials in at most 2n/3 variables, computable by smABPs. As a corollary, we show that any size S syntactic multilinear ABP can be transformed into a size S^{O(sqrt{n})} depth four syntactic multilinear Sigma Pi Sigma Pi circuit where the bottom Sigma gates compute polynomials on at most O(sqrt{n}) variables. Finally, we compare the above models with other standard models for computing multilinear polynomials

    Approximation Schemes for Subset Sum Ratio Problems

    Get PDF
    We consider the Subset Sum Ratio Problem (SSRSSR), in which given a set of integers the goal is to find two subsets such that the ratio of their sums is as close to~1 as possible, and introduce a family of variations that capture additional meaningful requirements. Our main contribution is a generic framework that yields fully polynomial time approximation schemes (FPTAS) for problems in this family that meet certain conditions. We use our framework to design explicit FPTASs for two such problems, namely Two-Set Subset-Sum Ratio and Factor-rr Subset-Sum Ratio, with running time O(n4/ε)\mathcal{O}(n^4/\varepsilon), which coincides with the best known running time for the original SSRSSR problem [15]

    Efficient Data Structures for Text Processing Applications

    Get PDF
    This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern\u27s occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as the suffix trees, suffix arrays, FM-index, etc. In this work, we revisit the following problems: 1. The Heaviest Induced Ancestors problem 2. Range Longest Common Prefix problem 3. Range Shortest Unique Substrings problem 4. Non-Overlapping Indexing problem For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models
    corecore