8 research outputs found

    Range Shortest Unique Substring queries

    Let T[1, n] be a string of length n and T[i, j] be the substring of T starting at position i and ending at position j. A substring T[i, j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and in information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α, β], return a shortest substring T[i, j] of T with exactly one occurrence in [α, β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018].

    Efficient data structures for range shortest unique substring queries

    Let T[1, n] be a string of length n and T[i, j] be the substring of T starting at position i and ending at position j. A substring T[i, j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α, β], return a shortest substring T[i, j] of T with exactly one occurrence in [α, β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
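
    To make the query definition above concrete, here is a naive reference sketch. It is not the data structure from the paper. The function name range_sus is hypothetical, and the sketch assumes that "exactly one occurrence in [α, β]" counts only occurrences lying entirely inside the range. It rescans the window on every query, so it is only useful for checking small examples.

```python
def range_sus(text, alpha, beta):
    """Brute-force Range Shortest Unique Substring query (illustrative only).

    Positions are 1-based and inclusive, as in the abstract. Returns (i, j)
    such that text[i..j] is a shortest substring with exactly one occurrence
    inside text[alpha..beta], or None if the range is empty.
    Assumption: an occurrence counts only if it lies fully inside the range.
    """
    window = text[alpha - 1:beta]
    m = len(window)
    for length in range(1, m + 1):
        # Count every substring of this length by its starting position.
        counts = {}
        for start in range(m - length + 1):
            piece = window[start:start + length]
            counts[piece] = counts.get(piece, 0) + 1
        # Report the leftmost substring of this length that occurs once.
        for start in range(m - length + 1):
            if counts[window[start:start + length]] == 1:
                i = alpha + start
                return i, i + length - 1
    return None
```

    For T = "abaab" and the range [1, 5], no single character is unique, so the sketch returns (2, 3), the substring "ba".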

    Efficient Data Structures for Text Processing Applications

    This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as suffix trees, suffix arrays, and the FM-index. In this work, we revisit the following problems: (1) the Heaviest Induced Ancestors problem; (2) the Range Longest Common Prefix problem; (3) the Range Shortest Unique Substrings problem; (4) the Non-Overlapping Indexing problem. For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.

    Internal shortest absent word queries

    Given a string T of length n over an alphabet Σ ⊂ {1, 2, ..., n^O(1)} of size σ, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i] ⋯ T[j] of T. For any positive integer k ∈ [1, log log_σ n], we present an O((n/k) · log log_σ n)-size data structure, which can be constructed in O(n log_σ n) time, and answers queries in time O(log log_σ k).

    Internal Shortest Absent Word Queries in Constant Time and Linear Space

    Given a string T of length n over an alphabet Σ ⊂ {1, 2, ..., n^O(1)} of size σ, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i] ⋯ T[j] of T. We present an O(n)-space data structure that answers such queries in constant time and can be constructed in O(n log_σ n) time.
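
    The query in this abstract can be illustrated with a brute-force sketch. This is not the constant-time structure described above: the name shortest_absent_word is hypothetical, the alphabet is assumed to be given as a non-empty collection of single characters (the abstract uses integers), and ties are broken by returning the lexicographically smallest absent word, a choice the abstract does not specify.

```python
from itertools import product


def shortest_absent_word(text, i, j, alphabet):
    """Brute-force shortest absent word of text[i..j] (1-based, inclusive).

    Tries lengths 1, 2, ... and returns the first word over `alphabet`
    that does not occur in the fragment. Illustrative only.
    Assumption: `alphabet` is a non-empty collection of single characters.
    """
    fragment = text[i - 1:j]
    length = 1
    while True:
        # All substrings of the fragment with the current length.
        present = {
            fragment[s:s + length]
            for s in range(len(fragment) - length + 1)
        }
        # Enumerate candidate words in lexicographic order.
        for letters in product(sorted(alphabet), repeat=length):
            word = "".join(letters)
            if word not in present:
                return word
        length += 1
```

    The loop always terminates: once the length exceeds the fragment length, no substring of that length exists, so the first candidate is absent.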

    Climatic and Topologic Controls on the Complexity of River Networks

    The emergence and evolution of channel networks are controlled by the competition between hillslope and fluvial processes on the landscape. Investigating the geomorphic and topologic properties of these networks is important for developing predictive models describing the network dynamics under a changing environment as well as for quantifying the roles of processes in creating distinct patterns of channel networks. In this dissertation, the response of landscapes to changing climatic forcing was investigated via numerical modeling and field observations. A new framework was proposed to evaluate the complexity of catchments using two different representations of channel networks. The structural complexity was studied using the width function, which characterizes the spatial arrangement of channels, whereas the functional complexity was explored using the incremental area function, which captures the patterns of transport of fluxes. Our analysis reveals stronger controls of topological connectivity on the functional complexity than on the structural complexity, indicating that the unchannelized surface (hillslope) contributes to the increase of heterogeneity in transport processes. Furthermore, the channel network structure was investigated using a physically based numerical landscape evolution model for varying hillslope and fluvial processes. Different magnitudes of the soil transport (D) and fluvial incision (K) coefficients represent different magnitudes of hillslope and fluvial processes. We show that different combinations of D and K result in distinct branching structures in landscapes. For example, for smaller D and K combinations (mimicking a dry climate), a higher number of branching channels was observed, whereas for larger D and K combinations (mimicking a humid climate), a higher number of side-branching channels was obtained. These results are consistent with the field observations, suggesting that varying climatic conditions imprint distinct signatures on the branching structure of channel networks.

    A Linear Space Data Structure For Range LCP Queries

    Range LCP (longest common prefix) is an extension of the classical LCP problem and is defined as follows: Preprocess a string S[1...n] of n characters, such that whenever an interval [i, j] comes as a query, we can report max{|LCP(S_p, S_q)| | i ≤ p < q ≤ j}. Here LCP(S_p, S_q) is the longest common prefix of the suffixes of S starting at locations p and q, and |LCP(S_p, S_q)| is its length. This problem was first addressed by Amir et al. [ISAAC, 2011]. They showed that the query can be answered in O(log log n) time using an O(n log^(1+ϵ) n) space data structure for an arbitrarily small constant ϵ > 0. In an attempt to reduce the space bound, they presented a linear space data structure of O(d log log n) query time, where d = (j - i + 1). In this paper, we present a new linear space data structure with an improved query time of O(dlogd (logn) 1/2-ϵ).
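
    The range-LCP definition above can be checked on small inputs with a direct sketch. This is a quadratic brute force over all pairs in the query interval, not any of the data structures the abstract discusses, and the names lcp_length and range_lcp are hypothetical.

```python
def lcp_length(s, p, q):
    """Length of the longest common prefix of the suffixes of s that
    start at 1-based positions p and q."""
    n = len(s)
    k = 0
    while p - 1 + k < n and q - 1 + k < n and s[p - 1 + k] == s[q - 1 + k]:
        k += 1
    return k


def range_lcp(s, i, j):
    """Brute-force range-LCP query: max |LCP(S_p, S_q)| over i <= p < q <= j.

    Only the starting positions are restricted to [i, j]; the compared
    suffixes run to the end of s, as in the definition. Returns 0 when
    the interval contains fewer than two positions.
    """
    best = 0
    for p in range(i, j + 1):
        for q in range(p + 1, j + 1):
            best = max(best, lcp_length(s, p, q))
    return best
```

    For S = "abcabcab" and the interval [1, 8], the best pair is (1, 4), whose suffixes share the prefix "abcab", so the query returns 5.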

    A Linear-Space Data Structure For Range-LCP Queries In Poly-Logarithmic Time

    Let (Formula Presented) be a text of length n and (Formula Presented) be the suffix starting at position i. Also, for any two strings X and Y, let (Formula Presented) denote their longest common prefix. The range-LCP of (Formula Presented) w.r.t. a range (Formula Presented), where (Formula Presented) is Amir et al. [ISAAC 2011] introduced the indexing version of this problem, where the task is to build a data structure over (Formula Presented), so that (Formula Presented) for any query range (Formula Presented) can be reported efficiently. They proposed an (Formula Presented) space structure with query time (Formula Presented), and a linear space (i.e., O(n) words) structure with query time (Formula Presented), where (Formula Presented) is the length of the input range and (Formula Presented) is an arbitrarily small constant. Later, Patil et al. [SPIRE 2013] proposed another linear space structure with an improved query time of (Formula Presented). This poses an interesting question, whether it is possible to answer (Formula Presented) queries in poly-logarithmic time using a linear space data structure. In this paper, we settle this question by presenting an O(n) space data structure with query time (Formula Presented) and construction time (Formula Presented)