8 research outputs found

Range Shortest Unique Substring queries

Author: Abedin P. (Paniz)
Ganguly A. (Arnab)
Pissis S. (Solon)
Thankachan S.V. (Sharma)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/10/2019

Let T[1, n] be a string of length n and T[i, j] be the substring of T starting at position i and ending at position j. A substring T[i, j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and in information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α, β], return a shortest substring T[i, j] of T with exactly one occurrence in [α, β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018].

CWI's Institutional Repository

Efficient data structures for range shortest unique substring queries

Author: Abedin P. (Paniz)
Ganguly A. (Arnab)
Pissis S. (Solon)
Thankachan S.V. (Sharma)
Publication venue: 'MDPI AG'
Publication date: 01/11/2020

Let T[1, n] be a string of length n and T[i, j] be the substring of T starting at position i and ending at position j. A substring T[i, j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α, β], return a shortest substring T[i, j] of T with exactly one occurrence in [α, β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].

CWI's Institutional Repository
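
To make the query in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the data structure from the paper: it simply rescans the queried range on every call. The function name `range_sus` is hypothetical, and the sketch assumes the reading that an occurrence counts only if it lies entirely inside T[α, β]. Positions are 1-based and inclusive, as in the abstract.

```python
def range_sus(text, alpha, beta):
    """Return (i, j), 1-based inclusive, of a shortest substring of
    text[alpha..beta] that occurs exactly once inside that range.

    Brute-force illustration only (cubic time in the range length);
    it shows what a query asks for, not how to answer it efficiently.
    """
    window = text[alpha - 1:beta]
    m = len(window)
    for length in range(1, m + 1):
        # Count every substring of the current length inside the window.
        counts = {}
        for start in range(m - length + 1):
            piece = window[start:start + length]
            counts[piece] = counts.get(piece, 0) + 1
        # Report the leftmost substring of this length that is unique.
        for start in range(m - length + 1):
            if counts[window[start:start + length]] == 1:
                return (alpha + start, alpha + start + length - 1)
    return None  # only reached for an empty range


if __name__ == "__main__":
    print(range_sus("aabaa", 1, 5))
    print(range_sus("aabaa", 1, 2))
```

A non-empty range always has an answer, because the whole range T[α, β] occurs exactly once within itself. When several shortest unique substrings exist, this sketch returns the leftmost one, which is an arbitrary tie-breaking choice.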

Efficient Data Structures for Text Processing Applications

Author: Abedin Paniz
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/12/2021

This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as suffix trees, suffix arrays, and the FM-index. In this work, we revisit the following problems:
1. The Heaviest Induced Ancestors problem
2. Range Longest Common Prefix problem
3. Range Shortest Unique Substrings problem
4. Non-Overlapping Indexing problem
For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Internal shortest absent word queries

Author: Badkobeh G. (Golnaz)
Charalampopoulos P. (Panagiotis)
Pissis S. (Solon)
Publication venue
Publication date: 01/01/2021

Given a string T of length n over an alphabet Σ ⊂ {1, 2, ..., n^O(1)} of size σ, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i] ··· T[j] of T. For any positive integer k ∈ [1, log log_σ n], we present an O((n/k) · log log_σ n)-size data structure, which can be constructed in O(n log_σ n) time, and answers queries in time O(log log_σ k).

VU Research Portal

CWI's Institutional Repository

Dagstuhl Research Online Publication Server
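
As an illustration of the query defined in the abstract above, the following Python sketch finds a shortest absent word by exhaustive search. It is not the data structure from the paper. The function name `shortest_absent_word` and the explicit `alphabet` argument are assumptions made for this example, and ties are broken by returning the lexicographically smallest word.

```python
from itertools import product


def shortest_absent_word(text, i, j, alphabet):
    """Return a shortest string over `alphabet` that does not occur in
    text[i..j] (1-based, inclusive). Brute-force illustration only."""
    letters = sorted(set(alphabet))
    if not letters:
        raise ValueError("alphabet must be non-empty")
    fragment = text[i - 1:j]
    length = 1
    while True:
        # All distinct substrings of the fragment with the current length.
        present = {
            fragment[s:s + length]
            for s in range(len(fragment) - length + 1)
        }
        # Try candidate words of this length in lexicographic order.
        for candidate in product(letters, repeat=length):
            word = "".join(candidate)
            if word not in present:
                return word
        length += 1


if __name__ == "__main__":
    print(shortest_absent_word("abaab", 1, 5, "ab"))
```

The loop always terminates: once the candidate length exceeds the fragment length, no substring of that length is present, so the first candidate is returned.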

Internal Shortest Absent Word Queries in Constant Time and Linear Space

Author: Badkobeh Golnaz
Charalampopoulos Panagiotis
Kosolobov Dmitry
Pissis Solon
Publication venue: HAL CCSD
Publication date: 05/07/2021

Given a string T of length n over an alphabet Σ ⊂ {1, 2, ..., n^O(1)} of size σ, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i] ··· T[j] of T. We present an O(n)-space data structure that answers such queries in constant time and can be constructed in O(n log_σ n) time.

INRIA a CCSD electronic archive server

Climatic and Topologic Controls on the Complexity of River Networks

Author: Ranjbar Moshfeghi Sevil
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2020

The emergence and evolution of channel networks are controlled by the competition between hillslope and fluvial processes on the landscape. Investigating the geomorphic and topologic properties of these networks is important for developing predictive models describing the network dynamics under a changing environment as well as for quantifying the roles of processes in creating distinct patterns of channel networks. In this dissertation, the response of landscapes to changing climatic forcing was investigated via numerical modeling and field observations. A new framework was proposed to evaluate the complexity of catchments using two different representations of channel networks. The structural complexity was studied using the width function, which characterizes the spatial arrangement of channels, whereas the functional complexity was explored using the incremental area function, which captures the patterns of transport of fluxes. Our analysis reveals stronger controls of topological connectivity on the functional complexity than on the structural complexity, indicating that the unchannelized surface (hillslope) contributes to the increase of heterogeneity in transport processes. Furthermore, the channel network structure was investigated using a physically-based numerical landscape evolution model for varying hillslope and fluvial processes. Different magnitudes of soil transport (D) and fluvial incision (K) coefficients represent different magnitudes of hillslope and fluvial processes. We show that different combinations of D and K result in distinct branching structures in landscapes. For example, for smaller D and K combinations (mimicking dry climate), a higher number of branching channels was observed, whereas for larger D and K combinations (mimicking humid climate), a higher number of side-branching channels was obtained. These results are consistent with the field observations, suggesting that varying climatic conditions imprint distinct signatures on the branching structure of channel networks.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A Linear Space Data Structure For Range LCP Queries

Author: Ganguly Arnab
Patil Manish
Shah Rahul
Thankachan Sharma V.
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2018

Range LCP (longest common prefix) is an extension of the classical LCP problem and is defined as follows: Preprocess a string S[1...n] of n characters, such that whenever an interval [i, j] comes as a query, we can report max{|LCP(S_p, S_q)| | i ≤ p < q ≤ j}. Here LCP(S_p, S_q) is the longest common prefix of the suffixes of S starting at locations p and q, and |LCP(S_p, S_q)| is its length. This problem was first addressed by Amir et al. [ISAAC, 2011]. They showed that the query can be answered in O(log log n) time using an O(n log^(1+ϵ) n) space data structure for an arbitrarily small constant ϵ > 0. In an attempt to reduce the space bound, they presented a linear space data structure of O(d log log n) query time, where d = (j - i + 1). In this paper, we present a new linear space data structure with an improved query time of O(dlogd (logn) 1/2-ϵ)

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)
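
The range LCP query defined in the abstract above can be stated directly as code. The sketch below is a brute-force reference in Python, not any of the data structures discussed in the abstract. The names `lcp_length` and `range_lcp` are hypothetical. Positions are 1-based, and the suffixes compared are full suffixes of S, so a common prefix may extend past the right end of the query interval.

```python
def lcp_length(s, p, q):
    """Length of the longest common prefix of the suffixes of s that
    start at 1-based positions p and q."""
    n = len(s)
    k = 0
    while p - 1 + k < n and q - 1 + k < n and s[p - 1 + k] == s[q - 1 + k]:
        k += 1
    return k


def range_lcp(s, i, j):
    """Return max |LCP(S_p, S_q)| over all i <= p < q <= j.

    Brute-force illustration: it checks every pair of starting
    positions in the interval. Returns 0 if there is no such pair.
    """
    best = 0
    for p in range(i, j + 1):
        for q in range(p + 1, j + 1):
            best = max(best, lcp_length(s, p, q))
    return best


if __name__ == "__main__":
    print(range_lcp("banana", 1, 6))
```

For "banana" and the full interval [1, 6], the maximum is attained by the suffixes starting at positions 2 and 4 ("anana" and "ana"), which share the prefix "ana".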

A Linear-Space Data Structure For Range-LCP Queries In Poly-Logarithmic Time

Author: Abedin Paniz
Ganguly Arnab
Hon Wing Kai
Nekrich Yakov
Sadakane Kunihiko
Publication venue: STARS
Publication date: 01/01/2018

Let (Formula Presented) be a text of length n and (Formula Presented) be the suffix starting at position i. Also, for any two strings X and Y, let (Formula Presented) denote their longest common prefix. The range-LCP of (Formula Presented) w.r.t. a range (Formula Presented), where (Formula Presented) is Amir et al. [ISAAC 2011] introduced the indexing version of this problem, where the task is to build a data structure over (Formula Presented), so that (Formula Presented) for any query range (Formula Presented) can be reported efficiently. They proposed an (Formula Presented) space structure with query time (Formula Presented), and a linear space (i.e., O(n) words) structure with query time (Formula Presented), where (Formula Presented) is the length of the input range and (Formula Presented) is an arbitrarily small constant. Later, Patil et al. [SPIRE 2013] proposed another linear space structure with an improved query time of (Formula Presented). This poses an interesting question, whether it is possible to answer (Formula Presented) queries in poly-logarithmic time using a linear space data structure. In this paper, we settle this question by presenting an O(n) space data structure with query time (Formula Presented) and construction time (Formula Presented)

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)