7 research outputs found

    Binary Jumbled String Matching for Highly Run-Length Compressible Texts

    The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index is $O(n+\rho^2\log \rho)$, where $O(n)$ is the time for computing the run-length encoding of $s$ and $\rho$ is the length of this encoding; this is no worse than previous solutions if $\rho = O(n/\log n)$ and better if $\rho = o(n/\log n)$. Our index $L$ can be queried in $O(\log \rho)$ time. While $|L| = O(\min(n, \rho^{2}))$ in the worst case, preliminary investigations have indicated that $|L|$ may often be close to $\rho$. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of $s$ introduced in [Fici and Lipták, DLT 2011].

    Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures on size of Corner Index (we no longer conjecture it to be always linear in size of RLE); removed experimental part on random strings (these are valid but limited in their predictive power w.r.t. general strings); v3 published in IP
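To make the problem statement concrete, here is a minimal Python sketch of the classical linear-size index that the abstract calls "previous solutions", built naively in quadratic time. It is not the run-length-based index proposed in the paper. It relies on the well-known interval property of binary strings: sliding a window of fixed length changes the number of a's by at most one, so every count between the minimum and the maximum occurs. The function names `run_length_encode`, `build_index`, and `has_jumbled_match` are hypothetical and chosen for illustration only.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as a list of (char, run_length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def build_index(s):
    """Naive O(n^2) construction of the classical O(n)-size index.

    For every substring length m, store the minimum and maximum number
    of 'a' characters over all substrings of s of length m.
    """
    if set(s) - {"a", "b"}:
        raise ValueError("s must be a string over the alphabet {a, b}")
    n = len(s)
    # prefix[i] = number of 'a' characters in s[:i]
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "a"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(counts), max(counts)
    return lo, hi


def has_jumbled_match(index, x, y):
    """Decide whether some substring has exactly x a's and y b's (constant time)."""
    lo, hi = index
    m = x + y
    if m >= len(lo):  # query is longer than the string
        return False
    # Interval property: every count between lo[m] and hi[m] is realised.
    return lo[m] <= x <= hi[m]


index = build_index("abbab")
print(has_jumbled_match(index, 1, 1))  # "ab" is a substring
print(has_jumbled_match(index, 2, 0))  # "aa" never occurs
print(len(run_length_encode("abbab")))  # rho, the number of runs
```

For the string `abbab` the encoding has four runs, so $\rho = 4$ while $n = 5$; the gain promised by an RLE-based construction only shows up on texts whose runs are long relative to $n$.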

    Algorithms for Computing Abelian Periods of Words

    Constantinescu and Ilie (Bulletin EATCS 89, 167--170, 2006) introduced the notion of an Abelian period of a word. A word of length $n$ over an alphabet of size $\sigma$ can have $\Theta(n^{2})$ distinct Abelian periods. The Brute-Force algorithm computes all the Abelian periods of a word in time $O(n^2 \times \sigma)$ using $O(n \times \sigma)$ space. We present an off-line algorithm based on a select function that has the same worst-case theoretical complexity as the Brute-Force one but outperforms it in practice. We then present on-line algorithms that also make it possible to compute all the Abelian periods of all the prefixes of $w$.

    Comment: Accepted for publication in Discrete Applied Mathematic
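The abstract does not restate the definition of an Abelian period, so the following Python sketch fixes one as an assumption: a word $w$ has Abelian period $(h, p)$, with $0 \le h < p$ and $h + p \le |w|$, if $w$ splits into a head of length $h$, one or more blocks of length $p$ that are all anagrams of each other, and a tail shorter than $p$, where the letter counts of the head and the tail do not exceed those of a block. The code is a direct check of that assumed definition, not the Brute-Force, select-based, or on-line algorithms of the paper, and it makes no attempt to match their complexity. The names `contained`, `has_abelian_period`, and `abelian_periods` are hypothetical.

```python
from collections import Counter


def contained(small, big):
    """True if every letter occurs in `small` at most as often as in `big`."""
    return all(big[c] >= k for c, k in small.items())


def has_abelian_period(w, h, p):
    """Check the assumed definition of an Abelian period (h, p) for w."""
    n = len(w)
    if not (0 <= h < p and h + p <= n):
        return False
    head = Counter(w[:h])
    block = Counter(w[h:h + p])  # Parikh vector of the first full block
    if not contained(head, block):
        return False
    i = h + p
    # Every further full block must have the same letter counts.
    while i + p <= n:
        if Counter(w[i:i + p]) != block:
            return False
        i += p
    # The remaining tail is shorter than p and must fit inside a block.
    return contained(Counter(w[i:]), block)


def abelian_periods(w):
    """List all pairs (h, p) that satisfy the assumed definition."""
    n = len(w)
    return [(h, p)
            for p in range(1, n + 1)
            for h in range(p)
            if has_abelian_period(w, h, p)]


print(abelian_periods("abaab"))
```

For `abaab`, the pair `(1, 2)` qualifies because the word splits as `a | ba | ab`, whereas `(0, 2)` fails because `ab` and `aa` are not anagrams.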

    On the Parikh-de-Bruijn grid

    We introduce the Parikh-de-Bruijn grid, a graph whose vertices are fixed-order Parikh vectors, and whose edges are given by a simple shift operation. This graph gives structural insight into the nature of sets of Parikh vectors as well as that of the Parikh set of a given string. We show its utility by proving some results on Parikh-de-Bruijn strings, the abelian analog of de-Bruijn sequences.

    Comment: 18 pages, 3 figures, 1 tabl

    Reconstruction of Trees from Jumbled and Weighted Subtrees

    Let $T$ be an edge-labeled graph, where the labels are from a finite alphabet $\Sigma$. For a subtree $U$ of $T$ the Parikh vector of $U$ is a vector of length $|\Sigma|$ which specifies the multiplicity of each label in $U$. We ask when $T$ can be reconstructed from the multiset of Parikh vectors of all its subtrees, or all of its paths, or all of its maximal paths. We consider the analogous problems for weighted trees. We show how several well-known reconstruction problems on labeled strings, weighted strings and point sets on a line can be included in this framework. We present reconstruction algorithms and non-reconstructibility results, and extend the polynomial method, previously applied to jumbled strings [Acharya et al., SIAM J. on Discr. Math, 2015] and weighted strings [Bansal et al., CPM 2004], to deal with general trees and special tree classes.
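As an illustration of the input to the reconstruction question, the Python sketch below computes the multiset of Parikh vectors of all paths in a small edge-labeled tree. It shows only the forward direction (tree to multiset), not any reconstruction algorithm from the paper. Several choices are assumptions made for this sketch: nodes are numbered from 0, each unordered path with at least one edge is counted once, and a Parikh vector is stored as a tuple ordered like the `alphabet` argument. The function name `path_parikh_multiset` is hypothetical.

```python
from collections import Counter


def path_parikh_multiset(n, edges, alphabet):
    """Multiset of Parikh vectors of all paths in an edge-labeled tree.

    n        -- number of nodes, numbered 0 .. n-1
    edges    -- list of (u, v, label) triples forming a tree
    alphabet -- sequence of labels fixing the coordinate order
    Returns a Counter mapping each Parikh vector (a tuple) to its multiplicity.
    """
    adj = {v: [] for v in range(n)}
    for u, v, label in edges:
        adj[u].append((v, label))
        adj[v].append((u, label))
    pos = {c: i for i, c in enumerate(alphabet)}

    result = Counter()
    for start in range(n):
        # Depth-first walk from `start`, carrying the label counts so far.
        stack = [(start, -1, (0,) * len(alphabet))]
        while stack:
            node, parent, vec = stack.pop()
            if node > start:  # count each unordered pair of endpoints once
                result[vec] += 1
            for nxt, label in adj[node]:
                if nxt != parent:
                    updated = list(vec)
                    updated[pos[label]] += 1
                    stack.append((nxt, node, tuple(updated)))
    return result


# A star with centre 0 and three leaves; edge labels a, a, b.
star = [(0, 1, "a"), (0, 2, "a"), (0, 3, "b")]
print(path_parikh_multiset(4, star, "ab"))
```

Each start node triggers a full traversal, so the sketch takes quadratic time in the number of nodes, which is adequate for small examples.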