4 research outputs found

    Adaptive learning of compressible strings

    Suppose an oracle knows a string S that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is s a substring of S?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\Sigma n/4 - O(n)$ queries in order to reconstruct the hidden string, where $\Sigma$ is the size of the alphabet of S and n its length, and gave an algorithm that spends $(\Sigma - 1)n + O(\Sigma \sqrt{n})$ queries to reconstruct S. The main contribution of our paper is to improve the above upper bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compressor that compresses the string to $\tau$ bits, performs $q = O(\tau)$ substring queries; this algorithm, however, runs in exponential time. For this reason, the second part of the paper focuses on more time-efficient algorithms whose number of queries is bounded by specific compressibility measures. We first show that any string of length n over an integer alphabet of size $\Sigma$ with rle runs can be reconstructed with $q = O(\mathrm{rle}(\Sigma + \log(n/\mathrm{rle})))$ substring queries in linear time and space. We then present an algorithm that spends $q \in O(\Sigma g \log n)$ substring queries and runs in $O(n(\log n + \log \Sigma) + q)$ time using linear space, where g is the size of a smallest straight-line program generating the string. © 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
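The substring-query model in this abstract can be illustrated with a small sketch. This is not the paper's algorithm, nor the Skiena and Sundaram algorithm: it is a naive greedy baseline that extends a known substring to the right until it becomes a suffix, then to the left until it is the whole string. The names `make_oracle` and `reconstruct_naive` are hypothetical, and the alphabet is assumed to be known in advance.

```python
from typing import Callable, List, Tuple


def make_oracle(hidden: str) -> Tuple[Callable[[str], bool], List[int]]:
    """Build a substring oracle for `hidden` plus a one-element query counter.

    Hypothetical helper for illustration only.
    """
    counter = [0]

    def is_substring(s: str) -> bool:
        counter[0] += 1
        return s in hidden

    return is_substring, counter


def reconstruct_naive(is_substring: Callable[[str], bool], alphabet: str) -> str:
    """Recover the hidden string using only substring queries.

    Naive baseline (an assumption of this sketch, not the paper's method):
    1. Extend the current substring to the right while some character fits.
       When no character fits, the current string must be a suffix.
    2. Extend that suffix to the left while some character fits.
       When no character fits, the suffix is the whole string.
    """
    cur = ""
    for extend_right in (True, False):
        grew = True
        while grew:
            grew = False
            for c in alphabet:
                candidate = cur + c if extend_right else c + cur
                if is_substring(candidate):
                    cur = candidate
                    grew = True
                    break
    return cur


oracle, counter = make_oracle("abracadabra")
recovered = reconstruct_naive(oracle, "abcdr")
print(recovered, counter[0])
```

Each accepted character costs at most one pass over the alphabet, and each of the two phases ends with one failed pass, so this sketch stays within $\Sigma(n + 2)$ queries. The compressibility-aware algorithms described in the abstract aim to do better than such a baseline when the string has few runs or a small straight-line program.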

    A linear algorithm for string reconstruction in the reverse complement equivalence model

    In the reverse complement equivalence model, it is not possible to distinguish a string from its reverse complement. We show that one can still reconstruct a string of length n, up to reverse complement, using a linear number of subsequence queries of bounded length. We first give the proof for strings over a binary alphabet, and then extend it to arbitrary finite alphabets. A simple information-theoretic lower bound proves the number of queries to be asymptotically tight. Furthermore, our result is optimal w.r.t. the bound on the query length given in Erdős et al. (2006) [6].
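To make "up to reverse complement" concrete, here is a minimal sketch of the equivalence itself and of a plain subsequence test. It assumes the DNA alphabet with the usual A/T and C/G pairing; the abstract only says binary and arbitrary finite alphabets, so the complement table is an assumption, and all function names are hypothetical. It does not implement the paper's reconstruction procedure.

```python
# Assumed complement pairing (DNA); the abstract does not fix an alphabet.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s: str, complement: dict = DNA_COMPLEMENT) -> str:
    """Reverse the string and replace each character by its complement."""
    return "".join(complement[ch] for ch in reversed(s))


def equivalent(a: str, b: str) -> bool:
    """True if a and b are equal or reverse complements of each other."""
    return a == b or a == reverse_complement(b)


def is_subsequence(query: str, text: str) -> bool:
    """True if `query` can be obtained from `text` by deleting characters."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first match,
    # so characters are matched in left-to-right order.
    return all(ch in remaining for ch in query)


print(reverse_complement("AACG"))
print(equivalent("AACG", "CGTT"))
print(is_subsequence("ACG", "AGCTG"))
```

In this model a reconstruction counts as correct when `equivalent(recovered, hidden)` holds, since no query can tell the two members of such a pair apart.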

    Tight bounds for string reconstruction using substring queries

    We resolve two open problems presented in [8]. First, we consider the problem of reconstructing an unknown string T over a fixed alphabet using queries of the form "does the string S appear in T?" for some query string S. We show that every non-adaptive algorithm must make $\Omega(\epsilon^{-1/2} n^2)$ queries in order to reconstruct a $1 - \epsilon$ fraction of the strings of length n. The second problem is reconstructing a string using queries of the form "does a string from S appear in T?", where S is a set of strings. We show a non-adaptive reconstruction algorithm for this model which is optimal both in the number of queries, and in the lengt