992 research outputs found

    Shortest prefix strings containing all subset permutations

    Get PDF
    AbstractWhat is the length of the shortest string consisting of elements of {1,
n} that contains as subsequences all permutations of any k-element subset? Many authors have considered the special case where k=n. We instead consider an incremental variation on this problem first proposed by Koutas and Hu. For a fixed value of n they ask for a string such that for all values of kâ©œn, the prefix containing all permutations of any k-element subset as subsequences is as short as possible. The problem can also be viewed as follows:For k=1 one needs n distinct digits to find each of the n possible permutations. In going from k to k+1, one starts with a string containing all k-element permutations as subsequences, and one adds as few digits as possible to the end of the string so that the new string contains all (k+1)-element permutations.We give a new construction that gives shorter strings than the best previous construction. We then prove a weak form of lower bound for the number of digits added in successive suffixes. The lower bound proof leads to a construction that matches the bound exactly. The length of a shortest prefix string is k(n−2)+[13(k+1)]+3, for k > 2.The lengths for k=1, 2 are n and 2n−1. This proves the natural conjecture that requiring the strings to be prefixes strictly increases the length of the strings required for all but the smallest values of k

    Superstrings with multiplicities

    Get PDF
    A superstring of a set of words P = s1, · · · , sp is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word si comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of si. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.Peer reviewe

    Normal, Abby Normal, Prefix Normal

    Full text link
    A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number pnw(n)pnw(n) of prefix normal words of length nn, showing that pnw(n)=Ω(2n−cnln⁥n)pnw(n) =\Omega\left(2^{n - c\sqrt{n\ln n}}\right) for some cc and pnw(n)=O(2n(ln⁥n)2n)pnw(n) = O \left(\frac{2^n (\ln n)^2}{n}\right). We introduce efficient algorithms for testing the prefix normal property and a "mechanical algorithm" for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes to stay normal but Bob wants to drive her "abnormal" -- we discuss which parameter settings allow Alice to succeed.Comment: Accepted at FUN '1

    Around Kolmogorov complexity: basic notions and results

    Full text link
    Algorithmic information theory studies description complexity and randomness and is now a well known field of theoretical computer science and mathematical logic. There are several textbooks and monographs devoted to this theory where one can find the detailed exposition of many difficult results as well as historical references. However, it seems that a short survey of its basic notions and main results relating these notions to each other, is missing. This report attempts to fill this gap and covers the basic notions of algorithmic information theory: Kolmogorov complexity (plain, conditional, prefix), Solomonoff universal a priori probability, notions of randomness (Martin-L\"of randomness, Mises--Church randomness), effective Hausdorff dimension. We prove their basic properties (symmetry of information, connection between a priori probability and prefix complexity, criterion of randomness in terms of complexity, complexity characterization for effective dimension) and show some applications (incompressibility method in computational complexity theory, incompleteness theorems). It is based on the lecture notes of a course at Uppsala University given by the author

    The Google Similarity Distance

    Full text link
    Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in an a mean agreement of 87% with the expert crafted WordNet categories.Comment: 15 pages, 10 figures; changed some text/figures/notation/part of theorem. Incorporated referees comments. This is the final published version up to some minor changes in the galley proof

    Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

    Get PDF
    Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et.\ al.\ (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (\emph{Inf.\ Process.\ Lett.}, 2016), who presented an O(n(log⁥n)2/3)O(n (\log n)^{2/3})-time algorithm for answering O(n)O(n) LCE queries. This result was improved by Gawrychowski et.\ al.\ (accepted to CPM 2016) to O(nlog⁥log⁥n)O(n \log \log n) time. In this work we note a special \emph{non-crossing} property of LCE queries asked in the runs computation. We show that any nn such non-crossing queries can be answered on-line in O(nα(n))O(n \alpha(n)) time, which yields an O(nα(n))O(n \alpha(n))-time algorithm for computing runs
    • 

    corecore