7,322 research outputs found

    Improved bounds for testing Dyck languages

    Full text link
    In this paper we consider the problem of deciding membership in Dyck languages, a fundamental family of context-free languages, comprised of well-balanced strings of parentheses. In this problem we are given a string of length nn in the alphabet of parentheses of mm types and must decide if it is well-balanced. We consider this problem in the property testing setting, where one would like to make the decision while querying as few characters of the input as possible. Property testing of strings for Dyck language membership for m=1m=1, with a number of queries independent of the input size nn, was provided in [Alon, Krivelevich, Newman and Szegedy, SICOMP 2001]. Property testing of strings for Dyck language membership for m≄2m \ge 2 was first investigated in [Parnas, Ron and Rubinfeld, RSA 2003]. They showed an upper bound and a lower bound for distinguishing strings belonging to the language from strings that are far (in terms of the Hamming distance) from the language, which are respectively (up to polylogarithmic factors) the 2/32/3 power and the 1/111/11 power of the input size nn. Here we improve the power of nn in both bounds. For the upper bound, we introduce a recursion technique, that together with a refinement of the methods in the original work provides a test for any power of nn larger than 2/52/5. For the lower bound, we introduce a new problem called Truestring Equivalence, which is easily reducible to the 22-type Dyck language property testing problem. For this new problem, we show a lower bound of nn to the power of 1/51/5

    Grammars with two-sided contexts

    Full text link
    In a recent paper (M. Barash, A. Okhotin, "Defining contexts in context-free grammars", LATA 2012), the authors introduced an extension of the context-free grammars equipped with an operator for referring to the left context of the substring being defined. This paper proposes a more general model, in which context specifications may be two-sided, that is, both the left and the right contexts can be specified by the corresponding operators. The paper gives the definitions and establishes the basic theory of such grammars, leading to a normal form and a parsing algorithm working in time O(n^4), where n is the length of the input string.Comment: In Proceedings AFL 2014, arXiv:1405.527

    The Tandem Duplication Distance Is NP-Hard

    Get PDF
    In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation AXB ? AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems

    Finite-State Complexity and the Size of Transducers

    Full text link
    Finite-state complexity is a variant of algorithmic information theory obtained by replacing Turing machines with finite transducers. We consider the state-size of transducers needed for minimal descriptions of arbitrary strings and, as our main result, we show that the state-size hierarchy with respect to a standard encoding is infinite. We consider also hierarchies yielded by more general computable encodings.Comment: In Proceedings DCFS 2010, arXiv:1008.127

    A Characterization for Decidable Separability by Piecewise Testable Languages

    Full text link
    The separability problem for word languages of a class C\mathcal{C} by languages of a class S\mathcal{S} asks, for two given languages II and EE from C\mathcal{C}, whether there exists a language SS from S\mathcal{S} that includes II and excludes EE, that is, I⊆SI \subseteq S and S∩E=∅S\cap E = \emptyset. In this work, we assume some mild closure properties for C\mathcal{C} and study for which such classes separability by a piecewise testable language (PTL) is decidable. We characterize these classes in terms of decidability of (two variants of) an unboundedness problem. From this, we deduce that separability by PTL is decidable for a number of language classes, such as the context-free languages and languages of labeled vector addition systems. Furthermore, it follows that separability by PTL is decidable if and only if one can compute for any language of the class its downward closure wrt. the scattered substring ordering (i.e., if the set of scattered substrings of any language of the class is effectively regular). The obtained decidability results contrast some undecidability results. In fact, for all (non-regular) language classes that we present as examples with decidable separability, it is undecidable whether a given language is a PTL itself. Our characterization involves a result of independent interest, which states that for any kind of languages II and EE, non-separability by PTL is equivalent to the existence of common patterns in II and EE

    Normal, Abby Normal, Prefix Normal

    Full text link
    A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number pnw(n)pnw(n) of prefix normal words of length nn, showing that pnw(n)=Ω(2n−cnln⁥n)pnw(n) =\Omega\left(2^{n - c\sqrt{n\ln n}}\right) for some cc and pnw(n)=O(2n(ln⁥n)2n)pnw(n) = O \left(\frac{2^n (\ln n)^2}{n}\right). We introduce efficient algorithms for testing the prefix normal property and a "mechanical algorithm" for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes to stay normal but Bob wants to drive her "abnormal" -- we discuss which parameter settings allow Alice to succeed.Comment: Accepted at FUN '1

    A Parameterized Study of Maximum Generalized Pattern Matching Problems

    Full text link
    The generalized function matching (GFM) problem has been intensively studied starting with [Ehrenfeucht and Rozenberg, 1979]. Given a pattern p and a text t, the goal is to find a mapping from the letters of p to non-empty substrings of t, such that applying the mapping to p results in t. Very recently, the problem has been investigated within the framework of parameterized complexity [Fernau, Schmid, and Villanger, 2013]. In this paper we study the parameterized complexity of the optimization variant of GFM (called Max-GFM), which has been introduced in [Amir and Nor, 2007]. Here, one is allowed to replace some of the pattern letters with some special symbols "?", termed wildcards or don't cares, which can be mapped to an arbitrary substring of the text. The goal is to minimize the number of wildcards used. We give a complete classification of the parameterized complexity of Max-GFM and its variants under a wide range of parameterizations, such as, the number of occurrences of a letter in the text, the size of the text alphabet, the number of occurrences of a letter in the pattern, the size of the pattern alphabet, the maximum length of a string matched to any pattern letter, the number of wildcards and the maximum size of a string that a wildcard can be mapped to.Comment: to appear in Proc. IPEC'1
    • 

    corecore