7,322 research outputs found
Improved bounds for testing Dyck languages
In this paper we consider the problem of deciding membership in Dyck
languages, a fundamental family of context-free languages, comprised of
well-balanced strings of parentheses. In this problem we are given a string of
length in the alphabet of parentheses of types and must decide if it is
well-balanced. We consider this problem in the property testing setting, where
one would like to make the decision while querying as few characters of the
input as possible.
Property testing of strings for Dyck language membership for , with a
number of queries independent of the input size , was provided in [Alon,
Krivelevich, Newman and Szegedy, SICOMP 2001]. Property testing of strings for
Dyck language membership for was first investigated in [Parnas, Ron
and Rubinfeld, RSA 2003]. They showed an upper bound and a lower bound for
distinguishing strings belonging to the language from strings that are far (in
terms of the Hamming distance) from the language, which are respectively (up to
polylogarithmic factors) the power and the power of the input size
.
Here we improve the power of in both bounds. For the upper bound, we
introduce a recursion technique, that together with a refinement of the methods
in the original work provides a test for any power of larger than .
For the lower bound, we introduce a new problem called Truestring Equivalence,
which is easily reducible to the -type Dyck language property testing
problem. For this new problem, we show a lower bound of to the power of
Grammars with two-sided contexts
In a recent paper (M. Barash, A. Okhotin, "Defining contexts in context-free
grammars", LATA 2012), the authors introduced an extension of the context-free
grammars equipped with an operator for referring to the left context of the
substring being defined. This paper proposes a more general model, in which
context specifications may be two-sided, that is, both the left and the right
contexts can be specified by the corresponding operators. The paper gives the
definitions and establishes the basic theory of such grammars, leading to a
normal form and a parsing algorithm working in time O(n^4), where n is the
length of the input string.Comment: In Proceedings AFL 2014, arXiv:1405.527
The Tandem Duplication Distance Is NP-Hard
In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation AXB ? AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology.
The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems
Finite-State Complexity and the Size of Transducers
Finite-state complexity is a variant of algorithmic information theory
obtained by replacing Turing machines with finite transducers. We consider the
state-size of transducers needed for minimal descriptions of arbitrary strings
and, as our main result, we show that the state-size hierarchy with respect to
a standard encoding is infinite. We consider also hierarchies yielded by more
general computable encodings.Comment: In Proceedings DCFS 2010, arXiv:1008.127
A Characterization for Decidable Separability by Piecewise Testable Languages
The separability problem for word languages of a class by
languages of a class asks, for two given languages and
from , whether there exists a language from that
includes and excludes , that is, and . In this work, we assume some mild closure properties for
and study for which such classes separability by a piecewise
testable language (PTL) is decidable. We characterize these classes in terms of
decidability of (two variants of) an unboundedness problem. From this, we
deduce that separability by PTL is decidable for a number of language classes,
such as the context-free languages and languages of labeled vector addition
systems. Furthermore, it follows that separability by PTL is decidable if and
only if one can compute for any language of the class its downward closure wrt.
the scattered substring ordering (i.e., if the set of scattered substrings of
any language of the class is effectively regular).
The obtained decidability results contrast some undecidability results. In
fact, for all (non-regular) language classes that we present as examples with
decidable separability, it is undecidable whether a given language is a PTL
itself.
Our characterization involves a result of independent interest, which states
that for any kind of languages and , non-separability by PTL is
equivalent to the existence of common patterns in and
Normal, Abby Normal, Prefix Normal
A prefix normal word is a binary word with the property that no substring has
more 1s than the prefix of the same length. This class of words is important in
the context of binary jumbled pattern matching. In this paper we present
results about the number of prefix normal words of length , showing
that for some and
. We introduce efficient
algorithms for testing the prefix normal property and a "mechanical algorithm"
for computing prefix normal forms. We also include games which can be played
with prefix normal words. In these games Alice wishes to stay normal but Bob
wants to drive her "abnormal" -- we discuss which parameter settings allow
Alice to succeed.Comment: Accepted at FUN '1
A Parameterized Study of Maximum Generalized Pattern Matching Problems
The generalized function matching (GFM) problem has been intensively studied
starting with [Ehrenfeucht and Rozenberg, 1979]. Given a pattern p and a text
t, the goal is to find a mapping from the letters of p to non-empty substrings
of t, such that applying the mapping to p results in t. Very recently, the
problem has been investigated within the framework of parameterized complexity
[Fernau, Schmid, and Villanger, 2013].
In this paper we study the parameterized complexity of the optimization
variant of GFM (called Max-GFM), which has been introduced in [Amir and Nor,
2007]. Here, one is allowed to replace some of the pattern letters with some
special symbols "?", termed wildcards or don't cares, which can be mapped to an
arbitrary substring of the text. The goal is to minimize the number of
wildcards used.
We give a complete classification of the parameterized complexity of Max-GFM
and its variants under a wide range of parameterizations, such as, the number
of occurrences of a letter in the text, the size of the text alphabet, the
number of occurrences of a letter in the pattern, the size of the pattern
alphabet, the maximum length of a string matched to any pattern letter, the
number of wildcards and the maximum size of a string that a wildcard can be
mapped to.Comment: to appear in Proc. IPEC'1
- âŠ