The Height of Piecewise-Testable Languages with Applications in Logical Complexity
The height of a piecewise-testable language L is the maximum length of the words needed to define L by excluding and requiring given subwords. The height of L is an important descriptive complexity measure that has not yet been investigated in a systematic way. This paper develops a series of new techniques for bounding the height of finite languages and of languages obtained by taking closures by subwords, superwords and related operations.
As an application of these results, we show that FO^2(A^*, subword), the two-variable fragment of the first-order logic of sequences with the subword ordering, can only express piecewise-testable properties and has elementary complexity.
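As a small illustration of "excluding and requiring given subwords", the sketch below tests membership in a language given by two finite sets of subwords. It covers only the simple conjunctive case, not arbitrary Boolean combinations, and the names `is_subword`, `in_language` and `definition_height` are hypothetical helpers rather than anything from the paper. The maximum length of the words used in such a definition is the quantity the abstract calls the height (for the best possible definition).

```python
def is_subword(u, w):
    """Return True if u is a (scattered) subword of w."""
    it = iter(w)
    # 'ch in it' consumes the iterator up to the first match,
    # so letters of u are matched left to right in w.
    return all(ch in it for ch in u)


def in_language(w, required, excluded):
    """w must contain every required subword and no excluded one."""
    return (all(is_subword(u, w) for u in required)
            and not any(is_subword(u, w) for u in excluded))


def definition_height(required, excluded):
    """Longest word used by this particular definition (an upper bound on the height)."""
    return max((len(u) for u in list(required) + list(excluded)), default=0)
```

For example, requiring `ab` and excluding `ba` over the alphabet {a, b} describes the words of the form a...ab...b with at least one a and one b.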
A Characterization for Decidable Separability by Piecewise Testable Languages
The separability problem for word languages of a class $\mathcal{C}$ by
languages of a class $\mathcal{S}$ asks, for two given languages $I$ and $E$
from $\mathcal{C}$, whether there exists a language $S$ from $\mathcal{S}$ that
includes $I$ and excludes $E$, that is, $I \subseteq S$ and $S \cap E = \emptyset$. In this work, we assume some mild closure properties for
$\mathcal{C}$ and study for which such classes separability by a piecewise
testable language (PTL) is decidable. We characterize these classes in terms of
decidability of (two variants of) an unboundedness problem. From this, we
deduce that separability by PTL is decidable for a number of language classes,
such as the context-free languages and languages of labeled vector addition
systems. Furthermore, it follows that separability by PTL is decidable if and
only if one can compute for any language of the class its downward closure wrt.
the scattered substring ordering (i.e., if the set of scattered substrings of
any language of the class is effectively regular).
The obtained decidability results contrast with some undecidability results. In
fact, for all (non-regular) language classes that we present as examples with
decidable separability, it is undecidable whether a given language is a PTL
itself.
Our characterization involves a result of independent interest, which states
that for any kind of languages $I$ and $E$, non-separability by PTL is
equivalent to the existence of common patterns in $I$ and $E$.
On shuffle products, acyclic automata and piecewise-testable languages
We show that the shuffle L ⧢ F of a piecewise-testable
language L and a finite language F is piecewise-testable. The proof relies
on a classic but little-used automata-theoretic characterization of
piecewise-testable languages. We also discuss some mild generalizations of the
main result, and provide bounds on the piecewise complexity of L ⧢ F.
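To make the shuffle operation concrete, here is a minimal sketch that enumerates the shuffle of two words and lifts it to languages. It assumes both languages are given as finite sets of strings, which is narrower than the paper's setting (there only F needs to be finite); `shuffle_words` and `shuffle_languages` are illustrative names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle_words(u, v):
    """All interleavings of u and v that keep each word's letter order."""
    if not u:
        return frozenset({v})
    if not v:
        return frozenset({u})
    # The first letter of the result comes either from u or from v.
    from_u = {u[0] + w for w in shuffle_words(u[1:], v)}
    from_v = {v[0] + w for w in shuffle_words(u, v[1:])}
    return frozenset(from_u | from_v)


def shuffle_languages(L, F):
    """Union of shuffle_words(u, v) over all u in L and v in F (both finite here)."""
    result = set()
    for u in L:
        for v in F:
            result |= shuffle_words(u, v)
    return result
```

The enumeration is exponential in the word lengths, so it is only meant for checking small examples by hand.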
The Edit Distance to k-Subsequence Universality
A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼_k (now called the Simon congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼_k-congruent. This relation was central in defining and analysing piecewise testable languages, but has also found many applications in areas such as algorithmic learning theory, database theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼_k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼_k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼_k-equivalent to u.
In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.
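The two definitions in this abstract can be checked on small inputs with the sketch below. The congruence test is a brute-force comparison of subsequence sets, not the linear-time algorithm mentioned above, and the function names are hypothetical. The universality check uses the standard greedy arch count: scanning left to right and closing an arch each time every letter of the alphabet has been seen yields the largest k for which the word is k-subsequence universal. A non-empty alphabet is assumed.

```python
from itertools import combinations


def subsequences_up_to(w, k):
    """Set of all subsequences of w of length at most k (exponential brute force)."""
    return {"".join(c) for n in range(k + 1) for c in combinations(w, n)}


def simon_congruent(u, v, k):
    """True if u and v have the same subsequences of length at most k."""
    return subsequences_up_to(u, k) == subsequences_up_to(v, k)


def universality_index(w, alphabet):
    """Number of complete arches of w over a non-empty alphabet."""
    sigma = set(alphabet)
    seen = set()
    arches = 0
    for ch in w:
        if ch in sigma:
            seen.add(ch)
        if seen == sigma:
            # Every letter has occurred since the last arch ended: close this arch.
            arches += 1
            seen = set()
    return arches


def is_k_universal(w, alphabet, k):
    """w contains every word of length k over the alphabet as a subsequence."""
    return universality_index(w, alphabet) >= k
```

For instance, `abab` and `abba` agree on all subsequences of length at most 2 but differ at length 3, since `bab` occurs only in the first word.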
Longest Common Subsequence with Gap Constraints
We consider the longest common subsequence problem in the context of
subsequences with gap constraints. In particular, following Day et al. 2022, we
consider the setting when the distance (i.e., the gap) between two consecutive
symbols of the subsequence has to be between a lower and an upper bound (which
may depend on the position of those symbols in the subsequence or on the
symbols bordering the gap) as well as the case where the entire subsequence is
found in a bounded range (defined by a single upper bound), considered by
Kosche et al. 2022. In all these cases, we present efficient algorithms for
determining the length of the longest common constrained subsequence between
two given strings.
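The simplest variant described above can be sketched as a naive dynamic program. This is not one of the paper's efficient algorithms. It assumes a single pair of bounds `lo` and `hi` that applies to every gap, measures a gap as the number of letters skipped between two consecutive chosen positions, and enforces the bounds in both strings; `gap_lcs` is a hypothetical name.

```python
def gap_lcs(v, w, lo, hi):
    """Length of the longest common subsequence of v and w whose consecutive
    symbols are separated by between lo and hi skipped letters in both strings."""
    n, m = len(v), len(w)
    # best[i][j]: longest constrained common subsequence ending with v[i] matched to w[j]
    best = [[0] * m for _ in range(n)]
    answer = 0
    for i in range(n):
        for j in range(m):
            if v[i] != w[j]:
                continue
            cur = 1  # the match (i, j) on its own
            # Predecessor positions pi, pj must satisfy lo <= i - pi - 1 <= hi (same for j).
            for pi in range(max(0, i - 1 - hi), i - lo):
                for pj in range(max(0, j - 1 - hi), j - lo):
                    if best[pi][pj]:
                        cur = max(cur, best[pi][pj] + 1)
            best[i][j] = cur
            answer = max(answer, cur)
    return answer
```

With `lo = 0` and `hi` at least as large as both string lengths this reduces to the ordinary longest common subsequence, and with `lo = hi = 0` it computes the longest common contiguous substring.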
$\alpha$-$\beta$-Factorization and the Binary Case of Simon's Congruence
In 1991 Hébrard introduced a factorization of words that turned out to be a
powerful tool for the investigation of a word's scattered factors (also known
as (scattered) subwords or subsequences). Based on this, first Karandikar and
Schnoebelen introduced the notion of $k$-richness and later on Barker et al.
the notion of $k$-universality. In 2022 Fleischmann et al. presented a
generalization of the arch factorization by intersecting the arch factorization
of a word and its reverse. While the authors merely used this factorization for
the investigation of shortest absent scattered factors, in this work we
investigate this new $\alpha$-$\beta$-factorization as such. We characterize
the famous Simon congruence of $k$-universal words in terms of $1$-universal
words. Moreover, we apply these results to binary words. In this special case,
we obtain a full characterization of the classes and calculate the index of the
congruence. Lastly, we start investigating the ternary case, present a full
list of possibilities for $\alpha\beta\alpha$-factors, and characterize their
congruence.