Digraph Complexity Measures and Applications in Formal Language Theory
We investigate structural complexity measures on digraphs, in particular the
cycle rank. This concept is intimately related to a classical topic in formal
language theory, namely the star height of regular languages. We explore this
connection, and obtain several new algorithmic insights regarding both cycle
rank and star height. Among other results, we show that computing the cycle
rank is NP-complete, even for sparse digraphs of maximum outdegree 2.
Notwithstanding, we provide both a polynomial-time approximation algorithm and
an exponential-time exact algorithm for this problem. The former algorithm
yields an O((log n)^(3/2))-approximation in polynomial time, whereas the
latter yields the optimum solution, and runs in time and space O*(1.9129^n) on
digraphs of maximum outdegree at most two. Regarding the star height problem,
we identify a subclass of the regular languages for which we can precisely
determine the computational complexity of the star height problem. Namely, the
star height problem for bideterministic languages is NP-complete, and this
holds already for binary alphabets. Then we translate the algorithmic results
concerning cycle rank to the bideterministic star height problem, thus giving a
polynomial-time approximation as well as a reasonably fast exact exponential
algorithm for bideterministic star height. Comment: 19 pages, 1 figure
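To make the notion of cycle rank concrete, here is a minimal brute-force sketch based on the standard recursive definition: an acyclic digraph has rank 0, a strongly connected digraph containing a cycle has rank 1 plus the minimum rank obtained by deleting one vertex, and any other digraph takes the maximum over its strongly connected components. The function name `cycle_rank` and the edge-set representation are assumptions made for illustration. This is neither the approximation algorithm nor the O*(1.9129^n) exact algorithm from the abstract, and it is only practical for very small graphs.

```python
from functools import lru_cache


def cycle_rank(vertices, edges):
    """Cycle rank of a digraph given as a vertex set and a set of (u, v) pairs."""
    vertices = frozenset(vertices)
    edges = frozenset((a, b) for a, b in edges if a in vertices and b in vertices)

    def reach(src, vs):
        # Vertices reachable from src using only vertices in vs.
        seen, stack = {src}, [src]
        while stack:
            u = stack.pop()
            for a, b in edges:
                if a == u and b in vs and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    @lru_cache(maxsize=None)
    def rank(vs):
        remaining = set(vs)
        best = 0
        while remaining:
            v = next(iter(remaining))
            fwd = reach(v, vs)
            # Strongly connected component of v inside the subgraph on vs.
            comp = frozenset(u for u in fwd if v in reach(u, vs))
            remaining -= comp
            has_cycle = len(comp) > 1 or (v, v) in edges
            if not has_cycle:
                continue  # a trivial component contributes rank 0
            if comp == vs:
                # Strongly connected with a cycle: delete the best vertex.
                return 1 + min(rank(vs - {u}) for u in vs)
            best = max(best, rank(comp))
        return best

    return rank(vertices)


triangle = {(1, 2), (2, 3), (3, 1)}
complete = {(a, b) for a in (1, 2, 3) for b in (1, 2, 3) if a != b}
print(cycle_rank({1, 2, 3}, triangle))  # a single directed cycle
print(cycle_rank({1, 2, 3}, complete))  # complete bidirected digraph
```

The memoisation over vertex subsets keeps the sketch short, at the cost of exponential time and space in the worst case.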
Verifying proofs in constant depth
In this paper we initiate the study of proof systems where verification of proofs proceeds by NC^0 circuits. We investigate the question of which languages admit proof systems in this very restricted model. Formulated alternatively, we ask which languages can be enumerated by NC^0 functions. Our results show that the answer to this problem is not determined by the complexity of the language. On the one hand, we construct NC^0 proof systems for a variety of languages ranging from regular to NP-complete. On the other hand, we show by combinatorial methods that even easy regular languages such as Exact-OR do not admit NC^0 proof systems. We also present a general construction of proof systems for regular languages with strongly connected NFAs.
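As an illustration of what "enumerating a language" by a constant-depth function can look like, the sketch below maps every bit string of length n-1 (the "proof") to a word of length n with an even number of ones, and every such word is hit. Each output bit reads at most two input bits, which is the kind of locality a constant-depth, bounded fan-in circuit imposes. The choice of the even-parity language and the name `even_parity_proof_system` are assumptions for this example; the abstract does not list this construction.

```python
from itertools import product


def even_parity_proof_system(proof):
    """Map a proof (tuple of n-1 bits) to a word of length n with even parity.

    Output bit i is the XOR of two neighbouring entries of the padded proof,
    so each output bit depends on at most two input bits.
    """
    padded = (0,) + tuple(proof) + (0,)
    return tuple(padded[i] ^ padded[i + 1] for i in range(len(proof) + 1))


n = 5
image = {even_parity_proof_system(p) for p in product((0, 1), repeat=n - 1)}
language = {w for w in product((0, 1), repeat=n) if sum(w) % 2 == 0}
print(image == language)
```

The XOR of all output bits telescopes to the XOR of the two padding zeros, which is why every output word has even parity.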
A Trichotomy for Regular Trail Queries
Regular path queries (RPQs) are an essential component of graph query languages. Such queries consider a regular expression r and a directed edge-labeled graph G and search for paths in G for which the sequence of labels is in the language of r. In order to avoid having to consider infinitely many paths, some database engines restrict such paths to be trails, that is, they only consider paths without repeated edges. In this paper we consider the evaluation problem for RPQs under trail semantics, in the case where the expression is fixed. We show that, in this setting, there exists a trichotomy. More precisely, the complexity of RPQ evaluation divides the regular languages into the finite languages, the class T_tract (for which the problem is tractable), and the rest. Interestingly, the tractable class in the trichotomy is larger than for the trichotomy for simple paths, discovered by Bagan et al. [Bagan et al., 2013]. In addition to this trichotomy result, we also study characterizations of the tractable class, its expressivity, the recognition problem, closure properties, and show how the decision problem can be extended to the enumeration problem, which is relevant to practice.
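The difference between arbitrary paths and trails can be seen in a small brute-force sketch. It enumerates every trail (no edge used twice) from a source to a target and checks the label sequence against a regular expression with Python's `re` module. The function name `has_matching_trail`, the triple-based edge list, and the single-character labels are assumptions for illustration; the search is exponential in general and is not the tractable evaluation procedure the abstract refers to.

```python
import re


def has_matching_trail(edges, source, target, pattern):
    """Return True if some trail from source to target spells a word matching pattern.

    edges is a list of (from_node, label, to_node) triples; labels are single
    characters so that a label sequence can be checked as a plain string.
    """
    regex = re.compile(pattern)

    def dfs(node, used, word):
        if node == target and regex.fullmatch(word):
            return True
        for i, (u, label, v) in enumerate(edges):
            # Trail semantics: each edge index may be used at most once.
            if u == node and i not in used:
                if dfs(v, used | {i}, word + label):
                    return True
        return False

    return dfs(source, frozenset(), "")


# One self-loop labeled "a" on s, and one edge labeled "b" from s to t.
edges = [("s", "a", "s"), ("s", "b", "t")]
print(has_matching_trail(edges, "s", "t", "ab"))
print(has_matching_trail(edges, "s", "t", "aab"))
```

In this graph the word "aab" is spelled by a path that takes the self-loop twice, but not by any trail, so the second query fails under trail semantics.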
Colored operads, series on colored operads, and combinatorial generating systems
We introduce bud generating systems, which are used for combinatorial
generation. They specify sets of various kinds of combinatorial objects, called
languages. They can emulate context-free grammars, regular tree grammars, and
synchronous grammars, allowing us to work with all these generating systems in
a unified way. The theory of bud generating systems uses colored operads.
Indeed, an object is generated by a bud generating system if it satisfies a
certain equation in a colored operad. To compute the generating series of the
languages of bud generating systems, we introduce formal power series on
colored operads and several operations on these. Series on colored operads are
crucial to express the languages specified by bud generating systems and allow
us to enumerate combinatorial objects with respect to some statistics. Some
examples of bud generating systems are constructed; in particular to specify
some sorts of balanced trees and to obtain recursive formulas enumerating
these. Comment: 48 pages
Joining Extractions of Regular Expressions
Regular expressions with capture variables, also known as "regex formulas,"
extract relations of spans (interval positions) from text. These relations can
be further manipulated via Relational Algebra as studied in the context of
document spanners, Fagin et al.'s formal framework for information extraction.
We investigate the complexity of querying text by Conjunctive Queries (CQs) and
Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds
(NP-completeness and W[1]-hardness) from the relational world also hold in our
setting; in particular, hardness hits already single-character text! Yet, the
upper bounds from the relational world do not carry over. Unlike the relational
world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source
of hardness is that it may be intractable to instantiate the relation defined
by a regex formula, simply because it has an exponential number of tuples. Yet,
we are able to establish general upper bounds. In particular, UCQs can be
evaluated with polynomial delay, provided that every CQ has a bounded number of
atoms (while unions and projection can be arbitrary). Furthermore, UCQ
evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the
parameter is the size of the UCQ.
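A small sketch may help picture span relations and a join over them. It treats a pattern as a single-variable extractor that returns every span of the text whose content matches, then joins two such relations on their shared variable by intersecting them. The name `extract_spans` and the 0-based half-open `(start, end)` convention are assumptions for this example; the enumeration is brute force and does not reflect the polynomial-delay algorithms mentioned in the abstract.

```python
import re


def extract_spans(text, pattern):
    """All spans (start, end) of text whose content fully matches pattern."""
    regex = re.compile(pattern)
    return [
        (i, j)
        for i in range(len(text) + 1)
        for j in range(i, len(text) + 1)
        if regex.fullmatch(text, i, j)
    ]


text = "aab"
runs_of_a = extract_spans(text, "a+")   # spans consisting only of 'a'
length_two = extract_spans(text, "..")  # spans of exactly two characters

# Join on the shared span variable: keep spans present in both relations.
joined = [span for span in runs_of_a if span in set(length_two)]
print(runs_of_a, length_two, joined)
```

Even for one variable the relation can contain quadratically many spans; with several variables the number of tuples can grow much faster, which is the source of hardness the abstract points to.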
Small NFAs from Regular Expressions: Some Experimental Results
Regular expressions (res), because of their succinctness and clear syntax,
are the common choice to represent regular languages. However, efficient
pattern matching or word recognition depends on the size of the equivalent
nondeterministic finite automata (NFA). We present the implementation of
several algorithms for constructing small epsilon-free NFAs from res within
the FAdo system, and a comparison of regular expression measures and NFA sizes
based on experimental results obtained from uniform random generated res. For
this analysis, nonredundant res and reduced res in star normal form were
considered. Comment: Proceedings of the 6th Conference on Computability in Europe (CIE 2010),
pages 194-203, Ponta Delgada, Azores, Portugal, June/July 2010
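One classical way to obtain an epsilon-free NFA from a regular expression is the position (Glushkov) construction, sketched below in plain Python. It is offered as an assumed example of the kind of construction the abstract mentions; the abstract does not say which algorithms were implemented, and nothing here uses or mirrors the FAdo API. Regular expressions are encoded as nested tuples, a representation chosen only for this sketch.

```python
from itertools import count


def glushkov(regex):
    """Build a position automaton.

    regex is a nested tuple: ('sym', c), ('cat', r, s), ('alt', r, s), ('star', r).
    Returns (transitions, finals, symbol_at); state 0 is the initial state and
    every other state is a symbol position of the expression.
    """
    ids = count(1)
    symbol_at, follow = {}, {}

    def walk(node):
        # Returns (nullable, first positions, last positions) of the subexpression.
        kind = node[0]
        if kind == "sym":
            p = next(ids)
            symbol_at[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = walk(node[1])
            for p in last:
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:  # concatenation: last of left is followed by first of right
            follow[p] |= f2
        return n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set())

    nullable, first, last = walk(regex)
    transitions = {(0, symbol_at[p], p) for p in first}
    transitions |= {(p, symbol_at[q], q) for p in follow for q in follow[p]}
    finals = set(last) | ({0} if nullable else set())
    return transitions, finals, symbol_at


def accepts(nfa, word):
    transitions, finals, _ = nfa
    current = {0}
    for c in word:
        current = {q for (p, a, q) in transitions if p in current and a == c}
    return bool(current & finals)


# (a|b)*ab
expr = ("cat", ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a")), ("sym", "b"))
nfa = glushkov(expr)
print(len(nfa[2]) + 1, accepts(nfa, "bbab"), accepts(nfa, "ba"))
```

The resulting automaton always has one state per symbol occurrence plus an initial state, so its size is tied directly to the alphabetic width of the expression.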
MatchPy: A Pattern Matching Library
Pattern matching is a powerful tool for symbolic computations, based on the
well-defined theory of term rewriting systems. Application domains include
algebraic expressions, abstract syntax trees, and XML and JSON data.
Unfortunately, no lightweight implementation of pattern matching as general and
flexible as Mathematica exists for Python (cf. Mathics, MacroPy, patterns, PyPatt).
Therefore, we created the open source module MatchPy which offers similar
pattern matching functionality in Python using a novel algorithm which finds
matches for large pattern sets more efficiently by exploiting similarities
between patterns. Comment: arXiv admin note: substantial text overlap with arXiv:1710.0007
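For readers unfamiliar with term pattern matching, the following is a minimal sketch of plain syntactic matching of one pattern against one term. It does not use MatchPy, does not reproduce its API, and does not implement the many-pattern algorithm the abstract describes. The tuple encoding of terms and the convention that a string ending in an underscore is a wildcard are assumptions made purely for this example.

```python
def match(pattern, subject, bindings=None):
    """Return a substitution dict if pattern matches subject, else None.

    Terms are atoms (strings) or tuples (head, arg1, arg2, ...).
    A string ending in '_' such as 'x_' is a wildcard named 'x'.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.endswith("_"):
        name = pattern[:-1]
        # A wildcard seen before must be bound to the same subterm again.
        if name in bindings and bindings[name] != subject:
            return None
        bindings[name] = subject
        return bindings
    if isinstance(pattern, tuple) and isinstance(subject, tuple):
        if len(pattern) != len(subject) or pattern[0] != subject[0]:
            return None
        for p, s in zip(pattern[1:], subject[1:]):
            bindings = match(p, s, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == subject else None


print(match(("f", "x_", ("g", "y_")), ("f", "a", ("g", ("h", "b")))))
print(match(("f", "x_", "x_"), ("f", "a", "b")))
```

Matching a large set of patterns this way means repeating the traversal once per pattern, which is the cost that sharing work between similar patterns is meant to avoid.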