Parikh's Theorem: A simple and direct automaton construction
Parikh's theorem states that the Parikh image of a context-free language is
semilinear or, equivalently, that every context-free language has the same
Parikh image as some regular language. We present a very simple construction
that, given a context-free grammar, produces a finite automaton recognizing
such a regular language. Comment: 12 pages, 3 figures.
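Here is a small illustration of what "same Parikh image" means. The sketch below compares the context-free, non-regular language {a^n b^n : n >= 0} with the regular language (ab)*. It only compares Parikh vectors (letter counts) of words up to a length bound. It is not the automaton construction from the paper, and the helper names are hypothetical.

```python
from collections import Counter

def parikh(word: str, alphabet: str = "ab") -> tuple[int, ...]:
    """Parikh vector of `word`: the number of occurrences of each letter."""
    counts = Counter(word)
    return tuple(counts[c] for c in alphabet)

def anbn(max_n: int) -> list[str]:
    """Words a^n b^n for 0 <= n <= max_n (context-free, not regular)."""
    return ["a" * n + "b" * n for n in range(max_n + 1)]

def ab_star(max_n: int) -> list[str]:
    """Words (ab)^n for 0 <= n <= max_n (regular)."""
    return ["ab" * n for n in range(max_n + 1)]

if __name__ == "__main__":
    lhs = {parikh(w) for w in anbn(5)}
    rhs = {parikh(w) for w in ab_star(5)}
    # Both sets are {(n, n) : 0 <= n <= 5}, so the images agree.
    print(lhs == rhs)
```

Ordering is discarded and only the counts survive. That is why a non-regular language can share its Parikh image with a regular one.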
Latent Tree Language Model
In this paper, we introduce the Latent Tree Language Model (LTLM), a novel
approach to language modeling that encodes syntax and semantics of a given
sentence as a tree of word roles.
The learning phase iteratively updates the trees by moving nodes according to
Gibbs sampling. We introduce two algorithms to infer a tree for a given
sentence. The first one is based on Gibbs sampling. It is fast, but it is not
guaranteed to find the most probable tree. The second one is based on dynamic
programming. It is slower, but it is guaranteed to find the most probable tree. We
provide a comparison of both algorithms.
We combine LTLM with a 4-gram Modified Kneser-Ney language model via linear
interpolation. Our experiments with English and Czech corpora show significant
perplexity reductions (up to 46% for English and 49% for Czech) compared with a
standalone 4-gram Modified Kneser-Ney language model. Comment: Accepted to EMNLP 201
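The abstract states only that the two models are combined by linear interpolation and evaluated by perplexity. The sketch below shows both quantities under an assumption: a single global weight `lam`, a hypothetical parameter that would typically be tuned on held-out data. The paper's actual weighting scheme may differ.

```python
import math

def interpolate(p_ltlm: float, p_kn: float, lam: float) -> float:
    """Linearly interpolate two conditional word probabilities.

    `lam` (hypothetical) is the weight on the tree model; 1 - lam goes
    to the n-gram model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_ltlm + (1.0 - lam) * p_kn

def perplexity(probs: list[float]) -> float:
    """Perplexity: exp of the average negative log-probability per word."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

if __name__ == "__main__":
    # Made-up per-word probabilities, for illustration only.
    p_tree = [0.20, 0.05, 0.10]
    p_ngram = [0.10, 0.08, 0.02]
    mixed = [interpolate(a, b, 0.5) for a, b in zip(p_tree, p_ngram)]
    print(perplexity(p_ngram), perplexity(mixed))
```

Because log is concave, the mixture's perplexity never exceeds the larger of the two component perplexities on the same text. How much it improves on the n-gram model alone depends on the data.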
Finite Automata for the Sub- and Superword Closure of CFLs: Descriptional and Computational Complexity
We answer two open questions posed in (Gruber, Holzer, Kutrib, 2009) on the
state complexity of representing sub- or superword closures of context-free
grammars (CFGs): (1) We prove a (tight) upper bound of $2^{\mathcal{O}(n)}$ on
the size of nondeterministic finite automata (NFAs) representing the subword
closure of a CFG of size $n$. (2) We present a family of CFGs for which the
minimal deterministic finite automata representing their subword closure
match the upper bound of $2^{2^{\mathcal{O}(n)}}$ following from (1).
Furthermore, we prove that the inequivalence problem for NFAs representing sub-
or superword-closed languages is only NP-complete as opposed to PSPACE-complete
for general NFAs. Finally, we extend our results into an approximation method
to attack inequivalence problems for CFGs.
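The sketch below makes the two closures concrete for a finite set of words. It assumes that "subword" means scattered subword (subsequence), as in the Higman-Haines setting. The paper itself deals with closures of languages given by CFGs and represented by NFAs, and the function names here are hypothetical.

```python
def is_subword(u: str, w: str) -> bool:
    """True iff u is a scattered subword (subsequence) of w."""
    it = iter(w)
    # `c in it` advances the iterator past the first match of c, so the
    # letters of u must occur in w in order, not necessarily contiguously.
    return all(c in it for c in u)

def in_subword_closure(u: str, lang: set[str]) -> bool:
    """u lies in the subword (downward) closure of lang iff u embeds into some w in lang."""
    return any(is_subword(u, w) for w in lang)

def in_superword_closure(u: str, lang: set[str]) -> bool:
    """u lies in the superword (upward) closure of lang iff some w in lang embeds into u."""
    return any(is_subword(w, u) for w in lang)
```

For example, "ab" is in the subword closure of {"aabb"}, and "xaybz" is in the superword closure of {"ab"}, while "ba" is not.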
Approximating Petri Net Reachability Along Context-free Traces
We investigate the problem asking whether the intersection of a context-free
language (CFL) and a Petri net language (PNL) is empty. Our contribution to
this long-standing problem, which relates, for instance, to the
reachability analysis of recursive programs over unbounded data domains, is to
identify a class of CFLs called the finite-index CFLs for which the problem is
decidable. The k-index approximation of a CFL can be obtained by discarding all
the words that cannot be derived within a budget k on the number of occurrences
of non-terminals. A finite-index CFL is thus a CFL which coincides with its
k-index approximation for some k. We decide whether the intersection of a
finite-index CFL and a PNL is empty by reducing it to the reachability problem
of Petri nets with weak inhibitor arcs, a class of systems with infinitely many
states for which reachability is known to be decidable. Conversely, we show
that the reachability problem for a Petri net with weak inhibitor arcs reduces
to the emptiness problem of a finite-index CFL intersected with a PNL. Comment: 16 pages.
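The k-index approximation described above can be explored by brute force on a toy grammar. The search visits every sentential form reachable while keeping at most k nonterminal occurrences. It adds a length bound on terminals only to keep the search finite. This illustrates the definition only, not the paper's reduction to Petri nets with weak inhibitor arcs. The grammar (the Dyck language S -> S S | a S b | epsilon, which is not finite-index) and the function name are hypothetical.

```python
from collections import deque

# Hypothetical toy grammar: uppercase = nonterminals, lowercase = terminals.
GRAMMAR: dict[str, list[str]] = {"S": ["SS", "aSb", ""]}

def k_index_words(grammar: dict[str, list[str]], start: str, k: int, max_len: int) -> set[str]:
    """Terminal words of length <= max_len that have a derivation in which
    every sentential form contains at most k nonterminal occurrences."""
    if k < 1:
        raise ValueError("the start symbol alone already uses budget 1")

    def count_nts(form: str) -> int:
        return sum(1 for s in form if s.isupper())

    seen = {start}
    queue = deque([start])
    words: set[str] = set()
    while queue:
        form = queue.popleft()
        if count_nts(form) == 0:
            words.add(form)
            continue
        # Any occurrence may be rewritten (not only the leftmost one).
        for i, sym in enumerate(form):
            if not sym.isupper():
                continue
            for rhs in grammar[sym]:
                new = form[:i] + rhs + form[i + 1:]
                nts = count_nts(new)
                # Prune forms over budget k, and forms with too many
                # terminals (terminals are never erased later).
                if nts > k or len(new) - nts > max_len:
                    continue
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words
```

With budget 1 the rule S -> S S can never fire, so only words a^n b^n appear. With budget 2, words such as "abab" become derivable.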
Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda
Valiant (1975) developed an algorithm for recognizing context-free
languages. As of today, it remains the algorithm with the best asymptotic
complexity for this purpose. In this paper, we present an algebraic
specification, implementation, and proof of correctness of a generalisation of
Valiant's algorithm. The generalisation can be used for recognition, parsing or
generic calculation of the transitive closure of upper triangular matrices. The
proof is certified by the Agda proof assistant. The certification is
representative of state-of-the-art methods for specification and proofs in
proof assistants based on type theory. As such, this paper can be read as a
tutorial for the Agda system.
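The matrix view behind this work can be shown with a straightforward cubic computation. For CYK-style recognition, the input word gives an upper triangular matrix W with the terminal rules on the first superdiagonal. Entry C[i][j] of its transitive closure then holds the nonterminals deriving word[i:j]. The sketch below is a naive fill in order of span width. It is neither Valiant's subcubic divide-and-conquer algorithm nor the Agda development, and the grammar and function names are hypothetical.

```python
from itertools import product

# Hypothetical CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A C,   C -> S B,   A -> a,   B -> b
BINARY = {("A", "B"): {"S"}, ("A", "C"): {"S"}, ("S", "B"): {"C"}}
TERMINAL = {"a": {"A"}, "b": {"B"}}

def mult(P: set[str], Q: set[str]) -> set[str]:
    """Set 'product': every X with a rule X -> Y Z where Y in P and Z in Q."""
    out: set[str] = set()
    for y, z in product(P, Q):
        out |= BINARY.get((y, z), set())
    return out

def closure(word: str) -> list[list[set[str]]]:
    """Closure C of the upper triangular matrix W, where W[i][i+1] holds the
    nonterminals deriving word[i]; C[i][j] = W[i][j] | U_{i<m<j} C[i][m]*C[m][j]."""
    n = len(word)
    C = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        C[i][i + 1] = set(TERMINAL.get(ch, set()))
    for span in range(2, n + 1):          # fill diagonals by increasing width
        for i in range(n - span + 1):
            j = i + span
            for m in range(i + 1, j):
                C[i][j] |= mult(C[i][m], C[m][j])
    return C

def recognizes(word: str, start: str = "S") -> bool:
    """Accept iff the start symbol derives the whole (non-empty) word."""
    return len(word) > 0 and start in closure(word)[0][len(word)]
```

The `mult` operation is not associative, which is one reason the closure has to be specified carefully. The abstract describes a generalisation to the transitive closure of upper triangular matrices.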
Interprocedural Reachability for Flat Integer Programs
We study programs with integer data, procedure calls and arbitrary call
graphs. We show that, whenever the guards and updates are given by octagonal
relations, the reachability problem along control flow paths within some
language $w_1^* \cdots w_d^*$ over program statements is decidable in NEXPTIME. To
achieve this upper bound, we combine a program transformation into the same
class of programs but without procedures, with an NP-completeness result for
the reachability problem of procedure-less programs. Besides the program, the
expression $w_1^* \cdots w_d^*$ is also mapped onto an expression of a similar form but
this time over the transformed program statements. Several arguments involving
context-free grammars and their generative process enable us to give tight
bounds on the size of the resulting expression. The current gap between
NP-hardness and membership in NEXPTIME closes, yielding NP-completeness, when a
certain parameter of the analysis is assumed to be constant. Comment: 38 pages, 1 figure.
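Restricting control flow paths to a bounded language means a path must consist of some repetitions of w_1, then some of w_2, and so on up to w_d. The sketch below checks membership of a path in such a language with a regular expression. It assumes a hypothetical encoding in which each program statement is a single character. It says nothing about the octagonal guards and updates or the decision procedure itself.

```python
import re

def matches_bounded(path: str, words: list[str]) -> bool:
    """True iff `path` is in w_1* w_2* ... w_d*, with each program statement
    encoded as one character (a hypothetical encoding)."""
    # An empty w_i contributes only the empty word, so it can be dropped.
    pattern = "".join(f"(?:{re.escape(w)})*" for w in words if w)
    return re.fullmatch(pattern, path) is not None
```

For instance, with w_1 = "ab" and w_2 = "c", the path "ababc" matches but "abcab" does not, because the expression never returns to w_1 after moving on to w_2.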