On the equivalence, containment, and covering problems for the regular and context-free languages
We consider the complexity of the equivalence and containment problems for regular expressions and context-free grammars, concentrating on the relationship between complexity and various language properties. Finiteness and boundedness of languages are shown to play important roles in the complexity of these problems. An encoding into grammars of Turing machine computations exponential in the size of the grammar is used to prove several exponential lower bounds. These lower bounds include exponential time for testing equivalence of grammars generating finite sets, and exponential space for testing equivalence of non-self-embedding grammars. Several problems which might be complex because of this encoding are shown to simplify for linear grammars. Other problems considered include grammatical covering and structural equivalence for right-linear, linear, and arbitrary grammars.
Learning context-free grammars from structural data in polynomial time
We consider the problem of learning a context-free grammar from its structural descriptions. Structural descriptions of a context-free grammar are unlabelled derivation trees of the grammar. We present an efficient algorithm for learning context-free grammars using two types of queries: structural equivalence queries and structural membership queries. The learning protocol is based on what is called a “minimally adequate teacher”, and it is shown that a grammar learned by the algorithm is not only a correct grammar, i.e. equivalent to the unknown grammar, but also structurally equivalent to it. Furthermore, the algorithm runs in time polynomial in the number of states of the minimum frontier-to-root tree automaton for the set of structural descriptions of the unknown grammar and the maximum size of any counter-example returned by a structural equivalence query.
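As a rough illustration of what "unlabelled derivation tree" means in the abstract above, the following Python sketch strips nonterminal labels from a derivation tree. The tuple representation, the `"*"` placeholder label, and the toy trees are assumptions made for this example and are not taken from the paper.

```python
def structural_description(tree):
    """Strip nonterminal labels from a derivation tree.

    A tree is either a terminal (a plain string) or a pair
    (nonterminal_label, list_of_children). Both the representation and
    the "*" placeholder are assumptions of this sketch.
    """
    if isinstance(tree, str):
        return tree  # terminals are kept as they are
    _label, children = tree
    return ("*", [structural_description(child) for child in children])


# Two hypothetical grammars that derive "aabb" with the same tree shape
# but different nonterminal names.
tree_g1 = ("S", ["a", ("S", ["a", "b"]), "b"])
tree_g2 = ("T", ["a", ("U", ["a", "b"]), "b"])

skeleton_g1 = structural_description(tree_g1)
skeleton_g2 = structural_description(tree_g2)
```

Both trees map to the same skeleton, which is the sense in which two grammars can agree on a structural description while naming their nonterminals differently.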
Learning cover context-free grammars from structural data
We consider the problem of learning an unknown context-free grammar when the
only knowledge available and of interest to the learner is about its structural
descriptions with depth at most a given bound. The goal is to learn a cover
context-free grammar (CCFG) with respect to that bound, that is, a CFG whose
structural descriptions with depth at most the bound agree with those of the
unknown CFG. We propose an algorithm that efficiently learns
a CCFG using two types of queries: structural equivalence and structural
membership. We show that the algorithm runs in time polynomial in the number of
states of a minimal deterministic finite cover tree automaton (DCTA) with
respect to the bound. This number is often much smaller than the number of states
of a minimum deterministic finite tree automaton for the structural
descriptions of the unknown grammar
Partially-commutative context-free languages
The paper is about a class of languages that extends context-free languages
(CFL) and is stable under shuffle. Specifically, we investigate the class of
partially-commutative context-free languages (PCCFL), where non-terminal
symbols are commutative according to a binary independence relation, very much
like in trace theory. The class has been recently proposed as a robust class
subsuming CFL and commutative CFL. This paper surveys properties of PCCFL. We
identify a natural corresponding automaton model: stateless multi-pushdown
automata. We show stability of the class under natural operations, including
homomorphic images and shuffle. Finally, we relate expressiveness of PCCFL to
two other relevant classes: CFL extended with shuffle and trace-closures of
CFL. Among technical contributions of the paper are pumping lemmas, as an
elegant completion of known pumping properties of regular languages, CFL and
commutative CFL.
Comment: In Proceedings EXPRESS/SOS 2012, arXiv:1208.244
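For readers unfamiliar with the shuffle operation mentioned in the abstract above, here is a minimal Python sketch of shuffle on words and on finite languages. The function names are chosen for this example, and the sketch only illustrates the operation itself, not the PCCFL constructions of the paper.

```python
def shuffle(u, v):
    """Return the set of all interleavings of the words u and v.

    Each interleaving keeps the letters of u in their original order
    and the letters of v in their original order.
    """
    if not u:
        return {v}
    if not v:
        return {u}
    take_from_u = {u[0] + w for w in shuffle(u[1:], v)}
    take_from_v = {v[0] + w for w in shuffle(u, v[1:])}
    return take_from_u | take_from_v


def shuffle_languages(lang1, lang2):
    """Shuffle of two finite languages: the union over all pairs of words."""
    result = set()
    for u in lang1:
        for v in lang2:
            result |= shuffle(u, v)
    return result


interleavings = shuffle("ab", "c")  # {"abc", "acb", "cab"}
```

The recursion is exponential in the word lengths, so it is only suitable for small examples.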
Inducing Probabilistic Grammars by Bayesian Model Merging
We describe a framework for inducing probabilistic grammars from corpora of
positive samples. First, samples are incorporated by adding ad-hoc rules
to a working grammar; subsequently, elements of the model (such as states or
nonterminals) are merged to achieve generalization and a more compact
representation. The choice of what to merge and when to stop is governed by the
Bayesian posterior probability of the grammar given the data, which formalizes
a trade-off between a close fit to the data and a default preference for
simpler models ("Occam's Razor"). The general scheme is illustrated using three
types of probabilistic grammars: Hidden Markov models, class-based n-grams,
and stochastic context-free grammars.
Comment: To appear in Grammatical Inference and Applications, Second
International Colloquium on Grammatical Inference; Springer Verlag, 1994. 13
page
Comparing and evaluating extended Lambek calculi
Lambek's Syntactic Calculus, commonly referred to as the Lambek calculus, was
innovative in many ways, notably as a precursor of linear logic. But it also
showed that we could treat our grammatical framework as a logic (as opposed to
a logical theory). However, though it was successful in giving at least a basic
treatment of many linguistic phenomena, it was also clear that a slightly more
expressive logical calculus was needed for many other cases. Therefore, many
extensions and variants of the Lambek calculus have been proposed, since the
eighties and up until the present day. As a result, there is now a large class
of calculi, each with its own empirical successes and theoretical results, but
also each with its own logical primitives. This raises the question: how do we
compare and evaluate these different logical formalisms? To answer this
question, I present two unifying frameworks for these extended Lambek calculi.
Both are proof net calculi with graph contraction criteria. The first calculus
is a very general system: you specify the structure of your sequents and it
gives you the connectives and contractions which correspond to it. The calculus
can be extended with structural rules, which translate directly into graph
rewrite rules. The second calculus is first-order (multiplicative
intuitionistic) linear logic, which turns out to have several other,
independently proposed extensions of the Lambek calculus as fragments. I will
illustrate the use of each calculus in building bridges between analyses
proposed in different frameworks, in highlighting differences and in helping to
identify problems.
Comment: Empirical advances in categorial grammars, Aug 2015, Barcelona,
Spain. 201
Structure preserving transformations on non-left-recursive grammars
We will be concerned with grammar covers. The first part of this paper presents a general framework for covers. The second part introduces a transformation from non-left-recursive grammars to grammars in Greibach normal form. An investigation of the structure preserving properties of this transformation, which serves also as an illustration of our framework for covers, is presented.
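The transformation described above applies to non-left-recursive grammars. As a hedged illustration of that precondition only (not of the paper's transformation), the Python sketch below checks a grammar for left recursion. It assumes the grammar has no empty productions, so only the first symbol of each right-hand side matters; the dictionary representation and the example grammars are hypothetical.

```python
def is_left_recursive(grammar):
    """Check whether some nonterminal A can derive a string starting with A.

    `grammar` maps each nonterminal to a list of right-hand sides, each a
    list of symbols. Assumption of this sketch: no empty right-hand sides,
    so only the leftmost symbol of each production needs to be followed.
    """
    # leftmost[a] = nonterminals that appear first in some production of a
    leftmost = {
        a: {rhs[0] for rhs in rules if rhs and rhs[0] in grammar}
        for a, rules in grammar.items()
    }
    for start in grammar:
        seen = set()
        stack = list(leftmost[start])
        while stack:
            b = stack.pop()
            if b == start:
                return True  # found a cycle of leftmost nonterminals
            if b not in seen:
                seen.add(b)
                stack.extend(leftmost[b])
    return False


# Hypothetical example grammars.
direct = {"E": [["E", "+", "T"], ["T"]], "T": [["a"]]}
indirect = {"A": [["B", "x"]], "B": [["A", "y"], ["z"]]}
safe = {"S": [["a", "S", "b"], ["a", "b"]]}
```

Here `direct` and `indirect` are left-recursive, while `safe` is not and would therefore fall within the class of grammars the abstract considers.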