Precise n-gram Probabilities from Stochastic Context-free Grammars
We present an algorithm for computing n-gram probabilities from stochastic
context-free grammars, a procedure that can alleviate some of the standard
problems associated with n-grams (estimation from sparse data, lack of
linguistic structure, among others). The method operates via the computation of
substring expectations, which in turn is accomplished by solving systems of
linear equations derived from the grammar. We discuss efficient implementation
of the algorithm and report our practical experience with it.
Comment: 12 pages, to appear in ACL-9
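The idea of obtaining expectations by solving linear equations derived from the grammar can be illustrated on the simplest case, n = 1. The sketch below is not the paper's algorithm: it only computes the expected number of occurrences of a single terminal in a string generated by a toy grammar. The grammar, the function names and the exact-fraction solver are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical toy SCFG: nonterminal -> list of (probability, right-hand side).
# Any symbol that is not a key of the dictionary is treated as a terminal.
GRAMMAR = {
    "S": [(Fraction(1, 2), ["a", "S", "b"]), (Fraction(1, 2), ["A"])],
    "A": [(Fraction(2, 3), ["a"]), (Fraction(1, 3), ["a", "A"])],
}


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination over exact fractions."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def expected_count(grammar, terminal):
    """Expected number of `terminal` symbols derived from each nonterminal.

    For every nonterminal X the expectation E[X] satisfies
        E[X] = sum over rules X -> rhs of p * (occurrences of terminal in rhs
                                               + sum of E[Y] for nonterminals Y in rhs),
    which is the linear system (I - A) E = c built below.
    """
    names = sorted(grammar)
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    i_minus_a = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    c = [Fraction(0)] * size
    for lhs, rules in grammar.items():
        for prob, rhs in rules:
            for symbol in rhs:
                if symbol in grammar:
                    i_minus_a[index[lhs]][index[symbol]] -= prob
                elif symbol == terminal:
                    c[index[lhs]] += prob
    return dict(zip(names, solve(i_minus_a, c)))


print(expected_count(GRAMMAR, "a")["S"])  # 5/2
print(expected_count(GRAMMAR, "b")["S"])  # 1
```

Exact fractions are used so that the result can be checked by hand; the system has a unique solution only when the grammar generates strings of finite expected length.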
A specification language for Lexical Functional Grammars
This paper defines a language L for specifying LFG grammars. This enables
constraints on LFG's composite ontology (c-structures synchronised with
f-structures) to be stated directly; no appeal to the LFG construction
algorithm is needed. We use L to specify schemata-annotated rules and the LFG
uniqueness, completeness and coherence principles. Broader issues raised by
this work are noted and discussed.
Comment: 6 pages, LaTeX uses eaclap.sty; Procs of Euro ACL-9
Controlled non uniform random generation of decomposable structures
Consider a class of decomposable combinatorial structures, using different
types of atoms $\Atoms = \{\At_1,\ldots,\At_{|\Atoms|}\}$. We address the
random generation of such structures with respect to a size $n$ and a targeted
distribution in $k$ of its \emph{distinguished} atoms. We consider two
variations on this problem. In the first alternative, the targeted distribution
is given by $k$ real numbers $\TargFreq_1, \ldots, \TargFreq_k$ such that
$0 < \TargFreq_i < 1$ for all $i$ and $\TargFreq_1+\cdots+\TargFreq_k \leq 1$. We
aim to generate random structures among the whole set of structures of a given
size $n$, in such a way that the \emph{expected} frequency of any distinguished
atom $\At_i$ equals $\TargFreq_i$. We address this problem by weighting the
atoms with a $k$-tuple $\Weights$ of real-valued weights, inducing a weighted
distribution over the set of structures of size $n$. We first adapt the
classical recursive random generation scheme into an algorithm taking
$\bigO{n^{1+o(1)}+mn\log{n}}$ arithmetic operations to draw $m$ structures from
the $\Weights$-weighted distribution. Secondly, we address the analytical
computation of weights such that the targeted frequencies are achieved
asymptotically, i.e. for large values of $n$. We derive systems of functional
equations whose resolution gives an explicit relationship between $\Weights$
and $\TargFreq_1, \ldots, \TargFreq_k$. Lastly, we give an algorithm in
$\bigO{k n^4}$ for the inverse problem, i.e. computing the frequencies
associated with a given $k$-tuple $\Weights$ of weights, and an optimized
version in $\bigO{k n^2}$ in the case of context-free languages. This allows
for a heuristic resolution of the weights/frequencies relationship suitable for
complex specifications. In the second alternative, the targeted distribution is
given by $k$ natural numbers $n_1, \ldots, n_k$ such that
$n_1+\cdots+n_k+r=n$, where $r$ is the number of undistinguished atoms.
The structures must be generated uniformly among the set of structures of size
$n$ that contain \emph{exactly} $n_i$ atoms $\At_i$ ($1 \leq i \leq k$). We give
a $\bigO{r^2\prod_{i=1}^k n_i^2 +m n k \log n}$ algorithm for generating
$m$ structures, which simplifies to $\bigO{r\prod_{i=1}^k n_i +m n}$ for
regular specifications.
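The weighted recursive method and the inverse problem (frequencies from weights) can be sketched on a very small regular specification. The example below is an illustration under stated assumptions, not the algorithm of the abstract and without its complexity bounds: the class (words over a and b with no two adjacent a's), the single weighted atom a, and all function names are hypothetical choices.

```python
import random
from fractions import Fraction


def weighted_tables(n, w):
    """Total weight of the valid completions of length k, for k = 0..n.

    free[k]    : the previous letter is not 'a' (or there is none yet)
    after_a[k] : the previous letter is 'a', so the next one must be 'b'
    The weight of a word is w ** (number of 'a' atoms it contains).
    """
    free, after_a = [Fraction(1)], [Fraction(1)]
    for k in range(1, n + 1):
        free.append(w * after_a[k - 1] + free[k - 1])
        after_a.append(free[k - 1])
    return free, after_a


def sample(n, w, rng):
    """Draw one word of length n with probability proportional to its weight."""
    free, after_a = weighted_tables(n, w)
    word, prev_a = [], False
    for k in range(n, 0, -1):
        if prev_a:
            letter = "b"
        else:
            # Probability of 'a' = weight of completions starting with 'a'
            # divided by the weight of all completions of length k.
            p_a = float(w * after_a[k - 1] / free[k])
            letter = "a" if rng.random() < p_a else "b"
        word.append(letter)
        prev_a = letter == "a"
    return "".join(word)


def expected_frequency(n, w):
    """Exact expected frequency of 'a' among words of length n under weight w."""
    # Each state carries (total weight, weight-weighted number of 'a' atoms).
    free = (Fraction(1), Fraction(0))
    after_a = (Fraction(1), Fraction(0))
    for _ in range(n):
        fw, fc = free
        aw, ac = after_a
        free, after_a = (w * aw + fw, w * (aw + ac) + fc), (fw, fc)
    return free[1] / (free[0] * n)


print(expected_frequency(2, Fraction(2)))  # 2/5
print(sample(12, Fraction(2), random.Random(0)))
```

For length 2 and weight 2 the valid words are ab, ba and bb with weights 2, 2 and 1, which gives the frequency 4/10 = 2/5 printed above. Because the expected frequency grows with the weight, a target frequency could be approached by a simple bisection on `w`; that search is left out of the sketch.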
Graph Grammars, Insertion Lie Algebras, and Quantum Field Theory
Graph grammars extend the theory of formal languages in order to model
distributed parallelism in theoretical computer science. We show here that to
certain classes of context-free and context-sensitive graph grammars one can
associate a Lie algebra, whose structure is reminiscent of the insertion Lie
algebras of quantum field theory. We also show that the Feynman graphs of
quantum field theories are graph languages generated by a theory dependent
graph grammar.
Comment: 19 pages, LaTeX, 3 jpeg figure
Computation of distances for regular and context-free probabilistic languages
Several mathematical distances between probabilistic languages have been investigated in the literature, motivated by applications in language modeling, computational biology, syntactic pattern matching and machine learning. In most cases, only pairs of probabilistic regular languages were considered. In this paper we extend the previous results to pairs of languages generated by a probabilistic context-free grammar and a probabilistic finite automaton.
Polynomial Time Algorithms for Multi-Type Branching Processes and Stochastic Context-Free Grammars
We show that one can approximate the least fixed point solution for a
multivariate system of monotone probabilistic polynomial equations in time
polynomial in both the encoding size of the system of equations and in
log(1/\epsilon), where \epsilon > 0 is the desired additive error bound of the
solution. (The model of computation is the standard Turing machine model.)
We use this result to resolve several open problems regarding the
computational complexity of computing key quantities associated with some
classic and heavily studied stochastic processes, including multi-type
branching processes and stochastic context-free grammars.
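To make the notion of a least fixed point concrete, the sketch below takes a one-type branching process with a hypothetical offspring distribution (no child with probability 1/4, two children with probability 3/4). Its extinction probability is the least non-negative solution of x = 1/4 + (3/4) x^2, namely 1/3; the other solution is 1. The code compares plain fixed-point iteration from zero with Newton's method in floating point. It does not reproduce the algorithm or the complexity analysis of the abstract, and the function names are assumptions.

```python
# Coefficients of the hypothetical offspring polynomial f(x) = 1/4 + (3/4) x^2,
# listed from the constant term upwards. They are non-negative and sum to 1.
COEFFS = [0.25, 0.0, 0.75]


def evaluate(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def fixed_point_iteration(coeffs, steps):
    """Iterate x <- f(x) starting from 0; the iterates increase towards the least fixed point."""
    x = 0.0
    for _ in range(steps):
        x = evaluate(coeffs, x)
    return x


def newton_iteration(coeffs, steps):
    """Newton's method on f(x) - x = 0, also started from 0."""
    d = derivative(coeffs)
    x = 0.0
    for _ in range(steps):
        # One Newton step: x <- x + (f(x) - x) / (1 - f'(x)).
        x = x + (evaluate(coeffs, x) - x) / (1.0 - evaluate(d, x))
    return x


for steps in (1, 2, 5, 10):
    print(steps, fixed_point_iteration(COEFFS, steps), newton_iteration(COEFFS, steps))
```

On this example both sequences approach 1/3 from the starting point 0, and Newton's method needs far fewer steps for the same accuracy. The denominator 1 - f'(x) stays away from zero here because f'(1/3) = 1/2; systems where it does not are exactly where a careful analysis is needed.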
Equational reasoning with context-free families of string diagrams
String diagrams provide an intuitive language for expressing networks of
interacting processes graphically. A discrete representation of string
diagrams, called string graphs, allows for mechanised equational reasoning by
double-pushout rewriting. However, one often wishes to express not just single
equations, but entire families of equations between diagrams of arbitrary size.
To do this we define a class of context-free grammars, called B-ESG grammars,
that are suitable for defining entire families of string graphs, and crucially,
of string graph rewrite rules. We show that the language-membership and
match-enumeration problems are decidable for these grammars, and hence that
there is an algorithm for rewriting string graphs according to B-ESG rewrite
patterns. We also show that it is possible to reason at the level of grammars
by providing a simple method for transforming a grammar by string graph
rewriting, and showing admissibility of the induced B-ESG rewrite pattern.
Comment: International Conference on Graph Transformation, ICGT 2015. The
final publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-21145-9_
Automatic acquisition of Spanish LFG resources from the Cast3LB treebank
In this paper, we describe the automatic annotation of the Cast3LB Treebank with LFG f-structures for the subsequent extraction of Spanish probabilistic grammar and lexical resources. We adapt the approach and methodology of Cahill et al. (2004), O’Donovan et al. (2004) and elsewhere for English to Spanish and the Cast3LB treebank encoding. We report on the quality and coverage of the automatic f-structure annotation. Following the pipeline and integrated models of Cahill et al. (2004), we extract wide-coverage
probabilistic LFG approximations and parse unseen Spanish text into f-structures. We also extend Bikel’s (2002) Multilingual Parse Engine to include a Spanish language module. Using the retrained Bikel parser in the pipeline model gives the best results against a manually constructed gold standard (73.20% preds-only f-score). We also extract Spanish lexical resources: 4090 semantic form types with 98 frame types. Subcategorised prepositions and particles are included in the frames.