Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by "Viterbi training." We show that solving, and even approximating, Viterbi training for PCFGs is NP-hard. We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in the absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
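As a toy illustration of the procedure the abstract refers to (not the paper's construction), Viterbi training can be sketched as hard EM: parse each sentence with CKY under the current weights, keep only the single best derivation, and re-estimate rule probabilities from its rule counts. The one-nonterminal CNF grammar and the corpus below are invented for the example; initialization is uniform over the rules of each left-hand side.

```python
import math
from collections import defaultdict

# Illustrative toy CNF grammar with a single nonterminal S.
RULES = [("S", ("S", "S")), ("S", ("a",)), ("S", ("b",))]

def viterbi_parse(words, logp):
    """CKY search for the single best derivation; returns its rule counts."""
    n = len(words)
    best, back = {}, {}
    for i, w in enumerate(words):
        for lhs, rhs in RULES:
            if rhs == (w,):
                best[(i, i + 1, lhs)] = logp[(lhs, rhs)]
                back[(i, i + 1, lhs)] = (lhs, rhs, None)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for lhs, rhs in RULES:
                if len(rhs) != 2:
                    continue
                for k in range(i + 1, j):
                    left, right = (i, k, rhs[0]), (k, j, rhs[1])
                    if left in best and right in best:
                        score = logp[(lhs, rhs)] + best[left] + best[right]
                        if score > best.get((i, j, lhs), -math.inf):
                            best[(i, j, lhs)] = score
                            back[(i, j, lhs)] = (lhs, rhs, k)
    counts = defaultdict(int)
    def walk(i, j, a):
        lhs, rhs, k = back[(i, j, a)]
        counts[(lhs, rhs)] += 1
        if k is not None:
            walk(i, k, rhs[0])
            walk(k, j, rhs[1])
    if (0, n, "S") in best:
        walk(0, n, "S")
    return counts

def viterbi_train(corpus, iterations=5):
    # Uniform initialization: each of the 3 rules of S gets probability 1/3.
    logp = {rule: math.log(1.0 / len(RULES)) for rule in RULES}
    for _ in range(iterations):
        totals = defaultdict(int)
        for sent in corpus:
            for rule, c in viterbi_parse(sent, logp).items():
                totals[rule] += c
        # With a single nonterminal, normalizing over all rules is per-LHS
        # normalization; tiny smoothing avoids log(0) for unused rules.
        z = sum(totals.values())
        if z:
            for rule in RULES:
                logp[rule] = math.log((totals[rule] + 1e-9) / (z + 1e-9 * len(RULES)))
    return logp

weights = viterbi_train([["a", "b"], ["a", "a"], ["b", "a", "b"]])
```

The hardness result above applies to this search problem in general; the sketch only shows the local-improvement loop that Viterbi EM performs.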
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of
parsing as deduction. Parsing algorithms can be represented directly as
deduction systems, and a single deduction engine can interpret such deduction
systems so as to implement the corresponding parser. The method generalizes
easily to parsers for augmented phrase structure formalisms, such as
definite-clause grammars and other logic grammar formalisms, and has been used
for rapid prototyping of parsing algorithms for a variety of formalisms
including variants of tree-adjoining grammars, categorial grammars, and
lexicalized context-free grammars.
Comment: 69 pages, includes full Prolog code.
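The paper's engine is written in Prolog; a minimal agenda-based sketch of the same idea in Python follows. The generic `deduce` engine knows nothing about parsing: it closes a chart of items under inference rules. The CKY deduction system (items `(A, i, j)`, a completion rule, goal item `(S, 0, n)`) is one instantiation; the grammar and words are illustrative inventions, not from the paper.

```python
# A generic forward-chaining deduction engine: items are hashable facts,
# and each inference rule maps (newly derived item, current chart) to a
# list of consequent items.
def deduce(axioms, rules, goal):
    chart = set(axioms)
    agenda = list(axioms)
    while agenda:
        item = agenda.pop()
        for rule in rules:
            for new in rule(item, chart):
                if new not in chart:
                    chart.add(new)
                    agenda.append(new)
    return goal in chart

# One deduction system the engine can interpret: CKY for a toy CNF grammar.
GRAMMAR = {("S", ("NP", "VP")), ("NP", ("fish",)), ("VP", ("swim",))}

def cky_system(words):
    # Axioms: lexical items (A, i, i+1) for each word.
    axioms = [(lhs, i, i + 1)
              for i, w in enumerate(words)
              for lhs, rhs in GRAMMAR if rhs == (w,)]
    def complete(item, chart):
        a, i, j = item
        out = []
        for lhs, rhs in GRAMMAR:
            if len(rhs) != 2:
                continue
            b, c = rhs
            if a == b:  # item is the left child of a completed constituent
                out += [(lhs, i, k) for (x, m, k) in chart if x == c and m == j]
            if a == c:  # item is the right child
                out += [(lhs, h, j) for (x, h, m) in chart if x == b and m == i]
        return out
    return axioms, [complete], ("S", 0, len(words))

axioms, rules, goal = cky_system(["fish", "swim"])
recognized = deduce(axioms, rules, goal)
```

Swapping in a different item shape and rule set (e.g. dotted items for Earley's algorithm) changes the parser without touching the engine, which is the separation the abstract describes.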
The automated inference of tree systems
Tree systems are used in syntactic pattern recognition for
describing two-dimensional patterns. We extend results on tree
automata with the introduction of the subtree-invariant equivalence
relation R. R relates two trees when the appearance of one implies the
appearance of the other in similar trees. A new state minimizing
algorithm for tree automata is formed using R. We also determine a
bound for Brainerd's minimization method.
We introduce the Group Unordered Tree Automaton (GUTA) which
accepts all orientations of open-line patterns described using directed
arc primitives. The specification of a GUTA includes an unordered
tree automaton M, which only accepts a standard orientation of a given
class of open-line pictures, and a transformation group, which describes
how the primitives transform under rotational shifts. The GUTA
performs all orientational parses in parallel, reports all successful
transformations and operates in the same time complexity as M. The
GUTA is much easier to specify than the equivalent non-decomposed
unordered tree automaton.
The problem of automating the design of unordered and ordered
tree automata (grammars) is studied both on a system directed and on a
highly interactive level. The system directed method uses Pao's lattice
technique to infer tree automata (grammars) from structurally
complete samples. It is shown that the method can infer any context-free
grammar when provided with skeletal structure descriptions. This
extends the results of Pao which only deal with proper subclasses of
context-free grammars.
The highly interactive inference system is based on the use of
tree derivatives, also introduced in this thesis, for determining
automaton states and possible state merging. Tree derivatives are
sets of tree forms derived by replacing selected subtrees with marked
nodes. The derivative sets are used to determine subtree-invariant
equivalence relations which characterize tree automata. A minimization
algorithm based on tree derivatives is given. We use tree derivatives
to prove that a tree automaton with n states can be fully
characterized by the set of trees of depth at most 2n that it accepts.
The inference method compares tree derivative sets and infers
subtree-invariant equivalence relations. A relation is inferred if
there is sufficient overlap between the derivative sets. Our method
was compared to other tree automata inference schemes, including
Crespi-Reghizzi's algorithm. We have shown that our method is applicable
to the entire class of context-free grammars and requires a
smaller sample than Crespi-Reghizzi's algorithm which can only infer a
proper subclass of operator precedence grammars. Furthermore, it
appears more general than the other inference systems for tree automata
or grammars.
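The thesis's tree derivatives admit richer forms (selected subtree sets, depth parameters) than can be shown briefly; the following sketch illustrates only the simplest variant, in which a derivative set collects every tree form obtained by replacing one subtree with a marked node. Trees are written as nested tuples `(label, child, ...)`; the marker symbol is an arbitrary choice.

```python
MARK = "$"  # marked node standing in for a deleted subtree

def positions(tree, path=()):
    """Yield every subtree position (as a path of child indices) in a tree
    written as nested tuples (label, child, ...)."""
    yield path
    for i, child in enumerate(tree[1:], start=1):
        yield from positions(child, path + (i,))

def replace(tree, path):
    """Return the tree with the subtree at `path` replaced by the marker."""
    if not path:
        return MARK
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:]),) + tree[i + 1:]

def derivative_set(tree):
    """All tree forms obtained by replacing exactly one subtree with MARK."""
    return {replace(tree, p) for p in positions(tree)}

t = ("f", ("a",), ("b",))
d = derivative_set(t)
```

Comparing such derivative sets for overlap is the flavor of test the interactive inference method applies when deciding whether two states (subtrees) should be merged under a subtree-invariant equivalence relation.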
An Alternative Conception of Tree-Adjoining Derivation
The precise formulation of derivation for tree-adjoining grammars has
important ramifications for a wide variety of uses of the formalism, from
syntactic analysis to semantic interpretation and statistical language
modeling. We argue that the definition of tree-adjoining derivation must be
reformulated in order to manifest the proper linguistic dependencies in
derivations. The particular proposal is both precisely characterizable through
a definition of TAG derivations as equivalence classes of ordered derivation
trees, and computationally operational, by virtue of a compilation to linear
indexed grammars together with an efficient algorithm for recognition and
parsing according to the compiled grammar.
Comment: 33 pages.
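The proposal's formal definition is richer than can be reproduced here, but the core idea of treating derivations as equivalence classes of ordered derivation trees can be illustrated with a hypothetical encoding: a derivation-tree node as `(elementary_tree, [(address, child), ...])`, with two ordered trees equated whenever they differ only in the order in which operations at distinct addresses of the same node are listed. The canonicalization below (sorting operations per node by address) is an invented stand-in for that quotient, not the paper's construction.

```python
def canonical(node):
    """A derivation-tree node is (elementary_tree_name, [(address, child), ...]),
    each child recording an operation (adjunction or substitution) at that
    address. Canonical form sorts the operations at every node by address, so
    two ordered derivation trees compare equal iff they are in the same
    equivalence class under reordering of sibling operations."""
    name, ops = node
    return (name, tuple(sorted((addr, canonical(child)) for addr, child in ops)))

# Two orderings of the same operations on (hypothetical) tree "alpha":
d1 = ("alpha", [("0", ("beta", [])), ("1", ("gamma", []))])
d2 = ("alpha", [("1", ("gamma", [])), ("0", ("beta", []))])
same = canonical(d1) == canonical(d2)
```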
Grammar induction for mildly context sensitive languages using variational Bayesian inference
The following technical report presents a formal approach to probabilistic
minimalist grammar induction. We describe a formalization of a minimalist
grammar. Based on this grammar, we define a generative model for minimalist
derivations. We then present a generalized algorithm for the application of
variational Bayesian inference to lexicalized mildly context sensitive language
grammars which in this paper is applied to the previously defined minimalist
grammar.
Weakly Restricted Stochastic Grammars
A new type of stochastic grammar is introduced for investigation: weakly restricted stochastic grammars. In this paper we concentrate on the consistency problem. To find conditions under which stochastic grammars are consistent, the theory of multitype Galton-Watson branching processes and generating functions is of central importance.
The unrestricted stochastic grammar formalism generates the same class of languages as the weakly restricted formalism. The inside-outside algorithm is adapted for use with weakly restricted grammars.
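The branching-process connection mentioned in the abstract can be made concrete with a standard single-type example (not from the paper): for the toy stochastic grammar S → S S with probability p and S → 'a' with probability 1-p, a derivation is a Galton-Watson process whose offspring generating function is f(q) = (1-p) + p·q², and the probability that a derivation terminates is the smallest fixed point of f. The grammar is consistent (total probability of finite derivations equals 1) iff the mean offspring m = 2p is at most 1.

```python
def finite_derivation_prob(p, iters=5000):
    """Smallest fixed point of q = (1-p) + p*q**2, computed by iteration from
    q = 0: the extinction probability of the Galton-Watson process, i.e. the
    probability that a derivation from S is finite."""
    q = 0.0
    for _ in range(iters):
        q = (1 - p) + p * q * q
    return q

def consistent(p):
    """Consistency criterion via the mean offspring count: each S rewrites
    to 2 copies of S with probability p, so m = 2p, and the grammar is
    consistent iff m <= 1."""
    return 2 * p <= 1

# Subcritical case (p = 0.4): every derivation terminates almost surely.
# Supercritical case (p = 0.6): finite-derivation probability is (1-p)/p = 2/3.
q_sub = finite_derivation_prob(0.4)
q_super = finite_derivation_prob(0.6)
```

The multitype case the paper relies on replaces the scalar m by the first-moment matrix of the grammar, with consistency governed by its spectral radius; the scalar example above is the one-nonterminal specialization.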