4,541 research outputs found
Inducing Probabilistic Grammars by Bayesian Model Merging
We describe a framework for inducing probabilistic grammars from corpora of
positive samples. First, samples are {\em incorporated} by adding ad-hoc rules
to a working grammar; subsequently, elements of the model (such as states or
nonterminals) are {\em merged} to achieve generalization and a more compact
representation. The choice of what to merge and when to stop is governed by the
Bayesian posterior probability of the grammar given the data, which formalizes
a trade-off between a close fit to the data and a default preference for
simpler models (`Occam's Razor'). The general scheme is illustrated using three
types of probabilistic grammars: Hidden Markov models, class-based -grams,
and stochastic context-free grammars.Comment: To appear in Grammatical Inference and Applications, Second
International Colloquium on Grammatical Inference; Springer Verlag, 1994. 13
page
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of
parsing as deduction. Parsing algorithms can be represented directly as
deduction systems, and a single deduction engine can interpret such deduction
systems so as to implement the corresponding parser. The method generalizes
easily to parsers for augmented phrase structure formalisms, such as
definite-clause grammars and other logic grammar formalisms, and has been used
for rapid prototyping of parsing algorithms for a variety of formalisms
including variants of tree-adjoining grammars, categorial grammars, and
lexicalized context-free grammars.Comment: 69 pages, includes full Prolog cod
Implicit learning of recursive context-free grammars
Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning
experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have
not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing
features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured
the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both
distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes
even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between
individual words. The structural distinctions drawn from linguistics also proved important as performance was greater for
tail-embedding than centre-embedding structures. The results suggest the plausibility of implicit learning of complex
context-free structures, which model some features of natural languages. They support the relevance of artificial grammar
learning for probing mechanisms of language learning and challenge existing theories and computational models of
implicit learning
Grammar Variational Autoencoder
Deep generative models have been wildly successful at learning coherent
latent representations for continuous data such as video and audio. However,
generative modeling of discrete data such as arithmetic expressions and
molecular structures still poses significant challenges. Crucially,
state-of-the-art methods often produce outputs that are not valid. We make the
key observation that frequently, discrete data can be represented as a parse
tree from a context-free grammar. We propose a variational autoencoder which
encodes and decodes directly to and from these parse trees, ensuring the
generated outputs are always valid. Surprisingly, we show that not only does
our model more often generate valid outputs, it also learns a more coherent
latent space in which nearby points decode to similar discrete outputs. We
demonstrate the effectiveness of our learned models by showing their improved
performance in Bayesian optimization for symbolic regression and molecular
synthesis
The Semantics of Graph Programs
GP (for Graph Programs) is a rule-based, nondeterministic programming
language for solving graph problems at a high level of abstraction, freeing
programmers from handling low-level data structures. The core of GP consists of
four constructs: single-step application of a set of conditional
graph-transformation rules, sequential composition, branching and iteration. We
present a formal semantics for GP in the style of structural operational
semantics. A special feature of our semantics is the use of finitely failing
programs to define GP's powerful branching and iteration commands
- …