Calibrating Generative Models: The Probabilistic Chomsky-Schützenberger Hierarchy
A probabilistic Chomsky–Schützenberger hierarchy of grammars is introduced and studied, with the aim of understanding the expressive power of generative models. We offer characterizations of the distributions definable at each level of the hierarchy, including probabilistic regular, context-free, (linear) indexed, context-sensitive, and unrestricted grammars, each corresponding to familiar probabilistic machine classes. Special attention is given to distributions on (unary notations for) positive integers. Unlike in the classical case, where the "semi-linear" languages all collapse into the regular languages, we use analytic tools adapted from the classical setting to show that there is no collapse in the probabilistic hierarchy: more distributions become definable at each level. We also address related issues such as closure under probabilistic conditioning.
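To make the unary case concrete, here is a minimal sketch (our illustration, not code from the paper) of how even the lowest level of the hierarchy defines distributions over positive integers: a probabilistic regular grammar with rules S -> 1 (probability p) and S -> 1 S (probability 1 - p) assigns the unary string 1^n probability p(1-p)^(n-1), a geometric distribution.

```python
import random

# Illustrative sketch (not from the paper): a probabilistic regular grammar
# over the unary alphabet {1} with rules
#     S -> 1      (probability p)
#     S -> 1 S    (probability 1 - p)
# It assigns the positive integer n, written as 1^n, the probability
# p * (1 - p)**(n - 1), i.e. a geometric distribution.

def sample_unary(p: float = 0.5) -> int:
    """Sample a positive integer by running the grammar's derivation."""
    n = 1
    while random.random() > p:  # expand S -> 1 S with probability 1 - p
        n += 1
    return n

def grammar_prob(n: int, p: float = 0.5) -> float:
    """Probability the grammar assigns to the string 1^n."""
    return p * (1 - p) ** (n - 1)
```

The paper's non-collapse result says that moving up the hierarchy strictly enlarges the class of such distributions, in contrast with the classical unary case.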
Probabilistic Constraint Logic Programming
This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm extends the improved iterative scaling algorithm of Della Pietra, Della Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied by suitable approximation methods for large data spaces. Furthermore, we present an approach for searching for the most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.
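As a rough illustration of the modeling setup (a toy with hypothetical feature functions, not the paper's algorithm), a log-linear model scores each analysis y by the exponential of a weighted feature sum and normalizes over the candidate set; retrieving a most probable analysis needs only the unnormalized scores:

```python
import math

# Toy log-linear model over a finite candidate set of analyses.
# p(y) = exp(sum_i w_i * f_i(y)) / Z, where Z is the partition function.
# The feature functions and weights here are hypothetical placeholders.

def log_linear_probs(analyses, features, weights):
    scores = [math.exp(sum(w * f(y) for w, f in zip(weights, features)))
              for y in analyses]
    z = sum(scores)  # partition function Z
    return [s / z for s in scores]

def most_probable(analyses, features, weights):
    # Z is constant across candidates, so the argmax of the
    # unnormalized score is already the most probable analysis.
    return max(analyses,
               key=lambda y: sum(w * f(y) for w, f in zip(weights, features)))
```

The paper's contribution lies in estimating the weights (and selecting the features) from incomplete data, where the correct analyses are not directly observed.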
Empirical Risk Minimization with Approximations of Probabilistic Grammars
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the log-loss. We derive sample complexity bounds in this framework that apply to both the supervised and unsupervised settings.
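For intuition on the supervised side (a standard maximum-likelihood fact, not the paper's bounds): with a fixed grammar and fully observed derivations, minimizing the empirical log-loss over the rule probabilities reduces to relative-frequency estimation:

```python
from collections import Counter, defaultdict

# Sketch: empirical risk minimization with log-loss for a fixed probabilistic
# grammar in the supervised setting. Each derivation is a list of applied
# rules (lhs, rhs); the ERM solution is the relative frequency of each rule
# among all expansions of its left-hand side.

def erm_rule_probs(derivations):
    counts = Counter(rule for tree in derivations for rule in tree)
    totals = defaultdict(int)
    for (lhs, _rhs), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

# Example: two derivations sharing the nonterminal "S".
trees = [[("S", "a S"), ("S", "a")], [("S", "a")]]
print(erm_rule_probs(trees))  # {("S", "a S"): 1/3, ("S", "a"): 2/3}
```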
Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming
Genetic Programming (GP) is a heuristic method that can be applied to many Machine Learning, Optimization, and Engineering problems. In particular, it has been widely used in Software Engineering for Test-case generation, Program Synthesis, and Improvement of Software (GI).
Grammar-Guided Genetic Programming (GGGP) approaches allow the user to refine the domain of valid program solutions. Backus Normal Form (BNF) is the most popular interface for describing Context-Free Grammars (CFGs) for GGGP. BNF and its derivatives have the disadvantage of interleaving the grammar language and the target language of the program.
We propose to embed the grammar as an internal Domain-Specific Language in the host language of the framework. This approach has the same expressive power as BNF and EBNF while using the host language's type system to take advantage of all the existing tooling: linters, formatters, type-checkers, autocomplete, and legacy code support. These tools have practical utility in designing software in general, and GP systems in particular.
We also present Meta-Handlers, user-defined overrides of the tree-generation system. This technique extends our object-oriented encoding with more practicality and expressive power than existing CFG approaches, achieving the same expressive power as Attribute Grammars but without the grammar-versus-target-language duality.
Furthermore, we show that this approach is feasible by presenting an example Python implementation as proof of concept. We also compare our approach against textual BNF representations with respect to expressive power and ergonomics. These advantages do not come at the cost of performance, as shown by an empirical evaluation of our example implementation against PonyGE2 on five benchmarks. We conclude that our approach has better ergonomics, with the same expressive power and performance as textual BNF-based grammar encodings.
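A minimal sketch of the proposed encoding (our simplification; the names are illustrative, not the paper's actual API): each nonterminal becomes an abstract class and each production a dataclass whose typed fields are the symbols on its right-hand side, so linters, type-checkers, and autocomplete see the grammar as ordinary host-language code.

```python
from abc import ABC
from dataclasses import dataclass

class Expr(ABC):        # nonterminal: Expr
    pass

@dataclass
class Lit(Expr):        # production: Expr -> <int literal>
    value: int

@dataclass
class Add(Expr):        # production: Expr -> Expr "+" Expr
    left: Expr
    right: Expr

# A GGGP engine can recover the CFG from these class and field annotations
# and generate random well-typed individuals, e.g.:
individual = Add(Lit(1), Add(Lit(2), Lit(3)))
```

Meta-Handlers would then hook into tree generation for individual fields, for instance constraining Lit.value to a range, which is how attribute-grammar-style constraints are expressed without a separate grammar file.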