
    Learning cover context-free grammars from structural data

    We consider the problem of learning an unknown context-free grammar when the only knowledge available and of interest to the learner is about its structural descriptions with depth at most \ell. The goal is to learn a cover context-free grammar (CCFG) with respect to \ell, that is, a CFG whose structural descriptions with depth at most \ell agree with those of the unknown CFG. We propose an algorithm, called LA^\ell, that efficiently learns a CCFG using two types of queries: structural equivalence and structural membership. We show that LA^\ell runs in time polynomial in the number of states of a minimal deterministic finite cover tree automaton (DCTA) with respect to \ell. This number is often much smaller than the number of states of a minimum deterministic finite tree automaton for the structural descriptions of the unknown grammar.
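
    The query-driven loop below is a minimal illustrative sketch of the two query types named in the abstract, not the paper's LA^\ell algorithm: a structural equivalence query supplies a counterexample tree, and structural membership queries decide how to handle it. The hypothesis here is a plain set of trees rather than a cover tree automaton, and all names (learn_cover, structural_member, structural_equiv, Tree) are assumptions made for illustration.

    # Illustrative sketch only; the real learner builds a deterministic cover
    # tree automaton, which this toy hypothesis (a set of trees) does not model.
    from typing import Callable, FrozenSet, Optional, Tuple

    Tree = Tuple  # leaf: ("a",); internal node: (subtree, subtree, ...)

    def learn_cover(
        structural_member: Callable[[Tree], bool],
        structural_equiv: Callable[[FrozenSet[Tree]], Optional[Tree]],
    ) -> FrozenSet[Tree]:
        """Ask equivalence queries until no counterexample remains; use
        membership queries to decide whether to add or drop the counterexample."""
        hypothesis: FrozenSet[Tree] = frozenset()
        while True:
            counterexample = structural_equiv(hypothesis)
            if counterexample is None:
                return hypothesis  # agrees with the target up to depth ell
            if structural_member(counterexample):
                hypothesis = hypothesis | {counterexample}  # missing structure: add it
            else:
                hypothesis = hypothesis - {counterexample}  # spurious structure: drop it

    # Toy usage: the "target" stands in for the unknown grammar's structural
    # descriptions of depth at most ell.
    target = {(("a",), ("b",)), (("a",),)}
    equiv = lambda h: next(iter(target ^ h), None)  # any tree the two disagree on
    member = lambda t: t in target
    assert learn_cover(member, equiv) == frozenset(target)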

    Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

    Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated data).
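
    As a concrete illustration of the objective, the sketch below computes the supervised empirical risk (average log-loss) of a PCFG over fully observed derivations; in that setting the risk minimizer is the relative-frequency estimate of the rule probabilities. This is an assumption-laden toy (the functions erm_pcfg and empirical_risk are invented for illustration) and does not reproduce the paper's framework or its unsupervised, EM-style algorithm.

    # Toy supervised ERM for a PCFG under the log-loss; not the paper's method.
    from collections import Counter, defaultdict
    import math

    def erm_pcfg(derivations):
        """derivations: list of derivations, each a list of rules (lhs, rhs).
        Returns the log-loss minimizer: relative-frequency rule probabilities."""
        counts = Counter(rule for d in derivations for rule in d)
        totals = defaultdict(int)
        for (lhs, _), c in counts.items():
            totals[lhs] += c
        return {rule: c / totals[rule[0]] for rule, c in counts.items()}

    def empirical_risk(theta, derivations):
        """Average negative log-likelihood (log-loss) per derivation."""
        nll = -sum(math.log(theta[r]) for d in derivations for r in d)
        return nll / len(derivations)

    # Toy treebank with rules S -> A A, S -> A, A -> a.
    data = [[("S", ("A", "A")), ("A", ("a",)), ("A", ("a",))],
            [("S", ("A",)), ("A", ("a",))]]
    theta = erm_pcfg(data)
    print(theta, empirical_risk(theta, data))  # average log-loss is log 2 ~ 0.693 here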

    Synthesizing Program Input Grammars

    We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers.
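
    The snippet below is an illustrative sketch of the oracle-guided style of generalization described above: a candidate generalization of a seed input is kept only if the blackbox program still accepts the inputs it produces. The repetition check and all names (try_repetition, balanced) are assumptions made for illustration and do not reproduce GLADE's actual algorithm.

    # Illustrative only: blackbox-checked generalization of a seed input.
    from typing import Callable, List

    def try_repetition(seed: str, accepts: Callable[[str], bool]) -> List[str]:
        """Return substrings of `seed` that appear repeatable: the oracle still
        accepts the input when the substring is doubled and tripled in place."""
        repeatable = []
        for i in range(len(seed)):
            for j in range(i + 1, len(seed) + 1):
                sub = seed[i:j]
                doubled = seed[:i] + sub * 2 + seed[j:]
                tripled = seed[:i] + sub * 3 + seed[j:]
                if accepts(doubled) and accepts(tripled):
                    repeatable.append(sub)
        return repeatable

    # Toy blackbox: accepts strings of balanced parentheses.
    def balanced(s: str) -> bool:
        depth = 0
        for ch in s:
            depth += 1 if ch == "(" else -1 if ch == ")" else 0
            if depth < 0:
                return False
        return depth == 0

    print(try_repetition("(())", balanced))  # ['(())', '()'] are repeatable here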

    Empirical Risk Minimization with Approximations of Probabilistic Grammars

    Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting.