
    Inducing Compact but Accurate Tree-Substitution Grammars

    Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so it far outperforms a standard PCFG.
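    The non-parametric formulation can be pictured as a Chinese-restaurant-process style predictive distribution over tree fragments: a fragment is reused in proportion to how often it has already been sampled, or drawn fresh from a base distribution. The sketch below illustrates this idea only; the concentration parameter, fragment strings and base probability are invented stand-ins, not the paper's actual model.

```python
# Illustrative sketch only: a Chinese-restaurant-process style predictive
# probability for a tree fragment under a Dirichlet-process prior. The
# concentration parameter, fragment strings and base probability below are
# invented stand-ins, not the paper's actual model or parameterisation.
from collections import Counter

def fragment_probability(fragment, counts, alpha, base_prob):
    """P(fragment | fragments sampled so far) for one root category.

    counts    -- Counter of how often each fragment has been reused
    alpha     -- DP concentration parameter (assumed hyperparameter)
    base_prob -- probability of the fragment under a simpler base
                 distribution, e.g. a PCFG
    """
    total = sum(counts.values())
    return (counts[fragment] + alpha * base_prob) / (total + alpha)

# A frequently reused fragment keeps high probability even when the base
# distribution assigns it little mass, which favours compact grammars.
counts = Counter({"(VP (V gave) NP NP)": 12, "(VP V NP)": 30})
print(fragment_probability("(VP (V gave) NP NP)", counts, alpha=1.0, base_prob=1e-4))
```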

    Inducing Tree-Substitution Grammars

    Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that in order to induce plausible grammars, the underlying model must be capable of representing the intricacies of language while also ensuring that it can be readily learned from data. The majority of existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context free grammars and first order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures. To limit the model's complexity we employ a Bayesian non-parametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model's efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results.
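    The bias towards sparse, shallow grammars typically comes from the base distribution over fragments, which pays a cost for every rule expanded inside a fragment. A minimal sketch of such a size-penalising base distribution follows; the stop probability and rule probabilities are hypothetical placeholders, not values learned by the model.

```python
# Illustrative sketch only: a base distribution that pays a geometric cost for
# every rule expanded inside a fragment, so deeper fragments receive
# exponentially less prior mass. The stop probability and rule probabilities
# are hypothetical placeholders, not values learned by the model.
def base_fragment_prob(rule_probs, n_frontier_nonterminals, stop_prob=0.8):
    """rule_probs: PCFG probabilities of the rules expanded in the fragment.
    n_frontier_nonterminals: nonterminal leaves left as substitution sites."""
    p = 1.0
    for rp in rule_probs:                 # each expanded node: continue + rule
        p *= (1.0 - stop_prob) * rp
    p *= stop_prob ** n_frontier_nonterminals   # each frontier node: stop
    return p

# A shallow one-rule fragment versus a deeper three-rule fragment:
print(base_fragment_prob([0.3], 2))            # e.g. VP -> V NP
print(base_fragment_prob([0.3, 0.2, 0.1], 2))  # a larger fragment, much less mass
```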

    On the Herbrand content of LK

    We present a structural representation of the Herbrand content of LK-proofs with cuts of complexity prenex Sigma-2/Pi-2. The representation takes the form of a typed non-deterministic tree grammar of order 2 which generates a finite language of first-order terms that appear in the Herbrand expansions obtained through cut-elimination. In particular, for every Gentzen-style reduction between LK-proofs we study the induced grammars and classify the cases in which language equality and inclusion hold.
    Comment: In Proceedings CL&C 2016, arXiv:1606.0582
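    As a much-simplified picture of how a tree grammar can generate a finite set of first-order terms (here an ordinary non-deterministic regular tree grammar rather than the paper's typed order-2 grammar, with made-up function symbols and productions):

```python
# Much-simplified sketch: an ordinary non-deterministic regular tree grammar
# (not the typed order-2 grammar of the paper) whose productions generate a
# finite set of first-order terms. The symbols f, g, a, b and the productions
# are made up purely for illustration.
from itertools import product

productions = {
    "S": [("f", ("A", "B"))],           # S -> f(A, B)
    "A": [("a", ()), ("g", ("B",))],    # A -> a | g(B)
    "B": [("b", ())],                   # B -> b
}

def language(nonterminal):
    """Enumerate all ground terms derivable from a nonterminal."""
    terms = set()
    for symbol, children in productions[nonterminal]:
        child_languages = [language(c) for c in children]
        for combo in product(*child_languages):
            args = "(" + ",".join(combo) + ")" if combo else ""
            terms.add(symbol + args)
    return terms

print(language("S"))   # {'f(a,b)', 'f(g(b),b)'} -- a finite term language
```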

    Partially-commutative context-free languages

    The paper is about a class of languages that extends context-free languages (CFL) and is stable under shuffle. Specifically, we investigate the class of partially-commutative context-free languages (PCCFL), where non-terminal symbols are commutative according to a binary independence relation, very much like in trace theory. The class has been recently proposed as a robust class subsuming CFL and commutative CFL. This paper surveys properties of PCCFL. We identify a natural corresponding automaton model: stateless multi-pushdown automata. We show stability of the class under natural operations, including homomorphic images and shuffle. Finally, we relate the expressiveness of PCCFL to two other relevant classes: CFL extended with shuffle and trace-closures of CFL. Among the technical contributions of the paper are pumping lemmas, which elegantly complete the known pumping properties of regular languages, CFL and commutative CFL.
    Comment: In Proceedings EXPRESS/SOS 2012, arXiv:1208.244
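    The shuffle operation under which the class is stable is the set of all interleavings of two words that preserve each word's internal order. A minimal sketch of that operation follows (it illustrates the closure property only, not the paper's automaton model):

```python
# Minimal sketch of the shuffle operation (all order-preserving interleavings
# of two words); this illustrates the closure property only, not the paper's
# stateless multi-pushdown automaton model.
def shuffle(u, v):
    """Return the set of all interleavings of u and v."""
    if not u:
        return {v}
    if not v:
        return {u}
    return {u[0] + w for w in shuffle(u[1:], v)} | \
           {v[0] + w for w in shuffle(u, v[1:])}

print(sorted(shuffle("ab", "cd")))
# ['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
```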

    Probabilistic Constraint Logic Programming

    This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied by suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.
    Comment: 35 pages, uses sfbart.cl
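    The log-linear model assigns each candidate analysis a probability proportional to the exponentiated weighted sum of its feature values. The sketch below shows only this scoring step, with invented feature names, weights and analyses; it is not the paper's estimation or search algorithm.

```python
# Minimal sketch of a log-linear distribution over candidate analyses: each
# analysis is scored by exp(sum_i lambda_i * f_i(analysis)) and normalised.
# Feature names, weights and analyses are invented; this is not the paper's
# estimation or search algorithm.
import math

def log_linear_distribution(analyses, weights):
    """analyses: list of dicts mapping feature name -> count for one analysis.
    weights:  dict mapping feature name -> parameter lambda_i."""
    scores = [math.exp(sum(weights.get(f, 0.0) * c for f, c in a.items()))
              for a in analyses]
    z = sum(scores)                       # normalising constant over candidates
    return [s / z for s in scores]

# Two candidate analyses of an ambiguous input, described by feature counts.
analyses = [{"attach_high": 1, "rule_depth": 3},
            {"attach_low": 1, "rule_depth": 2}]
weights = {"attach_low": 0.7, "attach_high": -0.2, "rule_depth": -0.1}
print(log_linear_distribution(analyses, weights))
```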

    Grammar induction for mildly context sensitive languages using variational Bayesian inference

    The following technical report presents a formal approach to probabilistic minimalist grammar induction. We describe a formalization of a minimalist grammar. Based on this grammar, we define a generative model for minimalist derivations. We then present a generalized algorithm for applying variational Bayesian inference to lexicalized mildly context-sensitive grammars, which in this paper is applied to the previously defined minimalist grammar.
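    For orientation, the standard mean-field variational-Bayes update for Dirichlet-distributed rule probabilities replaces rule counts with digamma-transformed expected counts. The sketch below shows that generic update only, with invented rule names, counts and prior; it is not specific to the minimalist-grammar formalisation in the report.

```python
# Generic sketch of the mean-field variational-Bayes update for
# Dirichlet-distributed rule probabilities (as used for PCFG-style grammars);
# rule names, expected counts and the symmetric prior alpha are invented, and
# this is not specific to the minimalist-grammar formalisation in the report.
from scipy.special import digamma

def vb_rule_log_weights(expected_counts, alpha=1.0):
    """expected_counts: rule -> expected usage count from the E-step.
    Returns E[log theta_rule] under the variational Dirichlet posterior;
    exponentiating gives the sub-normalised rule weights used when re-parsing."""
    total = sum(expected_counts.values()) + alpha * len(expected_counts)
    return {rule: digamma(count + alpha) - digamma(total)
            for rule, count in expected_counts.items()}

counts = {"Merge:V,NP": 4.2, "Merge:T,VP": 3.1, "Move:wh": 0.7}
print(vb_rule_log_weights(counts))
```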