Search CORE

4,622 research outputs found

Consistency of Probabilistic Context-Free Grammars

Author: Stüber Torsten
Publication venue: Technische Universität Dresden
Publication date: 10/05/2012
Field of study

We present an algorithm for deciding whether an arbitrary proper probabilistic context-free grammar is consistent, i.e., whether the probability that a derivation terminates is one. Our procedure has time complexity

\\\\mathcal O(n^3)

in the unit-cost model of computation. Moreover, we develop a novel characterization of consistent probabilistic context-free grammars. A simple corollary of our result is that training methods for probabilistic context-free grammars that are based on maximum-likelihood estimation always yield consistent grammars

Technische Universität Dresden: Qucosa

Probabilistic parsing

Author: Nederhof Mark Jan
Satta Giorgio
Publication venue: Springer
Publication date: 06/01/2011
Field of study

Postprin

St Andrews Research Repository

An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

Author: Stolcke Andreas
Publication venue
Publication date: 01/01/1993
Field of study

We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.Comment: 45 pages. Slightly shortened version to appear in Computational Linguistics 2

arXiv.org e-Print Archive

CiteSeerX

Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

Author: Cohen S. B.
Smith N. A.
Publication venue
Publication date: 01/01/2012
Field of study

Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is in common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotate

CiteSeerX

Crossref

Edinburgh Research Explorer

Precise n-gram Probabilities from Stochastic Context-free Grammars

Author: Segal Jonathan
Stolcke Andreas
Publication venue
Publication date: 01/01/1994
Field of study

We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation of substring expectations, which in turn is accomplished by solving systems of linear equations derived from the grammar. We discuss efficient implementation of the algorithm and report our practical experience with it.Comment: 12 pages, to appear in ACL-9

arXiv.org e-Print Archive

CiteSeerX

Crossref

Probabilistic Parsing Strategies

Author: Nederhof Mark-Jan
Satta Giorgio
Publication venue
Publication date: 01/01/2002
Field of study

We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counter-parts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results we also derive negative results on so-called generalized LR parsing.Comment: 36 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova

Probabilistic Constraint Logic Programming

Author: Riezler Stefan
Publication venue
Publication date: 11/11/1997
Field of study

This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied with suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.Comment: 35 pages, uses sfbart.cl

arXiv.org e-Print Archive

CiteSeerX

Empirical Risk Minimization with Approximations of Probabilistic Grammars

Author: Cohen S. B.
Smith N. A.
Publication venue
Publication date: 01/01/2010
Field of study

Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting.

CiteSeerX

Edinburgh Research Explorer

Computation of distances for regular and context-free probabilistic languages

Author: Nederhof Mark Jan
Satta Giorgio
Publication venue
Publication date: 01/01/2008
Field of study

Several mathematical distances between probabilistic languages have been investigated in the literature, motivated by applications in language modeling, computational biology, syntactic pattern matching and machine learning. In most cases, only pairs of probabilistic regular languages were considered. In this paper we extend the previous results to pairs of languages generated by a probabilistic context-free grammar and a probabilistic finite automaton.PostprintPeer reviewe

Elsevier - Publisher Connector

Crossref

Archivio istituzionale della ricerca - Università di Padova

University of St. Andrews - Pure

St Andrews Research Repository