Search CORE

2,837 research outputs found

Inducing Constraint Grammars

Author: Samuelsson Christer
Tapanainen Pasi
Voutilainen Atro
Publication venue
Publication date: 01/01/1996
Field of study

Constraint Grammar rules are induced from corpora. A simple scheme based on local information, i.e., on lexical biases and next-neighbour contexts, extended through the use of barriers, reached 87.3 percent precision (1.12 tags/word) at 98.2 percent recall. The results compare favourably with other methods that are used for similar tasks although they are by no means as good as the results achieved using the original hand-written rules developed over several years time.Comment: 10 pages, uuencoded, gzipped PostScrip

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Inducing Constraint-based Grammars using a Domain Ontology

Author: Muresan Smaranda
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2004
Field of study

This thesis presents a framework for domain specific text- to-knowledge acquisition, with focus on medical domain. The main challenge of this domain is the abundance of linguistic phenomena that require both syntactic and semantic information in order to “understand” the meaning of the text, and thus to acquire knowledge. Examples include prepositional phrases, coordinations, noun-noun compounds and nominalizations, phenomena which are not well covered by existing syntactic or semantic parsers

Columbia University Academic Commons

Probabilistic Constraint Logic Programming

Author: Riezler Stefan
Publication venue
Publication date: 11/11/1997
Field of study

This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied with suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.Comment: 35 pages, uses sfbart.cl

arXiv.org e-Print Archive

CiteSeerX

Stochastic Attribute-Value Grammars

Author: Abney Steven
Publication venue
Publication date: 23/10/1996
Field of study

Probabilistic analogues of regular and context-free grammars are well-known in computational linguistics, and currently the subject of intensive research. To date, however, no satisfactory probabilistic analogue of attribute-value grammars has been proposed: previous attempts have failed to define a correct parameter-estimation algorithm. In the present paper, I define stochastic attribute-value grammars and give a correct algorithm for estimating their parameters. The estimation algorithm is adapted from Della Pietra, Della Pietra, and Lafferty (1995). To estimate model parameters, it is necessary to compute the expectations of certain functions under random fields. In the application discussed by Della Pietra, Della Pietra, and Lafferty (representing English orthographic constraints), Gibbs sampling can be used to estimate the needed expectations. The fact that attribute-value grammars generate constrained languages makes Gibbs sampling inapplicable, but I show how a variant of Gibbs sampling, the Metropolis-Hastings algorithm, can be used instead.Comment: 23 pages, 21 Postscript figures, uses rotate.st

arXiv.org e-Print Archive

CiteSeerX

Treebank-based acquisition of a Chinese lexical-functional grammar

Author: Bodomo Adams
Burke Michael
Cahill Aoife
Chan Rowena
Lam Olivia
O'Donovan Ruth
van Genabith Josef
Way Andy
Publication venue
Publication date: 01/01/2004
Field of study

Scaling wide-coverage, constraint-based grammars such as Lexical-Functional Grammars (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001) or Head-Driven Phrase Structure Grammars (HPSG) (Pollard and Sag, 1994) from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive. A number of researchers have recently presented methods to automatically acquire wide-coverage, probabilistic constraint-based grammatical resources from treebanks (Cahill et al., 2002, Cahill et al., 2003; Cahill et al., 2004; Miyao et al., 2003; Miyao et al., 2004; Hockenmaier and Steedman, 2002; Hockenmaier, 2003), addressing the knowledge acquisition bottleneck in constraint-based grammar development. Research to date has concentrated on English and German. In this paper we report on an experiment to induce wide-coverage, probabilistic LFG grammatical and lexical resources for Chinese from the Penn Chinese Treebank (CTB) (Xue et al., 2002) based on an automatic f-structure annotation algorithm. Currently 96.751% of the CTB trees receive a single, covering and connected f-structure, 0.112% do not receive an f-structure due to feature clashes, while 3.137% are associated with multiple f-structure fragments. From the f-structure-annotated CTB we extract a total of 12975 lexical entries with 20 distinct subcategorisation frame types. Of these 3436 are verbal entries with a total of 11 different frame types. We extract a number of PCFG-based LFG approximations. Currently our best automatically induced grammars achieve an f-score of 81.57% against the trees in unseen articles 301-325; 86.06% f-score (all grammatical functions) and 73.98% (preds-only) against the dependencies derived from the f-structures automatically generated for the original trees in 301-325 and 82.79% (all grammatical functions) and 67.74% (preds-only) against the dependencies derived from the manually annotated gold-standard f-structures for 50 trees randomly selected from articles 301-325

Irish Universities

DCU Online Research Access Service

Macro Grammars and Holistic Triggering for Efficient Semantic Parsing

Author: Liang Percy
Pasupat Panupong
Zhang Yuchen
Publication venue
Publication date: 01/01/2017
Field of study

To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations. We propose a new online learning algorithm that searches faster as training progresses. The two key ideas are using macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity. On the WikiTableQuestions dataset, we first expand the search space of an existing model to improve the state-of-the-art accuracy from 38.7% to 42.7%, and then use macro grammars and holistic triggering to achieve an 11x speedup and an accuracy of 43.7%.Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

Multilingual learning with parameter co-occurrence clustering

Author: Bane Max
Kirby James
Riggle Jason
Sylak John
Publication venue
Publication date: 02/12/2011
Field of study

Edinburgh Research Explorer

UsingWomb Grammars for Inducing the Grammar of a Subset of Yorùbá Noun Phrases

Author: Adebara Ife
Publication venue: Publicacions URV
Publication date: 01/06/2020
Field of study

We address the problem of inducing the grammar of an under-resourced language,Yorùbá, from the grammar of English using an efficient and, linguistically savvy, constraintsolving model of grammar induction –Womb Grammars (WG). Our proposed methodologyadapts WG for parsing a subset of noun phrases of the target language Yorùbá, from thegrammar of the source language English, which is described as properties between pairs ofconstituents. Our model is implemented in CHRG (Constraint Handling Rule Grammar) and,it has been used for inducing the grammar of a useful subset of Yorùbá Noun Phrases. Interestingextensions to the original Womb Grammar model are presented, motivated by the specificneeds of Yorùbá and, similar tone languages

Revistes Publicacions URV (Universitat Rovira i Virgili)