
107 research outputs found

From treebank resources to LFG F-structures

Author: A Cahill
A Frank
C Pollard
E Charniak
G Leech
J Bresnan
J van Genabith
L Sadler
RM Kaplan
S Abney
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

We present two methods for automatically annotating treebank resources with functional structures. Both methods define systematic patterns of correspondence between partial PS configurations and functional structures. These are applied to PS rules extracted from treebanks, or directly to constraint set encodings of treebank PS trees.

University of Essex Research Repository

Crossref

DCU Online Research Access Service

DCU 250 Arabic dependency bank: an LFG gold standard resource for the Arabic Penn treebank

Author: Akrout Amine
Al-Raheb Yafa
Dichy J.
van Genabith Josef
Publication date: 01/01/2006

This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is necessary to evaluate against established gold standard resources. Gold standards for various languages have been developed, but to our knowledge, such a resource has not yet been constructed for Arabic. The construction of the DCU 250 marks the first step towards the creation of an automatic LFG f-structure annotation algorithm for the ATB, and for the extraction of Arabic grammatical and lexical resources.

Irish Universities

DCU Online Research Access Service

Automated Extraction of Tree Adjoining Grammars from a Treebank for Vietnamese

Author: Le-Hong Phuong
Nguyen Phuong Thai
Nguyen Thi Minh Huyen
Roussanaly Azim
Publication venue: HAL CCSD
Publication date: 10/06/2010

In this paper, we present a system that automatically extracts lexicalized tree adjoining grammars (LTAG) from treebanks. We first discuss in detail extraction algorithms and compare them to previous works. We then report the first LTAG extraction result for Vietnamese, using a recently released Vietnamese treebank. The implementation of an open source and language independent system for automatic extraction of LTAG grammars is also discussed.

INRIA a CCSD electronic archive server

High-level methodologies for grammar engineering, introduction to the special issue

Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'

Crossref

Automated Extraction of Tree Adjoining Grammars from a Treebank for Vietnamese

Author: Le-Hong Phuong
Nguyen Phuong Thai
Nguyen Thi Minh Huyen
Phan Thi Ha
Publication venue: Académie des Sciences du Vietnam
Publication date: 20/10/2010

In this paper, we present a system that automatically extracts lexicalized tree adjoining grammars (LTAG) from treebanks. We first discuss in detail extraction algorithms and compare them to previous works. We then report the first LTAG extraction result for Vietnamese, using a recently released Vietnamese treebank. The implementation of an open source and language independent system for automatic extraction of LTAG grammars is also discussed.

INRIA a CCSD electronic archive server

A syntactic component for Vietnamese language processing

Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'

Crossref

Using supertags as source language context in SMT

Author: Haque Rejwanul
Ma Yanjun
Naskar Sudip Kumar
Way Andy
Publication venue: European Association for Machine Translation
Publication date: 01/01/2009

Recent research has shown that Phrase-Based Statistical Machine Translation (PB-SMT) systems can benefit from two enhancements: (i) using words and POS tags as context-informed features on the source side; and (ii) incorporating lexical syntactic descriptions in the form of supertags on the target side. In this work we present a novel PB-SMT model that combines these two aspects by using supertags as source language context-informed features. These features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. In our experiments two kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and Combinatory Categorial Grammar. We use a memory-based classification framework that enables the estimation of these features while avoiding problems of sparseness. Despite the differences between these two approaches, the supertaggers give similar improvements. We evaluate the performance of our approach on an English-to-Chinese translation task using a state-of-the-art phrase-based SMT system, and report an improvement of 7.88% BLEU score in translation quality when adding supertags as context-informed features.

Irish Universities

DCU Online Research Access Service

Promoting multiword expressions in A* TAG parsing

Author: Parmentier Yannick
Savary Agata
Waszczuk Jakub
Publication venue: HAL CCSD
Publication date: 13/12/2016

Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in case of true positive MWE occurrences, while avoiding parsing failures in case of false positives.

HAL Université de Tours

Treebank-Based Deep Grammar Acquisition for French Probabilistic Parsing Resources

Author: Schluter Natalie
Publication venue: Dublin City University. School of Computing
Publication date: 19/01/2011

Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in wide-coverage grammars automatically obtained from treebanks. In particular, recent years have seen a move towards acquiring deep (LFG, HPSG and CCG) resources that can represent information absent from simple CFG-type structured treebanks and which are considered to produce more language-neutral linguistic representations, such as syntactic dependency trees. As is often the case in early pioneering work in natural language processing, English has been the focus of attention in the first efforts towards acquiring treebank-based deep-grammar resources, followed by treatments of, for example, German, Japanese, Chinese and Spanish. However, to date no comparable large-scale automatically acquired deep-grammar resources have been obtained for French. The goal of the research presented in this thesis is to develop, implement, and evaluate treebank-based deep-grammar acquisition techniques for French. Along the way towards achieving this goal, this thesis presents the derivation of a new treebank for French from the Paris 7 Treebank, the Modified French Treebank, a cleaner, more coherent treebank with several transformed structures and new linguistic analyses. Statistical parsers trained on this data outperform those trained on the original Paris 7 Treebank, which has five times the amount of data. The Modified French Treebank is the data source used for the development of treebank-based automatic deep-grammar acquisition for LFG parsing resources for French, based on an f-structure annotation algorithm for this treebank. LFG CFG-based parsing architectures are then extended and tested, achieving a competitive best f-score of 86.73% for all features. 
The CFG-based parsing architectures are then complemented with an alternative dependency-based statistical parsing approach, obviating the CFG-based parsing step, and instead directly parsing strings into f-structures.

CiteSeerX

Irish Universities

DCU Online Research Access Service