Extended Constituent-to-Dependency Conversion for English

Author: Johansson Richard
Nugues Pierre
Publication date: 01/01/2007

Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 105-112

Lund University Publications

DSpace at Tartu University Library

The CoNLL 2007 shared task on dependency parsing

Author: Hall Johan
Kübler Sandra
McDonald Ryan
Nilsson Jens
Nivre Joakim
Riedel Sebastian
Yuret Deniz
Publication date: 01/01/2007

The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results

UCL Discovery

Hochschulschriftenserver - Universität Frankfurt am Main

Automatic acquisition of LFG resources for German - as good as it gets

Author: Rehbein Ines
van Genabith Josef
Publication venue: CSLI Publications
Publication date: 01/01/2009

We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar

CiteSeerX

Irish Universities

DCU Online Research Access Service

Arc-Standard Spinal Parsing with Stack-LSTMs

Author: Fabienne Grob (4247449)
Hubertus van Hedel (3404111)
Rob Labruyère (4247443)
Tabea Aurich-Schuler (4247446)
Publication date: 01/01/2017

We present a neural transition-based parser for spinal trees, a dependency representation of constituent trees. The parser uses Stack-LSTMs that compose constituent nodes with dependency-based derivations. In experiments, we show that this model adapts to different styles of dependency relations, but this choice has little effect for predicting constituent structure, suggesting that LSTMs induce useful states by themselves. Comment: IWPT 201

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

ZORA

FigShare

Treebank-based acquisition of LFG resources for Chinese

Author: Guo Yuqing
van Genabith Josef
Wang Haifeng
Publication venue: CSLI Publications
Publication date: 01/01/2007

This paper presents a method to automatically acquire wide-coverage, robust, probabilistic Lexical-Functional Grammar resources for Chinese from the Penn Chinese Treebank (CTB). Our starting point is the earlier, proof-of-concept work of (Burke et al., 2004) on automatic f-structure annotation, LFG grammar acquisition and parsing for Chinese using the CTB version 2 (CTB2). We substantially extend and improve on this earlier research as regards coverage, robustness, quality and fine-grainedness of the resulting LFG resources. We achieve this through (i) improved LFG analyses for a number of core Chinese phenomena; (ii) a new automatic f-structure annotation architecture which involves an intermediate dependency representation; (iii) scaling the approach from 4.1K trees in CTB2 to 18.8K trees in CTB version 5.1 (CTB5.1) and (iv) developing a novel treebank-based approach to recovering non-local dependencies (NLDs) for Chinese parser output. Against a new 200-sentence gold standard of manually constructed f-structures, the method achieves 96.00% f-score for f-structures automatically generated for the original CTB trees and 80.01% for NLD-recovered f-structures generated for the trees output by Bikel’s parser

Irish Universities

DCU Online Research Access Service

LFG without C-structures

Author: Cahill Aoife
Cetinoglu Ozlem
Foster Jennifer
Hogan Deirdre
Nivre Joakim
van Genabith Josef
Publication date: 29/11/2010

We explore the use of two dependency parsers, Malt and MST, in a Lexical Functional Grammar parsing pipeline. We compare this to the traditional LFG parsing pipeline which uses constituency parsers. We train the dependency parsers not on classical LFG f-structures but rather on modified dependency-tree versions of these in which all words in the input sentence are represented and multiple heads are removed. For the purposes of comparison, we also modify the existing CFG-based LFG parsing pipeline so that these "LFG-inspired" dependency trees are produced. We find that the differences in parsing accuracy over the various parsing architectures are small

Irish Universities

DCU Online Research Access Service

DSpace at Tartu University Library

Comparing constituency and dependency representations for SMT phrase-extraction

Author: Hearne Mary
Ozdowska Sylwia
Tinsley John
Publication date: 01/01/2008

We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT

Irish Universities

DCU Online Research Access Service

A Transition-Based Directed Acyclic Graph Parser for UCCA

Author: Abend Omri
Hershcovich Daniel
Rappoport Ari
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

We present the first parser for UCCA, a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. To our knowledge, the conjunction of these formal properties is not supported by any existing parser. Our transition-based parser, which uses a novel transition set and features based on bidirectional LSTMs, has value not just for UCCA parsing: its ability to handle more general graph structures can inform the development of parsers for other semantic DAG structures, and in languages that frequently use discontinuous structures. Comment: 16 pages; Accepted as long paper at ACL201

arXiv.org e-Print Archive

Crossref

C-structures and f-structures for the British national corpus

Author: Foster Jennifer
Seddah Djamé
van Genabith Josef
Wagner Joachim
Publication venue: CSLI Publications
Publication date: 01/01/2007

We describe how the British National Corpus (BNC), a one hundred million word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures, using a treebank-based parsing architecture. The parsing architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm to automatically annotate these trees into LFG f-structures. We describe the pre-processing steps which were taken to accommodate the differences between the Penn Treebank and the BNC. Some of the issues encountered in applying the parsing architecture on such a large scale are discussed. The process of annotating a gold standard set of 1,000 parse trees is described. We present evaluation results obtained by evaluating the c-structures produced by the statistical parser against the c-structure gold standard. We also present the results obtained by evaluating the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%

DCU Online Research Access Service