
Voting and Stacking in Data-Driven Dependency Parsing

Author: Fishel Mark
Nivre Joakim
Publication venue
Publication date: 13/05/2009
Field of study

Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 219-222. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

Improving the Arc-Eager Model with Reverse Parsing

Author: Fernández-González Daniel
Gómez-Rodríguez Carlos
Vilares David
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 02/11/2016
Field of study

A known way to improve the accuracy of dependency parsers is to combine several different parsing algorithms so that the weaknesses of each model are compensated by the strengths of the others. For example, voting-based combination schemes are variants of the idea of analyzing each sentence with several parsers and constructing a combined output in which the head of each node is determined by "majority vote" among the parsers. Typically, such approaches combine very different parsing models to take advantage of the variability in the errors they make. In this paper, we show that consistent improvements in accuracy can be obtained in a much simpler way: by combining a single parser with itself. In particular, we start with a greedy implementation of the Nivre pseudo-projective arc-eager algorithm, a well-known left-to-right transition-based parser, and combine it with a "mirrored" version of the algorithm that analyzes sentences from right to left. To decide which of the two outputs to trust for the head of each node, we use simple criteria based on the length and position of dependency arcs. Experiments on several datasets from the CoNLL-X shared task and the WSJ section of the English Penn Treebank show that the combined system outperforms the baseline arc-eager parser in all cases. To test the generality of the approach, we also perform experiments with a different transition system (arc-standard) and a different search strategy (beam search), obtaining similar improvements in all these settings.
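The per-token majority vote described above, and the index mapping needed to compare a right-to-left ("mirrored") parse with a left-to-right one, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: heads follow the CoNLL convention (tokens numbered from 1, head 0 for the root), the function names are hypothetical, ties go to the first parser listed, and the arc-length and arc-position criteria the paper uses to choose between its two parsers are not reproduced here. Note also that a plain per-token vote can produce output that is not a well-formed tree.

```python
from collections import Counter
from typing import List, Sequence

# Assumed convention (CoNLL-style): tokens are numbered 1..n, head 0 is the
# artificial root, and heads[i] is the head of token i + 1.


def majority_vote_heads(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical helper: pick, for each token, the head most parsers propose.

    Ties are broken in favour of the parser listed first (an assumption).
    The result is not guaranteed to be a well-formed dependency tree.
    """
    if not predictions:
        raise ValueError("need at least one parser output")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all parsers must cover the same tokens")
    combined = []
    for i in range(n):
        votes = Counter(p[i] for p in predictions)
        best = max(votes.values())
        # First parser (in list order) whose proposed head has the top count.
        combined.append(next(p[i] for p in predictions if votes[p[i]] == best))
    return combined


def unmirror_heads(mirrored_heads: Sequence[int]) -> List[int]:
    """Hypothetical helper: map heads from a parse of the reversed sentence
    back to original token order.

    Token j (1-based) of the reversed sentence is token n + 1 - j of the
    original sentence; the root (0) stays 0.
    """
    n = len(mirrored_heads)
    heads = [0] * n
    for j, h in enumerate(mirrored_heads, start=1):
        original_dep = n + 1 - j
        heads[original_dep - 1] = 0 if h == 0 else n + 1 - h
    return heads
```

For example, with three parsers proposing heads `[2, 0, 2, 3]`, `[2, 0, 4, 2]` and `[3, 0, 2, 3]` for a four-token sentence, `majority_vote_heads` returns `[2, 0, 2, 3]`; a mirrored parse `[2, 3, 0]` of a three-token sentence maps back to `[0, 1, 2]`.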

From news to comment: Resources and benchmarks for parsing the language of web 2.0

Author: Cetinoglu Ozlem
Foster Jennifer
Hogan Deirdre
Le Roux Joseph
Nivre Joakim
van Genabith Josef
Wagner Joachim
Publication venue: Asian Federation of Natural Language Processing
Publication date: 01/01/2011
Field of study

We investigate the problem of parsing the noisy language of social media. We evaluate four Wall-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have particular difficulty with tweets and that a substantial part of this problem is related to POS tagging accuracy. We carry out three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers.
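Dependency parsers are commonly compared by attachment score. The abstract does not state which metrics the study reports, so the following is only a generic sketch assuming the usual definitions: unlabeled attachment score (UAS) is the share of tokens assigned the correct head, and labeled attachment score (LAS) is the share assigned both the correct head and the correct dependency label. The function name and the `(head, label)` tuple format are assumptions.

```python
from typing import Sequence, Tuple

# Assumed token format: (head index, dependency label); head 0 is the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], predicted: Sequence[Arc]) -> Tuple[float, float]:
    """Hypothetical helper returning (UAS, LAS) as fractions of tokens.

    UAS: share of tokens whose predicted head matches the gold head.
    LAS: share of tokens whose predicted head and label both match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_labeled = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return correct_heads / n, correct_labeled / n


gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)  # UAS 1.0, LAS 2/3
```

Scores are usually pooled over all tokens in a test set rather than averaged per sentence; this sketch supports that if gold and predicted token lists are concatenated in the same order. Conventions such as excluding punctuation vary between evaluations and are not handled here.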

CiteSeerX

Bare-Bones Dependency Parsing — A Case for Occam's Razor?

Author: Nivre Joakim
Publication venue
Publication date: 09/05/2011
Field of study

Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 6-11. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

CiteSeerX

Conference Program

Author
Publication venue
Publication date: 15/05/2009
Field of study

Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), xi-xiv. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

Evolution of Italian Treebank and Dependency Parsing towards Universal Dependencies

Author: Attardi Giuseppe
Saletti Simone
Simi Maria
Publication venue: OpenEdition
Publication date: 01/01/2015
Field of study

We highlight the main changes the Italian Dependency Treebank recently underwent in the transition to an extended and revised edition compliant with the annotation schema of Universal Dependencies. We explore how these changes affect the accuracy of dependency parsers by performing comparative tests on various versions of the treebank. Despite significant changes in annotation style, statistical parsers seem to cope well and mostly improve.

OpenEdition

Evaluating parts-of-speech taggers for use in a text-to-scene conversion system

Author: Glass Kevin R
Bangay Shaun Douglas
Publication venue
Publication date: 01/01/2005
Field of study

This paper presents part-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes several freely available taggers according to the techniques each uses to identify word classes automatically, and verifies the performance of each tagger experimentally. Testing on the SUSANNE corpus reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than those reported by the taggers' developers. The taggers are then grouped into a voting system in an attempt to raise accuracy, but in no case does the combined result improve on the individual accuracies. Additionally, a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers when that output cannot be validated.
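The tagger voting scheme and the agreement measure described above can be illustrated with a small Python sketch. The abstract does not define agreement, so this sketch assumes a simple definition: for each token, the fraction of taggers that chose the winning tag. It also assumes that every tagger's output has already been mapped to a common tagset, which the abstract identifies as a real difficulty. The function name and the tie-breaking rule (earliest tagger wins) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def vote_tags(tagger_outputs: Sequence[Sequence[str]]) -> Tuple[List[str], List[float]]:
    """Hypothetical helper: combine per-token tags from several taggers.

    Returns the majority-vote tag sequence and, per token, an assumed
    "agreement" score: the fraction of taggers that chose the winning tag.
    Ties go to the tag proposed by the earliest tagger in the list.
    All taggers are assumed to use the same tagset.
    """
    if not tagger_outputs:
        raise ValueError("need at least one tagger output")
    n = len(tagger_outputs[0])
    if any(len(t) != n for t in tagger_outputs):
        raise ValueError("taggers must tag the same tokens")
    tags, agreement = [], []
    for i in range(n):
        votes = Counter(t[i] for t in tagger_outputs)
        best = max(votes.values())
        winner = next(t[i] for t in tagger_outputs if votes[t[i]] == best)
        tags.append(winner)
        agreement.append(best / len(tagger_outputs))
    return tags, agreement


outputs = [["DT", "NN", "VBZ"], ["DT", "NN", "NNS"], ["DT", "VB", "VBZ"]]
tags, agreement = vote_tags(outputs)  # tags == ["DT", "NN", "VBZ"]
```

In the reported experiments voting did not improve on the individual taggers; the sketch only shows the mechanism, with low agreement values flagging tokens on which the group's output is least trustworthy.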

The Australian National University