A Reranking Approach for Dependency Parsing with Variable-sized Subtree Features
Employing higher-order subtree structures in graph-based dependency parsing has shown substantial improvements in accuracy, but suffers from inefficiency that grows with the order of the subtrees. We present a new reranking approach for dependency parsing that can utilize complex subtree representations by applying efficient subtree selection heuristics. We demonstrate the effectiveness of the approach in experiments conducted on the Penn Treebank and the Chinese Treebank. Our system improves the baseline accuracy from 91.88% to 93.37% for English, and from 87.39% to 89.16% for Chinese.
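The reranking idea summarized above can be sketched generically: combine the base parser's score with a score over subtree features and pick the highest-scoring of the k-best candidates. This is a minimal illustration only; the feature extractor, weights, and interpolation factor are hypothetical stand-ins for the paper's subtree selection heuristics.

```python
# Minimal reranking sketch: choose the best of a parser's k-best analyses
# by combining the base score with scores from (hypothetical) subtree features.

def extract_subtree_features(parse):
    # Placeholder: a real system would enumerate variable-sized subtrees
    # chosen by efficiency heuristics; here each (head, dep) edge stands in.
    return [("edge", h, d) for h, d in parse["edges"]]

def rerank(kbest, feature_weights, alpha=0.5):
    """Return the candidate maximizing interpolated base + feature score."""
    def score(parse):
        feat_score = sum(feature_weights.get(f, 0.0)
                         for f in extract_subtree_features(parse))
        return alpha * parse["base_score"] + (1 - alpha) * feat_score
    return max(kbest, key=score)

kbest = [
    {"edges": [(0, 1), (1, 2)], "base_score": 1.0},
    {"edges": [(0, 2), (2, 1)], "base_score": 0.9},
]
weights = {("edge", 0, 2): 2.0, ("edge", 2, 1): 2.0}
best = rerank(kbest, weights)
print(best["edges"])  # the second candidate wins on feature score
```

The baseline's top analysis is overturned whenever the feature score outweighs the small base-score gap, which is the mechanism the abstract's accuracy gains rely on.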
Parser lexicalisation through self-learning
We describe a new self-learning framework for parser lexicalisation that requires only a plain-text corpus of in-domain text. The method first creates augmented versions of dependency graphs by applying a series of modifications designed to directly capture higher-order lexical path dependencies. Scores are assigned to each edge in the graph using statistics from an automatically parsed background corpus. As bilexical dependencies are sparse, a novel directed distributional word similarity measure is used to smooth edge score estimates. Edge scores are then combined into graph scores and used for reranking the top-n analyses found by the unlexicalised parser. The approach achieves significant improvements on WSJ and biomedical text over the unlexicalised baseline parser, which is originally trained on a subset of the Brown corpus.
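The pipeline this abstract describes — edge scores from corpus statistics, smoothed via word similarity, then combined into a graph score — can be sketched as follows. All names, the back-off scheme, and the interpolation weight are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of lexicalised graph scoring for reranking.
# Bilexical counts are sparse, so unseen head->dep pairs back off to
# counts for distributionally similar heads (similarity lists hypothetical).

def edge_score(head, dep, counts, similar, lam=0.7):
    """Smoothed score for a head -> dep bilexical dependency."""
    direct = counts.get((head, dep), 0)
    # Back off to words distributionally similar to the head.
    backoff = sum(counts.get((h2, dep), 0) * w
                  for h2, w in similar.get(head, []))
    return lam * direct + (1 - lam) * backoff

def graph_score(edges, counts, similar):
    """Sum edge scores into a single score for one dependency graph."""
    return sum(edge_score(h, d, counts, similar) for h, d in edges)

counts = {("eat", "pizza"): 5}
similar = {"devour": [("eat", 0.8)]}
# "devour -> pizza" is unseen, yet receives a nonzero smoothed score.
print(edge_score("devour", "pizza", counts, similar))
```

Graph scores computed this way can then rank the parser's top-n analyses, rewarding candidates whose edges are well attested (directly or via similar words) in the background corpus.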
From news to comment: Resources and benchmarks for parsing the language of web 2.0
We investigate the problem of parsing the noisy language of social media. We evaluate four Wall-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy. We attempt three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers.
Coordinate noun phrase disambiguation in a generative parsing model
In this paper we present methods for improving the disambiguation of noun phrase (NP) coordination within the framework of a lexicalised history-based parsing model. As well as reducing noise in the data, we look at modelling two main sources of information for disambiguation: symmetry in conjunct structure, and the dependency between conjunct lexical heads. Our changes to the baseline model result in an increase in NP coordination dependency f-score from 69.9% to 73.8%, which represents a relative reduction in f-score error of 13%.
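The 13% figure follows directly from the two f-scores: the error falls from 100 − 69.9 = 30.1 points to 100 − 73.8 = 26.2 points. A quick check of the arithmetic:

```python
# Verify the relative f-score error reduction reported above.
baseline_f, improved_f = 69.9, 73.8
baseline_err = 100 - baseline_f   # 30.1 points of error
improved_err = 100 - improved_f   # 26.2 points of error
relative_reduction = (baseline_err - improved_err) / baseline_err
print(f"{relative_reduction:.0%}")  # 13%
```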
Optimizing Spectral Learning for Parsing
We describe a search algorithm for optimizing the number of latent states
when estimating latent-variable PCFGs with spectral methods. Our results show
that contrary to the common belief that the number of latent states for each
nonterminal in an L-PCFG can be decided in isolation with spectral methods,
parsing results significantly improve if the number of latent states for each
nonterminal is globally optimized, while taking into account interactions
between the different nonterminals. In addition, we contribute an empirical
analysis of spectral algorithms on eight morphologically rich languages:
Basque, French, German, Hebrew, Hungarian, Korean, Polish and Swedish. Our
results show that our estimation consistently performs better or close to
coarse-to-fine expectation-maximization techniques for these languages.Comment: 11 pages, ACL 201
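Why per-nonterminal tuning can fail is easy to see with a toy objective in which the best state count for one nonterminal depends on another's. The sketch below uses generic coordinate ascent as a stand-in search; the paper's actual algorithm, the `evaluate` objective, and the interaction term are all hypothetical here (a real objective would be spectral estimation plus development-set parsing accuracy).

```python
# Generic coordinate-ascent sketch for jointly choosing latent-state
# counts, one per nonterminal. `evaluate` stands in for estimating an
# L-PCFG spectrally and scoring it on a development set.

def coordinate_ascent(nonterminals, candidates, evaluate, sweeps=3):
    """Greedily adjust one nonterminal's state count at a time,
    keeping any change that does not hurt the joint objective."""
    config = {nt: candidates[0] for nt in nonterminals}
    best = evaluate(config)
    for _ in range(sweeps):
        for nt in nonterminals:
            for m in candidates:
                trial = dict(config, **{nt: m})
                score = evaluate(trial)
                if score >= best:  # accept ties to cross plateaus
                    config, best = trial, score
    return config, best

# Toy objective with an interaction term between NP and VP: the best
# count for NP depends on VP's count, so isolated tuning goes wrong.
def toy_eval(cfg):
    return -(cfg["NP"] - cfg["VP"]) ** 2 - (cfg["VP"] - 8) ** 2

cfg, score = coordinate_ascent(["NP", "VP"], [2, 4, 8, 16], toy_eval)
print(cfg)  # {'NP': 8, 'VP': 8}
```

Because the NP term is tied to VP's value, scoring NP's candidates with VP frozen at an arbitrary count would pick the wrong maximum; only the joint search reaches the optimum — the same interaction effect the abstract reports for real L-PCFGs.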
- …