10 research outputs found

Discovery of Ambiguous and Unambiguous Discourse Connectives via Annotation Projection

Author: Versley Yannick
Publication date: 30/11/2010

Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 83-92. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

DSpace at Tartu University Library

A Syntax-first Approach to High-quality Morphological Analysis and Lemma Disambiguation for the TüBa-D/Z Treebank

Author: Beck Kathrin
Hinrichs Erhard
Telljohann Heike
Versley Yannick
Publication date: 01/12/2010

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 233-244. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

Merging syntactic lexica: the case for French verbs

Author: Danlos Laurence
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 22/05/2012

Syntactic lexicons, which associate each lexical entry with information such as valency, are crucial for several natural language processing tasks, such as parsing. However, because they contain rich and complex information, they are very costly to develop. In this paper, we show how syntactic lexical resources can be merged in order to benefit from their respective strengths, despite the disparities in the way they represent syntactic lexical information. We illustrate our methodology with the example of French verbs. We describe four large-coverage syntactic lexicons for this language, including the Lefff, and show how we were able, using our merging algorithm, to extend and improve the Lefff.

INRIA a CCSD electronic archive server

Hal-Diderot

Merging syntactic lexica: the case for French verbs

Author: Danlos Laurence
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 22/05/2012

INRIA a CCSD electronic archive server

The French Social Media Bank: a Treebank of Noisy User Generated Content

Author: Candito Marie
Combet Vanessa
Mouilleron Virginie
Sagot Benoît
Seddah Djamé
Publication venue: HAL CCSD
Publication date: 30/11/2012

In recent years, statistical parsers have reached high performance levels on well-edited texts. Domain adaptation techniques have improved parsing results on text genres that differ from the journalistic data most parsers are trained on. However, such corpora usually comply with standard linguistic, spelling and typographic conventions. In the meantime, the emergence of Web 2.0 communication media has given rise to new types of online textual data. Although valuable, e.g., for data mining and sentiment analysis, such user-generated content rarely complies with standard conventions: it is noisy. This prevents most NLP tools, especially treebank-based parsers, from performing well on such data. For this reason, we have developed the French Social Media Bank, the first user-generated content treebank for French, a morphologically rich language (MRL). The first release of this resource contains 1,700 sentences from various Web 2.0 sources, including data specifically chosen for their high noisiness. We describe here how we created this treebank and present the methodology we used for fully annotating it. We also provide baseline POS tagging and statistical constituency parsing results, which are far lower than usual results on edited texts. This highlights the high difficulty of automatically processing such noisy data in an MRL.

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Proceedings

Author: Ahrenberg Lars
Tiedemann Jörg
Volk Martin
Publication date: 30/11/2010

Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 98 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

DSpace at Tartu University Library

Proceedings

Author: Dickinson Markus
Müürisep Kaili
Passarotti Marco
Publication date: 01/12/2010

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

Scalable Discriminative Parsing for German

Author: Rehbein Ines
Versley Yannick
Publication venue: Stroudsburg, PA : Association for Computational Linguistics
Publication date: 21/11/2016

Generative lexicalized parsing models, which are the mainstay of probabilistic parsing for English, do not perform as well when applied to languages with different properties, such as free(r) word order or rich morphology. For German and other non-English languages, linguistically motivated complex treebank transformations have been shown to improve performance within the framework of PCFG parsing, while generative lexicalized models do not seem to be as easily adaptable to these languages. In this paper, we show a practical way to use grammatical functions as first-class citizens in a discriminative model that allows annotated treebank grammars to be extended with rich feature sets without suffering from sparse data problems. We demonstrate the flexibility of the approach by integrating unsupervised PP attachment and POS-based word clusters into the parser.

Publikationsserver des Instituts für Deutsche Sprache

Scalable Discriminative Parsing for German

Author: Ines Rehbein
Yannick Versley
Publication date: 01/01/2009

CiteSeerX

Crossref