7 research outputs found
Semantic Annotation of Deverbal Nominalizations in the Spanish AnCora Corpus
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), pp. 187–198.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15891
IARG-AnCora: Annotating AnCora corpus with implicit arguments
IARG-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpora with thematic roles. The annotated corpora will serve as the basis for automatic semantic role labeling systems based on machine learning techniques. Semantic analyzers are essential components of current language technology applications, which seek a deeper understanding of the text in order to draw higher-level inferences and thereby obtain qualitative improvements in their results.
Complementary action (FFI2011-13737-E), associated with the TextMess 2.0 project (TIN2009-13391-C04-03/04).
Taulé Delor, M.; Peris, A.; Martí Antonín, MA.; Moreno Boronat, LA.; Rodríguez, H.; Moreda, P. (2012). IARG-AnCora: Anotación de los corpus AnCora con argumentos implícitos. Procesamiento del Lenguaje Natural, 49:181–184. http://hdl.handle.net/10251/29863
IARG-AnCora: Anotación de los corpus AnCora con argumentos implícitos [IARG-AnCora: Annotating the AnCora corpora with implicit arguments]
IARG-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus. The annotated corpus will serve as the basis for automatic semantic role labeling systems based on machine learning techniques. Semantic analyzers are essential components of current language technology applications, which seek a deeper understanding of the text in order to draw higher-level inferences and thereby obtain qualitative improvements in their results.
AnCora-Nom: un léxico de nominalizaciones deverbales del español [AnCora-Nom: A lexicon of Spanish deverbal nominalizations]
This paper describes a new lexical resource: AnCora-Nom, a Spanish lexicon of deverbal nominalizations. At present, it contains 1,655 lexical entries and 3,094 senses. Each sense is associated with a denotation type and an argument structure that maps nominal complements to arguments with their corresponding thematic roles. A notable feature of this lexicon is that it was automatically extracted from the annotated AnCora-Es corpus. AnCora-Nom was derived taking into account not only the information directly related to nominalizations but also the morphological and syntactic-semantic information previously annotated in the corpus.
This research has received support from the projects Text-Knowledge 2.0 (TIN2009-13391-C04-04) and AnCora-Net (FFI2009-06497-E/FILO) from the Spanish Ministry of Science and Innovation, and an FPU grant (AP2007-01028) from the Spanish Ministry of Education.
AnCora-Nom: A Spanish lexicon of deverbal nominalizations
This paper describes a new lexical resource: AnCora-Nom, a Spanish lexicon of deverbal nominalizations. At present, it contains 1,655 lexical entries and 3,094 senses. Each sense is associated with a denotation type and an argument structure that maps nominal complements to arguments with their corresponding thematic roles. A notable feature of this lexicon is that it was automatically extracted from the annotated AnCora-Es corpus. AnCora-Nom was derived taking into account not only the information directly related to nominalizations but also the morphological and syntactic-semantic information annotated in the corpus, such as WordNet synsets, the specifier type of the nominalization, and its morphological number (singular or plural).
Iarg-AnCora: Spanish corpus annotated with implicit arguments
This article presents the Spanish Iarg-AnCora corpus (400k words, 13,883 sentences) annotated with the implicit arguments of deverbal nominalizations (18,397 occurrences). We describe the methodology used to create it, focusing on the annotation scheme and the criteria adopted. The corpus was manually annotated, and an inter-annotator agreement test was conducted (81% observed agreement) to ensure the reliability of the final resource. Annotating implicit arguments yields a substantial gain in argument and thematic role coverage (128% on average). It is the first freely available, wide-coverage Spanish corpus annotated with implicit arguments. The corpus can be used by machine-learning-based semantic role labeling systems and for the linguistic analysis of implicit arguments grounded in real data. Semantic analyzers are essential components of current language technology applications, which need a deeper understanding of the text in order to draw higher-level inferences and obtain qualitative improvements in their results.
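Observed agreement, the figure reported for Iarg-AnCora, is conventionally the share of annotated items on which two annotators assign the same label. The following is a minimal sketch of that computation, assuming two parallel lists of labels over the same items; the function name `observed_agreement` and the toy role labels are hypothetical illustrations, not data or code from the corpus.

```python
def observed_agreement(labels_a, labels_b):
    """Proportion of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    # Count positions where both annotators assigned an identical label.
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


# Hypothetical toy data: thematic-role labels two annotators gave to 5 implicit arguments.
annotator_1 = ["arg0-agt", "arg1-pat", "arg1-pat", "arg2-ben", "arg0-agt"]
annotator_2 = ["arg0-agt", "arg1-pat", "arg2-ben", "arg2-ben", "arg0-agt"]

print(f"{observed_agreement(annotator_1, annotator_2):.0%}")  # 4 of 5 match -> 80%
```

Note that raw observed agreement does not correct for agreement expected by chance; chance-corrected measures such as Cohen's kappa are a common complement.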
Proceedings
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 268 pages.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15891