6 research outputs found

Semantic annotation of nouns in Sensem corpus

Author: Castellón Masalles Irene
Climent Roca Salvador
Coll Florit Marta
Fisas Elizalde Beatriz
Julià Salas Albert
Lloberes Salvatella Marina
Rigau Claramunt German
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2010

The main goal of this project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet synsets. The ultimate objective of the research is the acquisition of semantic preferences. This research was carried out thanks to projects FFI2008-02579-E/FILO and TIN2009-14715-C04 of the Ministerio de Ciencia e Innovación.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A multi-layered annotated corpus of scientific papers

Author: Fisas Elizalde Beatriz
Ronzano Francesco
Saggion Horacio
Publication venue: ELRA (European Language Resources Association)
Publication date: 01/01/2016

Paper presented at the Tenth International Conference on Language Resources and Evaluation (LREC 2016), held from 23 to 28 May 2016 in Portorož, Slovenia. Scientific literature records the research process with a standardized structure and provides the clues to track the progress in a scientific field. Understanding its internal structure and content is of paramount importance for natural language processing (NLP) technologies. To meet this requirement, we have developed a multi-layered annotated corpus of scientific papers in the domain of Computer Graphics. Sentences are annotated with respect to their role in the argumentative structure of the discourse. The purpose of each citation is specified. Special features of the scientific discourse such as advantages and disadvantages are identified. In addition, a grade is allocated to each sentence according to its relevance for being included in a summary. To the best of our knowledge, this complex, multi-layered collection of annotations and metadata characterizing a set of research papers had never been grouped together before in one corpus and therefore constitutes a newer, richer resource with respect to those currently available in the field. The research leading to these results has received funding from the European Project Dr. Inventor (FP7-ICT-2013.8.1, grant agreement no. 611383) and is partly supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

UPF Digital Repository

A multi-layered annotated corpus of scientific papers

Author: Fisas Elizalde Beatriz
Ronzano Francesco
Saggion Horacio
Publication venue: ELRA (European Language Resources Association)
Publication date

RECERCAT

CollFrEn: rich bilingual English–French collocation resource

Author: Codina Filbà Joan
Espinosa-Anke Luis
Fisas Elizalde Beatriz
Wanner Leo
Publication venue: ACL (Association for Computational Linguistics)
Publication date: 01/01/2020

Paper presented at the Joint Workshop on Multiword Expressions and Electronic Lexicons, held virtually on 13 December 2020. Collocations in the sense of idiosyncratic lexical co-occurrences of two syntactically bound words traditionally pose a challenge to language learners and many Natural Language Processing (NLP) applications alike. Reliable ground truth (i.e., ideally manually compiled) resources are thus of high value. We present a manually compiled bilingual English–French collocation resource with 7,480 collocations in English and 6,733 in French. Each collocation is enriched with information that facilitates its downstream exploitation in NLP tasks such as machine translation, word sense disambiguation, natural language generation, relation classification, and so forth. Our proposed enrichment covers: the semantic category of the collocation (its lexical function), its vector space representation (for each individual word as well as their joint collocation embedding), a subcategorization pattern of both its elements, as well as their corresponding BabelNet id, and finally, indices of their occurrences in large scale reference corpora. This work has been supported by the European Commission in the context of its H2020 Program under the contract numbers 870930-RIA, 825079-STARTS, and 779962-RIA.

UPF Digital Repository

The IULA Treebank

Author: Arias Badia Blanca
Bel Rafecas Núria
Fisas Elizalde Beatriz
Lorente Casafont Mercè
Marimon Montserrat
Villegas Marta
Vivaldi J. (Jorge), 1952-
Vázquez Silvia
Publication venue: ELRA (European Language Resources Association)
Publication date: 01/01/2012

Paper presented at the 8th International Conference on Language Resources and Evaluation (LREC'12), held from 21 to 27 May 2012 in Istanbul, Turkey. This paper describes on-going work for the construction of a new treebank for Spanish, The IULA Treebank. This new resource will contain about 60,000 richly annotated sentences as an extension of the already existing IULA Technical Corpus, which is only PoS tagged. In this paper we have focused on describing the work done for defining the annotation process and the treebank design principles. We report on how the framework used, the DELPH-IN processing framework, has been crucial in the design principles and in the bootstrapping strategy followed, especially regarding the use of stochastic modules for reducing parsing overgeneration. We also report on the different evaluation experiments carried out to guarantee the quality of the already available results. This work was co-funded by the Ramón y Cajal program of the Spanish Ministerio de Ciencia e Innovación, the EU UNER - Competitiveness and Innovation Framework Program, METANET (CIP-PSP-270893), and the UPF-IULA PhD grant program. We also wish to thank Stephan Oepen for his assistance in using the DELPH-IN environment and Lluís Padró for his assistance in the significance calculation.

UPF Digital Repository

The IULA Spanish LSP treebank: building and browsing

Author: Arias Badia Blanca
Bel Rafecas Núria
Fisas Elizalde Beatriz
Lorente Mercè
Marimon Montserrat
Morell Carlos
Vivaldi J. (Jorge), 1952-
Vázquez Silvia
Publication venue: ELRA (European Language Resources Association)
Publication date: 01/01/2014

Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held from 26 to 31 May 2014 in Reykjavík, Iceland. This paper presents the IULA Spanish LSP Treebank, a dependency treebank of over 41,000 sentences of different domains (Law, Economy, Computing Science, Environment, and Medicine), developed in the framework of the European project METANET4U. Dependency annotations in the treebank were automatically derived from manually selected parses produced by an HPSG grammar. A deterministic conversion algorithm used the identifiers of grammar rules to identify the heads, the dependents, and some dependency types that were directly transferred onto the dependency structure (e.g., subject, specifier, and modifier), and the identifiers of the lexical entries to identify the argument-related dependency functions (e.g., direct object, indirect object, and oblique complement). The treebank is accessible with a browser that provides concordance-based search functions and delivers the results in two formats: (i) a column-based format, in the style of the CoNLL-2006 shared task, and (ii) a dependency graph, where dependency relations are noted by an oriented arrow which goes from the dependent node to the head node. The IULA Spanish LSP Treebank is the first technical corpus of Spanish annotated at surface syntactic level following the dependency grammar theory. The treebank has been made publicly and freely available from the META-SHARE platform with a Creative Commons CC-by licence. This work was co-funded by the Ramón y Cajal program of the Spanish Ministerio de Ciencia e Innovación, the EU UNER - Competitiveness and Innovation Framework Program, METANET (CIP-PSP-270893), and the UPF-IULA PhD grant program.

UPF Digital Repository