6 research outputs found

    Semantic annotation of nouns in Sensem corpus

    The main goal of this project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet synsets. The ultimate objective of the research is the acquisition of semantic preferences. This research was carried out thanks to projects FFI2008-02579-E/FILO and TIN2009-14715-C04 of the Ministerio de Ciencia e Innovación.

    A multi-layered annotated corpus of scientific papers

    No full text
    Paper presented at the Tenth International Conference on Language Resources and Evaluation (LREC 2016), held from 23 to 28 May 2016 in Portorož, Slovenia. Scientific literature records the research process with a standardized structure and provides the clues to track the progress in a scientific field. Understanding its internal structure and content is of paramount importance for natural language processing (NLP) technologies. To meet this requirement, we have developed a multi-layered annotated corpus of scientific papers in the domain of Computer Graphics. Sentences are annotated with respect to their role in the argumentative structure of the discourse. The purpose of each citation is specified. Special features of the scientific discourse such as advantages and disadvantages are identified. In addition, a grade is allocated to each sentence according to its relevance for being included in a summary. To the best of our knowledge, this complex, multi-layered collection of annotations and metadata characterizing a set of research papers had never been grouped together before in one corpus and therefore constitutes a newer, richer resource with respect to those currently available in the field. The research leading to these results has received funding from the European Project Dr. Inventor (FP7-ICT-2013.8.1, grant agreement no. 611383) and is partly supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

    CollFrEn: rich bilingual English–French collocation resource

    No full text
    Paper presented at the Joint Workshop on Multiword Expressions and Electronic Lexicons, held virtually on 13 December 2020. Collocations in the sense of idiosyncratic lexical co-occurrences of two syntactically bound words traditionally pose a challenge to language learners and many Natural Language Processing (NLP) applications alike. Reliable ground truth (i.e., ideally manually compiled) resources are thus of high value. We present a manually compiled bilingual English–French collocation resource with 7,480 collocations in English and 6,733 in French. Each collocation is enriched with information that facilitates its downstream exploitation in NLP tasks such as machine translation, word sense disambiguation, natural language generation, relation classification, and so forth. Our proposed enrichment covers: the semantic category of the collocation (its lexical function), its vector space representation (for each individual word as well as their joint collocation embedding), a subcategorization pattern of both its elements, as well as their corresponding BabelNet id, and finally, indices of their occurrences in large scale reference corpora. This work has been supported by the European Commission in the context of its H2020 Program under the contract numbers 870930-RIA, 825079-STARTS, and 779962-RIA.

    The IULA Treebank

    No full text
    Paper presented at the 8th International Conference on Language Resources and Evaluation (LREC'12), held from 21 to 27 May 2012 in Istanbul, Turkey. This paper describes on-going work for the construction of a new treebank for Spanish, The IULA Treebank. This new resource will contain about 60,000 richly annotated sentences as an extension of the already existing IULA Technical Corpus, which is only PoS tagged. In this paper we have focused on describing the work done for defining the annotation process and the treebank design principles. We report on how the framework used, the DELPH-IN processing framework, has been crucial to the design principles and to the bootstrapping strategy followed, especially regarding the use of stochastic modules for reducing parsing overgeneration. We also report on the different evaluation experiments carried out to guarantee the quality of the already available results. This work was co-funded by the Ramón y Cajal program of the Spanish Ministerio de Ciencia e Innovación, the EU UNER - Competitiveness and Innovation Framework Program, METANET (CIP-PSP-270893), and the UPF-IULA PhD grant program. We also wish to thank Stephan Oepen for his assistance in using the DELPH-IN environment and Lluís Padró for his assistance in the significance calculation.

    The IULA Spanish LSP treebank: building and browsing

    No full text
    Paper presented at the 9th International Conference on Language Resources and Evaluation (LREC'14), held from 26 to 31 May 2014 in Reykjavík, Iceland. This paper presents the IULA Spanish LSP Treebank, a dependency treebank of over 41,000 sentences of different domains (Law, Economy, Computing Science, Environment, and Medicine), developed in the framework of the European project METANET4U. Dependency annotations in the treebank were automatically derived from manually selected parses produced by an HPSG grammar, using a deterministic conversion algorithm. The algorithm used the identifiers of grammar rules to identify the heads, the dependents, and some dependency types that were directly transferred onto the dependency structure (e.g., subject, specifier, and modifier), and the identifiers of the lexical entries to identify the argument-related dependency functions (e.g., direct object, indirect object, and oblique complement). The treebank is accessible with a browser that provides concordance-based search functions and delivers the results in two formats: (i) a column-based format, in the style of the CoNLL-2006 shared task, and (ii) a dependency graph, where dependency relations are noted by an oriented arrow that goes from the dependent node to the head node. The IULA Spanish LSP Treebank is the first technical corpus of Spanish annotated at surface syntactic level following the dependency grammar theory. The treebank has been made publicly and freely available from the META-SHARE platform with a Creative Commons CC-by licence. This work was co-funded by the Ramon y Cajal program of the Spanish Ministerio de Ciencia e Innovacion, the EU UNER - Competitiveness and Innovation Framework Program, METANET (CIP-PSP-270893), and the UPF-IULA PhD grant program.