
AnCora-Nom: A Spanish lexicon of deverbal nominalizations

Authors: Peris Morant Aina; Taulé Delor Mariona
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 26/02/2019

This paper describes a new lexical resource: AnCora-Nom, a Spanish lexicon of deverbal nominalizations. At present, it contains 1,655 lexical entries and 3,094 senses. Each sense is associated with a denotation type, and the mapping of nominal complements to arguments and their corresponding theta roles is also annotated. A notable feature of this lexicon is that it was automatically extracted from the annotated AnCora-Es corpus. AnCora-Nom was derived taking into account not only the information directly related to nominalizations but also the morphological and syntactic-semantic information annotated in the corpus, such as WordNet synsets, the specifier type of the nominalization, and its morphological number (singular or plural).

Diposit Digital de la Universitat de Barcelona
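
The entry/sense structure described above can be pictured as a small nested data model. The following Python sketch is an assumption for illustration only: the class and field names, the example noun "construcción", and the argument and thematic-role labels are hypothetical and do not reproduce AnCora-Nom's actual file format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ArgumentMapping:
    """Links one nominal complement to an argument position and a thematic role."""
    complement: str  # surface complement (hypothetical example)
    argument: str    # argument label (hypothetical label set)
    theta_role: str  # thematic role (hypothetical label set)


@dataclass
class Sense:
    sense_id: str
    denotation: str  # denotation type, e.g. "event", "result" or "underspecified"
    arguments: list[ArgumentMapping] = field(default_factory=list)


@dataclass
class LexicalEntry:
    lemma: str      # the deverbal noun
    base_verb: str  # the verb it is derived from
    senses: list[Sense] = field(default_factory=list)


def count_senses(lexicon: list[LexicalEntry]) -> int:
    """Total number of senses across all entries."""
    return sum(len(entry.senses) for entry in lexicon)


def denotation_counts(lexicon: list[LexicalEntry]) -> Counter:
    """How many senses carry each denotation type."""
    return Counter(sense.denotation for entry in lexicon for sense in entry.senses)


# Hypothetical one-entry lexicon: an event sense with one mapped complement,
# plus a result sense with no annotated complements.
lexicon = [
    LexicalEntry(
        lemma="construcción",
        base_verb="construir",
        senses=[
            Sense("construcción.1", "event",
                  [ArgumentMapping("de la casa", "arg1", "patient")]),
            Sense("construcción.2", "result"),
        ],
    )
]

print(len(lexicon), count_senses(lexicon))  # 1 entry, 2 senses
```

Over a full lexicon, the same two counts correspond to the entry and sense totals reported in the abstract.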

AnCora-Nom: un léxico de nominalizaciones deverbales del español [AnCora-Nom: a lexicon of Spanish deverbal nominalizations]

Authors: Peris Morant Aina; Taulé Delor Mariona
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2011

This paper describes a new lexical resource: AnCora-Nom, a Spanish lexicon of deverbal nominalizations. At present, it contains 1,655 lexical entries and 3,094 senses. Each sense is associated with a denotation type, and the mapping of nominal complements to arguments and their corresponding theta roles is also annotated. A notable feature of this lexicon is that it was automatically extracted from the annotated AnCora-Es corpus. AnCora-Nom was derived taking into account not only the information directly related to deverbal nominalizations but also the morphological and syntactic-semantic information previously annotated in the corpus.

This research has received support from the projects Text-Knowledge 2.0 (TIN2009-13391-C04-04) and AnCora-Net (FFI2009-06497-E/FILO) from the Spanish Ministry of Science and Innovation, and an FPU grant (AP2007-01028) from the Spanish Ministry of Education.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Text as scene: discourse deixis and bridging relations

Authors: Martí Antonín Maria Antònia; Recasens Potau Marta; Taulé Delor Mariona
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2007

This paper presents a new framework, “text as scene”, which lays the foundations for the annotation of two coreferential links: discourse deixis and bridging relations. The incorporation of what we call textual and contextual scenes provides more flexible annotation guidelines, with broad type categories clearly differentiated. By dealing with discourse deixis and bridging relations from a common perspective, the framework aims to improve on the poor reliability scores (low inter-annotator agreement) obtained by previous annotation schemes, which fail to capture the vague references inherent in both types of link. The guidelines presented here complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, thus building the CESS-Ancora corpus.

This paper has been supported by the FPU grant (AP2006-00994) from the Spanish Ministry of Education and Science. It is based on work supported by the CESS-ECE (HUM2004-21127), Lang2World (TIN2006-15265-C06-06), and Praxem (HUM2006-27378-E) projects.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

The use of the past tense aspect in Spanish by study At-Home and Study-Abroad Chinese learners in semi-guided written tasks

Authors: Díaz Rodríguez Lourdes; Sun Yuliang; Taulé Delor Mariona
Publication venue: Universidad Complutense de Madrid (UCM)
Publication date: 21/10/2020

This work focuses on the influence of L2 acquisition environments (At-Home and Study-Abroad) on the language proficiency of L1 Mandarin Chinese learners of Spanish. We chose the use of the Spanish past tense aspect (pretérito indefinido and pretérito imperfecto) as the entry point for analyzing Chinese learners' proficiency in three semi-guided writing tasks. Our results reveal that the different teaching objectives of these acquisition environments promote a different development of Chinese learners' language capacities in Spanish: the At-Home learners perform in a more native-like way when factors at the discourse level are taken into account, whereas the Study-Abroad learners perform in a more native-like way when factors at the lexical level are taken into account. However, the usage patterns of the Spanish past tense aspect by learners in both environments share prototypical associations at the lexical and discourse levels.

Keywords: past tense aspect, acquisition environment, L2 Spanish, L1 Mandarin Chinese

Diposit Digital de la Universitat de Barcelona

Iarg-AnCora: Spanish corpus annotated with implicit arguments

Authors: Peris Morant Aina; Rodríguez Hontoria Horacio; Taulé Delor Mariona
Publication venue: Springer Science and Business Media LLC
Publication date: 21/10/2020

This article presents the Spanish Iarg-AnCora corpus (400k words, 13,883 sentences), annotated with the implicit arguments of deverbal nominalizations (18,397 occurrences). We describe the methodology used to create it, focusing on the annotation scheme and the criteria adopted. The corpus was manually annotated, and an inter-annotator agreement test was conducted (81% observed agreement) to ensure the reliability of the final resource. Annotating implicit arguments yields a substantial gain in argument and thematic role coverage (128% on average). It is the first freely available, wide-coverage Spanish corpus annotated with implicit arguments. The corpus can be used by machine learning-based semantic role labeling systems and for the linguistic analysis of implicit arguments grounded in real data. Semantic analyzers are essential components of current language technology applications, which need a deeper understanding of the text in order to make high-level inferences and thereby achieve qualitative improvements in their results.

Diposit Digital de la Universitat de Barcelona
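
The two figures reported for Iarg-AnCora, observed agreement and coverage gain, are simple ratios. A minimal Python sketch, assuming observed agreement is the share of items on which two annotators assign the same label and coverage gain is the relative increase in annotated arguments once implicit ones are added; the toy labels and counts are hypothetical, not taken from the corpus.

```python
def observed_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Share of items to which two annotators assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def coverage_gain(explicit_only: int, with_implicit: int) -> float:
    """Relative increase (in %) in annotated arguments once implicit ones are added.

    Assumed definition: the exact formula behind the reported figure is not given.
    """
    if explicit_only <= 0:
        raise ValueError("explicit_only must be positive")
    return 100 * (with_implicit - explicit_only) / explicit_only


# Hypothetical labels for five argument slots, one list per annotator.
annotator_1 = ["arg0", "arg1", "none", "arg1", "arg2"]
annotator_2 = ["arg0", "arg1", "arg1", "arg1", "arg2"]
print(observed_agreement(annotator_1, annotator_2))  # 0.8
print(coverage_gain(100, 228))  # 128.0
```

Observed agreement does not correct for agreement expected by chance; chance-corrected measures such as Cohen's kappa are a common complement.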

Empirical methods for the study of denotation in nominalizations in Spanish

Authors: Peris Morant Aina; Rodríguez Hontoria Horacio; Taulé Delor Mariona
Publication venue: MIT Press - Journals
Publication date: 07/12/2020

This article deals with deverbal nominalizations in Spanish; specifically, we focus on the denotative distinction between event and result nominalizations. The goal of this work is twofold: first, to detect the most relevant features for this denotative distinction; and second, to build a system that automatically classifies deverbal nominalizations according to their denotation. We base our study on theoretical hypotheses about this semantic distinction and analyze them empirically using machine learning techniques, which form the basis of the ADN-Classifier. This is the first tool that aims to automatically classify Spanish deverbal nominalizations into event, result, or underspecified denotation types. The ADN-Classifier has helped us to evaluate quantitatively the validity of our claims regarding deverbal nominalizations. We set up a series of experiments to test the ADN-Classifier with different models and in different realistic scenarios, depending on the knowledge resources and natural language processors available. The ADN-Classifier achieved good results (87.20% accuracy).

Diposit Digital de la Universitat de Barcelona
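
The abstract does not describe the ADN-Classifier's internals. As a swapped-in illustration of the general setup (binary linguistic features in, one of three denotation labels out, accuracy measured on held-out examples), here is a Bernoulli Naive Bayes sketch in plain Python. The feature names (`plural`, `definite_det`, `has_agent`, `has_patient`) and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and feature occurrences per label.

    examples: list of (set of binary features, denotation label).
    """
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # label -> feature -> count
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        vocabulary.update(features)
        for feature in features:
            feature_counts[label][feature] += 1
    return label_counts, feature_counts, vocabulary


def predict(model, features):
    """Return the label with the highest Bernoulli Naive Bayes log-score."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for feature in vocabulary:
            # Laplace-smoothed P(feature present | label)
            p = (feature_counts[label][feature] + 1) / (n + 2)
            score += math.log(p if feature in features else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_examples):
    """Share of test examples whose predicted label matches the gold label."""
    correct = sum(predict(model, feats) == gold for feats, gold in test_examples)
    return correct / len(test_examples)


# Hypothetical features and toy data, for illustration only.
train_data = [
    ({"has_agent", "has_patient"}, "event"),
    ({"has_patient"}, "event"),
    ({"plural"}, "result"),
    ({"plural", "definite_det"}, "result"),
    ({"definite_det"}, "underspecified"),
]
model = train(train_data)
print(predict(model, {"plural"}))
```

Naive Bayes is chosen here only because it fits in a few lines; any classifier that maps the same feature sets to the three labels could be evaluated with the same `accuracy` function.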

Text as Scene: Discourse Deixis and Bridging Relations

Authors: Martí Antonin M. Antònia; Recasens Potau Marta; Taulé Delor Mariona
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 05/03/2019

This paper presents a new framework, "text as scene", which lays the foundations for the annotation of two coreferential links: discourse deixis and bridging relations. The incorporation of what we call textual and contextual scenes provides more flexible annotation guidelines, with broad type categories clearly differentiated. By dealing with discourse deixis and bridging relations from a common perspective, the framework aims to improve on the poor reliability scores obtained by previous annotation schemes, which fail to capture the vague references inherent in both types of link. The guidelines presented here complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, thus building the CESS-Ancora corpus.

Diposit Digital de la Universitat de Barcelona

Tecnologies de la llengua i les seves aplicacions [Language technologies and their applications]

Authors: Civit Torruella Montserrat; Martí Antonín M. Antònia; Taulé Delor Mariona
Publication venue: Universidade da Coruña
Publication date: 01/01/2004

In recent years, research in Computational Linguistics and Natural Language Processing has given rise to the so-called Language Technologies, whose main goal is to develop computer systems capable of recognizing, understanding and generating human language in all its forms. To this end, a number of applications have been developed, such as Machine Translation, Information Retrieval, Information Extraction and Document Classification, which process information in order to facilitate access to, and the organization and transmission of, the knowledge generated by the so-called Information Society in which we live. As in other scientific disciplines, Computational Linguistics and Natural Language Processing have moved from an initial stage centered on basic, experimental research to one that interacts more with society and is therefore more focused on creating products and applications that solve real problems. This means developing systems and resources capable of analyzing unrestricted language, that is, systems and resources with broad linguistic coverage. This paper offers an introduction to the linguistic resources and the most characteristic applications currently being developed within the Language Technologies framework. Among the resources, it highlights morphological and syntactic analyzers and disambiguators (taggers and parsers), computational lexicons and linguistic corpora, particularly annotated ones. As for applications, it focuses mainly on Information Retrieval, Information Extraction and Machine Translation.

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

DISCOver: DIStributional approach based on syntactic dependencies for discovering COnstructions

Authors: Kovatchev Venelin; Martí Antonin M. Antònia; Salamó Llorente Maria; Taulé Delor Mariona
Publication venue: Walter de Gruyter GmbH
Publication date: 21/10/2020

One of the goals of Cognitive Linguistics is the automatic identification and analysis of constructions, since they are fundamental linguistic units for understanding language. This article presents DISCOver, an unsupervised methodology for the automatic discovery of lexico-syntactic patterns that can be considered candidate constructions. The methodology follows a distributional semantic approach. Specifically, it is based on our proposed pattern-construction hypothesis: the contexts that are relevant to the definition of a cluster of semantically related words tend to be (part of) lexico-syntactic constructions. Our proposal uses Distributional Semantic Models to model context, taking syntactic dependencies into account. After clustering, we link all clusters that have strong relationships and use them as a source of information for deriving lexico-syntactic patterns, obtaining a total of 220,732 candidates from a 100-million-token corpus of Spanish. We evaluated the resulting patterns intrinsically, by applying statistical association measures, and qualitatively, through expert assessment. Our results were superior to the baseline in both quality and quantity in all cases. Although our experiments were carried out on a Spanish corpus, the methodology is language-independent: applying it only requires a large corpus annotated with parts of speech and syntactic dependencies.

Diposit Digital de la Universitat de Barcelona
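
The core idea behind DISCOver, grouping semantically related words by their syntactic-dependency contexts and then reading off the contexts the group shares as pattern candidates, can be sketched in plain Python. This is a simplified stand-in, not DISCOver itself: raw co-occurrence counts replace the Distributional Semantic Models, single-link clustering with a cosine threshold replaces the original clustering, the cluster-linking step is omitted, and the context encoding, the `threshold` and `min_members` values, and the toy triples are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def context_vectors(triples):
    """Map each word to counts of its dependency contexts.

    triples: iterable of (head, relation, dependent). The head receives the
    context "rel:dependent"; the dependent receives "rel^-1:head".
    """
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[head][f"{rel}:{dep}"] += 1
        vectors[dep][f"{rel}^-1:{head}"] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[ctx] for ctx, count in u.items() if ctx in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(words, vectors, threshold):
    """Single-link clustering: merge any two words with cosine >= threshold."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for w in words:
        groups[find(w)].add(w)
    return list(groups.values())


def shared_contexts(group, vectors, min_members):
    """Contexts seen with at least min_members words of a cluster: pattern candidates."""
    counts = Counter(ctx for w in group for ctx in vectors[w])
    return {ctx for ctx, n in counts.items() if n >= min_members}


triples = [
    ("beber", "dobj", "agua"), ("beber", "dobj", "vino"), ("beber", "dobj", "café"),
    ("comprar", "dobj", "agua"), ("comprar", "dobj", "vino"),
    ("leer", "dobj", "libro"), ("leer", "dobj", "periódico"),
]
vectors = context_vectors(triples)
nouns = ["agua", "vino", "café", "libro", "periódico"]
for group in cluster(nouns, vectors, threshold=0.6):
    print(sorted(group), shared_contexts(group, vectors, min_members=len(group)))
```

Here the drink nouns share the context "dobj^-1:beber", which would surface as a candidate pattern along the lines of "beber + direct object".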

AnCora-Net: integración multilingüe de recursos lingüísticos [AnCora-Net: multilingual integration of linguistic resources]

Authors: Borrega Oriol; Martí Antonin M. Antònia; Taulé Delor Mariona
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 26/02/2019

AnCora-Net is a multilingual verb lexicon created by integrating the AnCora-Verb lexicons for Catalan and Spanish into the English Unified Verb Index. The Unified Verb Index brings together several broad-coverage knowledge sources for English that are a clear reference point in semantic representation. Integrating our resources with the English ones allows us to enrich the AnCora-Verb lexicons with semantic information encoded for English. Conversely, the Unified Verb Index is also enriched by incorporating the Catalan and Spanish AnCora-Verb lexicons, resulting in a multilingual resource that can be useful for comparative studies.

Diposit Digital de la Universitat de Barcelona
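
The two-way enrichment described for AnCora-Net can be pictured as a join through sense-to-entry links. A minimal Python sketch, assuming each AnCora-Verb sense already carries a link to at most one English index entry (how the links are established is not described here); the entry and sense identifiers and the `roles` field are hypothetical.

```python
def integrate(english_index, lexicons):
    """Join language-specific verb senses to the English entries they map to.

    english_index: {entry_id: semantic information for English}
    lexicons: {language: {sense_id: linked entry_id, or None if unmapped}}
    Returns the multilingual index and the enriched non-English senses.
    """
    multilingual = {
        entry_id: {"semantics": info, "senses": {}}
        for entry_id, info in english_index.items()
    }
    enriched = {}
    for language, senses in lexicons.items():
        for sense_id, entry_id in senses.items():
            if entry_id not in english_index:
                continue  # unmapped sense: nothing to exchange
            # The English entry gains a Catalan or Spanish sense...
            multilingual[entry_id]["senses"].setdefault(language, []).append(sense_id)
            # ...and the sense inherits the English semantic information.
            enriched[(language, sense_id)] = english_index[entry_id]
    return multilingual, enriched


# Hypothetical identifiers for illustration only.
english_index = {"uvi:eat": {"roles": ["Agent", "Patient"]}}
lexicons = {
    "es": {"comer.1": "uvi:eat", "comer.2": None},
    "ca": {"menjar.1": "uvi:eat"},
}
multilingual, enriched = integrate(english_index, lexicons)
print(multilingual["uvi:eat"]["senses"])
```

Both directions come out of the same pass: the English entry lists its Catalan and Spanish senses, and each mapped sense points back to the English semantic information.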