    Is This a Joke? Detecting Humor in Spanish Tweets

    While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective remains largely unexplored in Computational Linguistics. Some previous work exists, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to their humor value, letting the annotators subjectively decide which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%. Comment: Preprint version, without referra
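
    The abstract reports precision and recall but no combined score. As a worked illustration only, the sketch below derives the F1 score (the harmonic mean of the two) from the reported figures. The F1 value is computed here and is not stated in the source; the function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract: precision 84%, recall 69%.
f1 = f1_score(0.84, 0.69)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.758"
```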

    Lexicon for natural language generation in Spanish adapted to alternative and augmentative communication

    In this paper we present Elsa, the first lexicon for Spanish with morphological, syntactic and semantic information automatically generated from a well-known pictogram resource and specifically tailored for Augmentative and Alternative Communication (AAC). This lexicon, which focuses on an icon set widely used within AAC applications, is motivated by the need to improve Natural Language Generation (NLG) systems that aid people diagnosed with communication disorders. In addition, we design an automatic lexicon extension procedure that uses a training process to complete the linguistic data. For this we used a dataset composed of novels and tales in Spanish, with pictogram representations, since the lexicon is meant for AAC applications for children with disabilities. Moreover, we provide the algorithms used to build our lexicon and a use case of Elsa within an NLG system to assess the usability of our proposal. Funding: Agencia Estatal de Investigación | Ref. TEC2016-76465-C2-2-R; Xunta de Galicia | Ref. GRC2014/04

    Bootstrapping a Portuguese WordNet from Galician, Spanish and English wordnets

    Series: Lecture Notes in Computer Science, ISSN 0302-9743, vol. 8854. In this article we explore the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets using Probabilistic Translation Dictionaries automatically created from parallel corpora. The process generated a total of 56,770 synsets and 97,058 variants. An evaluation of the results using the Brazilian OpenWordNet-PT as a gold standard yielded a precision ranging from 53% to 75%, depending on the cut-off. The results were satisfactory and comparable to those of similar experiments using the WN-Toolkit. PEst-OE/EEI/UI0752/2014, TIN2012-38584-C06-01, TIN2012-38584-C06-0
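
    To make the idea of a cut-off over translation probabilities concrete, here is a minimal sketch, not the authors' procedure. It assumes each source wordnet contributes the variants of one synset, that each probabilistic translation dictionary maps a source word to Portuguese candidates with probabilities, and that candidates are scored by their average probability over all source variants. The function name `propose_variants`, the scoring rule and all toy data are assumptions made for illustration.

```python
from collections import defaultdict


def propose_variants(synset_variants, dictionaries, cutoff):
    """Return Portuguese candidates whose average translation probability
    over all source variants of the synset reaches the cutoff.

    synset_variants: {language: [source words in the synset]}
    dictionaries:    {language: {source word: {candidate: probability}}}
    """
    scores = defaultdict(float)
    n_words = 0
    for lang, words in synset_variants.items():
        ptd = dictionaries.get(lang, {})
        for word in words:
            n_words += 1
            for candidate, prob in ptd.get(word, {}).items():
                scores[candidate] += prob
    if n_words == 0:
        return []
    return sorted(c for c, s in scores.items() if s / n_words >= cutoff)


# Toy data (hypothetical): one synset seen through three source wordnets.
synset = {"en": ["dog"], "es": ["perro"], "gl": ["can"]}
ptds = {
    "en": {"dog": {"cão": 0.8, "cachorro": 0.15}},
    "es": {"perro": {"cão": 0.7, "perro": 0.1}},
    "gl": {"can": {"cão": 0.9, "cana": 0.05}},
}
strict = propose_variants(synset, ptds, cutoff=0.5)
lenient = propose_variants(synset, ptds, cutoff=0.04)
print(strict, lenient)
```

    Lowering the cut-off admits more candidates at the cost of noisier ones, which is the kind of trade-off behind a precision range that depends on the cut-off.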

    TASS2018: Medical knowledge discovery by combining terminology extraction techniques with machine learning classification

    In this paper we present the procedure followed to produce the run submitted by the UPF-UPC team to the TASS 2018 Task 3 challenge. According to the organization's codes, this procedure can be classified as H-KB-S, as it draws on a knowledge-based methodology as well as supervised methods. Our pipeline includes: i) standard pre-processing of the documents using the Freeling tool suite (POS tagging and dependency parsing); ii) a CRF sequence labelling tool for completing both subtask A (key phrase identification) and subtask B (key phrase classification); and iii) a hybrid approach to subtask C (extraction of semantic relations) that combines two Logistic Regression classifiers, followed by shallow lexical relation extractors for entity/entity pairs linked by is-a and same-as relations. Peer Reviewed. Postprint (published version
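
    Sequence labellers such as CRFs emit one tag per token, so key phrases have to be read back from the tag sequence. The sketch below shows one common way to do that, assuming a BIO tagging scheme; the abstract does not say which scheme the team used, and the function name `bio_to_spans`, the labels `Concept` and `Action`, and the example sentence are illustrative assumptions.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (phrase, label) pairs.

    A "B-X" tag opens a phrase of label X, "I-X" continues an open phrase
    of the same label, and anything else closes the current phrase.
    An "I-X" tag with no matching open phrase is ignored.
    """
    spans = []
    current_tokens, current_label = [], None

    def close():
        # Store the phrase collected so far, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            current_tokens.append(token)
        else:
            close()
            current_tokens, current_label = [], None
    close()
    return spans


# Hypothetical tagger output for a short Spanish sentence.
tokens = ["El", "asma", "afecta", "las", "vías", "respiratorias"]
tags = ["O", "B-Concept", "B-Action", "O", "B-Concept", "I-Concept"]
spans = bio_to_spans(tokens, tags)
print(spans)
```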

    Big data and automatic detection of topics: social network texts

    This paper analyzes the influence of terms that express sentiment on the automatic detection of topics in social networks. The proposal uses an ontology-based methodology that can identify and eliminate terms with a sentimental orientation in social network texts, since such terms can negatively influence topic detection. To this end, two sentiment analysis resources were used to detect these terms. The proposed system was evaluated with real data sets from the Twitter and Facebook social networks, in English and Spanish respectively, demonstrating in both cases the influence of sentimentally oriented terms on the detection of topics in social network texts.
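
    The effect described above can be illustrated with a deliberately simplified sketch. It swaps the paper's ontology-based methodology for plain term-frequency ranking, and it assumes a small hand-made sentiment lexicon; the function name `top_terms`, the lexicon and the example posts are all hypothetical.

```python
from collections import Counter


def top_terms(texts, sentiment_lexicon, k):
    """Rank the k most frequent tokens after dropping sentiment terms."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            if token not in sentiment_lexicon:
                counts[token] += 1
    return [term for term, _ in counts.most_common(k)]


# Hypothetical posts and lexicon, for illustration only.
posts = ["love phone battery", "hate phone battery", "love phone screen"]
lexicon = {"love", "hate"}

unfiltered = top_terms(posts, set(), k=3)
filtered = top_terms(posts, lexicon, k=3)
print(unfiltered)
print(filtered)
```

    In this toy data the sentiment word "love" is as frequent as a genuine topic term, so it competes for a place in the unfiltered ranking; removing lexicon terms leaves only topical vocabulary.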

    Verb similarity: comparing corpus and psycholinguistic data

    Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of events, and word associations, which are related to the speakers' organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between the verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and the psycholinguistic representations.
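
    Comparing two similarity measures over the same verb-sense pairs comes down to correlating two lists of scores. The sketch below computes a Pearson correlation by hand as one possible choice; the abstract does not state which correlation coefficient was used, and the score lists are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Hypothetical similarity scores for the same four verb-sense pairs:
# one list from argument structure, one from word associations.
corpus_sim = [0.10, 0.40, 0.70, 0.90]
association_sim = [0.20, 0.35, 0.80, 0.85]
r = pearson(corpus_sim, association_sim)
print(f"r = {r:.2f}")
```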

    Automatic Verb Disambiguation: a study on the performance of argument semantic information

    One of the fundamental tasks for resolving ambiguity in Natural Language Processing is Word Sense Disambiguation, and in particular the specific task of Automatic Verb Disambiguation (DVA, from the Spanish Desambiguación Verbal Automática). In this study we carry out an experiment to test the viability of an approach to DVA based on the semantic information of verbal arguments. The good results obtained suggest that this type of information should be taken into account in future DVA proposals.