4 research outputs found

    Negation Processing in Spanish and its Application to Sentiment Analysis

    Natural Language Processing is the area of Artificial Intelligence that aims to develop computationally efficient mechanisms to facilitate communication between people and machines through natural language. For machines to process, understand and generate human language, a wide range of linguistic phenomena must be taken into account, such as negation, irony or sarcasm, which are used to give words a different meaning. This doctoral thesis focuses on the study of negation, a complex linguistic phenomenon that we use in our daily communication. In contrast to most existing studies, it is carried out on Spanish texts, because i) Spanish is the second language by number of native speakers, ii) it is the third most used language on the Internet, and iii) there are no negation processing systems available for this language. Thesis, Univ. Jaén, Departamento de Informática. Defended on 13 September 2019.

    Contributions to information extraction for Spanish written biomedical text

    285 p. Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that are currently underexploited, despite their potential applications in improving healthcare experiences, supporting trainee education, or enabling biomedical research, for example. To automatically transform those contents into relevant, structured information, advanced Natural Language Processing (NLP) mechanisms are required. In NLP, this task is known as Information Extraction. Our work takes place within this growing field of clinical NLP for the Spanish language, as we tackle three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification. Specifically, we study the different approaches and their transferability in two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems, and does not exhibit a considerable deviation from other approaches that rely on oracle terms. Finally, we present and exploit a new corpus of real health records manually annotated with negation and uncertainty information: NUBes. This corpus is the basis for two sets of experiments, one on cue and scope detection, and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.

    Análisis de sentimientos para textos cortos en español, una revisión del estado del arte

    Social networks are currently the communication medium most widely used by users in general. The automatic analysis of the opinions and comments posted on topics of interest (for example, science, technology or politics) requires an exhaustive review, and sentiment analysis makes it possible to determine what users want to express. Many tools now support sentiment analysis of short texts, but most of them focus on English. This article studies the tools and corpora used for sentiment analysis of Spanish texts through a review of the state of the art. Among the most important results, the most widely used corpus was found to be TASS. The methods used were also compared, notable among them SentiWordNet, Bayes and iSol, with SVM being the most efficient.