4 research outputs found
Negation Processing in Spanish and its Application to Sentiment Analysis
Natural Language Processing is the area of Artificial Intelligence that aims to develop computationally efficient mechanisms to facilitate communication between people and machines through natural language. For machines to process, understand and generate human language, a wide range of linguistic phenomena must be taken into account, such as negation, irony or sarcasm, which are used to give words a different meaning.
This doctoral thesis focuses on the study of negation, a complex linguistic phenomenon that we use in our daily communication. In contrast to most existing studies, it is carried out on Spanish texts, because i) Spanish is the second language by number of native speakers, ii) it is the third most used language on the Internet, and iii) there are no negation processing systems available for this language.
Thesis, Univ. Jaén, Departamento de Informática. Defended on 13 September 2019.
Contributions to information extraction for Spanish written biomedical text
285 p.
Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that are currently underexploited, despite their potential applications in, for example, improving healthcare experiences, supporting trainee education, or enabling biomedical research. To automatically transform those contents into relevant, structured information, advanced Natural Language Processing (NLP) mechanisms are required. In NLP, this task is known as Information Extraction. Our work takes place within this growing field of clinical NLP for the Spanish language, as we tackle three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification. Specifically, we study the different approaches and their transferability in two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems, and does not exhibit a considerable deviation from other approaches that rely on oracle terms. Finally, we present and exploit a new corpus of real health records manually annotated with negation and uncertainty information: NUBes. This corpus is the basis for two sets of experiments, one on cue and scope detection, and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.
Análisis de sentimientos para textos cortos en español, una revisión del estado del arte
Social networks are currently the communication medium most used by users in general. The automatic analysis of the opinions and comments posted on topics of interest such as science, technology, or politics requires an exhaustive review; sentiment analysis makes it possible to determine what users want to express. Many tools currently allow this sentiment analysis to be carried out for short texts. However, most of them focus on the English language.
This article aims to study the tools and corpora used for sentiment analysis of texts in Spanish through a review of the state of the art. Among the most important results, it was found that the most used corpus is TASS. In addition, a comparison was made between the methods used, including SentiWordNet, Bayes, and iSol, with SVM being the most efficient.