CRiSOL: Opinion Knowledge-base for Spanish
In this paper we focus on Spanish polarity classification in a corpus of hotel reviews (COAH) and introduce a new lexical resource called CRiSOL. The resource is built on iSOL, a list of Spanish opinion words: CRiSOL appends to each iSOL word the polarity value of the related SentiWordNet synset. Because SentiWordNet is not a Spanish resource, the Spanish version of WordNet included in the Multilingual Central Repository (MCR) was used as a pivot. An unsupervised polarity classifier was developed to assess the validity of CRiSOL. The results obtained with CRiSOL are better than those obtained with the base lexicons iSOL and SentiWordNet separately, which encourages us to continue this line of research. This research has been partially funded by the Fondo Europeo de Desarrollo Regional (FEDER), the ATTOS project (TIN2012-38536-C03-0) of the Spanish Government and the AORESCU project (P11-TIC-7684 MO) of the regional government of the Junta de Andalucía. The CEATIC project (CEATIC-2013-01) of the Universidad de Jaén has also partially funded this article.
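As a rough illustration of how a lexicon-based unsupervised polarity classifier of this kind can work, the Python sketch below sums per-word polarity scores and labels a review by the sign of the total. It is not the authors' classifier: the lexicon entries, the scores, the name `classify_polarity` and the rule that ties count as positive are all assumptions made for the example, and none of the values come from iSOL, SentiWordNet or CRiSOL.

```python
# Toy lexicon: every entry and score is invented for illustration
# and is NOT taken from iSOL, SentiWordNet or CRiSOL.
TOY_LEXICON = {
    "excelente": 1.0,
    "limpio": 0.5,
    "sucio": -0.75,
    "horrible": -1.0,
}


def classify_polarity(text, lexicon=TOY_LEXICON):
    """Return (label, score) for a review.

    The score is the sum of the lexicon values of the known words.
    Assumption: a score of zero (including texts with no known words)
    is labelled positive.
    """
    tokens = text.lower().split()
    # Strip surrounding punctuation so "limpio." still matches "limpio".
    score = sum(lexicon.get(tok.strip(".,;:!?¡¿"), 0.0) for tok in tokens)
    label = "positive" if score >= 0 else "negative"
    return label, score


if __name__ == "__main__":
    print(classify_polarity("El hotel es excelente y limpio."))
    print(classify_polarity("Baño sucio, horrible!"))
```

A real system would also need lemmatisation and some treatment of negation and intensifiers, which this sketch leaves out.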
Integrating Spanish lexical resources by meta-classifiers for polarity classification
In this paper we focus on unsupervised sentiment analysis in Spanish. The lack of resources for languages other than English, such as Spanish, adds complexity to the task. However, we take advantage of some good existing lexical resources. We have carried out several experiments with different unsupervised approaches in order to compare methodologies for Spanish polarity classification in a corpus of movie reviews. Among these approaches, perhaps the newest one integrates SentiWordNet with the Multilingual Central Repository to tackle polarity detection directly over the Spanish corpus. The results obtained were not as promising as we expected, so we carried out another group of experiments combining all the methods using meta-classifiers. The results obtained with stacking outperformed the individual experiments and encourage us to continue in this direction.
Language technologies applied to document simplification for helping autistic people
People affected by Autism Spectrum Disorders (ASD) have impairments in social interaction because they lack an adequate theory of mind. A significant percentage have inadequate reading comprehension skills. We present a multilingual tool called Open Book (OB) that applies Human Language Technologies (HLT) to identify reading comprehension obstacles in text documents and to propose simpler alternatives, with the aim of assisting the reading comprehension of users. OB involves several text transformations at the lexical, syntactic and semantic levels. In this paper we focus on three challenging components of the OB tool: the image retrieval component, the idiom detection component and the summarization module. Very few studies involve simplification by showing images associated with difficult concepts. In addition, the treatment of figurative language such as idioms or metaphors is one of the most challenging areas in Natural Language Processing (NLP). Finally, although text summarization is a more widely studied field in NLP, its application to text simplification remains an open research issue. Thus, we focus on the integration of these three modules in our OB tool. We present the motivation for building these components and describe how they are integrated into the whole system. Moreover, the usability and usefulness of OB have been evaluated and analysed, showing that the tool helps to produce texts that are easier for autistic people to understand.
Negation Scope Identification in Spanish Reviews
Sentiment Analysis is a task that still has several open challenges. One of them is the treatment of negation, because a negative opinion can be expressed with negated positive words. Negation is specific to each language, so its treatment must be adapted to the particularities of the language in question. This article presents a linguistic approach to identifying the scope of negation in Spanish, which has been applied in a polarity classification system for movie reviews. This work has been partially funded by the Fondo Europeo de Desarrollo Regional (FEDER), the ATTOS project (TIN2012-38536-C03-0) of the Spanish Government, the AORESCU project (TIC-07684) of the regional government of the Junta de Andalucía and the CEATIC-2013-01 project of the Universidad de Jaén.
A Spanish Semantic Orientation Approach to Domain Adaptation for Polarity Classification
One of the problems of opinion mining is the domain adaptation of sentiment classifiers. There are several approaches to tackling this problem. One of them is the integration of a list of opinion-bearing words for the specific domain. This paper presents the generation of several resources for domain adaptation in polarity detection. In addition, the lack of resources for languages other than English has oriented our work towards developing sentiment lexicons for polarity classifiers in Spanish. The results show the validity of the new sentiment lexicons, which can be used as part of a polarity classifier.
Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches
Sentiment polarity detection is one of the most popular tasks related to Opinion Mining. Many papers have been presented describing one of the two main approaches used to solve this problem. On the one hand, a supervised methodology uses machine learning algorithms when training data exist. On the other hand, an unsupervised method based on semantic orientation is applied when linguistic resources are available. However, few studies combine the two approaches. In this paper we propose the use of meta-classifiers that combine supervised and unsupervised learning in order to develop a polarity classification system. We have used a Spanish corpus of film reviews along with its parallel corpus translated into English. Firstly, we generate two individual models from these two corpora by applying machine learning algorithms. Secondly, we integrate SentiWordNet into the English corpus, generating a new unsupervised model. Finally, the three systems are combined using a meta-classifier that allows us to apply several combination algorithms such as voting or stacking. The results obtained outperform those obtained using the systems individually and show that this approach could be considered a good strategy for polarity classification when working with parallel corpora.
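To make the combination step concrete, the following Python sketch shows plain majority voting over the per-document labels of three systems. It is only an illustration under stated assumptions: the function name `majority_vote`, the label strings and the example predictions are hypothetical, and the sketch covers voting only, not the stacking variant, which would train a further classifier on the outputs of the base systems.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label sequences from several systems by majority vote.

    predictions: list of equally long label lists, one per system.
    With three systems and two labels a tie cannot occur.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the top label.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    # Hypothetical outputs for three documents; not real experimental data.
    spanish_ml = ["pos", "neg", "pos"]
    english_ml = ["pos", "pos", "neg"]
    sentiwordnet = ["neg", "neg", "neg"]
    print(majority_vote([spanish_ml, english_ml, sentiwordnet]))
```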
eSOLHotel: Building an Spanish opinion lexicon adapted to the tourism domain
Since Web 2.0 became the largest container of opinions on different topics in all languages, the study of Sentiment Analysis has grown exponentially. In this work we focus on Spanish polarity classification of hotel reviews and present a new domain-dependent lexical resource, eSOLHotel, adapted to the tourism domain. The new lexicon has been compiled following a corpus-based approach. We have carried out several experiments using an unsupervised approach for polarity classification over the hotel category of the SFU corpus. The results obtained with the new eSOLHotel lexicon outperform those obtained with a general-purpose lexicon and encourage us to continue working in this line. This research has been partially funded by the Fondo Europeo de Desarrollo Regional (FEDER), the ATTOS project (TIN2012-38536-C03-0) of the Spanish Government and the AORESCU project (P11-TIC-7684 MO) of the regional government of the Junta de Andalucía. The CEATIC project (CEATIC-2013-01) of the Universidad de Jaén has also partially funded this article.
Studying the Scope of Negation for Spanish Sentiment Analysis on Twitter
Polarity classification is a well-known Sentiment Analysis task. However, most research has been oriented towards developing supervised or unsupervised systems without paying much attention to certain linguistic phenomena such as negation. In this paper we focus on this specific issue in order to demonstrate that dealing with negation can improve the final system. Although some studies of negation detection exist, most of them deal with English documents. In contrast, our study focuses on the scope of negation in Spanish Sentiment Analysis. We have built an unsupervised polarity classification system based on integrating external knowledge. In order to evaluate the influence of negation, we have implemented a specific negation detection module that applies several rules. The system has been tested with and without negation handling, using a corpus of tweets written in Spanish. The results obtained reveal that the treatment of negation can greatly improve the accuracy of the final system. Moreover, we have carried out a comprehensive statistical study to support our approach. To the best of our knowledge, this is the first work which statistically demonstrates that taking negation into account significantly improves the polarity classification of Spanish tweets.
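The abstract does not list the rules of the negation module, so the Python sketch below uses a common simplification as a stand-in: the polarity of the words inside a fixed-size window after a negation cue is inverted. The cue list, the toy lexicon, the window size of three tokens and the name `score_with_negation` are all assumptions for illustration and are not taken from the paper.

```python
# Hypothetical, partial list of Spanish negation cues.
NEGATION_CUES = {"no", "nunca", "ni", "sin"}

# Toy lexicon with invented scores, for illustration only.
TOY_LEXICON = {"bueno": 1.0, "gusta": 1.0, "malo": -1.0}


def score_with_negation(tokens, window=3):
    """Sum lexicon scores, inverting words that fall inside a negation scope.

    Assumption: the scope is simply the `window` tokens that follow a cue.
    """
    score = 0.0
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokens:
        if tok in NEGATION_CUES:
            remaining = window
            continue
        value = TOY_LEXICON.get(tok, 0.0)
        score += -value if remaining > 0 else value
        if remaining > 0:
            remaining -= 1
    return score


if __name__ == "__main__":
    print(score_with_negation("me gusta".split()))
    print(score_with_negation("no me gusta".split()))
```

Comparing the score of a sentence with and without its negation cue mirrors, on a very small scale, the kind of with/without comparison the abstract describes.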
COPOS: Corpus de Opiniones de Pacientes en Español. Aplicación de Técnicas de Análisis de Sentimientos
Every day more users are interested in the opinions that other patients have about a physician or about health topics in general. According to a 2015 study, 62% of Spanish people use the Internet to find information on health-related topics. This paper focuses on Spanish Sentiment Analysis in the medical domain. Although Sentiment Analysis has been studied in different domains, health issues have hardly been examined in Opinion Mining, and even less so with comments or opinions in Spanish. We have therefore generated a corpus of Spanish opinions about medical entities written by patients, obtained by crawling the website Masquemedicos. We present this new resource, called COPOS (Corpus Of Patient Opinions in Spanish). To the best of our knowledge, this is the first attempt to deal with Spanish opinions written by patients about medical attention. In order to demonstrate the validity of the corpus, we have also carried out different experiments with the main methodologies applied in polarity classification (Semantic Orientation and Machine Learning). The results obtained encourage us to continue analysing and researching Opinion Mining in the medical domain. This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the REDES project (TIN2015-65136-C2-1-R) from the Spanish Government and by a grant from the Ministerio de Educación Cultura y Deporte (MECD - scholarship FPU014/00983).
Negation in Spanish: analysis and typology of negation patterns
In this paper we present the criteria applied for annotating the SFU ReviewSP-NEG corpus with negation, and the corresponding linguistic typology. This typology has the advantage that it is easy to express in terms of a tagset for corpus annotation: the types are clearly delimited, which avoids ambiguity in the annotation process, and they offer wide coverage (i.e. they resolved all the cases occurring in the corpus). The corpus consists of 400 reviews and 198,551 words. Currently, 75% of the corpus is annotated and, out of a total of 6,331 annotated sentences, 2,953 contain at least one negation. Funded by FEDER funds, the projects TIN2015-65136-C2-1-R and TIN2015-71147-C2-2 from MINECO, and FPU014/00983 from MECD.