12 research outputs found

    Psycholinguistic Patterns Detection for Analyzing the Subjective Language in Spanish

    Ph.D. thesis entitled “Psycholinguistic patterns detection for analyzing the subjective language in Spanish”, written by María del Pilar Salas Zárate at the University of Murcia under the supervision of Dr. Rafael Valencia García (University of Murcia) and Dr. Miguel Ángel Rodríguez García (King Abdulah University). The viva voce was held on 23 May 2017 before a committee formed by Dr. Jesualdo Tomás Fernández Breis (President, University of Murcia), Dr. Alejandro Rodríguez González (Secretary, Polytechnic University of Madrid), and Dr. José Antonio Miñarro Giménez (Member, Medical University of Graz). The thesis was awarded the Cum Laude distinction and the International Doctorate mention.

    Sentiment Analysis in Spanish for Improvement of Products and Services: A Deep Learning Approach

    Sentiment analysis is an important area that reveals what users think about a range of topics. This information helps organizations gauge customer satisfaction. Social networks such as Twitter are important information channels because information can be obtained from them and processed in real time. We propose a deep-learning-based approach that allows companies and organizations to detect opportunities for improving the quality of their products or services through sentiment analysis. The approach is based on a convolutional neural network (CNN) and word2vec. To determine its effectiveness for classifying tweets, we conducted experiments with different sizes of a Twitter corpus composed of 100,000 tweets. We obtained encouraging results, with a precision of 88.7%, a recall of 88.7%, and an F-measure of 88.7% on the complete dataset.
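
    The abstract does not say how its F-measure is computed. As a minimal sketch, assuming the usual definition (the weighted harmonic mean of precision and recall, balanced when `beta` is 1), the three reported figures are mutually consistent. The function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    Assumption: this is the standard textbook definition; the abstract
    does not state which formula the authors used.
    """
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was classified correctly.
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# With precision and recall both at 0.887, the balanced F-measure is 0.887.
print(round(f_measure(0.887, 0.887), 3))
```

    When precision and recall are equal, their harmonic mean equals that common value, which matches the 88.7% reported for all three metrics.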

    IXHEALTH: An advanced speech recognition system to interact with healthcare information systems

    The IXHEALTH project aims to develop a multilingual platform based on speech recognition that allows healthcare professionals to carry out transcription and dictation tasks for writing medical reports, and to interact with healthcare information systems through voice commands. These tasks are protected by a security mechanism based on voice biometrics that prevents unauthorized users from editing the sensitive data managed by this kind of system. The project has been developed by the company VOCALI together with the TECNOMOD research group of the University of Murcia, and it has been funded by the Instituto de Fomento de la Región de Murcia (Ref. 2015.08.ID+I.0011

    Sentiment Analysis on Tweets about Diabetes: An Aspect-Level Approach

    In recent years, some sentiment analysis methods have been developed for the health domain; however, the diabetes domain has not yet been explored. In addition, there is a lack of approaches that analyze the positive or negative orientation of each aspect contained in a document (a review, a piece of news, or a tweet, among others). We therefore propose an aspect-level sentiment analysis method based on ontologies in the diabetes domain. The sentiment of each aspect is calculated from the words surrounding it, which are obtained through N-gram methods (N-gram after, N-gram before, and N-gram around). To evaluate the effectiveness of our method, we collected a corpus from Twitter and manually labelled it at aspect level as positive, negative, or neutral. The best result was obtained with the N-gram around method, with a precision of 81.93%, a recall of 81.13%, and an F-measure of 81.24%.
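
    The three N-gram variants can be pictured as different context windows around an aspect term. The sketch below is illustrative only: the tokenization, the toy polarity lexicon `TOY_LEXICON`, and the function names are assumptions made for the example, not the ontology or sentiment resources used in the paper.

```python
from typing import List

# Hypothetical polarity lexicon, for illustration only.
TOY_LEXICON = {"great": 1, "good": 1, "bad": -1, "terrible": -1}


def context_window(tokens: List[str], idx: int, n: int, mode: str = "around") -> List[str]:
    """Return up to n tokens before, after, or on both sides of tokens[idx]."""
    before = tokens[max(0, idx - n):idx]
    after = tokens[idx + 1:idx + 1 + n]
    if mode == "before":
        return before
    if mode == "after":
        return after
    if mode == "around":
        return before + after
    raise ValueError(f"unknown mode: {mode}")


def aspect_polarity(tokens: List[str], idx: int, n: int, mode: str = "around") -> str:
    """Sum lexicon scores inside the window and map the total to a label."""
    score = sum(TOY_LEXICON.get(tok, 0) for tok in context_window(tokens, idx, n, mode))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tokens = "my new insulin pump works great but the diet is terrible".split()
print(aspect_polarity(tokens, tokens.index("pump"), 2, "around"))  # positive
print(aspect_polarity(tokens, tokens.index("diet"), 2, "around"))  # negative
print(aspect_polarity(tokens, tokens.index("pump"), 2, "before"))  # neutral
```

    In this toy sentence the "before" window for "pump" holds no opinion words, while the "around" window does, which is one way the choice of window can change the label assigned to an aspect.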

    Detección de patrones psicolingüísticos para el análisis de lenguaje subjetivo en español

    AIMS OF THE THESIS. Linguistics and natural language processing play an important role in the automatic classification of opinions. Figurative language, such as irony, sarcasm, and satire, is an important aspect to consider in sentiment analysis, because the double meaning expressed in an opinion can reverse its polarity. The main goal of this thesis is to detect psycholinguistic patterns for the analysis of subjective language in Spanish. Four specific aims are established: 1) design of a method for detecting psycholinguistic patterns for sentiment analysis; 2) design of a method for detecting psycholinguistic patterns for the analysis of satirical and non-satirical texts; 3) validation of the sentiment analysis method in different domains, namely tourism and movies; 4) validation of the method for automatic detection of satire in the news domain.
    METHODOLOGY. First, the state of the art is studied, covering natural language processing technologies, sentiment analysis, and subjective language. Specifically, the study covers the different levels of natural language processing, the main sentiment analysis approaches, the levels at which opinions are processed, knowledge bases, available linguistic resources, and the main techniques for detecting figurative language. Subsequently, a method based on psycholinguistic features for sentiment analysis and satire detection is designed and implemented. Finally, the proposal is validated in different domains: the sentiment analysis method is applied to the tourism and movie domains, and the satire detection method is applied to the domain of news in social networks.
    RESULTS. The main contributions of this work are:
    • A method for sentiment analysis and satire detection. This method classifies opinions as positive, negative, neutral, very positive, or very negative, and tweets as satirical or non-satirical.
    • A process for pre-processing tweets in Spanish.
    • A corpus in the tourism domain. The corpus contains 1,600 reviews about hotels, restaurants, and museums, among other topics, each labelled with its polarity (positive, negative, neutral, very positive, very negative).
    • A corpus of satirical and non-satirical tweets. This corpus consists of 10,000 tweets tagged as satirical or non-satirical, extracted from different Twitter accounts.
    • A set of psycholinguistic features for sentiment analysis and satire detection.
    CONCLUSIONS. The automatic classification of opinions requires a multidisciplinary approach in which linguistics and natural language processing play an important role. These disciplines make it possible to better understand human language, classify opinions, and summarize the sentiments expressed in texts. Figurative language, however, is one of the most difficult topics in natural language processing because, unlike literal language, it relies on linguistic figures such as metaphor, analogy, and ambiguity, among others, to convey more complex meanings. This makes it difficult to understand not only for computers but also for humans. This thesis described a method for detecting psycholinguistic patterns for sentiment analysis and the automatic detection of satire. The psycholinguistic features, in conjunction with natural language processing and data mining techniques, proved effective for detecting sentiment and satire. In addition, the validation of the methods in different domains demonstrated the effectiveness of the approach for classifying opinions and tweets.

    Detecting Depression Signs on Social Media: A Systematic Literature Review

    Among mental health diseases, depression is one of the most severe, as it often leads to suicide; it is therefore important to identify and summarize existing evidence from research on detecting signs of depression on social media using data provided by users. This review examines aspects of primary studies exploring depression detection from social media submissions (from 2016 to mid-2021). The search for primary studies was conducted in five digital libraries: ACM Digital Library, IEEE Xplore Digital Library, SpringerLink, Science Direct, and PubMed, as well as on the search engine Google Scholar to broaden the results. Extracting and synthesizing the data from each paper was the main activity of this work. Thirty-four primary studies were analyzed and evaluated. Twitter was the most studied social media platform for depression sign detection. Word embedding was the most prominent linguistic feature extraction method. Support vector machine (SVM) was the most used machine-learning algorithm. Similarly, Python libraries were the most popular computing tools. Finally, cross-validation (CV) was the most common statistical analysis method used to evaluate the results obtained. Using social media along with computing tools and classification methods contributes to current efforts in public healthcare to detect signs of depression from sources close to patients.

    Internet of Things-Driven Data Mining for Smart Crop Production Prediction in the Peasant Farming Domain

    Internet of Things (IoT) technologies can greatly benefit from machine-learning techniques and artificial neural networks for data mining and vice versa. In the agricultural field, this convergence could result in the development of smart farming systems suitable for use as decision support systems by peasant farmers. This work presents the design of a smart farming system for crop production, which is based on low-cost IoT sensors and popular cloud services for data storage and data analytics. Moreover, a new data-mining method exploiting climate data along with crop-production data is proposed for the prediction of production volume from heterogeneous data sources. This method was initially validated using traditional machine-learning techniques and open historical data for the northeast region of the state of Puebla, Mexico, collected from the National Water Commission and the Agri-food Information Service of the Mexican Government.

    EduRP: an Educational Resources Platform based on Opinion Mining and Semantic Web

    Educational platforms have become important tools for e-learning; nonetheless, finding appropriate educational resources is often a tedious task for learners. Opinions in the educational domain are important information for decision making; they allow teachers to improve the teaching process and enable students to decide on the best educational resources. The large amount of data generated daily on the Web, however, makes it difficult to analyze opinions manually. Multiple opinion mining approaches have been proposed as a solution to this problem; this research work introduces EduRP, an educational platform that integrates opinion mining techniques and ontology-based user profiling techniques. We specifically propose an opinion mining approach for Spanish text which consists of three main steps: 1) collect opinions from the EduRP platform, 2) process the opinions to normalize the text, and 3) obtain the polarity of the opinions using a machine learning approach. We also propose a profile customization approach that uses Semantic Web technologies, specifically ontologies, to integrate socio-demographic data from different social networks and from the platform itself. Finally, we assess the performance of our system using precision, recall, and F-measure, obtaining average values of 81.85%, 81.80%, and 81.54%, respectively.
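
    The abstract does not list the operations used in step 2 (text normalization). The sketch below shows what such a step might look like for Spanish opinions. Every rule in it (lowercasing, URL removal, collapsing elongated characters, whitespace cleanup) is an assumption chosen for illustration, and `normalize_opinion` is a hypothetical name, not part of EduRP.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
# Three or more repetitions of the same character, e.g. "buenoooo".
ELONGATION_PATTERN = re.compile(r"(.)\1{2,}")
WHITESPACE_PATTERN = re.compile(r"\s+")


def normalize_opinion(text: str) -> str:
    """Apply a few common, assumed normalization rules to an opinion."""
    text = text.lower()
    text = URL_PATTERN.sub("", text)             # drop links
    text = ELONGATION_PATTERN.sub(r"\1", text)   # "buenoooo" -> "bueno"
    text = WHITESPACE_PATTERN.sub(" ", text)     # collapse runs of spaces
    return text.strip()


print(normalize_opinion("¡¡Muy   BUENOOOO el curso!! http://example.com"))
# ¡¡muy bueno el curso!!
```

    Accented characters and Spanish punctuation are kept intact here, since removing them could change the meaning of a word; whether the platform does the same is not stated in the abstract.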

    AgriEnt: A Knowledge-Based Web Platform for Managing Insect Pests of Field Crops

    In the agricultural context, there is a great diversity of insects and diseases that affect crops. Moreover, the amount of data on these topics available from sources such as the Web increases every day. This can be a problem when farmers want to make decisions based on such a large and dynamic body of information. This work presents AgriEnt, a knowledge-based Web platform focused on supporting farmers in the decision-making process concerning crop insect pest diagnosis and management. AgriEnt relies on a layered functional architecture comprising four layers: the data layer, the semantic layer, the web services layer, and the presentation layer. The platform uses ontologies to formally and explicitly describe the knowledge of agricultural entomology experts and to perform insect pest diagnosis. Finally, to validate the AgriEnt platform, we describe a case study on diagnosing the insect pest affecting a crop. The results show that AgriEnt, through the use of the ontology, produces answers similar to the professional advice given by the entomology experts involved in the evaluation process. Therefore, this platform can guide farmers to make better decisions concerning crop insect pest diagnosis and management.