
    Sentiment analysis on Twitter for the Portuguese language

    Dissertation for the degree of Master in Informatics Engineering. With the growth and popularity of the internet, and more specifically of social networks, users can more easily share their thoughts, insights and experiences with others. Messages shared via social networks provide useful information for several applications, such as monitoring specific targets for sentiment or comparing the public sentiment on several targets, avoiding traditional marketing research based on surveys to explicitly gather public opinion. To extract information from the large volume of messages shared, it is best to use an automated program to process them. Sentiment analysis is an automated process for determining the sentiment expressed in natural-language text. Sentiment is a broad term, but here we focus on the opinions and emotions expressed in text. Of the existing social network websites, Twitter is currently considered the best suited to this kind of analysis. Twitter allows users to share their opinions on several topics and entities by means of short messages. The messages may be malformed and contain spelling errors, so some preprocessing of the text, such as spell checking, may be necessary before the analysis. To know what a message is focusing on, it is necessary to find the entities in the text, such as people, locations, organizations and products, and then analyse the rest of the text to obtain what is said about each specific entity. By analysing several messages, we can form a general idea of what the public thinks about many different entities. Our goal is to extract as much information as possible concerning different entities from tweets in the Portuguese language. Different techniques that may be used are shown here, along with examples and results from state-of-the-art related work.
Using a semantic approach, we were able to find and extract named entities from these messages and assign sentiment values to each entity found, producing a complete tool competitive with existing solutions. The classification of sentiment and its assignment to entities is based on the grammatical construction of the message. The results can then be viewed by the user in real time or stored to be viewed later. This analysis provides ways to view and compare public sentiment regarding these entities, showing favourite brands, companies and people, as well as the evolution of sentiment over time.

    Overview of the Aspect-Based Sentiment Analysis in Portuguese (ABSAPT) Task at IberLEF 2022

    This paper presents the task on Aspect-Based Sentiment Analysis in Portuguese (ABSAPT), held within the Iberian Languages Evaluation Forum (IberLEF 2022). We asked the participants to develop systems capable of extracting aspects (AE) and classifying the sentiment of aspects (ASC) in texts written in Portuguese. We created a corpus containing reviews about hotels. Twelve teams registered for the task, among which five submitted predictions and technical reports. The best-performing system achieved an Accuracy (Acc) of 0.67 in the AE sub-task (Team Deep Learning Brasil) and a Balanced Accuracy (Bacc) of 0.82 in the ASC sub-task (Team Deep Learning Brasil). This work was financed in part by the following Brazilian research agencies: CAPES and CNPq.

    Terminology Extraction for and from Communications in Multi-disciplinary Domains

    Terminology extraction generally refers to methods and systems for identifying term candidates in a uni-disciplinary and uni-lingual environment such as engineering, the medical, physical and geological sciences, or administration, business and leisure. However, as human enterprises become increasingly complex, it has become ever more important for teams in one discipline to collaborate with others who are not only from a non-cognate discipline but also speak a different language. Disaster mitigation and recovery, and conflict resolution, are amongst the areas where there is a requirement to use standardised multilingual terminology for communication. This paper presents a feasibility study conducted to build terminology (and an ontology) in the domain of disaster management; it is part of the broader work conducted for the EU project Slándáil (FP7 607691). We have evaluated CiCui (from the Chinese name 词萃, which translates to "words gathered"), a corpus-based text-analytic system that combines frequency, collocation and linguistic analyses to extract candidate terms from corpora comprising domain texts from diverse sources. CiCui was assessed against four terminology extraction systems, and the initial results show that it has above-average precision in extracting terms.
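    The frequency and collocation filters mentioned in this abstract can be illustrated with a crude, self-contained sketch: repeated non-stopword bigrams are kept as term candidates. The stopword list and threshold are illustrative assumptions, not CiCui's actual filters.

```python
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for"}

def candidate_terms(text, min_freq=2):
    # Frequency-based candidate extraction: keep bigrams that recur
    # at least min_freq times and contain no stopwords (a crude
    # stand-in for combined frequency + collocation filtering).
    tokens = [t.lower() for t in text.split()]
    bigrams = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    )
    return [" ".join(bg) for bg, count in bigrams.items() if count >= min_freq]
```

    A real system would add part-of-speech patterns and statistical collocation measures on top of raw counts.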

    Rating prediction on the Yelp academic dataset using paragraph vectors

    This work studies the application of Paragraph Vectors to the Yelp Academic Dataset reviews in order to predict user ratings for different categories of businesses, such as auto repair, restaurants or veterinarians. Paragraph Vectors is a word-embedding technique in which each word or piece of text is mapped to a continuous low-dimensional space. Opinion mining, or sentiment analysis, is then framed as a classification task, where each user review is associated with a label (the rating) and a probabilistic model is built with a logistic classifier. Following the intuition that the semantic information present in textual user reviews is generally more complex and complete than the numeric rating itself, this work applies Paragraph Vectors successfully to the Yelp dataset and evaluates its results.
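    The classification step described above (fixed review embeddings fed to a logistic classifier) can be sketched in plain Python. The 2-D toy vectors stand in for learned paragraph vectors, and `train_logistic` is an illustrative stand-in trained by stochastic gradient descent, not the authors' implementation.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=200):
    # Binary logistic classifier over fixed "paragraph vectors":
    # one SGD pass per example per epoch on the log-loss gradient.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability
            g = p - yi                        # gradient of log-loss wrt z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    # Decision rule: positive class when the logit is above zero.
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0
```

    In the paper's setting the inputs would be Doc2Vec-style review embeddings and the labels the star ratings, with one-vs-rest or multinomial classification rather than this binary toy.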

    Is the polarity of content producers strongly influenced by the results of the event?

    This paper presents an approach to comparing two types of data, subjective data (the polarity of the Pan American Games 2011 event by country) and objective data (the number of medals won by each participating country), based on the Pearson correlation. When dealing with events described by people, knowledge acquisition is difficult because the data's structure is heterogeneous and subjective. A first step towards knowing the polarity of the information provided by people consists in automatically classifying the posts into clusters according to their polarity. The authors carried out a set of experiments using a corpus of 5,600 posts extracted from 168 Internet resources related to a specific event: the 2011 Pan American Games. The approach is based on four components: a crawler, a filter, a synthesizer and a polarity analyzer. The PanAmerican approach automatically classifies the polarity of the event into clusters with the following results: 588 positive, 336 neutral and 76 negative. The work found that the polarity of the content produced was strongly influenced by the results of the event, with a correlation of .74. Finally, the precision of the PanAmerican approach is .87, .90 and .80 for the three polarity classes evaluated.
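    The Pearson correlation used above to relate per-country polarity to medal counts is a standard computation; a minimal sketch follows, with hypothetical placeholder series rather than the paper's data.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series:
    # covariance of the deviations divided by the product of the
    # standard deviations.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

    Applied per country, `xs` would hold polarity scores and `ys` medal counts; a value near .74, as reported, indicates a strong positive relationship.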

    Information extraction applied to comments in the tourism domain

    Motivation: The primary motivation of this dissertation was to show that it is possible to construct an NLP solution for the Portuguese language capable of helping the hotel industry. Objective(s): The main objective of this dissertation was to extract useful information from hotel comments using NLP. Method: An NLP pipeline was created to extract useful information, and sentiment analysis was then used to characterise that information. Results: After processing all the comments about a hotel, it was possible to extract what people like or dislike about it. Conclusions: The two main conclusions were that it is possible to create a Portuguese NLP pipeline for the hotel industry, and that it is possible to extract useful information from thousands of comments. Master's dissertation in Electronics and Telecommunications Engineering.

    Framework for collaborative knowledge management in organizations

    Nowadays organizations are being pushed to speed up the rate of industrial transformation towards high-value products and services. The capability to respond agilely to new market demands has become a strategic pillar for innovation, and knowledge management can support organizations in achieving that goal. However, current knowledge management approaches tend to be overly complex or too academic, with interfaces that are difficult to manage, even more so when cooperative handling is required. In an ideal framework, both tacit and explicit knowledge management should be addressed to achieve knowledge handling with precise and semantically meaningful definitions. Moreover, with the increase in Internet usage, the amount of available information has exploded. This has led to the observed progress in the creation of mechanisms to retrieve useful knowledge from the huge number of existing information sources. However, the same knowledge representation of a thing can mean different things to different people and applications. Contributing in this direction, this thesis proposes a framework capable of gathering the knowledge held by domain experts and domain sources through a knowledge management system and transforming it into explicit ontologies. This makes it possible to build tools with advanced reasoning capabilities that support enterprises' decision-making processes. The author also intends to address the problem of knowledge transfer within and among organizations, through a module (part of the proposed framework) for establishing a domain lexicon, whose purpose is to represent and unify the understanding of the semantics used in the domain.

    Sentiment Analysis on Tweets about Diabetes: An Aspect-Level Approach

    In recent years, some methods of sentiment analysis have been developed for the health domain; however, the diabetes domain has not been explored yet. In addition, there is a lack of approaches that analyze the positive or negative orientation of each aspect contained in a document (a review, a piece of news, or a tweet, among others). Based on this understanding, we propose an aspect-level sentiment analysis method based on ontologies in the diabetes domain. The sentiment of an aspect is calculated by considering the words around it, which are obtained through N-gram methods (N-gram after, N-gram before, and N-gram around). To evaluate the effectiveness of our method, we obtained a corpus from Twitter, which was manually labelled at the aspect level as positive, negative, or neutral. The experimental results show that the best result was obtained with the N-gram around method, with a precision of 81.93%, a recall of 81.13%, and an F-measure of 81.24%.
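    The N-gram-around idea described in this abstract (score the words within a window before and after the aspect term) can be sketched as follows; the tiny lexicon and the simple sign-based decision are illustrative assumptions, not the paper's ontology-based method.

```python
def ngram_around(tokens, aspect, n):
    # Collect up to n tokens before and n tokens after each
    # occurrence of the aspect term (the "N-gram around" window).
    context = []
    for i, tok in enumerate(tokens):
        if tok == aspect:
            context.extend(tokens[max(0, i - n):i])   # N-gram before
            context.extend(tokens[i + 1:i + 1 + n])   # N-gram after
    return context

def aspect_polarity(tokens, aspect, lexicon, n=3):
    # Sum the lexicon scores of the context words; the sign of the
    # total gives the aspect's sentiment orientation.
    score = sum(lexicon.get(w, 0) for w in ngram_around(tokens, aspect, n))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

    The N-gram-before and N-gram-after variants correspond to keeping only one of the two `extend` calls.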