532 research outputs found

    Multilingual sentiment analysis in social media.

    252 p. This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing only analysis for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish. The thesis addresses the following challenges to build such a system:
    - Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
    - Analysis of social media (specifically Twitter): tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.
    - Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well-known social media benchmarks.
    - Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.

    Structured sentiment analysis in social media

    Applying transfer learning to sentiment analysis in social media

    Context: Sentiment analysis is an NLP technique that can be used to automatically obtain the sentiment of a crowd of end-users regarding a software application. However, applying sentiment analysis is a difficult task, especially considering the need to obtain enough good-quality data for training a Machine Learning (ML) model. To address this challenge, transfer learning can help save time and achieve better performance with a limited amount of data. Objective: In this paper, we aim to identify the degree to which transfer learning improves the results of sentiment analysis of messages shared by end-users in social media. Method: We propose a tool-supported framework able to monitor and analyze the sentiment of tweets with different ML models and settings. Using the proposed framework, we apply transfer learning and conduct a set of experiments with multiple datasets. Results: The performance of different ML models with transfer learning from different datasets is obtained and discussed, showing how different factors affect the results and how they have to be considered when applying transfer learning. This work has been partially supported by the Spanish project DOGO4ML (contract PID2020-117191RB-I00). Peer Reviewed. Postprint (author's final draft

    Reliable Sentiment Analysis in Social Media

    University of Technology Sydney. Faculty of Engineering and Information Technology. Sentiment analysis in social media is critical yet challenging because the source materials (i.e., reviews posted in social media) have high complexity, low quality, and uncertain credibility. For example, words and sentences in a textual review may couple with each other, and they may have heterogeneous meanings under different contexts or in different language locales. These couplings and heterogeneities essentially determine the sentiment polarity of the review but are too complex to be captured and modeled. Also, social reviews contain a large number of informal words and typos (i.e., noise) but a limited vocabulary (i.e., sparsity). As a result, most existing natural language processing (NLP) methods may fail to represent social reviews effectively. Furthermore, a large proportion of social reviews are posted by fraudsters. These fraudulent reviews manipulate social opinion and thus disturb sentiment analysis. This research focuses on reliable sentiment analysis in social media. It systematically investigates sentiment analysis techniques to tackle three major challenges in social media: high data complexity, low data quality, and uncertain credibility. Specifically, this research focuses on two research problems: general sentiment analysis in social media and fraudulent sentiment analysis in social media. General sentiment analysis targets the high data complexity and low quality of social articles that are credible. Fraudulent sentiment analysis handles the uncertain credibility issue, which is common and profoundly affects precise sentiment analysis in social media. Based on these investigations, this research proposes a series of methods to achieve reliable sentiment analysis. It studies the polarity-shift characteristics and non-IID characteristics in general paragraphs to capture the sentiment more accurately. It further models multi-granularity noise and sparsity in short text, which is the most common data in social media, for robust short-text sentiment analysis. Finally, it tackles the uncertain credibility problem in social media by studying fraudulent sentiment analysis in both supervised and unsupervised scenarios. This research evaluates the performance and properties of the proposed reliable sentiment analysis methods through extensive experiments on large real-world data sets. It demonstrates that the proposed methods are superior and reliable in social media sentiment analysis.

    Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media

    A public dataset with a variety of properties suitable for sentiment analysis [1], event prediction, trend detection and other text mining applications is needed in order to perform analysis studies successfully. The vast majority of data on social media is text-based, and machine learning cannot be applied directly to this raw data, since several different processing steps are required to prepare the data before the algorithms are run. For example, different misspellings of the same word enlarge the word vector space unnecessarily, which reduces the success of the algorithm and increases the computational power required. This paper presents an improved Turkish dataset with an effective spelling correction algorithm based on Hadoop [2]. The collected data is stored on the Hadoop Distributed File System and the text-based data is processed with the MapReduce programming model. This method is suitable for storing and processing large volumes of text-based social media data. In this study, movie reviews were automatically collected with Apache ManifoldCF (MCF) [3] and data clusters were created. Various methods, such as Levenshtein and Fuzzy String Matching, have been compared and proposed for creating a public dataset from the collected data. Experimental results show that the proposed algorithm successfully detects and corrects spelling errors, and that the result can be used as an open source dataset in sentiment analysis studies. Comment: Presented at CMES201
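The abstract above names Levenshtein distance as one of the string-matching methods considered for spelling correction. As a rough illustration only, the sketch below shows how edit distance can map a misspelled token to its nearest vocabulary entry. It is a single-machine example and not the paper's Hadoop/MapReduce implementation; the function names, the tiny vocabulary and the `max_distance` threshold are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary entry, or the word itself if none is close enough.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    best = min(vocabulary, key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_distance else word


# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["film", "güzel", "harika"]
```

With this vocabulary, `correct("guzel", VOCABULARY)` would map the ASCII-only spelling to `güzel`, while a token far from every entry is returned unchanged. Collapsing spelling variants this way is what keeps the word vector space from growing unnecessarily.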

    Sentiment Analysis in Social Media Platforms: The Contribution of Social Relationships

    The massive amount of data in social media platforms is a key source for companies to analyze customer sentiment and opinions. Many existing sentiment analysis approaches rely solely on the textual content of a sentence (e.g., words) for sentiment identification. Consequently, current sentiment analysis systems are ineffective for analyzing content in social media, because people may use non-standard language (e.g., abbreviations, misspellings, emoticons or multiple languages) on online platforms. Inspired by attribution theory, which is grounded in social psychology, we propose a sentiment analysis framework that considers the social relationships among users and contents. We conduct experiments to compare the proposed approach against existing approaches on a dataset collected from Facebook. The results indicate that we can classify the sentiment of sentences more accurately by utilizing social relationships.

    Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts

    Sentiment analysis (SA) concerns the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioral sciences) and also to many tasks in Natural Language Processing (NLP): information extraction, question answering, textual entailment, to name just a few. The importance of this field is shown by the high number of approaches proposed in research, as well as by the interest it has raised in other disciplines and the applications that have been created using its technology. In our case, the primary focus is to use sentiment analysis in the context of media monitoring, to enable tracking of global reactions to events. The main challenge that we face is that tweets are written in different languages, and an unbiased system should be able to deal with all of them in order to process all (possible) available data. Unfortunately, although many linguistic resources exist for processing texts written in English, for many other languages data and tools are scarce. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study on the possibility of implementing a multilingual system that is able to a) classify sentiment expressed in tweets in various languages using training data obtained through machine translation; b) verify the extent to which the quality of the translations influences the sentiment classification performance, in this case on highly informal texts; and c) improve multilingual sentiment classification using small amounts of data annotated in the target language. To this aim, varying sizes of target language data are tested. The languages we explore are: Arabic, Turkish, Russian, Italian, Spanish, German and French. JRC.G.2-Global security and crisis managemen

    Exploring Sentiment Analysis in Social Media: A Natural Language Processing Case Study

    Social media plays an integral role in our daily lives, influencing and reflecting global perspectives through the consumption and creation of content. Platforms like YouTube are incredibly active, with a constant influx of video uploads, views, and comments. While the YouTube app allows us to browse videos and comments, it offers only a limited glimpse into the interests and trends of others. Analysing this vast data pool, encompassing diverse language styles, presents a significant challenge. This article delves into the YouTube Data API and its application in Python for accessing raw data. The process involves data cleaning using advanced Natural Language Processing (NLP) techniques, harnessing Python-based machine learning to explore social media interactions, and automating the extraction of trends and influential factors. The journey towards trend analysis is meticulously detailed, featuring examples that leverage a variety of open-source Python tools