158 research outputs found

    Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

    Sentiment analysis on large-scale social media data is important for bridging the gap between social media content and real-world activities, including political election prediction and the monitoring and analysis of individual and public emotional status. Although textual sentiment analysis has been well studied on platforms such as Twitter and Instagram, the role of extensive emoji use in sentiment analysis has received little attention. In this paper, we propose a novel scheme for Twitter sentiment analysis with extra attention on emojis. We first learn bi-sense emoji embeddings separately from tweets with positive and negative sentiment, and then train a sentiment classifier by attending on these bi-sense emoji embeddings with an attention-based long short-term memory network (LSTM). Our experiments show that the bi-sense embedding is effective for extracting sentiment-aware embeddings of emojis and outperforms the state-of-the-art models. We also visualize the attention weights to show that the bi-sense emoji embedding provides better guidance to the attention mechanism, yielding a more robust understanding of semantics and sentiment.
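The idea of attending over two sense vectors of one emoji can be illustrated with a minimal sketch. This is not the authors' model: the function name `attend_bi_sense`, the dot-product scoring, and the toy 3-dimensional vectors are all assumptions made for illustration. A real system would learn the embeddings and take the context vector from LSTM hidden states.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend_bi_sense(context, pos_sense, neg_sense):
    """Blend the two sense vectors of one emoji.

    Each sense is scored by its dot product with the context vector
    (an assumed scoring function); the scores are normalised with
    softmax and used as mixing weights.
    """
    weights = softmax([dot(context, pos_sense), dot(context, neg_sense)])
    blended = [weights[0] * p + weights[1] * n
               for p, n in zip(pos_sense, neg_sense)]
    return weights, blended

# Toy vectors, invented for illustration only.
pos_sense = [1.0, 0.0, 0.5]   # hypothetical "positive sense" embedding
neg_sense = [-1.0, 0.0, 0.5]  # hypothetical "negative sense" embedding
context = [2.0, 0.1, 0.0]     # stand-in for an LSTM hidden state

weights, blended = attend_bi_sense(context, pos_sense, neg_sense)
print(weights, blended)
```

With this toy context, the positive sense scores higher than the negative one, so it receives the larger attention weight.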

    Classification of Responses to COVID-19 Vaccination Based on Tweets Using Attention-Based Long Short Term Memory

    Social media makes it easy for people to obtain information and to voice their opinions, suggestions, or criticism about particular events. COVID-19 vaccination in Indonesia, which is widely discussed and has drawn varied responses from the public, both for and against, offers an opportunity to analyze those responses. To support this analysis, responses from the Indonesian public to COVID-19 vaccination are classified into three classes: negative, neutral, and positive. For this classification, the Attention-based Long Short Term Memory (A-LSTM) method is implemented. In addition, this study uses Bidirectional Encoder Representations from Transformers (BERT) in the tokenization step to obtain feature representations of the tweet data, which supports the A-LSTM training process. Evaluation uses a dataset of Indonesian-language tweets from Twitter, collected from the point at which the COVID-19 vaccination issue was raised in Indonesia. The method performs well, with an accuracy of 82%.

    A Survey of Sentiment Analysis and Sarcasm Detection: Challenges, Techniques, and Trends

    In recent years, more people have been using the internet and social media to express their opinions on various subjects, such as institutions, services, or specific ideas. This increase highlights the importance of developing automated tools for accurate sentiment analysis. Moreover, addressing sarcasm in text is crucial, as it can significantly impact the efficacy of sentiment analysis models. This paper aims to provide a comprehensive overview of research on sentiment analysis and sarcasm detection, focusing on the period from 2018 to 2023. It explores the challenges faced and the methods used to address them, and compares these methods. It also aims to identify emerging trends that will likely influence the future of sentiment analysis and sarcasm detection. This paper adds to existing knowledge by offering a comprehensive analysis of 40 research works, evaluating performance, addressing multilingual challenges, and highlighting future trends in sarcasm detection and sentiment analysis. It is a valuable resource for researchers and experts interested in the field, facilitating further advancements in sentiment analysis techniques and applications. It categorizes sentiment analysis methods into ML, lexical, and hybrid approaches, highlighting deep learning, especially Recurrent Neural Networks (RNNs), for effective textual classification with labeled or unlabeled data.

    Understanding Emojis for Financial Sentiment Analysis

    Social media content has been widely used for financial forecasting and sentiment analysis. However, emojis, as a new “lingua franca” on social media, are often omitted during standard data pre-processing. We thus speculate that they may carry additional useful information. In this research, we study the effect of emojis in facilitating financial sentiment analysis and explore the most effective way to handle them during model training. Experiments are conducted on two datasets from stock and crypto markets. Various machine learning models, deep learning models, and a state-of-the-art GPT-based model are used, and we compare their performance across different emoji encodings. Results show a consistent increase in model performance when emojis are converted to their descriptive phrases, and significant further improvement after refining the descriptive terms of the most important emojis before feeding them into the models. Our research shows that emojis are a valuable source for better understanding financial social media texts and should not be omitted.
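Converting emojis to descriptive phrases, as the abstract above describes, might look like the following minimal sketch. The mapping table `EMOJI_DESCRIPTIONS`, its three entries, and the function name `describe_emojis` are hypothetical. The abstract does not say which lexicon or tooling the authors used, and this sketch handles only single-code-point emojis.

```python
# Hypothetical hand-written mapping; a real pipeline would draw on a
# complete emoji lexicon. Editing these phrases is one way to "refine
# the descriptive terms" of selected emojis.
EMOJI_DESCRIPTIONS = {
    "\U0001F680": "rocket",            # rocket emoji
    "\U0001F4C9": "chart decreasing",  # downward-trend chart emoji
    "\U0001F48E": "gem stone",         # gem emoji
}

def describe_emojis(text, table=EMOJI_DESCRIPTIONS):
    """Replace each known emoji with its descriptive phrase.

    Works character by character, so multi-code-point emojis
    (ZWJ sequences, skin-tone modifiers) are not handled.
    """
    pieces = []
    for ch in text:
        if ch in table:
            # Pad with spaces so adjacent emojis become separate words.
            pieces.append(f" {table[ch]} ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())

example = describe_emojis("BTC to the moon \U0001F680\U0001F680")
print(example)
```

The resulting plain-text string can then be passed to any tokenizer or classifier that would otherwise drop or mis-tokenize the emoji characters.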

    Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs

    Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. In this regard, sentiment classification has frequently proven inefficient because microblog posts typically lack syntactically consistent terms and representatives since users on these social networks do not like to write lengthy statements. Low-resource languages pose additional limitations. The Persian language has exceptional characteristics and demands its own annotated data and models for the sentiment analysis task, distinct from those built on the text features of English. Method: This paper first constructs a user opinion dataset called ITRC-Opinion, built in a collaborative environment and in an insourced manner. Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, this study proposes a new deep convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts. The constructed dataset is used to evaluate the presented model. Furthermore, other models, such as LSTM, CNN-RNN, BiLSTM, and BiGRU, with different word embeddings, including Fasttext, Glove, and Word2vec, were run on our dataset and their results evaluated. Results: The results demonstrate the benefit of our dataset and the proposed model (72% accuracy), showing meaningful improvement in sentiment classification performance.

    Computational Sarcasm Analysis on Social Media: A Systematic Review

    Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community. Though the research in sarcasm detection spans more than a decade, some significant advancements have been made recently, including employing unsupervised pre-trained transformers in multimodal environments and integrating context to identify sarcasm. In this study, we aim to provide a brief overview of recent advancements and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and tasks relating to sarcasm that are beyond detection. Our study provides well-summarized tables of sarcasm datasets, sarcastic features and their extraction methods, and performance analysis of various approaches which can help researchers in related domains understand current state-of-the-art practices in sarcasm detection.
    Comment: 50 pages, 3 tables. Submitted to 'Data Mining and Knowledge Discovery' for possible publication