49 research outputs found

    A powerful comparison of deep learning frameworks for Arabic sentiment analysis

    Get PDF
    Deep learning (DL) is a machine learning (ML) subdomain that involves algorithms taken from the brain function named artificial neural networks (ANNs). Recently, DL approaches have gained major accomplishments across various Arabic natural language processing (ANLP) tasks, especially in the domain of Arabic sentiment analysis (ASA). For working on Arabic SA, researchers can use various DL libraries in their projects, but without justifying their choice or they choose a group of libraries relying on their particular programming language familiarity. We are basing in this work on Java and Python programming languages because they have a large set of deep learning libraries that are very useful in the ASA domain. This paper focuses on a comparative analysis of different valuable Python and Java libraries to conclude the most relevant and robust DL libraries for ASA. Throw this comparative analysis, and we find that: TensorFlow, Theano, and Keras Python frameworks are very popular and very used in this research domain

    Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

    Get PDF
    Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Despite the fact that several languages, including English, have been the subjects of several studies, not much has been conducted in the area of the Arabic language. The morphological complexities and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This novel study was motivated by the accomplishments of deep learning algorithms and word embeddings in the field of English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning algorithms, convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM, were introduced. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested using two benchmark Arabic datasets: Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LARB) for book reviews, with different setups. Comparative experiments utilized the three models with two-word embeddings and different setups of the datasets. The main novelty of this study is to explore the effectiveness of using various word embeddings and different setups of benchmark datasets relating to balance, imbalance, and binary and multi-classification aspects. Findings showed that the best results were obtained in most cases when applying the fastText word embedding using the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, the proposed CNN model outperformed the LSTM and CNN-LSTM models for the benchmark HARD dataset by achieving 94.69%, 94.63%, and 94.54% accuracy with fastText, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and FastText, they still outperformed other researchers’ state-of-the-art outcomes applying the same dataset

    Multidimensional opinion mining from social data

    Get PDF
    Social media popularity and importance is on the increase due to people using it for various types of social interaction across multiple channels. This thesis focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm, and irony, from user-generated content represented across multiple social media platforms and in various media formats, like textual, visual, and audio. Mining people’s social opinions from social sources, such as social media platforms and newswires commenting sections, is a valuable business asset that can be utilised in many ways and in multiple domains, such as Politics, Finance, and Government. The main objective of this research is to investigate how a multidimensional approach to Social Opinion Mining affects fine-grained opinion search and summarisation at an aspect-based level and whether such a multidimensional approach outperforms single dimension approaches in the context of an extrinsic human evaluation conducted in a real-world context: the Malta Government Budget, where five social opinion dimensions are taken into consideration, namely subjectivity, sentiment polarity, emotion, irony, and sarcasm. This human evaluation determines whether the multidimensional opinion summarisation results provide added-value to potential end-users, such as policy-makers and decision-takers, thereby providing a nuanced voice to the general public on their social opinions on topics of a national importance. Results obtained indicate that a more fine-grained aspect-based opinion summary based on the combined dimensions of subjectivity, sentiment polarity, emotion, and sarcasm or irony is more informative and more useful than one based on sentiment polarity only. This research contributes towards the advancement of intelligent search and information retrieval from social data and impacts entities utilising Social Opinion Mining results towards effective policy formulation, policy-making, decision-making, and decision-taking at a strategic level

    Social media analytics with applications in disaster management and COVID-19 events

    Get PDF
    Social media such as Twitter offers a tremendous amount of data throughout an event or a disastrous situation. Leveraging social media data during a disaster is beneficial for effective and efficient disaster management. Information extraction, trend identification, and determining public reactions might help in the future disaster or even avert such an event. However, during a disaster situation, a robust system is required that can be deployed faster and process relevant information with satisfactory performance in real-time. This work outlines the research contributions toward developing such an effective system for disaster management, where it is paramount to develop automated machine-enabled methods that can provide appropriate tags or labels for further analysis for timely situation-awareness. In that direction, this work proposes machine learning models to identify the people who are seeking assistance using social media during a disaster and further demonstrates a prototype application that can collect and process Twitter data in real-time, identify the stranded people, and create rescue scheduling. In addition, to understand the people’s reactions to different trending topics, this work proposes a unique auxiliary feature-based deep learning model with adversarial sample generation for emotion detection using tweets related to COVID-19. This work also presents a custom Q&A-based RoBERTa model for extracting related phrases for emotions. Finally, with the aim of polarization detection, this research work proposes a deep learning pipeline for political ideology detection leveraging the tweet texts and the expressed emotions in the text. This work also studies and conducts the historical emotion and polarization analysis of the COVID-19 pandemic in the USA and several individual states using tweeter data --Abstract, page iv

    A survey on opinion summarization technique s for social media

    Get PDF
    The volume of data on the social media is huge and even keeps increasing. The need for efficient processing of this extensive information resulted in increasing research interest in knowledge engineering tasks such as Opinion Summarization. This survey shows the current opinion summarization challenges for social media, then the necessary pre-summarization steps like preprocessing, features extraction, noise elimination, and handling of synonym features. Next, it covers the various approaches used in opinion summarization like Visualization, Abstractive, Aspect based, Query-focused, Real Time, Update Summarization, and highlight other Opinion Summarization approaches such as Contrastive, Concept-based, Community Detection, Domain Specific, Bilingual, Social Bookmarking, and Social Media Sampling. It covers the different datasets used in opinion summarization and future work suggested in each technique. Finally, it provides different ways for evaluating opinion summarization

    An Evaluation of Text Representation Techniques for Fake News Detection Using: TF-IDF, Word Embeddings, Sentence Embeddings with Linear Support Vector Machine.

    Get PDF
    In a world where anybody can share their views, opinions and make it sound like these are facts about the current situation of the world, Fake News poses a huge threat especially to the reputation of people with high stature and to organizations. In the political world, this could lead to opposition parties making use of this opportunity to gain popularity in their elections. In the medical world, a fake scandalous message about a medicine giving side effects, hospital treatment gone wrong or even a false message against a practicing doctor could become a big menace to everyone involved in that news. In the world of business, one false news becoming a trending topic could definitely disrupt their future business earnings. The detection of such false news becomes very important in today’s world, where almost everyone has an access to use a mobile phone and can cause enough disruption by creating one false statement and making it a viral hit. Generation of fake news articles gathered more attention during the US Presidential Elections in 2016, leading to a high number of scientists and researchers to explore this NLP problem with deep interest and a sense of urgency too. This research intends to develop and compare a Fake News classifier using Linear Support Vector Machine Classifier built on traditional text feature representation technique Term Frequency Inverse Document Frequency (Ahmed, Traore & Saad, 2017), against a classifier built on the latest developments for text feature representations such as: word embeddings using ‘word2vec’ and sentence embeddings using ‘Universal Sentence Encoder’
    corecore