    A Review on Opinion Mining: Approaches, Practices and Application

    Opinion Mining, also known as Sentiment Analysis (SA), has recently become the focus of many researchers because the analysis of online text is useful and in demand across many applications. Analysis of social sentiment is a trending topic because users share their emotions in a convenient format through microblogging services such as Twitter. Twitter provides information about individuals' real-time feelings through the data its users post. The essential task is to extract users' tweets and then analyze and survey them. This extracted information can be very helpful for predicting users' opinions on specific policies. The aim of this paper is to survey sentiment analysis algorithms, showing how different ML and lexicon-based methodologies are used and how accurate they are. Our paper also focuses on the three kinds of machine learning algorithms for Sentiment Analysis: Supervised, Unsupervised Algorithms

    Analysis of College by Using Data Mining and Security

    Over the past few years, social media has captured the attention of the entire world because it is extremely fast at spreading thoughts across the globe and is user friendly. Opinions and reviews are the most critical factor in forming views and influencing the success of a product or service. However, this information is difficult to analyze because of its huge volume and disorganized nature. With the rapid growth in the number of social media users in recent years, researchers have been drawn to using social media data for sentiment analysis of people or of a particular product, person, or event

    A Supervised Approach for Sentiment Analysis using Skipgrams and its Application to Sentiment Visualisation in Social Media

    In this Ph.D. thesis we propose, as fundamental research, the design, development and evaluation of a supervised approach for sentiment analysis. This work is based on the hypothesis that an efficient use of skipgram modelling can improve sentiment analysis tasks and reduce the resources they need. In summary, it consists of a supervised approach that uses machine learning techniques and skipgrams as information units, mainly focused on skipgram selection and filtering. This approach will be evaluated and compared to current state-of-the-art techniques. In addition, as applied research we propose a sentiment visualisation tool, strongly integrated with our sentiment analysis approach. This tool is oriented to the context of social media, measuring reputation and user interactions in real time. This research work has been partially funded by Generalitat Valenciana through project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” with grant reference PROMETEU/2018/089, and by the Spanish Government and FEDER through the project RTI2018-094653-B-C22: “Modelang: Modeling the behavior of digital entities by Human Language Technologies” (“LIVING-LANG: Living Digital Entities by Human Language Technologies”)
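
    To make the skipgram units mentioned above concrete, here is a minimal sketch. It assumes the common k-skip-n-gram definition (n tokens kept in order, with at most k tokens skipped in total); the thesis may define or filter its skipgrams differently, and the function name and example sentence are illustrative only.

```python
from itertools import combinations

def skipgrams(tokens, n=2, k=1):
    """Return all k-skip-n-grams of `tokens` (assumed definition:
    n tokens in their original order, skipping at most k tokens overall)."""
    result = []
    for start in range(len(tokens)):
        # After the first token, the remaining n-1 tokens must fall
        # within the next n-1+k positions.
        window = tokens[start + 1 : start + n + k]
        for picks in combinations(range(len(window)), n - 1):
            result.append((tokens[start],) + tuple(window[i] for i in picks))
    return result

pairs = skipgrams("not very good film".split(), n=2, k=1)
print(pairs)
```

    With k=1 the pair ("not", "good") is produced even though "very" sits between the two words, which is the kind of non-contiguous unit that plain bigrams miss.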

    Using Sentiment Analysis to track reaction to the Global Game Jam Theme (published in Proceedings of the International Conference on Game Jams, Hackathons, and Game Creation Events)

    In this paper, we examine the Global Game Jam Theme and the reaction of the 'jammers' to its release. The Theme is one of the main drivers of the creative aspect of the Game Jam: it sets the tone of the games that are developed at the Jam. This paper introduces an experiment which uses 'sentiment analysis' to gauge the positive or negative reaction to the theme over the last 7 years of the Global Game Jam. The results of this study show that the 2012 theme had the highest sentiment. Finally, we suggest that 'sentiment analysis' or 'context analysis' could be used to gather data sets for other studies, such as studies of development practices

    A survey on opinion summarization techniques for social media

    The volume of data on social media is huge and keeps increasing. The need to process this extensive information efficiently has increased research interest in knowledge engineering tasks such as Opinion Summarization. This survey presents the current opinion summarization challenges for social media, followed by the necessary pre-summarization steps: preprocessing, feature extraction, noise elimination, and handling of synonym features. Next, it covers the various approaches used in opinion summarization, such as Visualization, Abstractive, Aspect based, Query-focused, Real Time, and Update Summarization, and highlights other Opinion Summarization approaches such as Contrastive, Concept-based, Community Detection, Domain Specific, Bilingual, Social Bookmarking, and Social Media Sampling. It covers the different datasets used in opinion summarization and the future work suggested for each technique. Finally, it describes different ways of evaluating opinion summarization

    Aspect-Based Sentiment Analysis using Machine Learning and Deep Learning Approaches

    Sentiment analysis (SA), also known as opinion mining, is the process of gathering and analyzing people's opinions about a particular service, good, or company on websites like Twitter, Facebook, Instagram, LinkedIn, and blogs, among other places. This article presents a thorough analysis of SA and its levels. The manuscript's main focus is on aspect-based SA, which helps manufacturing organizations make better decisions by examining consumers' viewpoints and opinions of their products. The many approaches and methods used in aspect-based sentiment analysis (ABSA) are covered in this review study. In traditional methods, the features associated with the aspects were extracted manually, which made it a time-consuming and error-prone operation. These restrictions may be overcome as artificial intelligence develops. Therefore, to increase the effectiveness of ABSA, researchers are increasingly using AI-based machine learning (ML) and deep learning (DL) techniques. Additionally, certain recently released ABSA approaches based on ML and DL are examined and contrasted, and, based on this comparison, gaps in both methodologies are identified. At the conclusion of this study, the difficulties that current ABSA models encounter are also emphasized, along with suggestions for improving the efficacy and precision of ABSA systems

    Sentic Computing for Aspect-Based Opinion Summarization Using Multi-Head Attention with Feature Pooled Pointer Generator Network

    Neural sequence-to-sequence models have achieved superlative performance in summarizing text, but they tend to generate generic summaries that under-represent the opinion-sensitive aspects of the document. Additionally, sequence-to-sequence models are prone to a train-test discrepancy (exposure bias) arising from the different summary decoding processes in the training and testing phases. The models use ground-truth summary words in the decoder training phase and predicted outputs in the testing phase. This inconsistency leads to error accumulation and substandard performance. To address these gaps, a cognitive aspect-based opinion summarizer, Feature Pooled Pointer Generator Network (FP2GN), is proposed which selectively attends to thematic and contextual cues to generate sentiment-aware review summaries. This study augments the pointer generator framework with opinion feature extraction, feature pooling, and a mutual attention mechanism for opinion summarization. The proposed model FP2GN identifies the aspect terms in review text using sentic computing (SenticNet 5 and concept frequency-inverse opinion frequency) and statistical feature engineering. These aspect terms are encoded into context embeddings using weighted average feature pooling, and the embeddings are processed by a stacked Bi-LSTM encoder–decoder model with multi-head self-attention, inspired by the pointer-generator framework. The decoder uses temporal and mutual attention mechanisms to ensure an appropriate representation of the input sequence. The study also proffers the use of a teacher forcing ratio to curtail the error accumulation caused by exposure bias. The model achieves a ROUGE-1 score of 86.04% and a ROUGE-L score of 88.51% on the Amazon Fine Foods dataset. An average gain of 2% over other methods is observed. The proposed model reinforces the pointer generator network architecture with opinion feature extraction, feature pooling, and a mutual attention mechanism to generate human-readable opinion summaries. Empirical analysis substantiates that the proposed model is better than the baseline opinion summarizers
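
    The teacher forcing ratio mentioned above can be illustrated with a small sketch. This is not the FP2GN implementation: `step_fn` is a hypothetical stand-in for one decoder step, and the lookup-table "model" and tokens are invented for illustration. The sketch only shows the mixing rule commonly meant by the term: at each training step the decoder is fed the ground-truth token with probability `ratio`, and its own previous prediction otherwise.

```python
import random

def decode_with_teacher_forcing(step_fn, target, start_token, ratio, rng):
    """Run one training-time decoding pass over `target`.

    step_fn(prev_token) -> predicted next token (hypothetical decoder step).
    ratio = 1.0 always feeds ground truth; ratio = 0.0 always feeds the
    model's own prediction, which matches test-time decoding.
    """
    prev, outputs = start_token, []
    for gold in target:
        pred = step_fn(prev)
        outputs.append(pred)
        # Choose what the decoder sees at the next step.
        prev = gold if rng.random() < ratio else pred
    return outputs

# Toy "decoder": a fixed next-token lookup table (illustrative only).
toy_model = {"<s>": "good", "good": "coffee", "coffee": "beans", "great": "taste"}
step = lambda tok: toy_model.get(tok, "<unk>")
target = ["great", "coffee", "beans"]

forced = decode_with_teacher_forcing(step, target, "<s>", 1.0, random.Random(0))
free = decode_with_teacher_forcing(step, target, "<s>", 0.0, random.Random(0))
print(forced, free)
```

    Intermediate ratios expose the decoder to its own mistakes during training, which is the usual motivation for using the ratio against exposure bias.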

    Data Summarization with Social Contexts

    While social data is being widely used in various applications such as sentiment analysis and trend prediction, its sheer size also presents great challenges for storing, sharing and processing such data. These challenges can be addressed by data summarization, which transforms the original dataset into a smaller, yet still useful, subset. Existing methods find such subsets with objective functions based on data properties such as representativeness or informativeness but do not exploit social contexts, which are distinct characteristics of social data. Further, to date very little work has focused on topic-preserving data summarization, despite the abundant work on topic modeling. This is a challenging task for two reasons. First, since topic models are based on latent variables, existing methods are not well suited to capturing latent topics. Second, it is difficult to find social contexts that provide valuable information for building an effective topic-preserving summarization model. To tackle these challenges, in this paper, we focus on exploiting social contexts to summarize social data while preserving topics in the original dataset. We take Twitter data as a case study. Through analyzing Twitter data, we discover two social contexts which are important for topic generation and dissemination, namely (i) the CrowdExp topic score, which captures the influence of both the crowd and the expert users in Twitter, and (ii) the Retweet topic score, which captures the influence of Twitter users' actions. We conduct extensive experiments on two real-world Twitter datasets using two applications. The experimental results show that, by leveraging social contexts, our proposed solution can enhance topic-preserving data summarization and improve application performance by up to 18%

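    Improving the Performance of Repeated-Character Preprocessing for Word Recognition in Indonesian-Language Sentiment Classification (original title: Perbaikan Kinerja Praproses Karakter Berulang Dalam Mengenali Kata Pada Klasifikasi Sentimen Berbahasa Indonesia)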

    Relevant data is obtained in the preprocessing stage by removing noise so that the data to be processed matches what is needed. One noise-removal step deletes repeated characters, which often appear in Twitter data because of typing errors. A problem arises when words that legitimately contain repetition are processed, because the word loses its meaning and can no longer be processed properly. This study aims to modify repeated-character removal by adding a similarity measurement that scores how closely a word matches a dictionary. Four types of repetition in standard words are handled (words containing repetition with more than one kind of repeated-character error, words containing repetition with no repeated-character error, words without repetition with a repeated-character error, and words without repetition with more than one kind of repeated-character error), and the modified removal is used to improve the quality of sentiment analysis with Support Vector Machines (SVM). Three configurations are tested and compared: without repeated-character removal, with it, and with the modified removal. The test results show that the modification gives the best classification performance, with an accuracy of 74.46%, whereas the Illecker method yields 71.71% and the Jaccard method yields 68.04%. The modification plays a significant role in reducing errors in word meaning: in its best result, 59% of words were recognized. In addition, the modification improves performance in the stemming and stop-word stages. The stemming improvement is evidenced by 682 words that could be recognized. The stop-word improvement is evidenced by 86 words that could be reduced, which lowers the diversity of words that have the same meaning and intent
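
    The general idea of checking collapsed candidates against a dictionary can be sketched as follows. This is an illustrative approximation, not the authors' modification: the abstract does not specify their similarity measure, so Jaccard similarity over character bigrams is used here as an assumed stand-in, and the dictionary, threshold, and function names are hypothetical.

```python
import re

def collapse_repeats(word, keep=1):
    """Shrink every run of one repeated character to `keep` characters."""
    return re.sub(r"(.)\1+", lambda m: m.group(1) * keep, word)

def bigrams(word):
    """Set of character bigrams (the word itself if it is too short)."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}

def jaccard(a, b):
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)

def normalize(word, dictionary, threshold=0.5):
    """Map a noisy token to a dictionary word when one is close enough."""
    if word in dictionary:
        return word
    # Try keeping doubled letters first ("maaaf" -> "maaf"), then singles.
    candidates = [collapse_repeats(word, 2), collapse_repeats(word, 1)]
    for cand in candidates:
        if cand in dictionary:
            return cand
    # Otherwise fall back to the most similar dictionary entry, if any.
    score, entry = max(
        (jaccard(cand, e), e) for cand in candidates for e in sorted(dictionary)
    )
    return entry if score >= threshold else candidates[-1]

# Toy dictionary of Indonesian words (illustrative only).
kamus = {"bagus", "maaf", "banget", "tidak"}
print(normalize("baguuuss", kamus), normalize("maaaf", kamus))
```

    Trying the doubled-letter candidate before the single-letter one is a design choice that keeps words such as "maaf" intact, which plain collapse-to-one removal would damage.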