    Streaming Big Data Analysis for Real-Time Sentiment based Targeted Advertising

    Big Data arising from the information shared on social network sites has great relevance for research in diverse fields such as marketing, politics, health, and disaster management. Social network sites like Facebook and Twitter are now used extensively for conducting business, marketing products and services, and collecting opinions and feedback about them. Since data gathered from these sites about a product or brand are up to date and mostly supplied voluntarily, they tend to be realistic and massive, and to reflect general public opinion. Analyzing them in real time can lead to accurate insights, and responding to the results sooner is clearly more advantageous than responding later. In this paper, a cloud-based system for real-time targeted advertising based on tweet sentiment analysis is designed and implemented using the big data processing engine Apache Spark and its streaming library. The application is meant to promote cross-selling and provide better customer support.

    Internet rumor audience response prediction algorithm based on machine learning in big data environment

    Rumors are an important factor affecting social stability at certain times. The mechanisms of their dissemination, prevention, and control have therefore long been of concern to the academic community and widely discussed by experts and scholars. However, although people have begun to pay attention to online rumors now that the Internet is a new type of media, research on them is still relatively fragmented, especially cross-domain research on their social influence. There is no clear, specific definition of online rumors, nor a detailed analysis of the internal connection between their influence and group behavior. This article therefore combines actual cases to explore the spread and influence of online rumors and show their social influence, hoping to enrich research on the topic. Nowadays, the Internet has become the most important carrier of public grievances. Internet users express their opinions on hot issues such as enterprises, people's livelihood, and government management, forming a powerful public opinion pressure that far exceeds that of traditional media, and the resulting security risks cannot be ignored. How to monitor network public opinion from a large amount of network data is therefore a difficult problem that urgently needs solving. First, the design consists of four modules: information collection, web page preprocessing, public opinion analysis, and public information report. Second, text clustering, the core technology of network public opinion analysis, is optimized, and a single-pass algorithm based on a double threshold is proposed. The dual-threshold single-pass algorithm is then optimized using the MapReduce parallel computing model, yielding a network public opinion collection technology for the big data setting. Simulation results show that the approach greatly improves text clustering performance and that the design can be effectively optimized using the MapReduce-based parallel computing model. The average miss rate after optimization is 0.7569 times, the average false alarm rate is 0.5556 times, and C_det is 0.5714 times. This demonstrates that the machine-learning-based collection technology is effective in the big data setting and performs well.
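
The abstract above does not spell out how the two thresholds are used, so the following is only a minimal sketch under stated assumptions: a similarity at or above `t_high` joins a cluster and refines its centroid, a similarity between `t_low` and `t_high` joins without changing the centroid, and anything lower starts a new cluster. The function names, the threshold values, and the bag-of-words cosine similarity are all illustrative choices, not the authors' implementation, and the MapReduce parallelization is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, t_low=0.3, t_high=0.6):
    """Cluster docs in one pass; returns a list of cluster dicts."""
    clusters = []
    for idx, text in enumerate(docs):
        vec = Counter(text.lower().split())
        # Find the most similar existing cluster, if any.
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= t_high:
            best["members"].append(idx)
            best["centroid"].update(vec)  # strong match: refine the centroid
        elif best is not None and best_sim >= t_low:
            best["members"].append(idx)   # weak match: join, centroid unchanged
        else:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
    return clusters


docs = [
    "flood hits city center",
    "flood hits city suburbs",
    "new phone released today",
    "city flood warning issued",
]
clusters = single_pass(docs)
for cluster in clusters:
    print(cluster["members"])
```

Keeping the centroid fixed on weak matches is one way to limit topic drift in a single pass; other readings of "double threshold" are possible.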

    Evolutionary Multiobjective Feature Selection for Sentiment Analysis

    Sentiment analysis is one of the prominent research areas in data mining and knowledge discovery, and it has proven to be an effective technique for monitoring public opinion. The big data era, with a high volume of data generated by a variety of sources, has provided enhanced opportunities for using sentiment analysis in various domains. To take best advantage of this volume of data for accurate sentiment analysis, it is essential to clean the data before the analysis, as irrelevant or redundant data hinder the extraction of valuable information. In this paper, we propose a hybrid feature selection algorithm to improve the performance of sentiment analysis tasks. Our proposed approach builds a binary classification model based on two feature selection techniques: an entropy-based metric and an evolutionary algorithm. We have performed comprehensive experiments in two different domains using a benchmark dataset, the Stanford Sentiment Treebank, and a real-world dataset we created from World Health Organization (WHO) public speeches regarding COVID-19. The proposed feature selection model is shown to achieve significant performance improvements on both datasets, increasing classification accuracy for all combinations of machine learning and text representation techniques used. Moreover, it achieves an over 70% reduction in feature size, which improves efficiency in computation time and space.
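
To illustrate what an entropy-based filter for binary sentiment classification can look like, here is a minimal sketch that ranks terms by information gain. The abstract does not name the specific metric, so information gain is an assumption, as are the function names and the toy data; the evolutionary stage of the hybrid algorithm is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = (len(with_term) / total) * entropy(with_term) \
        + (len(without) / total) * entropy(without)
    return entropy(labels) - conditional


def top_features(docs, labels, k):
    """Rank vocabulary terms by information gain and keep the best k."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


# Toy data: each document is a set of tokens, labels are 1 (positive) / 0 (negative).
docs = [{"good", "movie"}, {"great", "movie"}, {"bad", "movie"}, {"awful", "movie"}]
labels = [1, 1, 0, 0]
print(top_features(docs, labels, 4))
```

In this toy example the word "movie" appears in every document, so it carries no information about the class and is the one term dropped.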

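    Analisis Konstruksi Framing Berita Kekerasan Seksual Pada Media Cnnindonesia.Com Dan Kompas.Com (Edisi September-Oktober 2021) [Analysis of the Framing Construction of Sexual Violence News in the Media CNNIndonesia.com and Kompas.com (September-October 2021 Edition)]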

    The mass media play a big role in distributing news to the public, so it is important for them to package news appropriately, factually, objectively, and professionally. This descriptive qualitative research aims to explore the framing constructed by the online mass media CNNIndonesia.com and Kompas.com on the topic of sexual violence, specifically a father's abuse of his children in Luwu Timur, which was reported on October 7, 2021. The data were selected by purposive sampling, namely the first report, so that the study can analyze how the two media construct language when reporting on sexual violence with limited information. The study uses the framing theory of Zhongdang Pan and Gerald M. Kosicki. The results indicate that the two media create a provocative impulse but use language that conforms to the rules of mass media. The two media also set two news sources against each other, who expressed differing opinions about the closed harassment case. Keywords: framing analysis, news text construction, sexual violence, mass media

    Exploiting BERT and RoBERTa to Improve Performance for Aspect Based Sentiment Analysis

    Sentiment analysis, also known as opinion mining, is a type of text research that analyzes people's opinions expressed in written language. It brings together research areas such as Natural Language Processing (NLP), Data Mining, and Text Mining, and is fast becoming of major importance to companies and organizations as they start to incorporate online commerce data into their analyses. The data on which sentiment analysis is performed are often reviews, which can cover anything from a small product to a big multinational corporation. The goal of sentiment analysis is to extract information from those reviews to gauge public opinion for market research, monitor brand and product reputation, and understand customer experiences. Reviews written on online platforms are often free text with no standard structure, and dealing with such unstructured data is a challenging problem. Sentiment analysis can be done at different levels; the focus of this research is aspect-level sentiment analysis, which involves two tasks. The first is aspect identification: discovering the attributes of the object that people are commenting on, which are called aspects. The second is the sentiment classification of the reviews using these extracted aspects. For sentiment analysis, this research uses transformer-based pre-trained models such as BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (a robustly optimized BERT), as they make use of an embedding vector space that is rich in context. The purpose of the research is to propose a framework for extracting aspects from the data that can be applied to these pre-trained models. In the first part of the experiment, both the BERT and RoBERTa models are developed without the aspect-based approach. In the second part, the aspect-based approach is applied to the same models, and the results are compared against the equivalent models without it. The experimental results show that the aspect-based approach improved the performance of the models by almost 1% over the traditional models, and that the BERT model with the aspect-based approach had the highest accuracy and performance among all the models evaluated in this research.

    Behavior Analysis and Recognition of Hidden Populations in Online Social Network Based on Big Data Method

    Hidden populations are minority groups that are not well known to the public. Traditional statistical survey methods are difficult to apply to them because their members are hard to find and unwilling to share their inner opinions with others. On the other hand, with the development of Web 2.0, hidden populations gather and share their views in online social networks thanks to the openness and anonymity of the Internet. This paper therefore analyzes the behavioral characteristics of hidden populations based on their data in online social networks, taking the lesbian population in Douban Group as an example. First, activity data on the lesbian population are collected from Douban Group. Second, behavioral characteristics are analyzed, and regional, temporal, and text characteristics are mined using big data methods. Third, a recognition model is proposed based on these characteristics, and its effectiveness is verified experimentally. This research helps to understand the behavioral characteristics of hidden populations in depth and provides a decision-making basis for the management of, and services for, hidden populations.

    Teknologi Opinion Mining untuk Mendukung Strategic Planning [Opinion Mining Technology to Support Strategic Planning]

    The data deluge of the Big Data era is inevitable, and it includes the abundant data in online social media. This opportunity is the main motivation for this research. Opinion mining is a technology for processing text data to determine the direction of public comments and opinions. Taking UIN Sunan Ampel Surabaya (Sunan Ampel State Islamic University Surabaya) as the object of research, this study aims to analyze public opinion about the largest Islamic campus in Surabaya, so that the results can support management decisions in formulating strategic planning to realize the World Class University vision. The study uses 4009 Indonesian-language samples taken from public opinion on Twitter over two years (2017-2018). From these 4009 samples, 31837 word types were obtained after stop-word removal. Based on sentiment analysis using the VADER and Liu approaches, visualized with K-Means graphs, public opinion toward UIN Sunan Ampel was 97.54% neutral, 2.16% positive, and 0.34% negative. These results indicate that the Information Capital of UIN Sunan Ampel needs to be strengthened toward a positive image. Maximum effort is therefore needed to build innovation and commercial supremacy, perception (public relations), and scalability strategies so that internal operations can reliably support the vision and mission of UIN Sunan Ampel Surabaya.

    Real Time Sentiment Change Detection of Twitter Data Streams

    In the past few years, Twitter sentiment analysis has grown enormously, and a fair amount of research already exists on detecting the sentiment of public opinion among Twitter users. Because Twitter messages are generated constantly and at very high rates, a huge volume of streaming data is created, and there is an imperative need for accurate methods of knowledge discovery and mining for this information. Although a plethora of Twitter sentiment analysis methods exists in the recent literature, researchers have, as expected, shifted to real-time sentiment identification on Twitter streaming data. A major difficulty is dealing with the Big Data challenges of Volume and Velocity that arise in Twitter streaming applications. From this perspective, this paper provides a methodological approach, based on open source tools, for real-time detection of changes in sentiment that is highly efficient in both memory consumption and computational cost. This is achieved by iteratively collecting tweets in real time and discarding them immediately after processing. For this purpose, we employ the lexicon approach for sentiment characterization, while change detection is achieved through appropriate control charts that do not require historical information. We believe that the proposed methodology provides the trigger for potential large-scale monitoring of threads in an attempt to discover the spread of fake news or propaganda efforts in their early stages. Our experimental real-time analysis based on a recent hashtag provides evidence that the proposed approach can detect meaningful sentiment changes across a hashtag's lifetime.
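
The combination described above, lexicon scoring plus a control chart that needs no historical data, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' method: the tiny `LEXICON`, the class name `SelfStartingChart`, the three-sigma limit `k`, and the `warmup` length are all hypothetical, and the chart is a simple self-starting Shewhart-style rule built on Welford's online mean and variance so that each tweet can be discarded after scoring.

```python
import math

# Hypothetical toy lexicon; a real system would load a full sentiment lexicon.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0, "bad": -1.0, "awful": -2.0, "hate": -2.0}


def score(tweet):
    """Sum the lexicon polarity of each word; unknown words count as 0."""
    return sum(LEXICON.get(word, 0.0) for word in tweet.lower().split())


class SelfStartingChart:
    """Flags a score that falls outside mean +/- k*std of the scores seen so far."""

    def __init__(self, k=3.0, warmup=5):
        self.k, self.warmup = k, warmup
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        """Return True if x signals a change, then fold x into the running stats."""
        signal = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            signal = abs(x - self.mean) > self.k * std
        # Welford's online update: O(1) memory, no stored history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return signal


stream = ["good day", "great game", "good food", "love it",
          "good show", "great win", "awful hate bad awful"]
chart = SelfStartingChart()
for tweet in stream:
    if chart.update(score(tweet)):
        print("sentiment change at:", tweet)
```

Only three numbers (`n`, `mean`, `m2`) are kept between tweets, which is what makes the memory footprint constant regardless of stream length.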

    AAPOR Report on Big Data

    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. As explained in the report, the term Big Data is used for a variety of data, many of them characterized not just by their large volume but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inferences from them. The changes in the nature of the new types of data, their availability, and the way in which they are collected and disseminated are fundamental, and they constitute a paradigm shift for survey research. There is great potential in Big Data, but some fundamental challenges have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges.
