A Model to Measure the Spread Power of Rumors
Nowadays, a significant portion of the posts shared daily on social media is
infected by rumors. This study approaches rumor analysis from a different angle
than prior research: it tackles, for the first time, the unaddressed problem of
calculating the Spread Power of Rumor (SPR), and examines spread power as a
function of multi-contextual features. For this purpose, the theory of Allport
and Postman is adopted, which holds that two key factors determine the spread
power of a rumor: importance and ambiguity. The proposed Rumor Spread Power
Measurement Model (RSPMM) computes SPR with a purely textual approach, using
contextual features to measure the spread power of rumors in two categories:
False Rumor (FR) and True Rumor (TR). In total, 51 contextual features are
introduced and their impact on classification is investigated; 42 features in
two categories, "importance" (28 features) and "ambiguity" (14 features), are
then selected to compute SPR. The proposed RSPMM is verified on two labelled
datasets collected from Twitter and Telegram. The results show that (i) the
proposed new features are effective and efficient at discriminating between FRs
and TRs; (ii) although RSPMM relies only on contextual features while existing
techniques are based on structure and content features, it achieves
considerably better results (F-measure = 83%); and (iii) a t-test shows that
the SPR criterion can significantly distinguish between FR and TR, and can also
serve as a new method to verify the veracity of rumors
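Allport and Postman's "basic law of rumor", which RSPMM builds on, can be sketched in a few lines of Python. The feature scores and the plain averaging used here are illustrative assumptions, not the paper's actual 42-feature model:

```python
def spread_power(importance_scores, ambiguity_scores):
    """Estimate the Spread Power of Rumor (SPR) as the product of
    aggregated importance and ambiguity, following Allport and Postman:
    a rumor spreads widely only if it is both important AND ambiguous.
    Plain averaging is an illustrative stand-in for the paper's model.
    """
    importance = sum(importance_scores) / len(importance_scores)
    ambiguity = sum(ambiguity_scores) / len(ambiguity_scores)
    return importance * ambiguity

# An important but unambiguous claim barely spreads ...
low_spr = spread_power([0.9, 0.8], [0.05, 0.10])
# ... while an important AND ambiguous one spreads strongly.
high_spr = spread_power([0.9, 0.8], [0.80, 0.90])
```

The multiplicative form captures the theory's key property: if either factor is zero, the estimated spread power is zero as well.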
Analysis and Extraction of Tempo-Spatial Events in an Efficient Archival CDN with Emphasis on Telegram
This paper presents an efficient archival framework, called Tempo-Spatial
Content Delivery Network (TS-CDN), for exploring and tracking large-scale
cyberspace data. Social media data streams renew along both temporal and
spatial dimensions; various kinds of websites and social networks (i.e.,
channels, groups, pages, etc.) are treated as spatial locations in cyberspace.
Accurate analysis requires encompassing the bulk of this data. TS-CDN builds an
efficient content delivery network by applying a hash function to the data:
hashing eliminates redundancy and yields a unique large-scale data archive.
Given a query, the framework supports transparent monitoring and exploration of
data along the tempo-spatial dimension, ranked by TF-IDF score. Also, by
conforming to the i18n standard, Unicode-handling problems are resolved. To
evaluate the TS-CDN framework, a dataset was collected from Telegram news
channels from March 23, 2020 (1399-01-01) to September 21, 2020 (1399-06-31)
on topics including Coronavirus (COVID-19), vaccines, school reopening, floods,
earthquakes, justice shares, petroleum, and quarantine. Applying hashing to
this Telegram dataset over the mentioned interval produced a significant
reduction in media files: 39.8% for videos (from 79.5 GB to 47.8 GB) and 10%
for images (from 4 GB to 3.6 GB). The TS-CDN infrastructure is presented as a
web-based, service-oriented system. Experiments conducted on large time-series
data spanning different spatial dimensions (i.e., the Khabare Fouri, Khabarhaye
Fouri, Akhbare Rouze Iran, and Akhbare Rasmi Telegram news channels)
demonstrate the efficiency and applicability of the implemented TS-CDN
framework
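The hash-based deduplication behind the storage savings above can be sketched with a content digest. SHA-256 and the in-memory dict are assumptions for illustration, not TS-CDN's actual storage layer:

```python
import hashlib

def dedupe_by_hash(files):
    """Archive each media file under the SHA-256 digest of its bytes,
    so byte-identical copies forwarded across many channels are stored
    only once. `files` is an iterable of (name, content_bytes) pairs.
    """
    archive = {}
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        archive.setdefault(digest, (name, content))  # keep the first copy only
    return archive

# The same video forwarded to two channels occupies one archive slot.
files = [("chan_a/clip.mp4", b"\x00video-bytes"),
         ("chan_b/clip.mp4", b"\x00video-bytes"),
         ("chan_c/news.jpg", b"\xffimage-bytes")]
archive = dedupe_by_hash(files)  # 2 unique entries for 3 inputs
```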
Analysis of Information Spreading by Social Media Based on Emotion and Empathy
The number of social media users has increased exponentially in recent times, and various types of social media platforms are being introduced. While social media has become a convenient communication tool, its use has also caused various social problems. Some users, unable to imagine the emotions their posts may induce in readers, cause what is termed "the flaming phenomenon." In some cases, users intentionally repeat strong remarks for self-advertisement. To identify the cause of this phenomenon, it is necessary to analyze the posted content or the personalities of the users who cause the flaming. However, it is difficult to reach a generalized conclusion because each case varies with the circumstances and the individual. In this chapter, we study how information spreads via communication on social media by conducting a detailed analysis of replies and retweet counts in Japanese, and we reveal the relation between the feedback such posts receive and the emotions or empathy they evoke
A Tutorial on Event Detection using Social Media Data Analysis: Applications, Challenges, and Open Problems
In recent years, social media has become one of the most popular platforms
for communication. These platforms allow users to report real-world incidents,
which can then circulate swiftly and widely throughout the whole social
network. A social event is a real-world incident documented on social media,
and such events can contain vital documentation of crisis scenarios. Monitoring
and analyzing this rich content can yield extraordinarily valuable information
and help people and organizations learn how to take action. This paper presents
a survey of the potential benefits and applications of event detection through
social media data analysis. Moreover, the critical challenges and fundamental
tradeoffs in detecting events by monitoring social media streams are
methodically investigated. Finally, fundamental open questions and possible
research directions are introduced
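A minimal form of event detection over a social media stream is burst detection on per-window mention counts. The z-score-style threshold below is a common illustrative baseline, not a method taken from the surveyed papers:

```python
import statistics

def detect_bursts(window_counts, threshold=2.0):
    """Return indices of time windows whose mention count exceeds
    mean + threshold * stdev -- a crude signal that a real-world
    event is being reported in that window.
    """
    mean = statistics.mean(window_counts)
    stdev = statistics.pstdev(window_counts)
    return [i for i, count in enumerate(window_counts)
            if count > mean + threshold * stdev]

# Hourly counts of a keyword such as "earthquake":
# the spike in window 3 stands out against the baseline.
bursts = detect_bursts([3, 4, 3, 50, 4, 3])  # -> [3]
```

Real systems layer deduplication, geolocation, and topic clustering on top of such a signal, which is where the challenges surveyed above arise.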
An Exploratory Study of COVID-19 Misinformation on Twitter
During the COVID-19 pandemic, social media has become a home ground for
misinformation. To tackle this infodemic, scientific oversight, as well as a
better understanding by practitioners in crisis management, is needed. We
conducted an exploratory study of the propagation, authors and content of
misinformation on Twitter around the topic of COVID-19 in order to gain early
insights. We collected all tweets mentioned in the verdicts of claims related
to COVID-19 fact-checked by over 92 professional fact-checking organisations
between January and mid-July 2020, and we share this corpus with the community.
This resulted in 1,500 tweets relating to 1,274 false and 276 partially false
claims, respectively. Exploratory analysis of author accounts revealed that
verified Twitter handles (including organisations and celebrities) are also
involved in either creating (new tweets) or spreading (retweets) the
misinformation. Additionally, we found that false claims propagate faster than
partially false claims. Compared to a background corpus of COVID-19 tweets,
tweets containing misinformation are more often concerned with discrediting
other information on social media. Their authors use less tentative language
and appear to be more driven by concerns of potential harm to others. Our
results enable us to identify gaps in the current scientific coverage of the
topic and to propose actions for authorities and social media users to counter
misinformation
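The finding that false claims propagate faster than partially false ones presupposes a propagation-speed metric. One simple, hypothetical choice (our illustration, not the paper's measure) is retweets per hour over a claim's active window:

```python
def retweets_per_hour(timestamps):
    """Propagation speed of one claim: retweet count divided by the span
    (in hours) between its first and last observed retweet.
    `timestamps` are Unix epoch seconds; a zero span (e.g. a single
    burst second) is guarded to avoid division by zero.
    """
    span_hours = (max(timestamps) - min(timestamps)) / 3600
    return len(timestamps) / span_hours if span_hours else float(len(timestamps))

# Four retweets within one hour spread faster than four over four hours.
fast = retweets_per_hour([0, 900, 1800, 3600])     # 4 retweets / 1 h
slow = retweets_per_hour([0, 3600, 7200, 14400])   # 4 retweets / 4 h
```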
Sentiment Analysis of Tweets using Unsupervised Learning Techniques and the K-Means Algorithm
Abstract: Today, web content such as images, text, speech, and videos is largely user-generated, and social networks have become increasingly popular as a means for people to share their ideas and opinions. Twitter is one of the most popular social media platforms for expressing feelings about current events. The main objective of this study is to classify and analyze the Twitter content published by affiliates of the Pension and Funds Administration (AFP). The study applies machine learning techniques for data mining, cleaning, tokenization, exploratory analysis, classification, and sentiment analysis. Tweets were gathered using the hashtag #afp, followed by descriptive and exploratory analysis, including tweet metrics. Finally, a content analysis was carried out, including word-frequency calculation, lemmatization, classification of words by sentiment and emotion, and a word cloud. The study uses tweets published in May 2022. Sentiment was distributed across three polarity classes: positive, neutral, and negative, representing 22%, 4%, and 74% of tweets, respectively. Using unsupervised learning with the K-Means algorithm, we determined the number of clusters via the elbow method. Finally, the sentiment analysis and the resulting clusters indicate a very pronounced dispersion, with dissimilar distances between points, even though the data were standardized
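The elbow method mentioned above picks the cluster count where within-cluster inertia stops dropping sharply. The minimal 1-D K-Means below is a self-contained sketch; the one-dimensional data and naive initialization are our simplifications, not the study's actual feature space:

```python
import random

def kmeans_1d(data, k, iters=20, seed=0):
    """Minimal 1-D K-Means. Returns (centers, inertia), where inertia is
    the within-cluster sum of squared distances that the elbow method plots
    against k.
    """
    rng = random.Random(seed)
    centers = rng.sample(data, k)          # naive init: k distinct points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    inertia = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, inertia

# Elbow method: inertia drops sharply up to the true cluster count,
# then flattens. Two well-separated groups give an elbow at k = 2.
scores = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
inertias = [kmeans_1d(scores, k)[1] for k in (1, 2, 3)]
```

Plotting `inertias` against k and choosing the bend is exactly the elbow procedure the study applies at larger scale.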
Different valuable tools for Arabic sentiment analysis: a comparative evaluation
Arabic natural language processing (ANLP) is a subfield of artificial intelligence (AI) that aims to build various applications for the Arabic language, such as Arabic sentiment analysis (ASA): the task of classifying the feelings and emotions expressed in a text to determine the writer's attitude (neutral, negative, or positive). When working on ASA, researchers often use tools in their projects without explaining the reason for that choice, or they select a set of libraries according to their familiarity with a specific programming language. Because of the abundance of their libraries in the ANLP field, and especially in ASA, our work relies on the Java and Python programming languages. This paper conducts an in-depth comparative evaluation of valuable Python and Java libraries to identify the most useful ones for Arabic sentiment analysis (ASA). Based on a large body of influential work in the ASA domain, we conclude that the NLTK, Gensim, and TextBlob libraries are the most useful for ASA in Python. For Java, we conclude that the Weka and CoreNLP tools are the most used and achieve strong results in this research domain
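At their core, lexicon-based sentiment tools like those compared here score a text against polarity word lists. The tiny English lexicons below are hypothetical stand-ins (real ASA work would need Arabic lexicons and morphological preprocessing):

```python
# Hypothetical mini-lexicons; real tools ship lists with thousands of entries.
POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "sad"}

def polarity(text):
    """Classify text as positive / negative / neutral by counting lexicon
    hits -- the basic idea behind lexicon-based sentiment analysis.
    """
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Libraries such as TextBlob wrap this idea with tokenization, negation handling, and graded polarity scores rather than a bare count.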