Hate is not Binary: Studying Abusive Behavior of #GamerGate on Twitter
Over the past few years, online bullying and aggression have become
increasingly prominent, and manifested in many different forms on social media.
However, there is little work analyzing the characteristics of abusive users
and what distinguishes them from typical social media users. In this paper, we
start addressing this gap by analyzing tweets containing a large amount
of abusive content. We focus on a Twitter dataset revolving around the Gamergate
controversy, which led to many incidents of cyberbullying and cyberaggression
on various gaming and social media platforms. We study the properties of the
users tweeting about Gamergate, the content they post, and the differences in
their behavior compared to typical Twitter users.
We find that while their tweets are often seemingly about aggressive and
hateful subjects, "Gamergaters" do not exhibit common expressions of online
anger, and in fact primarily differ from typical users in that their tweets are
less joyful. They are also more engaged than typical Twitter users, which is an
indication as to how and why this controversy is still ongoing. Surprisingly,
we find that Gamergaters are less likely to be suspended by Twitter, thus we
analyze their properties to identify differences from typical users and what
may have led to their suspension. We perform an unsupervised machine learning
analysis to detect clusters of users who, though currently active, could be
considered for suspension since they exhibit behaviors similar to those of
suspended users. Finally, we confirm the usefulness of our analyzed features by emulating
the Twitter suspension mechanism with a supervised learning method, achieving
very good precision and recall.
Comment: In 28th ACM Conference on Hypertext and Social Media (ACM HyperText 2017)
Data Sets: Word Embeddings Learned from Tweets and General Data
A word embedding is a low-dimensional, dense and real-valued vector
representation of a word. Word embeddings have been used in many NLP tasks.
They are usually generated from a large text corpus. The embedding of a word
captures both its syntactic and semantic aspects. Tweets are short, noisy and
have unique lexical and semantic features that are different from other types
of text. Therefore, it is necessary to have word embeddings learned
specifically from tweets. In this paper, we present ten word embedding data
sets. In addition to the data sets learned from just tweet data, we also built
embedding sets from the general data and the combination of tweets with the
general data. The general data consist of news articles, Wikipedia data and
other web data. These ten embedding models were learned from about 400 million
tweets and 7 billion words from the general text. In this paper, we also
present two experiments demonstrating how to use the data sets in some NLP
tasks, such as tweet sentiment analysis and tweet topic classification.
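As an illustration of how dense word vectors like those described above are typically compared, here is a minimal Python sketch of cosine similarity. The three 4-dimensional vectors and the helper `most_similar` are hypothetical and made up for this example; they are not taken from the released embedding data sets, whose dimensions and file format are not stated in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for illustration only.
toy_embeddings = {
    "happy": [0.9, 0.1, 0.3, 0.0],
    "glad": [0.8, 0.2, 0.4, 0.1],
    "server": [0.0, 0.9, 0.1, 0.8],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine."""
    target = embeddings[word]
    candidates = [
        (cosine_similarity(target, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    return max(candidates)[1]
```

With these toy vectors, `most_similar("happy", toy_embeddings)` picks "glad" over "server", which is the kind of nearest-neighbour lookup that real embeddings support at much higher dimension.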
Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams
Online social media are complementing and in some cases replacing
person-to-person social interaction and redefining the diffusion of
information. In particular, microblogs have become crucial grounds on which
public relations, marketing, and political battles are fought. We introduce an
extensible framework that will enable the real-time analysis of meme diffusion
in social media by mining, visualizing, mapping, classifying, and modeling
massive streams of public microblogging events. We describe a Web service that
leverages this framework to track political memes in Twitter and help detect
astroturfing, smear campaigns, and other misinformation in the context of U.S.
political elections. We present some cases of abusive behaviors uncovered by
our service. Finally, we discuss promising preliminary results on the detection
of suspicious memes via supervised learning based on features extracted from
the topology of the diffusion networks, sentiment analysis, and crowdsourced
annotations.
A Survey on Cybercrime Using Social Media
There is growing interest in automating crime detection and prevention for large populations as a result of the increased use of social media for victimization and criminal activities. The topic is frequently researched because social media enable criminals to reach a large audience. While several studies have investigated specific crimes on social media, a comprehensive review that examines all types of social media crimes, their similarities, and detection methods is still lacking. Identifying similarities among crimes and detection methods can facilitate knowledge and data transfer across domains. The goal of this study is to collect a library of social media crimes and establish their connections using a crime taxonomy. The survey also identifies publicly accessible datasets and suggests directions for further study.
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users and their tweets make
up nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.
Comment: to appear in SocInfo 201
STREAM-EVOLVING BOT DETECTION FRAMEWORK USING GRAPH-BASED AND FEATURE-BASED APPROACHES FOR IDENTIFYING SOCIAL BOTS ON TWITTER
This dissertation focuses on the problem of evolving social bots in online social networks, particularly Twitter. Such accounts spread misinformation and inflate social network content to mislead the masses. The main objective of this dissertation is to propose a stream-based evolving bot detection framework (SEBD), which was constructed using both graph- and feature-based models. It was built using Python, a real-time streaming engine (Apache Kafka version 3.2), and our pretrained model, the bot multi-view graph attention network (Bot-MGAT). The feature-based model was used to identify predictive features for bot detection and evaluate the SEBD predictions. The graph-based model was used to facilitate multiview graph attention networks (GATs) with fellowship links to build our framework for predicting account labels from streams. A probably approximately correct learning framework was applied to confirm the accuracy and confidence levels of SEBD. The results showed that SEBD can effectively identify bots from streams and that profile features are sufficient for detecting social bots. The pretrained Bot-MGAT model uses fellowship links to reveal hidden information that can aid in identifying bot accounts. The significant contributions of this study are the development of a stream-based bot detection framework for detecting social bots based on a given hashtag and the proposal of a hybrid approach to feature selection for identifying predictive features of bot accounts. Our findings indicate that, within hashtags, Twitter has a higher percentage of active bots than of humans. The results indicated that stream-based detection is more effective than offline detection, achieving an accuracy score of 96.9%. Finally, semi-supervised learning (SSL) can address the shortage of labeled data in bot detection tasks.
Enhanced Spam Detection System for Twitter Social Networking Platform
Twitter is one of the most popular online social networking (OSN) sites, used by prominent people and organizations such as ministers, businesspeople, large companies, and actors to share information. Around 500 million tweets are posted monthly by a total of 313 million active Twitter users. Twitter's widespread use has drawn the interest of spammers. These malicious actors exploit the platform for various nefarious purposes, including monitoring authentic users, disseminating harmful software, and promoting their agendas through URLs embedded in tweets. They engage in tactics such as secretly following and unfollowing legitimate users, all with the intent of gathering sensitive information. To address this problem, a secure spam detection system based on a machine learning approach is designed. The design uses stop-word removal and a word-to-vector model to refine the data and reduce its dimensionality. To enhance data quality, cosine similarity is also applied to measure the similarity score among tweets, and an Artificial Neural Network (ANN) is trained on that basis. The trained network is then tested, with performance evaluated in terms of precision, recall, and F-measure. A comparative analysis is also performed to demonstrate the efficiency of the work. The proposed spam detection model obtains an average precision, recall, and F-measure of 0.9252, 0.6107, and 0.734, respectively.
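For readers who want to check how the three reported metrics relate, the following Python sketch computes precision and recall from confusion counts and the F-measure as their weighted harmonic mean. It uses the standard textbook definitions; the function names and the example counts are hypothetical, and the abstract does not say how the authors averaged their scores.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw confusion counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision < 0.0 or recall < 0.0:
        raise ValueError("precision and recall must be non-negative")
    beta_squared = beta * beta
    denominator = beta_squared * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1.0 + beta_squared) * precision * recall / denominator


# Hypothetical counts: 8 spam tweets caught, 2 false alarms, 8 spam tweets missed.
example_precision, example_recall = precision_recall(8, 2, 8)

# F1 implied by the average precision and recall quoted in the abstract.
implied_f1 = f_measure(0.9252, 0.6107)
```

Here `implied_f1` comes out at about 0.736, close to the reported 0.734. A small gap of this kind would be expected if the reported figure averages per-run F-measures instead of being computed from the averaged precision and recall, though the abstract does not specify which was done.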
A Method for Identifying Fake Follower Buyers Using Geographical Distance Information
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 2. 김종권.
Reputation on social media such as Twitter, Facebook, and Instagram is now regarded as a measure of a person's power in the real world. A person who has more friends or followers can influence more individuals, so a user's influence is associated with the number of friends or followers. In response to the demand for greater social power, an underground market has emerged where a customer can buy fake followers. Those who purchase fake followers act vigorously in online social networks, so it is hard to distinguish a customer from a celebrity or cyberstar. Nevertheless, there are unique characteristics of legitimate users that customers or fake followers cannot manipulate, such as the small-world property. The small-world property is mainly characterized by the shortest path length and the clustering coefficient. In a small-world network, most people are linked by short chains. Existing work has largely focused on extracting relationship features such as indegree, outdegree, status, hub, or authority. Although this research has explored relationship features to classify abnormal users of fake follower markets, the use of the small-world property to detect abnormal users has not been studied.
In this work, we propose a model that adapts the small-world property. Specifically, we study the geographical distance of 1-hop directional links, using nodes' geographical locations, to verify whether a social graph has the small-world property. Motivated by the difference in distance ratio for 1-hop directional links, we propose a method that generates the 1-hop link distance ratio and classifies a node as a customer or not. Experimental results on a real-world Twitter dataset demonstrate that the proposed method achieves higher performance than existing models.
Chapter 1 Introduction
1.1 Motivations
1.2 Fake Follower Markets
1.3 Research Objectives
1.4 Contributions
1.5 Thesis Organization
Chapter 2 Related Work
2.1 Small World Phenomenon
2.2 Online Social Abusing Attack Detection
2.2.1 Contents-based Detection
2.2.2 Social Network-based Detection
2.2.3 Behavior-based Detection
Chapter 3 Characteristic of Customers and Fake Followers
3.1 Data Preparation
3.2 Fake Follower Properties
3.3 Customer Properties
Chapter 4 Social Relationship and Geographical Distance
4.1 Geographical Distance in OSNs
4.2 Follower Ratio
Chapter 5 Detecting Customers
5.1 Key Features for Customer Detection
5.2 Performance matrices
5.3 Experiments
5.4 Comparison with Baseline Method
5.5 Comparison with Feature-based Method
5.6 Impact of Balanced Dataset
5.7 Fake Follower Detection
Chapter 6 Future Work
6.1 The Absence of Location Information
6.2 Hybrid Detection Method with Link Ratio and Profile Information
Chapter 7 Conclusion
Bibliography
Abstract in Korean
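The thesis abstract above describes the small-world property in terms of shortest paths and the clustering coefficient. The Python sketch below computes both quantities on a tiny graph using their standard definitions. It is only an illustration: it assumes an undirected graph stored as a dictionary of neighbour sets, whereas the thesis works with directional 1-hop links and geographical distances, and the graph `toy_graph` and all function names are hypothetical.

```python
from collections import deque


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbors = graph[node]
    degree = len(neighbors)
    if degree < 2:
        return 0.0
    links = sum(
        1 for a in neighbors for b in neighbors if a < b and b in graph[a]
    )
    return 2.0 * links / (degree * (degree - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, node) for node in graph) / len(graph)


def bfs_distances(graph, source):
    """Hop counts from `source` to every reachable node (breadth-first search)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def average_shortest_path(graph):
    """Mean hop count over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        for target, hops in bfs_distances(graph, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a triangle a-b-c with one extra node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

A small-world network is usually taken to combine a high average clustering coefficient with a short average path length; on real follower graphs a graph library would be a more practical choice than these hand-written loops.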